A somewhat unexpected behavior has been observed while investigating different aspects of the data type coercion. By coercion I mean an implicit casting to the "common" data type that happens when a few operands have different data types but the expression requires some predefined data type to be chosen for the resulting value. In Firebird, common data type determination is used in two cases: (a) describing the select list expressions of the union and (b) describing the resulting data type of the CASE/COALESCE functions.
In old versions, different data types couldn't be used as arguments in these cases, throwing the famous error "data type unknown", see this simple test case:
A union without the ALL clause implies DISTINCT which in turn involves the comparison operation to decide whether values are equal or distinct. The SQL specification doesn't allow heterogeneous comparisons, so the data type coercion perfectly fits the declared behavior by initially casting the values to the common data type and then comparing them being of the same data type. But Firebird does allow heterogeneous comparisons and it brings a few interesting questions.
Look at these examples:
Now let's modify the example with unions a little:
Okay, let's look at other database engines:
In old versions, different data types couldn't be used as arguments in these cases, throwing the famous error "data type unknown", see this simple test case:
Starting with Firebird 2.0, the common data type determination logic has been added, so that heterogeneous expressions are now allowed and the aforementioned test case works as expected. The implementation follows the SQL specification, subclause 9.3 - "Result of data type combinations". In particular, case [3a] declares:select 1 from rdb$databaseunionselect '1' from rdb$database
If any of the data types in DTS is character string, then:This means that in the above heterogeneous example, the resulting value will be a character string and the non-conforming expressions will be implicitly casted to become strings. And it looks sensible, as otherwise (if a numeric would be chosen as the common data type) the following example would cause the data type conversion error:
i) All data types in DTS shall be character string, and all of them shall have the same character
repertoire. The character set of the result is the character set of the data type in DTS that has the character encoding form with the highest precedence.
However, there's a hidden issue which remained unnoticed for a few recent years.select 1 from rdb$databaseunionselect 'a' from rdb$database
A union without the ALL clause implies DISTINCT which in turn involves the comparison operation to decide whether values are equal or distinct. The SQL specification doesn't allow heterogeneous comparisons, so the data type coercion perfectly fits the declared behavior by initially casting the values to the common data type and then comparing them being of the same data type. But Firebird does allow heterogeneous comparisons and it brings a few interesting questions.
Look at these examples:
select 1 from rdb$database where 1 = '1'All of them return one row as the result. In order to do so, Firebird involves special comparison priorities, in particular character strings are implicitly casted to numerics and only then compared as numerics. In other words, the common data type becomes numeric. On the negative side, it causes the following example to fail with a conversion error:
select 1 from rdb$database where 1 = '+1'
select 1 from rdb$database where 1 = '01'
select 1 from rdb$database where 1 = '1.0'
select 1 from rdb$database where 1 = 'a'On the positive side, non-trivial comparisons like the ones mentioned above return the practically correct results. And let's note this is the originally intended behavior which works this way for the past few decades.
Now let's modify the example with unions a little:
select 1 from rdb$databaseThis query returns two resulting rows, thus reporting the values as distinct. But accordingly to the comparison rules, these values are equal! A somewhat tricky situation, isn't it?
union
select '1.0' from rdb$database
Okay, let's look at other database engines:
- PostgreSQL 9.0
Mixed-type comparisons are not allowed, mixed-type unions are not allowed. The safest but at the same time the most limited solution.
- Oracle 10g
Mixed-type comparisons are allowed and work like in Firebird, mixed-type unions are not allowed. This matches the situation existed in Firebird prior to version 2.0.
- MSSQL 2008
Both mixed-type comparisons and unions are allowed and the coercion rules are the same: a character string gets casted to a numeric, like in Firebird. IMHO, this is the most flexible and at the same time absolutely consistent solution, something not really expected from Microsoft ;-) However, beware that (1 union 'a') would throw a data type conversion error.
No comments:
Post a Comment