Comparing a hash

  • john.arnott (2/2/2010)


    When the HASHBYTES() function fails with the input truncation error, it does not return NULL, but rather doesn't return anything.

    But usually SQL Server considers "nothing" as a NULL value.

    Here is the example:

    declare @tab table (a int)

    select a from @tab

    if (select a from @tab) is null

    print 'null'

    else

    print 'not null'

    The result is:

    a

    -----------

    (0 row(s) affected)

    null

    As we can see, "nothing" = "null" in this case.

    I think that the warning (or error, or exception, or whatever) plays its important role.

    Here is the modified example:

    declare @tab table (a int)

    select a from @tab where 1 = 1/0

    if (select a from @tab where 1 = 1/0) is null

    print 'null'

    else

    print 'not null'

    The result is:

    a

    -----------

    Msg 8134, Level 16, State 1, Line 3

    Divide by zero error encountered.

    Msg 8134, Level 16, State 1, Line 5

    Divide by zero error encountered.

    not null

    Does it look like "nothing" <> "null"? I don't think so, because of the error message. I think that SQL Server executes "select a from @tab where 1 = 1/0", gets an error, and does not even try to compare this with NULL. Since there's no TRY..CATCH block, batch execution continues and... steps into the ELSE block. But why is the ELSE block not skipped? Maybe it's "by design" 🙂

  • The "inconsistencies" I point to in this post may actually be controlled by the use of settings as pointed out by Paul White in a post later in this thread. Here's that post:


    Paul White NZ (3/30/2010)


    Everything that has been observed, regarding overflows, truncations, ELSE clauses executing and so on...is all documented in Books Online, under the following topics:

    SET ANSI_WARNINGS

    SET ARITHABORT

    SET ARITHIGNORE

    The various behaviours described in this thread can all be varied, and explained, by reading those references.

    The other thing is that a scalar sub-query like in the example (SELECT a FROM @tab) always returns a value - since it returns a column reference. If no rows are produced, the scalar sub-query presents a NULL.


    vk-kirov,

    Your example is consistent with the HASHBYTES example from Hugo. And it DOES look as though SQL is not treating "no result" as the same thing as "value undetermined" or NULL. But then, I suppose it makes a certain amount of logical sense that a test for NULL return FALSE when the thing being tested is not defined; and "not defined" is not the same as "has an undetermined value".

    TSQL is inconsistent. A simple conversion error stops the processing.

    Declare @x datetime

    set @x = convert(datetime,'2010-0518')

    If @x is NULL

    print 'Null found'

    else print '@x is not NULL'

    --------------- Returns:

    --Msg 241, Level 16, State 1, Line 3

    --Conversion failed when converting datetime from character string.

    But the inconsistency is that the HASHBYTES overflow allows the IF to continue, which finds that the local variable has an undefined value, expressed as NULL:

    Declare @b-2 varbinary

    Set @b-2 = HASHBYTES('SHA1', REPLICATE(CAST('A' AS varchar(max)), 9000))

    If @b-2 is NULL

    print 'Null found'

    else print '@b is not NULL'

    --------------- Returns:

    --Msg 8152, Level 16, State 10, Line 2

    --String or binary data would be truncated.

    --Null found

    Finallly, testing the HASHBYTES function directly gets back to the original form. Again the error doesn't stop further processing, but this time, the execution result itself is undefined, so the test fails and the ELSE branch is followed. At this point, it doesn't matter whether you test for IS NULL or for IS NOT NULL.

    If HASHBYTES('SHA1', REPLICATE(CAST('A' AS varchar(max)), 9000)) is NULL

    print 'Null found'

    else print 'Result is not NULL'

    If HASHBYTES('SHA1', REPLICATE(CAST('A' AS varchar(max)), 9000)) is NOT NULL

    print 'Null found'

    else print 'Result is not NULL'

    -- --------------------- Returns:

    --Msg 8152, Level 16, State 10, Line 2

    --String or binary data would be truncated.

    --Result is not NULL

    --Msg 8152, Level 16, State 10, Line 6

    --String or binary data would be truncated.

    --Result is not NULL

    ------

    edited to add preface with post from Paul.

    ------

  • "Inconsistent" - indeed. I thought this should generate an error, so I don't get the 1 point.

    But this is a bug in SQL Server! Given one often uses hashing to verify things are identical when confirming data integrity, a hash which treats two different strings as identical is a a serious failure. The string size limit for HASHBYTES may well be 8,000 (4,000 for nvarchar), but larger strings HASHBYTES should NOT result in the hash of a truncated string - it should produce an ERROR. Or allow long strings when the data type permits them.

    Has anyone raised this as a bug yet?

  • a hash which treats two different strings as identical is a a serious failure.

    No, it's not. Collisions in a hash are by design. After all, the possible number of different values in a 8000-character string is much larger than the possible number of 160-but hash values.

    That's not what causes the problem here, though. As I'm sure you understand after reading all the other replies. The real problem is that REPLICATE uses the same input as output data type, which in this case has an 8000-byte limit, and that longer values are truncated. Raising a bug is pointless, as this is all both by design and well documented.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • The real problem is that REPLICATE uses the same input as output data type, which in this case has an 8000-byte limit, and that longer values are truncated.

    I would have thought that although this reasonable (if not obvious) behavior of REPLICATE led to the two HASHBYTES calls evaluating identical strings in the original QOD, the real problem is that when HASHBYTES fails due to input-overflow (when we feed it a varchar(max), it apparently does not present a result of NULL that may be tested. Rather, as I showed in my previous post, an IF statement testing the output of HASHBYTES goes to the FALSE branch regardless of whether you test for "IS NULL" or for "IS NOT NULL". That would seem more of a candidate for bug status.

  • john.arnott (2/3/2010)


    The real problem is that REPLICATE uses the same input as output data type, which in this case has an 8000-byte limit, and that longer values are truncated.

    I would have thought that although this reasonable (if not obvious) behavior of REPLICATE led to the two HASHBYTES calls evaluating identical strings in the original QOD, the real problem is that when HASHBYTES fails due to input-overflow (when we feed it a varchar(max), it apparently does not present a result of NULL that may be tested. Rather, as I showed in my previous post, an IF statement testing the output of HASHBYTES goes to the FALSE branch regardless of whether you test for "IS NULL" or for "IS NOT NULL". That would seem more of a candidate for bug status.

    That's the real problem that surfaxced in this discussion; the problem exposed by the question (which I believed David to be responding to) is not related to the input-overflow behaviour.

    And I agree that the overflow behaviour is a lot more bug-worthy than the 8000-character limit on varchar data.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Good Question ... there is always something new to learn



    [font="System"]Bhavesh Patel[/font]

    http://bhaveshgpatel.wordpress.com/
  • Everything that has been observed, regarding overflows, truncations, ELSE clauses executing and so on...is all documented in Books Online, under the following topics:

    SET ANSI_WARNINGS

    SET ARITHABORT

    SET ARITHIGNORE

    The various behaviours described in this thread can all be varied, and explained, by reading those references.

    The other thing is that a scalar sub-query like in the example (SELECT a FROM @tab) always returns a value - since it returns a column reference. If no rows are produced, the scalar sub-query presents a NULL.

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply