Comparing a hash
Posted Monday, February 1, 2010 8:30 AM
SSCrazy

Good question! I learned something today.
Post #857226
Posted Monday, February 1, 2010 10:43 AM


SSC-Insane

Thanks. I, too, learned something new today.



Jason AKA CirqueDeSQLeil
Post #857404
Posted Tuesday, February 2, 2010 1:58 AM
SSCrazy

Now this one still bothers me...

Why does it execute the "no" branch after the 8152 error?
By the way, it only executes it with the following SET options:
SET XACT_ABORT OFF
SET ANSI_WARNINGS ON

Funnily enough, one would think that with warnings on, the error would stop execution; instead, you need to turn warnings off to make SQL Server stop the execution. Unfortunately, BOL does not really help in understanding this.

At least the XACT_ABORT setting makes some kind of sense with regard to the error.
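
For anyone who wants to try the settings side by side, here is a minimal sketch. It assumes a version where HASHBYTES still has the 8000-byte input limit (SQL Server 2014 and earlier; later versions accept longer input, so the error would not appear there). The comments repeat what is reported in this thread or what is generally expected of XACT_ABORT; they are not guaranteed output.

```sql
-- Sketch only: run in SSMS or sqlcmd (GO is a client-side batch separator).

-- Case 1: the combination named above.
SET XACT_ABORT OFF;
SET ANSI_WARNINGS ON;
GO
IF HASHBYTES('SHA1', REPLICATE(CAST('A' AS varchar(max)), 9000))
   = HASHBYTES('SHA1', REPLICATE('A', 8000))
    PRINT 'yes.';
ELSE
    PRINT 'no.';   -- reported in this thread: Msg 8152 is raised, then this branch runs
GO

-- Case 2: with XACT_ABORT ON a run-time error is expected to end the batch,
-- so neither PRINT should run.
SET XACT_ABORT ON;
GO
IF HASHBYTES('SHA1', REPLICATE(CAST('A' AS varchar(max)), 9000))
   = HASHBYTES('SHA1', REPLICATE('A', 8000))
    PRINT 'yes.';
ELSE
    PRINT 'no.';
GO

-- Put the session back the way it was.
SET XACT_ABORT OFF;
GO
```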


Best Regards,
Chris Büttner
Post #857770
Posted Tuesday, February 2, 2010 3:30 AM


SSC-Addicted



Hugo Kornelis (2/1/2010)

SELECT HASHBYTES('SHA1', REPLICATE(CAST('A' AS varchar(max)), 8000)) returns a 20-byte value, and SELECT HASHBYTES('SHA1', REPLICATE(CAST('A' AS varchar(max)), 9000)) returns the same warning message and no value.

Why the IF comparison between the two doesn't abort but chooses to execute the ELSE clause is, in all honesty, beyond me. I first thought it was because the too-long input causes the function to return NULL (in addition to the message), but when I use IF HASHBYTES(...) IS NULL, I still got the "no" result.

Anyhoo, I stand corrected. HASHBYTES does apparently indeed have an (undocumented!!) limit of 8000 bytes on its input string. Thanks for pointing that out!


Despite the fact that I knew that Hashbytes has an 8000 byte limit on its input, I was stumped at the very point you have highlighted above - I thought it would error out. But it doesn't.

The interesting thing I found when I did some testing on my SQL Server 2008 Developer Edition is this.

When I run the code:

select HASHBYTES('sha1', REPLICATE(cast('a' as varchar(max)), 9000))

I get the error msg as indicated by you.

BUT when I run

select HASHBYTES('sha1', REPLICATE('a', 9000))

I do not get an error but an actual hash value. I then realized that REPLICATE('a', 9000) was replicating the 'a' value only up to 8000 bytes and silently dropping the rest, without raising an error. So in effect, the input limit of HASHBYTES was never violated by the code presented in the QotD.

However, casting the 'A' to varchar(max) and then replicating it 9000 times produces 9000 A's, and THIS causes the input to HASHBYTES to exceed the permissible limit.
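
The difference between the two REPLICATE calls can be checked directly with DATALENGTH, without involving HASHBYTES at all. A minimal sketch (the column aliases are just illustrative names):

```sql
-- REPLICATE returns the type of its input. A plain varchar literal caps the
-- result at 8000 bytes; a varchar(max) input does not.
SELECT
    DATALENGTH(REPLICATE('A', 9000))                       AS plain_varchar_bytes, -- 8000 (truncated)
    DATALENGTH(REPLICATE(CAST('A' AS varchar(max)), 9000)) AS varchar_max_bytes;   -- 9000
```

So only the second form ever hands HASHBYTES more than 8000 bytes.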

Running this code produces an error message but also the output "no.":

IF HASHBYTES('SHA1', REPLICATE(cast('A' as varchar(max)), 9000)) = HASHBYTES('SHA1', REPLICATE('A', 8000))
PRINT 'yes.'
ELSE PRINT 'no.'


Saurabh Dwivedy
___________________________________________________________

My Blog: http://tinyurl.com/dwivedys

Post #857812
Posted Tuesday, February 2, 2010 10:03 AM


SSCommitted

Hugo Kornelis (2/1/2010)

SELECT HASHBYTES('SHA1', REPLICATE(CAST('A' AS varchar(max)), 8000)) returns a 20-byte value, and SELECT HASHBYTES('SHA1', REPLICATE(CAST('A' AS varchar(max)), 9000)) returns the same warning message and no value. Why the IF comparison between the two doesn't abort but chooses to execute the ELSE clause is, in all honesty, beyond me. I first thought it was because the too-long input causes the function to return NULL (in addition to the message), but when I use IF HASHBYTES(...) IS NULL, I still got the "no" result.


This piqued my curiosity. Could the "no" result from your direct test
IF  HASHBYTES('SHA1', REPLICATE(CAST('A' AS varchar(max)), 9000)) 
is NULL
PRINT 'yes.';
ELSE
PRINT 'no.';

...be because the result of the IF test is NULL? When the HASHBYTES() function fails with the input truncation error, it does not return NULL, but rather doesn't return anything. Then, the IF statement executes and doesn't find a NULL, so returns FALSE, which of course leads to the "Print 'No'" branch.

Running in SSMS with results to text makes it a bit easier to see because it says "NULL" for null results and nothing for the failed HASHBYTES().
select  IsNull(HASHBYTES('SHA1', REPLICATE(CAST('A' AS varchar(max)), 9000)),0x2222aaaa9999) 
select HASHBYTES('SHA1', REPLICATE(CAST('A' AS varchar(max)), 4000))
Declare @MyNull varbinary
select @myNull

Returns
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Msg 8152, Level 16, State 10, Line 1
String or binary data would be truncated.


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0xD5FE2200837AB7B4FB073787F8E531D6C6A8CDD2

(1 row(s) affected)


----
NULL

(1 row(s) affected)

Post #858085
Posted Tuesday, February 2, 2010 2:37 PM
Hall of Fame

john.arnott (2/2/2010)
When the HASHBYTES() function fails with the input truncation error, it does not return NULL, but rather doesn't return anything.

But usually SQL Server considers "nothing" as a NULL value.

Here is the example:
declare @tab table (a int)

select a from @tab

if (select a from @tab) is null
print 'null'
else
print 'not null'

The result is:
a
-----------

(0 row(s) affected)

null


As we can see, "nothing" = "null" in this case.

I think that the warning (or error, or exception, or whatever) plays an important role here.

Here is the modified example:
declare @tab table (a int)

select a from @tab where 1 = 1/0

if (select a from @tab where 1 = 1/0) is null
print 'null'
else
print 'not null'

The result is:
a
-----------
Msg 8134, Level 16, State 1, Line 3
Divide by zero error encountered.

Msg 8134, Level 16, State 1, Line 5
Divide by zero error encountered.
not null

Does it look like "nothing" <> "null"? I don't think so, because of the error message. I think that SQL Server executes "select a from @tab where 1 = 1/0", gets an error, and does not even try to compare the result with NULL. Since there's no TRY..CATCH block, batch execution continues and... steps into the ELSE block. But why is the ELSE block not skipped? Maybe it's "by design".
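
To illustrate the TRY..CATCH point, here is a minimal sketch of the same test wrapped in a TRY block. It assumes the divide-by-zero is raised as an error (the usual ANSI_WARNINGS / ARITHABORT defaults in SSMS); the message text in the CATCH block is just an illustration.

```sql
DECLARE @tab TABLE (a int);

BEGIN TRY
    -- Same test as above; the error is raised while evaluating the condition.
    IF (SELECT a FROM @tab WHERE 1 = 1/0) IS NULL
        PRINT 'null';
    ELSE
        PRINT 'not null';
END TRY
BEGIN CATCH
    -- Control should transfer here, so neither PRINT above is expected to run.
    PRINT 'caught error ' + CAST(ERROR_NUMBER() AS varchar(10)) + ': ' + ERROR_MESSAGE();
END CATCH;
```

With the error handled this way, the question of which branch the IF "falls into" does not arise, because the statement never finishes.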
Post #858265
Posted Tuesday, February 2, 2010 3:32 PM


SSCommitted

The "inconsistencies" I point to in this post may actually be controlled by the use of settings as pointed out by Paul White in a post later in this thread. Here's that post:


Paul White NZ (3/30/2010)
Everything that has been observed, regarding overflows, truncations, ELSE clauses executing and so on...is all documented in Books Online, under the following topics:

SET ANSI_WARNINGS
SET ARITHABORT
SET ARITHIGNORE

The various behaviours described in this thread can all be varied, and explained, by reading those references.

The other thing is that a scalar sub-query like in the example (SELECT a FROM @tab) always returns a value - since it returns a column reference. If no rows are produced, the scalar sub-query presents a NULL.





vk-kirov,

Your example is consistent with the HASHBYTES example from Hugo. And it DOES look as though SQL is not treating "no result" as the same thing as "value undetermined" or NULL. But then, I suppose it makes a certain amount of logical sense that a test for NULL returns FALSE when the thing being tested is not defined; and "not defined" is not the same as "has an undetermined value".

T-SQL is inconsistent. A simple conversion error stops the processing.
Declare @x datetime

set @x = convert(datetime,'2010-0518')
If @x is NULL
print 'Null found'
else print '@x is not NULL'
--------------- Returns:
--Msg 241, Level 16, State 1, Line 3
--Conversion failed when converting datetime from character string.

But the inconsistency is that the HASHBYTES overflow allows the IF to continue, which finds that the local variable has an undefined value, expressed as NULL:
Declare @b varbinary
Set @b = HASHBYTES('SHA1', REPLICATE(CAST('A' AS varchar(max)), 9000))
If @b is NULL
print 'Null found'
else print '@b is not NULL'
--------------- Returns:
--Msg 8152, Level 16, State 10, Line 2
--String or binary data would be truncated.
--Null found

Finally, testing the HASHBYTES function directly gets back to the original form. Again the error doesn't stop further processing, but this time the execution result itself is undefined, so the test fails and the ELSE branch is followed. At this point, it doesn't matter whether you test for IS NULL or for IS NOT NULL.

If HASHBYTES('SHA1', REPLICATE(CAST('A' AS varchar(max)), 9000)) is NULL
print 'Null found'
else print 'Result is not NULL'

If HASHBYTES('SHA1', REPLICATE(CAST('A' AS varchar(max)), 9000)) is NOT NULL
print 'Null found'
else print 'Result is not NULL'

-- --------------------- Returns:
--Msg 8152, Level 16, State 10, Line 2
--String or binary data would be truncated.
--Result is not NULL
--Msg 8152, Level 16, State 10, Line 6
--String or binary data would be truncated.
--Result is not NULL

------
edited to add preface with post from Paul.
------
Post #858283
Posted Wednesday, February 3, 2010 9:51 AM
SSC-Enthusiastic

"Inconsistent" - indeed. I thought this should generate an error, so I don't get the 1 point.

But this is a bug in SQL Server! Given that one often uses hashing to verify that things are identical when confirming data integrity, a hash which treats two different strings as identical is a serious failure. The string size limit for HASHBYTES may well be 8,000 (4,000 for nvarchar), but for larger strings HASHBYTES should NOT return the hash of a truncated string; it should produce an ERROR. Or it should allow long strings when the data type permits them.

Has anyone raised this as a bug yet?

Post #858825
Posted Wednesday, February 3, 2010 12:38 PM


SSCertifiable

a hash which treats two different strings as identical is a serious failure.
No, it's not. Collisions in a hash are by design. After all, the number of possible values of an 8000-character string is much larger than the number of possible 160-bit hash values.

That's not what causes the problem here, though, as I'm sure you understand after reading all the other replies. The real problem is that REPLICATE uses the same input as output data type, which in this case has an 8000-byte limit, and that longer values are truncated. Raising a bug is pointless, as this is all both by design and well documented.



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #858950
Posted Wednesday, February 3, 2010 12:53 PM


SSCommitted

The real problem is that REPLICATE uses the same input as output data type, which in this case has an 8000-byte limit, and that longer values are truncated.

I would have thought that, although this reasonable (if not obvious) behavior of REPLICATE led to the two HASHBYTES calls evaluating identical strings in the original QOD, the real problem is that when HASHBYTES fails due to input overflow (when we feed it a varchar(max)), it apparently does not present a result of NULL that may be tested. Rather, as I showed in my previous post, an IF statement testing the output of HASHBYTES goes to the FALSE branch regardless of whether you test for "IS NULL" or for "IS NOT NULL". That would seem more of a candidate for bug status.
Post #858967