SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


HASHBYTES can help quickly load a Data Warehouse


HASHBYTES can help quickly load a Data Warehouse

Author
Message
mackie
mackie
Valued Member
Valued Member (73 reputation)Valued Member (73 reputation)Valued Member (73 reputation)Valued Member (73 reputation)Valued Member (73 reputation)Valued Member (73 reputation)Valued Member (73 reputation)Valued Member (73 reputation)

Group: General Forum Members
Points: 73 Visits: 219
Hey guys,

Check out this website, will probably solve your issues with Hash Bytes within SSIS:

http://ssismhash.codeplex.com/

God Bless.
Bill Anton
Bill Anton
Mr or Mrs. 500
Mr or Mrs. 500 (515 reputation)Mr or Mrs. 500 (515 reputation)Mr or Mrs. 500 (515 reputation)Mr or Mrs. 500 (515 reputation)Mr or Mrs. 500 (515 reputation)Mr or Mrs. 500 (515 reputation)Mr or Mrs. 500 (515 reputation)Mr or Mrs. 500 (515 reputation)

Group: General Forum Members
Points: 515 Visits: 551
How do you determine what size to make your VARBINARY field to hold the results of the HASHBYTES function?


Is it simply a matter of setting it to be equal to the largest HashByte function input value?


DECLARE @vHashResults AS VARBINARY(40)
DECLARE @vHashInput01 AS CHAR(20)
DECLARE @vHashInput02 AS CHAR(20)

SET @vHashInput01 = 'thisisa20bytechar123'
SET @vHashInput02 = 'thisisa21bytechar1234'

SET @vHashResults = HashBytes('SHA1', @vHashInput01 + @vHashInput02)



Data Enthusiast | @SQLbyoBI | www.opifexsolutions.com
TomThomson
TomThomson
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14290 Visits: 12197
wta306 (4/25/2011)
How do you determine what size to make your VARBINARY field to hold the results of the HASHBYTES function?


Is it simply a matter of setting it to be equal to the largest HashByte function input value?


DECLARE @vHashResults AS VARBINARY(40)
DECLARE @vHashInput01 AS CHAR(20)
DECLARE @vHashInput02 AS CHAR(20)

SET @vHashInput01 = 'thisisa20bytechar123'
SET @vHashInput02 = 'thisisa21bytechar1234'

SET @vHashResults = HashBytes('SHA1', @vHashInput01 + @vHashInput02)


I would be somewhat horrified if HashBytes('SHA1', <anything>Wink delived a result with any length other than 160 bits (20 bytes)! That's what MS thinks it delivers too, according to BoL.

Tom

Bill Anton
Bill Anton
Mr or Mrs. 500
Mr or Mrs. 500 (515 reputation)Mr or Mrs. 500 (515 reputation)Mr or Mrs. 500 (515 reputation)Mr or Mrs. 500 (515 reputation)Mr or Mrs. 500 (515 reputation)Mr or Mrs. 500 (515 reputation)Mr or Mrs. 500 (515 reputation)Mr or Mrs. 500 (515 reputation)

Group: General Forum Members
Points: 515 Visits: 551
Tom.Thomson (4/26/2011)
wta306 (4/25/2011)
How do you determine what size to make your VARBINARY field to hold the results of the HASHBYTES function?


Is it simply a matter of setting it to be equal to the largest HashByte function input value?


DECLARE @vHashResults AS VARBINARY(40)
DECLARE @vHashInput01 AS CHAR(20)
DECLARE @vHashInput02 AS CHAR(20)

SET @vHashInput01 = 'thisisa20bytechar123'
SET @vHashInput02 = 'thisisa21bytechar1234'

SET @vHashResults = HashBytes('SHA1', @vHashInput01 + @vHashInput02)


I would be somewhat horrified if HashBytes('SHA1', <anything>Wink delived a result with any length other than 160 bits (20 bytes)! That's what MS thinks it delivers too, according to BoL.


Tom - thank you for the response.

Unless I'm mistaken, MD5 is only 128 bits (16bytes)...and yet the author of the original article stores the resulting values in a VARBINARY(100)? What am I missing?

Data Enthusiast | @SQLbyoBI | www.opifexsolutions.com
TomThomson
TomThomson
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14290 Visits: 12197
wta306 (4/27/2011)
Unless I'm mistaken, MD5 is only 128 bits (16bytes)...and yet the author of the original article stores the resulting values in a VARBINARY(100)? What am I missing?

Yes, MD5 is 16 bytes. I suspect VARBINARY(100) was used because it will cater for using SHA1 (20 bytes) and, when SHA2 family of hash functions (SHA256 - 32 bytes, SHA512 - 64 bytes, SHA224 -28 bytes, and SHA384 - 48 bytes) are supported VARBINARY(100) will cater for all of them; and maybe it will even cater for SHA3. Presumably we'll know when NIST announces the winner of the SHA3 design contest next year; I'm pretty sure I've seen something about lengths for SHA3, so maybe we could know already, but I don't recall what so maybe not; I suspect VARBINARY(100) may turn out to be too short to cover all variants of SHA3 (but it will be years before SQL Server supports SHA3).

To be certain why VARBINARY(100) was used in the article we'll have to hope for a response from the author; but I'm reasonably happy with my guess that it was intended to cope with known future growth in hash sizes (to 512 bits) and have a little bit to spare. After all, it has to be at least VARBINARY(20) not BINARY(16) to cope with all hashes supported by SQL Server 2008 unless one uses different data type depending on which hash is being used (which would make changing from one hash to another twice as hard) and there's no extra overhead in using VARBINART(64) or VARBINARY(100) so that it can cover SHA2 instead of restricting it to MDn and SHA1 by using VARBINARY(20). If I'd been writing something about hashing and wanted to cover future possibilities I might even have chosen VARBINARY(256) in the hope of catering for SHA4 when/if that happens!

Tom

Bill Anton
Bill Anton
Mr or Mrs. 500
Mr or Mrs. 500 (515 reputation)Mr or Mrs. 500 (515 reputation)Mr or Mrs. 500 (515 reputation)Mr or Mrs. 500 (515 reputation)Mr or Mrs. 500 (515 reputation)Mr or Mrs. 500 (515 reputation)Mr or Mrs. 500 (515 reputation)Mr or Mrs. 500 (515 reputation)

Group: General Forum Members
Points: 515 Visits: 551
Tom.Thomson (4/27/2011)
wta306 (4/27/2011)
Unless I'm mistaken, MD5 is only 128 bits (16bytes)...and yet the author of the original article stores the resulting values in a VARBINARY(100)? What am I missing?

Yes, MD5 is 16 bytes. I suspect VARBINARY(100) was used because it will cater for using SHA1 (20 bytes) and, when SHA2 family of hash functions (SHA256 - 32 bytes, SHA512 - 64 bytes, SHA224 -28 bytes, and SHA384 - 48 bytes) are supported VARBINARY(100) will cater for all of them; and maybe it will even cater for SHA3. Presumably we'll know when NIST announces the winner of the SHA3 design contest next year; I'm pretty sure I've seen something about lengths for SHA3, so maybe we could know already, but I don't recall what so maybe not; I suspect VARBINARY(100) may turn out to be too short to cover all variants of SHA3 (but it will be years before SQL Server supports SHA3).

To be certain why VARBINARY(100) was used in the article we'll have to hope for a response from the author; but I'm reasonably happy with my guess that it was intended to cope with known future growth in hash sizes (to 512 bits) and have a little bit to spare. After all, it has to be at least VARBINARY(20) not BINARY(16) to cope with all hashes supported by SQL Server 2008 unless one uses different data type depending on which hash is being used (which would make changing from one hash to another twice as hard) and there's no extra overhead in using VARBINART(64) or VARBINARY(100) so that it can cover SHA2 instead of restricting it to MDn and SHA1 by using VARBINARY(20). If I'd been writing something about hashing and wanted to cover future possibilities I might even have chosen VARBINARY(256) in the hope of catering for SHA4 when/if that happens!


wow, ok - that makes a ton of sense. thank you!!

Data Enthusiast | @SQLbyoBI | www.opifexsolutions.com
Cyclone
Cyclone
Valued Member
Valued Member (68 reputation)Valued Member (68 reputation)Valued Member (68 reputation)Valued Member (68 reputation)Valued Member (68 reputation)Valued Member (68 reputation)Valued Member (68 reputation)Valued Member (68 reputation)

Group: General Forum Members
Points: 68 Visits: 244
Hi,

We use SHA1 and store the result as Varbinary(MAX).
Do you see any problems with this ?

Thanks
TomThomson
TomThomson
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14290 Visits: 12197
Cyclone (4/27/2011)
Hi,

We use SHA1 and store the result as Varbinary(MAX).
Do you see any problems with this ?

Thanks

Yes, there is a problem: a varbinary(max) column can't be a key column in an index, so if you want to use your hash in an index you shouldn't use varbinary(max). That's the only problem I can think of.

Tom

Langston Montgomery
Langston Montgomery
Valued Member
Valued Member (64 reputation)Valued Member (64 reputation)Valued Member (64 reputation)Valued Member (64 reputation)Valued Member (64 reputation)Valued Member (64 reputation)Valued Member (64 reputation)Valued Member (64 reputation)

Group: General Forum Members
Points: 64 Visits: 508
Has anyone come across cases where SHA1 has produced output longer than 20 bytes, because I think I have. We are using HASHBYTES to detect changes in our data warehouse, and we were getting type 2 changes, even though all the columns values were exactly the same between the most current and previous rows; even the hashbyte value looked the same!

But, upon looking closer, the HASHBYTE function was returning this: 0xEE0816119692A1C4DCC2045EA1E37CD47EFF4900
But, the value in our varbinary(20) column in the table was storing this: 0xEE0816119692A1C4DCC2045EA1E37CD47EFF49

Notice the two zeros at the end of the function's output. For some reason, when SQL writes the output from the function to the table it truncates those last two "characters" at the end!

We have no idea why it does that, but when we increase the varbinary column in the table from 20 to 40, SQL doesn't truncate the output anymore, and all is well.

Anyway, if anyone has any insight on why SQL seems to be truncating the output from the function as it writes to the varbinary(20) column, please let me know, I'm curious!

L' Eomot Inversé (4/26/2011)
wta306 (4/25/2011)
How do you determine what size to make your VARBINARY field to hold the results of the HASHBYTES function?


Is it simply a matter of setting it to be equal to the largest HashByte function input value?


DECLARE @vHashResults AS VARBINARY(40)
DECLARE @vHashInput01 AS CHAR(20)
DECLARE @vHashInput02 AS CHAR(20)

SET @vHashInput01 = 'thisisa20bytechar123'
SET @vHashInput02 = 'thisisa21bytechar1234'

SET @vHashResults = HashBytes('SHA1', @vHashInput01 + @vHashInput02)


I would be somewhat horrified if HashBytes('SHA1', <anything>Wink delived a result with any length other than 160 bits (20 bytes)! That's what MS thinks it delivers too, according to BoL.

TomThomson
TomThomson
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14290 Visits: 12197
Langston Montgomery (10/8/2011)
Has anyone come across cases where SHA1 has produced output longer than 20 bytes, because I think I have. We are using HASHBYTES to detect changes in our data warehouse, and we were getting type 2 changes, even though all the columns values were exactly the same between the most current and previous rows; even the hashbyte value looked the same!

But, upon looking closer, the HASHBYTE function was returning this: 0xEE0816119692A1C4DCC2045EA1E37CD47EFF4900
But, the value in our varbinary(20) column in the table was storing this: 0xEE0816119692A1C4DCC2045EA1E37CD47EFF49
I don't think "longer that 20 bytes" is relevant, as the number ending 00 is 20 bytes (the one without 00 is only 19).

Notice the two zeros at the end of the function's output. For some reason, when SQL writes the output from the function to the table it truncates those last two "characters" at the end!

We have no idea why it does that, but when we increase the varbinary column in the table from 20 to 40, SQL doesn't truncate the output anymore, and all is well.

Anyway, if anyone has any insight on why SQL seems to be truncating the output from the function as it writes to the varbinary(20) column, please let me know, I'm curious!

I tried a few things to see if I could reproduce this behaviour, using hashbytes to generate values for local variables, for columns in table variable, for columns in temp tables, and for columns in premanent tables, but got no truncation ever when using varchar(20). Maybe I'm on a different version - I'm using SQL 2008 R2.

Tom

Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search