Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


checksums and unicode data


checksums and unicode data

Author
Message
RBarryYoung
RBarryYoung
SSCrazy Eights
SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)

Group: General Forum Members
Points: 9438 Visits: 9517
Jeff Moden (8/9/2009)
Paul White (8/8/2009)
HashBytes (perhaps including the PK as a salt) is a good choice if the task is to detect changes.
CHECKSUM is quick and efficient for creating hash indexes - which are awesome if you need to search long strings (for example).


Even then, it can and will give an incorrect answer if you rely only on it. I'll admit that CHECKSUM will help narrow things down on long strings, though.

I'm, going to have to disagree with this one, Jeff, at least as far as HashBytes using the SHA1 algorithm is concerned. Assuming you did a Trillion HashByte comparisons of different data a day (and no one on earth is doing that many), it would still take about 6000 to 12000 Years before you were likely to get an accidental match. Those are odds that I can live with. :-)

-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45125 Visits: 39922
Ummmm.... does SQL Server's CHECKSUM use the SHA1 algorithm because I was talking about that CHECKSUM. Obviously that doesn't take 6000 to 12000 years to happen because I've seen two of these same types of posts in the last 3 months. ;-)

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
RBarryYoung
RBarryYoung
SSCrazy Eights
SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)

Group: General Forum Members
Points: 9438 Visits: 9517
Jeff Moden (8/10/2009)
Ummmm.... does SQL Server's CHECKSUM use the SHA1 algorithm because I was talking about that CHECKSUM. Obviously that doesn't take 6000 to 12000 years to happen because I've seen two of these same types of posts in the last 3 months. ;-)

Oh no, sorry, CHECKSUM produces duplicates if you so much as swap the order of two of the characters. So it stinks for this purpose, and we are in agreement on that. :-D

-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
RBarryYoung
RBarryYoung
SSCrazy Eights
SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)

Group: General Forum Members
Points: 9438 Visits: 9517
To clarify: HashBytes w/ SHA1 is nigh-unbreakable.

-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Paul White
Paul White
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10338 Visits: 11350
RBarryYoung (8/10/2009)
To clarify: HashBytes w/ SHA1 is nigh-unbreakable.

Hey Barry,

On the other hand, it is relatively slow, only works on strings, and returns varbinary(8000).
If the task is to detect changes in a row of data, would you use HashBytes (SHA1) alone?

Paul



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Lowell
Lowell
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14959 Visits: 38977
just looking for confirmation...i used my google-fu and found info and examples for using hashbytes with SHA1, ie

select HASHBYTES('SHA1','My Plain Text')
--results
0x6D99DDF6FE7A32547B6766E0BF88B1F50835F0FF



everything i read says that this is a one way operation, so you can use it to generate a unique value like checksum, but you cannot unhash this back to it's original value, right?

so this is not encryption, but rather a tool to generate a unique identifier, correct?

Lowell

--
help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45125 Visits: 39922
Paul White (8/10/2009)
RBarryYoung (8/10/2009)
To clarify: HashBytes w/ SHA1 is nigh-unbreakable.

Hey Barry,

On the other hand, it is relatively slow, only works on strings, and returns varbinary(8000).
If the task is to detect changes in a row of data, would you use HashBytes (SHA1) alone?

Paul


In Lowell's example, it also turned a 13 byte string into a 20 byte Varbinary so it's expensive, too.

But, I believe that Barry's point was simply that it is unbreakable... not like CHECKSUM at all. :-)

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Paul White
Paul White
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10338 Visits: 11350
Jeff Moden (8/10/2009)
In Lowell's example, it also turned a 13 byte string into a 20 byte Varbinary so it's expensive, too. But, I believe that Barry's point was simply that it is unbreakable... not like CHECKSUM at all. :-)

So...SHA1 always returns 160 bits (20 bytes) regardless of the input. HashBytes also appears to be limited to 8000 bytes of input - 8000 ANSI characters or 4000 Unicode. Finally, it doesn't appear unbreakable either: Wikipedia Link.

I mentioned HashBytes way back - now I'm just pointing out some of the reasons that I have yet to use it in a real production system.

Paul



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
RBarryYoung
RBarryYoung
SSCrazy Eights
SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)

Group: General Forum Members
Points: 9438 Visits: 9517
Paul White (8/10/2009)
RBarryYoung (8/10/2009)
To clarify: HashBytes w/ SHA1 is nigh-unbreakable.

Hey Barry,

On the other hand, it is relatively slow, only works on strings, and returns varbinary(8000).
If the task is to detect changes in a row of data, would you use HashBytes (SHA1) alone?

Paul

Despite the datatype, I think that it actually only returns 19 bytes. And since VARBINARY is a string and *everything* converts to varbinary very easily, thats not that big a problem (nulls are a bigger problem). So yeah, I *might* use it in situations were I was doing remote comparisons through linked servers and just recording the hash every day for later comparison. The problem with the field by field comparison is that you have to have the entire previous record around to do it.

Though I think that I usually used MD5 in the past.

-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
RBarryYoung
RBarryYoung
SSCrazy Eights
SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)SSCrazy Eights (9.4K reputation)

Group: General Forum Members
Points: 9438 Visits: 9517
Paul White (8/10/2009)
Jeff Moden (8/10/2009)
In Lowell's example, it also turned a 13 byte string into a 20 byte Varbinary so it's expensive, too. But, I believe that Barry's point was simply that it is unbreakable... not like CHECKSUM at all. :-)

So...SHA1 always returns 160 bits (20 bytes) regardless of the input. HashBytes also appears to be limited to 8000 bytes of input - 8000 ANSI characters or 4000 Unicode. Finally, it doesn't appear unbreakable either: Wikipedia Link.

I mentioned HashBytes way back - now I'm just pointing out some of the reasons that I have yet to use it in a real production system.

Paul

Hmm, I though that I counted 19 bytes... Anyway, qualifying myself again :-), "nigh-unbreakable" was referring to its use as a signature in detecting random data changes, not as a security device.

-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search