SQL Server Encryption - Hashing Collisions

A hash is a computation that transforms one set of data into another (hopefully smaller) set of data. So a hash on your 2,000 character blog post should generate a smaller, 10-20 byte value. In doing that, obviously there are many more possible 2,000 character sets of data then there are 10 or 20 byte sets of data. Most of those don't make sense since they would be random scrambling of character values, but they still exist.

When two or more large set sets of data were to generate the same 20byte hash, we have a collision. And collisions are bad.

Why?

The short answer is that you then can't be sure that the source data the 2,000 character strings are equivalent. Their hashes are, but they aren't. They could be, and that's one thing that you are usually using a has to check. For example, one common use of hashing is to detect if data has changed. You might calculate a hash from your application, and one from the database and compare them. If they're the same, you assume nothing has changed.

Why do you do this? It is often much less resource intensive than comparing all fields, or sending a large amount of data to (or from) the server. You could more easily send a 20 byte hash than a 2,000 byte chunk of data.

The HASHBYTES function in SQL Server can be used to generate hashes with a few algorithms, and SHA-1 is the best one to use. It has a low chance of collisions and is considered fairly secure by the cryptographic community. Not great, and it's been superseded by SHA-2, but it should work for most data detection changes.

However there are a few other functions have been supported for some time. There are CHECKSUM, BINARY_CHECKSUM, and CHECKSUM_AGG that exist. You shouldn't use these because of a high chance of collisions. I'll repeat that.

Don't use these!

If you are detecting data changes, consider this code:

select 'LE' 'String', checksum('le') 'CheckSumHash'

It returns

Now, consider that you run this:

select 'AAAAAAAAAAAAAAAALE' 'String', checksum('AAAAAAAAAAAAAAAALE') 'CheckSumHash'

you get this:

Not exactly what you were expecting!

The issue is that there are changes of collisions with these functions, perhaps too high to take the chance of two values being incorrectly compared. I would be very careful about using these functions, and if you do, be sure that you note this potential issue to all support people. If the application appears to be missing changes because of a checksum, be sure you do not have any type of collision taking place.

Book Review: Big Red - Voyage of a Trident Submarine

by Andy Warren

SQLServerCentral.com

Blogs

I've grown up reading Tom Clancy and probably most of you have at least seen Red October, so this book caught my eye when browsing used books for a recent trip. It's a fairly human look at what's involved in sailing on a Trident missile submarine...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-03-10

1,439 reads

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

by Robert Davis

SQLServerCentral.com

Blogs

Question: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? This question was sent to me via email. My reply follows. Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? Databases to be mirrored are currently running on 2005 SQL instances but will be upgraded to 2008 SQL in the near future.

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-23

1,567 reads

Inserting Markup into a String with SQL

by Phil Factor

SQLServerCentral.com

T-SQL

In which Phil illustrates an old trick using STUFF to intert a number of substrings from a table into a string, and explains why the technique might speed up your code...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-18

1,631 reads

Networking - Part 4

by Andy Warren

SQLServerCentral.com

Blogs

You may want to read Part 1 , Part 2 , and Part 3 before continuing. This time around I'd like to talk about social networking. We'll start with social networking. Facebook, MySpace, and Twitter are all good examples of using technology to let...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-17

1,530 reads

Speaking at Community Events - More Thoughts

by Andy Warren

SQLServerCentral.com

Blogs

Last week I posted Speaking at Community Events - Time to Raise the Bar?, a first cut at talking about to what degree we should require experience for speakers at events like SQLSaturday as well as when it might be appropriate to add additional focus/limitations on the presentations that are accepted. I've got a few more thoughts on the topic this week, and I look forward to your comments.

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-13

360 reads

SQL Server Encryption - Hashing Collisions

Rate

Share

Share

Rate

SQL Server Encryption - Hashing Collisions

Rate

Share

Share

Rate

Related content

Book Review: Big Red - Voyage of a Trident Submarine

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

Inserting Markup into a String with SQL

Networking - Part 4

Speaking at Community Events - More Thoughts