http://www.sqlservercentral.com/blogs/steve_jones/2012/04/11/hashing-collisions/

Printed 2014/04/17 06:30PM

Hashing Collisions

2012/04/11

One of the problems with hashing is that you can have collisions from values that are not very similar. This means that if you are using hashing as a way to identify similar values, you need to make further checks with the original data after the hash matches are gathered.

This post will show a few examples of the collisions that can occur if you use the CHECKSUM() or BINARY_CHECKSUM() functions.

If we examine this code:

-- Checksum
declare 
  @i varchar(200)
, @j varchar(200);

select @i = 'LE';
select @j = 'AAAAAAAAAAAAAAAALE';

select 
  Plaintext = @i
, checksum = CHECKSUM(@i)
UNION ALL 
SELECT
  Plaintext = @j
, checksum = CHECKSUM(@j);
GO

This returns a result like this:

hashing2

Note that these two values are the same as far as the checksum hash goes.

If we switch to BINARY_CHECKSUM(), we can get similar results.

-- binary_checksum is no better
declare 
  @i varchar(200)
, @j varchar(200)
, @k varchar(200);


select @i = 'LE'
select @j = 'Ou'
select @k = 'MU'

select 
  Plaintext = @i
, BINARY_CHECKSUM(@i)
UNION ALL 
SELECT
  Plaintext = @j
, BINARY_CHECKSUM(@j)
UNION ALL 
SELECT
  Plaintext = @k
, BINARY_CHECKSUM(@k)
GO

hashing3

While these two functions can be useful, you do have to be careful with the results. A matching hash from these functions does not mean that the source data is the same.


Filed under: Blog Tagged: encryption, security, syndicated, T-SQL
Copyright © 2002-2014 Simple Talk Publishing. All Rights Reserved. Privacy Policy. Terms of Use. Report Abuse.