Blog Post

Hashing Collisions

,

One of the problems with hashing is that you can have collisions from values that are not very similar. This means that if you are using hashing as a way to identify similar values, you need to make further checks with the original data after the hash matches are gathered.

This post will show a few examples of the collisions that can occur if you use the CHECKSUM() or BINARY_CHECKSUM() functions.

If we examine this code:

-- Checksum
declare 
  @i varchar(200)
, @j varchar(200);
select @i = 'LE';
select @j = 'AAAAAAAAAAAAAAAALE';
select 
  Plaintext = @i
, checksum = CHECKSUM(@i)
UNION ALL 
SELECT
  Plaintext = @j
, checksum = CHECKSUM(@j);
GO

This returns a result like this:

hashing2

Note that these two values are the same as far as the checksum hash goes.

If we switch to BINARY_CHECKSUM(), we can get similar results.

-- binary_checksum is no better
declare 
  @i varchar(200)
, @j varchar(200)
, @k varchar(200);
select @i = 'LE'
select @j = 'Ou'
select @k = 'MU'
select 
  Plaintext = @i
, BINARY_CHECKSUM(@i)
UNION ALL 
SELECT
  Plaintext = @j
, BINARY_CHECKSUM(@j)
UNION ALL 
SELECT
  Plaintext = @k
, BINARY_CHECKSUM(@k)
GO

hashing3

While these two functions can be useful, you do have to be careful with the results. A matching hash from these functions does not mean that the source data is the same.

Filed under: Blog Tagged: encryption, security, syndicated, T-SQL

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating