Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

The Voice of the DBA

Steve Jones is the editor of SQLServerCentral.com and visits a wide variety of data related topics in his daily editorial. Steve has spent years working as a DBA and general purpose Windows administrator, primarily working with SQL Server since it was ported from Sybase in 1990. You can follow Steve on Twitter at twitter.com/way0utwest

Hashing Collisions

One of the problems with hashing is that you can have collisions from values that are not very similar. This means that if you are using hashing as a way to identify similar values, you need to make further checks with the original data after the hash matches are gathered.

This post will show a few examples of the collisions that can occur if you use the CHECKSUM() or BINARY_CHECKSUM() functions.

If we examine this code:

-- Checksum
declare 
  @i varchar(200)
, @j varchar(200);

select @i = 'LE';
select @j = 'AAAAAAAAAAAAAAAALE';

select 
  Plaintext = @i
, checksum = CHECKSUM(@i)
UNION ALL 
SELECT
  Plaintext = @j
, checksum = CHECKSUM(@j);
GO

This returns a result like this:

hashing2

Note that these two values are the same as far as the checksum hash goes.

If we switch to BINARY_CHECKSUM(), we can get similar results.

-- binary_checksum is no better
declare 
  @i varchar(200)
, @j varchar(200)
, @k varchar(200);


select @i = 'LE'
select @j = 'Ou'
select @k = 'MU'

select 
  Plaintext = @i
, BINARY_CHECKSUM(@i)
UNION ALL 
SELECT
  Plaintext = @j
, BINARY_CHECKSUM(@j)
UNION ALL 
SELECT
  Plaintext = @k
, BINARY_CHECKSUM(@k)
GO

hashing3

While these two functions can be useful, you do have to be careful with the results. A matching hash from these functions does not mean that the source data is the same.


Filed under: Blog Tagged: encryption, security, syndicated, T-SQL

Comments

Leave a comment on the original post [voiceofthedba.wordpress.com, opens in a new window]

Loading comments...