How to use HASHBYTES function in sql server for multiple columns

  • I have a requirement wherein I have to create a hash value that covers all columns of a table. With CHECKSUM this can be done easily, but CHECKSUM is not recommended, as per Microsoft: "If at least one of the values in the expression list changes, the list checksum will probably change. However, this is not guaranteed. Therefore, to detect whether values have changed, we recommend the use of CHECKSUM only if your application can tolerate an occasional missed change. Otherwise, consider using HashBytes instead. With a specified MD5 hash algorithm, the probability that HashBytes will return the same result, for two different inputs, is much lower compared to CHECKSUM."

    HASHBYTES accepts only two parameters (the algorithm and a single input expression).

    The problem is that even though HASHBYTES is more reliable than CHECKSUM, there doesn't seem to be an easy way to build it over multiple columns.

    An example using CHECKSUM:

    create table dbo.chksum_demo1
    (
      id int not null,
      name varchar(25),
      address varchar(250),
      HashValue as Checksum(id, name, address),
      CONSTRAINT PK_chksum_demo1 PRIMARY KEY (Id)
    );

    How can we do the above using HASHBYTES instead of CHECKSUM?

  • Hi,

    You will need to set up the HashValue column to use two parameters. The first is the algorithm (e.g. SHA1); the second is the value to be hashed. I suggest concatenating all your values with a suitable separator so that the same data shifted into an adjacent column is not hashed the same. I have used "|" in the example:


    create table dbo.hash_demo1
    (
      id int not null,
      name varchar(25),
      address varchar(250),
      HashValue as hashbytes('SHA1', concat_ws('|', id, name, address)),
      CONSTRAINT PK_hash_demo1 PRIMARY KEY (Id)
    );

    insert dbo.hash_demo1 (id, name, address)
    values (1, 'bob', '123 A street'),
           (2, 'mark', '564 A street'),
           (3, 'ricky', '75 A street'),
           (4, 'ricky7', '5 A street');
    select * from dbo.hash_demo1;

    update dbo.hash_demo1
    set address = '70 A street'
    where id =1;

    select * from dbo.hash_demo1;

    The example also shows the hash value changing automatically.

    Bevan

  • Bevan Keighley - Monday, September 24, 2018 4:03 PM


    Just a quick observation... According to BOL, CONCAT_WS is only available in SQL Server 2017 and Azure... If the OP is using 2016 they will need to use CONCAT or +...
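
    For reference, a rough sketch of the pre-CONCAT alternative (my own example names; with "+" the int column needs an explicit CAST and NULLable columns need ISNULL, otherwise the whole expression goes NULL):

    -- SQL Server 2008 R2 and earlier (no CONCAT): "+" concatenation needs explicit
    -- conversions and NULL handling, since NULL + anything yields NULL
    create table dbo.hash_demo_2008
    (
      id int not null constraint PK_hash_demo_2008 primary key,
      name varchar(25),
      address varchar(250),
      HashValue as hashbytes('SHA1',
          cast(id as varchar(12)) + '|' + isnull(name, '') + '|' + isnull(address, ''))
    );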

  • I suggest that the calculated column is persisted since the function is deterministic.
    😎

    Example for both concatenation functions

    -- SQL SERVER 2017 OR LATER
    CREATE TABLE DBO.HASH_DEMO_2017
    (
      ID    INT     NOT NULL  CONSTRAINT PK_DBO_HASH_DEMO_2017 PRIMARY KEY CLUSTERED
    , [NAME]  VARCHAR(25)  NOT NULL
    , [ADDRESS] VARCHAR(250)  NOT NULL
    , HASHVALUE AS HASHBYTES ('SHA1',CONCAT_WS('|',ID,NAME,ADDRESS)) PERSISTED
    )
    ;
    -- SQL SERVER 2012 OR LATER
    CREATE TABLE DBO.HASH_DEMO_2012
    (
      ID    INT     NOT NULL  CONSTRAINT PK_DBO_HASH_DEMO_2012 PRIMARY KEY CLUSTERED
    , [NAME]  VARCHAR(25)  NOT NULL
    , [ADDRESS] VARCHAR(250)  NOT NULL
    , HASHVALUE AS HASHBYTES ('SHA1',CONCAT([ID],'|',[NAME],'|',[ADDRESS])) PERSISTED
    )
    ;

  • Bevan Keighley - Monday, September 24, 2018 4:03 PM


    Thanks, Bevan, it worked. But I have one doubt. When you say to concatenate all the values with a suitable separator to ensure the same value in an adjacent column is not hashed the same (the "|" in the example), does that mean that if the separator is not added and two columns have the same value, the hash value won't be different? I mean, if an existing record that has, let's say, a hash value of 1001 is updated and two of the columns change to the same value, won't the hash value be something like 1004?

  • vikasjagadale8 - Tuesday, September 25, 2018 12:40 AM


    That's not how hashing works... It's not a "change counter". A hash value is a completely unpredictable value that is based solely on the data being hashed.
    So, let's say that you're hashing with SHA1 and the original string is "Hello World."... the hash value is going to be 0xB701146CF2C1262A6385C8B1FB1DB98F05820499.
    If you change the period to a question mark (Hello World?) the new hash value is going to be 0x19B2E855874FDA4939883FBDE1714A77C3BBF0B8.
    If you then change it back to the original value, the hash value will go back to its original value as well.
    See below...
    DECLARE @string VARCHAR(15) = 'Hello World.';
    SELECT @string, HASHBYTES('SHA1', @string);
    SET @string = REPLACE(@string, '.', '?');
    SELECT @string, HASHBYTES('SHA1', @string);
    SET @string = REPLACE(@string, '?', '.');
    SELECT @string, HASHBYTES('SHA1', @string);
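
    On the separator part of the question specifically: without a delimiter, different column values can run together into the same string and therefore produce identical hashes. A quick sketch with throwaway literals showing the collision the "|" prevents (CONCAT_WS needs 2017+, as noted above):

    -- Without a separator, 'ab' + 'c' and 'a' + 'bc' both become 'abc', so the hashes match;
    -- with the '|' delimiter the inputs stay distinct and the hashes differ
    SELECT HASHBYTES('SHA1', CONCAT('ab', 'c'))         AS no_sep_row1,
           HASHBYTES('SHA1', CONCAT('a', 'bc'))         AS no_sep_row2,  -- same as no_sep_row1
           HASHBYTES('SHA1', CONCAT_WS('|', 'ab', 'c')) AS sep_row1,
           HASHBYTES('SHA1', CONCAT_WS('|', 'a', 'bc')) AS sep_row2;     -- differs from sep_row1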

  • Jason A. Long - Tuesday, September 25, 2018 7:59 AM


    Further to Jason's point: since the values are concatenated, implicitly converted to strings and divided by a delimiter, a change in any column will change the hashed value for the row. Individual column values only contribute to the row-level hash and cannot be individually identified, because the granularity of the hash is the row, not the column.
    😎 

    Again, I strongly recommend that you persist the calculated column; that allows you to index it and greatly improves the performance of any comparative operations.
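
    A rough sketch of what that could look like, building on the 2017 example above (the CONVERT to BINARY(20) is my own addition, assuming SHA1's fixed 20-byte output, so the index key stays narrow rather than the varbinary(8000) that HASHBYTES is declared to return):

    -- Persisted, narrow hash column plus a nonclustered index for change-detection lookups
    CREATE TABLE dbo.HASH_DEMO_INDEXED
    (
      ID        INT          NOT NULL CONSTRAINT PK_HASH_DEMO_INDEXED PRIMARY KEY CLUSTERED,
      [NAME]    VARCHAR(25)  NOT NULL,
      [ADDRESS] VARCHAR(250) NOT NULL,
      HASHVALUE AS CONVERT(BINARY(20), HASHBYTES('SHA1', CONCAT_WS('|', ID, [NAME], [ADDRESS]))) PERSISTED
    );

    -- Index the persisted hash so comparisons can seek instead of scan
    CREATE NONCLUSTERED INDEX IX_HASH_DEMO_INDEXED_HASHVALUE
      ON dbo.HASH_DEMO_INDEXED (HASHVALUE);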

  • And stop using SHA1.  It has been deprecated for some time now and your code will fail sometime in the not-so-distant future.

    --Jeff Moden



  • Jeff Moden - Tuesday, September 25, 2018 8:48 AM

    And stop using SHA1.  It has been deprecated for sometime now and your code will fail sometime in the not so distant future.

    Or extremely distant future.  Never know with Microsoft.  Look how long text, ntext, and image have been deprecated.

    Also, I would say that which hash algorithm you should use actually depends on how you are using it, as well as what may be supported on the various systems you are supporting. There are people out there still using SQL Server 2000 and SQL Server 2005, just as examples.

  • Eirikur Eiriksson - Tuesday, September 25, 2018 12:24 AM


    Actually, IIRC, a deterministic computed column can be indexed without being persisted.  Also, I think that has to be SQL Server 2016 or newer.  Seem to recall a QotD on this a while back.

    Then again, I could be wrong.

  • Lynn Pettis - Tuesday, September 25, 2018 9:10 AM


    I've got to go with Lynn on this one... Application context is an important consideration. If you're hashing for security purposes, then you need to salt the original value and use SHA2_512.
    If all you're doing is a bit of change detection, then salting is pointless and SHA2_512 is massive overkill.
    Given that the different algorithms have different storage and computational demands, the need to avoid collisions has to be balanced against storage and performance demands.

    Jeff,
    I know that SHA1 has been deprecated for password hashing and certificate signing (or anything else related to security), but is there any documentation that it has been deprecated as a legitimate parameter value in the HASHBYTES function?
    I ask because I don't see any prohibitions or warnings about it in BOL: HASHBYTES (Transact-SQL)
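
    As a rough illustration of the storage side of that trade-off, the algorithms HASHBYTES supports return fixed-length digests (MD5 = 16 bytes, SHA1 = 20, SHA2_256 = 32, SHA2_512 = 64):

    -- Compare the byte length of each algorithm's output for the same input
    DECLARE @input VARCHAR(100) = 'bob|123 A street';
    SELECT DATALENGTH(HASHBYTES('MD5',      @input)) AS md5_bytes,      -- 16
           DATALENGTH(HASHBYTES('SHA1',     @input)) AS sha1_bytes,     -- 20
           DATALENGTH(HASHBYTES('SHA2_256', @input)) AS sha2_256_bytes, -- 32
           DATALENGTH(HASHBYTES('SHA2_512', @input)) AS sha2_512_bytes; -- 64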

  • Eirikur Eiriksson - Tuesday, September 25, 2018 8:38 AM


    Thanks for the detailed explanation. Making a computed column PERSISTED makes sense. But after I made it persisted, it gave the error below when inserting records into the table:

    INSERT failed because the following SET options have incorrect settings: 'ANSI_WARNINGS'. Verify that SET options are correct for use with indexed views and/or indexes on computed columns and/or filtered indexes and/or query notifications and/or XML data type methods and/or spatial index operations.

    I then did some googling around this and found that setting the properties below to ON before creating the table should solve the issue, but it didn't work:

    SET ANSI_NULLS ON
    SET ANSI_PADDING ON
    SET ANSI_WARNINGS ON
    SET ARITHABORT ON
    SET CONCAT_NULL_YIELDS_NULL ON
    SET NUMERIC_ROUNDABORT ON
    SET QUOTED_IDENTIFIER ON

    create table dbo.hashbytes_demo
    (
      id char(32) not null,
      name varchar(25),
      address varchar(250),
      Systemmodstamp datetime2,
      HashValue as Hashbytes('MD5', CONCAT('|', id, name, address, Systemmodstamp)),
      CONSTRAINT PK_hashbytes_demo PRIMARY KEY (Id)
    )

    Now the error is: CREATE TABLE failed because the following SET options have incorrect settings: 'NUMERIC_ROUNDABORT'. Verify that SET options are correct for use with indexed views and/or indexes on computed columns and/or filtered indexes and/or query notifications and/or XML data type methods and/or spatial index operations.

    What am I missing here?

  • Lynn Pettis - Tuesday, September 25, 2018 9:28 AM


    Thanks for the example, I am using SQL Server 2014

  • vikasjagadale8 - Wednesday, September 26, 2018 10:07 AM


    Check this out: https://docs.microsoft.com/en-us/sql/relational-databases/views/create-indexed-views?view=sql-server-2017
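
    Per that page, all of the listed SET options have to be ON except NUMERIC_ROUNDABORT, which has to be OFF; the earlier script sets it ON, which is most likely what's tripping the CREATE TABLE. A minimal sketch:

    -- Required session settings for indexes on computed columns (and indexed views)
    SET ANSI_NULLS ON;
    SET ANSI_PADDING ON;
    SET ANSI_WARNINGS ON;
    SET ARITHABORT ON;
    SET CONCAT_NULL_YIELDS_NULL ON;
    SET QUOTED_IDENTIFIER ON;
    SET NUMERIC_ROUNDABORT OFF;  -- the one setting that must be OFF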

  • Jason A. Long - Tuesday, September 25, 2018 9:41 AM


    I don't believe there's any issue with SHA1 for change detection. For security, it's cryptographically insecure.

