Difference in checksum calculations.

  • All,
Thank you for looking into my question. I am facing an issue with a difference in checksum calculations for the same data between an on-prem SQL Server 2016 instance and an Azure SQL database.

    Here is the scenario. 

I have an on-prem SQL 2016 database and an Azure SQL database.  The schema is exactly the same between the two.  I updated the Azure database using the SQL 2016 database as the source.

After updating, I was expecting the checksum values for the updated rows to match between the two tables, but they do not.

    Has anyone seen this issue before? 

The source and destination have numeric and char data types only.

You need to check meticulously to be sure that your method of calculating the CHECKSUM is 100% identical in both environments, and that the data is 100% identical as well.   Please post both queries that do the CHECKSUM in each environment, and maybe someone here will see something you didn't happen to see.  All of us occasionally need a second set of eyes on something.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Thanks for the response. 

Please find the query below. I copied and pasted the same query into both instances.

SELECT
    CONVERT(VARBINARY(32),
            HASHBYTES('SHA2_256',
                CONCAT([AGENCY_NUM], LTRIM(RTRIM([CARRIER_CODE])), [TS_D1],
                       [EXT_FLAG], [FM_USERID], [TS_D2], [TOTAL_NUM_TXNS],
                       [NUM_TXN], LTRIM(RTRIM([TXNS]))))) AS [Checksum]

This is the table definition. It is also the same in the source and the destination.

    [AGENCY_NUM]     [numeric](10, 0) NOT NULL,
    [CARRIER_CODE]   [char](4)        NOT NULL,
    [TS_D1]          [numeric](19, 0) NOT NULL,
    [EXT_FLAG]       [numeric](5, 0)  NOT NULL,
    [FM_USERID]      [numeric](5, 0)  NULL,
    [TS_D2]          [numeric](19, 0) NULL,
    [TOTAL_NUM_TXNS] [numeric](5, 0)  NULL,
    [NUM_TXN]        [numeric](5, 0)  NULL,
    [TXNS]           [varchar](4000)  NULL,
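
Worth noting about the hash query above: CONCAT implicitly converts every argument to a character string and treats NULL as an empty string, so column boundaries disappear from the hash input. A small illustration (the values are made up):

    -- CONCAT turns NULL into '' and drops column boundaries, so these two
    -- very different rows hash identically ('12AB' in both cases):
    SELECT HASHBYTES('SHA2_256', CONCAT(12, NULL, 'AB')) AS HashA,
           HASHBYTES('SHA2_256', CONCAT(1, 2, 'AB'))     AS HashB;
    -- A delimiter between columns keeps the boundaries distinct:
    SELECT HASHBYTES('SHA2_256', CONCAT(12, '|', NULL, '|', 'AB')) AS HashC;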

  • Are you sure that the results of HASHBYTES will never exceed VARBINARY(32) ?   I've never used that function, so I have no idea what it does or how it works, beyond the obvious that it computes a hash value.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

Now that I've looked that up, I'm thinking that your varchar(4000) column might be a pain point.   Any row with a couple thousand bytes in it is trouble for hashing without going to a larger hash value.   I'd be concerned about a higher chance of duplicate hashes at those kinds of lengths.   Have you tried using SHA2_512 and VARBINARY(64)?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

Yeah. That varchar(4000) column is the issue.  If I take it out, the checksum value matches between the instances.  I even tried the following, but still no luck.

CONVERT(VARBINARY(MAX),
        HASHBYTES('SHA2_512',
            CONCAT([AGENCY_NUM], LTRIM(RTRIM([CARRIER_CODE])), [TS_D1],
                   [EXT_FLAG], [FM_USERID], [TS_D2], [TOTAL_NUM_TXNS],
                   [NUM_TXN], LTRIM(RTRIM([TXNS]))))) AS [Checksum]

Honestly, a checksum value of only 64 bytes for 4,000 bytes of data just isn't realistic or practical.   What is the business case for the checksum value?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • I completely agree with you. My business scenario demands it.  The destination gets updated/inserted every night to keep historical data. I have to update the destination if the checksum values are different between the records.

Hmm... I've used this with more than 4K bytes and never had an issue.
From the docs: for SQL Server 2014 (12.x) and earlier, allowed input values are limited to 8,000 bytes; for later versions there is no limit.
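
A quick sanity check, runnable on both instances, confirms the length limit isn't in play (a sketch; the test string is arbitrary):

    -- Hash a 10,000-byte string; this fails on SQL Server 2014 and earlier
    -- but succeeds on SQL Server 2016 and Azure SQL Database:
    DECLARE @s VARCHAR(MAX) = REPLICATE(CONVERT(VARCHAR(MAX), 'a'), 10000);
    SELECT DATALENGTH(@s) AS InputBytes,
           HASHBYTES('SHA2_256', @s) AS HashValue;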

If the hash is different, then something in the data is different, even if it looks the same.

I would convert the individual columns to hex and compare those manually to see if there are differences, then do the same to the CONCAT output.
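
Something along these lines, run on both instances and diffed row by row, would do it (the table name dbo.AGENCY_TXNS is a placeholder; the columns are from the definition above):

    SELECT [AGENCY_NUM],
           CONVERT(VARBINARY(8),   [CARRIER_CODE]) AS CarrierBytes,
           CONVERT(VARBINARY(MAX), [TXNS])         AS TxnsBytes,
           CONVERT(VARBINARY(MAX),
                   CONCAT([AGENCY_NUM], LTRIM(RTRIM([CARRIER_CODE])), [TS_D1],
                          [EXT_FLAG], [FM_USERID], [TS_D2], [TOTAL_NUM_TXNS],
                          [NUM_TXN], LTRIM(RTRIM([TXNS])))) AS ConcatBytes
    FROM dbo.AGENCY_TXNS;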

  • Siva Ramasamy - Thursday, March 14, 2019 9:28 AM

    I completely agree with you. My business scenario demands it.  The destination gets updated/inserted every night to keep historical data. I have to update the destination if the checksum values are different between the records.

    Hmmm...   wondering if you might have to use CHECKSUM solely on the other columns, and go with direct comparison on the varchar(4000) column.   Definitely a huge PITA and considerable resource hog, but what other choice do you have?
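
A rough sketch of that hybrid approach, assuming the source rows have been staged next to the destination (dbo.SRC and dbo.DST are placeholder names):

    -- Compare the narrow columns via CHECKSUM and the wide column directly:
    SELECT d.[AGENCY_NUM]
    FROM   dbo.DST AS d
    JOIN   dbo.SRC AS s
      ON   s.[AGENCY_NUM] = d.[AGENCY_NUM]
    WHERE  CHECKSUM(s.[CARRIER_CODE], s.[TS_D1], s.[EXT_FLAG], s.[FM_USERID],
                    s.[TS_D2], s.[TOTAL_NUM_TXNS], s.[NUM_TXN])
        <> CHECKSUM(d.[CARRIER_CODE], d.[TS_D1], d.[EXT_FLAG], d.[FM_USERID],
                    d.[TS_D2], d.[TOTAL_NUM_TXNS], d.[NUM_TXN])
       OR  ISNULL(s.[TXNS], '') <> ISNULL(d.[TXNS], '');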

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Maybe two separate hashes: (1) all the other columns and (2) the 4K column alone?
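
For example, a sketch against the table above (the '|' delimiter guards against adjacent values running together; the table name is a placeholder):

    SELECT HASHBYTES('SHA2_256',
             CONCAT([AGENCY_NUM], '|', LTRIM(RTRIM([CARRIER_CODE])), '|',
                    [TS_D1], '|', [EXT_FLAG], '|', [FM_USERID], '|',
                    [TS_D2], '|', [TOTAL_NUM_TXNS], '|', [NUM_TXN])) AS NarrowHash,
           HASHBYTES('SHA2_256', ISNULL([TXNS], ''))                 AS TxnsHash
    FROM dbo.AGENCY_TXNS;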

SQL DBA, SQL Server MVP (07, 08, 09)
A socialist is someone who will give you the shirt off *someone else's* back.

  • Thank you. That did not work either. 

I went with a different approach and started using change tracking instead of a checksum.
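
For reference, a minimal change tracking setup looks something like this (the database and table names are placeholders, and it assumes [AGENCY_NUM] is the table's primary key, which change tracking requires):

    ALTER DATABASE MyDb
        SET CHANGE_TRACKING = ON
        (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

    ALTER TABLE dbo.AGENCY_TXNS
        ENABLE CHANGE_TRACKING
        WITH (TRACK_COLUMNS_UPDATED = ON);

    -- Rows changed since the last synchronization version:
    DECLARE @last_sync_version BIGINT = 0;
    SELECT ct.[AGENCY_NUM], ct.SYS_CHANGE_OPERATION
    FROM CHANGETABLE(CHANGES dbo.AGENCY_TXNS, @last_sync_version) AS ct;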

Thanks.

  • Siva Ramasamy - Wednesday, March 13, 2019 11:33 AM


Can you post the DDL and DML for both sources, the hash statement, and some values where you are seeing different results, please?
    😎
    If you are correct, this is a bug!

  • Eirikur Eiriksson - Saturday, March 16, 2019 8:10 AM


    Can you post the DDL and DML for both sources, the hash statement and some values where you are seeing different results please?
    😎
    If you are correct, this is a bug!

Maybe not... we don't actually know that the OP "updated" the data correctly.  The difference in the HASHBYTES might be screaming that the data really is different.

    --Jeff Moden



  • Jeff Moden - Saturday, March 16, 2019 10:42 AM


Maybe not... we don't actually know that the OP "updated" the data correctly.  The difference in the HASHBYTES might be screaming that the data really is different.

Yeah... I did ask him to confirm by standard means:

I would convert the individual columns to hex and compare those manually to see if there are differences, then do the same to the CONCAT output.
