Difference in checksum calculations.

  • All,
Thank you for looking into my question. I am facing an issue with a difference in checksum calculations for the same data between an on-prem SQL Server 2016 instance and an Azure SQL database.

    Here is the scenario. 

I have an on-prem SQL 2016 database and an Azure SQL database.  The schema is exactly the same between the two.  I updated the Azure database using the SQL 2016 database as the source.

After updating, I was expecting the checksum values for the updated rows to match between the two tables, but they do not.

    Has anyone seen this issue before? 

The source and destination have numeric and char data types only.

You need to check meticulously to be sure that your method of calculating the CHECKSUM is 100% identical in both environments, and that the data is 100% identical as well.   Please post both queries that do the CHECKSUM in each environment, and maybe someone here will see something you didn't happen to see.  All of us occasionally need a second set of eyes on something.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Thanks for the response. 

Please find the query below. I copied and pasted the same query into both instances.

SELECT
    CONVERT(VARBINARY(32),
            HASHBYTES('SHA2_256',
                CONCAT([AGENCY_NUM], LTRIM(RTRIM([CARRIER_CODE])), [TS_D1],
                       [EXT_FLAG], [FM_USERID], [TS_D2], [TOTAL_NUM_TXNS],
                       [NUM_TXN], LTRIM(RTRIM([TXNS]))))) AS [Checksum]

This is the table definition. It is also the same in the source and the destination.

    [AGENCY_NUM]     [numeric](10, 0) NOT NULL,
    [CARRIER_CODE]   [char](4)        NOT NULL,
    [TS_D1]          [numeric](19, 0) NOT NULL,
    [EXT_FLAG]       [numeric](5, 0)  NOT NULL,
    [FM_USERID]      [numeric](5, 0)  NULL,
    [TS_D2]          [numeric](19, 0) NULL,
    [TOTAL_NUM_TXNS] [numeric](5, 0)  NULL,
    [NUM_TXN]        [numeric](5, 0)  NULL,
    [TXNS]           [varchar](4000)  NULL,
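
Worth noting about the hash query above: CONCAT implicitly converts every argument to a character string and treats NULL as an empty string, so column boundaries disappear from the hash input. A small illustration (the values are made up):

    -- CONCAT turns NULL into '' and drops column boundaries, so these two
    -- very different rows hash identically ('12AB' in both cases):
    SELECT HASHBYTES('SHA2_256', CONCAT(12, NULL, 'AB')) AS HashA,
           HASHBYTES('SHA2_256', CONCAT(1, 2, 'AB'))     AS HashB;
    -- A delimiter between columns keeps the boundaries distinct:
    SELECT HASHBYTES('SHA2_256', CONCAT(12, '|', NULL, '|', 'AB')) AS HashC;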

  • Are you sure that the results of HASHBYTES will never exceed VARBINARY(32) ?   I've never used that function, so I have no idea what it does or how it works, beyond the obvious that it computes a hash value.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

Now that I've looked that up, I'm thinking that your varchar(4000) column might be a pain point.   Any row with a couple thousand bytes in it is trouble for hashing without going to a larger hash value.   I'd be concerned about a higher chance of duplicate hashes at those kinds of lengths.   Have you tried using SHA2_512 and VARBINARY(64)?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

Yeah. That varchar(4000) column is the issue.  If I take it out, the checksum value matches between the instances.  I even tried the following, but still no luck.

CONVERT(VARBINARY(MAX),
        HASHBYTES('SHA2_512',
            CONCAT([AGENCY_NUM], LTRIM(RTRIM([CARRIER_CODE])), [TS_D1],
                   [EXT_FLAG], [FM_USERID], [TS_D2], [TOTAL_NUM_TXNS],
                   [NUM_TXN], LTRIM(RTRIM([TXNS]))))) AS [Checksum]

Honestly, a checksum value of only 64 bytes for 4,000 bytes of data just isn't realistic or practical.   What is the business case for the checksum value?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • I completely agree with you. My business scenario demands it.  The destination gets updated/inserted every night to keep historical data. I have to update the destination if the checksum values are different between the records.

Hmm... I've used this with more than 4K bytes and never had an issue.
From the docs: for SQL Server 2014 (12.x) and earlier, allowed input values are limited to 8,000 bytes; for later versions there is no limit.
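
A quick sanity check, runnable on both instances, confirms the length limit isn't in play (a sketch; the test string is arbitrary):

    -- Hash a 10,000-byte string; this fails on SQL Server 2014 and earlier
    -- but succeeds on SQL Server 2016 and Azure SQL Database:
    DECLARE @s VARCHAR(MAX) = REPLICATE(CONVERT(VARCHAR(MAX), 'a'), 10000);
    SELECT DATALENGTH(@s) AS InputBytes,
           HASHBYTES('SHA2_256', @s) AS HashValue;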

If the hash is different, then something in the data is different, even if it looks the same.

I would convert the individual columns to hex and compare those manually to see if there are differences, then do the same to the CONCAT output.
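
Something along these lines, run on both instances and diffed row by row, would do it (the table name dbo.AGENCY_TXNS is a placeholder; the columns are from the definition above):

    SELECT [AGENCY_NUM],
           CONVERT(VARBINARY(8),   [CARRIER_CODE]) AS CarrierBytes,
           CONVERT(VARBINARY(MAX), [TXNS])         AS TxnsBytes,
           CONVERT(VARBINARY(MAX),
                   CONCAT([AGENCY_NUM], LTRIM(RTRIM([CARRIER_CODE])), [TS_D1],
                          [EXT_FLAG], [FM_USERID], [TS_D2], [TOTAL_NUM_TXNS],
                          [NUM_TXN], LTRIM(RTRIM([TXNS])))) AS ConcatBytes
    FROM dbo.AGENCY_TXNS;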

  • Siva Ramasamy - Thursday, March 14, 2019 9:28 AM

    I completely agree with you. My business scenario demands it.  The destination gets updated/inserted every night to keep historical data. I have to update the destination if the checksum values are different between the records.

    Hmmm...   wondering if you might have to use CHECKSUM solely on the other columns, and go with direct comparison on the varchar(4000) column.   Definitely a huge PITA and considerable resource hog, but what other choice do you have?
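
A rough sketch of that hybrid approach, assuming the source rows have been staged next to the destination (dbo.SRC and dbo.DST are placeholder names):

    -- Compare the narrow columns via CHECKSUM and the wide column directly:
    SELECT d.[AGENCY_NUM]
    FROM   dbo.DST AS d
    JOIN   dbo.SRC AS s
      ON   s.[AGENCY_NUM] = d.[AGENCY_NUM]
    WHERE  CHECKSUM(s.[CARRIER_CODE], s.[TS_D1], s.[EXT_FLAG], s.[FM_USERID],
                    s.[TS_D2], s.[TOTAL_NUM_TXNS], s.[NUM_TXN])
        <> CHECKSUM(d.[CARRIER_CODE], d.[TS_D1], d.[EXT_FLAG], d.[FM_USERID],
                    d.[TS_D2], d.[TOTAL_NUM_TXNS], d.[NUM_TXN])
       OR  ISNULL(s.[TXNS], '') <> ISNULL(d.[TXNS], '');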

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Maybe two separate hashes: (1) all the other columns and (2) the 4K column alone?
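
For example, a sketch against the table above (the '|' delimiter guards against adjacent values running together; the table name is a placeholder):

    SELECT HASHBYTES('SHA2_256',
             CONCAT([AGENCY_NUM], '|', LTRIM(RTRIM([CARRIER_CODE])), '|',
                    [TS_D1], '|', [EXT_FLAG], '|', [FM_USERID], '|',
                    [TS_D2], '|', [TOTAL_NUM_TXNS], '|', [NUM_TXN])) AS NarrowHash,
           HASHBYTES('SHA2_256', ISNULL([TXNS], ''))                 AS TxnsHash
    FROM dbo.AGENCY_TXNS;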

SQL DBA, SQL Server MVP (07, 08, 09)
A socialist is someone who will give you the shirt off *someone else's* back.

  • Thank you. That did not work either. 

I went with a different approach and started using change tracking instead of a checksum.
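
For reference, a minimal change tracking setup looks something like this (the database and table names are placeholders, and it assumes [AGENCY_NUM] is the table's primary key, which change tracking requires):

    ALTER DATABASE MyDb
        SET CHANGE_TRACKING = ON
        (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

    ALTER TABLE dbo.AGENCY_TXNS
        ENABLE CHANGE_TRACKING
        WITH (TRACK_COLUMNS_UPDATED = ON);

    -- Rows changed since the last synchronization version:
    DECLARE @last_sync_version BIGINT = 0;
    SELECT ct.[AGENCY_NUM], ct.SYS_CHANGE_OPERATION
    FROM CHANGETABLE(CHANGES dbo.AGENCY_TXNS, @last_sync_version) AS ct;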

Thanks.

  • Siva Ramasamy - Wednesday, March 13, 2019 11:33 AM


Can you post the DDL and DML for both sources, the hash statement, and some values where you are seeing different results, please?
    😎
    If you are correct, this is a bug!

  • Eirikur Eiriksson - Saturday, March 16, 2019 8:10 AM


    Can you post the DDL and DML for both sources, the hash statement and some values where you are seeing different results please?
    😎
    If you are correct, this is a bug!

Maybe not... we don't actually know that the OP "updated" the data correctly.  The difference in the HASHBYTES might be screaming that the data really is different.

    --Jeff Moden



  • Jeff Moden - Saturday, March 16, 2019 10:42 AM


Maybe not... we don't actually know that the OP "updated" the data correctly.  The difference in the HASHBYTES might be screaming that the data really is different.

Yeah... I did ask him to confirm by standard means:

I would convert the individual columns to hex and compare those manually to see if there are differences, then do the same to the CONCAT output.
