Heads up on changes to the CHECKSUM function in APDW

  • There's plenty of forum information available on the weakness of the checksum function. I'm currently using it just for testing so whilst this is a pain for me, it's not a huge issue - but for others ... :w00t:

    It looks like something might have changed with Azure Data Warehouse (not sure about other platforms).

    Running this statement in SQL2016 results in the same value, despite the cast. However in APDW different values will be returned - APDW AsChar is different, the other 3 are the same. Thanks Microsoft!!

    select checksum( cast('abc' as char(3))) as [AsChar], checksum( cast('abc' as nchar(3))) as [AsNChar]

    Might cause issues for people just copying data from one platform to another.

    pcd

  • I also tested this in Azure SQL Database. It's showing two different values as well.

    However, I checked in my version of 2016 SP1, and it also showed two different values.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant

    Same value (1132495864) on 2008 R2 (SP2) - 10.50.4042.0 (X64), collation Latin1_General_CI_AS

    Same value (1132495864) on 2014 - 12.0.2269.0 (X64), collation Latin1_General_CI_AS

    I'll post other results when I get to the machines.

    select checksum( cast('abc' as char(3))) as [AsChar], checksum( cast('abc' as nchar(3))) as [AsNChar]

    select @@version

    select SERVERPROPERTY('collation')

    pcd

  • May be a recent change then.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Additional info from different servers.

    Same value (1132495864) on 2016 (SP1) (KB3182545) - 13.0.4001.0 (X64), collation Latin1_General_CI_AS

    Different values (AsChar=34400, AsNChar=1132495864) on Azure SQL Data Warehouse - 10.0.8506.1, collation SQL_Latin1_General_CP1_CI_AS

    Different values (AsChar=34400, AsNChar=1132495864) on SQL Azure (RTM) - 12.0.2000.8, collation SQL_Latin1_General_CP1_CI_AS

    Noticing that the collation was different, for giggles I then ran the following on SQL 2016 (SP1).

    [Code]

    select checksum( cast('abc' collate SQL_Latin1_General_CP1_CI_AS as char(3) )) as [AsChar], checksum( cast('abc' collate SQL_Latin1_General_CP1_CI_AS as nchar(3))) as [AsNChar]

    [/Code]

    Behold, the figure were different!!! (AsChar=34400, AsNChar=1132495864)

    So it appears to be a collation issue and not a change between the Earth and Cloud versions.

    Problem solved.

    pcd

    PS Thank you Grant for your initial response, that gave me the nugget I needed to work out what was happening.

  • Ah, cool. I wouldn't even have thought to check that.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • pcd_au (1/5/2017)


    There's plenty of forum information available on the weakness of the checksum function. I'm currently using it just for testing so whilst this is a pain for me, it's not a huge issue - but for others ... :w00t:

    Getting back to that and as a warning to others that may be reading this thread, CHECKSUM isn't just weak, it's downright poor. It should only be used for quickly converting NEWID() to large random integers or for checking things that aren't the same. If two different items come up with the same checksum, they can still be different and it doesn't take much for it to come up with the same value for things that aren't the same.

    Here's a simple example.

    SELECT CS1 = CHECKSUM('A352KD')

    ,CS2 = CHECKSUM('A352NT')

    ;

    Here's the result from that...

    CS1 CS2

    ----------- -----------

    141500177 141500177

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply