Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Roll Your Own Materialized Views Expand / Collapse
Author
Message
Posted Monday, January 25, 2010 7:42 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, September 4, 2012 8:47 AM
Points: 3, Visits: 35
Comments posted to this topic are about the item Roll Your Own Materialized Views
Post #853407
Posted Thursday, February 11, 2010 11:54 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Yesterday @ 8:10 AM
Points: 861, Visits: 2,360
I would primarily note that CHECKSUM() returns a 4 byte CRC-32 value, and therefore has a very small number of possible values; a four and a half billion row table (i.e. data warehouse) absolutely guarantees collisions.

Unfortunately, the chance of a collision is much greater than that. For trivial collisions, try:
SELECT CHECKSUM(1,1,1,1)
SELECT CHECKSUM(0,1,1,1,1)
SELECT CHECKSUM(0,0,1,1,1,1)

I'd recommend changing this to actually do a full field by field comparison (noting which fields are NOT NULL and which allow NULL values). If you truly insist on a hash, your best, still fairly simple, bet is probably to use SHA512 out of some CLR code. HASHBYTES with SHA1 is built into SQL 2005+, though SHA1 is still not very good... but 160 bits of SHA1 hash is immensely better than 32 bits of CRC-32.

Untested reference for SHA512:
http://sqlblog.com/blogs/michael_coles/archive/2009/04/12/let-s-hash-a-blob.aspx
Post #864222
Posted Thursday, February 11, 2010 12:21 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, September 4, 2012 8:47 AM
Points: 3, Visits: 35
Point well taken, checksum is not very unique. The code should mitigate this, if there are duplicates you may end up deleting records that haven't changed, then adding them back. Little extra work for server, that's ok...
Post #864251
Posted Thursday, February 11, 2010 12:27 PM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Yesterday @ 8:10 AM
Points: 861, Visits: 2,360
As another example of the issue: assume the data actually changes in the base table, but the checksum of the new data is the same as the checksum of the old data.

From here on out, the view will continue to return invalid (old) results until the data changes to a data set that returns a different checksum.

When you have bulk changes on large numbers of rows, large numbers of changes, or very specific collision generating changes (adding leading zeroes in particular, for CHECKSUM()), the code is more likely to fail to notice changes that actually happened.
Post #864252
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse