• You can use a checksum to narrow down the search. It can make a good leading-edge for what would otherwise be a multi-column index.

    Checksum isn't exact, but it can reduce the number of rows that need to be compared, which can reduce I/O on large data sets.

    What you do is work out what columns you are most likely to need to compare (might be all of them, might be a subset), and generate checksums for them. Persist and index those. Then, join the source table and target table on the checksum columns, as well as other criteria.

    I've done speed tests with this in very large tables (one was half a petabyte for one table with hundreds of millions of rows), and it can result in much faster returns and much less I/O, when used on multi-column seeks/compares.

    That's the general idea. Are you looking for specifics? If so, please clarify what specifics you need help on.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon