RE: T-SQL Data Processing – SQLServerCentral


June 23, 2008 at 9:33 am

I'm not a fan of dedup either, and didn't expect a hearty round of approbation for this article...

I've found that you should ALWAYS consider your alternatives when it comes to SQL programming. Things frequently don't "work" as we think they "should".

I use this code to dedup databases of over 1 billion (yep, the B word) rows, a quarter of which (~250,000,000) are duplicates of some kind (exact or based on some "fuzzy" logic). One database processes in about 3 days, the other in about 18 hours (one has a larger record size than the other).
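To make the exact-versus-"fuzzy" distinction concrete, here is a minimal sketch. It is not the T-SQL code referred to above. The normalization rule (trim, lowercase, strip punctuation, collapse whitespace) and the helper names `normalize` and `dedup` are hypothetical and chosen only for illustration. Both passes keep the first row seen for each key. The only difference is the key: the raw value for exact matching, a normalized value for fuzzy matching.

```python
import re

def normalize(name: str) -> str:
    # Hypothetical "fuzzy" key: trim, lowercase, drop punctuation,
    # and collapse runs of whitespace. Real matching rules are domain-specific.
    s = name.strip().lower()
    s = re.sub(r"[^\w\s]", "", s)
    return re.sub(r"\s+", " ", s)

def dedup(rows, key):
    """Keep the first row for each key value; return (kept_rows, duplicate_count)."""
    seen = set()
    kept = []
    for row in rows:
        k = key(row)
        if k in seen:
            continue  # a row with this key was already kept, so this one is a duplicate
        seen.add(k)
        kept.append(row)
    return kept, len(rows) - len(kept)

rows = ["John Smith", "John Smith", "john  smith.", "Jane Doe"]
exact, exact_dups = dedup(rows, key=lambda r: r)   # exact matching
fuzzy, fuzzy_dups = dedup(rows, key=normalize)     # fuzzy (normalized) matching
print(exact_dups, fuzzy_dups)  # 1 2
```

Exact matching catches only the identical "John Smith" pair. The normalized key also folds "john  smith." into the same group.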

This represented a significant reduction in processing time for both databases compared with the previous methodology, which was a hybrid of external code and internal SQL (with cursors). The new approach is done entirely in SQL.

I am constantly looking for better processing techniques that perform the required functions AND run faster than existing procedures. So far, this is it.

As for SSIS, I haven't used it before, but I will look into it.

PeteK
I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.