Tao Klerks (8/5/2008)
Hmm, my curiosity is piqued... The solution presented is quite interesting, but I believe the purported interest in bits and bytes is a little exaggerated, as the solution is still quite expensive: it rewrites an entire column in a single update statement, and rewrites/deletes the bulk of the data if there are many duplicates, needing up to almost double the original storage space to complete. So I'm quite curious: what would the most efficient way of addressing this problem be (using the least memory and/or the least I/O, especially on a very large data set)? Also consider whether the data can still be accessed during the update process.
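For what it's worth, the usual way to avoid rewriting a whole column in one massive UPDATE (and the log/storage blow-up that comes with it) is to do the rewrite in small keyed batches. This is only a sketch, not the solution from the article; the table and column names (dbo.BigTable, Payload) and the REPLACE transformation are purely illustrative stand-ins:

```sql
-- Hedged sketch: batch the rewrite instead of one massive UPDATE.
-- dbo.BigTable, Payload, and the REPLACE() transform are illustrative only.
DECLARE @BatchSize int = 10000;
DECLARE @Rows int = 1;

WHILE @Rows > 0
BEGIN
    -- Touch only rows that still need converting, a batch at a time.
    UPDATE TOP (@BatchSize) t
    SET    t.Payload = REPLACE(t.Payload, 'old', 'new')
    FROM   dbo.BigTable AS t
    WHERE  t.Payload LIKE '%old%';

    SET @Rows = @@ROWCOUNT;  -- loop ends when no matching rows remain

    CHECKPOINT;  -- under SIMPLE recovery, lets log space be reused between batches
END
```

Each batch commits on its own, so locks stay short-lived and other sessions can read the table between batches, and the transaction log only ever needs to hold one batch's worth of changes rather than the entire column rewrite.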
This should be quite easy to test, given a few hours to create a large (GB-sized?) data set and try out a few different approaches...
This seems like a problem many more people are likely to face nowadays: powerful hardware, very robust systems that can stay online while you do things like add columns, but extremely large amounts of data (many gigabytes) and no maintenance window or downtime available!
--Jeff Moden
Change is inevitable... Change for the better is not.