• Hmm, my curiosity is piqued... The solution presented is quite interesting, but I think the purported gain in bits and bytes is a little exaggerated, as the solution is still quite expensive: it rewrites an entire column in a single update statement, and rewrites/deletes the bulk of the data if there are many duplicates, needing up to almost double the original storage space to complete. So I'm quite curious:

    What would be the most efficient way of addressing this problem (using the least memory and/or the least I/O, especially on a very large data set), and can the data still be accessed while the update is in progress?
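
    For example, one candidate I'd want to benchmark is the classic batched-update pattern sketched below: updating a few thousand rows per transaction keeps log usage and lock durations small, so the table stays usable between batches. Everything here is a placeholder rather than anything from the original article - dbo.BigTable, SomeColumn, and "trim to 10 characters" just stand in for the real table, column, and transformation.

        -- Batched update sketch: hypothetical table/column names, assumed transformation.
        -- Small transactions keep the log and the locks short-lived, so the table
        -- remains readable/writable by other sessions between batches.
        DECLARE @BatchSize int = 5000;
        DECLARE @Rows int = 1;

        WHILE @Rows > 0
        BEGIN
            UPDATE TOP (@BatchSize) dbo.BigTable
            SET    SomeColumn = LEFT(SomeColumn, 10)   -- stand-in for the real rewrite
            WHERE  SomeColumn <> LEFT(SomeColumn, 10); -- only rows that still need changing

            SET @Rows = @@ROWCOUNT;                    -- stop once a batch touches nothing

            WAITFOR DELAY '00:00:01';                  -- give other workloads a gap between batches
        END;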

    This should be quite easy to test, given a few hours to create a large (GB-sized?) data set and try out a few different approaches...
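
    For what it's worth, something like the following would knock up a roughly 2 GB test set (same invented dbo.BigTable / SomeColumn names as above; the row count and string sizes are just a starting point). Cross-joining a system view to itself supplies plenty of rows, and the modulo on a random checksum produces heavy duplication:

        -- Hypothetical test-data generator: ~10 million rows of ~200-byte strings,
        -- with only ~1000 distinct suffix values so there are plenty of duplicates.
        SELECT TOP (10000000)
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL))                 AS Id,
               REPLICATE('x', 200)
                   + CONVERT(varchar(10), ABS(CHECKSUM(NEWID())) % 1000)  AS SomeColumn
        INTO   dbo.BigTable
        FROM       sys.all_objects AS a
        CROSS JOIN sys.all_objects AS b
        CROSS JOIN sys.all_objects AS c;

        CREATE CLUSTERED INDEX CX_BigTable_Id ON dbo.BigTable (Id);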

    It seems like a problem that many more people are likely to face nowadays: powerful hardware, very robust systems that can stay online while you do things like add columns, but extremely large amounts of data (many gigabytes) and no maintenance window or downtime available!

    http://poorsql.com for T-SQL formatting: free as in speech, free as in beer, free to run in SSMS or on your version control server - free however you want it.