• My two cents about database design and avoiding duplicates: if you have unstructured, flat data, say 20 years' worth of monitoring-system data scattered across Excel, text, and other forms of electronic storage, roughly 27 million rows where the only identifier is four of the five columns (timestamp, status1, status2, measurandID), then the only practical way to import it is DTS. And because you have to import the data as fast as possible, you want to avoid any indexing during the load, let alone constraint checks. So you end up with about 10% duplicates or even more, which you wouldn't care about too much, except that your database blows up to 16 GB once you finish building the indexes you need to make the data usable at all.
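• In that situation you can still deduplicate after the fast load, before building the final indexes. A minimal T-SQL sketch of the idea, assuming a SQL Server staging table; the table names and the fifth column (value) are hypothetical, and which duplicate row survives is arbitrary:

    -- Hypothetical staging table loaded by DTS, no indexes or constraints.
    -- Copy one row per logical key (4 of the 5 columns) into a clean table,
    -- picking an arbitrary survivor for the non-key column, then index that.
    SELECT [timestamp], status1, status2, measurandID,
           MIN(value) AS value          -- arbitrary survivor for the 5th column
    INTO dbo.monitoring_clean
    FROM dbo.monitoring_staging
    GROUP BY [timestamp], status1, status2, measurandID;

  Doing the GROUP BY once on the unindexed staging table is one big scan-and-sort, which is still far cheaper than enforcing a unique constraint row by row during the import.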