Duplicates in import file

  • I'm trying to use Bulk Insert or the DTS import wizard to import 7 million records into a new table.  The table has a primary key consisting of five fields.  The text file with the 7 million records has thousands of duplicates.  I'm so used to Access just warning me about the number of rejected duplicate records and then letting me continue, but in SQL Server the entire import fails because of a duplicate.  How can I get around this?

  • One way:  Bring all the records into a "staging" area and then apply the business rules from there (see the sketch below)...

    Good Hunting!

    AJ Ahrens

    webmaster@kritter.net
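
    A minimal sketch of that staging idea, assuming a hypothetical staging table named staging_import whose columns mirror the text file, and a tab-delimited file at C:\data\import.txt (all names here are placeholders, not from the original post):

    -- Staging table: same columns as the import file, but no primary
    -- key or unique indexes, so duplicate rows load without error.
    CREATE TABLE staging_import (
        key1 INT NOT NULL,
        key2 INT NOT NULL,
        key3 INT NOT NULL,
        key4 INT NOT NULL,
        key5 INT NOT NULL,
        some_value VARCHAR(50) NULL
    );

    -- Load the whole text file; nothing is rejected at this stage.
    BULK INSERT staging_import
    FROM 'C:\data\import.txt'
    WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n');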

  • What would that mean?  What would a staging area be and how do you apply business rules?  Sorry for the stupid questions.

  • Load the data into a table in a temporary location with no constraints or unique indexes (the constraints are your business rules). Then weed out the duplicates, perhaps using SELECT DISTINCT or some kind of grouping, etc. Then take the data into your final table, where your constraints are.



    Michelle
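
    For example, once the raw rows are in a staging table like the one sketched above, the duplicates could be weeded out on the way into the final table (table and column names are illustrative; the five key columns stand in for the five-field primary key):

    -- GROUP BY the five key columns so only one row per key survives;
    -- MAX() just picks a value for the non-key column.
    INSERT INTO main_table (key1, key2, key3, key4, key5, some_value)
    SELECT key1, key2, key3, key4, key5, MAX(some_value)
    FROM staging_import
    GROUP BY key1, key2, key3, key4, key5;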

  • That would work the first time.  But I will be appending new data to the table weekly, and there may be matches between what's already in the table and what I'm trying to add.  Is there a way to kick out duplicates as I'm appending/importing?

  • If you truncate the temporary table every week before you perform the import (assuming that you do not wish to keep a history of what has been imported and when) and then proceed as described by mimorr, this still works. Your INSERT statement just needs to take into account the data that has already been imported:

    insert into [main_table] ([fieldlist])
    select [fieldlist] from [import_table]
    where [import_table].[primary_key] not in
        (select [primary_key] from [main_table])

    Phil
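
    Because the primary key here is made up of five fields, the NOT IN comparison above would have to cover all five columns; one way to express the same idea for a composite key (again with the illustrative names used earlier) is a NOT EXISTS test, run after truncating and reloading the staging table each week:

    -- Clear last week's load, re-import into staging, then append only
    -- rows whose five-part key is not already in the main table.
    TRUNCATE TABLE staging_import;
    -- (BULK INSERT / DTS load into staging_import goes here)

    INSERT INTO main_table (key1, key2, key3, key4, key5, some_value)
    SELECT DISTINCT s.key1, s.key2, s.key3, s.key4, s.key5, s.some_value
    FROM staging_import s
    WHERE NOT EXISTS (
        SELECT 1 FROM main_table m
        WHERE m.key1 = s.key1 AND m.key2 = s.key2
          AND m.key3 = s.key3 AND m.key4 = s.key4
          AND m.key5 = s.key5
    );

    The DISTINCT handles duplicates within the new file itself (assuming duplicate rows are identical across all columns), while the NOT EXISTS skips rows already present in the main table.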


  • Thank you. 
