Duplicates in import file

  • I'm trying to use Bulk Insert or the DTS import wizard to import 7 million records into a new table.  The table has a primary key consisting of five fields.  The text file with the 7 million records has thousands of duplicates.  I'm so used to Access just warning me about the number of rejected duplicate records and then letting me continue, but in SQL Server the entire import fails because of a duplicate.  How can I get around this?

  • One way:  Bring all the records into a "staging" area and then apply the business rules from there (see the sketch below)...

    Good Hunting!

    AJ Ahrens

    webmaster@kritter.net
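
    A minimal sketch of that staging idea, assuming a hypothetical staging table named staging_import whose columns mirror the text file, and a tab-delimited file at C:\data\import.txt (all names here are placeholders, not from the original post):

    -- Staging table: same columns as the import file, but no primary
    -- key or unique indexes, so duplicate rows load without error.
    CREATE TABLE staging_import (
        key1 INT NOT NULL,
        key2 INT NOT NULL,
        key3 INT NOT NULL,
        key4 INT NOT NULL,
        key5 INT NOT NULL,
        some_value VARCHAR(50) NULL
    );

    -- Load the whole text file; nothing is rejected at this stage.
    BULK INSERT staging_import
    FROM 'C:\data\import.txt'
    WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n');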

  • What would that mean?  What would a staging area be and how do you apply business rules?  Sorry for the stupid questions.

  • Load the data into a table in a temporary location with no constraints or unique indexes (the constraints are your business rules). Then weed out the duplicates, perhaps using SELECT DISTINCT or some kind of grouping, etc. Then take the data into your final table, where your constraints are.



    Michelle
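
    For example, once the raw rows are in a staging table like the one sketched above, the duplicates could be weeded out on the way into the final table (table and column names are illustrative; the five key columns stand in for the five-field primary key):

    -- GROUP BY the five key columns so only one row per key survives;
    -- MAX() just picks a value for the non-key column.
    INSERT INTO main_table (key1, key2, key3, key4, key5, some_value)
    SELECT key1, key2, key3, key4, key5, MAX(some_value)
    FROM staging_import
    GROUP BY key1, key2, key3, key4, key5;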

  • That would work the first time.  But I will be appending new data to the table weekly, and there may be matches between what's already in the table and what I'm trying to add.  Is there a way to kick out duplicates as I'm appending/importing?

  • If you truncate the temporary table every week before you perform the import (assuming that you do not wish to keep a history of what has been imported and when) and then proceed as described by mimorr, this still works. Your INSERT statement just needs to take into account the data that has already been imported:

    insert into [main_table] ([fieldlist])
    select [fieldlist] from [import_table]
    where [import_table].[primary_key] not in
        (select [primary_key] from [main_table])

    Phil
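
    Because the primary key here is made up of five fields, the NOT IN comparison above would have to cover all five columns; one way to express the same idea for a composite key (again with the illustrative names used earlier) is a NOT EXISTS test, run after truncating and reloading the staging table each week:

    -- Clear last week's load, re-import into staging, then append only
    -- rows whose five-part key is not already in the main table.
    TRUNCATE TABLE staging_import;
    -- (BULK INSERT / DTS load into staging_import goes here)

    INSERT INTO main_table (key1, key2, key3, key4, key5, some_value)
    SELECT DISTINCT s.key1, s.key2, s.key3, s.key4, s.key5, s.some_value
    FROM staging_import s
    WHERE NOT EXISTS (
        SELECT 1 FROM main_table m
        WHERE m.key1 = s.key1 AND m.key2 = s.key2
          AND m.key3 = s.key3 AND m.key4 = s.key4
          AND m.key5 = s.key5
    );

    The DISTINCT handles duplicates within the new file itself (assuming duplicate rows are identical across all columns), while the NOT EXISTS skips rows already present in the main table.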


  • Thank you. 
