Best Way to avoid duplicate rows insertion

Question

Post reply

Best Way to avoid duplicate rows insertion

sarankamal2k12

Old Hand

Points: 322
More actions
December 13, 2012 at 2:56 am

#272032

Hi,
I am using sql server 2008 enterprise edition. I need to insert large number of rows to my table frequently. I have a primary key column in my table that has the hash value by computing 3 columns of the row. if i use bcp process it terminates the process because of duplicate rows in my data. What other best method to insert large datas to table without dublication. It should be as faster as bcp process.
Thanks in advance.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

jeetsingh.cs SSCommitted Points: 1741 More actions · Answer 1

WITH CTE (

[ColumnName]

,[DuplicateCount]

)

AS(

SELECT

[ColumnName]

,[DuplicateCount]

,ROW_NUMBER() OVER(PARTITION BY [ColumnName]

ORDER BY [ColumnName]

AS DuplicateCount

FROM sample_table

)

DELETE

FROMCTE

WHERE (DuplicateCount > 1)

I think this an efficient way.

sarankamal2k12 Old Hand Points: 322 More actions · Answer 2

But I dont want to delete the records after insertion. Instead I want to skip the duplicate rows while inserting large number of rows.

Jason-299789 SSC-Insane Points: 21601 More actions · Answer 3

If the data is already in the database then you should be able to do

WITH CTE

AS

(

Select

ROW_NUMBER() OVER(Partition by UniqueColumn ORDER BY UniqueColumn) rn

,<columnlist>

From myTable

)

Insert into BULK Desttable

Select <colunlist>

from CTE

where Rn=1

If its not then you may have to preload it.

_________________________________________________________________________
SSC Guide to Posting and Best Practices

jeetsingh.cs SSCommitted Points: 1741 More actions · Answer 4

CTE can be modified to select non duplicate values .

Please Refer,

http://msdn.microsoft.com/en-us/library/ms190766(v=sql.105).aspx

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 5

saravanakumar-485245 (12/13/2012)
But I dont want to delete the records after insertion. Instead I want to skip the duplicate rows while inserting large number of rows.

You could change your PK to "Ignore Dupes".

The real key is that I would never ever import data directly to its final resting spot. I ALWAYS use a staging table because I ALWAYS validate the data first. If there are dupes present, it's good to know what they are and how many there are. You might also find out that what you're calling a duplicate, really isn't and that you may have selected an incorrect set of columns as the PK.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)