Best Way to avoid duplicate rows insertion

  • Hi,

    I am using sql server 2008 enterprise edition. I need to insert large number of rows to my table frequently. I have a primary key column in my table that has the hash value by computing 3 columns of the row. if i use bcp process it terminates the process because of duplicate rows in my data. What other best method to insert large datas to table without dublication. It should be as faster as bcp process.

    Thanks in advance.

  • WITH CTE (

    [ColumnName]

    ,[DuplicateCount]

    )

    AS(

    SELECT

    [ColumnName]

    ,[DuplicateCount]

    ,ROW_NUMBER() OVER(PARTITION BY [ColumnName]

    ORDER BY [ColumnName]

    AS DuplicateCount

    FROM sample_table

    )

    DELETE

    FROMCTE

    WHERE (DuplicateCount > 1)

    I think this an efficient way.

  • But I dont want to delete the records after insertion. Instead I want to skip the duplicate rows while inserting large number of rows.

  • If the data is already in the database then you should be able to do

    WITH CTE

    AS

    (

    Select

    ROW_NUMBER() OVER(Partition by UniqueColumn ORDER BY UniqueColumn) rn

    ,<columnlist>

    From myTable

    )

    Insert into BULK Desttable

    Select <colunlist>

    from CTE

    where Rn=1

    If its not then you may have to preload it.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • CTE can be modified to select non duplicate values .

    Please Refer,

    http://msdn.microsoft.com/en-us/library/ms190766(v=sql.105).aspx

  • saravanakumar-485245 (12/13/2012)


    But I dont want to delete the records after insertion. Instead I want to skip the duplicate rows while inserting large number of rows.

    You could change your PK to "Ignore Dupes".

    The real key is that I would never ever import data directly to its final resting spot. I ALWAYS use a staging table because I ALWAYS validate the data first. If there are dupes present, it's good to know what they are and how many there are. You might also find out that what you're calling a duplicate, really isn't and that you may have selected an incorrect set of columns as the PK.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply