Inserting a list into a table that does not allow duplicates

  • I have a table where the schema is as follows:

    create table table1 (
        id bigint identity(1,1),
        sn varchar(50)
    )

    alter table table1 add constraint pk_id primary key (id)
    create unique index ui_sn on table1 (sn)

    Now if I use the Import/Export Wizard, I would like to import a text file, but if there is a duplicate, I would like it to ignore it and keep going. In the Import Wizard there is even a drop-down to continue on error, but it always seems to stop as soon as it finds a duplicate.

    I'd rather not have to import the text file into a temp table and then do an insert statement with a left outer join on itself, saying where table2.sn is null (one time I even ran this and it still said there was a duplicate, which is freaking impossible! There were no other inserts going on against the table at the time!)

    Just to clear up, my SQL statement was:

    insert into table1 t1 (select t2 from table2 left outer join table1 t1 on t1.sn = t2.sn where t1 is null)

    Did I do the statement wrong? I mean, I ran this before successfully with no problems.

    I'm sorry for the cross post; I found out later this might be more relevant in this forum.

  • I think the statement you pasted is missing which column in t1 is null; was that a copy/paste error?

    It looks a little like pseudocode.

    This is how I would do it:

    insert into table1 (col1, col2, col3, ...)
    select t2.col1, t2.col2, t2.col3
    from table2 t2
    left outer join table1 t1 on t2.sn = t1.sn
    where t1.sn is null

    Lowell


    --Help us help you! If you post a question, make sure you include a CREATE TABLE... statement and an INSERT INTO... statement for that table, to give the volunteers here representative data. With your description of the problem, we can provide a tested, verifiable solution to your question! Asking the question the right way gets you a tested answer the fastest way possible!

  • Yes, it was a copy and paste error.

  • An alternative method (albeit less satisfactory) would be to define the UNIQUE index with IGNORE_DUP_KEY = ON.
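    As a minimal sketch, assuming the table and index names from the original post, that option looks like this; duplicate keys then raise a warning and are skipped instead of failing the whole insert:

    create unique index ui_sn on table1 (sn)
    with (ignore_dup_key = on);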

    Another method would be to use WHERE NOT EXISTS or the EXCEPT operator.
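    Sketches of both, again assuming the names from the original post. Note that EXCEPT returns distinct rows, so it also removes duplicates inside the source itself:

    -- NOT EXISTS variant
    insert into table1 (sn)
    select distinct t2.sn
    from table2 t2
    where not exists (select 1 from table1 t1 where t1.sn = t2.sn);

    -- EXCEPT variant
    insert into table1 (sn)
    select sn from table2
    except
    select sn from table1;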

    Cheers,

    Paul

  • xgcmcbain (4/5/2009)


    I'd rather not have to import the text file into a temp table and then do an insert statement with a left outer join on itself, saying where table2.sn is null (one time I even ran this and it still said there was a duplicate, which is freaking impossible! There were no other inserts going on against the table at the time!)

    Did I do the statement wrong? I mean, I ran this before successfully with no problems.

    Most probably the problem was caused by the offending row being included twice in the file you were importing.

    What you did is not a self join; you joined two different tables to find rows that already exist in the target table. What you did not do was check whether there are duplicates inside the new data in the temporary table.

    IMHO, importing the data into a temp table without any constraints, and then moving it into the target table with all the necessary checks (for duplicates and other things), is the preferred way to do this. In many cases you need to insert rows that do not exist and update those that already exist; this approach lets you do that easily. Of course, you could also use OPENROWSET or some similar way to open the source file instead of importing it into a temp table, but then the rest, including the checks, will be the same.
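    A sketch of the OPENROWSET(BULK ...) route, with a hypothetical file path and a format file that is assumed to describe a single sn column:

    insert into table1 (sn)
    select distinct b.sn
    from openrowset(
        bulk 'C:\data\sn_list.txt',          -- hypothetical source file
        formatfile = 'C:\data\sn_list.fmt'   -- hypothetical format file naming the sn column
    ) as b
    where not exists (select 1 from table1 t1 where t1.sn = b.sn);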

  • Vladan (4/6/2009)


    Most probably the problem was caused by the offending row being included twice in the file you were importing.

    I'm sorry, I forgot to mention the list I was importing was a distinct list. I was thinking maybe it was a null value or something, and I made sure that wasn't the case either.

  • It just seems stupid to me that I have to write some VB or C# script to loop through every record and insert one record at a time. I would think the Import/Export Wizard should have this functionality.

  • Hi,

    I had a similar requirement, where I disabled the unique index on the column, loaded everything, deleted the duplicate records, and re-enabled the index.

    Please let me know, gurus, if this is the right way or not!
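    A sketch of that sequence, assuming the table and index names from the original post; note that a disabled index is brought back with ALTER INDEX ... REBUILD, and the rebuild will fail if any duplicates are still present:

    alter index ui_sn on table1 disable;

    -- bulk load the file here (wizard, BULK INSERT, bcp, ...)

    -- remove duplicates, keeping the row with the lowest id per sn
    delete t
    from table1 t
    where exists (select 1 from table1 x
                  where x.sn = t.sn and x.id < t.id);

    alter index ui_sn on table1 rebuild;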

  • I do it very similarly, Vijaya Kadiyala;

    I always use a staging table, because once the data is in a table, it's so easy to analyze...

    BULK INSERT or bcp can be your friend there for big data files;

    ...GROUP BY [what should be the primary key columns] HAVING COUNT(*) > 1

    identifies problems like this instantly for me, and then it's a simple insert from my staging table.

    If THAT fails, I can compare the staging table and see what went wrong.
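    Putting those pieces together, a minimal sketch, assuming a one-column text file at a hypothetical path and the table names used earlier in the thread:

    -- stage the raw file
    create table #staging (sn varchar(50));

    bulk insert #staging
    from 'C:\data\sn_list.txt'   -- hypothetical file path
    with (rowterminator = '\n');

    -- spot duplicates inside the new data itself
    select sn, count(*) as cnt
    from #staging
    group by sn
    having count(*) > 1;

    -- then the simple insert: only values not already in the target
    insert into table1 (sn)
    select distinct s.sn
    from #staging s
    left outer join table1 t1 on s.sn = t1.sn
    where t1.sn is null;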

    Lowell



  • Hi Lowell,

    Thank you for confirming my approach 🙂

  • xgcmcbain (4/6/2009)


    It just seems stupid to me that I have to write some VB or C# script to loop through every record and insert one record at a time. I would think the Import/Export Wizard should have this functionality.

    I didn't suggest importing individual rows; what I meant was something like Lowell posted. Import the file into a temporary table, eliminate any duplicates inside the list using GROUP BY (and check other things if necessary), and then insert into the target table using a left join with IS NULL.

    This allows you to decide how you wish to handle situations where the imported data has some problem, e.g. insert all rows with errors into another table to be processed manually, or aggregate multiple rows with the same primary key into one row (or ignore the older of the duplicate rows, or whatever it is you need). No loops, just standard set-based processing.
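    For example, "keep only the newest of the duplicate rows" could look like this sketch, where the staging table and its imported_at ordering column are hypothetical:

    with ranked as (
        select sn,
               row_number() over (partition by sn
                                  order by imported_at desc) as rn
        from #staging
    )
    insert into table1 (sn)
    select r.sn
    from ranked r
    where r.rn = 1
      and not exists (select 1 from table1 t1 where t1.sn = r.sn);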

  • Vladan (4/7/2009)


    I didn't suggest importing individual rows; what I meant was something like Lowell posted. Import the file into a temporary table, eliminate any duplicates inside the list using GROUP BY (and check other things if necessary), and then insert into the target table using a left join with IS NULL.

    This is already the process that I do, 99% of the time.

    Oddly, one time it didn't work and said there was a duplicate, but I know that was impossible.

    My main point is I can't believe the Import/Export Wizard doesn't allow this, which I could code up in 5 minutes, after however many years SQL Server has been working on their Import/Export Wizard.

    Alternatively I could use SSIS, but god, that's confusing and a pain. I'm a code person, and sometimes have no idea what to click; I need to take a class on that.

  • xgcmcbain (4/7/2009)


    My main point is I can't believe the Import/Export Wizard doesn't allow this, which I could code up in 5 minutes, after however many years SQL Server has been working on their Import/Export Wizard.

    Alternatively I could use SSIS, but god, that's confusing and a pain. I'm a code person, and sometimes have no idea what to click; I need to take a class on that.

    🙂

    The Import/Export Wizard allows you to choose how you'd like the source to be imported by specifying a query. This query can use DISTINCT/JOIN/etc.

    I'm not sure if it works for flat files, but it does work between two tables.

    Unfortunately, "remove duplicates" means different things to different people. That's why you need to write a specific SQL query (or SSIS data flow) to implement what YOU mean.
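    For instance, pasting a query like this sketch into the wizard's write-a-query option (names assumed from earlier in the thread) handles duplicates within the source before the wizard ever touches the destination:

    select distinct sn
    from table2;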
