Checking Existence of Row Prior to Insert

  • I have a situation where our stored procedure inserts records from table_1 to table_2 when they don't already exist (uses the EXIST statement) on that table. If table_1 contains multiple records that are the same, it appears after the 1st record has been inserted, it does not recognize it as being there when it checks the existence when attempting to insert record 2.

    Here's an example of the script:

    insert into table_2 (col1,col2,col3)

    select col1,col2,col3

    from table_1 t1

    where not exists (select '1'

    from table_2 t2

    where t1.col1 = t2.col1

    and t1.col2 = t2.col2

    and t1.col3 = t2.col3)

    Data from Table_1 -- Assume that table_2 does not contain these records

    col1 col2 col3

    AA 11 A1

    AA 11 A1

    BB 22 B2

    All 3 records would be inserted to table_2 in this example. Can anyone tell me if this is a buffering issue? I am stumped to how this is happening and that's the only thing I can come up with.

  • Libby (8/11/2015)


    I have a situation where our stored procedure inserts records from table_1 to table_2 when they don't already exist (uses the EXIST statement) on that table. If table_1 contains multiple records that are the same, it appears after the 1st record has been inserted, it does not recognize it as being there when it checks the existence when attempting to insert record 2.

    Here's an example of the script:

    insert into table_2 (col1,col2,col3)

    select col1,col2,col3

    from table_1 t1

    where not exists (select '1'

    from table_2 t2

    where t1.col1 = t2.col1

    and t1.col2 = t2.col2

    and t1.col3 = t2.col3)

    Data from Table_1 -- Assume that table_2 does not contain these records

    col1 col2 col3

    AA 11 A1

    AA 11 A1

    BB 22 B2

    All 3 records would be inserted to table_2 in this example. Can anyone tell me if this is a buffering issue? I am stumped to how this is happening and that's the only thing I can come up with.

    The problem is your assumption. When the insert runs, none of the data in Table_1 exists in Table_2 (your assumption). The insert is done as a set resulting in both duplicate rows being inserted into Table_2. What you should do if you don't want duplicate rows inserted is to dedup the data being inserted from Table_1 to Table_2.

    Here is one way to accomplish this:

    with dedupedData as (

    select distinct

    col1,col2,col3

    from table_1)

    insert into table_2 (col1,col2,col3)

    select col1,col2,col3

    from dedupedData t1

    where not exists (select '1'

    from table_2 t2

    where t1.col1 = t2.col1

    and t1.col2 = t2.col2

    and t1.col3 = t2.col3);

  • The problem is that you have to remove the duplicates before you insert the batch of records.

    This is one option:

    insert into table_2 (col1,col2,col3)

    select DISTINCT col1,col2,col3

    from table_1 t1

    where not exists (select '1'

    from table_2 t2

    where t1.col1 = t2.col1

    and t1.col2 = t2.col2

    and t1.col3 = t2.col3)

  • pietlinden (8/11/2015)


    The problem is that you have to remove the duplicates before you insert the batch of records.

    This is one option:

    insert into table_2 (col1,col2,col3)

    select DISTINCT col1,col2,col3

    from table_1 t1

    where not exists (select '1'

    from table_2 t2

    where t1.col1 = t2.col1

    and t1.col2 = t2.col2

    and t1.col3 = t2.col3)

    Much simpler, I tend to use CTEs a lot so that's the direct I went from the start.

  • There is another way to do it using MERGE. I actually prefer this way, because it's a little more intuitive, being able to specify WHEN MATCHED and WHEN NOT MATCHED. I understand that there have been some issues found with MERGE, but in this scenario, they don't apply. Here's how I did it:

    if object_id('tempdb..#Table1') is not null drop table #Table1

    if object_id('tempdb..#Table2') is not null drop table #Table2

    create table #Table1

    (

    Col1 varchar(50)

    ,Col2 varchar(50)

    ,Col3 varchar(50)

    ,DateAdded datetime2(1)

    )

    go

    create table #Table2

    (

    Col1 varchar(50)

    ,Col2 varchar(50)

    ,Col3 varchar(50)

    ,DateAdded datetime2(1)

    )

    go

    insert into #Table1 (Col1, Col2, Col3, DateAdded) values ('Line1.1', 'Line1.2', 'Line1.3', getdate());

    insert into #Table1 (Col1, Col2, Col3, DateAdded) values ('Line2.1', 'Line2.2', 'Line2.3', getdate());

    insert into #Table1 (Col1, Col2, Col3, DateAdded) values ('Line3.1', 'Line3.2', 'Line3.3', getdate());

    insert into #Table1 (Col1, Col2, Col3, DateAdded) values ('Line4.1', 'Line4.2', 'Line4.3', getdate());

    insert into #Table2 (Col1, Col2, Col3, DateAdded) values ('Line4.1', 'Line4.2', 'Line4.3', null);

    select * from #Table1;

    select * from #Table2;

    merge into #Table2 as D

    using (select Col1, Col2, Col3, DateAdded from #Table1) as S

    on S.Col1 = D.Col1 and S.Col2 = D.Col2 and S.Col3 = D.Col3

    when matched

    then

    update set DateAdded = getdate()

    when not matched by target

    then

    insert (Col1, Col2, Col3, DateAdded)

    values (S.Col1, S.Col2, Col3, DateAdded);

    select * from #Table1;

    select * from #Table2;

    drop table #Table1;

    drop table #Table2;

  • Thanks all for your replies. I appreciate it!

    My example didn't represent the whole story. I too agree that eliminating dups prior to the script would be the way to go, but there are actually other fields that make each record unique. That is probably another issue that needs to be resolved. I was just wondering if anyone knows why SQL doesn't recognize that the 1st record was already inserted when it checks for matches prior to inserting the 2nd record.

  • Because it doesn't check prior to inserting the second record. It checks prior to starting the insert. The way an insert works (has to work) is that the rows to be inserted are defined then, once all the rows are defined, the insert portion of the query starts.

    If the select portion of an insert returns 200 rows, then the insert MUST either insert all 200 rows, or fail and insert none. That's part of the requirements and guarantees of a relational database.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks Gail! I didn't realize it does all the checking prior to the actual inserts. I thought it did this after each insert. I learned something new today!!

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply