Checking Existence of Row Prior to Insert

Question

Checking Existence of Row Prior to Insert

Libby

Ten Centuries

Points: 1175
More actions
August 11, 2015 at 3:49 pm

#398885

I have a situation where our stored procedure inserts records from table_1 to table_2 when they don't already exist (uses the EXIST statement) on that table. If table_1 contains multiple records that are the same, it appears after the 1st record has been inserted, it does not recognize it as being there when it checks the existence when attempting to insert record 2.
Here's an example of the script:
insert into table_2 (col1,col2,col3)
select col1,col2,col3
from table_1 t1
where not exists (select '1'
from table_2 t2
where t1.col1 = t2.col1
and t1.col2 = t2.col2
and t1.col3 = t2.col3)
Data from Table_1 -- Assume that table_2 does not contain these records
col1 col2 col3
AA 11 A1
AA 11 A1
BB 22 B2
All 3 records would be inserted to table_2 in this example. Can anyone tell me if this is a buffering issue? I am stumped to how this is happening and that's the only thing I can come up with.

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 1

Libby (8/11/2015)
I have a situation where our stored procedure inserts records from table_1 to table_2 when they don't already exist (uses the EXIST statement) on that table. If table_1 contains multiple records that are the same, it appears after the 1st record has been inserted, it does not recognize it as being there when it checks the existence when attempting to insert record 2.
Here's an example of the script:
insert into table_2 (col1,col2,col3)
select col1,col2,col3
from table_1 t1
where not exists (select '1'
from table_2 t2
where t1.col1 = t2.col1
and t1.col2 = t2.col2
and t1.col3 = t2.col3)
Data from Table_1 -- Assume that table_2 does not contain these records
col1 col2 col3
AA 11 A1
AA 11 A1
BB 22 B2
All 3 records would be inserted to table_2 in this example. Can anyone tell me if this is a buffering issue? I am stumped to how this is happening and that's the only thing I can come up with.

The problem is your assumption. When the insert runs, none of the data in Table_1 exists in Table_2 (your assumption). The insert is done as a set resulting in both duplicate rows being inserted into Table_2. What you should do if you don't want duplicate rows inserted is to dedup the data being inserted from Table_1 to Table_2.

Here is one way to accomplish this:

with dedupedData as (

select distinct

col1,col2,col3

from table_1)

insert into table_2 (col1,col2,col3)

select col1,col2,col3

from dedupedData t1

where not exists (select '1'

from table_2 t2

where t1.col1 = t2.col1

and t1.col2 = t2.col2

and t1.col3 = t2.col3);

pietlinden SSC Guru Points: 63594 More actions · Answer 2

The problem is that you have to remove the duplicates before you insert the batch of records.

This is one option:

insert into table_2 (col1,col2,col3)

select DISTINCT col1,col2,col3

from table_1 t1

where not exists (select '1'

from table_2 t2

where t1.col1 = t2.col1

and t1.col2 = t2.col2

and t1.col3 = t2.col3)

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 3

pietlinden (8/11/2015)
The problem is that you have to remove the duplicates before you insert the batch of records.
This is one option:
insert into table_2 (col1,col2,col3)
select DISTINCT col1,col2,col3
from table_1 t1
where not exists (select '1'
from table_2 t2
where t1.col1 = t2.col1
and t1.col2 = t2.col2
and t1.col3 = t2.col3)

Much simpler, I tend to use CTEs a lot so that's the direct I went from the start.

sickb00000y Old Hand Points: 313 More actions · Answer 4

There is another way to do it using MERGE. I actually prefer this way, because it's a little more intuitive, being able to specify WHEN MATCHED and WHEN NOT MATCHED. I understand that there have been some issues found with MERGE, but in this scenario, they don't apply. Here's how I did it:

if object_id('tempdb..#Table1') is not null drop table #Table1

if object_id('tempdb..#Table2') is not null drop table #Table2

create table #Table1

(

Col1 varchar(50)

,Col2 varchar(50)

,Col3 varchar(50)

,DateAdded datetime2(1)

)

go

create table #Table2

(

Col1 varchar(50)

,Col2 varchar(50)

,Col3 varchar(50)

,DateAdded datetime2(1)

)

go

insert into #Table1 (Col1, Col2, Col3, DateAdded) values ('Line1.1', 'Line1.2', 'Line1.3', getdate());

insert into #Table1 (Col1, Col2, Col3, DateAdded) values ('Line2.1', 'Line2.2', 'Line2.3', getdate());

insert into #Table1 (Col1, Col2, Col3, DateAdded) values ('Line3.1', 'Line3.2', 'Line3.3', getdate());

insert into #Table1 (Col1, Col2, Col3, DateAdded) values ('Line4.1', 'Line4.2', 'Line4.3', getdate());

insert into #Table2 (Col1, Col2, Col3, DateAdded) values ('Line4.1', 'Line4.2', 'Line4.3', null);

select * from #Table1;

select * from #Table2;

merge into #Table2 as D

using (select Col1, Col2, Col3, DateAdded from #Table1) as S

on S.Col1 = D.Col1 and S.Col2 = D.Col2 and S.Col3 = D.Col3

when matched

then

update set DateAdded = getdate()

when not matched by target

then

insert (Col1, Col2, Col3, DateAdded)

values (S.Col1, S.Col2, Col3, DateAdded);

select * from #Table1;

select * from #Table2;

drop table #Table1;

drop table #Table2;

Libby Ten Centuries Points: 1175 More actions · Answer 5

Thanks all for your replies. I appreciate it!

My example didn't represent the whole story. I too agree that eliminating dups prior to the script would be the way to go, but there are actually other fields that make each record unique. That is probably another issue that needs to be resolved. I was just wondering if anyone knows why SQL doesn't recognize that the 1st record was already inserted when it checks for matches prior to inserting the 2nd record.

Gail Shaw SSC Guru Points: 1004504 More actions Follow · Answer 6

Because it doesn't check prior to inserting the second record. It checks prior to starting the insert. The way an insert works (has to work) is that the rows to be inserted are defined then, once all the rows are defined, the insert portion of the query starts.

If the select portion of an insert returns 200 rows, then the insert MUST either insert all 200 rows, or fail and insert none. That's part of the requirements and guarantees of a relational database.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Libby Ten Centuries Points: 1175 More actions · Answer 7

Thanks Gail! I didn't realize it does all the checking prior to the actual inserts. I thought it did this after each insert. I learned something new today!!

Checking Existence of Row Prior to Insert

Cookies on SQLServerCentral