August 11, 2015 at 3:49 pm
I have a situation where our stored procedure inserts records from table_1 to table_2 when they don't already exist (uses the EXIST statement) on that table. If table_1 contains multiple records that are the same, it appears after the 1st record has been inserted, it does not recognize it as being there when it checks the existence when attempting to insert record 2.
Here's an example of the script:
insert into table_2 (col1,col2,col3)
select col1,col2,col3
from table_1 t1
where not exists (select '1'
from table_2 t2
where t1.col1 = t2.col1
and t1.col2 = t2.col2
and t1.col3 = t2.col3)
Data from Table_1 -- Assume that table_2 does not contain these records
col1 col2 col3
AA 11 A1
AA 11 A1
BB 22 B2
All 3 records would be inserted to table_2 in this example. Can anyone tell me if this is a buffering issue? I am stumped to how this is happening and that's the only thing I can come up with.
August 11, 2015 at 3:57 pm
Libby (8/11/2015)
I have a situation where our stored procedure inserts records from table_1 to table_2 when they don't already exist (uses the EXIST statement) on that table. If table_1 contains multiple records that are the same, it appears after the 1st record has been inserted, it does not recognize it as being there when it checks the existence when attempting to insert record 2.Here's an example of the script:
insert into table_2 (col1,col2,col3)
select col1,col2,col3
from table_1 t1
where not exists (select '1'
from table_2 t2
where t1.col1 = t2.col1
and t1.col2 = t2.col2
and t1.col3 = t2.col3)
Data from Table_1 -- Assume that table_2 does not contain these records
col1 col2 col3
AA 11 A1
AA 11 A1
BB 22 B2
All 3 records would be inserted to table_2 in this example. Can anyone tell me if this is a buffering issue? I am stumped to how this is happening and that's the only thing I can come up with.
The problem is your assumption. When the insert runs, none of the data in Table_1 exists in Table_2 (your assumption). The insert is done as a set resulting in both duplicate rows being inserted into Table_2. What you should do if you don't want duplicate rows inserted is to dedup the data being inserted from Table_1 to Table_2.
Here is one way to accomplish this:
with dedupedData as (
select distinct
col1,col2,col3
from table_1)
insert into table_2 (col1,col2,col3)
select col1,col2,col3
from dedupedData t1
where not exists (select '1'
from table_2 t2
where t1.col1 = t2.col1
and t1.col2 = t2.col2
and t1.col3 = t2.col3);
August 11, 2015 at 3:59 pm
The problem is that you have to remove the duplicates before you insert the batch of records.
This is one option:
insert into table_2 (col1,col2,col3)
select DISTINCT col1,col2,col3
from table_1 t1
where not exists (select '1'
from table_2 t2
where t1.col1 = t2.col1
and t1.col2 = t2.col2
and t1.col3 = t2.col3)
August 11, 2015 at 4:01 pm
pietlinden (8/11/2015)
The problem is that you have to remove the duplicates before you insert the batch of records.This is one option:
insert into table_2 (col1,col2,col3)
select DISTINCT col1,col2,col3
from table_1 t1
where not exists (select '1'
from table_2 t2
where t1.col1 = t2.col1
and t1.col2 = t2.col2
and t1.col3 = t2.col3)
Much simpler, I tend to use CTEs a lot so that's the direct I went from the start.
August 11, 2015 at 11:58 pm
There is another way to do it using MERGE. I actually prefer this way, because it's a little more intuitive, being able to specify WHEN MATCHED and WHEN NOT MATCHED. I understand that there have been some issues found with MERGE, but in this scenario, they don't apply. Here's how I did it:
if object_id('tempdb..#Table1') is not null drop table #Table1
if object_id('tempdb..#Table2') is not null drop table #Table2
create table #Table1
(
Col1 varchar(50)
,Col2 varchar(50)
,Col3 varchar(50)
,DateAdded datetime2(1)
)
go
create table #Table2
(
Col1 varchar(50)
,Col2 varchar(50)
,Col3 varchar(50)
,DateAdded datetime2(1)
)
go
insert into #Table1 (Col1, Col2, Col3, DateAdded) values ('Line1.1', 'Line1.2', 'Line1.3', getdate());
insert into #Table1 (Col1, Col2, Col3, DateAdded) values ('Line2.1', 'Line2.2', 'Line2.3', getdate());
insert into #Table1 (Col1, Col2, Col3, DateAdded) values ('Line3.1', 'Line3.2', 'Line3.3', getdate());
insert into #Table1 (Col1, Col2, Col3, DateAdded) values ('Line4.1', 'Line4.2', 'Line4.3', getdate());
insert into #Table2 (Col1, Col2, Col3, DateAdded) values ('Line4.1', 'Line4.2', 'Line4.3', null);
select * from #Table1;
select * from #Table2;
merge into #Table2 as D
using (select Col1, Col2, Col3, DateAdded from #Table1) as S
on S.Col1 = D.Col1 and S.Col2 = D.Col2 and S.Col3 = D.Col3
when matched
then
update set DateAdded = getdate()
when not matched by target
then
insert (Col1, Col2, Col3, DateAdded)
values (S.Col1, S.Col2, Col3, DateAdded);
select * from #Table1;
select * from #Table2;
drop table #Table1;
drop table #Table2;
August 12, 2015 at 6:58 am
Thanks all for your replies. I appreciate it!
My example didn't represent the whole story. I too agree that eliminating dups prior to the script would be the way to go, but there are actually other fields that make each record unique. That is probably another issue that needs to be resolved. I was just wondering if anyone knows why SQL doesn't recognize that the 1st record was already inserted when it checks for matches prior to inserting the 2nd record.
August 12, 2015 at 7:03 am
Because it doesn't check prior to inserting the second record. It checks prior to starting the insert. The way an insert works (has to work) is that the rows to be inserted are defined then, once all the rows are defined, the insert portion of the query starts.
If the select portion of an insert returns 200 rows, then the insert MUST either insert all 200 rows, or fail and insert none. That's part of the requirements and guarantees of a relational database.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 12, 2015 at 7:15 am
Thanks Gail! I didn't realize it does all the checking prior to the actual inserts. I thought it did this after each insert. I learned something new today!!
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply