Blog Post

Inserting missing rows

,

I can think of three ways to insert just the missing rows from one table or query into another table. 

The first way is so ugly that I hate to even bring it up, yet I know that there are developers and (gasp) DBAs using this very technique.  Basically, you create a cursor to loop through each row of the first table and check, one row at a time, to see if the row exists in the second table before inserting it.  This is so inefficient, I'm not giving a code example.

The second way is better: Use a sub-query in the WHERE clause along with the NOT IN operator.  Several years ago when I was working on my Master's degree, I took a course in the SQL language using Oracle.  I'm not sure if sub-queries and especially correlated sub-queries are better in Oracle than SQL Server, but the professor taught us to use this method to solve almost every problem.  Needless to say, a lot of my old code uses this approach as well.  This method gets a little more complicated with multi-valued primary keys.  I recently ran across an example that had two sub-queries in the WHERE clause. 

Here is an example of what it looked like and it does work (Note: after posting this and riding into work, I realized that this will not work in all cases.  I have seen it used in production; I suspect the primary key was actually defined incorrectly):

insert into tableB(PK_Col1, PK_Col2, Col3, Col4)
select
a.PK_Col1, a.PK_Col2, a.Col3, a.Col4
from
tableA as a
where
a.PK_Col1 not in (select PK_Col1 from tableB)
    and a.PK_Col2 not in (select PK_col2 from tableB)

The "not in" method does work with a one-column primary key.

The third method takes advantage of the OUTER JOIN syntax.  By joining the two tables with this syntax, you can retrieve all of the rows in tableA regardless of whether there is a match in tableB.  For those rows where there isn't a match, all of the columns in tableB will be null.  By selecting only those rows where the values in tableB are null you have found your missing rows.

Here is an example:


insert
into tableB (PK_Col1, PK_Col2, Col3, Col4)
select a.PK_Col1, a.PK_Col2, a.Col3, a.Col4
from
tableA as a left outer join tableB as b
on
a.PK_Col1 = b.PK_Col1 and a.PK_Col2 = a.PK_Col2
where
b.PK_Col1 is null

While I didn't find much difference in the estimated query plans between the two with just a small amount of  sample data, it would be interesting to see what would happen with millions of rows.  Both methods did two clustered index scans.  I think the LEFT OUTER JOIN method would be much more efficient with a large amount of data. 

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating