Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Finding Available Keys When Merging Two Tables

By Austin Zellner,

When our company was purchased by a private equity firm, we decided to consolidate our operations and databases with a sister comapny. I was responsible for merging about 300 tables between two different database over the course of two years. Duplicate keys were the most common problem, and in most cases I could simply renumber sequentially the duplicate keys without an issue.

However, in some cases, simply renumbering sequentially would create new duplicates; in my case, two employee files had duplicate employee numbers, and there was an artificial limit imposed that the keys had to be between 0000 and 9999, so sequentially renumbering was not an option as it would have created new duplicate keys.

To solve the problem I created a dummy employee table with entries from 0000 to 9999. I used Powerbuilder to loop through and insert the rows, but could have just as easily done an import from Excel, or used any scripting language that allows for a loop and SQL insert.

Since my new numbers could not be a duplicate in either table, I needed to remove from the dummy table any keys that were already in use. To do this, I used a table variable loaded with my dummy records, and then deleted any records that existed in either employee table. The records that were left in the dummy table were good, unused keys that I could renumber my existing records with.

Syntax

The syntax to create the dummy table was:

CREATE TABLE [dbo].[emp_no_test](
[id] [numeric](18, 0) IDENTITY(1,1) NOT NULL,
[emp_no] [varchar](4) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]

The employee data existed in two separate databases, which I'll call database_one and database_two, in a table emp, with the key of emp_no.

-- Create the local table that will hold the good clean records
declare @empty_emp table(
emp_no varchar(4)
)
--Load the local table with the total set of records 0000 to 9999
insert into @empty_emp
select emp_no from database_one.dbo.emp_no_test
--Remove the records from database one that are duplicates
delete from @empty_emp where
emp_no in ( select emp_no from database_one.dbo.emp )
-- Remove the records from database two that are duplicates
delete from @empty_emp where
emp_no in ( select emp_no from database_two.dbo.emp )
-- Select back all the clean records
select * from @empty_emp

Data Example

So for example, lets walk through this scenario with a much smaller dataset, confined to having to have an employee number key between 1 and 10. Database_one has the following employee table:

Database_One
Emp_no
1
3
7
8
9

 

Database_two has the following employee table:

Database_Two
Emp_no
1
4
5
6
7

 

So, the duplicate keys are 1 and 7. These two records must have their key changed to a new value. But because the numbers are not sequential, there is a possibility of creating another duplicate by just blindly assigning a new key. Using the above solution, here is what the data sets would look like.

First, I create the emp_no_test table with all possible values in the range ( in this case, 1 through 10 ).

Database_One Emp No Test
1
2
3
4
5
6
7
8
9
10

Next, the script deletes all records from the emp_no_test where the emp_no is in the Database_One employee table. The emp_no_test table now looks like ( I left the blanks to indicate records that were deleted ):

Database_One Emp No Test
 
2
 
4
5
6
 
 
 
10

 

Finally the script deletes all records from emp_no_test where emp_no is in the Database_Two employee table. The emp_no_test table now looks like:

Database_One Emp No Test
 
2
 
 
 
 
 
 
 
10

Returning back the records left in the emp_no_test table gives us two keys that are not being used in either table: 2 and 10. I can safely renumber the keys to 2 and 10.

Conclusion

Resolving duplicate keys is often straightforward, but when there are limits imposed, you have to get creative. In my case we ended up renumbering several hundred employee numbers, which would have taken days of time for the human resource departments to sort out. By using this approach, I was able to find my new keys in a matter of minutes, and did not have to deal with any new duplicates created by renumbering.

Total article views: 4490 | Views in the last 30 days: 4
 
Related Articles
FORUM

UPDATE statement that creates duplicate records

Problem with an UPDATE statement that creates duplicate records

SCRIPT

Deleting Duplicate Records

In Datawarehousing, a normal scenario is eleminating duplicate records/rows or deleting duplicate re...

SCRIPT

Deleting Duplicate Records

This article explains a simple solution for the common problem in MSSQL 7.0/2000, finding & deleting...

SCRIPT

Remove Duplicate Records

To remove duplicate records with out using temporary tables.

FORUM

Deleting Duplicate Record in Production

Deleting Duplicate Record in Production

Tags
data quality    
t-sql    
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones