Compare 2 datasets and output delta in one TSQL statement

, 2011-11-04 (first published: )

It's a common problem we all share from time to time, and i thought my solution is worth a post πŸ™‚

You have 2 datasets and you wonder.....  has anything changed, and if so, did it change in A or B?

In my example i show a way to compare 2 sets of employee data and output the changes in one TSQL statement

Table A

# EmpNr Name HireDT FireDT
1 1 janssen 2000-01-01 00:00:00.000 {NULL}
2 2 de bruin 2001-05-01 00:00:00.000 2005-12-31 00:00:00.000
3 3 pieterssen 2004-08-01 00:00:00.000 {NULL}
4 4 de vries 2003-09-01 00:00:00.000 2011-01-01 00:00:00.000
5 5 Klaassen 2009-09-01 00:00:00.000 2011-01-01 00:00:00.000
6 6 van Speyk 2009-09-01 00:00:00.000 {NULL}
7 7 Klaassen 2009-09-01 00:00:00.000 2011-01-01 00:00:00.000

Table B

# EmpNr Name HireDT FireDT
1 2 de bruin 2001-05-01 00:00:00.000 2005-12-31 00:00:00.000
2 3 pietersse 2004-08-01 00:00:00.000 {NULL}
3 4 de vries 2003-09-01 00:00:00.000 2011-01-01 00:00:00.000
4 5 Klaassen 2009-09-01 00:00:00.000 2011-01-01 00:00:00.000
5 6 van Speijk 2009-09-01 00:00:00.000 {NULL}
6 7 Klaassen 2009-09-01 00:00:00.000 2011-01-01 00:00:00.000
7 8 de groot 2003-09-01 00:00:00.000 2011-01-01 00:00:00.000

The differences

# EmpNr_a EmpNr_b Name_a Name_b HireDT_a HireDT_b FireDT_a FireDT_b
1 1 {NULL} janssen {NULL} 2000-01-01 00:00:00.000 {NULL} {NULL} {NULL}
2 3 3 pieterssen pietersse 2004-08-01 00:00:00.000 2004-08-01 00:00:00.000 {NULL} {NULL}
3 6 6 van Speyk van Speijk 2009-09-01 00:00:00.000 2009-09-01 00:00:00.000 {NULL} {NULL}
4 {NULL} 8 {NULL} de groot {NULL} 2003-09-01 00:00:00.000 {NULL} 2011-01-01 00:00:00.000

As you can see, the output shows additions, removals and changes.

Happy Coding!

Theo Ekelmans

Sr MS-SQL DBA @ Ordina.nl

/*
this example will show you how to compare 2 datasets and output the differences in one statement
Author:β€ˆβ€ˆβ€ˆβ€ˆTheo Ekelmans (theo@ekelmans.com)
Version: 1.0 2011-10-07 
*/
set nocount on

-- Original data
CREATE TABLE #tblA (
 [EmpNr]β€ˆβ€ˆβ€ˆβ€ˆ [int], -- Primary key
 [Name] [varchar](15) NULL,
 [HireDT] [datetime] NULL,
 [FireDT] [datetime] NULL,
CONSTRAINT [PK_tblA] PRIMARY KEY CLUSTERED ( [EmpNr] ASC )
)

-- Some copy you need to compare
CREATE TABLE #tblB (
 [EmpNr]β€ˆβ€ˆβ€ˆβ€ˆ [int], -- Primary key
 [Name] [varchar](15) NULL,
 [HireDT] [datetime] NULL,
 [FireDT] [datetime] NULL,
CONSTRAINT [PK_tblB] PRIMARY KEY CLUSTERED ( [EmpNr] ASC )
)

-- Create table to hold the Delta of tblA and tblB
CREATE TABLE #cmpAB (
 [EmpNr_a]β€ˆβ€ˆβ€ˆβ€ˆ[varchar](10) NULL,
 [EmpNr_b] β€ˆβ€ˆβ€ˆβ€ˆ[varchar](10) NULL,
 [Name_a] [varchar](15) NULL,
 [Name_b] [varchar](15) NULL,
 [HireDT_a] β€ˆβ€ˆβ€ˆβ€ˆ[datetime] NULL,
 [HireDT_b] β€ˆβ€ˆβ€ˆβ€ˆ[datetime] NULL,
 [FireDT_a] β€ˆβ€ˆβ€ˆβ€ˆ[datetime] NULL,
 [FireDT_b] β€ˆβ€ˆβ€ˆβ€ˆ[datetime] NULL
)

-- Fill tblA and tblB with some sample data (Epic Dutch names!)
INSERT INTO dbo.#tblA([EmpNr], [Name], [HireDT], [FireDT])
SELECT 1, N'janssen', '20000101 00:00:00.000', NULL UNION ALL
SELECT 2, N'de bruin', '20010501 00:00:00.000', '20051231 00:00:00.000' UNION ALL
SELECT 3, N'pieterssen', '20040801 00:00:00.000', NULL UNION ALL
SELECT 4, N'de vries', '20030901 00:00:00.000', '20110101 00:00:00.000' UNION ALL
SELECT 5, N'Klaassen', '20090901 00:00:00.000', '20110101 00:00:00.000' UNION ALL
SELECT 6, N'van Speyk', '20090901 00:00:00.000', null UNION ALL
SELECT 7, N'Klaassen', '20090901 00:00:00.000', '20110101 00:00:00.000'

INSERT INTO dbo.#tblB([EmpNr], [Name], [HireDT], [FireDT])
SELECT 2, N'de bruin', '20010501 00:00:00.000', '20051231 00:00:00.000' UNION ALL
SELECT 3, N'pietersse', '20040801 00:00:00.000', NULL UNION ALL -- n deleted
SELECT 4, N'de vries', '20030901 00:00:00.000', '20110101 00:00:00.000' UNION ALL
SELECT 5, N'Klaassen', '20090901 00:00:00.000', '20110101 00:00:00.000' UNION ALL
SELECT 6, N'van Speijk', '20090901 00:00:00.000', null UNION ALL -- y replaced by ij
SELECT 7, N'Klaassen', '20090901 00:00:00.000', '20110101 00:00:00.000' UNION ALL
SELECT 8, N'de groot', '20030901 00:00:00.000', '20110101 00:00:00.000'

-- Compare and insert delta into table, in one sql statement
Insert into #cmpAB
β€ˆβ€ˆβ€ˆβ€ˆ(β€ˆβ€ˆβ€ˆβ€ˆ[EmpNr_a],β€ˆβ€ˆβ€ˆβ€ˆ[EmpNr_b], 
β€ˆβ€ˆβ€ˆβ€ˆβ€ˆβ€ˆβ€ˆβ€ˆ[Name_a],β€ˆβ€ˆβ€ˆβ€ˆ[Name_b], 
β€ˆβ€ˆβ€ˆβ€ˆβ€ˆβ€ˆβ€ˆβ€ˆ[HireDT_a],β€ˆβ€ˆβ€ˆβ€ˆ[HireDT_b], 
β€ˆβ€ˆβ€ˆβ€ˆβ€ˆβ€ˆβ€ˆβ€ˆ[FireDT_a],β€ˆβ€ˆβ€ˆβ€ˆ[FireDT_b]
β€ˆβ€ˆβ€ˆβ€ˆ)

select a.[EmpNr] as [EmpNr_a],β€ˆβ€ˆβ€ˆβ€ˆβ€ˆβ€ˆβ€ˆβ€ˆb.[EmpNr] as [EmpNr_b], 
 a.[Name] as [Name_a],β€ˆβ€ˆβ€ˆβ€ˆβ€ˆβ€ˆβ€ˆβ€ˆb.[Name] as [Name_b], 
 a.[HireDT] as [HireDT_a],β€ˆβ€ˆβ€ˆβ€ˆb.[HireDT] as [HireDT_b], 
 a.[FireDT] as [FireDT_a], β€ˆβ€ˆβ€ˆβ€ˆb.[FireDT] as [FireDT_b]

from ( 
 -- Give all from B, if it's new or different
 select [EmpNr], [Name], [HireDT], [FireDT]
 fromβ€ˆβ€ˆβ€ˆβ€ˆ#tblB
 
 except
 
 select [EmpNr], [Name], [HireDT], [FireDT]
 fromβ€ˆβ€ˆβ€ˆβ€ˆ#tblA
 ) b

full outer join -- Join the 2 delta sets 

 (
 -- Give all from B, if it's an orphan or different
 select [EmpNr], [Name], [HireDT], [FireDT]
 fromβ€ˆβ€ˆβ€ˆβ€ˆ#tblA
 
 except
 
 select [EmpNr], [Name], [HireDT], [FireDT]
 fromβ€ˆβ€ˆβ€ˆβ€ˆ#tblB
 ) a

onβ€ˆβ€ˆβ€ˆβ€ˆa.[EmpNr] = b.[EmpNr] -- Join on PK !

-- Show the tables
select * from #tblA
select * from #tblB

-- Show the Delta
select * from #cmpAB

-- Clean up
DROP TABLE #cmpAB 
DROP TABLE #tblA
DROP TABLE #tblB

Rate

β˜… β˜… β˜… β˜… β˜… β˜… β˜… β˜… β˜… β˜…

3 (2)

Share

Share

Rate

β˜… β˜… β˜… β˜… β˜… β˜… β˜… β˜… β˜… β˜…

3 (2)

Related content

A Normalization Primer

For most DBAs, normalization is an understood concept, a bread and butter bit of knowledge. However, it is not at all unusual to review a database design by a development group for an OLTP (OnLine Transaction Processing) environment and find that the schema chosen is anything but properly normalized. This article by Brian Kelley will give you the core knowledge to data model.

β˜… β˜… β˜… β˜… β˜… β˜… β˜… β˜… β˜… β˜…

5 (3)

2003-01-13

17,243 reads

HOWTO: Run Singleton SELECT Queries in a Visual Basic Client

This article demonstrates how to retrieve a single record from SQL Server by using the IRow interface with a singleton SELECT. The main purpose for this technique is to avoid the overhead of creating a recordset when you are fetching a single record. Because no recordset is actually created, only one read-only ADODB.Record is returned. This is true even if the specified SELECT results in multiple records being returned if a normal ADODB.Recordset is used.

2001-11-26

1,684 reads