SQLServerCentral Article

Detecting Changes to a Table

,

Introduction

It's sometimes useful to be able to detect which tables have changed (any insert, update, or deletion) since a particular point in time. For example, a data export job to an external system may need full table data each time (rather than incremental change records), but it is useful to only send through data for tables that have changed since the last export.

Traditionally, requirements such as these can be met using triggers that populate a meta-data table, or a timestamp column that is compared to the same. Code Listing 1 shows an example approach using triggers.

While reading BOL recently, I came across the SQL Server CHECKSUM(), BINARY_CHECKSUM(), and CHECKSUM_AGG() built in functions, the last of which is suggested specifically as a method to "detect changes in a table". However, experimenting further with the CHECKSUM_AGG() function has convinced me that it has a number of significant flaws that render it unsuitable for this kind of application.

This code was tested on SQL 2005, version 9.00.4035.00 (SP3), and also on SQL 2008 R2, version 10.50.1753.0 (RTM with cu5).

-------------------
--Method 1: TRIGGER
------------------- --Base Table Definition
IF OBJECT_ID('CheckSumTest', 'U') IS NOT NULL DROP TABLE CheckSumTest
GO
CREATE TABLE CheckSumTest
(
idintIDENTITY(1,1) NOT NULL PRIMARY KEY,
vc1varchar(1)NOT NULL,
vc2varchar(1)NOT NULL
)
GO
INSERT dbo.CheckSumTest (vc1, vc2) SELECT 'a', 'b'
INSERT dbo.CheckSumTest (vc1, vc2) SELECT 'b', 'a'
GO --Create Audit Summary Table to hold Meta-Data
IF OBJECT_ID('dbo.TableAuditSummary', 'U') IS NOT NULL DROP TABLE dbo.TableAuditSummary
CREATE TABLE dbo.TableAuditSummary
(idINTIDENTITY(1,1)NOT NULL PRIMARY KEY,
TableNamesysnameNOT NULL,
LastUpdateDATETIMENOT NULL,
LastExportDATETIMENOT NULL
)
GO
INSERT dbo.TableAuditSummary (TableName, LastUpdate, LastExport) VALUES ('dbo.CheckSumTest', GETDATE(), GETDATE())
GO --Tables that need exporting
SELECT * FROM dbo.TableAuditSummary WHERE LastUpdate>LastExport --Create Trigger on all Base Tables
--This fires on any insert/update/delete and writes new LastUpdate column for the table set to Current Date and Time
IF OBJECT_ID('dbo.trg_CheckSumTest_MaintainAuditSummary', 'TR') IS NOT NULL DROP TRIGGER dbo.trg_CheckSumTest_MaintainAuditSummary
GO
CREATE TRIGGER dbo.trg_CheckSumTest_MaintainAuditSummary
ON dbo.CheckSumTest
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
IF (object_id('dbo.CheckSumTest') IS NOT NULL)
UPDATE dbo.TableAuditSummary SET LastUpdate=GETDATE() WHERE TableName='dbo.CheckSumTest'
END
GO --Make an Update
UPDATE dbo.CheckSumTest SET vc1='b', vc2='a' WHERE id=1
UPDATE dbo.CheckSumTest SET vc1='a', vc2='b' WHERE id=2 --Check Meta-Data
SELECT * FROM dbo.TableAuditSummary WHERE LastUpdate>LastExport --When we have Exported the data, we run the following to reset MetaData
UPDATE dbo.TableAuditSummary SET LastExport=GETDATE() WHERE LastUpdate>LastExport

Code Listing 1

Using CheckSum() and CheckSum_Agg() Functions

Let's look at using the CHECKSUM_AGG() function, mentioned in Books OnLine and in several websites, touted as a useful way to determine if data in a table has changed. This could be a nice self-contained approach that does not require the use of Triggers, although we may hesitate in very large tables due to its full table scanning nature.

We still need our meta-data table to track which tables have changed, but this time, rather than using LastUpdate, we create a LastChkSum field, which stores the value of the output of the CHECKSUM_AGG(BINARY_CHECKSUM(*)) functions, which generates a single checksum value across the entire table.

This approach is summarised in code Listing 2.

---------------------------------------------
--Method 2 : using CheckSum (not reliable)
---------------------------------------------
--Base Table Definition
IF OBJECT_ID('CheckSumTest', 'U') IS NOT NULL DROP TABLE CheckSumTest
GO
CREATE TABLE CheckSumTest
(
id int IDENTITY(1,1) NOT NULL PRIMARY KEY,
vc1 varchar(1) NOT NULL,
vc2 varchar(1) NOT NULL
)
GO
INSERT dbo.CheckSumTest (vc1, vc2) SELECT 'a', 'b'
INSERT dbo.CheckSumTest (vc1, vc2) SELECT 'b', 'a'
GO --Create Audit Summary Table to hold Meta-Data
IF OBJECT_ID('dbo.TableAuditSummary', 'U') IS NOT NULL DROP TABLE dbo.TableAuditSummary
CREATE TABLE dbo.TableAuditSummary
( id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
TableName sysname NOT NULL,
LastChkSum INT NOT NULL
)
GO
INSERT dbo.TableAuditSummary (TableName, LastChkSum)
SELECT 'dbo.CheckSumTest', CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM dbo.CheckSumTest
GO --Tables that need exporting
SELECT * FROM dbo.TableAuditSummary WHERE TableName='dbo.CheckSumTest'
AND LastChkSum<>(SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM dbo.CheckSumTest)
UNION ALL
... --Make a Simple (Single row) Update
UPDATE dbo.CheckSumTest SET vc1='c', vc2='a' WHERE id=1 --Tables that need exporting
SELECT * FROM dbo.TableAuditSummary WHERE TableName='dbo.CheckSumTest'
AND LastChkSum<>(SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM dbo.CheckSumTest)
UNION ALL
... --Reset MetaData
UPDATE dbo.TableAuditSummary SET LastChkSum=(SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM dbo.CheckSumTest)
WHERE TableName='dbo.CheckSumTest' --Make a Symmetric change
UPDATE dbo.CheckSumTest SET vc1='b', vc2='a' WHERE id=1
UPDATE dbo.CheckSumTest SET vc1='c', vc2='a' WHERE id=2 --Tables that need exporting (no rows returned as CHECKSUM_AGG() has not changed!!)
SELECT * FROM dbo.TableAuditSummary WHERE TableName='dbo.CheckSumTest'
AND LastChkSum<>(SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM dbo.CheckSumTest)
UNION ALL

Code Listing 2

As you can see, the approach works well for single changes, but comes apart for a symmetric change. (If you are unsure of what I mean by a symmetric change, please see the addendum below where this is defined.)

I find this very surprising. If calculated checkum and binary checkum fields are added to the CheckSumTest table to see what is going on under the hood (see Code Listing 3 below) and the tests are re-run, we can see that different checksum values are generated for every row changed, as we would expect, but the overall table wide CHECKSUM_AGG() is still the same in the case of a symmetric change(!)


--Base Table Definition
IF OBJECT_ID('CheckSumTest', 'U') IS NOT NULL DROP TABLE CheckSumTest
GO
CREATE TABLE CheckSumTest
(
id int IDENTITY(1,1) NOT NULL PRIMARY KEY,
vc1 varchar(1) NOT NULL,
vc2 varchar(1) NOT NULL,
chksum1 AS (CHECKSUM(id, vc1, vc2)),
chksum2 AS (BINARY_CHECKSUM(id, vc1, vc2))
)
GO
INSERT dbo.CheckSumTest (vc1, vc2) SELECT 'a', 'b'
INSERT dbo.CheckSumTest (vc1, vc2) SELECT 'b', 'a'
GO --Show Computed Columns and CheckSum_Agg() value = 199555
SELECT * FROM CheckSumTest
SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM CheckSumTest --Make a Simple (Single row) Update
UPDATE dbo.CheckSumTest SET vc1='c', vc2='a' WHERE id=1 --Show Computed Columns and CheckSum_Agg() value = 204816 (Ok)
SELECT * FROM CheckSumTest
SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM CheckSumTest --Make a Symmetric change
UPDATE dbo.CheckSumTest SET vc1='b', vc2='a' WHERE id=1
UPDATE dbo.CheckSumTest SET vc1='c', vc2='a' WHERE id=2 --Show Computed Columns and CheckSum_Agg() value = 204816 (Not Ok!)
SELECT * FROM CheckSumTest
SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM CheckSumTest

Code Listing 3

Here are the results just prior to the symmetric change:

 

 

and just afterwards:

 

.

Conclusion

Although CHECKSUM_AGG() is described in BOL as "can be used to detect changes in a table". It clearly falls-down when symmetric changes are applied As a disclaimer, BOL does go on to state: "If one of the values in the expression list changes, the checksum of the list also generally changes. However, there is a small chance that the checksum will not change."

If there is very little chance that you will get symmetric changes in your data, then this statement probably holds true, but for me, the fact that the algorithm is short of water-tight is a deal-breaker. I would have preferred to have seen the specific scenario of symmetrc changes mentioned in the documentation as a short-coming of the CHECKSUM_AGG() function.

Adendum : Definition of Symmetric Change

A symmetric change is two changes that mirror each other, commonly used to swap information from one record to another and vice-versa. For example, I have two rows. The first is ID=1, and name "John Smith". The second, ID=2, and name="Jane Doe". If I run the following two updates to swap these records, I consider this to be a "symmetric" change.

UPDATE Table1 SET name='John Smith' WHERE ID=2;
UPDATE Table1 SET name='Jane Doe' WHERE ID=1;

Rate

4 (23)

You rated this post out of 5. Change rating

Share

Share

Rate

4 (23)

You rated this post out of 5. Change rating