I am going to describe quite forgotten feature in SQL Server 2008. It is Change Data Capture (CDC) – powerful feature to track changes in database (tables) without touching table’s schema. There are two features in SQL Server 2008 which support similar thing and look same but they are not – Change Data Capture (CDC) and Change Tracking. I will be focused mainly to CDC in this post.
But what is the difference between CDC and Change Tracking?
It seems to me that CDC is similar to same feature on Oracle – it is asynchronous tracking of changes to user tables by harvesting transaction log changes and translating them user readable relational format. Change tracking does similar thing but is synchronous and provides you less information about changes to tables. It provides only information that table was changed but doesn’t provide row values for changes which were made. CDC is treated in SQL Server documentation as more robust feature than Change Tracking. Maybe this is the reason why CDC is supported only in Enteprise and Datacenter edition while Change Tracking is supported in all versions. CDC is also mentioned together with ETL process where it has probably most useful usage.
This was very basic description of difference between CDC and Change Tracking. Let’s look closer to CDC now.
I borrow following schema from MSDN, please read this paragraph to end before looking to schema. As I mentioned above, CDC provides asynchronous tracking of changes by looking to transaction Log and translating recorded changes for Source tables. Changes are gathered by special SQL Server Agent Job (Capture process on following picture). These changes are inserted to Change tables which can be accessed by special user-defined functions created for that purpose – they are called Change data capture query functions on following picture. Schema depicts situation when ETL process consumes these functions but this is only sample usage.
Setup is quite straightforward (more than querying changes). Here are steps to bring it alive:
Query changes is not as straightforward as is for Change Tracking feature. It basically let you deal with LSN (Log Sequence Number) directly. LSN is unique identifier for each record in transaction log.
It is recommended to use only Change data capture functions for querying changes – not Change tables directly. These functions are created dynamically for each CDC-enabled table (created in setup – step 2.). Let’s Assume that we have table called “Customer” in “dbo” schema. There are than two basic function to retrieve changes for that table:
See this short sample how to setup up CDC and query changed data:
-- create Customer table in current DB CREATE TABLE Customer (ID INT IDENTITY (1,1) PRIMARY KEY, Name VARCHAR(50)) -- Enable CDC for current DB EXEC sys.sp_cdc_enable_db -- Enable CDC for Customer table EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'Customer', @role_name='test', @supports_net_changes = 1 -- insert some records INSERT INTO Customer (Name) VALUES ('Microsoft') INSERT INTO Customer (Name) VALUES ('Oracle') INSERT INTO Customer (Name) VALUES ('Apple') !!! -- now SQL Server agent job named cdc.<database_name>_capture must be executed -- get min LSN for table customer DECLARE @min_lsn BINARY(10) = sys.fn_cdc_get_min_lsn ('dbo_Customer') -- get max LSN for current DB DECLARE @max_lsn BINARY(10) = sys.fn_cdc_get_max_lsn () -- select all changes made to Customer table SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Customer(@min_lsn, @max_lsn, 'all update old');
Result is following:
Not nice result, isn’t it? Result contains all columns from Customer table (ID and Name) and some more. _$operation column contains indication of INSERT (2), DELETE (1) or UPDATE with previous values (3), UPDATE for new values (4). _$update_mask is bit mask containing information which column was updated (if applicable). Please see full description here.
As you see, you must handle LSN directly. To track changed data in increments (probably most common scenario) you would have to store last max LSN value you’ve processed changes for and start another run with this value.
There are couple of helper fuctions which may help you with LSN:
That was basics of Change Data Capture (CDC). There’s really much much much more about it – if you’re seriously considering using this feature, read full description of CDC in MSDN.