SQLServerCentral Article

Change Tracking and Database Refactoring

,

I'm going to talk about the new change tracking feature in SQL Server 2008 and how it can be used to refactor large tables with minimal downtime. I was recently tasked with converting multiple columns in several large tables to use Unicode data types. Ordinarily this could be done during a downtime but given the size of the tables the conversion would not have completed during this 8 hour window. There is also an initiative to begin thinking about how we could possibly achieve the goal of "zero downtime". Let's walk through the steps in how change tracking was used to accomplish the goals below.

  • Refactor several large tables to support Unicode data types.
  • Avoid using triggers and modifying the existing tables.
  • Minimize downtime.
  • Minimize impact of enabling change tracking and avoid enabling the snapshot isolation level (Books on Line recommends the usage of snapshot isolation in conjunction with change tracking).

All examples will be done within the Adventureworks2008 database. Don't have it? Grab a copy here.

Avoiding the use of the snapshot isolation level requires a method to keep track of the last_synchronization_version. This was done by creating the dba_change_tracking_synchronization table and by using the @CurrentVersion variable in the usp_change_tracking_synchronization proc.

CREATE TABLE [dbo].[dba_change_tracking_synchronization](
[CTS_ID] [int] IDENTITY(1,1) NOT NULL,
[Source_SchemaName] [varchar](50) NOT NULL,
[Source_TableName] [varchar](100) NOT NULL,
[Target_SchemaName] [varchar](50) NOT NULL,
[Target_TableName] [varchar](100) NOT NULL,
[last_sync_version] [int] NULL,
CONSTRAINT [PK_dba_change_tracking_synchronization] PRIMARY KEY CLUSTERED
(
[Source_SchemaName] ASC,
[Source_TableName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 85) ON [PRIMARY]
) ON [PRIMARY]

Next, I wanted a stored procedure that could be used to keep the two tables in synch for all updates, inserts and deletes. It had to be able to deal with multiple tables and with different schemas without needing to be modified. The procedure below uses the sys.objects, sys.columns, and sys.indexes to determine the columns and Primary Key (required with the use of change tracking) of the source and target tables.

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[usp_change_tracking_synchronization]') AND type in (N'P', N'PC'))
DROP PROCEDURE [dbo].[usp_change_tracking_synchronization]
GO CREATE PROCEDURE [dbo].[usp_change_tracking_synchronization] ( @SourceTable VARCHAR(100) ) AS
/*================================================================================================
Author: Luke Campbell
Title: usp_change_tracking_synchronization
Purpose: Localization project
Procedure is used to keep the source and target table synchronized with change tracking until the
source and target can be renamed.
Created: 03/28/2011
=================================================================================================*/ SET NOCOUNT ON --Begin updating target table from change tracking data
DECLARE @last_sync_version VARCHAR(10)
DECLARE @CurrentVersionVARCHAR(10)
DECLARE @TargetTable VARCHAR(100) /*==========================================
Verifying parameters
===========================================*/IF @SourceTable NOT LIKE '%.%'
BEGIN
RAISERROR('Schema must be specified for the @Sourcetable parameter.', 16, 1)
RETURN
END --Add brackets if they dont exist
IF @SourceTable NOT LIKE '[[]%'
SET @SourceTable = '[' + SUBSTRING(@SourceTable, 1, CHARINDEX('.', @SourceTable) - 1)
+ ']' + '.[' + SUBSTRING(@sourceTable, CHARINDEX('.', @SourceTable) + 1, LEN(@SourceTable)) + ']' --Verify the source table exists in the database
IF NOT EXISTS (SELECT name FROM sys.objects WHERE object_id = object_id(@SourceTable))
BEGIN
RAISERROR('Source table specified does not exist in the current database.', 16,1)
RETURN
END
--Verify the source table exist in the dba_change_tracking_synchronization table.
IF NOT EXISTS (SELECT Source_SchemaName, Source_TableName FROM dbo.dba_change_tracking_synchronization
WHERE QUOTENAME(Source_SchemaName) + '.' + QUOTENAME(Source_TableName) = @SourceTable )
BEGIN
RAISERROR('Source table has not been added to the dbo.dba_change_tracking_synchronization table.', 16, 1)
RETURN
END

--Verify that change tracking has been enabled for the source table.
IF NOT EXISTS (SELECT OBJECT_ID FROM sys.change_tracking_tables WHERE OBJECT_ID = OBJECT_ID(@SourceTable))
BEGIN
RAISERROR('Change tracking has not been enabled for the specified source table.', 16, 1)
RETURN
END
--Verify a target table has been specified in the dba_change_tracking_synchronization table for the specified source.
IF NOT EXISTS (SELECT Target_TableName FROM dbo.dba_change_tracking_synchronization
WHERE QUOTENAME(Source_SchemaName) + '.' + QUOTENAME(Source_TableName) = @SourceTable )
BEGIN
RAISERROR('A target table has not been specified in the dbo.dba_change_tracking_synchronization table for this source.', 16,1)
RETURN
END

--Set the target
SELECT @TargetTable =
QUOTENAME(Target_SchemaName) + '.' + QUOTENAME(Target_TableName)
FROM dba_change_tracking_synchronization WHERE QUOTENAME(Source_SchemaName) + '.' + QUOTENAME(Source_TableName) = @SourceTable --Verify the target table exists in the database.
IF NOT EXISTS (SELECT name FROM sys.objects WHERE object_id = object_id(@targetTable))
BEGIN
RAISERROR('Source table specified does not exist in the current database.', 16,1)
RETURN
END /*========================================================
Set the change tracking versions to use for this iteration
=========================================================*/--Get min change tracking version.
IF (SELECT
last_sync_version
FROM dba_change_tracking_synchronization
WHERE QUOTENAME(Source_SchemaName) + '.' + QUOTENAME(Source_TableName) = @SourceTable ) IS NULL
BEGIN
SELECT @last_sync_version = CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID(@SourceTable))
END
ELSE
SELECT
@last_sync_version = last_sync_version
FROM dba_change_tracking_synchronization
WHERE QUOTENAME(Source_SchemaName) + '.' + QUOTENAME(Source_TableName) = @SourceTable --ensure that the @last_sync_version isn't less than the min valid version. If so generate an error.
IF CHANGE_TRACKING_MIN_VALID_VERSION(object_id(@SourceTable)) > @last_sync_version
BEGIN
RAISERROR ('Change Tracking has cleaned up tracking information for table. To recover from this error, the target table must be reinitialized.', 16, 1)
RETURN
END --set the currentVersion
SELECT@CurrentVersion = CHANGE_TRACKING_CURRENT_VERSION() /*=======================================================================
Begin synchronizing the source and target. Dynamic sql is used
to generate the column list and to create the joins. Primary Keys are used in all joins.
========================================================================*/
--begin synchronizing all data between the last_synch_version and currentVersion.
--Querybuilder
DECLARE @sql VARCHAR(MAX)
DECLARE @crlf NCHAR(2) = NCHAR(13)+NCHAR(10) --formatting /*==========================
INSERTS
===========================*/BEGIN --initialize the @sql variable
SET @sql = NULL
PRINT 'Processing inserts'

SELECT @sql = 'INSERT INTO ' + @TargetTable +'( ' + @crlf
SELECT @sql = @sql + t.name + ', ' + @crlf FROM sys.columns t WHERE t.object_id = object_id(@targetTable)

--remove the trailing comma in the string and add closing parentheses
SELECT @sql = LEFT(@sql, LEN(@sql) - CHARINDEX(',', REVERSE(@sql))) + ')' + @crlf



SELECT @sql = @sql + 'SELECT '

SELECT @sql = @sql + 'source.' + t.name + ', ' + @crlf FROM sys.columns t WHERE t.object_id = object_id(@targetTable)

--remove the trailing comma in the string
SELECT @sql = LEFT(@sql, LEN(@sql) - CHARINDEX(',', REVERSE(@sql))) + @crlf

SELECT @sql = @sql + 'FROM ' + @SourceTable + ' source' + @crlf

--join to the change tracking table and get inserts
SELECT @sql = @sql + ' INNER JOIN CHANGETABLE(CHANGES ' + @SourceTable + ', ' + @last_sync_version + ') ct' + @crlf + 'ON '

--get PK columns

SELECT @sql = @sql + 'source.' + COL_NAME(ic.OBJECT_ID,ic.column_id) + ' = ct.' + COL_NAME(ic.OBJECT_ID,ic.column_id) + ' AND ' + @crlf
FROM sys.indexes AS i
INNER JOIN sys.index_columns AS ic
ON i.OBJECT_ID = ic.OBJECT_ID
AND i.index_id = ic.index_id
WHERE i.is_primary_key = 1 AND ic.object_id = object_id(@SourceTable)

--remove the trailing "AND" in the string
SELECT @sql = SUBSTRING(@sql, 1, CHARINDEX('AND', RIGHT(@sql, LEN(@sql))) - 1) + @crlf

--add filters
SELECT @sql = @sql + 'WHERE ct.SYS_CHANGE_OPERATION = ''I'' AND ct.SYS_CHANGE_VERSION <=' + @CurrentVersion + @crlf
EXEC(@sql)
--print @sql

END /*==========================
UPDATES
===========================*/

BEGIN
PRINT 'Processing updates'

SELECT @sql = 'UPDATE target ' + @crlf + 'SET '
SELECT @sql = @sql + 'target.' + t.name + ' = source.' + t.name + ', ' + @crlf FROM sys.columns t WHERE t.object_id = object_id(@targetTable)

--remove the trailing comma in the string
SELECT @sql = LEFT(@sql, LEN(@sql) - CHARINDEX(',', REVERSE(@sql))) + @crlf

SELECT @sql = @sql + 'FROM ' + @targetTable + ' target' + @crlf

--join to the change tracking table and get updates
SELECT @sql = @sql + ' INNER JOIN CHANGETABLE(CHANGES ' + @SourceTable + ', ' + @last_sync_version + ') ct' + @crlf + 'ON '

--get PK columns

SELECT @sql = @sql + 'target.' + COL_NAME(ic.OBJECT_ID,ic.column_id) + ' = ct.' + COL_NAME(ic.OBJECT_ID,ic.column_id) + ' AND ' + @crlf
FROM sys.indexes AS i
INNER JOIN sys.index_columns AS ic
ON i.OBJECT_ID = ic.OBJECT_ID
AND i.index_id = ic.index_id
WHERE i.is_primary_key = 1 AND ic.object_id = object_id(@SourceTable)

--remove the trailing "AND" in the string
SELECT @sql = SUBSTRING(@sql, 1, CHARINDEX('AND', RIGHT(@sql, LEN(@sql))) - 1) + @crlf


--join to our source table on the PK

SELECT @sql = @sql + ' INNER JOIN ' + @SourceTable + ' source ON '


SELECT @sql = @sql + 'target.' + COL_NAME(ic.OBJECT_ID,ic.column_id) + ' = source.' + COL_NAME(ic.OBJECT_ID,ic.column_id) + ' AND ' + @crlf
FROM sys.indexes AS i
INNER JOIN sys.index_columns AS ic
ON i.OBJECT_ID = ic.OBJECT_ID
AND i.index_id = ic.index_id
WHERE i.is_primary_key = 1 AND ic.object_id = object_id(@SourceTable)

--remove the trailing "AND" in the string
SELECT @sql = SUBSTRING(@sql, 1, CHARINDEX('AND', RIGHT(@sql, LEN(@sql))) - 1) + @crlf

--add filters
SELECT @sql = @sql + 'WHERE ct.SYS_CHANGE_OPERATION = ''U'' AND ct.SYS_CHANGE_VERSION <=' + @CurrentVersion + @crlf EXEC(@sql)
--print @sql
END /*==========================
DELETES
===========================*/
--initialize the @sql variable
SET @sql = NULL
PRINT 'Processing deletes' BEGIN
SELECT @sql = 'DELETE target FROM ' + @TargetTable + ' target ' + @crlf

--join to the change tracking table and get deletes
SELECT @sql = @sql + ' INNER JOIN CHANGETABLE(CHANGES ' + @SourceTable + ', ' + @last_sync_version + ') ct' + @crlf + 'ON '

--get PK columns

SELECT @sql = @sql + 'target.' + COL_NAME(ic.OBJECT_ID,ic.column_id) + ' = ct.' + COL_NAME(ic.OBJECT_ID,ic.column_id) + ' AND ' + @crlf
FROM sys.indexes AS i
INNER JOIN sys.index_columns AS ic
ON i.OBJECT_ID = ic.OBJECT_ID
AND i.index_id = ic.index_id
WHERE i.is_primary_key = 1 AND ic.object_id = object_id(@TargetTable)

--remove the trailing "AND" in the string
SELECT @sql = SUBSTRING(@sql, 1, CHARINDEX('AND', RIGHT(@sql, LEN(@sql))) - 1) + @crlf

--add filters
SELECT @sql = @sql + 'WHERE ct.SYS_CHANGE_OPERATION = ''D'' AND ct.SYS_CHANGE_VERSION <=' + @CurrentVersion + @crlf
EXEC(@sql)
--print @sql

END


--Insert the @CurrentVersion value into the dba_change_tracking_synchronization table.
--get current version.
UPDATE dbo.dba_change_tracking_synchronization
SET last_sync_version = @CurrentVersion
WHERE QUOTENAME(Source_SchemaName) + '.' + QUOTENAME(Source_TableName) = @SourceTable GO

Once in place we were ready to create the new table using the new data types. Constraints were not added when the table was created since these were being checked on the existing table.

CREATE TABLE [Person].[Person_nonunicode](
[BusinessEntityID] [int] NOT NULL,
[PersonType] [char](2) NOT NULL,
[NameStyle] [dbo].[NameStyle] NOT NULL,
[Title] [varchar](8) NULL,
[FirstName] [dbo].[Name] NOT NULL,
[MiddleName] [dbo].[Name] NULL,
[LastName] [dbo].[Name] NOT NULL,
[Suffix] [varchar](10) NULL,
[EmailPromotion] [int] NOT NULL,
[AdditionalContactInfo] [xml](CONTENT [Person].[AdditionalContactInfoSchemaCollection]) NULL,
[Demographics] [xml](CONTENT [Person].[IndividualSurveySchemaCollection]) NULL,
[rowguid] [uniqueidentifier] ROWGUIDCOL NOT NULL,
[ModifiedDate] [datetime] NOT NULL,
CONSTRAINT [PK_Person_nonunicode_BusinessEntityID] PRIMARY KEY CLUSTERED
(
[BusinessEntityID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]

Now we need to populate the dba_change_tracking_synchronization table with the source and target.

INSERT dba_change_tracking_synchronization
(
Source_SchemaName
, Source_TableName
, Target_SchemaName
, Target_TableName
)
VALUES (
'Person'
,'Person'
,'Person'
,'Person_nonunicode'
)

Change tracking must first be enabled at the database level and then for each table. You can review the available options that can be used and there definition here.

ALTER DATABASE AdventureWorks2008
SET
CHANGE_TRACKING = ON
(
CHANGE_RETENTION = 7 DAYS,
AUTO_CLEANUP = ON
) --enable change tracking on the source table ALTER TABLE Person.Person ENABLE CHANGE_TRACKING

In the real world when you're dealing with tables that contain millions of rows it's not feasible to load the target table all at once due to locks placed on the table. We took an incremental approach using the create_datetime columns that existed on the tables we were refactoring. We added a column called max_load_create_datetime to the dba_change_tracking_synchronization table and selected the max create_datetime from the source table at the same time we enabled change_tracking. Using this method we knew where to stop with the initial loading of the table and began using the usp_change_tracking_synchronization proc. Here we'll load the new Person.Person_nonunicode table all at once.

INSERT person.person_nonunicode
(
BusinessEntityID
, PersonType
, NameStyle
, Title
, FirstName
, MiddleName
, LastName
, Suffix
, EmailPromotion
, AdditionalContactInfo
, Demographics
, rowguid
, ModifiedDate
) SELECT
BusinessEntityID
, PersonType
, NameStyle
, Title
, FirstName
, MiddleName
, LastName
, Suffix
, EmailPromotion
, AdditionalContactInfo
, Demographics
, rowguid
, ModifiedDate
FROM Person.Person

For our example we'll create a few deletes, inserts, and updates to the existing table and ensure our new table gets synchronized.

--due to the FK referencing person.BusinessEntity we need a new BusinessEntityID
DECLARE @tmpTable TABLE (BusinessEntitiyID INT)
DECLARE @BusinessEntityID int INSERT Person.BusinessEntity (ModifiedDate)
OUTPUT INSERTED.BusinessEntityID INTO @TmpTable
VALUES (GETDATE()) SELECT @BusinessEntityID = BusinessEntitiyID FROM @tmpTable -- do two inserts we'll delete one for the example
INSERT INTO Person.Person
(
BusinessEntityID
, PersonType
, NameStyle
, Title
, FirstName
, MiddleName
, LastName
, Suffix
, EmailPromotion
, AdditionalContactInfo
, Demographics
, ModifiedDate
)
VALUES (
@BusinessEntityID
, 'EM'
, 0
, 'ChangeTr'
, 'John'
, 'A'
, 'Smith'
, 'Jr.'
, 1
, NULL
, '<IndividualSurvey xmlns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey">
<TotalPurchaseYTD>4.99</TotalPurchaseYTD>
<DateFirstPurchase>2004-06-12Z</DateFirstPurchase>
<BirthDate>1946-03-03Z</BirthDate>
<MaritalStatus>S</MaritalStatus>
<YearlyIncome>50001-75000</YearlyIncome>
<Gender>F</Gender>
<TotalChildren>4</TotalChildren>
<NumberChildrenAtHome>0</NumberChildrenAtHome>
<Education>Bachelors </Education>
<Occupation>Management</Occupation>
<HomeOwnerFlag>1</HomeOwnerFlag>
<NumberCarsOwned>2</NumberCarsOwned>
<CommuteDistance>2-5 Miles</CommuteDistance>
</IndividualSurvey>'
, GETDATE()
) DELETE FROM @tmpTable --clean up the tmptable. INSERT Person.BusinessEntity (ModifiedDate)
OUTPUT INSERTED.BusinessEntityID INTO @TmpTable
VALUES (GETDATE()) SELECT @BusinessEntityID = BusinessEntitiyID FROM @tmpTable INSERT INTO Person.Person
(
BusinessEntityID
, PersonType
, NameStyle
, Title
, FirstName
, MiddleName
, LastName
, Suffix
, EmailPromotion
, AdditionalContactInfo
, Demographics
, ModifiedDate
)
VALUES (
@BusinessEntityID
, 'EM'
, 0
, 'ChangeTr'
, 'John'
, 'A'
, 'Smith'
, 'Sr.'
, 1
, NULL
, '<IndividualSurvey xmlns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey">
<TotalPurchaseYTD>4.99</TotalPurchaseYTD>
<DateFirstPurchase>2004-06-12Z</DateFirstPurchase>
<BirthDate>1902-03-03Z</BirthDate>
<MaritalStatus>S</MaritalStatus>
<YearlyIncome>50001-75000</YearlyIncome>
<Gender>F</Gender>
<TotalChildren>4</TotalChildren>
<NumberChildrenAtHome>0</NumberChildrenAtHome>
<Education>Bachelors </Education>
<Occupation>Management</Occupation>
<HomeOwnerFlag>1</HomeOwnerFlag>
<NumberCarsOwned>2</NumberCarsOwned>
<CommuteDistance>2-5 Miles</CommuteDistance>
</IndividualSurvey>'
, GETDATE()
) --update UPDATE person.person
SET ModifiedDate = GETDATE()
WHERE BusinessEntityID >=1 and BusinessEntityID <= 10 --delete a row
DELETE FROM person.Person WHERE BusinessEntityID = @BusinessEntityID

Run the usp_change_tracking_synchronization proc.

EXEC usp_change_tracking_synchronization @sourceTable = 'person.person'

Once complete we can review the target table and ensure the data matches. This can easily be done using RedGate's Data Compare tool or other methods. We had this procedure scheduled to run every 15 minutes over a 2 week period without seeing a performance impact. The final steps to this process were to

  1. Drop all Foreign Keys that reference Person.Person.
  2. Rename Person.Person to Person.Person_old and Person.Person_nonunicode to Person.Person.
  3. Add all constraints to the new table and all referencing Foreign Keys. These were added with the "no check" option and then enabled. This avoided a lengthy recheck of all keys.
  4. Update the dba_change_tracking_synchronization table with the new name of the old table and flip the source and target tables. This allows for one final synchronization catching any rows that may have been modified prior to the rename.
  5. Execute usp_change_tracking_synchronization specifying "person.person_old" as the source table.
  6. After this final synch change tracking can be disabled. The old table can also be dropped.

 

I'd be interested to know how others are using change tracking and / or approaching code deployments while minimizing downtime. Thanks!

 

Rate

4.33 (9)

You rated this post out of 5. Change rating

Share

Share

Rate

4.33 (9)

You rated this post out of 5. Change rating