Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Chris's SQL Blog

SQL Server Consultant and Exceptional DBA Awards 2011 Finalist with extensive experience in the financial, health and retail sectors. I’m extremely strong in all areas of SQL Server and specialise in Security but really get my kicks from System Optimisation and Performance Tuning. Being a DBA is a thankless task at times but sadly I do enjoy it. Any spare time I have (when not playing rugby, football, cricket, in the gym or spending time with the family) I’m trying to help out others via social network sites such as the #sqlhelp hastag on Twitter (follow me @billysassage),www.sqlservercentral.com (ChrisTaylor) or by attending and contributing to the UK SQL User Groups (www.sqlserverfaq.com) held in Manchester and Leeds.

SSIS SCD vs MERGE Statement – Performance Comparison

I wouldn’t class myself as an expert in SSIS but I certainly know my way around but came across something today which I thought I’d share. As with a lot of things there are “many ways to skin a cat”, none of which is something I’ll go into at the moment but what i will concentrate on is updating columns in a table where the data has changed in the source.

One of the projects I’m currently working on requires this very process and when i set about doing so I created the T-SQL Merge statement to do the business. However, the question was raised as to why I didn’t use SSIS’s built in component Slowly Changing Dimension (SCD)? I didn’t really have an answer other than personal preference but decided to delve into it a bit further and compare the performance of each method.

As a test, I created a source table with a Key and Name column:

USE TempDB;

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'dbo.iSource') AND type in (N'U'))
	DROP TABLE dbo.iSource;

CREATE TABLE dbo.iSource
(
   ID INT,
   Name varchar(100)
);

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'dbo.iTarget') AND type in (N'U'))
	DROP TABLE dbo.iTarget;
	
CREATE TABLE dbo.iTarget
(
   ID INT,
   Name varchar(100)
);

and populated it with some dummy data:

INSERT INTO dbo.iSource (ID,Name)
SELECT TOP 10000
ROW_NUMBER() OVER (ORDER BY t.object_id) AS rownumber
,'Name_'+convert(varchar(4),ROW_NUMBER() OVER (ORDER BY t.object_id))
FROM sys.tables t
CROSS JOIN sys.stats s;

INSERT INTO dbo.iTarget (ID,Name)
SELECT TOP 10000 
ROW_NUMBER() OVER (ORDER BY t.object_id DESC) AS rownumber --Done in descending order
,'Name_'+convert(varchar(4),ROW_NUMBER() OVER (ORDER BY t.object_id))
FROM sys.tables t
CROSS JOIN sys.stats s;

SELECT ID, Name FROM iSource;
SELECT ID, Name FROM iTarget;

So we now have a source and target table with different Names and we’ll look to update the iTarget table with the information coming from iSource.

Method 1 – MERGE Statement

MERGE dbo.iTarget AS target
	USING (
	SELECT ID, Name
	FROM dbo.iSource
	 ) AS  source (ID, Name)
		ON (target.ID = source.ID)
		WHEN MATCHED AND target.Name <> source.Name 
		THEN 
			UPDATE SET Name = source.Name
	 WHEN NOT MATCHED THEN 
		 INSERT (ID, Name)
		 VALUES (source.ID, source.Name); 

Using this method simply in SSMS for simplicity, profiler output 2 rows for Batch Starting and Batch Completing, CPUTime of 125ms and Duration of 125ms and it updated 6678 records. Top stuff, as expected.

Method 2 – SSIS SCD Component
I rebuilt the tables to put them back to where we started and set about creating the same thing in SCD setting ID as the business key and Name as the changing attribute and not setting inferred members, below is a screen dump of the outcome of this:

BEFORE:

I clear down the profiler and run the ssis package and the outcome is quite astounding.

DURING/AFTER:

The profiler output 13456 rows including 6678 rows of queries like this:

exec sp_executesql N'SELECT [ID], [Name] FROM [dbo].[iTarget] WHERE ([ID]=@P1)',N'@P1 int',8

as well as 6678 rows of queries similar to this:

exec sp_execute 1,'Name_3304',3304

Total Duration of 37 seconds (yes that’s seconds not ms!!)…….and this is on a table of only ~7k rows!

Well I’ll be damned, the SCD basically runs a cursor looping each record checking for a match on ID and updating that record if so. I can’t actually believe that MS have built a component which performs in this way.

So, to answer the question asked ” why I didn’t use SSIS’s built in component Slowly Changing Dimension (SCD)?”, I now have a definitive answer, it doesn’t perform!

I’m sure SCD has its place but for me, the requirements and the datasets I’m working on I think I’ll stick with MERGE for now….. :)

NOTE: This was done on SQL Server 2008R2 Developer Edition running on Windows 7 Ultimate, not sure if SQL Server 2012 has improved the SCD performance but I’ll leave that for another day.


Comments

Leave a comment on the original post [chrisjarrintaylor.co.uk, opens in a new window]

Loading comments...