Slowly changing dimensions using T-SQL MERGE

Since the introduction of data warehousing, handling the way that dimension attributes can change over time has become a subject of much discussion. During this time, many technical solutions to the problem of how to handle this changing data have evolved. SQL Server 2008 has added an elegant tool to the array of answers to this challenge, with the T-SQL MERGE statement, which can handle most - if not all - of the types of Slowly Changing Dimension with elegance and minimal complexity.
So the purpose of this article is threefold:

To give a brief overview of the four main types of SCD (Slowly Changing Dimension).
To remind you of the main ways to use the T-SQL MERGE statement.
To look specifically at the OUTPUT clause, and to show how judicious use of this particular function will allow you to maintain SCDs types 2 and 4 with a relatively few lines of code.

I realise that SCD types are, in parts, an unfinished subject, and that not everyone agrees on what they are, what they should be or when to use them. I have no axe to grind on this subject, and am merely attempting to show how to handle fairly standard definitions of these Data Warehousing techniques easily using T-SQL.

Having said that, the use of MERGE here is not restricted to Data Warehousing dimensions. The techniques described below can be used in any circumstances where you need to track data changes from a source table to a second (or even a third) table. For instance, when you need to:

apply UPSERTS to a data table (SCD Type 1);
maintain historical records in a single table (SCD Type 2);
track data changes using a denormalised table (SCD Type 3);
store historical data in a second, "history" table (SCD Type 4);

then you can use the techniques described in this article, irrespective of whether you are in a Data Warehousing environment or not. One thing to note is that I will not be looking at DELETEs using MERGE, despite the fact that records in the destination table can be deleted if they are no longer present in the source table. This is because it is extremely rare to delete dimension data, as it is nearly always needed to ensure the coherence of the data warehouse.

For all of the four types that I will look at, the source data which must be loaded into a dimension table has the same format. It is deliberately simple, to allow me to focus on the process, rather than the minutiae of dimensional modelling. So I am going to presume that we have a "Client" table, which has to be loaded into a relational table "Dim_Client" in the same database. I realise that, in reality, you will probably be loading data from multiple source tables and applying complex data transformation to your source data - and in all probability loading it into a staging database. However, as the real world is only an extension of the techniques that I will look at here, I will keep the model simple, and let you adapt it to your specific circumstances. I am presuming that this "Client" dimension will then be loaded into SSAS - but that is outside the scope of this article.

The source table is:

CREATE TABLE [dbo].[Client](
 [ID] [int] IDENTITY(1,1) NOT NULL,
 [ClientName] [varchar](150) NULL,
 [Country] [varchar](50) NULL,
 [Town] [varchar](50) NULL,
 [County] [varchar](50) NULL,
 [Address1] [varchar](50) NULL,
 [Address2] [varchar](50) NULL,
 [ClientType] [varchar](20) NULL,
 [ClientSize] [varchar](10) NULL,
 CONSTRAINT [PK_Client] PRIMARY KEY CLUSTERED 
(
 [ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

And here is some data to populate it:

SET IDENTITY_INSERT [dbo].[Client] ON
INSERT [dbo].[Client] ([ID], [ClientName], [Country], [Town], [County], [Address1], [Address2], [ClientType], [ClientSize]) VALUES (1, N'John Smith', N'UK', N'Uttoxeter', N'Staffs', N'4, Grove Drive', NULL, N'Private', N'M')
INSERT [dbo].[Client] ([ID], [ClientName], [Country], [Town], [County], [Address1], [Address2], [ClientType], [ClientSize]) VALUES (2, N'Bauhaus Motors', N'UK', N'Oxford', N'Oxon', N'Suite 27', N'12-14 Turl Street', N'Business', N'S')
INSERT [dbo].[Client] ([ID], [ClientName], [Country], [Town], [County], [Address1], [Address2], [ClientType], [ClientSize]) VALUES (7, N'Honest Fred', N'UK', N'Stoke', N'Staffs', NULL, NULL, N'Business', N'S')
INSERT [dbo].[Client] ([ID], [ClientName], [Country], [Town], [County], [Address1], [Address2], [ClientType], [ClientSize]) VALUES (8, N'Fast Eddie', N'Wales', N'Cardiff', NULL, NULL, NULL, N'Business', N'L')
INSERT [dbo].[Client] ([ID], [ClientName], [Country], [Town], [County], [Address1], [Address2], [ClientType], [ClientSize]) VALUES (9, N'Slow Sid', N'France', N'Avignon', N'Vaucluse', N'2, Rue des Courtisans', NULL, N'Private', N'M')
SET IDENTITY_INSERT [dbo].[Client] OFF

With this table in place, we can proceed to use the MERGE statement to handle four types of slowly changing dimension.

In the examples below am sticking to Data Warehousing orthodoxy (well, at least according to one school of thought...) and ensuring that the key from the source data remains as a "business key", and the dimension table has its own surrogate key. This is not only more conventional, it also makes processing the data into SSAS easier, in my experience.

Type 1 SCD - Update changed attributes and disregard older attributes

A type 1 slowly changing dimension is by far the easiest to handle, as it consists of a simple in-place update of existing data, with no attempt to track the evolution of the changes. There are many voices which affirm that this is not really a "changing dimension" at all, but whatever the precise definitions, I will go with the flow and use it as a starting point for basic inserts and updates (or "Upserts" to use the conventional term).

First, we need a "destination" table, and here it is:

CREATE TABLE [dbo].[Client_SCD1](
 [ClientID] [int] IDENTITY(1,1) NOT NULL,
 [BusinessKey] [int] NOT NULL,
 [ClientName] [varchar](150) NULL,
 [Country] [varchar](50) NULL,
 [Town] [varchar](50) NULL,
 [County] [varchar](50) NULL,
 [Address1] [varchar](50) NULL,
 [Address2] [varchar](50) NULL,
 [ClientType] [varchar](20) NULL,
 [ClientSize] [varchar](10) NULL
)

Inserting and updating data is as simple as the following piece of T-SQL:

MERGE dbo.Client_SCD1 AS DST
USING CarSales.dbo.Client AS SRC
ON (SRC.ID = DST.BusinessKey)
WHEN NOT MATCHED THEN
INSERT (BusinessKey, ClientName, Country, Town, County, Address1, Address2, ClientType, ClientSize)
VALUES (SRC.ID, SRC.ClientName, SRC.Country, SRC.Town, SRC.County, Address1, Address2, ClientType, ClientSize)
WHEN MATCHED 
AND (
 ISNULL(DST.ClientName,'') <> ISNULL(SRC.ClientName,'') 
 OR ISNULL(DST.Country,'') <> ISNULL(SRC.Country,'') 
 OR ISNULL(DST.Town,'') <> ISNULL(SRC.Town,'')
 OR ISNULL(DST.Address1,'') <> ISNULL(SRC.Address1,'')
 OR ISNULL(DST.Address2,'') <> ISNULL(SRC.Address2,'')
 OR ISNULL(DST.ClientType,'') <> ISNULL(SRC.ClientType,'')
 OR ISNULL(DST.ClientSize,'') <> ISNULL(SRC.ClientSize,'')
 )
THEN UPDATE 
SET 
 DST.ClientName = SRC.ClientName 
 ,DST.Country = SRC.Country 
 ,DST.Town = SRC.Town
 ,DST.Address1 = SRC.Address1
 ,DST.Address2 = SRC.Address2
 ,DST.ClientType = SRC.ClientType
 ,DST.ClientSize = SRC.ClientSize
;

It is important that you ensure that all the columns used to compare data in the "WHEN MATCHED ... AND" clause are also present in the "THEN UPDATE SET" clause. Otherwise data will get painfully out of 'sync.

If you are used to the MERGE statement, then this is probably second nature. If you are not, then just let me take the opportunity of explaining how the MERGE statement works, here. After all, one of the objectives of this article is to get up to speed on the MERGE statement.

First, you have a source table (SRC), and a destination table (DST). The MERGE statement MERGEs into the destination, USING the source to get its data. As the two tables need a mapping join, there is a field - or fields - using which they can link. This is provided by the ON clause. All the other columns are the attributes of the dimension which we are monitoring for eventual changes. The MERGE statement than carries out a series of operations. Here only two are used:

WHEN NOT MATCHED
WHEN MATCHED

The former simply says "if there is no corresponding record in the destination table, then" - and it INSERTs the new data.

The second looks for data that maps on the "join" field and detects any differences in the other columns. If there are any, the fields are updated. To test this, all you need to do is to change a non-key value in the "client" table. The corresponding field in the "Client_SCD1" table should be updated with the new value once you run the code. Do not be deceived by the SSMS message which will say "0 row(s) affected" - this is normal for the MERGE statement.

If we gloss over the other major aspects of the MERGE statement for the moment, then it really can be this simple to keep a dimension table up to date with its source data.

Type 2 SCD - Keep a history of previous attributes as separate records, indicate the dates they were valid, and flag the currently valid record

The next - and to many, the "classic" - slowly changing dimension is the type 2. This approach will add a new record to the dimension table every time that the source data changes. To ensure that the multiple records can be used appropriately, the surrogate key will change every time new data is added, while the business key will remain the same. As is usual for SCD Type 2, three metadata (or "tracking") fields will be added:

ValidFrom - to indicate the date that the record can be used,
ValidTo - the date that it was ceased being valid.
IsCurrent - to flag which of the dimension records is the current one.

An SCD type 2 table, based on the same source data as that used in the previous example, is:

CREATE TABLE [dbo].[Client_SCD2](
 [ClientID] [int] IDENTITY(1,1) NOT NULL,
 [BusinessKey] [int] NOT NULL,
 [ClientName] [varchar](150) NULL,
 [Country] [varchar](50) NULL,
 [Town] [varchar](50) NULL,
 [County] [varchar](50) NULL,
 [Address1] [varchar](50) NULL,
 [Address2] [varchar](50) NULL,
 [ClientType] [varchar](20) NULL,
 [ClientSize] [varchar](10) NULL,
 ValidFrom INT NULL,
 ValidTo INT NULL,
 IsCurrent BIT NULL
) ON [PRIMARY]

As you can see, there are three extra fields - ValidFrom, ValidTo and IsCurrent. Again, following data warehousing orthodoxy, I am going to use the INT data type for the validity dates, as this data type seems more efficient when building and processing data warehouses with SSAS.

With the type 2 dimension table created, the SQL snippet given below will map the two tables on the business key and carry out the three following operations:

Insert a new record into the destination table if the record referenced by the business key if the key is not already present (WHEN NOT MATCHED), as well as adding an auto-incremented surrogate key and setting today's date as the ValidFrom date.
Insert a new record into the destination table if any of the attribute fields are not identical between the source and destination tables (WHEN MATCHED AND ...), as well as adding an auto-incremented surrogate key and setting today's date as the ValidFrom date and flagging this record as the current record.
Updating the previous valid record for this business key. This involves setting the current record flag to False and setting yesterday's date as the ValidTo date

-- Define the dates used in validity - assume whole 24 hour cycles
DECLARE @Yesterday INT = (YEAR(DATEADD(dd,-1,GETDATE())) * 10000) + (MONTH(DATEADD(dd,-1,GETDATE())) * 100) + DAY(DATEADD(dd,-1,GETDATE()))
DECLARE @Today INT = (YEAR(GETDATE()) * 10000) + (MONTH(GETDATE()) * 100) + DAY(GETDATE())
-- Outer insert - the updated records are added to the SCD2 table
INSERT INTO dbo.Client_SCD2 (BusinessKey, ClientName, Country, Town, Address1, Address2, ClientType, ClientSize, ValidFrom, IsCurrent)
SELECT ID, ClientName, Country, Town, Address1, Address2, ClientType, ClientSize, @Today, 1
FROM
(
-- Merge statement
MERGE INTO dbo.Client_SCD2 AS DST
USING CarSales.dbo.Client AS SRC
ON (SRC.ID = DST.BusinessKey)
-- New records inserted
WHEN NOT MATCHED THEN 
INSERT (BusinessKey, ClientName, Country, Town, County, Address1, Address2, ClientType, ClientSize, ValidFrom, IsCurrent)
VALUES (SRC.ID, SRC.ClientName, SRC.Country, SRC.Town, SRC.County, Address1, Address2, ClientType, ClientSize, @Today, 1)
-- Existing records updated if data changes
WHEN MATCHED 
AND IsCurrent = 1
AND (
 ISNULL(DST.ClientName,'') <> ISNULL(SRC.ClientName,'') 
 OR ISNULL(DST.Country,'') <> ISNULL(SRC.Country,'') 
 OR ISNULL(DST.Town,'') <> ISNULL(SRC.Town,'')
 OR ISNULL(DST.Address1,'') <> ISNULL(SRC.Address1,'')
 OR ISNULL(DST.Address2,'') <> ISNULL(SRC.Address2,'')
 OR ISNULL(DST.ClientType,'') <> ISNULL(SRC.ClientType,'')
 OR ISNULL(DST.ClientSize,'') <> ISNULL(SRC.ClientSize,'')
 )
-- Update statement for a changed dimension record, to flag as no longer active
THEN UPDATE 
SET DST.IsCurrent = 0, DST.ValidTo = @Yesterday
OUTPUT SRC.ID, SRC.ClientName, SRC.Country, SRC.Town, SRC.Address1, SRC.Address2, SRC.ClientType, SRC.ClientSize, $Action AS MergeAction
) AS MRG
WHERE MRG.MergeAction = 'UPDATE'
;

Let me explain what is happening here.

As we will need to add - or update - validity dates, these are set as variables at the start of the process.
New records are INSERTed, just like before.
Any changed records (providing that they are the current valid record for a client) are UPDATEd, so that their IsCurrent and ValidTo fields are set to indicate that the record is now no longer current, and is historical.
Where data was updated, it is now INSERTed into the dimension table based on the data in the OUTPUT table. This is detected by using the MERGE statement as an inline query which returns the data defined as the OUTPUT - and filtered on the data which was UPDATEd - which is isolated using the $Action field which is part of the OUTPUT clause.

Any perceived complexity is probably just that - a matter of perception, probably due to the fact that the code cannot be read in a linear fashion from top to bottom. If it makes things easier, then start by looking at the MERGE statement, then proceed on to the OUTPUT clause and finally to the outer INSERT .. INTO statement. This will hopefully help you to understand the process logic which is involved.

The important thing to note is that the MERGE statement can track all the data which is changed, and the type of operation which caused the change. This includes the "before" and "after" state of each item of data. For the moment, the source data (isolated using the SRC alias) was all we needed for a type 2 SCD. You may not need to check for data differences on all columns. If certain columns contain data which is not considered an essential attribute, then do not use it in the "WHEN MATCHED ... AND" clause.

To test this, all you need to do is to change a non-key value in the "client" table. A new record in the "Client_SCD2" table should be added with the new value - and a new surrogate ID and VAlidFrom date once you run the code. The previous record should be updated and flagged as no longer valid.

Rather than use the OUTPUT clause to return data from the MERGE command to a session-scoped temporary table, you can insert the data into a table variable, as part of the MERGE statement. This is shown below in the type 4 SCD. In the context of a type 2 SCD, it is easier to avoid a table variable as it would mean data duplication. As it is, the data from the MERGE statement can be filtered and poured directly into the destination table without transiting via a table variable. Also, I find that temporary tables can be more efficient to use for larger data sets as they use statistics and can be indexed. Consequently I prefer to use them in ETL processes, unless I can be reasonably sure that there will only be a few hundred records output at most.

Were this a real world process, you should also wrap the code block in a transaction, to ensure that the entire process can be rolled back in the case of failure, and that there is no risk of current and historical records getting out of 'sync.

Type 3 SCD - Keep a history of previous attributes in other columns along with validity date

A type 3 SCD will, like a type 2 SCD, store current and historical data in the same table. A type 3 is, put simply, a denormalised take on data tracking, and will need a duplicate column for every data column for which you want to store a previous piece of data. Then you will need to decide how many past pieces of data you wish to store - and will have to create as many columns as there will be previous data. Oh, and you will need a "ValidTo" date for each historical column. This can rapidly make for very wide and unwieldy tables. So here I will only show how to store two previous elements of data, and only for one data field. The dimension table will look like this:

CREATE TABLE [dbo].[Client_SCD3](
 [ClientID] [int] IDENTITY(1,1) NOT NULL,
 [BusinessKey] [int] NOT NULL,
 [ClientName] [varchar](150) NULL,
 [Country] [varchar](50) NULL,
 [Country_Prev1] [varchar](50) NULL,
 [Country_Prev1_ValidTo] [char] (8) NULL,
 [Country_Prev2] [varchar](50) NULL,
 [Country_Prev2_ValidTo] [char] (8) NULL,
)

The SQL snippet given below will map the two tables on the business key and

Insert a new record if one does not exist for the business key.

If the business key exists:

Move the previous value for the Country to the "Previous - 2", and add the date at which the move occurred.
Move the value for the Country to the "Previous - 1", and add the date at which the move occurred.
Add the latest Country to the "Country" column.

DECLARE @Yesterday VARCHAR(8) = CAST(YEAR(DATEADD(dd,-1,GETDATE())) AS CHAR(4)) + RIGHT('0' + CAST(MONTH(DATEADD(dd,-1,GETDATE())) AS VARCHAR(2)),2) + RIGHT('0' + CAST(DAY(DATEADD(dd,-1,GETDATE())) AS VARCHAR(2)),2)
MERGE dbo.Client_SCD3 AS DST
USING CarSales.dbo.Client AS SRC
ON (SRC.ID = DST.BusinessKey)
WHEN NOT MATCHED THEN
INSERT (BusinessKey, ClientName, Country)
VALUES (SRC.ID, SRC.ClientName, SRC.Country)
WHEN MATCHED 
AND (DST.Country <> SRC.Country
 OR DST.ClientName <> SRC.ClientName)
THEN UPDATE 
SET DST.Country = SRC.Country
 ,DST.ClientName = SRC.ClientName
 ,DST.Country_Prev1 = DST.Country
 ,DST.Country_Prev1_ValidTo = @Yesterday
 ,DST.Country_Prev2 = DST.Country_Prev1
 ,DST.Country_Prev2_ValidTo = DST.Country_Prev1_ValidTo
;

So how does this work? Extremely simply is the answer, as everything can be carried out as part of a simple MERGE statement. New records are inserted, and any records to be updated are handled as a set-based operation as part of the WHEN MATCHED ... THEN UPDATE clause. This example does not need an IsCurrent flag, as the column "Town" is, by definition, the current version, and the "_Prev(n)" fields are the historical data.

To test this, all you need to do is to change a country value in the "client" table. You should see the latest value for the country as the "country" field of the "Client_SCD3" table, and the previous value as the "Country_Prev1". The corresponding date(s) should also be updated.

Purely by way of a change, and to provide a variation on a theme, I am setting the date as a VARCHAR(8) field here. You can, of course, use an INT as I the previous example for SCD type 2.

It is, technically, possible to extend this process to include the DDL which actually creates the "_Prev" columns for a Type 3 SCD. However in practice this will slow the process down unbearably, so I will only mention it in passing, and certainly not advise that it be used in a production environment.

Type 4 SCD - Keep the currently valid record in the dimension table, and a history of previous attributes in a separate table along with he dates they were valid

The final variation on this theme that I will show you here, is a type 4 SCD. This is, basically, a type 1 table with a separate history table for the previous versions of the data. So when any attributes change, the current version is removed from the "main" table and added to the "History" table, and a new record containing the latest attributes is added to the "main" table.

Here we will need two tables. The "current" dimension table is identical to the type 1 table described above. However, you will also need a second table to hold the historical data, and two metadata columns: ValidFrom and ValidTo. In our example, it will look like this:

CREATE TABLE [dbo].[Client_SCD4_History]
(
 [HistoryID] [int] IDENTITY(1,1) NOT NULL,
 [BusinessKey] [int] NOT NULL,
 [ClientName] [varchar](150) NULL,
 [Country] [varchar](50) NULL,
 [Town] [varchar](50) NULL,
 [County] [varchar](50) NULL,
 [Address1] [varchar](50) NULL,
 [Address2] [varchar](50) NULL,
 [ClientType] [varchar](20) NULL,
 [ClientSize] [varchar](10) NULL,
 [ValidFrom] [int] NULL,
 [ValidTo] [int] NULL
)

So now you need the code to maintain these two tables. The SQL snippet given below will map the source and main destination tables on the business key and

Insert a new record if one does not exist for the business key.

if one exists for the business key:

Move the old record to the historical table.
Add a new record to the main destination table.

DECLARE @Yesterday INT = (YEAR(DATEADD(dd,-1,GETDATE())) * 10000) + (MONTH(DATEADD(dd,-1,GETDATE())) * 100) + DAY(DATEADD(dd,-1,GETDATE()))
DECLARE @Today INT = (YEAR(GETDATE()) * 10000) + (MONTH(GETDATE()) * 100) + DAY(GETDATE())
DECLARE @Client_SCD4 TABLE
(
 [BusinessKey] [int] NULL,
 [ClientName] [varchar](150) NULL,
 [Country] [varchar](50) NULL,
 [Town] [varchar](50) NULL,
 [County] [varchar](50) NULL,
 [Address1] [varchar](50) NULL,
 [Address2] [varchar](50) NULL,
 [ClientType] [varchar](20) NULL,
 [ClientSize] [varchar](10) NULL,
 [MergeAction] [varchar](10) NULL
) 

-- Merge statement
MERGE dbo.Client_SCD1 AS DST
USING CarSales.dbo.Client AS SRC
ON (SRC.ID = DST.BusinessKey)
WHEN NOT MATCHED THEN
INSERT (BusinessKey, ClientName, Country, Town, Address1, Address2, ClientType, ClientSize)
VALUES (SRC.ID, SRC.ClientName, SRC.Country, SRC.Town, SRC.Address1, SRC.Address2, SRC.ClientType, SRC.ClientSize)
WHEN MATCHED 
AND 
 ISNULL(DST.ClientName,'') <> ISNULL(SRC.ClientName,'') 
 OR ISNULL(DST.Country,'') <> ISNULL(SRC.Country,'') 
 OR ISNULL(DST.Town,'') <> ISNULL(SRC.Town,'')
 OR ISNULL(DST.Address1,'') <> ISNULL(SRC.Address1,'')
 OR ISNULL(DST.Address2,'') <> ISNULL(SRC.Address2,'')
 OR ISNULL(DST.ClientType,'') <> ISNULL(SRC.ClientType,'')
 OR ISNULL(DST.ClientSize,'') <> ISNULL(SRC.ClientSize,'')
THEN UPDATE 
SET 
 DST.ClientName = SRC.ClientName 
 ,DST.Country = SRC.Country 
 ,DST.Town = SRC.Town
 ,DST.Address1 = SRC.Address1
 ,DST.Address2 = SRC.Address2
 ,DST.ClientType = SRC.ClientType
 ,DST.ClientSize = SRC.ClientSize
OUTPUT DELETED.BusinessKey, DELETED.ClientName, DELETED.Country, DELETED.Town, DELETED.Address1, DELETED.Address2, DELETED.ClientType, DELETED.ClientSize, $Action AS MergeAction
INTO @Client_SCD4 (BusinessKey, ClientName, Country, Town, Address1, Address2, ClientType, ClientSize, MergeAction)
;
-- Update history table to set final date and current flag
UPDATE TP4
SET TP4.ValidTo = @Yesterday
FROM dbo.Client_SCD4_History TP4
 INNER JOIN @Client_SCD4 TMP
 ON TP4.BusinessKey = TMP.BusinessKey
WHERE TP4.ValidTo IS NULL
-- Add latest history records to history table
INSERT INTO dbo.Client_SCD4_History (BusinessKey, ClientName, Country, Town, Address1, Address2, ClientType, ClientSize, ValidTo)
SELECT BusinessKey, ClientName, Country, Town, Address1, Address2, ClientType, ClientSize, @Yesterday 
FROM @Client_SCD4

Let me explain what is happening here.

As we will need to add - or update - validity dates, these are set as variables at the start of the process.
A table variable is created to hold the results of the OUTPUT clause of the MERGE statement. This is an alternative to a temporary table, to show you that both can be used.
New records are INSERTed, just like before.
Any changed records are UPDATEd, so that their attributes now reflect the current source data.
All the "old" data which was updated in the dimension table, and the MERGE action (insert, update or delete) that was carried out for each piece of data is OUTPUT into the temporary table.
The history table is updated so that the previous valid record for each client now has a "ValidTo" date.
Where data was updated, it is now INSERTed into the historical dimension table based on the data in the OUTPUT temporary table.

This approach is probably easier to understand than the type 2 SCD technique described above, as it is more linear, and can be read top to bottom, as it progresses from MERGE through to the upsert of the historical table. What the code does is to carry out the INSERT and UPDATE as before for new and existing records, and then it selects the UPDATEd records (funneled via the OUTPUT clause) as a separate INSERT. This way the latest modifications to the source data become a new record - handled as a separate INSERT - and any required metadata, such as the ValidFrom date is added at this stage. This approach is a variation on the previous theme, except that it uses a table variable.

Note the use of the DELETED table to get the previous value of data from the table "Client_SCD4", rather than the current value which is returned by default. There is also an INSERTED table, which you can use to get the data which was INSERTed. It is worth noting that each UPDATE will place data in both the DELETED table and the INSERTED table. The former contains the old data which was there before it was updated. The latter contains the new data.

To test this, all you need to do is to change a non-key value in the "client" table. The latest value should appear in the "Client_SCD1" table, and the previous value (with all relevant metadata) be moved to the "Client_SCD4_History" table.

A real-world implementation of this approach will certainly need to be wrapped in a transaction to ensure that data integrity is maintained.

That is it. You can now - I hope - use T-SQL MERGE to handle slowly changing dimensions, be they type 1, type 2, type 3 or type 4. Equally hopefully, you can see how the MERGE statement, especially when allied to the OUTPUT clause can solve a number of data management problems which otherwise can be somewhat complex.
One thing that I have not dealt with here is the use of WHEN NOT MATCHED BY SOURCE ... DELETE use of the MERGE statement. I can only refer you to BOL (http://technet.microsoft.com/en-us/library/bb510625.aspx) for this, and many other subtleties of the amazingly powerful tool that is the MERGE statement.

Slowly changing dimensions using T-SQL MERGE

Type 1 SCD - Update changed attributes and disregard older attributes

Type 2 SCD - Keep a history of previous attributes as separate records, indicate the dates they were valid, and flag the currently valid record

Type 3 SCD - Keep a history of previous attributes in other columns along with validity date

Type 4 SCD - Keep the currently valid record in the dimension table, and a history of previous attributes in a separate table along with he dates they were valid

Resources

Rate

Share

Categories

Share

Rate

Slowly changing dimensions using T-SQL MERGE

Type 1 SCD - Update changed attributes and disregard older attributes

Type 2 SCD - Keep a history of previous attributes as separate records, indicate the dates they were valid, and flag the currently valid record

Type 3 SCD - Keep a history of previous attributes in other columns along with validity date

Type 4 SCD - Keep the currently valid record in the dimension table, and a history of previous attributes in a separate table along with he dates they were valid

Resources

Rate

Share

Categories

Share

Rate

Related content

SQL Server MERGE statement usage and examples

Using the MERGE Statement to Perform an UPSERT

Performance of the SQL MERGE vs. INSERT/UPDATE

Managing Slowly Changing Dimension with MERGE Statement in SQL Server

A Hazard of Using the SQL Merge Statement