Blog Post

Handling Type II Dimension with the MERGE Statement

,

Microsoft attempted to make our lives easier with the Slowly Changing Dimension task in SSIS.  However, as we all know the task has some major performance issues.  So they introduced the MERGE statement.  When I initially started using the MERGE statement I considered it as a great solution to handle Type I dimensions, but I did not consider it as a solution for Type II.  Recently, I was chatting with Brian Knight and he said that the MERGE statement could handle Type II dimensions.  After spending some time with Books Online I realized that you could SELECT the OUTPUT data from the MERGE Statement.  This set up the perfect mechanism for handling a Type II dimension.  This is how I did it.

Running the examples requires the AdventureWorks2008R2 database.  To get started with the explanation run the following script:

USE tempdb
GO
IF(OBJECT_ID('dbo.Product')) IS NOT NULL
    DROP TABLE dbo.Product
GO
CREATE TABLE dbo.Product
(
    ProductID int identity(1,1),
    ProductNumber varchar(50),
    ProductName varchar(100),
    Color varchar(30),
    ListPrice decimal(12,2)
)
GO
 
INSERT INTO dbo.Product
SELECT 
    ProductNumber,
    Name,
    Color,
    ListPrice
FROM AdventureWorks2008R2.Production.Product
 
GO
 
IF(OBJECT_ID('dbo.DimProduct')) IS NOT NULL
    DROP TABLE dbo.DimProduct
GO
CREATE TABLE dbo.DimProduct
(
    ProductSK int identity(1,1),
    ProductAK varchar(50),
    ProductName varchar(100),
    Color varchar(30),
    ListPrice decimal(12,2),
    EffectiveDate datetime,
    ExpirationDate datetime
)
GO
 
INSERT INTO dbo.DimProduct(ProductAK, ProductName, Color, ListPrice, EffectiveDate)
SELECT 
    ProductNumber,
    ProductName,
    Color,
    ListPrice,
    '1/1/1900'
FROM dbo.Product
GO

 

 

The above scripts creates an operational Product table and initially loads the data into the DimProduct table.  Now let's assume that ListPrice is a Type II attribute.  In other words, if the ListPrice changes, the current effective row should be expired and a new row should be inserted into the dimension as the effective row.  This new row will contain the changed data.  To simulate a Type II change run the following script:

UPDATE dbo.Product
SET 
    ListPrice = 5.00
WHERE
    ProductNumber IN ('AR-5381','BA-8327')
GO

 

The script will change the ListPrice for two products in the operational table.  After the updates have been executed, run the following script:

   1:  INSERT INTO dbo.DimProduct(ProductAK, ListPrice, Color, ProductName, EffectiveDate)
   2:  SELECT ProductNumber, ListPrice, Color, ProductName, EffectiveDate
   3:  FROM 
   4:  (
   5:      MERGE dbo.DimProduct dp
   6:      USING dbo.Product p
   7:          ON dp.ProductAK = p.ProductNumber
   8:      WHEN NOT MATCHED THEN
   9:          INSERT (ProductAK, ProductName, Color, ListPrice, EffectiveDate)
  10:          VALUES (p.ProductNumber, p.ProductName, p.Color, p.ListPrice, '1/1/1900')
  11:      WHEN MATCHED 
  12:          AND ExpirationDate IS NULL 
  13:          AND (dp.ListPrice <> p.ListPrice) THEN
  14:          UPDATE
  15:              SET
  16:                  dp.ExpirationDate = convert(datetime, getdate(), 101)
  17:      OUTPUT $Action MergeAction, p.ProductNumber, p.ListPrice, p.Color, p.ProductName, convert(datetime, getdate(), 101) EffectiveDate
  18:  ) MergeOutput
  19:  WHERE
  20:      MergeAction = 'Update';

Let's start the explanation in the inside of the query.  On line 12 the current effective row is identified.  You may need to change this depending on your Type II dimension design patterns.  Your dimensions may use a specific date as the expiration date and you may also include a ActiveFlag bit column.  Then on line 13 I verify that the column specified as a Type II attribute has actually changed.  Finally, on line 16 the row is expired.  Now you may be thinking how are you going to add the new current row for the corresponding expired rows.  That was the same thing I was thinking.

On line 17 I output any of the Inserts or Updates.  In addition, I added a column that acts as the Effective Date.  The thing that I did not know was that you can SELECT the OUTPUT data from the MERGE statement and INSERT it into a table.  This is accomplished in lines 1 and 2.  Essentially, I derived a table from the OUTPUT of the MERGE statement.  Lastly, I included a WHERE clause to ensure that I only inserted the data that was UPDATED.  That pretty much did it.  To verify the changes run the following query:

SELECT *
FROM dbo.DimProduct
WHERE
    ProductAK IN ('AR-5381','BA-8327')

 

Talk to you soon,

Patrick LeBlanc, SQL Server MPV, MCTS

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating