
This is a blog that I am writing for future me and hopefully it’ll help a few of you save some time too! It’s not often that I get to build out a data warehouse from scratch, but when I do, I want to make sure I do it well with best practices in place. Because this is not something I do a lot of, I frequently forget lessons I’ve learned and have to go back and drop tables to recreate them in the best way before it’s too late. One table type that is vital to do right the first time is a Slowly Changing Dimension Type 2 (SCD2 for short).
Edits have been made (thank you Johnny Winter for the feedback!) to address key limitations in the Fabric data warehouse versus any other SQL database (including Fabric SQL database). You’ll notice two sections of code, one will be specifically for Fabric data warehouse and one will be specifically for other SQL databases.
Key differences between Fabric data warehouses and SQL databases are caused by the underlying engines. You may be wondering why would I use one over the other? There are a million scenarios that belong in their own blog post, but for my client we used a warehouse because we had data landing in a lakehouse that we want to leverage and the Fabric SQL database cannot grab data from the lakehouse as part of a SP without duplicating data while a warehouse can call lakehouse data using a three part naming convention (lakehouse.schema.table).
Here’s what changes for the Fabric Warehouse scripts in particular:
- No IDENITITY_INSERT. The surrogate id is a plain BIGINT generated by MAX(id) + ROW_NUMBER(), which also lets us keep the literal -1 unknown row.
- No CREATE INDEX of any kind. The filtered unique index and the temp-table clustered index are gone. A PRIMARY KEY NONCLUSTERED NOT ENFORCED on id is added purely as an optimizer hint.
- No computed columns. hash_key and hash_diff are real columns, populated explicitly on every insert.
- The filtered unique index used to fail the load if the source produced two current versions of one identity. NOT ENFORCED constraints do not enforce anything, so that safety net is gone. A staging dedup step now guarantees one current row per identity.
A SCD2 is a dimension, or lookup, table that soft deletes old records instead of overwriting, modifying, or fully removing records that no longer exist. They can be useful for a number of circumstances, but the primary reason is for time traveling. For example, let’s say that my sales data contains a customer key, but I want to know where a customer lived when they bought the item, not where they live now. In that case, I would need a record in my customer table that contains both their current and past addresses. The example below is for GL accounts, but it can be used with anything.
Skip to the end of this blog for the full script including the table creation.
In the example below, I am tracking any changes to the descriptions of a given account. For example, if GL code 40 changes from INVENTORY to GYM INVENTORY on 6/10/2026 12:00 PM, then any records before that minute in time will return INVENTORY as the GL code description while everything on or after will return GYM INVENTORY.
There are a couple of columns you need to do a proper SCD2 – an effective start date, an effective end date, and an is current flag. You can use different names than the ones I’ve chosen, but the pattern should remain. The start and end dates don’t need a time component if you only load daily. I prefer to use hash keys rather than comparing column by column, because predicates like ISNULL(a, ”) = ISNULL(b, ”) are non-SARGable and get slow, since it takes longer to find matches and identify new, updated, or deleted rows. A hash gives you an easy way to identify a row by its contents even when it has no natural unique id. Without one, you end up writing long predicates checking whether every column matches between source and target. With a hash it is a simple = or <>. One caveat for Fabric Warehouses – SARGability is a b-tree seek concept (feel free to dig into this on your own, such a cool internals thing to learn about!), and Warehouses in Fabric have no indexes to seek. But comparing a single hash is still cheaper than comparing many wide columns.
The downside to using a hash key is that if a column is added, you’ll need to manually rerun all the hashes to include the new column so you don’t end up with a huge change on the day the new column was added. Small price to pay for the daily speed in my opinion, but it could be a valid concern if columns change frequently enough.
Creating the SCD2 Table
Why the || between each field? The || is a character combo that doesn’t appear in the wild very often and it’s critical to have a clear separation between fields so the hash is truly unique. It’s a small thing that makes a huge difference to make sure you don’t end up modifying records unintentionally.
I’ve also created all the columns to be in alphabetical order for future user sanity with the standard fields at the end (effective start/end and the hashes). There’s also an index on this table for the current records since I’m anticipating most queries will only care about the current account descriptions.
Why two hash columns? The first hash gives us what I think of as Type 1 fields, that is fields that if they change actually denote a new record entirely, not a modification on existing record. The hash diff is built off Type 2 fields, fields that when they change I want the old version soft deleted and the new one set to current.
ONLY DO THIS ONCE. If you drop the table after facts have started flowing in, be aware that it will be very hard to rebuild since you will have deleted the ids that the fact table is trying to join to. If you need to make adjustments to this dimension, put it in a temp table and temporarily suspend the identity insert so you can preserve the original ids of the table.
You can use IDENTITY in Microsoft Fabric warehouses now, but it does contain significant restrictions as it is in preview. If you want to use this preview feature, please refer to this Microsoft Learn doc for the latest information on the limitations: IDENTITY Columns in Fabric Data Warehouse – Microsoft Fabric | Microsoft Learn.
One caveat to the data types in the create table, you may notice I’m using DATETIME2(6) instead of just DATETIME. The reason is that I’ve been working in Microsoft Fabric data warehouses for a while and DATETIME is not supported (Data Types in Fabric Data Warehouse – Microsoft Fabric | Microsoft Learn). If you’re using an on-prem SQL Server, feel free to use your preferred datetime data type.
/*--------------------------------------------------------FOR SQL DATABASES, WILL NOT WORK ON FABRIC DATA WAREHOUSES--------------------------------------------------------*/DROP TABLE IF EXISTS dim.Account;CREATE TABLE dim.Account ( id INT IDENTITY(1,1) NOT NULL ,[4444] VARCHAR(10) NULL ,[4444_desc] VARCHAR(550) NULL ,aa VARCHAR(10) NULL ,aa_desc VARCHAR(550) NULL ,fund VARCHAR(10) NULL ,gl VARCHAR(10) NULL ,gl_desc VARCHAR(550) NULL ,llll VARCHAR(10) NULL ,llll_desc VARCHAR(550) NULL ,o VARCHAR(10) NULL ,obbb VARCHAR(10) NULL ,obbb_desc VARCHAR(550) NULL ,pp VARCHAR(10) NULL ,ppss VARCHAR(10) NULL ,pprr_desc VARCHAR(550) NULL ,s VARCHAR(10) NULL ,s_desc VARCHAR(550) NULL ,effective_start_datetime DATETIME NOT NULL ,effective_end_datetime DATETIME NOT NULL ,is_current BIT NOT NULL -- identity hash: the 10 business key columns ,hash_key AS CONVERT(VARBINARY(32), HASHBYTES('SHA2_256', CONCAT( ISNULL([4444], ''), '||' ,ISNULL(aa, ''), '||' ,ISNULL(fund, ''), '||' ,ISNULL(gl, ''), '||' ,ISNULL(llll, ''), '||' ,ISNULL(o, ''), '||' ,ISNULL(obbb, ''), '||' ,ISNULL(pp, ''), '||' ,ISNULL(ppss, ''), '||' ,ISNULL(s, '')))) PERSISTED -- change hash: the 7 tracked description columns ,hash_diff AS CONVERT(VARBINARY(32), HASHBYTES('SHA2_256', CONCAT( ISNULL([4444_desc], ''), '||' ,ISNULL(aa_desc, ''), '||' ,ISNULL(gl_desc, ''), '||' ,ISNULL(llll_desc, ''), '||' ,ISNULL(obbb_desc, ''), '||' ,ISNULL(pprr_desc, ''), '||' ,ISNULL(s_desc, '')))) PERSISTED);-- One current row per identity. Also fails the load loudly if the-- source ever produces two current versions of the same key.-- INCLUDE (hash_diff) covers the Step 3 existence check.CREATE UNIQUE INDEX UX_Account_current ON dim.Account (hash_key) INCLUDE (hash_diff) WHERE is_current = 1;
/*--------------------------------------------------------FOR FABRIC DATA WAREHOUSES--------------------------------------------------------*/DROP TABLE IF EXISTS dim.Account; CREATE TABLE dim.Account ( id BIGINT NOT NULL ,[4444] VARCHAR(10) NULL ,[4444_desc] VARCHAR(550) NULL ,aa VARCHAR(10) NULL ,aa_desc VARCHAR(550) NULL ,fund VARCHAR(10) NULL ,gl VARCHAR(10) NULL ,gl_desc VARCHAR(550) NULL ,llll VARCHAR(10) NULL ,llll_desc VARCHAR(550) NULL ,o VARCHAR(10) NULL ,obbb VARCHAR(10) NULL ,obbb_desc VARCHAR(550) NULL ,pp VARCHAR(10) NULL ,ppss VARCHAR(10) NULL ,pprr_desc VARCHAR(550) NULL ,s VARCHAR(10) NULL ,s_desc VARCHAR(550) NULL ,effective_start_datetime DATETIME2(6) NOT NULL ,effective_end_datetime DATETIME2(6) NOT NULL ,is_current BIT NOT NULL ,hash_key VARBINARY(32) NOT NULL -- identity columns (the 10 codes) ,hash_diff VARBINARY(32) NOT NULL -- tracked description columns (the 7 descs)); -- Optimizer hint only. NOT ENFORCED means the engine will NOT-- reject duplicate ids, so the load logic must keep id unique.-- id genuinely is unique here, so this hint is safe to declare.-- Note: we do NOT declare a unique constraint on hash_key, because-- by design multiple versions of one identity share a hash_key.ALTER TABLE dim.Account ADD CONSTRAINT PK_Account PRIMARY KEY NONCLUSTERED (id) NOT ENFORCED;Create Default “NULL” Row
This is optional, but highly encouraged. Having a NULL row allows your fact table to never have a NULL value in the surrogate key for dim account. Pretty handy for reporting purposes to have a default “Unknown Account” show up instead of an empty record or, heaven forbid, the record get removed automatically when you group on a field from the account dimension. This only needs to be done once and is not included in the stored proc because it never needs to be updated.
/*--------------------------------------------------------FOR SQL DATABASES, WILL NOT WORK ON FABRIC DATA WAREHOUSES--------------------------------------------------------*/SET IDENTITY_INSERT dim.Account ON;INSERT INTO dim.Account ( id ,[4444] ,[4444_desc] ,aa ,aa_desc ,fund ,gl ,gl_desc ,llll ,llll_desc ,o ,obbb ,obbb_desc ,pp ,ppss ,pprr_desc ,s ,s_desc ,effective_start_datetime ,effective_end_datetime ,is_current)VALUES ( -1 ,'Unknown' --you can be more descriptive with these unknowns if you want ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'1900-01-01' ,'9999-12-31 23:59:59' ,1);SET IDENTITY_INSERT dim.Account OFF;
/*--------------------------------------------------------FOR FABRIC DATA WAREHOUSES--------------------------------------------------------*/INSERT INTO dim.Account ( id ,[4444] ,[4444_desc] ,aa ,aa_desc ,fund ,gl ,gl_desc ,llll ,llll_desc ,o ,obbb ,obbb_desc ,pp ,ppss ,pprr_desc ,s ,s_desc ,effective_start_datetime ,effective_end_datetime ,is_current ,hash_key ,hash_diff)SELECT -1 ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'1900-01-01' ,'9999-12-31 23:59:59.999999' ,1 ,CONVERT(VARBINARY(32), HASHBYTES('SHA2_256', CONCAT( 'Unknown', '||', 'Unknown', '||', 'Unknown', '||', 'Unknown', '||', 'Unknown' ,'||', 'Unknown', '||', 'Unknown', '||', 'Unknown', '||', 'Unknown', '||', 'Unknown'))) ,CONVERT(VARBINARY(32), HASHBYTES('SHA2_256', CONCAT( 'Unknown', '||', 'Unknown', '||', 'Unknown', '||', 'Unknown' ,'||', 'Unknown', '||', 'Unknown', '||', 'Unknown')));Create Loading Stored Procedure
This is where the real work happens. Keep in mind, this stored procedure will need to be called at some point by an orchestrator. Orchestrators can be a data pipeline, a SQL trigger, a SQL agent job, a notebook, etc. Anything that can call EXEC dim.usp_LoadAccount will work. Be sure that this comes BEFORE you create your fact tables. Fact tables will need to know what id to write into the surrogate key for your dimension.
This stored procedure does a few things – creates the source table, expires (aka soft deletes) records changed in the source, insert new and updated records, and expire the records that don’t exist at all in source anymore.
General notes about this stored procedure:
- The source temporary table does contain an index on it in the example below since that source temp table is called 3 times throughout the procedure.
- Everything is wrapped in transactions and try catches so nothing fails silently, and clear failure messages will bubble up to our orchestrator.
/*--------------------------------------------------------FOR SQL DATABASES, WILL NOT WORK ON FABRIC DATA WAREHOUSES--------------------------------------------------------*/CREATE OR ALTER PROCEDURE dim.usp_Load_AccountASBEGIN DECLARE @now DATETIME = SYSDATETIME(); DECLARE @end DATETIME = DATEADD(MINUTE, -1, @now); -- expired rows end 1 min before successor starts DECLARE @future DATETIME = '9999-12-31 23:59:59';/*------------------------------------------------------------Step 1. Stage distinct source rows (outside the transaction to keep the write transaction short).------------------------------------------------------------*/ DROP TABLE IF EXISTS #src; ;WITH src_cast AS ( SELECT DISTINCT CAST([4444] AS VARCHAR(10)) AS [4444] ,CAST([4444 Desc] AS VARCHAR(550)) AS [4444_desc] ,CAST([AA] AS VARCHAR(10)) AS aa ,CAST([AA Desc] AS VARCHAR(550)) AS aa_desc ,CAST([Fund] AS VARCHAR(10)) AS fund ,CAST([GL] AS VARCHAR(10)) AS gl ,CAST([GL Desc] AS VARCHAR(550)) AS gl_desc ,CAST([LLLL] AS VARCHAR(10)) AS llll ,CAST([LLLL Desc] AS VARCHAR(550)) AS llll_desc ,CAST([O] AS VARCHAR(10)) AS o ,CAST([OBBB] AS VARCHAR(10)) AS obbb ,CAST([OBBB Desc] AS VARCHAR(550)) AS obbb_desc ,CAST([PP] AS VARCHAR(10)) AS pp ,CAST([PPSS] AS VARCHAR(10)) AS ppss ,CAST([PPRR Desc] AS VARCHAR(550)) AS pprr_desc ,CAST([S] AS VARCHAR(10)) AS s ,CAST([S Desc] AS VARCHAR(550)) AS s_desc FROM Database.schema.[RawTable] ) SELECT [4444] ,[4444_desc] ,aa ,aa_desc ,fund ,gl ,gl_desc ,llll ,llll_desc ,o ,obbb ,obbb_desc ,pp ,ppss ,pprr_desc ,s ,s_desc -- expressions MUST match dim.Account's computed columns exactly ,CONVERT(VARBINARY(32), HASHBYTES('SHA2_256', CONCAT( ISNULL([4444], ''), '||' ,ISNULL(aa, ''), '||' ,ISNULL(fund, ''), '||' ,ISNULL(gl, ''), '||' ,ISNULL(llll, ''), '||' ,ISNULL(o, ''), '||' ,ISNULL(obbb, ''), '||' ,ISNULL(pp, ''), '||' ,ISNULL(ppss, ''), '||' ,ISNULL(s, '')))) AS hash_key ,CONVERT(VARBINARY(32), HASHBYTES('SHA2_256', CONCAT( ISNULL([4444_desc], ''), '||' ,ISNULL(aa_desc, ''), '||' ,ISNULL(gl_desc, ''), '||' ,ISNULL(llll_desc, ''), '||' ,ISNULL(obbb_desc, ''), '||' ,ISNULL(pprr_desc, ''), '||' ,ISNULL(s_desc, '')))) AS hash_diff INTO #src FROM src_cast; -- index the staged rows: read 3 times on hash_key below CREATE CLUSTERED INDEX CIX_src ON #src (hash_key);/*------------------------------------------------------------Steps 2 to 4 mutate the dimension. Run them atomically so a failure can never leave a key with zero current rows.------------------------------------------------------------*/ BEGIN TRY BEGIN TRANSACTION; /*-------------------------------------------------------- Step 2. Expire current rows whose identity still exists in the source but whose description set changed. --------------------------------------------------------*/ UPDATE d SET effective_end_datetime = @end ,is_current = 0 FROM dim.Account d INNER JOIN #src s ON s.hash_key = d.hash_key WHERE d.is_current = 1 AND d.id <> -1 AND s.hash_diff <> d.hash_diff; /*-------------------------------------------------------- Step 3. Insert brand new identities and new versions of the rows just expired in Step 2. A current row matching on both hashes is unchanged and skipped. --------------------------------------------------------*/ INSERT INTO dim.Account ( [4444] ,[4444_desc] ,aa ,aa_desc ,fund ,gl ,gl_desc ,llll ,llll_desc ,o ,obbb ,obbb_desc ,pp ,ppss ,pprr_desc ,s ,s_desc ,effective_start_datetime ,effective_end_datetime ,is_current ) SELECT s.[4444] ,s.[4444_desc] ,s.aa ,s.aa_desc ,s.fund ,s.gl ,s.gl_desc ,s.llll ,s.llll_desc ,s.o ,s.obbb ,s.obbb_desc ,s.pp ,s.ppss ,s.pprr_desc ,s.s ,s.s_desc ,@now ,@future ,1 FROM #src s WHERE NOT EXISTS ( SELECT 1 FROM dim.Account d WHERE d.is_current = 1 AND d.id <> -1 AND d.hash_key = s.hash_key AND d.hash_diff = s.hash_diff ); /*-------------------------------------------------------- Step 4. Expire current rows whose identity has dropped out of the source entirely. --------------------------------------------------------*/ UPDATE d SET effective_end_datetime = @end ,is_current = 0 FROM dim.Account d WHERE d.is_current = 1 AND d.id <> -1 AND NOT EXISTS ( SELECT 1 FROM #src s WHERE s.hash_key = d.hash_key ); COMMIT TRANSACTION; END TRY BEGIN CATCH IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION; THROW; END CATCH; DROP TABLE IF EXISTS #src;END;
/*--------------------------------------------------------FOR FABRIC DATA WAREHOUSES--------------------------------------------------------*/CREATE OR ALTER PROCEDURE dim.usp_Load_AccountASBEGIN DECLARE @now DATETIME2(6) = SYSDATETIME(); DECLARE @end DATETIME2(6) = DATEADD(MINUTE, -1, SYSDATETIME()); -- expired rows end 1 min before successor starts DECLARE @future DATETIME2(6) = '9999-12-31 23:59:59.999999'; /*------------------------------------------------------------ Step 1. Stage distinct source rows, hash them, and dedup to one row per identity (hash_key). COLLATE DATABASE_DEFAULT normalizes lakehouse collation so the hashes are byte comparable. The dedup replaces the safety the filtered unique index used to provide, since NOT ENFORCED constraints cannot. ------------------------------------------------------------*/ DROP TABLE IF EXISTS #src; ;WITH src_cast AS ( SELECT DISTINCT CAST([4444] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS [4444] ,CAST([4444 Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS [4444_desc] ,CAST([AA] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS aa ,CAST([AA Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS aa_desc ,CAST([Fund] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS fund ,CAST([GL] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS gl ,CAST([GL Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS gl_desc ,CAST([LLLL] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS llll ,CAST([LLLL Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS llll_desc ,CAST([O] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS o ,CAST([OBBB] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS obbb ,CAST([OBBB Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS obbb_desc ,CAST([PP] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS pp ,CAST([PPSS] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS ppss ,CAST([PPRR Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS pprr_desc ,CAST([S] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS s ,CAST([S Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS s_desc FROM LakehouseName.dbo.[RawTable] ) ,src_hashed AS ( SELECT [4444] ,[4444_desc] ,aa ,aa_desc ,fund ,gl ,gl_desc ,llll ,llll_desc ,o ,obbb ,obbb_desc ,pp ,ppss ,pprr_desc ,s ,s_desc ,CONVERT(VARBINARY(32), HASHBYTES('SHA2_256', CONCAT( ISNULL([4444], ''), '||' ,ISNULL(aa, ''), '||' ,ISNULL(fund, ''), '||' ,ISNULL(gl, ''), '||' ,ISNULL(llll, ''), '||' ,ISNULL(o, ''), '||' ,ISNULL(obbb, ''), '||' ,ISNULL(pp, ''), '||' ,ISNULL(ppss, ''), '||' ,ISNULL(s, '')))) AS hash_key ,CONVERT(VARBINARY(32), HASHBYTES('SHA2_256', CONCAT( ISNULL([4444_desc], ''), '||' ,ISNULL(aa_desc, ''), '||' ,ISNULL(gl_desc, ''), '||' ,ISNULL(llll_desc, ''), '||' ,ISNULL(obbb_desc, ''), '||' ,ISNULL(pprr_desc, ''), '||' ,ISNULL(s_desc, '')))) AS hash_diff FROM src_cast ) SELECT [4444] ,[4444_desc] ,aa ,aa_desc ,fund ,gl ,gl_desc ,llll ,llll_desc ,o ,obbb ,obbb_desc ,pp ,ppss ,pprr_desc ,s ,s_desc ,hash_key ,hash_diff INTO #src FROM ( SELECT * -- one winner per identity; ORDER BY makes the choice deterministic ,ROW_NUMBER() OVER (PARTITION BY hash_key ORDER BY hash_diff) AS rn FROM src_hashed ) z WHERE rn = 1; /*------------------------------------------------------------ Steps 2 to 4 mutate the dimension. Run them atomically so a failure can never leave a key with zero current rows. ------------------------------------------------------------*/ BEGIN TRY BEGIN TRANSACTION; /*-------------------------------------------------------- Step 2. Expire current rows whose identity still exists in the source but whose description set changed. --------------------------------------------------------*/ UPDATE d SET effective_end_datetime = @end ,is_current = 0 FROM dim.Account d INNER JOIN #src s ON s.hash_key = d.hash_key WHERE d.is_current = 1 AND d.id <> -1 AND s.hash_diff <> d.hash_diff; /*-------------------------------------------------------- Step 3. Insert brand new identities and new versions of the rows just expired in Step 2. A current row matching on both hashes is unchanged and skipped. New surrogate ids are MAX(existing positive id) + ROW_NUMBER(). Read the high-water mark inside the transaction. NOTE: this is concurrency safe only for a single serialized load. Two overlapping loads could read the same max and assign overlapping ids. This should not be an issue in batch loading, but if you are doing streaming or multiple inserts at a time, you should use a different ID process. I am not using IDENTITY here because I still want that default -1 ID option. --------------------------------------------------------*/ DECLARE @maxid BIGINT = (SELECT ISNULL(MAX(id), 0) FROM dim.Account WHERE id > 0); INSERT INTO dim.Account ( id ,[4444] ,[4444_desc] ,aa ,aa_desc ,fund ,gl ,gl_desc ,llll ,llll_desc ,o ,obbb ,obbb_desc ,pp ,ppss ,pprr_desc ,s ,s_desc ,effective_start_datetime ,effective_end_datetime ,is_current ,hash_key ,hash_diff ) SELECT @maxid + ROW_NUMBER() OVER (ORDER BY s.hash_key, s.hash_diff) ,s.[4444] ,s.[4444_desc] ,s.aa ,s.aa_desc ,s.fund ,s.gl ,s.gl_desc ,s.llll ,s.llll_desc ,s.o ,s.obbb ,s.obbb_desc ,s.pp ,s.ppss ,s.pprr_desc ,s.s ,s.s_desc ,@now ,@future ,1 ,s.hash_key ,s.hash_diff FROM #src s WHERE NOT EXISTS ( SELECT 1 FROM dim.Account d WHERE d.is_current = 1 AND d.id <> -1 AND d.hash_key = s.hash_key AND d.hash_diff = s.hash_diff ); /*-------------------------------------------------------- Step 4. Expire current rows whose identity has dropped out of the source entirely. --------------------------------------------------------*/ UPDATE d SET effective_end_datetime = @end ,is_current = 0 FROM dim.Account d WHERE d.is_current = 1 AND d.id <> -1 AND NOT EXISTS ( SELECT 1 FROM #src s WHERE s.hash_key = d.hash_key ); COMMIT TRANSACTION; END TRY BEGIN CATCH IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION; THROW; END CATCH; DROP TABLE IF EXISTS #src; END;Full SQL Script (Including Create Table)
/*--------------------------------------------------------FOR SQL DATABASES, WILL NOT WORK ON FABRIC DATA WAREHOUSES--------------------------------------------------------*//*================================================================dim.Account -- Type 2 SCD Setup + Load Procedure * hash_key : identity columns (defines "same account") * hash_diff : tracked description columns (drives versioning) * filtered UNIQUE index guarantees one current row per account * #src is indexed after load * load runs inside a single transaction with rollback * expired rows end one minute before the successor starts================================================================---*/DROP TABLE IF EXISTS dim.Account;CREATE TABLE dim.Account ( id INT IDENTITY(1,1) NOT NULL ,[4444] VARCHAR(10) NULL ,[4444_desc] VARCHAR(550) NULL ,aa VARCHAR(10) NULL ,aa_desc VARCHAR(550) NULL ,fund VARCHAR(10) NULL ,gl VARCHAR(10) NULL ,gl_desc VARCHAR(550) NULL ,llll VARCHAR(10) NULL ,llll_desc VARCHAR(550) NULL ,o VARCHAR(10) NULL ,obbb VARCHAR(10) NULL ,obbb_desc VARCHAR(550) NULL ,pp VARCHAR(10) NULL ,ppss VARCHAR(10) NULL ,pprr_desc VARCHAR(550) NULL ,s VARCHAR(10) NULL ,s_desc VARCHAR(550) NULL ,effective_start_datetime DATETIME NOT NULL ,effective_end_datetime DATETIME NOT NULL ,is_current BIT NOT NULL -- identity hash: the 10 business key columns ,hash_key AS CONVERT(VARBINARY(32), HASHBYTES('SHA2_256', CONCAT( ISNULL([4444], ''), '||' ,ISNULL(aa, ''), '||' ,ISNULL(fund, ''), '||' ,ISNULL(gl, ''), '||' ,ISNULL(llll, ''), '||' ,ISNULL(o, ''), '||' ,ISNULL(obbb, ''), '||' ,ISNULL(pp, ''), '||' ,ISNULL(ppss, ''), '||' ,ISNULL(s, '')))) PERSISTED -- change hash: the 7 tracked description columns ,hash_diff AS CONVERT(VARBINARY(32), HASHBYTES('SHA2_256', CONCAT( ISNULL([4444_desc], ''), '||' ,ISNULL(aa_desc, ''), '||' ,ISNULL(gl_desc, ''), '||' ,ISNULL(llll_desc, ''), '||' ,ISNULL(obbb_desc, ''), '||' ,ISNULL(pprr_desc, ''), '||' ,ISNULL(s_desc, '')))) PERSISTED);-- One current row per identity. Also fails the load loudly if the-- source ever produces two current versions of the same key.-- INCLUDE (hash_diff) covers the Step 3 existence check.CREATE UNIQUE INDEX UX_Account_current ON dim.Account (hash_key) INCLUDE (hash_diff) WHERE is_current = 1;SET IDENTITY_INSERT dim.Account ON;INSERT INTO dim.Account ( id ,[4444] ,[4444_desc] ,aa ,aa_desc ,fund ,gl ,gl_desc ,llll ,llll_desc ,o ,obbb ,obbb_desc ,pp ,ppss ,pprr_desc ,s ,s_desc ,effective_start_datetime ,effective_end_datetime ,is_current)VALUES ( -1 ,'Unknown' --you can be more descriptive with these unknowns if you want ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'1900-01-01' ,'9999-12-31 23:59:59' ,1);SET IDENTITY_INSERT dim.Account OFF;CREATE OR ALTER PROCEDURE dim.usp_Load_AccountASBEGIN DECLARE @now DATETIME = SYSDATETIME(); DECLARE @end DATETIME = DATEADD(MINUTE, -1, @now); -- expired rows end 1 min before successor starts DECLARE @future DATETIME = '9999-12-31 23:59:59';/*------------------------------------------------------------Step 1. Stage distinct source rows (outside the transaction to keep the write transaction short).------------------------------------------------------------*/ DROP TABLE IF EXISTS #src; ;WITH src_cast AS ( SELECT DISTINCT CAST([4444] AS VARCHAR(10)) AS [4444] ,CAST([4444 Desc] AS VARCHAR(550)) AS [4444_desc] ,CAST([AA] AS VARCHAR(10)) AS aa ,CAST([AA Desc] AS VARCHAR(550)) AS aa_desc ,CAST([Fund] AS VARCHAR(10)) AS fund ,CAST([GL] AS VARCHAR(10)) AS gl ,CAST([GL Desc] AS VARCHAR(550)) AS gl_desc ,CAST([LLLL] AS VARCHAR(10)) AS llll ,CAST([LLLL Desc] AS VARCHAR(550)) AS llll_desc ,CAST([O] AS VARCHAR(10)) AS o ,CAST([OBBB] AS VARCHAR(10)) AS obbb ,CAST([OBBB Desc] AS VARCHAR(550)) AS obbb_desc ,CAST([PP] AS VARCHAR(10)) AS pp ,CAST([PPSS] AS VARCHAR(10)) AS ppss ,CAST([PPRR Desc] AS VARCHAR(550)) AS pprr_desc ,CAST([S] AS VARCHAR(10)) AS s ,CAST([S Desc] AS VARCHAR(550)) AS s_desc FROM Database.schema.[RawTable] ) SELECT [4444] ,[4444_desc] ,aa ,aa_desc ,fund ,gl ,gl_desc ,llll ,llll_desc ,o ,obbb ,obbb_desc ,pp ,ppss ,pprr_desc ,s ,s_desc -- expressions MUST match dim.Account's computed columns exactly ,CONVERT(VARBINARY(32), HASHBYTES('SHA2_256', CONCAT( ISNULL([4444], ''), '||' ,ISNULL(aa, ''), '||' ,ISNULL(fund, ''), '||' ,ISNULL(gl, ''), '||' ,ISNULL(llll, ''), '||' ,ISNULL(o, ''), '||' ,ISNULL(obbb, ''), '||' ,ISNULL(pp, ''), '||' ,ISNULL(ppss, ''), '||' ,ISNULL(s, '')))) AS hash_key ,CONVERT(VARBINARY(32), HASHBYTES('SHA2_256', CONCAT( ISNULL([4444_desc], ''), '||' ,ISNULL(aa_desc, ''), '||' ,ISNULL(gl_desc, ''), '||' ,ISNULL(llll_desc, ''), '||' ,ISNULL(obbb_desc, ''), '||' ,ISNULL(pprr_desc, ''), '||' ,ISNULL(s_desc, '')))) AS hash_diff INTO #src FROM src_cast; -- index the staged rows: read 3 times on hash_key below CREATE CLUSTERED INDEX CIX_src ON #src (hash_key);/*------------------------------------------------------------Steps 2 to 4 mutate the dimension. Run them atomically so a failure can never leave a key with zero current rows.------------------------------------------------------------*/ BEGIN TRY BEGIN TRANSACTION; /*-------------------------------------------------------- Step 2. Expire current rows whose identity still exists in the source but whose description set changed. --------------------------------------------------------*/ UPDATE d SET effective_end_datetime = @end ,is_current = 0 FROM dim.Account d INNER JOIN #src s ON s.hash_key = d.hash_key WHERE d.is_current = 1 AND d.id <> -1 AND s.hash_diff <> d.hash_diff; /*-------------------------------------------------------- Step 3. Insert brand new identities and new versions of the rows just expired in Step 2. A current row matching on both hashes is unchanged and skipped. --------------------------------------------------------*/ INSERT INTO dim.Account ( [4444] ,[4444_desc] ,aa ,aa_desc ,fund ,gl ,gl_desc ,llll ,llll_desc ,o ,obbb ,obbb_desc ,pp ,ppss ,pprr_desc ,s ,s_desc ,effective_start_datetime ,effective_end_datetime ,is_current ) SELECT s.[4444] ,s.[4444_desc] ,s.aa ,s.aa_desc ,s.fund ,s.gl ,s.gl_desc ,s.llll ,s.llll_desc ,s.o ,s.obbb ,s.obbb_desc ,s.pp ,s.ppss ,s.pprr_desc ,s.s ,s.s_desc ,@now ,@future ,1 FROM #src s WHERE NOT EXISTS ( SELECT 1 FROM dim.Account d WHERE d.is_current = 1 AND d.id <> -1 AND d.hash_key = s.hash_key AND d.hash_diff = s.hash_diff ); /*-------------------------------------------------------- Step 4. Expire current rows whose identity has dropped out of the source entirely. --------------------------------------------------------*/ UPDATE d SET effective_end_datetime = @end ,is_current = 0 FROM dim.Account d WHERE d.is_current = 1 AND d.id <> -1 AND NOT EXISTS ( SELECT 1 FROM #src s WHERE s.hash_key = d.hash_key ); COMMIT TRANSACTION; END TRY BEGIN CATCH IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION; THROW; END CATCH; DROP TABLE IF EXISTS #src;END;
/*--------------------------------------------------------FOR FABRIC DATA WAREHOUSES--------------------------------------------------------*/DROP TABLE IF EXISTS dim.Account; CREATE TABLE dim.Account ( id BIGINT NOT NULL ,[4444] VARCHAR(10) NULL ,[4444_desc] VARCHAR(550) NULL ,aa VARCHAR(10) NULL ,aa_desc VARCHAR(550) NULL ,fund VARCHAR(10) NULL ,gl VARCHAR(10) NULL ,gl_desc VARCHAR(550) NULL ,llll VARCHAR(10) NULL ,llll_desc VARCHAR(550) NULL ,o VARCHAR(10) NULL ,obbb VARCHAR(10) NULL ,obbb_desc VARCHAR(550) NULL ,pp VARCHAR(10) NULL ,ppss VARCHAR(10) NULL ,pprr_desc VARCHAR(550) NULL ,s VARCHAR(10) NULL ,s_desc VARCHAR(550) NULL ,effective_start_datetime DATETIME2(6) NOT NULL ,effective_end_datetime DATETIME2(6) NOT NULL ,is_current BIT NOT NULL ,hash_key VARBINARY(32) NOT NULL -- identity columns (the 10 codes) ,hash_diff VARBINARY(32) NOT NULL -- tracked description columns (the 7 descs)); -- Optimizer hint only. NOT ENFORCED means the engine will NOT-- reject duplicate ids, so the load logic must keep id unique.-- id genuinely is unique here, so this hint is safe to declare.-- Note: we do NOT declare a unique constraint on hash_key, because-- by design multiple versions of one identity share a hash_key.ALTER TABLE dim.Account ADD CONSTRAINT PK_Account PRIMARY KEY NONCLUSTERED (id) NOT ENFORCED;INSERT INTO dim.Account ( id ,[4444] ,[4444_desc] ,aa ,aa_desc ,fund ,gl ,gl_desc ,llll ,llll_desc ,o ,obbb ,obbb_desc ,pp ,ppss ,pprr_desc ,s ,s_desc ,effective_start_datetime ,effective_end_datetime ,is_current ,hash_key ,hash_diff)SELECT -1 ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'Unknown' ,'1900-01-01' ,'9999-12-31 23:59:59.999999' ,1 ,CONVERT(VARBINARY(32), HASHBYTES('SHA2_256', CONCAT( 'Unknown', '||', 'Unknown', '||', 'Unknown', '||', 'Unknown', '||', 'Unknown' ,'||', 'Unknown', '||', 'Unknown', '||', 'Unknown', '||', 'Unknown', '||', 'Unknown'))) ,CONVERT(VARBINARY(32), HASHBYTES('SHA2_256', CONCAT( 'Unknown', '||', 'Unknown', '||', 'Unknown', '||', 'Unknown' ,'||', 'Unknown', '||', 'Unknown', '||', 'Unknown')));CREATE OR ALTER PROCEDURE dim.usp_Load_AccountASBEGIN DECLARE @now DATETIME2(6) = SYSDATETIME(); DECLARE @end DATETIME2(6) = DATEADD(MINUTE, -1, SYSDATETIME()); -- expired rows end 1 min before successor starts DECLARE @future DATETIME2(6) = '9999-12-31 23:59:59.999999'; /*------------------------------------------------------------ Step 1. Stage distinct source rows, hash them, and dedup to one row per identity (hash_key). COLLATE DATABASE_DEFAULT normalizes lakehouse collation so the hashes are byte comparable. The dedup replaces the safety the filtered unique index used to provide, since NOT ENFORCED constraints cannot. ------------------------------------------------------------*/ DROP TABLE IF EXISTS #src; ;WITH src_cast AS ( SELECT DISTINCT CAST([4444] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS [4444] ,CAST([4444 Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS [4444_desc] ,CAST([AA] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS aa ,CAST([AA Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS aa_desc ,CAST([Fund] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS fund ,CAST([GL] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS gl ,CAST([GL Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS gl_desc ,CAST([LLLL] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS llll ,CAST([LLLL Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS llll_desc ,CAST([O] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS o ,CAST([OBBB] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS obbb ,CAST([OBBB Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS obbb_desc ,CAST([PP] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS pp ,CAST([PPSS] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS ppss ,CAST([PPRR Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS pprr_desc ,CAST([S] AS VARCHAR(10)) COLLATE DATABASE_DEFAULT AS s ,CAST([S Desc] AS VARCHAR(550)) COLLATE DATABASE_DEFAULT AS s_desc FROM LakehouseName.dbo.[RawTable] ) ,src_hashed AS ( SELECT [4444] ,[4444_desc] ,aa ,aa_desc ,fund ,gl ,gl_desc ,llll ,llll_desc ,o ,obbb ,obbb_desc ,pp ,ppss ,pprr_desc ,s ,s_desc ,CONVERT(VARBINARY(32), HASHBYTES('SHA2_256', CONCAT( ISNULL([4444], ''), '||' ,ISNULL(aa, ''), '||' ,ISNULL(fund, ''), '||' ,ISNULL(gl, ''), '||' ,ISNULL(llll, ''), '||' ,ISNULL(o, ''), '||' ,ISNULL(obbb, ''), '||' ,ISNULL(pp, ''), '||' ,ISNULL(ppss, ''), '||' ,ISNULL(s, '')))) AS hash_key ,CONVERT(VARBINARY(32), HASHBYTES('SHA2_256', CONCAT( ISNULL([4444_desc], ''), '||' ,ISNULL(aa_desc, ''), '||' ,ISNULL(gl_desc, ''), '||' ,ISNULL(llll_desc, ''), '||' ,ISNULL(obbb_desc, ''), '||' ,ISNULL(pprr_desc, ''), '||' ,ISNULL(s_desc, '')))) AS hash_diff FROM src_cast ) SELECT [4444] ,[4444_desc] ,aa ,aa_desc ,fund ,gl ,gl_desc ,llll ,llll_desc ,o ,obbb ,obbb_desc ,pp ,ppss ,pprr_desc ,s ,s_desc ,hash_key ,hash_diff INTO #src FROM ( SELECT * -- one winner per identity; ORDER BY makes the choice deterministic ,ROW_NUMBER() OVER (PARTITION BY hash_key ORDER BY hash_diff) AS rn FROM src_hashed ) z WHERE rn = 1; /*------------------------------------------------------------ Steps 2 to 4 mutate the dimension. Run them atomically so a failure can never leave a key with zero current rows. ------------------------------------------------------------*/ BEGIN TRY BEGIN TRANSACTION; /*-------------------------------------------------------- Step 2. Expire current rows whose identity still exists in the source but whose description set changed. --------------------------------------------------------*/ UPDATE d SET effective_end_datetime = @end ,is_current = 0 FROM dim.Account d INNER JOIN #src s ON s.hash_key = d.hash_key WHERE d.is_current = 1 AND d.id <> -1 AND s.hash_diff <> d.hash_diff; /*-------------------------------------------------------- Step 3. Insert brand new identities and new versions of the rows just expired in Step 2. A current row matching on both hashes is unchanged and skipped. New surrogate ids are MAX(existing positive id) + ROW_NUMBER(). Read the high-water mark inside the transaction. NOTE: this is concurrency safe only for a single serialized load. Two overlapping loads could read the same max and assign overlapping ids. This should not be an issue in batch loading, but if you are doing streaming or multiple inserts at a time, you should use a different ID process. I am not using IDENTITY here because I still want that default -1 ID option. --------------------------------------------------------*/ DECLARE @maxid BIGINT = (SELECT ISNULL(MAX(id), 0) FROM dim.Account WHERE id > 0); INSERT INTO dim.Account ( id ,[4444] ,[4444_desc] ,aa ,aa_desc ,fund ,gl ,gl_desc ,llll ,llll_desc ,o ,obbb ,obbb_desc ,pp ,ppss ,pprr_desc ,s ,s_desc ,effective_start_datetime ,effective_end_datetime ,is_current ,hash_key ,hash_diff ) SELECT @maxid + ROW_NUMBER() OVER (ORDER BY s.hash_key, s.hash_diff) ,s.[4444] ,s.[4444_desc] ,s.aa ,s.aa_desc ,s.fund ,s.gl ,s.gl_desc ,s.llll ,s.llll_desc ,s.o ,s.obbb ,s.obbb_desc ,s.pp ,s.ppss ,s.pprr_desc ,s.s ,s.s_desc ,@now ,@future ,1 ,s.hash_key ,s.hash_diff FROM #src s WHERE NOT EXISTS ( SELECT 1 FROM dim.Account d WHERE d.is_current = 1 AND d.id <> -1 AND d.hash_key = s.hash_key AND d.hash_diff = s.hash_diff ); /*-------------------------------------------------------- Step 4. Expire current rows whose identity has dropped out of the source entirely. --------------------------------------------------------*/ UPDATE d SET effective_end_datetime = @end ,is_current = 0 FROM dim.Account d WHERE d.is_current = 1 AND d.id <> -1 AND NOT EXISTS ( SELECT 1 FROM #src s WHERE s.hash_key = d.hash_key ); COMMIT TRANSACTION; END TRY BEGIN CATCH IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION; THROW; END CATCH; DROP TABLE IF EXISTS #src; END;