
Slowly changing dimensions using T-SQL MERGE
Posted Thursday, June 30, 2011 3:39 AM


Old Hand


Group: General Forum Members
Last Login: Saturday, June 28, 2014 9:31 AM
Points: 336, Visits: 244
Great article, just what I needed. Thanks

I made a slight modification to the SCD2 variant, just as a safeguard, to avoid erroneous data when the procedure is executed on an earlier day than the last update.

I admit this does not happen when using GETDATE() to set @Today, but my procedure has a date input, so I wanted to be sure no updates are made to records which have ValidFrom > @Today.
So I added a condition to the WHEN MATCHED clause:
WHEN MATCHED
    AND IsCurrent = 1
    AND ValidFrom <= @Today --Never allow updates to overwrite future records
    AND (ISNULL(DST.ClientName,'') <> ISNULL(SRC.ClientName,'')
    ...
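
For context, this is roughly where the extra predicate sits in the full Type 2 statement. It is only my own paraphrase of the pattern, not the article's exact code, and the table and column names (Client_SCD2, Client_Staging, ClientID, ClientName) are illustrative:

    -- Sketch only: expire the changed current row inside the MERGE, then re-insert
    -- the new version from the OUTPUT of that MERGE.
    DECLARE @Today date = GETDATE();

    INSERT INTO dbo.Client_SCD2 (ClientID, ClientName, ValidFrom, ValidTo, IsCurrent)
    SELECT ClientID, ClientName, @Today, NULL, 1
    FROM
    (
        MERGE dbo.Client_SCD2 AS DST
        USING dbo.Client_Staging AS SRC
            ON SRC.ClientID = DST.ClientID
        WHEN MATCHED
            AND DST.IsCurrent = 1
            AND DST.ValidFrom <= @Today   -- the guard: never overwrite future records
            AND ISNULL(DST.ClientName,'') <> ISNULL(SRC.ClientName,'')
            THEN UPDATE SET DST.IsCurrent = 0,
                            DST.ValidTo   = @Today
        WHEN NOT MATCHED BY TARGET
            THEN INSERT (ClientID, ClientName, ValidFrom, ValidTo, IsCurrent)
                 VALUES (SRC.ClientID, SRC.ClientName, @Today, NULL, 1)
        OUTPUT $action, SRC.ClientID, SRC.ClientName
    ) AS MRG (MergeAction, ClientID, ClientName)
    WHERE MergeAction = 'UPDATE';   -- re-insert changed rows as the new current version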

Thanks once more
Herman
Post #1134300
Posted Thursday, June 30, 2011 7:32 AM
Grasshopper


Group: General Forum Members
Last Login: Thursday, March 6, 2014 10:14 AM
Points: 11, Visits: 101
Thanks for that suggestion, amenjonathan. What you mention is precisely what we are doing. We bring a copy of the production source tables onto a staging server and then use those copies to build our warehouse tables. For a number of reasons we can't use replication, log shipping, etc., mostly per our agreement with the software vendor. Many of these tables are very large, so bringing over entire copies of them each day is undesirable and would take too long; it is better to update/sync the existing tables with production.

I should also mention that we use some of these copied tables for live reporting and to audit/track changes to some of the more important info, so they need to be updated throughout the day to stay in sync. Our production system stays up 24/7, so we have to use means that won't cause any performance issues with it. Since most of the data does not change from day to day, we thought the MERGE command would be very useful for updating those table copies on the staging server. It seemed to be a perfect fit, but it just wouldn't perform well enough.

As for the use of CHECKSUM, fortunately we have a last-modified datetime field on every record, so we really don't need it. Performance-wise, the inserts and updates run very quickly because they are easily identified. The slow part of syncing is making sure you delete the records in the copy that have also been deleted on production. In order to do that I have to pull down a list of all the primary keys from the production table and compare them to the copy to make sure they still exist.

I have sync procedures set up for each table, with parameters to control the types of updates allowed. We have a job that runs every 10 minutes to keep them all in sync for inserts and updates. Once every 3 hours it will also perform any deletes, since that takes about 4 times as long to run as a sync without them. So far I've been amazed at how well it all works without impacting production at all; we ran traces for quite a while just to make sure. What I found out in doing this is just how differently a query will perform when run locally versus over a linked server. There can be huge differences even if the queries are simple.
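
For anyone curious, the delete pass looks roughly like this. It is a simplified sketch rather than our actual procedure, and the linked server name (PRODSRV) and table names are made up:

    -- Pull the current production keys into a local temp table first, so the
    -- anti-join runs locally instead of across the linked server.
    SELECT ClientID
    INTO   #ProdKeys
    FROM   PRODSRV.ProdDB.dbo.Client;

    -- Remove rows from the local copy that no longer exist in production.
    DELETE STG
    FROM   dbo.Client AS STG
    WHERE  NOT EXISTS (SELECT 1 FROM #ProdKeys AS PK WHERE PK.ClientID = STG.ClientID);

    DROP TABLE #ProdKeys;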
Post #1134445
Posted Thursday, June 30, 2011 8:01 AM
SSC Journeyman


Group: General Forum Members
Last Login: Tuesday, June 24, 2014 4:48 AM
Points: 81, Visits: 898
Hi dwalker,

This is extremely interesting, and thanks for sharing.
I will try and find the time to look closer at this particular problem, as I had no idea that the speed difference was so great.

Adam
Post #1134465
Posted Thursday, July 7, 2011 9:38 AM
Forum Newbie


Group: General Forum Members
Last Login: Friday, July 29, 2011 8:41 AM
Points: 1, Visits: 11
Adam,

Fantastic article. Your writing style is very concise and informative. I am forwarding this to all my teams, great job!

Scott
Post #1138239
Posted Thursday, December 15, 2011 3:51 PM
SSC Journeyman


Group: General Forum Members
Last Login: Friday, July 25, 2014 9:13 AM
Points: 92, Visits: 658
Adam,

Thank you very much for this article. I have been searching around for a more intelligent way to handle massive, bulk-loaded datasets of user information. We have customers that basically send us entire refreshes of their data at daily and weekly frequencies and others that only send us updates. I need to speed up this process tremendously and stop performing delete-then-insert operations on entire groups of users. Instead, I need to post updates only when data really changes from load to load.

I was particularly thrilled by some of the speed enhancements suggested by others, such as using CHECKSUM() over the range of columns instead of string compares everywhere. I will be testing it both ways to see if I can quantify the difference in performance. Some of our data loads are small, 2,000 records or fewer, but we have some problem children in the 200,000+ range.
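
To make sure I understand the suggestion, this is the sort of thing I'll be testing. It is only a sketch with invented table and column names, not code from the article:

    -- Type 1 style refresh: compare one CHECKSUM per row instead of comparing every
    -- column pair individually. Note that CHECKSUM can occasionally return the same
    -- value for different inputs, so a changed row could be missed; HASHBYTES is the
    -- stricter alternative if that risk matters.
    MERGE dbo.Users AS DST
    USING dbo.Users_Staging AS SRC
        ON SRC.UserID = DST.UserID
    WHEN MATCHED
        AND CHECKSUM(DST.FirstName, DST.LastName, DST.Email)
         <> CHECKSUM(SRC.FirstName, SRC.LastName, SRC.Email)
        THEN UPDATE SET DST.FirstName = SRC.FirstName,
                        DST.LastName  = SRC.LastName,
                        DST.Email     = SRC.Email
    WHEN NOT MATCHED BY TARGET
        THEN INSERT (UserID, FirstName, LastName, Email)
             VALUES (SRC.UserID, SRC.FirstName, SRC.LastName, SRC.Email);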

Very good article, and I will make a point to search for more of these from you...your coding style makes it very easy to read and understand the T-SQL code.

Larry
Post #1222794
Posted Monday, December 19, 2011 8:51 AM
SSC Journeyman


Group: General Forum Members
Last Login: Friday, July 25, 2014 9:13 AM
Points: 92, Visits: 658
In trying to use this article to tailor a solution for my organization, I have encountered a bit of a snag.

Our "user" information is kept in two tables which we call UserAuthentication and Users. The UserAuthentication table contains the IDENTITY column and it gets created first, its IDENTITY value is captured after the INSERT with a SCOPE_IDENTITY() and then this value is used when creating a new Users record using an INSERT statement.

Assuming that I have all of the records to be merged in a staging table which contains fields destined for both the UserAuthentication and Users tables, how might I most efficiently utilize the MERGE statements to create records in both tables? Should I structure a stored procedure to handle this as MERGE against the UserAuthentication table first, then follow it with a MERGE against the Users table?

Would another solution be to perform the MERGE against the UserAuthentication table, then capture the resulting INSERT records into a temp table that I can use as a basis for INSERTS on the Users table?
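
To make the second option concrete, this is roughly what I'm picturing; every table and column name here is invented for illustration. The useful detail is that the OUTPUT clause of a MERGE, unlike that of a plain INSERT, can reference source columns, so the new identity values can be captured next to the staging key:

    DECLARE @NewAuth TABLE (StagingID int, AuthenticationID int);

    -- Create the UserAuthentication rows and capture each new identity value
    -- alongside the staging key it came from.
    MERGE dbo.UserAuthentication AS DST
    USING dbo.UserStaging AS SRC
        ON SRC.LoginName = DST.LoginName
    WHEN NOT MATCHED BY TARGET
        THEN INSERT (LoginName, PasswordHash)
             VALUES (SRC.LoginName, SRC.PasswordHash)
    OUTPUT SRC.StagingID, INSERTED.AuthenticationID
    INTO @NewAuth (StagingID, AuthenticationID);

    -- Create the matching Users rows from the captured mapping.
    INSERT INTO dbo.Users (AuthenticationID, FirstName, LastName)
    SELECT NA.AuthenticationID, ST.FirstName, ST.LastName
    FROM   @NewAuth AS NA
    JOIN   dbo.UserStaging AS ST ON ST.StagingID = NA.StagingID;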
Post #1223866
Posted Monday, July 16, 2012 12:59 PM
Valued Member


Group: General Forum Members
Last Login: Wednesday, December 4, 2013 3:54 PM
Points: 67, Visits: 230
In the Type 4 example, I see that you have a "ValidFrom" column in Client_SCD4_History. But it never gets populated, since Client_SCD1 doesn't have a ValidFrom date.

I see you are populating the ValidTo date in the type 4 history table, which is good.

Presumably, if you're using Type 4, all records in the main table (Client_SCD1) are "currently" valid. It would be nice to see the code that populates the ValidFrom date in the Type 4 history table. But since the Type 4 approach is built from the Type 1 table plus the history table, the ValidFrom date from the last time the record was updated hasn't been kept anywhere; the Type 1 table doesn't have it.

Am I right that there's something missing here?

David Walker

Post #1330340
Posted Tuesday, July 17, 2012 3:44 AM
SSC Journeyman


Group: General Forum Members
Last Login: Tuesday, June 24, 2014 4:48 AM
Points: 81, Visits: 898
You are quite right, David, this should have been added to ensure completeness. Thanks for noticing this.
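
For anyone following the thread, the missing piece could look something like the sketch below. It is not the article's code: it assumes Client_SCD1 is extended with its own ValidFrom column (set on every insert and update), and Client_Staging stands in for the source:

    DECLARE @Today date = GETDATE();

    -- Type 4: the pre-update values, including the ValidFrom carried on the main
    -- table, are routed into the history table through the MERGE OUTPUT clause.
    INSERT INTO dbo.Client_SCD4_History (ClientID, ClientName, ValidFrom, ValidTo)
    SELECT ClientID, ClientName, ValidFrom, @Today
    FROM
    (
        MERGE dbo.Client_SCD1 AS DST
        USING dbo.Client_Staging AS SRC
            ON SRC.ClientID = DST.ClientID
        WHEN MATCHED
            AND ISNULL(DST.ClientName,'') <> ISNULL(SRC.ClientName,'')
            THEN UPDATE SET DST.ClientName = SRC.ClientName,
                            DST.ValidFrom  = @Today
        WHEN NOT MATCHED BY TARGET
            THEN INSERT (ClientID, ClientName, ValidFrom)
                 VALUES (SRC.ClientID, SRC.ClientName, @Today)
        OUTPUT $action, DELETED.ClientID, DELETED.ClientName, DELETED.ValidFrom
    ) AS MRG (MergeAction, ClientID, ClientName, ValidFrom)
    WHERE MergeAction = 'UPDATE';   -- only updated rows produce a history record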
Post #1330604
Posted Friday, November 2, 2012 7:25 AM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Monday, July 21, 2014 3:46 AM
Points: 115, Visits: 472
Hi everyone, and thank you Adam for this great article.

I'm wondering if it is possible to have a mixed approach in one table, something like a Type 1 and Type 2 SCD in the same table. Suppose that in the customer dimension you have a couple of columns that change very often and some others that rarely do. This is not a typical "slowly changing" scenario, but if for any reason you have to face this situation, what would you do?

I would like to simply overwrite the columns that frequently change (Type 1) and record historical values for the other columns (Type 2). I think this is possible with the SSIS Slowly Changing Dimension transformation, where you can mark each column as a "Changing Attribute", "Historical Attribute" or "Fixed Attribute", but because of its performance issues it is not an option for me.

Any ideas or clues about how to accomplish this with the MERGE statement? At the moment I've created two tables, one to store the columns that often change and one to be treated as a classic SCD.
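
What I would like to achieve in a single table is roughly sketched below; the Customer table and its columns are just examples, with Phone standing in for a Type 1 (overwrite) attribute and Address for a Type 2 (history) attribute:

    DECLARE @Today date = GETDATE();

    -- Type 2 part: when the historical attribute changes, expire the current row
    -- inside the MERGE and re-insert the new version from its OUTPUT.
    INSERT INTO dbo.Customer (CustomerID, Phone, Address, ValidFrom, ValidTo, IsCurrent)
    SELECT CustomerID, Phone, Address, @Today, NULL, 1
    FROM
    (
        MERGE dbo.Customer AS DST
        USING dbo.Customer_Staging AS SRC
            ON SRC.CustomerID = DST.CustomerID
        WHEN MATCHED
            AND DST.IsCurrent = 1
            AND ISNULL(DST.Address,'') <> ISNULL(SRC.Address,'')
            THEN UPDATE SET DST.IsCurrent = 0,
                            DST.ValidTo   = @Today
        WHEN NOT MATCHED BY TARGET
            THEN INSERT (CustomerID, Phone, Address, ValidFrom, ValidTo, IsCurrent)
                 VALUES (SRC.CustomerID, SRC.Phone, SRC.Address, @Today, NULL, 1)
        OUTPUT $action, SRC.CustomerID, SRC.Phone, SRC.Address
    ) AS MRG (MergeAction, CustomerID, Phone, Address)
    WHERE MergeAction = 'UPDATE';

    -- Type 1 part: overwrite the changing attribute in place, without touching the
    -- validity dates (run against all versions instead if history must be rewritten).
    UPDATE DST
    SET    DST.Phone = SRC.Phone
    FROM   dbo.Customer AS DST
    JOIN   dbo.Customer_Staging AS SRC ON SRC.CustomerID = DST.CustomerID
    WHERE  DST.IsCurrent = 1
      AND  ISNULL(DST.Phone,'') <> ISNULL(SRC.Phone,'');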

Any comment would be appreciated.


Paul Hernández
http://hernandezpaul.wordpress.com/
https://twitter.com/paul_eng
Post #1380354
Posted Friday, April 12, 2013 10:45 AM
Forum Newbie


Group: General Forum Members
Last Login: Thursday, October 31, 2013 11:59 AM
Points: 1, Visits: 9
Found this when looking for good examples. Great article! I noticed that one of the fields is left out of the WHEN MATCHED clause in the SCD2 version. "Country" is there, but not "County". If you modify any of the counties in the source table, it won't create a new record.
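
For reference, the compare block just needs the extra column added alongside the others, something like this (only the predicate, following the same ISNULL-style compare shown earlier in the thread):

    WHEN MATCHED
        AND DST.IsCurrent = 1
        AND (   ISNULL(DST.ClientName,'') <> ISNULL(SRC.ClientName,'')
             OR ISNULL(DST.Country,'')    <> ISNULL(SRC.Country,'')
             OR ISNULL(DST.County,'')     <> ISNULL(SRC.County,''))   -- the missing column
        ...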
Post #1441795