A faster way to prepare dimensional databases
Posted Tuesday, February 23, 2010 10:05 PM
SSC Journeyman

Group: General Forum Members
Last Login: Saturday, December 6, 2014 9:55 AM
Points: 81, Visits: 908
Comments posted to this topic are about the item A faster way to prepare dimensional databases
Post #871671
Posted Wednesday, February 24, 2010 2:32 AM
Forum Newbie

Group: General Forum Members
Last Login: Monday, November 10, 2014 10:28 PM
Points: 2, Visits: 83
Although I'm glad to learn new ways of loading a multidimensional database structure, I do feel the solution is a bit cumbersome.

This could of course be because you don't have control over the complete ETL chain.

Our DW also has a normalized database and a multidimensional database. The normalized database is loaded daily from 20 different sources, and we perform incremental updates of the two main fact tables, Sales and GL.

From text file to updated transaction table, with about 3 million updated rows in each table every night, we have a processing time of about 30 minutes.

The logic rests on the following:
1. The normalized dimension tables create and hold the surrogate keys for items, accounts, etc. This means surrogate keys are fixed and not recreated every night.
2. The normalized fact tables carry surrogate keys for (a) every dimension member on the row and (b) the fact row itself.
3. All normalized tables have a "LatestUpdated" flag which is set when a nightly load updates the row.
4. Each load step is logged, saving a "last successful run" timestamp.

The DW update then does this:
1. Truncate and reload all dimension tables. This works because the normalized database keeps all the surrogate keys.
2. Incrementally update the fact table by comparing "LatestUpdated" with "LastSuccessfulRun". A minimum of rows is touched, and the surrogate keys are already in place (see the sketch below).
3. Report tables/cube loads follow...

I think this approach is simpler and less error-prone.

It does of course require that you control how the normalized database is loaded.
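A minimal T-SQL sketch of step 2, assuming hypothetical names (norm.FactSales, dw.FactSales, etl.LoadLog, FactSK, LatestUpdated) since the post does not show the actual schema; a delete-and-reinsert of only the changed rows stands in for whatever update strategy is used in practice:

-- Hypothetical log table recording when each load step last finished successfully.
DECLARE @LastSuccessfulRun datetime;

SELECT @LastSuccessfulRun = MAX(RunFinished)
FROM etl.LoadLog
WHERE StepName = 'FactSales' AND Succeeded = 1;

-- Remove the prior versions of any rows touched since the last successful run,
-- then reinsert them from the normalized layer, where the surrogate keys
-- are already assigned.
DELETE dw.FactSales
WHERE FactSK IN (SELECT FactSK
                 FROM norm.FactSales
                 WHERE LatestUpdated > @LastSuccessfulRun);

INSERT INTO dw.FactSales (FactSK, DateSK, ItemSK, AccountSK, Quantity, Amount)
SELECT FactSK, DateSK, ItemSK, AccountSK, Quantity, Amount
FROM norm.FactSales
WHERE LatestUpdated > @LastSuccessfulRun;

Because the surrogate keys come from the normalized layer, no key lookups are needed during the DW load itself.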
Post #871788
Posted Wednesday, February 24, 2010 2:58 AM
Forum Newbie

Group: General Forum Members
Last Login: Tuesday, April 20, 2010 6:15 AM
Points: 5, Visits: 104
Actually, I was waiting a long time to read this type of article to get the basic ideas. Really a very good article; thanks a lot.
Post #871801
Posted Wednesday, February 24, 2010 7:24 AM
Old Hand

Group: General Forum Members
Last Login: Tuesday, December 7, 2010 8:19 AM
Points: 327, Visits: 123
This is one of the best nuts-and-bolts SSAS articles I've come across.

Question for the Group:
Is there a "test" dataset available for creating an SSAS dimensional database? All of the books, articles, and commentary tell you to load the Adventure Works cube and move on. This is a bit like putting the cart before the horse; it is already a completed solution.

The ideal dataset would have semi-dirty data that requires some cleanup before it can be loaded into a dimensional DB, something that is commonplace everywhere except in AdventureWorks.

Please correct me if I am off the mark on AdventureWorks DW.
Post #871907
Posted Wednesday, February 24, 2010 8:06 AM
SSCrazy

Group: General Forum Members
Last Login: Friday, December 12, 2014 8:24 AM
Points: 2,818, Visits: 2,567
Very good article. I am working in a very similar situation. Eventually I will gain more control over the ETL process. But I took a couple of good ideas from this article that will definitely help me in the short term. Thanks.
Post #871931
Posted Wednesday, February 24, 2010 8:28 AM
Grasshopper

Group: General Forum Members
Last Login: Friday, July 11, 2014 7:34 AM
Points: 22, Visits: 52
Adam,

Great article! Very succinct, and it gets to the point about how important it is for a DW and a DM to coexist.

My question is about growth. How is the process working after say 30 days and 180 days? What has the increased volume done to performance?

Once again, great job.

Jim
Post #871947
Posted Wednesday, February 24, 2010 10:25 AM
Forum Newbie

Group: General Forum Members
Last Login: Friday, December 17, 2010 4:44 AM
Points: 2, Visits: 20
"Insert Into" with a large number of records will run for a long time because SQL Server does the whole insert in one transaction. Use a data pump (Data Flow Task) instead and set its properties to commit every 5,000 records (set both "Rows per Batch" and "Maximum insert commit size" to 5000). You will save hours on your loads. You don't really care about backing out the whole transaction, because if the load fails you will most likely go back to the truncate and start over anyway.

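Outside SSIS, a similar effect (committing every few thousand rows rather than one huge transaction) can be sketched in plain T-SQL with a batched loop; the table names here are hypothetical and the 5000-row batch mirrors the commit size suggested above:

-- Copy staging rows to the target in 5000-row batches; each INSERT commits on
-- its own (autocommit), so a failure loses only the current batch.
DECLARE @BatchSize int, @Rows int;
SET @BatchSize = 5000;
SET @Rows = 1;

WHILE @Rows > 0
BEGIN
    INSERT INTO dbo.FactSales_Target (FactSK, DateSK, Amount)
    SELECT TOP (@BatchSize) s.FactSK, s.DateSK, s.Amount
    FROM dbo.FactSales_Staging AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.FactSales_Target AS t
                      WHERE t.FactSK = s.FactSK);

    SET @Rows = @@ROWCOUNT;  -- stops once every staging row has been copied
END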
Post #872050
Posted Wednesday, February 24, 2010 12:51 PM
Old Hand

Group: General Forum Members
Last Login: Thursday, May 8, 2014 10:08 PM
Points: 358, Visits: 397
The final refinement of the SK lookup table method comes when you add an MD5 checksum column and leave the tables permanently populated. Use these tables to quickly determine update versus insert for only the changed or new records. You can even make one for the fact table itself. Fast incremental loads for the entire process then become possible, and your times will be cut dramatically. I've dealt with source systems that put out flat-file dumps of 500+ million rows daily and used this method to process the deltas in barely an hour.
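A minimal sketch of the checksum comparison, assuming a hypothetical lookup table dw.DimCustomer with a stored RowHash column (varbinary(16), the size of an MD5 digest) and a staging table stg.Customer:

-- Hash the attribute columns of the incoming rows and compare with the stored
-- hash so that only new or genuinely changed records are written.
MERGE dw.DimCustomer AS d
USING (
    SELECT CustomerCode, Name, City,
           HASHBYTES('MD5', ISNULL(Name, '') + '|' + ISNULL(City, '')) AS RowHash
    FROM stg.Customer
) AS s
    ON d.CustomerCode = s.CustomerCode
WHEN MATCHED AND d.RowHash <> s.RowHash THEN
    UPDATE SET d.Name = s.Name,
               d.City = s.City,
               d.RowHash = s.RowHash
WHEN NOT MATCHED THEN
    INSERT (CustomerCode, Name, City, RowHash)
    VALUES (s.CustomerCode, s.Name, s.City, s.RowHash);

The same pattern works for a fact table: store the hash alongside the fact's surrogate key and only touch the rows whose hash differs.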
Post #872216
Posted Wednesday, February 24, 2010 1:59 PM
SSChasing Mays

Group: General Forum Members
Last Login: Friday, January 3, 2014 10:59 AM
Points: 626, Visits: 836
Very impressive article!

I'm just now being asked to transition to some DW work after eons of doing primarily OLTP support.
I'm bookmarking this article!

Thanks so much for your efforts.


Craig Outcalt



Tips for new DBAs: http://www.sqlservercentral.com/articles/Career/64632
My other articles: http://www.sqlservercentral.com/Authors/Articles/Craig_Outcalt/560258
Post #872276
Posted Wednesday, February 24, 2010 3:44 PM
Grasshopper

Group: General Forum Members
Last Login: Thursday, April 11, 2013 10:23 AM
Points: 15, Visits: 35
One technique is to keep all the surrogate keys in a separate fact extension table keyed by the fact id. That way you are either updating a very narrow table or inserting into it.

We always use views on the dimension and fact tables as input to the SSAS cube. That way, you can join the two fact tables in a view to expose the surrogate keys as well (a sketch follows below).
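A minimal sketch of such a view, with hypothetical names: dbo.FactSales holds the measures, dbo.FactSalesKeys is the narrow extension table holding the surrogate keys, and FactID ties them together:

-- The cube reads from this view, so the narrow key table can be updated or
-- inserted into independently of the wider measures table.
CREATE VIEW dbo.vFactSales
AS
SELECT k.DateSK,
       k.CustomerSK,
       k.ProductSK,
       f.Quantity,
       f.Amount
FROM dbo.FactSales AS f
JOIN dbo.FactSalesKeys AS k
    ON k.FactID = f.FactID;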
Post #872355