SQLServerCentral is supported by Red Gate Software Ltd.
 

Populating Fact Tables
Posted Tuesday, December 19, 2006 6:42 PM


SSC Journeyman

Comments posted here are about the content posted at http://www.sqlservercentral.com/columnists/vRainardi/2769.asp
Post #331679
Posted Tuesday, January 2, 2007 6:28 AM
Old Hand

Thank you for an interesting article on loading of fact tables. This is something I might be doing shortly.
Post #333720
Posted Tuesday, January 2, 2007 6:49 AM
SSC Rookie


Thank you, this is very helpful. I am taking on a new assignment and will be building a data warehouse. This is a great example of how to watch for pitfalls.

I understand this is the 3rd in a series. How do I find the preceding two articles?

Thanks, Mike

Post #333727
Posted Tuesday, January 2, 2007 10:00 AM
SSC Journeyman


One of the problems I've run into is the size of the fact tables. This is with a web site doing 70+ million hits per year. I've found that the majority of reporting works on month/year as the major time dimension, so I created a function that changes the date to the first of the month. I don't like the idea of choosing a particular day, but OLAP tools are looking for a date and not just a month/year combo. This cuts the size of the data to 1/30th of what it would be otherwise.

CREATE FUNCTION [dbo].[GetRptDate] (@Date datetime)
RETURNS datetime AS
BEGIN
    -- Truncate @Date to the first day of its month
    -- (months elapsed since the base date 1900-01-01, added back to the base date)
    RETURN DATEADD(month, DATEDIFF(month, 0, @Date), 0)
END

I use this function as part of an ETL job that runs daily.  There is a lot more to it than this, but I wanted to see how other people are dealing with VLFT (Very Large Fact Tables - a new acronym?). Thanks.

Post #333787
Posted Tuesday, January 2, 2007 1:23 PM
Old Hand

Mike - If you click on the author's name at the top of the page, you can see all the articles he has written. It appears that the previous two articles are parts 1 and 2. It also appears that he has written several DW articles, so I will be digging into all of them.

Thanks Vincent!




Michelle
Post #333841
Posted Wednesday, January 3, 2007 3:59 PM
SSC Veteran


Thank you Vincent for taking the time to write this article. It is coherent, concise, and useful. Well done!

In the section titled "Fact Table Partitioning" (near the bottom) I think there are some missing hyperlinks. I would like to read the one about partitioning in SQL 2005. The sentence reads as "This article provides a good overview on the partitioning on SQL Server 2005", but there is no hyperlink on "This". Or at least it's not showing up in my browser (IE7). Can you please reply with the link?

Thank you!




James Stover, McDBA
Post #334188
Posted Thursday, January 4, 2007 11:25 PM


SSC Journeyman


Hi James, the link to the excellent SQL 2005 partitioning article by Kimberly Tripp is http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql90/html/sql2k5partition.asp, and the one on SQL 2000 by Joy Mundy is http://msdn2.microsoft.com/en-us/library/aa902650(SQL.80).aspx

Thank you Sam for the useful date converter function. On Oracle and SQL 2005 we deal with VLFT (good acronym, thx) using table partitioning. On SQL 2000, as there is no real partitioning, summary tables, indexing and purging are our weapons. We had a snapshot fact table with 90m rows and treated this with a summary table (only 300k rows) and purging (keeping daily snapshot data for the last 4 weeks and weekly data for the last 2 years). Luckily the ETL is not too much of a problem as we mainly do inserts and no updates.
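For readers who haven't tried it, a minimal sketch of the monthly table partitioning mentioned above on SQL Server 2005 might look like the following; the table, column names and boundary dates are hypothetical:

```sql
-- Partition function: one partition per month (RANGE RIGHT puts each
-- boundary date in the partition to its right)
CREATE PARTITION FUNCTION pfMonthly (datetime)
AS RANGE RIGHT FOR VALUES ('2006-01-01', '2006-02-01', '2006-03-01');

-- Partition scheme: map every partition to PRIMARY for simplicity
CREATE PARTITION SCHEME psMonthly
AS PARTITION pfMonthly ALL TO ([PRIMARY]);

-- Create the snapshot fact table on the scheme, partitioned by snapshot date
CREATE TABLE FactSnapshot (
    snapshot_date datetime NOT NULL,
    customer_key  int      NOT NULL,
    balance       money    NOT NULL
) ON psMonthly (snapshot_date);
```

In practice each partition would usually go to its own filegroup rather than all to PRIMARY, which is what makes purging old months cheap (switch the partition out instead of deleting rows).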

Thank you Michelle for pointing out the two articles for Mike. The URL for my articles is http://www.sqlservercentral.com/columnists/vRainardi. Mike, Data Warehouse Loading is the first in the series, where I describe the general things; then I wrote one article focusing on dimension tables and another focusing on fact tables. I haven't covered audit, control, restore, MDB and EAI (topics 7, 8, 9, 10, 11 - see the planned content in the DW Loading article) as I've been busy writing a book and another article on using a DW for CRM, but if a lot of people want it, please tell me and I'll write it for you guys.

Post #334567
Posted Saturday, January 6, 2007 10:01 AM


SSC Journeyman


Hi,

the article is good, but I missed any use of the new possibilities of SQL Server 2005 SSIS. In SQL 2000 it was common to use a "staging area" in the form of a temporary table for the dimension mapping, calculations and whatever. However, performance problems arose when dealing with tons of data, because every time you write data into your database you incur disk IO, which slows down performance dramatically.

With SQL 2005 you can avoid this, because the SSIS architecture enables you to perform nearly all transformations in memory and then write the data into the fact table. This is much faster than using a staging table, and common best practice.

Another thing I do not agree with is using the combination of all surrogate dimension keys in the fact table as the clustered index. How does this work in a table with xx million rows and 15 dimensions? It is a lot of effort for the DBMS to maintain such an index, and I do not believe it performs well.

I have had good experiences using one of the date surrogate keys as the clustered index for due-date reporting fact tables. Any other experiences?

 

Best regards,

Stefan

Kindest Regards,
SK
Post #334880
Posted Saturday, January 6, 2007 9:02 PM


SSC Journeyman


Hi Stefan, you are right that in SSIS we can do all transformations and lookups in memory and write directly into the fact table. The reason we dumped the source system data into a stage database in this particular project was that our access window to the operational systems was limited. In one system (the main one) we were allowed access only from 4 am to 5 am, and the whole ETL package took about 2 hours. The operations guys didn't want us to do any data extract during working hours (6 am to 8 pm) because of the impact on the online performance of the operational system. From 8 pm to 4 am they have various housekeeping processes such as the overnight batch process and backup. So we used the limited time window only to get the data out of the source system into either a flat file or a collection of unindexed, unconstrained tables. We managed to get the extract portion of the ETL down to 15-30 minutes. But as you said, the best practice is to perform in-memory transformations, because this produces the shortest overall end-to-end ETL execution time. In other projects where I used Informatica for ETL I also performed in-memory lookups and transformations, especially when we had a dedicated box for the ETL processes.
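The kind of fast extract into an unindexed, unconstrained stage table described above can be sketched as follows; the server, table, column names and watermark variable are all hypothetical:

```sql
-- Stage table: no indexes, no constraints, so the insert is as fast as possible
CREATE TABLE stage_orders (
    order_id    int,
    customer_id int,
    order_date  datetime,
    amount      money
);

-- During the access window: a straight incremental copy, no transformation yet.
-- @LastExtractDate is a hypothetical watermark maintained by the ETL control process.
DECLARE @LastExtractDate datetime
SET @LastExtractDate = '2006-12-18'

INSERT INTO stage_orders (order_id, customer_id, order_date, amount)
SELECT order_id, customer_id, order_date, amount
FROM SourceServer.SalesDB.dbo.orders
WHERE order_date >= @LastExtractDate
```

All transformation and lookup work then happens later against the stage table, outside the source system's access window.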

Your second point is very true: we should not use the combination of all the surrogate dimensional keys as the clustered index, but only several keys. I normally use either the 3 surrogate dimensional keys that make the fact table record unique, or a fact key column if there is one. In a particular project we had a customer profitability incremental snapshot fact table which had 19 surrogate keys. The grain of this fact table is 1 row per customer requirement per day. The clustered index was made of only 3 columns: date_key, customer_key and a degenerate dimension column called requirement number. We also had a transactional fact table, recording the visits to each customer, with 12 surrogate dimensional keys. The grain of this fact table is 1 row per visit, and its clustered index comprised date_key, customer_key and the requirement number DD.

I normally try to create a unique clustered index, and on many occasions it is the primary key columns. In certain cases I had a non-unique clustered index, for example when the number of surrogate key columns required to uniquely identify a fact table record is more than 4 and there is no fact table key. In these cases my preference is to create a fact table key and make it the clustered index. If that is not possible, as an alternative I would consider taking 2 or 3 PK columns as a non-unique clustered index. The first column is typically the date key, the main reason being the extremely high usage of this column in application queries. The second column is typically the customer key, account key, product key or service key, depending on which mart I am working on, for the same reason, i.e. frequency of usage in queries. For performance and storage reasons I tend to avoid having more than 3 columns in a clustered index, and for performance reasons I also tend to avoid having a fact table without a clustered index.
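As an illustration, the 3-column clustered index on the snapshot fact table described above might look like this (the table and column names are hypothetical stand-ins for the ones in that project):

```sql
-- Unique clustered index matching the grain:
-- one row per customer requirement per day
CREATE UNIQUE CLUSTERED INDEX IX_FactCustProfit
ON FactCustomerProfitability (date_key, customer_key, requirement_number)
```

Making it UNIQUE documents the grain and lets the engine skip the hidden uniquifier column it otherwise adds to non-unique clustered indexes.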

Thank you for sharing your experience with the date surrogate keys, Stefan. If anybody has other experiences with fact table indexes or in-memory transformations, be they contradictory or supportive of the above, I would appreciate it if you could share them so we can all learn from them.

Kind regards,
Vincent

Post #334901
Posted Wednesday, February 27, 2008 3:14 AM


SSC-Enthusiastic

Hi Vincent,

Thanks a lot for this article (and its companions "Data Warehouse Loading" and "Upsert Dimension Tables"): a real gold mine :) complementing on the practical side some points developed by Ralph Kimball in his excellent book "The Microsoft Data Warehouse Toolkit".

"I haven't covered audit, control, restore, MDB and EAI (topic number 7, 8, 9, 10, 11 - see planned content on the DW Loading article) as I've been busy writing a book and another article on using DW for CRM but if a lot of people want it please tell me and I'll write it for you guys."


Please do! I'm looking forward to reading it!

Also another request: some tips or best practices concerning the use of indexes for both dimension and fact tables. I have read lots of articles on the subject, only to get lots of contradictory information. I would really appreciate the advice of a long-time practitioner ;).

Luc
MCTS SQL Server 2005 BI


Luc Magnée
MCTS BI
Antwerp - Belgium
Post #460752