Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 123»»»

Loading Fact Tables - Step by Step Instructions Challenge Expand / Collapse
Author
Message
Posted Sunday, April 7, 2013 1:04 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Sunday, April 13, 2014 5:59 PM
Points: 13, Visits: 212
Dear All,

This is my first Data warehouse project and I am having serious problems loading my fact table.

I have already created the dimension tables (over 20)with unique IDs each one (using IDENTITY int).
Apart from that each record has a FK relationship with one particular table
e.g.
Delegate Table
DelegateID PK
DelegateName

StaffGroup
StaffGroupID PK
DelegateID FK
StaffGroupName

The problem is that I am unable to load my fact table using the LookUp function in SSIS 2008 R2. I am using the DelegateID as the LookUp key.
As the tables are different lengths ect. Date and the measures tables, I am unable to successfully complete the LookUps. I can use the NoMatch output but with that many tables it will become impossible to folly.
There are null values in my Date and Measures tables. I thought that they would be acceptable there.

My request is can someone give step by step instructions on how to load a fact table?
I have scoured the internet for this, and there is nothing apart from a blog here. While it is informative it is pretty hard to understand for a newbie.
I would be a great boon for me and others if someone in the community would take up this challenge. If not at least point me the right direction.

MSDN does not have any realistic explanations and neither does BOL.
Post #1439658
Posted Sunday, April 7, 2013 9:57 PM


SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Thursday, December 11, 2014 10:00 PM
Points: 179, Visits: 592
From your sample tables you don't really have a star schema. If possibly you should denormalise your dimension as much as posssible, not split them off into snowflakes

You need to explain your problem - "I am unable to load my fact table using the LookUp function" - why?


Normally you get the source system data with source system keys, load it into a staging table, then personally I don't use SSIS lookups, I use a bulk update to set the surrogate keys then load it into the fact.


Have you read the Kimball Books or the microsoft data warehousing books? Thes concepts come from books, not from technical articles (MSDN, BOL)

Post #1439677
Posted Monday, April 8, 2013 8:35 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Monday, July 28, 2014 7:12 PM
Points: 2,148, Visits: 487
I agree.

The Microsoft Datawawrehouse Toolkit, chapter 7, has great tips for loading fact tables along with sample scripts and SSIS packages to go with it. The 2005 version gives you more of a step-by-step tutorial; whereas, the newest version, SQL Server 2008 R2 Edition, maps the the SSIS product to the 34 subsystems of ETL along with giving you advice on establishing standards.

My biggest tip: Once you have mastered the art of the lookup transform, remove all foreign keys in your fact table. Like the author states, they are "technically right, but practically wrong" as FKs add unnecessary overhead for a check you just performed in the lookup transform.

Book and code samples can be found here:

http://www.kimballgroup.com/data-warehouse-and-business-intelligence-resources/data-warehouse-books/booksmdwt/

The 34 subsystems of ETL can be found in the Kimball Group Reader or on their website: http://www.kimballgroup.com/2007/10/21/subsystems-of-etl-revisited/
Post #1439877
Posted Monday, April 8, 2013 12:32 PM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Monday, July 28, 2014 7:12 PM
Points: 2,148, Visits: 487
One more thing......in the Microsoft Data Warehouse Toolkit, the actual use of the Lookup Transform is covered more in depth in

ETL Subsystem 13: Fact Table Builders
ETL Subsystem 14: Surrogate Key Pipeline
Technique #1: Using Cascading Lookups (via the lookup transform)
Technique #2: Using database joins
Post #1439976
Posted Tuesday, April 9, 2013 7:47 AM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Today @ 5:39 AM
Points: 4,463, Visits: 6,391
My addition to this thread is to inquire why you used full integers for your surrogate keys. Do you REALLY need 4.2BILLION possible values? I bet not. tinyint is one byte, smallint is 2 bytes. Review how many values you can have for each (256 and 65K+ respectively) and use the right one for every dimension. Believe it or not those extra bytes (recall that an integer is FOUR BYTES) on a fact table REALLY DO ADD UP!!

Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1440328
Posted Tuesday, April 9, 2013 8:07 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Monday, July 28, 2014 7:12 PM
Points: 2,148, Visits: 487
Completely agree: http://www.kimballgroup.com/2001/11/03/design-tip-30-put-your-fact-tables-on-a-diet/



I also forgot. NEVER, EVER, under ANY circumstance, leave null values in your dimension keys. It puts too much at risk for an end user. Look up "surrogate nulls" for more information :)
Post #1440346
Posted Tuesday, April 9, 2013 8:59 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Sunday, April 13, 2014 5:59 PM
Points: 13, Visits: 212
Dear All,
Thank you for your comments. I will be getting rid of the NULL values and reducing the size of my int(s) pronto!
I have Amazoned the MS Data Warehouse Tookit and should get it by the end of the week.

In the meantime I have attached data schematic that I would like to convert to dimensions. I am trying to monitor compliance rates based on Topic and Job. A specific job will need to do specific topics. I will then need to slice by Organisation hierarchy.

As I need to delve in to each individually I can't put the topics (e.g. Consent or Dementia) all under one roof so to speak.
The relationship is centered on the Delegate / staff member, hence the link. Based on the business requirement matrix I have worked out who does what.

I have tied this to each topic using the Delegate table PK as FKs in almost all the other tables.

This is where I am encountering problems.

When I try to upload the individual PK into the Fact Table in SSIS I am getting a lot of No Match Outputs. Perhaps (actually I am pretty sure) I am doing the LookUp incorrectly.
I am using the DelgateID as the LookUpSource. Should I be using something else? Is it because of the NULLS in the Dimension tables or a combination of both?
Am I completely on the wrong track here?! Thoughts very welcomed.

I am quite eager to learn and willl readily start over again if general concensus is to do so.
Be brutal if you must in a kind way!!

Regards,

Marvin.


  Post Attachments 
MarvinPerrottDataWareHouseStage1.jpg (29 views, 650.16 KB)
Post #1440398
Posted Tuesday, April 9, 2013 7:32 PM


SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Thursday, December 11, 2014 10:00 PM
Points: 179, Visits: 592
I don't understand enough about your technical issue with lookups, but for starters be aware that under default settings it is case sensitive.

You generally would not load a source system key into a fact table. Source system keys usually go into the dimensions (or a prior merge/mapping table before the dimension).

What you should be doing is loading dimensions (which often will not necessarily map 1-1 with lookup tables in the source system), and saving the source system key in here, and generating a new 'surrogate key' in the process.


That's your first step.


Next step is to load transactions from the source, then work out what dimensions these transactions relate to and then load that into the fact table.


The result is a fact table that only has surrogate keys in it. All of the complications around changing attributes, multiple source systems, data cleansing, multi column keys, strange data models etc. are handled by the ETL and the dimension table. All of these complications are boiled down to the single surrogate key that joins neatly to your fact.


Hopefully the schema you posted is the source system, not the data warehouse? While you are waiting for your data warehouse book, there are plenty of resources online about star schemas and Kimball.


When building your star schema you need to think about the business process, i.e. what events happen over time and in what way are they measured? - that's your fact(s).

If your source system records these events in multiple tables in a complicated data model, your challenge is to boil this down to a nice fact/dimension representation.
Post #1440626
Posted Wednesday, April 10, 2013 5:38 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Sunday, April 13, 2014 5:59 PM
Points: 13, Visits: 212
Dear Nick,

I have already done the first step. so, what's in my dimension tabls have both the source keys and new surrogate keys.
the schema posted is not my Datawarehouse but it may well be something similiar.

I have thought about my measures and it is at the lowest level of granularit as well.

It is just the uploading that is posing a problem at present.
Post #1440746
Posted Wednesday, April 10, 2013 7:58 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Monday, July 28, 2014 7:12 PM
Points: 2,148, Visits: 487
Is this an OLTP-like schema for the Quality Management function of a hospital?
Post #1440819
« Prev Topic | Next Topic »

Add to briefcase 123»»»

Permissions Expand / Collapse