SQL Dimensions & Fact Tables

  • Hello,

    I am in the process of creating a datawarehouse database on our SQL Server, I have used SSIS to dump the live data from our OLTP system into a staging table in SQL.

    My next step is to create the Dimensions & Fact tables before I can populate them from the staging table.

    Below is a snipped of how I go about doing this..

    -- Create table

    CREATE TABLE [dbo].[DIM_COMPANY](

    [SK_COMP_ID] [int] IDENTITY(1,1) NOT NULL,-- Surrogate Key

    [COMP] [varchar](150) NULL,-- Business Key

    [COMP NAME] [varchar](150) NULL,

    [COMP DESCRIPTION] [varchar](150) NULL

    )

    -- Create primary key constraint

    ALTER TABLE [dbo].[DIM_COMPANY] ADD

    CONSTRAINT [PK_DIM_COMPANY] PRIMARY KEY CLUSTERED

    (

    [SK_COMP_ID]

    ) ON [PRIMARY]

    -- Add column into fact table

    ALTER TABLE [dbo].[FACT_SALES]

    ADD [SK_COMP_ID] [INT] NOT NULL

    -- Add foreign key

    ALTER TABLE [dbo].[FACT_SALES]

    WITH CHECK ADD CONSTRAINT [FK_FACT_SALES_DIM_COMPANY]

    FOREIGN KEY([FK_COMP_ID])

    REFERENCES [dbo].[DIM_COMPANY](SK_COMP_ID)

    I am fairly new to this, but from the above example I am going to be referencing my dimensions in my fact table using surrogate keys, thus I have created identity columns in each of my dimensions.

    Each dimension also has a business key which is unique key from our OLTP system that will identify that row of data.

    My plan is to leverage SSIS and Lookup/SCD/MERGE to populate my Dimensions & Fact tables.

    Does it make sense to create a clustered primary key on my surrogate column?

    Using SSIS, how would I efficiently populate my Dimensions? by using SCD?

    Using SSIS, how would I efficiently populate my Fact tables? by using Lookup?

    Any tips from expert DW folks is appreciated.

  • Hi

    Does it make sense to create a clustered primary key on my surrogate column?

    I've always done this as chances are you'll be using these in your joins

    Using SSIS, how would I efficiently populate my Dimensions? by using SCD?

    There really are many ways of doing this, one is using the SSIS component or you could use something like this:

    http://www.sqlservercentral.com/articles/Integration+Services+(SSIS)/69766/

    This may also help (although it just takes you through the principals, the joining on Email is not recommended!)

    http://www.sqlservercentral.com/stairway/72494/

    It all depends on your data volumes, structure and how much time your willing to invest

    Another question is do you NEED surrogate keys if your business keys are unique, will you be tracking history? If your not then Surrogate Keys may not be required..

    Using SSIS, how would I efficiently populate my Fact tables? by using Lookup?

    I really wouldn't recommend using SSIS Lookup, its incredibly memory intensive I personally would use SP's and assign the surrogate keys using these

    Also the Kimball group website has loads of useful information:

    http://www.kimballgroup.com/

    Andy

    ==========================================================================================================================
    A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe

  • Thanks for the quick reply.

    I am fairly new to TSQL, using SSIS seems to be the easier way and lots of youtube resources to look at to implement my load.

    One question I have is the process to load my Dimensions from my staging table, I have the business key which is unique to that dimensions data above, how do I using that load data from my staging table to my dimenion on the business key while generating my surrogate keys?

    I am a little confused there and wanted to look at an example to study?

  • Are you recording history in your Dimension or just adding new records / updating existing records?

    ==========================================================================================================================
    A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe

  • There is no requirement to record history, basically a UPSERT/MERGE (upload record if exist, insert if it does not).

    Thanks.

  • mirde (8/22/2012)


    There is no requirement to record history, basically a UPSERT/MERGE (upload record if exist, insert if it does not).

    is the business aware that the same query looking for something between 08/01/2012 and 08/15/2012 would probably return different values if done today or done six month from today?

    The only way you can ensure consistent reporting over time is to record history - like resorting to Type 2 SCD for Dimension tables and dated facts for Fact tables.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • The DW us being built as a Sales DW, the staging data is extracted from our OLTP system which is General Ledger based.

    The way the business works, any new transactions posted are always added as new records in the database. With that in mind, I am building the Dimensions such as Product, Ship Location, Bill Location, Sales Person, and the Sales Fact data.

    In the GL world, if I am looking at a Product by Sales Person for a specific time range, why would it yield different values if I ask the Cube the same question at different dates?

  • There is no requirement to record history, basically a UPSERT/MERGE (upload record if exist, insert if it does not).

    OK so in this case you actually don't need surrogate keys, since your business keys are unique you can use these, you are just making more work for yourself..

    Just INSERT new records into the fact table where the business key does not exist and UPDATE existing records where the business key does with new the information..

    Since you are dealing with GL data it would be advisable to use a date dimension and populate a date key from that. It's getting late and I don't have a script (or a SQL instance on this laptop will provide examples tomorrow for you) so you can use financial years / quarters ect in calculations..

    In the GL world, if I am looking at a Product by Sales Person for a specific time range, why would it yield different values if I ask the Cube the same question at different dates?

    As we didn't know if you were recording historical data what Paul was referring to was that if a Fact was modified i.e. a purchase order was changed from £100 on 18.01.2012 to £200 on the 19.01.2012, if you queried this table you would lose the fact that the original order was for £100, it wasn't clear that you were dealing with GL transactional data

    Cheers

    Andy

    ==========================================================================================================================
    A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe

  • Thank you Andy for your help.

    I am going to opt-in on using SK's across my Dimensions just to get practice on using them as some dimensions might not have a business key available, thus I'll have to resort to a surrogate key.

  • mirde (8/22/2012)


    I am going to opt-in on using SK's across my Dimensions just to get practice on using them as some dimensions might not have a business key available, thus I'll have to resort to a surrogate key.

    If your dimension doesn't have a business key then you should redesign it so that it does. It really doesn't make sense to store duplicate data. Some form of external identification for facts in the database is essential to make sense of the information being recorded and to ensure it can be updated and verified properly. That is as true in a data warehouse to in any other database.

    I would second the suggestion that you record at least some history if you choose. You can still present the users with only a snapshot of the latest data if you choose.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply