How to load and in what order warehouse tables

  • Hi,

    I am somewhat new to were housing but not new to SQL. I  have been looking all over the net and cannot find how to load the tables in a warehouse database. they talk in theory but not the physical process.
    I know how to write code to Insert into tables and even to pull in data from SSIS and load a table in OLTP system.
    What I am not sure is things like the order and how you would populate the Fact table. So for example, I can envision getting a file from a customer for products and load them into the products table. Please tell me if it is more than that? 
    But how would you load the Fact table?
    It is made up of keys and data from different tables and how would you load the measures?
    And what is the order to load the tables (dimensions first or facts)?
    If it helps in the question I added a set of tables here that I got off the net. 
    Dimensions:

    Customer, Product, Store, Sales Person  

    Fact Table:

    FactProductSales
    (SalesInvoiceNumber,
    SalesDateKey,
    SalesTimeKey,
    SalesTimeAltKey,
    StoreID,
    CustomerID,
    ProductID,
    SalesPersonID,
    Quantity,
    ProductActualCost,
    SalesTotalCost,
    Deviation)
    Thank you

  • When designing a star schema the first two questions you need to answer are:
    1. What does the customer want to measure?
    2. How do they want to measure it?

    Using products as an example - is it someone who sells products? Then they may want to measure sales. If they produce it, then they may want to measure production... Understanding that helps you design the fact table(s). I'll stick with sales for the rest if this post. 

    Those who measure pretty much anything usually want to do so by some unit of time (year, Quarter, Month, etc).... This is why everyone has a date dimension table. Armed with a sales fact and date dim you can begin creating reports that give the customer Total or Average or min/max Sales by Year, by Quarter, Etc. What are some other ways they need to measure sales? By what was sold - you'll probably need a product dim table... Other ways you want to measure sales might include by who sold it, what it was sold for, by where it was sold, etc. Understanding this helps you design your other dimension tables. 

    To your other question:  you generally load your dimension tables first, then fact tables.The Unique keys live in your dims, the referencing keys live in the fact. Loading the dim first avoids orphans in the fact.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Hi,
    Thanks for the post I got a lot from this. But what is still not quite clear is how the fact table gets loaded. lets day I get new records from a customer to load my Customer, Product, Store, Sales Person tables, that I think would be straight forward, but how would I use this to load me Fact table. I quess new Ids (like product ID ) would go where the product ID is in the Fact table but what about the measures?
    And what about the other IDs ( from the other dimensions) and other measures?
    For a given record in the fact table, it would have all the dimensions and measures for all the tables so would you have to do these at once? and how?
    Thank you

  • itmasterw 60042 - Thursday, March 30, 2017 6:42 AM

    Hi,
    Thanks for the post I got a lot from this. But what is still not quite clear is how the fact table gets loaded. lets day I get new records from a customer to load my Customer, Product, Store, Sales Person tables, that I think would be straight forward, but how would I use this to load me Fact table. I quess new Ids (like product ID ) would go where the product ID is in the Fact table but what about the measures?
    And what about the other IDs ( from the other dimensions) and other measures?
    For a given record in the fact table, it would have all the dimensions and measures for all the tables so would you have to do these at once? and how?
    Thank you

    you dont use the dimension tables  (cust/prod/store etc) to build a fact table......your customer is going to have to provide that detail....eg a list/table of sales invoices  (datetime/salesis/prodid/custid/qty,uom,salesprice etc)....you can create your fact table from a history of these and then get your customer to send you just daily updates

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Okay thanks, I really appreciate the help .
    Thank you

  • J Livingston SQL - Thursday, March 30, 2017 7:31 AM

    itmasterw 60042 - Thursday, March 30, 2017 6:42 AM

    Hi,
    Thanks for the post I got a lot from this. But what is still not quite clear is how the fact table gets loaded. lets day I get new records from a customer to load my Customer, Product, Store, Sales Person tables, that I think would be straight forward, but how would I use this to load me Fact table. I quess new Ids (like product ID ) would go where the product ID is in the Fact table but what about the measures?
    And what about the other IDs ( from the other dimensions) and other measures?
    For a given record in the fact table, it would have all the dimensions and measures for all the tables so would you have to do these at once? and how?
    Thank you

    you dont use the dimension tables  (cust/prod/store etc) to build a fact table......your customer is going to have to provide that detail....eg a list/table of sales invoices  (datetime/salesis/prodid/custid/qty,uom,salesprice etc)....you can create your fact table from a history of these and then get your customer to send you just daily updates

    You don't use the dimension tables to build the fact table??? Then how do you link the fact table to the dimensions? How do you get all the keys pointing to the dimensions?



    Alvin Ramard
    Memphis PASS Chapter[/url]

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • Alvin Ramard - Thursday, March 30, 2017 7:54 AM

    J Livingston SQL - Thursday, March 30, 2017 7:31 AM

    itmasterw 60042 - Thursday, March 30, 2017 6:42 AM

    Hi,
    Thanks for the post I got a lot from this. But what is still not quite clear is how the fact table gets loaded. lets day I get new records from a customer to load my Customer, Product, Store, Sales Person tables, that I think would be straight forward, but how would I use this to load me Fact table. I quess new Ids (like product ID ) would go where the product ID is in the Fact table but what about the measures?
    And what about the other IDs ( from the other dimensions) and other measures?
    For a given record in the fact table, it would have all the dimensions and measures for all the tables so would you have to do these at once? and how?
    Thank you

    you dont use the dimension tables  (cust/prod/store etc) to build a fact table......your customer is going to have to provide that detail....eg a list/table of sales invoices  (datetime/salesis/prodid/custid/qty,uom,salesprice etc)....you can create your fact table from a history of these and then get your customer to send you just daily updates

    You don't use the dimension tables to build the fact table??? Then how do you link the fact table to the dimensions? How do you get all the keys pointing to the dimensions?

    the OP said  

    But what is still not quite clear is how the fact table gets loaded. lets day I get new records from a customer to load my Customer, Product, Store, Sales Person tables, that I think would be straight forward, but how would I use this to load me Fact table. I quess new Ids (like product ID ) would go where the product ID is in the Fact table but what about the measures?

    and apologies to all if I am misreading the question, but assuming that the OP gets the following data to load....where does the "fact" come from

    ========== ===========
    Customer    
    CustID  CustName 
    1    John  
    2    Bill  
    3    Dan  
            
    Product     
    ProdID  ProdName 
    1    Widget 
    2    Woogle 
    3    Waffle 
            
    Store     
    StoreId  StoreName
    1    NewYork 
    2    Paris  
    3    London 
    ========== ===========

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Typically you will start by loading data from either a flat file or a SQL query from a source database into a flat staging table (ColumnStore is good for this). Once done, you have an ETL process that loads new or updated rows into the dimension tables (new city/states, new customers, new product codes, etc.). Read up on SCD (slow changing dimension) data loading. Once that is done, then you load the fact table.
    http://www.kimballgroup.com/category/fact-table-core-concepts/

    To understand dimensional modeling and ETL, don't rely solely on random web searching, advice, or intuition. Read this book by Ralph Kimball; it does an excellent job of explaining practically everything end to end.
    https://www.amazon.com/Data-Warehouse-Toolkit-Definitive-Dimensional/dp/1118530802

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • At a very high level, you can load your dimensions first and then apply your foreign keys from those dimensions back onto your data. Once that's done, you can then populate your facts with only the foreign keys and the metrics you defined to track in your fact.

    For example, if you load a table with 5 dimensions and 3 metrics you want to store. If those dimensions are Product, Product Type, Time, Order Number and Customer, then you could easily take the approach of creating a separate table per dimension with just a ID and value. For example, for Product, you could create a table called DimProduct that has ProductID and Product as the two fields. ProductID is your Identity field that will auto increment starting at 0.

    Next, you will want to add room for those dimension foreign keys back on the table with your data. You will want to add new fields for every new foreign key associated with those dimensions you will eventually populate. For example, you would add the ProductID back onto that table where your data is sitting. It will be NULL at first, but you will populate it later.

    Then you will want to uniquely SELECT the dimensions one by one and populate those new dimension tables you created. For example with Product, you want to INSERT SELECT the Product from that table WHERE NOT EXIST in the DimProduct table. This will insert a record for every NEW Product name that comes in that fact to the DimProduct table. Being there is a ProductID on the DimProduct table that will auto increment when new records (i.e.: new product names) are inserted into that table, then you have a ProductID that you can then join back on that table.

    Once you do that for all dimensions on the staging table, you now have all foreign keys applied to the data. Then you create a FactSales table or whatever you are measuring with those products and populate it on the granularity you want where the dimensions now have unique ID's that relate to other tables for those dimensions that also have a unique list of dimensional values that match your data.

    That's just a simple approach. There are other things to consider like how you validate, transform, clean and summarize the data. There is also considerations of how you may consolidate dimensions into one table as opposed to a dimension per table. Then you have how you will handle primary keys on the fact table, how duplicates are handled itself along with change management, timestamps of publishes, refresh rates of those publishes and so much more.

    I would suggest picking up one of the many fine ETL & Data Warehouse books from Ralph Kimball for further reading. They are very good and give you a step-by-step guide to how to approach to designing and then implementing such things.

    Hope that helps.

  • Thanks again for the reply xsevensinzx. But now that you have created your Fact table, what I am wondering is how do you update it.
    SO if in a month you where to gat files for the different dimensions and I guess the fact table how does that get up dated. I mean does the customer know what you need record be record to insert new data for each of the dimension keys and measures? I would guess not. I would suspect that they send you the new data for the dimensions and some how you extract from this the data you need for the Fact table.
    So I am still not quite clear on how you populate the Fact table with new data?
    Thank you

  • itmasterw 60042 - Friday, March 31, 2017 6:09 AM

    Thanks again for the reply xsevensinzx. But now that you have created your Fact table, what I am wondering is how do you update it.
    SO if in a month you where to gat files for the different dimensions and I guess the fact table how does that get up dated. I mean does the customer know what you need record be record to insert new data for each of the dimension keys and measures? I would guess not. I would suspect that they send you the new data for the dimensions and some how you extract from this the data you need for the Fact table.
    So I am still not quite clear on how you populate the Fact table with new data?
    Thank you

    Typically, business will send you a flat file, or you execute a SQL, which contains columns for both measurements and dimensions. This is staged to a table or perhaps in your ETL tool's buffer, and then you insert/update the dimension and fact tables from that. How exactly the tables are inserted/updated depends on the type of dimension (SCD 1, 2, 3, etc.) and the type and grain of fact table (transactional, periodic snapshot, accumulating snapshot, etc.). Again, read Kimball's Data Warehouse Toolkit book, the first couple of chapters will at least clarify the concept of how star-schema data modeling works.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • That makes it clearer thanks , and I did get his book and am reading through it, you are right it is an excellent book.
    Thanks

  • itmasterw 60042 - Friday, March 31, 2017 6:09 AM

    Thanks again for the reply xsevensinzx. But now that you have created your Fact table, what I am wondering is how do you update it.
    SO if in a month you where to gat files for the different dimensions and I guess the fact table how does that get up dated. I mean does the customer know what you need record be record to insert new data for each of the dimension keys and measures? I would guess not. I would suspect that they send you the new data for the dimensions and some how you extract from this the data you need for the Fact table.
    So I am still not quite clear on how you populate the Fact table with new data?
    Thank you

    The basics here are that a fact table is typically designed to model some sort of business process, such as your Sales example, at a transactional level.  The dimensions contain just the descriptive attributes of the transactions, so those would only change if they added new products, new stores, etc.  After you update those with the most recent information, then you can add the new sales transactions to your fact table.  You probably don't want to change any of the existing fact rows, unless there was some sort of error.  Just try to keep fact rows transactional, at the most detailed level that is needed to model the business process.  (this is called the grain of the table)

    I also highly recommend the Kimball methodology, in addition to their books, you can find some resources online here:
    http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/
    http://decisionworks.com/data-warehouse-business-intelligence-resources/

  • Thank you

  • At a very high level, keep in mind that: 
    facts are numbers, 
    dimensions are words (to describe the numbers). 

    As for loading , that can depend on your business rules. Typically you do get a data feed in the form of a flat file. But not always. You have to go sometime and decide how much information from the operation database(s) to get with regard to date. For example, if you are a retailer with a 30 day return policy, then a sales fact in your fact table over 30 days old will not (barring rare circumstances related to your business - but we just assume the 30 day return policy here) be subject to an update. As a result you only pull those records 30 days and newer to see if they are already in the fact table. If they are not you add them (a new sale- easy). If they are you apply an "update" here (not so straight forward- caution). I use the term update rather loosely. Given a record that is subject to an update (already exists in the target fact table) then you can have the situation where that purchase was later returned. To know you need the records in a format where you have a fact such as factSale > purchaseDate , ..., returnDate. Kimball calls these accumulating fact tables. Changes to other fields usually indicate a correction of some sort (like the source system had the SKU number incorrectly entered). So you still want to compare the two records , update the existing from the source. 
    Hope this helps you a bit. I echo the thought that the Kimball books cover things very intelligently.

    ----------------------------------------------------

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply