Implementing, loading and optimizing a Type 2 Slowly Changing Fact Table

  • Herman van Midden

    SSChasing Mays

    Points: 643

    Comments posted to this topic are about the item Implementing, loading and optimizing a Type 2 Slowly Changing Fact Table

  • SQL_Kills

    Hall of Fame

    Points: 3139

    Firsly I got the following message when trying to create the table SCF_Invoices

    Msg 102, Level 15, State 1, Line 2

    Incorrect syntax near '('."

    Which I then amended the script and remove the bracket by the InvoiceID

    I ten ran the script again and got the below error message:

    Msg 1776, Level 16, State 0, Line 1

    There are no primary or candidate keys in the referenced table 'DI_Customer' that match the referencing column list in the foreign key 'FK_SCF_Invoices_Customer'.

    Msg 1750, Level 16, State 0, Line 1

    Could not create constraint. See previous errors.

  • Herman van Midden

    SSChasing Mays

    Points: 643

    Hi!

    I am sorry but it seems I made an error in the code inserted in the article.

    I forgot to define PRIMARY KEYS for the DI_Customer and DI_Bucket tables on the CustomerID and BucketID columns respectively.

    The attached sample code is correct and has been tested. Please download the "SCF_Invoices corrected.zip" file and run the "Create SQL_Invoices.sql" script. A description can be found in the README file.

    Regards,

    Herman

  • SQL_Kills

    Hall of Fame

    Points: 3139

    Hi,

    No problem, where do I find this attached zip file?

    Thanks

  • Herman van Midden

    SSChasing Mays

    Points: 643

    You can find the zip file at the bottom of the article under Resources.

  • Renato10

    Valued Member

    Points: 64

    The Dimensional Modeling Methodology does not know Type 2 Slowly Changing Fact Table. The Dimensional Modeling Methodology does know Slowly Changing Dimensions.

  • Herman van Midden

    SSChasing Mays

    Points: 643

    Renato10: I know, but I am not the first to use this term when applying SCD logic on a fact table. Have a look at http://brentgreenwood.blogspot.com/2011/12/slowly-changing-facts.html for example.

    The solution presented works well for me and my customer. I would be happy to learn if there are better solutions.

    Regards,

    Herman

  • Leo Peysakhovich

    Hall of Fame

    Points: 3880

    "ALTER TABLE SCF_Invoices ADD CONSTRAINT PK_SCF_Invoices PRIMARY KEY CLUSTERED

    ( ValidFrom ASC,

    InvoiceIDO ASC

    );

    Since data is added for consecutive days, this PRIMARY KEY also will ensure new records are always appended at the end of the table."

    With such index construction the statement above is not necessary true. It has dependency from InvoiceIDO and can lead to the situations where record will not be at the end of the table. In this case you may have some performance penalties.

  • nick.mcdermaid

    Hall of Fame

    Points: 3779

    So in summary woould you say that load performance was improved by a large reduction of records in the target fact table, or by the addition of indexes?

  • sneumersky

    SSCertifiable

    Points: 7667

    Folks using the Kimball Methodology would refer to this concept as an accumulating snapshot fact table. For more information on accumulating snapshot fact tables see the following articles:

    http://www.kimballgroup.com/2002/06/13/design-tip-37-modeling-a-pipeline-with-an-accumulating-snapshot/

    http://www.kimballgroup.com/2010/12/01/design-tip-130-accumulating-snapshots-for-complex-workflows/

    http://www.kimballgroup.com/2012/05/01/design-tip-145-time-stamping-accumulating-snapshot-fact-tables/

  • Herman van Midden

    SSChasing Mays

    Points: 643

    Nick,

    The main improvement was achieved by reducing the total number of records. In addition, the indexes improved load and query speed, while not substantially increasing storage speed.

    In my first attempts, using an ordinary fact table, index maintenance noticeably decreased loading speed when I tried to optimize the process.

    Regards,

    Herman

  • Herman van Midden

    SSChasing Mays

    Points: 643

    Nick,

    The main improvement was definitely achieved by reducing the number of records.

    But also the indexes have a considerable influence. With the correct index loading speed was reduced by half.

    However, every new index requires index maintenance operation, which increases processing cost with 20% or so. So the amount of indexes should be kept as low as possible.

    Regards,

    Herman

  • mb97

    SSC Enthusiast

    Points: 114

    Since you don't delete from your Invoice "SCF" table, doesn't the PK give you a violation when you do reloads from the same day?

  • Bruce W Cassidy

    SSChampion

    Points: 12607

    Dimensional modelling was something invented by Ralph Kimball (although I'm sure Margy Ross had a great deal of input as well). They came up with the naming standards for different types of dimensions. On to be sure, there are also different types of fact tables (snapshot, transactional, aggregate/summary, etc), but I don't believe they ever referred to a "type-2 fact table". Interesting idea; although it basically seems like a variety of snapshot fact table, with different results over time.

    Ralph Kimball has a nice intro to the three basic types of dimensions here: http://www.kimballgroup.com/2008/08/21/slowly-changing-dimensions/

    Margy Ross has another article on the same site, which explores additional types of changing dimensions, usually by adding snow-flakes. As an aside, she downplays the numbering and instead names the different types.

    A similar article introducing facts is here: http://www.kimballgroup.com/2008/11/05/fact-tables/

    You can see that "type-2" doesn't really apply well to fact tables -- because there is no "type-1", etc. Instead, all you are really doing is creating what looks to me like a combination of transaction fact and snapshot fact. I'm not saying that's a bad idea -- in fact, I think I've done similar things myself in the past. Calling it a "type-2" though... well, you'll get all of the dimensional modellers popping on to tell you that it's a type-2 dimension, not fact. 😀

    Oh, I ran into that same bug in SQL Server with merge output and relational integrity, and ended up solving it in much the same way too.

  • mb97

    SSC Enthusiast

    Points: 114

    In a way, it's all semantics. Whether SCF or SCD really depends on usage, nothing more. In the stuff that I work with, there is no question that invoices are dimensions- the invoices are attributes of what you might call higher level facts (without going into a deep explanation), and so invoices are like the middle layer of a snowflake design. That doesn't mean that is their function in your case, though. It seems like you're just borrowing usefully from dimensional design principles to efficiently implement transaction history- nothing wrong with that.

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply