Implementing, loading and optimizing a Type 2 Slowly Changing Fact Table

  • Hi mb67!

    No there is no PK violation because a new InvoiceID is generated for the previously deleted record. So we end up with two records, one with a negative validity period (ValidFrom > ValidTo) and one valid record, each with a different InvoiceID

    Regards,

    Herman

  • Thanks to all you guys on the update on dimensional modelling.

    Since I also attended Kimballs courses I am aware of the 'problem'. Maybe I should have called my solution a timestamped accumulated snapshot, but SCD's are much better known and its implementation using MS SQL is widely available.

    Of course a Type 1 SCF does'nt make sense (thanks Bruce), so why define it as a Type 2 SCF? I think everybody understood the point of the story, and that was what I wanted to reach.

    And since I am not the first one to use it, it even a topic on Kimballs forum http://forum.kimballgroup.com/t1588-slowly-changing-facts I decided to use this name not realizing it would start this discussion :Whistling:

    So I borrowed some useful dimensional design principles to efficiently implement transaction history (thanks mb67).

    I hope you like my solution, although its theoretically maybe not correct, even if Invoices are dimenions, I am mixing facts with dimensions, it solves a lot of practical problems for me.

    Regards,

    Herman

  • Your PK is defined on InvoiceIDO, not InvoiceID.

    ALTER TABLE SCF_Invoices ADD CONSTRAINT PK_SCF_Invoices PRIMARY KEY CLUSTERED

    ( ValidFrom ASC,

    InvoiceIDO ASC

    )

  • Slowly changing fact is also a term used by David Marco (who I believe is/was affiliated with the Inmon side of things) in his book Building The Metadata Repository (Wiley, 2000); so it has definitely been around for quite some time. It's all a "I say TO-MAY-TOE, you say TO-MAH-TOE" thing 🙂

  • Herman van Midden (8/20/2013)


    [...]I borrowed some useful dimensional design principles to efficiently implement transaction history.

    I hope you like my solution, although its theoretically maybe not correct, even if Invoices are dimenions, I am mixing facts with dimensions, it solves a lot of practical problems for me.

    That's what it's all about. I just try to avoid the "type-2" naming thing, even for dimensions.

  • Hi mb97!

    Yes, you are right. I mixed up some things and simply assumed I did it correct. So I had to do my homework again. The correct indexes are as follows:

    The primary key:

    ALTER TABLE SCF_Invoices ADD CONSTRAINT PK_SCF_Invoices PRIMARY KEY CLUSTERED

    ( ValidFrom ASC,

    InvoiceID ASC

    )

    And the index to aid the MERGE statement

    CREATE NONCLUSTERED INDEX [IX_IsCurrent_InvoiceIDO_InclData] ON [dbo].[SCF_Invoices]

    ( [InvoiceIDO] ASC

    )

    INCLUDE ([IsCurrent],[ValidFrom],[ValidTo],[InvoiceID],[CustomerID],[DueDate],[IsOpen],[BucketID],[OrigAmnt],[RemAmnt])

    WHERE ([IsCurrent]=(1))

    Thanks for checking my code! I will try to correct the article and sample code.

    Regards,

    Herman

  • Indexing SCDs drives me nuts, that's why I was looking at it so carefully- I still haven't seen a good general solution for it, especially if you need to access non-current versions by comparing to a fact effective date. I sometimes think it might be better to index on the SCD end date instead of the start date, but that's not clean, either.

    -Mike

  • The Microsoft Data Warehouse Toolkit has a good general indexing strategy for type 2 SCDs. It provides an excellent starting point.

  • sneumersky: I guess I should try to get a copy of the Microsoft Data Warehouse Toolkit. Thanks for the tip.

    Can you give me some hints on the indexing strategy they suggest?

    -Herman

Viewing 9 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply