Implementing, loading and optimizing a Type 2 Slowly Changing Fact Table

Question

Post reply

Implementing, loading and optimizing a Type 2 Slowly Changing Fact Table

Herman van Midden

SSChasing Mays

Points: 643
More actions
August 18, 2013 at 10:58 pm

#297006

Comments posted to this topic are about the item Implementing, loading and optimizing a Type 2 Slowly Changing Fact Table

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply

SQL_Kills Hall of Fame Points: 3154 More actions · Answer 1

Firsly I got the following message when trying to create the table SCF_Invoices

Msg 102, Level 15, State 1, Line 2

Incorrect syntax near '('."

Which I then amended the script and remove the bracket by the InvoiceID

I ten ran the script again and got the below error message:

Msg 1776, Level 16, State 0, Line 1

There are no primary or candidate keys in the referenced table 'DI_Customer' that match the referencing column list in the foreign key 'FK_SCF_Invoices_Customer'.

Msg 1750, Level 16, State 0, Line 1

Could not create constraint. See previous errors.

Herman van Midden SSChasing Mays Points: 643 More actions · Answer 2

Hi!

I am sorry but it seems I made an error in the code inserted in the article.

I forgot to define PRIMARY KEYS for the DI_Customer and DI_Bucket tables on the CustomerID and BucketID columns respectively.

The attached sample code is correct and has been tested. Please download the "SCF_Invoices corrected.zip" file and run the "Create SQL_Invoices.sql" script. A description can be found in the README file.

Regards,

Herman

SQL_Kills Hall of Fame Points: 3154 More actions · Answer 3

Hi,

No problem, where do I find this attached zip file?

Thanks

Herman van Midden SSChasing Mays Points: 643 More actions · Answer 4

You can find the zip file at the bottom of the article under Resources.

Renato10 SSC Veteran Points: 205 More actions · Answer 5

The Dimensional Modeling Methodology does not know Type 2 Slowly Changing Fact Table. The Dimensional Modeling Methodology does know Slowly Changing Dimensions.

Herman van Midden SSChasing Mays Points: 643 More actions · Answer 6

Renato10: I know, but I am not the first to use this term when applying SCD logic on a fact table. Have a look at http://brentgreenwood.blogspot.com/2011/12/slowly-changing-facts.html for example.

The solution presented works well for me and my customer. I would be happy to learn if there are better solutions.

Regards,

Herman

Leo Peysakhovich Hall of Fame Points: 3973 More actions · Answer 7

"ALTER TABLE SCF_Invoices ADD CONSTRAINT PK_SCF_Invoices PRIMARY KEY CLUSTERED

( ValidFrom ASC,

InvoiceIDO ASC

);

Since data is added for consecutive days, this PRIMARY KEY also will ensure new records are always appended at the end of the table."

With such index construction the statement above is not necessary true. It has dependency from InvoiceIDO and can lead to the situations where record will not be at the end of the table. In this case you may have some performance penalties.

nick.mcdermaid Hall of Fame Points: 3839 More actions · Answer 8

So in summary woould you say that load performance was improved by a large reduction of records in the target fact table, or by the addition of indexes?

sneumersky SSCertifiable Points: 7667 More actions · Answer 9

Folks using the Kimball Methodology would refer to this concept as an accumulating snapshot fact table. For more information on accumulating snapshot fact tables see the following articles:

http://www.kimballgroup.com/2002/06/13/design-tip-37-modeling-a-pipeline-with-an-accumulating-snapshot/

http://www.kimballgroup.com/2010/12/01/design-tip-130-accumulating-snapshots-for-complex-workflows/

http://www.kimballgroup.com/2012/05/01/design-tip-145-time-stamping-accumulating-snapshot-fact-tables/

Herman van Midden SSChasing Mays Points: 643 More actions · Answer 10

Nick,

The main improvement was achieved by reducing the total number of records. In addition, the indexes improved load and query speed, while not substantially increasing storage speed.

In my first attempts, using an ordinary fact table, index maintenance noticeably decreased loading speed when I tried to optimize the process.

Regards,

Herman

Herman van Midden SSChasing Mays Points: 643 More actions · Answer 11

Nick,

The main improvement was definitely achieved by reducing the number of records.

But also the indexes have a considerable influence. With the correct index loading speed was reduced by half.

However, every new index requires index maintenance operation, which increases processing cost with 20% or so. So the amount of indexes should be kept as low as possible.

Regards,

Herman

mb97 SSC Enthusiast Points: 114 More actions · Answer 12

Since you don't delete from your Invoice "SCF" table, doesn't the PK give you a violation when you do reloads from the same day?

Bruce W Cassidy SSChampion Points: 12668 More actions · Answer 13

Dimensional modelling was something invented by Ralph Kimball (although I'm sure Margy Ross had a great deal of input as well). They came up with the naming standards for different types of dimensions. On to be sure, there are also different types of fact tables (snapshot, transactional, aggregate/summary, etc), but I don't believe they ever referred to a "type-2 fact table". Interesting idea; although it basically seems like a variety of snapshot fact table, with different results over time.

Ralph Kimball has a nice intro to the three basic types of dimensions here: http://www.kimballgroup.com/2008/08/21/slowly-changing-dimensions/

Margy Ross has another article on the same site, which explores additional types of changing dimensions, usually by adding snow-flakes. As an aside, she downplays the numbering and instead names the different types.

A similar article introducing facts is here: http://www.kimballgroup.com/2008/11/05/fact-tables/

You can see that "type-2" doesn't really apply well to fact tables -- because there is no "type-1", etc. Instead, all you are really doing is creating what looks to me like a combination of transaction fact and snapshot fact. I'm not saying that's a bad idea -- in fact, I think I've done similar things myself in the past. Calling it a "type-2" though... well, you'll get all of the dimensional modellers popping on to tell you that it's a type-2 dimension, not fact. 😀

Oh, I ran into that same bug in SQL Server with merge output and relational integrity, and ended up solving it in much the same way too.

mb97 SSC Enthusiast Points: 114 More actions · Answer 14

In a way, it's all semantics. Whether SCF or SCD really depends on usage, nothing more. In the stuff that I work with, there is no question that invoices are dimensions- the invoices are attributes of what you might call higher level facts (without going into a deep explanation), and so invoices are like the middle layer of a snowflake design. That doesn't mean that is their function in your case, though. It seems like you're just borrowing usefully from dimensional design principles to efficiently implement transaction history- nothing wrong with that.