Financial Data warehouse design - which one is better - Snowflake or star

  • Hey guys,

    When it comes to finance/superannuation data warehouse design, which one is better - snowflake or star?

    Any ideas?

  • To be honest, it all depends on what suits your own environment and how you will be presenting the data.

    For example, if you want to expose the data as a cube through some sort of UDM, then a star schema is best, i.e. you can have product type and product category in one table called Product. When you come to build the product dimension, it will be easier and faster to build from this single table.
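    For illustration, a rough sketch of that single product table (the names here are just made up):

        -- Star version: type and category held together, so the product
        -- dimension builds from one table
        CREATE TABLE DimProduct (
            ProductKey      INT IDENTITY(1,1) PRIMARY KEY, -- surrogate key
            ProductID       INT NOT NULL,                  -- business key from the source system
            ProductName     VARCHAR(100) NOT NULL,
            ProductType     VARCHAR(50)  NOT NULL,
            ProductCategory VARCHAR(50)  NOT NULL
        );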

    If you are looking to write SQL queries, then you may want a snowflake schema. For example, you may want to return all products by type for a category of xxxID. The query should run quicker as you are restricting the amount of data returned using indexes on both product tables.
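    Something along these lines (again, the table and column names are only illustrative):

        DECLARE @CategoryID INT;
        SET @CategoryID = 1;  -- stands in for the xxxID category

        -- Snowflaked version: type and category sit in their own tables,
        -- so the joins can use the indexes on both product tables
        SELECT p.ProductID, p.ProductName, t.ProductType
        FROM Product p
        JOIN ProductType t     ON t.ProductTypeKey = p.ProductTypeKey
        JOIN ProductCategory c ON c.ProductCategoryKey = t.ProductCategoryKey
        WHERE c.ProductCategoryKey = @CategoryID;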

    These are crude examples, but they should give you some idea of where to start. I would also take a look at Ralph Kimball's site - http://www.kimballgroup.com/html/designtips.html

    Another factor will be the structure of the source data. If you do not have primary keys, you will need to generate surrogate keys. Doing this in a snowflake schema means constantly looking up surrogate key values across a number of tables, which will obviously require more SQL to write and maintain.
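    As a crude sketch of what I mean (illustrative names only), every level of the snowflake needs its parent's surrogate key looked up during the load:

        -- Loading the snowflaked Product table: the type's surrogate key has to
        -- be looked up first (and ProductType in turn needed the category key)
        INSERT INTO Product (ProductID, ProductName, ProductTypeKey)
        SELECT s.ProductID,
               s.ProductName,
               t.ProductTypeKey        -- surrogate key fetched from the parent table
        FROM StagingProduct s
        JOIN ProductType t ON t.ProductTypeCode = s.ProductTypeCode;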


    ------------------------------
    The Users are always right - when I'm not wrong!

  • Absolutely, it all depends. Star schemas are easier to maintain and obviously less complex than a snowflake. I've used both; it's the end result that is important, and if it requires a snowflake then so be it.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Thanks Guys.

    The company I just joined has a DWH that is approximately 300 GB, including 128 GB of indexes. I have been asked to look into the model to see where the potential problems are.

    Now, I know that the OLTP database the data comes from is only 66 GB, so I am not sure whether the DWH has simply been blown up too much in size. I was thinking of starting with the following:

    1. Run Profiler for a whole day (including the time when the ETL and other processes run).
    2. Review the results and feed them into the index tuning wizard (as recommended by Microsoft).
    3. Meanwhile, understand the ER diagram of the DWH (the guy who already looks after it says it is a star schema), and
    4. If there are any dimensions that do not change much, add them to the fact table. But then, if I add them to the fact table, I would still be blowing out the size, wouldn't I?
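    For reference, this is the sort of quick sketch I was going to run to see which tables and indexes account for the 300 GB (sp_spaceused is built in; the temp table and sp_msforeachtable wrapper are just my own quick hack):

        -- Rough space breakdown per table (sp_msforeachtable is undocumented but handy)
        CREATE TABLE #Space (
            TableName SYSNAME,
            RowCnt    VARCHAR(20),
            Reserved  VARCHAR(20),
            DataSize  VARCHAR(20),
            IndexSize VARCHAR(20),
            Unused    VARCHAR(20)
        );

        EXEC sp_msforeachtable 'INSERT INTO #Space EXEC sp_spaceused ''?''';

        SELECT TableName, RowCnt, DataSize, IndexSize
        FROM #Space
        ORDER BY CAST(REPLACE(IndexSize, ' KB', '') AS BIGINT) DESC;

        DROP TABLE #Space;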

    Please tell me your thoughts on this - what do you think I should start with?

    Your help is much appreciated. Thanks.

  • Start by looking at the number of dimensions being used and the number of members within them. It could be that you are using a dimension that contains thousands of members, in which case a cube is not the recommended architecture. Dimensions should be few and contain a minimal number of members (fewer than 100, in my opinion).

    Remember that when you create a cube, you are basically asking the cube builder to create every possible combination of members and measures. For example, if you have two dimensions at a single level with 4 members in each and one measure, this equates to 4 x 4 = 16 lines. This is not exactly how the cube works, but it helps to think this way when designing them. As your dimensions gain more levels (Year, Month, Day), the number of possible combinations grows dramatically.
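    A rough relational analogy, if it helps (FactSales, ProductType and Region are made-up names, and Analysis Services does not physically store things this way):

        -- GROUP BY ... WITH CUBE materialises every combination of the grouping
        -- columns, including the "all" subtotals - the same kind of explosion a
        -- cube's aggregations face as you add dimensions and levels
        SELECT ProductType, Region, SUM(SalesAmount) AS TotalSales
        FROM FactSales
        GROUP BY ProductType, Region
        WITH CUBE;
        -- With 4 types and 4 regions that is already 16 detail rows plus subtotals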

    After looking at the structure of the cube, check with the wizard to see which dimensions are used the most. You can then possibly target your rationalisation efforts, or perhaps split the single cube into multiple cubes.


    ------------------------------
    The Users are always right - when I'm not wrong!

  • A DW will always be much larger than its OLTP database, as it effectively stores point-in-time snapshots and is likely to be denormalised.

    I disagree about dimension members - the numbers will always relate to the data being modelled. It may be that these have to be large; this is often where snowflakes come in.

    If it's performance, then the disk subsystem is probably at fault - is it the load or the extraction that is the problem? If you have RAID 5, then consider changing to RAID 10.

    The next obvious step is partitioning and making sure the indexes are actually right. An oft-forgotten step during DW loads is to update the statistics on the underlying tables - this may need doing pre- and post-load; test and see.
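    For example (FactSales here is just a stand-in for your main fact table):

        -- Post-load: refresh statistics on the big fact table...
        UPDATE STATISTICS dbo.FactSales WITH FULLSCAN;
        -- ...or hit the whole database with the sampled default
        EXEC sp_updatestats;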

    Memory is quite important too - are you using Enterprise Edition, how many processors, how much memory, etc.?

    There are some whitepapers about optimising large cubes - have a search on the MS website.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/
