Designing a >12 billion row fact table

  • I have an existing fact table that contains over 12 billion rows (roughly 900 GB).

    The table is not partitioned (it sits on the PRIMARY filegroup).

    The table contains monthly financial data which comprises different "versions" (i.e. Actuals, Budget and multiple Profit Forecasts).

    The problem we are experiencing is two-fold:

    1) The monthly ETL process, which inserts the latest month's data (roughly 20 GB), takes hours to complete

    2) The SSAS cube, which reads the latest three full financial years' worth of data (but only certain "versions"), takes a long time to process

    I was thinking of partitioning the table so that we could improve ETL speeds by using partition switching, and at the same time make archiving of older data easier.
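    Roughly what I had in mind, as a sketch (all object names, column names and boundary values below are placeholders for illustration):

    -- Monthly partition function/scheme on the date key
    CREATE PARTITION FUNCTION pfMonth (date)
    AS RANGE RIGHT FOR VALUES ('2015-01-01', '2015-02-01', '2015-03-01'); -- one boundary per month

    CREATE PARTITION SCHEME psMonth
    AS PARTITION pfMonth ALL TO ([PRIMARY]); -- or map each partition to its own filegroup

    -- Fact table rebuilt on the partition scheme (placeholder columns)
    CREATE TABLE dbo.FactFinance
    (
        DateKey    date          NOT NULL,
        VersionKey int           NOT NULL,
        AccountKey int           NOT NULL,
        Amount     decimal(19,4) NOT NULL
    ) ON psMonth (DateKey);

    -- Monthly ETL: create the new empty partition, bulk load a staging table
    -- with the identical structure (plus a CHECK constraint on DateKey),
    -- then switch it in as a metadata-only operation
    ALTER PARTITION SCHEME psMonth NEXT USED [PRIMARY];
    ALTER PARTITION FUNCTION pfMonth() SPLIT RANGE ('2015-04-01');

    ALTER TABLE dbo.FactFinance_Stage
        SWITCH TO dbo.FactFinance PARTITION $PARTITION.pfMonth('2015-04-01');

    Older months could then be archived the same way, by switching a partition out into an archive table.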

    I would also like to create indexes on the table, but I do not want to slow the ETL down or use up a lot of additional disk space.
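    I assume any index would have to be created on the same partition scheme (aligned) so that partition switching still works; something along these lines, again with placeholder names:

    -- Aligned nonclustered index, compressed to limit the extra disk space;
    -- keyed on version first because the cube filters on version
    CREATE NONCLUSTERED INDEX IX_FactFinance_Version_Date
        ON dbo.FactFinance (VersionKey, DateKey)
        INCLUDE (Amount)
        WITH (DATA_COMPRESSION = PAGE)
        ON psMonth (DateKey);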

    Further info on the current environment:

    - SQL 2012

    - SQL instance and SSAS instance (multidimensional) are on separate servers

    - The cube reads from views which are located on the SQL instance (the fact view aggregates the data)

    - The cube does a full process of three financial years each month

    What would you suggest I do in this situation? What would be the best way to design the fact table?

    Edit: Since the cube only reads certain versions, but across the majority of the table, would it be better to partition on version instead of date?

    Kind regards

    Paulo

  • Yes, you need to partition the table to improve the ETL (SSIS), and also partition the cube so that only the updated partitions are processed each month.

    Whether you partition by date or by version does not matter. Go with whatever works best with your skills and knowledge.
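    As an illustration, each measure group partition in the cube can be bound to its own source query, so the monthly process only reads the new slice. Something like this (view and column names are placeholders matching the example above):

    -- Source query for a single SSAS measure group partition (query binding),
    -- restricted to one month and to the versions the cube actually needs
    SELECT DateKey, VersionKey, AccountKey, Amount
    FROM   dbo.vwFactFinance
    WHERE  DateKey >= '2015-04-01'
      AND  DateKey <  '2015-05-01'
      AND  VersionKey IN (1, 2, 3);

    The monthly job then only needs to process that one partition instead of fully reprocessing three years of data.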

    Thomas

    Thomas LeBlanc, MVP Data Platform Consultant
