Datawarehouse Incremental Load Large Datasets

  • Hi,

    I'm in the process of developing a Data Warehouse for a new product.

    The fact table, we envisage I be very large hundreds of billions of rows, getting updated incrementally with approx 2 million rows an hour. Initial testing shows that the inserts (1.8 million rows) are very slow due to the indexes on the fact table. One suggestion is to drop the index's and then rebuild them following the inserts. This is fine whilst the Data Warehouse is in it's infancy but when it reaches billions of rows it would be too inefficient.

    I have been looking at other strategies and was wondering about partitioning the table, loading into a new partition, creating the index and then switching it in to the fact table.

    I was wondering has anyone had any experience of using this method, any pros and cons, or has anyone got any better alternative ideas.

  • Hi Simon,

    I worked on a DW project that used exactly this strategy of loading the incremental rows into a partition and then switching it into the main table. We disabled the indexes on the switch in partition, load the data, rebuilt the indexes then switch in.

    One big downside is that you need a schema lock to switch partitions so anyone querying your fact table will prevent you switching in the new partition. I cannot remember exactly how we solved this but we were looking at creating a snapshot and pointing the reporting tool to the snapshot which would free up the DB for partition switching. We would then drop and recreate the snapshot.

    If you are loading data every couple of hours you might want to consider have a current day fact table with hourly partitions and then migrate that to the main fact table with daily partitions as part of an end of day/start of day process. You would need a view over the two tables to allow the reporting to be seamless to the users.

    Jez

  • Hi Jez,

    Thanks for the update it's given me a lot of food for thought.

    I certainly wouldn't have considered the schema locking during the switch, something that I'm clearly going to have to investigate.

    However I think this is the way I'm going to go.

    Regards,

    Simon

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply