Pattern to load incremental data - MERGE or TRUNCATE and INSERT

  • Hi,
    There mainly 2 patterns for incrementally loading data for facts.
    1. Load new records to staging
    2. Truncate any of these rows from Fact Sales
    3. Insert records from staging

    This is good as it saves having to update records that have changed.  

    The other pattern :
    1. Load to staging
    2. Do a MERGE into fact to upload New/Changed records
    Of course it relies on being able to get new/updated rows from the source into staging.
    I've mostly got experience of the latter.  My question is when would pattern 1 be suitable over the merge?

  • The first pattern (truncate & load) is suitable in cases where you either have: 

    • A very small fact table
    • A fact table where most records change and will have to be updated in any event
    • A snapshot fact table, assuming that you're reloading a specific snapshot and using partition switching
  • leehbi - Friday, November 16, 2018 9:34 AM

    Hi,
    There mainly 2 patterns for incrementally loading data for facts.
    1. Load new records to staging
    2. Truncate any of these rows from Fact Sales
    3. Insert records from staging

    This is good as it saves having to update records that have changed.  

    The other pattern :
    1. Load to staging
    2. Do a MERGE into fact to upload New/Changed records
    Of course it relies on being able to get new/updated rows from the source into staging.
    I've mostly got experience of the latter.  My question is when would pattern 1 be suitable over the merge?

    A truncate and full load is useful if you need to re-initialise the data warehouse. Maybe some changes have been made that the merge process don't deal with or some columns have been added that won't be merged. On the data-warehouse I maintain we run a full-load (truncate most of the tables) on a Sunday when there is not much usage. All the other dates an incremental load is run.

  • Thanks for comments - partitioning is the way to go with snapshots.

  • leehbi - Monday, November 19, 2018 2:04 AM

    Thanks for comments - partitioning is the way to go with snapshots.

    Careful now... While partitioning is a great way to move temporal sections of data in and out of a table, it won't determine if rows have been updated.  It will also (in many cases) actually slow down queries compared to a monolithic structure.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply