Pattern to load incremental data - MERGE or TRUNCATE and INSERT

  • leehbi

    SSCrazy

    Points: 2717

    Hi,
    There mainly 2 patterns for incrementally loading data for facts.
    1. Load new records to staging
    2. Truncate any of these rows from Fact Sales
    3. Insert records from staging

    This is good as it saves having to update records that have changed.  

    The other pattern :
    1. Load to staging
    2. Do a MERGE into fact to upload New/Changed records
    Of course it relies on being able to get new/updated rows from the source into staging.
    I've mostly got experience of the latter.  My question is when would pattern 1 be suitable over the merge?

  • Martin Schoombee

    SSCoach

    Points: 19010

    The first pattern (truncate & load) is suitable in cases where you either have: 

    • A very small fact table
    • A fact table where most records change and will have to be updated in any event
    • A snapshot fact table, assuming that you're reloading a specific snapshot and using partition switching
  • Jonathan AC Roberts

    SSCoach

    Points: 16685

    leehbi - Friday, November 16, 2018 9:34 AM

    Hi,
    There mainly 2 patterns for incrementally loading data for facts.
    1. Load new records to staging
    2. Truncate any of these rows from Fact Sales
    3. Insert records from staging

    This is good as it saves having to update records that have changed.  

    The other pattern :
    1. Load to staging
    2. Do a MERGE into fact to upload New/Changed records
    Of course it relies on being able to get new/updated rows from the source into staging.
    I've mostly got experience of the latter.  My question is when would pattern 1 be suitable over the merge?

    A truncate and full load is useful if you need to re-initialise the data warehouse. Maybe some changes have been made that the merge process don't deal with or some columns have been added that won't be merged. On the data-warehouse I maintain we run a full-load (truncate most of the tables) on a Sunday when there is not much usage. All the other dates an incremental load is run.

  • leehbi

    SSCrazy

    Points: 2717

    Thanks for comments - partitioning is the way to go with snapshots.

  • Jeff Moden

    SSC Guru

    Points: 993883

    leehbi - Monday, November 19, 2018 2:04 AM

    Thanks for comments - partitioning is the way to go with snapshots.

    Careful now... While partitioning is a great way to move temporal sections of data in and out of a table, it won't determine if rows have been updated.  It will also (in many cases) actually slow down queries compared to a monolithic structure.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply