Pattern to load incremental data - MERGE or TRUNCATE and INSERT

Question

Post reply

Pattern to load incremental data - MERGE or TRUNCATE and INSERT

leehbi

SSCrazy

Points: 2874
More actions
November 16, 2018 at 9:34 am

#380876

Hi,
There mainly 2 patterns for incrementally loading data for facts.
1. Load new records to staging
2. Truncate any of these rows from Fact Sales
3. Insert records from staging
This is good as it saves having to update records that have changed.
The other pattern :
1. Load to staging
2. Do a MERGE into fact to upload New/Changed records
Of course it relies on being able to get new/updated rows from the source into staging.
I've mostly got experience of the latter. My question is when would pattern 1 be suitable over the merge?

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

Martin Schoombee SSCoach Points: 19116 More actions · Answer 1

The first pattern (truncate & load) is suitable in cases where you either have:

A very small fact table
A fact table where most records change and will have to be updated in any event
A snapshot fact table, assuming that you're reloading a specific snapshot and using partition switching

Jonathan AC Roberts SSCoach Points: 18263 More actions · Answer 2

leehbi - Friday, November 16, 2018 9:34 AM
Hi,
There mainly 2 patterns for incrementally loading data for facts.
1. Load new records to staging
2. Truncate any of these rows from Fact Sales
3. Insert records from staging
This is good as it saves having to update records that have changed.
The other pattern :
1. Load to staging
2. Do a MERGE into fact to upload New/Changed records
Of course it relies on being able to get new/updated rows from the source into staging.
I've mostly got experience of the latter. My question is when would pattern 1 be suitable over the merge?

A truncate and full load is useful if you need to re-initialise the data warehouse. Maybe some changes have been made that the merge process don't deal with or some columns have been added that won't be merged. On the data-warehouse I maintain we run a full-load (truncate most of the tables) on a Sunday when there is not much usage. All the other dates an incremental load is run.

leehbi SSCrazy Points: 2874 More actions · Answer 3

Thanks for comments - partitioning is the way to go with snapshots.

Jeff Moden SSC Guru Points: 1003865 More actions · Answer 4

leehbi - Monday, November 19, 2018 2:04 AM
Thanks for comments - partitioning is the way to go with snapshots.

Careful now... While partitioning is a great way to move temporal sections of data in and out of a table, it won't determine if rows have been updated. It will also (in many cases) actually slow down queries compared to a monolithic structure.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)