SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Upsert Dimension Table


Upsert Dimension Table

Author
Message
Alan Hargreaves
Alan Hargreaves
Grasshopper
Grasshopper (16 reputation)Grasshopper (16 reputation)Grasshopper (16 reputation)Grasshopper (16 reputation)Grasshopper (16 reputation)Grasshopper (16 reputation)Grasshopper (16 reputation)Grasshopper (16 reputation)

Group: General Forum Members
Points: 16 Visits: 108

SSIS is memory hungry so is you use the LookUp trans will you will need to be mindful of how SSIS will chew all your memory. I've used the process I outlined above on up to 10 million rows (which is not a lot) but SSIS manages this very well and very fast.

Simply build your package using Execute SQL taks between dataflows to faciliate my process.


uday balaji
uday balaji
SSC Rookie
SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)

Group: General Forum Members
Points: 36 Visits: 1
Oh cool! I should have thought of that before... thanks a lot :o)
uday balaji
uday balaji
SSC Rookie
SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)

Group: General Forum Members
Points: 36 Visits: 1
I'm not handling such large data. But will be mindful of that... thanks for the warning.
Peter Kuzniewski
Peter Kuzniewski
Valued Member
Valued Member (57 reputation)Valued Member (57 reputation)Valued Member (57 reputation)Valued Member (57 reputation)Valued Member (57 reputation)Valued Member (57 reputation)Valued Member (57 reputation)Valued Member (57 reputation)

Group: General Forum Members
Points: 57 Visits: 562
Steve,
I think one of your indexes is slightly sick. We already know all about the sale. Now, we need the Upsert article, please.

(We all hope this is NOT a sign of things to come)

Regards,
Peter
Steve Jones
Steve Jones
SSC Guru
SSC Guru (61K reputation)SSC Guru (61K reputation)SSC Guru (61K reputation)SSC Guru (61K reputation)SSC Guru (61K reputation)SSC Guru (61K reputation)SSC Guru (61K reputation)SSC Guru (61K reputation)

Group: Administrators
Points: 61379 Visits: 19097
OK, OK, apologies. That was the weirdest thing. The process we use to load the articles is generating an old one on disk for some reason. Did it again this morning and I didn't catch it last night.

It's up there now.

Follow me on Twitter: @way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com
Loner
Loner
Hall of Fame
Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)

Group: General Forum Members
Points: 3526 Visits: 3350

The surrograte key is an identity field, how can it have zero in it?

What I usually do is loading all the data in a staging table. Then using left join to insert the new records into the dimension table. The dimension table does not need to have the surrogate key.

INSERT INTO Dim(Natural_key, attribute1, attribute2, load_time)
SELECT s.natural_key, s.attribute1, s.attribute2, GETDATE()
FROM staging_table s
LEFT JOIN dw.dbo.dimension dim ON s.natural_key = dim.natural_key WHERE dim.natural_key is NULL

BTW, your step 2 insert statement, did you forget to insert the natural_key?

If the records are already in the dimension table, then update the record with the most recent information.

Update d set d.attitute1=s.attitute1, d.attitute2 = s.attitute2, d.load_time=GETDATE() FROM dw.dbo.dimension d inner join staging_table s where (d.natural_key = s.natural_key) and (d.attitute1 <> s.attitute1 or d.attitute2 <> s.attitute2)

I don't know what kind of benefit to do SCD3. I think the effective date and expiration date is good enough in dimension table. In SCD3 you are changing the dimension table structure. What if the attribute changes 10 times, you would have a big table.


Rasool Omidakhsh
Rasool Omidakhsh
SSC-Enthusiastic
SSC-Enthusiastic (187 reputation)SSC-Enthusiastic (187 reputation)SSC-Enthusiastic (187 reputation)SSC-Enthusiastic (187 reputation)SSC-Enthusiastic (187 reputation)SSC-Enthusiastic (187 reputation)SSC-Enthusiastic (187 reputation)SSC-Enthusiastic (187 reputation)

Group: General Forum Members
Points: 187 Visits: 24
In SSIS you could use the <Slowly Changing Dimension> component which basically does the UPSERT.
cliffb
cliffb
Right there with Babe
Right there with Babe (795 reputation)Right there with Babe (795 reputation)Right there with Babe (795 reputation)Right there with Babe (795 reputation)Right there with Babe (795 reputation)Right there with Babe (795 reputation)Right there with Babe (795 reputation)Right there with Babe (795 reputation)

Group: General Forum Members
Points: 795 Visits: 438

Is there a reason you chose to use a list of "OR" vs CHECKSUM in your update statement? I realize CHECKSUM is not the same as CRC but it works just fine if comparing a single row.

Example:

UPDATE dim
SET dim.attribute1 = tmp.attribute1,
dim.attribute1 = tmp.attribute2,
dim.load_time = tmp.load_time
FROM dw.dbo.dimension dim
INNER JOIN #dim tmp
ON tmp.surrogate_key = dim.surrogate_key
WHERE tmp.surrogate_key <> 0
AND CHECKSUM(dim.attribute1,dim.attribute2) <> CHECKSUM(tmp.attribute1,tmp.attribute2)

This is cleaner to read for SCD Type 1 changes and, at least in our environment, seems to be a bit quicker.





uday balaji
uday balaji
SSC Rookie
SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)SSC Rookie (36 reputation)

Group: General Forum Members
Points: 36 Visits: 1

I tried using the Slowly Changing Dimension transormation. I basically need two paths - One to insert the new record ('New Output Path' to OLE DB Destination) and one to update changed attributes in an existing record based on a specified business key ('Changed Attribute Updates Output' to OLE DB Destination).

The insertion works fine but when i modify the same record in source data (retaining the same primary key) the data gets routed in the Changed Attribute path but fails with a primary key violation (it has attempted to insert into the destination table). Am i missing a setting or something that forces update rather than insert?


Loner
Loner
Hall of Fame
Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)

Group: General Forum Members
Points: 3526 Visits: 3350

cliffb,

I could use checksum, I was just in a hurry so I wrote it in using 'OR' especially there were only two values to check. However I would not use surrograte key to compare when I update the dimension table, I would use the natural key.


Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search