SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Upsert Dimension Table


Upsert Dimension Table

Author
Message
Alan Hargreaves
Alan Hargreaves
Valued Member
Valued Member (68 reputation)Valued Member (68 reputation)Valued Member (68 reputation)Valued Member (68 reputation)Valued Member (68 reputation)Valued Member (68 reputation)Valued Member (68 reputation)Valued Member (68 reputation)

Group: General Forum Members
Points: 68 Visits: 108

SSIS is memory hungry so is you use the LookUp trans will you will need to be mindful of how SSIS will chew all your memory. I've used the process I outlined above on up to 10 million rows (which is not a lot) but SSIS manages this very well and very fast.

Simply build your package using Execute SQL taks between dataflows to faciliate my process.


uday balaji
uday balaji
Old Hand
Old Hand (366 reputation)Old Hand (366 reputation)Old Hand (366 reputation)Old Hand (366 reputation)Old Hand (366 reputation)Old Hand (366 reputation)Old Hand (366 reputation)Old Hand (366 reputation)

Group: General Forum Members
Points: 366 Visits: 1
Oh cool! I should have thought of that before... thanks a lot :o)
uday balaji
uday balaji
Old Hand
Old Hand (366 reputation)Old Hand (366 reputation)Old Hand (366 reputation)Old Hand (366 reputation)Old Hand (366 reputation)Old Hand (366 reputation)Old Hand (366 reputation)Old Hand (366 reputation)

Group: General Forum Members
Points: 366 Visits: 1
I'm not handling such large data. But will be mindful of that... thanks for the warning.
Peter Kuzniewski
Peter Kuzniewski
SSC Veteran
SSC Veteran (257 reputation)SSC Veteran (257 reputation)SSC Veteran (257 reputation)SSC Veteran (257 reputation)SSC Veteran (257 reputation)SSC Veteran (257 reputation)SSC Veteran (257 reputation)SSC Veteran (257 reputation)

Group: General Forum Members
Points: 257 Visits: 562
Steve,
I think one of your indexes is slightly sick. We already know all about the sale. Now, we need the Upsert article, please.

(We all hope this is NOT a sign of things to come)

Regards,
Peter
Steve Jones
Steve Jones
SSC Guru
SSC Guru (328K reputation)SSC Guru (328K reputation)SSC Guru (328K reputation)SSC Guru (328K reputation)SSC Guru (328K reputation)SSC Guru (328K reputation)SSC Guru (328K reputation)SSC Guru (328K reputation)

Group: Administrators
Points: 328157 Visits: 20105
OK, OK, apologies. That was the weirdest thing. The process we use to load the articles is generating an old one on disk for some reason. Did it again this morning and I didn't catch it last night.

It's up there now.

Follow me on Twitter: @way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com
Loner
Loner
SSChampion
SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)

Group: General Forum Members
Points: 11830 Visits: 3367

The surrograte key is an identity field, how can it have zero in it?

What I usually do is loading all the data in a staging table. Then using left join to insert the new records into the dimension table. The dimension table does not need to have the surrogate key.

INSERT INTO Dim(Natural_key, attribute1, attribute2, load_time)
SELECT s.natural_key, s.attribute1, s.attribute2, GETDATE()
FROM staging_table s
LEFT JOIN dw.dbo.dimension dim ON s.natural_key = dim.natural_key WHERE dim.natural_key is NULL

BTW, your step 2 insert statement, did you forget to insert the natural_key?

If the records are already in the dimension table, then update the record with the most recent information.

Update d set d.attitute1=s.attitute1, d.attitute2 = s.attitute2, d.load_time=GETDATE() FROM dw.dbo.dimension d inner join staging_table s where (d.natural_key = s.natural_key) and (d.attitute1 <> s.attitute1 or d.attitute2 <> s.attitute2)

I don't know what kind of benefit to do SCD3. I think the effective date and expiration date is good enough in dimension table. In SCD3 you are changing the dimension table structure. What if the attribute changes 10 times, you would have a big table.


Rasool Omidakhsh
Rasool Omidakhsh
Old Hand
Old Hand (341 reputation)Old Hand (341 reputation)Old Hand (341 reputation)Old Hand (341 reputation)Old Hand (341 reputation)Old Hand (341 reputation)Old Hand (341 reputation)Old Hand (341 reputation)

Group: General Forum Members
Points: 341 Visits: 24
In SSIS you could use the <Slowly Changing Dimension> component which basically does the UPSERT.
cliffb
cliffb
SSCrazy
SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)

Group: General Forum Members
Points: 2337 Visits: 438

Is there a reason you chose to use a list of "OR" vs CHECKSUM in your update statement? I realize CHECKSUM is not the same as CRC but it works just fine if comparing a single row.

Example:

UPDATE dim
SET dim.attribute1 = tmp.attribute1,
dim.attribute1 = tmp.attribute2,
dim.load_time = tmp.load_time
FROM dw.dbo.dimension dim
INNER JOIN #dim tmp
ON tmp.surrogate_key = dim.surrogate_key
WHERE tmp.surrogate_key <> 0
AND CHECKSUM(dim.attribute1,dim.attribute2) <> CHECKSUM(tmp.attribute1,tmp.attribute2)

This is cleaner to read for SCD Type 1 changes and, at least in our environment, seems to be a bit quicker.





uday balaji
uday balaji
Old Hand
Old Hand (366 reputation)Old Hand (366 reputation)Old Hand (366 reputation)Old Hand (366 reputation)Old Hand (366 reputation)Old Hand (366 reputation)Old Hand (366 reputation)Old Hand (366 reputation)

Group: General Forum Members
Points: 366 Visits: 1

I tried using the Slowly Changing Dimension transormation. I basically need two paths - One to insert the new record ('New Output Path' to OLE DB Destination) and one to update changed attributes in an existing record based on a specified business key ('Changed Attribute Updates Output' to OLE DB Destination).

The insertion works fine but when i modify the same record in source data (retaining the same primary key) the data gets routed in the Changed Attribute path but fails with a primary key violation (it has attempted to insert into the destination table). Am i missing a setting or something that forces update rather than insert?


Loner
Loner
SSChampion
SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)

Group: General Forum Members
Points: 11830 Visits: 3367

cliffb,

I could use checksum, I was just in a hurry so I wrote it in using 'OR' especially there were only two values to check. However I would not use surrograte key to compare when I update the dimension table, I would use the natural key.


Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum








































































































































































SQLServerCentral


Search