Using the MERGE Statement in SSIS Via a Stored Procedure


rickard 33978
Good solution!
Rickard
NbleSavage-393985
Great article! I've been looking for a good means to accomplish this as well (e.g. getting around the SSIS limitation of having only a script task component, and no native SSIS task, to run the MERGE in an incremental load).

I would think the use of the sproc should also provide a performance bump over the script task (which is how I'd been implementing this previously).

Cheers!

- Savage
Jason-299789
A very nice article, though isn't there a better way of building the column lists than using XML?

Could you not just do this?


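DECLARE @var nvarchar(max) = N'';

-- Append each column name, prefixed with a comma
SELECT @var = @var + ',' + COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = '<schema>' AND TABLE_NAME = '<table>';

-- Strip the leading comma (LEN - 1 keeps the final character)
SELECT SUBSTRING(@var, 2, LEN(@var) - 1);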
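
For comparison, here is a minimal sketch of the XML-style column list this question refers to. It assumes the article builds its list with FOR XML PATH; the article's exact code is not quoted in this thread, and the schema and table names are hypothetical. One reason people prefer this form is that the variable-concatenation pattern above can give unreliable results when an ORDER BY is involved, whereas here the ordering is explicit.

```sql
DECLARE @cols nvarchar(max);

-- Build a comma-separated, bracket-quoted column list in ordinal order.
SELECT @cols = STUFF((
        SELECT N',' + QUOTENAME(c.COLUMN_NAME)
        FROM INFORMATION_SCHEMA.COLUMNS AS c
        WHERE c.TABLE_SCHEMA = N'dbo'        -- hypothetical schema
          AND c.TABLE_NAME   = N'MyTable'    -- hypothetical table
        ORDER BY c.ORDINAL_POSITION
        FOR XML PATH(''), TYPE
    ).value('.', 'nvarchar(max)'), 1, 1, N'');   -- STUFF removes the leading comma

SELECT @cols AS ColumnList;
```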



F.L
Since I must be missing something crucial, could someone please explain why you can't achieve the same thing with an SCD component? Isn't this a Type 1 SCD (update/insert)?
I know that an SCD component is generally slower, but other than that?
sixthzenz
Do the plans for these dynamically built SQL statements get stored in the plan cache? My understanding is that they do not, in which case optimal performance isn't attained. I guess this would only really apply to larger data loading operations, but you get my drift.
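
For anyone who wants to check this on their own server, here is a minimal sketch that looks for cached plans whose text contains a MERGE. SQL Server does cache plans for dynamic SQL batches (they show up with an objtype of Adhoc or Prepared rather than Proc), though whether a given plan gets reused depends on the statement text matching. The LIKE filter is only an assumption about what the generated statements look like, and the query needs VIEW SERVER STATE permission.

```sql
-- List cached plans whose statement text mentions MERGE, most reused first.
SELECT cp.usecounts,      -- how many times the cached plan has been used
       cp.objtype,        -- Adhoc / Prepared for dynamic SQL, Proc for stored procedures
       st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%MERGE%'
ORDER BY cp.usecounts DESC;
```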
Wayne Motley-381939
Very good solution. I agree with SSC-Enthusiastic, in that I have a similar solution for my DW projects but chose the information schema as the way to get the column names and types.
Jason-299789
sixthzenz (1/23/2012)
Do the plans for these dynamically built SQL statements get stored in the plan cache? My understanding is that they do not, in which case optimal performance isn't attained. I guess this would only really apply to larger data loading operations, but you get my drift.


For ETL, does it matter?

One thing you could do is use the approach to build shell SPs with the MERGE script in the middle.

All that takes is a variable called @SP_Header, defined as 'CREATE PROCEDURE usp_Load'+TableName+' as '

Then you have the shell ready to go, so all the developer has to do is put in the error catching and tidy up the formatting.
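
A rough sketch of that shell idea, assuming the generated MERGE text is already held in a variable. Only @SP_Header comes from the post above; @TableName, @MergeBody, @CrLf and the table name Customer are hypothetical placeholders. The result is returned as text for the developer to finish off rather than being executed.

```sql
DECLARE @TableName sysname = N'Customer';    -- hypothetical table name
DECLARE @MergeBody nvarchar(max) =
    N'/* generated MERGE statement goes here */';   -- placeholder for the built MERGE
DECLARE @CrLf nchar(2) = NCHAR(13) + NCHAR(10);

-- Header for the shell procedure; QUOTENAME guards against odd characters in the name.
DECLARE @SP_Header nvarchar(max) =
    N'CREATE PROCEDURE dbo.' + QUOTENAME(N'usp_Load' + @TableName) + N' AS';

-- Return the shell as text so error handling and formatting can be added by hand.
SELECT @SP_Header + @CrLf + @MergeBody + @CrLf AS ProcedureShell;
```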

kramaswamy
One thing to be careful about whenever you're using dynamic SQL in SSIS: if you're ever thinking of putting that data into a destination table or file through a data flow task, and your source is using dynamic SQL, you *will* run into problems.

Since your solution isn't doing that, it shouldn't be an issue, but for anyone who is taking this idea and trying to adapt it to their own work, keep that in mind.
Jason-299789
kramaswamy (1/23/2012)
One thing to be careful about whenever you're using dynamic SQL in SSIS: if you're ever thinking of putting that data into a destination table or file through a data flow task, and your source is using dynamic SQL, you *will* run into problems.

Since your solution isn't doing that, it shouldn't be an issue, but for anyone who is taking this idea and trying to adapt it to their own work, keep that in mind.


Why will you run into problems? The only time you hit a problem is when you remove a column from the source. If you add columns it's generally not a problem, or in my experience anyway.

As for your earlier comment about the SCD component, it promised much and is OK if you have a static design, but if you make a change you have a lot more work on your hands in regard to changing, retesting and deploying the new SSIS packages.

Not sure if that's changed in 2012 or not; I would guess the latter.

nsmith 97264
Update to article...

Since this article was written, we have investigated the efficiency of the MERGE statement and found one problem with the logic within the stored procedure. When you update a row's primary key (even if you are updating the primary key to its original value), the update is essentially handled as an insert, and you end up losing efficiency; actually, a lot of efficiency. I want to thank Paul White for his blog post "The Impact of Non-Updating Updates" (found at http://sqlblog.com/blogs/paul_white/archive/2010/08/11/the_2D00_impact_2D00_of_2D00_update_2D00_statements_2D00_that_2D00_don_2D00_t_2D00_change_2D00_data.aspx).

Armed with this information our stored procedure was modified so that the matching predicate primary keys were removed from the update portion of the dynamically built stored procedure.
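
To illustrate the shape of the change, here is a minimal sketch of a MERGE whose UPDATE clause leaves out the key column used in the matching predicate. The temp tables, column names and sample rows are hypothetical; this is not the text generated by the attached procedure.

```sql
-- Hypothetical target and staging tables, for illustration only.
CREATE TABLE #Target  (ResultId int PRIMARY KEY, ResultValue varchar(50), ResultDate date);
CREATE TABLE #Staging (ResultId int PRIMARY KEY, ResultValue varchar(50), ResultDate date);

INSERT INTO #Target  VALUES (1, 'old', '2012-01-01');
INSERT INTO #Staging VALUES (1, 'new', '2012-01-02'), (2, 'added', '2012-01-02');

MERGE #Target AS tgt
USING #Staging AS src
    ON tgt.ResultId = src.ResultId            -- matching predicate on the primary key
WHEN MATCHED THEN
    UPDATE SET                                -- ResultId is deliberately NOT in the SET list
        tgt.ResultValue = src.ResultValue,
        tgt.ResultDate  = src.ResultDate
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ResultId, ResultValue, ResultDate)
    VALUES (src.ResultId, src.ResultValue, src.ResultDate);

-- Row 1 is updated in place, row 2 is inserted.
SELECT ResultId, ResultValue, ResultDate FROM #Target ORDER BY ResultId;

DROP TABLE #Target;
DROP TABLE #Staging;
```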

The largest table we import from our Electronic Health Record (EHR) source system stores clinical results and contains about 2.4 billion rows. Currently we merge between 600,000 and 900,000 rows of data from that source system clinical result table into the corresponding table within our Enterprise Data Warehouse (EDW). With our old 2-step process of deleting rows from the production table that exist in the staging table and then inserting new rows, the whole process took around 2 hours to complete. Once we implemented the updated stored procedure (which no longer updates the primary key[s] for updated rows), that time was reduced to about 6 minutes.

The bulk of the processing time in the old process was due to the deletes performed against a table that contained over 2 billion rows.

We also added a parameter to have the stored procedure kick out the actual merge statement text instead of executing it so that you can view the SQL code that was dynamically built to make sure it looks as you would expect.
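
A minimal sketch of how such a switch might look, assuming the generated statement is held in a variable before it runs. The names @Debug and @Sql are hypothetical and may not match the parameter names in the attached procedure; the placeholder SELECT stands in for the generated MERGE text.

```sql
DECLARE @Debug bit = 1;    -- hypothetical switch: 1 = show the SQL, 0 = run it
DECLARE @Sql nvarchar(max) =
    N'SELECT 1 AS Placeholder;';   -- stands in for the dynamically built MERGE

IF @Debug = 1
    SELECT @Sql AS GeneratedSql;   -- return the text for review instead of executing it
ELSE
    EXEC sys.sp_executesql @Sql;   -- execute the generated statement
```

Returning the text with SELECT rather than PRINT avoids PRINT's truncation of long strings, which matters for wide tables.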

Attached is the updated stored procedure as we are running it today.
Attachments
generate_merge_v2.txt (30 views, 12.00 KB)