Recommended Approach for SSIS Package that Inserts and Updates Warehouse table
Posted Tuesday, April 08, 2008 8:43 AM
Forum Newbie
Group: General Forum Members
Last Login: Wednesday, January 02, 2013 11:51 AM
Points: 6, Visits: 111
I'm attempting to write an SSIS package that

1. Updates a table in the target database from the result set of a staging table joined to multiple reference tables in the target database, where the staging data's key values already exist in the target table.

2. Inserts new data into the target table from the result set of the staging table joined to multiple reference tables in the target database, where the staging data's key values do not exist in the target table.


Data volumes would be up to 1M rows in stage table B.

1. Update target table A
from stage table B joined to N reference tables in the target database
where the stage table data already exists in the target table.

2. Insert into target table A
from stage table B joined to N reference tables in the target database
where the stage table data does not exist in the target table.

I was able to define the OLE DB Source for the Insert logic in the Data Flow Task. What transformation can be used to perform the Update logic?



Post #481599
Posted Tuesday, April 08, 2008 10:55 AM
SSCrazy
Group: General Forum Members
Last Login: Tuesday, August 10, 2010 5:07 AM
Points: 2,732, Visits: 23,078
I assume you used an OLE DB Destination for the insert. You could use an OLE DB Command for the update.

One million rows may be a bit much for that type of operation, though. You may want to consider staging your data and using T-SQL to do the insert and update.

I like to use SSIS entirely for dimension-update operations like this, but it has its limits.
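
A set-based version of that T-SQL might look something like this (table, column, and key names here are placeholders, not from the original post):

```sql
-- Update rows whose business key already exists in the target.
UPDATE t
SET    t.Amount = s.Amount
FROM   dbo.TargetTable AS t
JOIN   stg.StageTable  AS s
       ON s.BusinessKey = t.BusinessKey;

-- Insert rows whose business key is not in the target yet.
INSERT INTO dbo.TargetTable (BusinessKey, Amount)
SELECT s.BusinessKey, s.Amount
FROM   stg.StageTable AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM   dbo.TargetTable AS t
                   WHERE  t.BusinessKey = s.BusinessKey);
```

Two statements, each a single set-based pass over the staging table, which generally scales far better at 1M-row volumes than issuing one OLE DB Command per row.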
Post #481733
Posted Wednesday, April 14, 2010 2:36 AM


SSC-Addicted
Group: General Forum Members
Last Login: Tuesday, April 01, 2014 7:50 AM
Points: 404, Visits: 1,124
Not sure if you're familiar with it, but Microsoft has a best-practice example for data warehouse / ETL called Project REAL.

Worth a read.


_____________________________________________________________________________
MCITP: Business Intelligence Developer (2005)
Post #902934
Posted Wednesday, April 14, 2010 3:08 AM


SSChampion
Group: General Forum Members
Last Login: Today @ 3:31 PM
Points: 12,239, Visits: 9,212
Certainly don't use the OLE DB Command for the updates, as it will issue one million individual update statements against your DB if you have one million updates.

This is the approach that I use:

0. Set DelayValidation of the package to TRUE and set RetainSameConnection of the connections to staging and the DWH to TRUE.
1. Create a global temp table with the exact same structure as the destination table. Just make sure to drop the identity property, if any.
2. Start a transaction with an Execute SQL Task ("BEGIN TRAN") on the connection to the DWH.
3. Use a data flow task to distinguish between updates and inserts. Do this with an OLE DB Source on your staging table. Read all rows, perform any lookups on other tables to get surrogate keys, and finally join to your destination table in the DWH on the business key.
4. Use a conditional split after the source. If the SK of the DWH table is NULL, then the row is an insert. If it is not, then a match was found between staging and the DWH, and that means it is an update.
5. The insert output goes to an OLE DB Destination with fast load, which writes to the destination table. Just make sure you don't map the SK of the DWH, as identity inserts are not allowed.
6. The update output goes to another OLE DB Destination, which writes to the temp table. Here you do want to map the SK (that's why the temp table should have no identity property).
7. After the data flow, place an Execute SQL Task that updates the destination table: just join the destination table and the temp table on the SK. Since this is a (small) integer, it goes really fast.
8. Commit the transaction with an Execute SQL Task ("COMMIT").

The transaction is optional, but it makes sure that there are only updates if the inserts were successful, and vice versa.
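
The set-based update at the end, joining the destination table to the temp table on the SK, might look something like this (table and column names are illustrative only; "##Updates" stands in for whatever the global temp table is called):

```sql
-- Runs inside the BEGIN TRAN ... COMMIT opened by the Execute SQL Tasks.
-- ##Updates is the global temp table the data flow wrote the update rows to.
UPDATE d
SET    d.CustomerName = u.CustomerName,
       d.City         = u.City
FROM   dbo.DimCustomer AS d
JOIN   ##Updates       AS u
       ON u.SK = d.SK;   -- join on the surrogate key, a small integer
```

Because both sides join on a single integer key, this runs as one set-based operation instead of a million singleton updates.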




How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at LessThanDot.

MCSA SQL Server 2012 - MCSE Business Intelligence
Post #902941
Posted Wednesday, April 14, 2010 3:11 AM


SSChampion
Group: General Forum Members
Last Login: Today @ 3:31 PM
Points: 12,239, Visits: 9,212
Oh yeah, forgot to mention:

If you have SQL Server 2008, just use the MERGE statement; it's much, much easier.
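
For reference, a minimal MERGE sketch for SQL Server 2008 (table and column names are placeholders):

```sql
MERGE dbo.TargetTable AS t
USING stg.StageTable  AS s
    ON t.BusinessKey = s.BusinessKey
WHEN MATCHED THEN
    UPDATE SET t.Amount = s.Amount
WHEN NOT MATCHED BY TARGET THEN
    INSERT (BusinessKey, Amount)
    VALUES (s.BusinessKey, s.Amount);
```

One statement covers both the update and the insert branch, and it runs as a single atomic operation, so an explicit BEGIN TRAN / COMMIT around separate insert and update steps is no longer needed.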

If you have 2005 and you're allowed to use third-party components, you can also use the Kimball SCD component from CodePlex.




Post #902944