Bradley Schacht

Bradley is a consultant at Pragmatic Works in Jacksonville, FL. He was an author on the book SharePoint 2010 Business Intelligence 24-Hour Trainer and tech edited the SQL 2011 Bible. His experience on the Microsoft BI platform includes DTS, SSIS, SSRS, and SSAS, as well as migrations and conversions. He has helped numerous companies successfully develop and implement new business intelligence solutions in their organizations. Bradley also speaks at community events like SQL Saturday, Code Camp, SQL Lunch, and SQL Server User Groups. He is a contributor on sites such as BIDN.com and SQL Server Central as well as an active member of the Jacksonville SQL Server User Group (JSSUG).

Webinar Q&A: SSIS Design Patterns for Loading a Data Warehouse

I’m not going to sugarcoat it: I’ve been a bit of a slacker lately. Unfortunately, I didn’t get the Q&A for my two webinars posted the week I did the presentations. So here is part 1; the SSAS post will follow separately. To start, thank you for joining if you attended the webinar. All of us at Pragmatic Works who do the webinars really do it because we enjoy giving back to the community. If you are looking for the recording, you can find it here.

Important!

The code I used during the webinar including the completed SSIS packages can be downloaded here.

A few notes about the files included:

  1. In order to demo the solution a little better I removed some columns from the DimEmployee table inside AdventureWorksDW. There is a new copy of the table in the zip file. See note number 4 below.
  2. SSIS Project is in the “DW Loading” folder. One package has the dimension load and the other is the fact table load.
  3. AdventureWorksAssets.sql has all the source data I used in the session. This script will create the source data tables as well as populate them with the necessary data.
  4. AdventureWorksDW2012Assets.sql has the new schema for the DimEmployee and FactSalesQuota tables. You will need to drop or rename the existing versions of those tables; the new ones are created empty because the packages will load them. You can always restore the AdventureWorksDW2012 database if you want to get the original schema back. This script also defines the necessary stage tables, again created empty.

Below are the questions I got during the webinar.

Q: If the SCD wizard is accessing the data warehouse why do we need an OLEDB source?
A: The SCD wizard only compares and writes data to the data warehouse; it does not pull data from your source system. The OLEDB source is used to pull in all the data from the transactional systems that feed the warehouse.

Q: Can we use CDC instead of the Slowly Changing Dimension (SCD) wizard?
A: Absolutely! I prefer to use CDC when possible because it already identifies the updates and inserts for me and simplifies the SSIS package. If you are using SQL Server 2012, there are some really awesome new CDC components that will help with this. I may try to do this as my next webinar topic in January; if not, I will certainly add it to the list for next year.
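For reference, turning CDC on for a source table is a two-step setup in T-SQL. The database and table names below are illustrative, not from the webinar demo:

```sql
-- Enable CDC at the database level (requires sysadmin).
USE SourceDB;
EXEC sys.sp_cdc_enable_db;

-- Enable CDC on the source table you want to track.
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Employee',
    @role_name     = NULL;  -- NULL = no gating role required to read changes

-- SQL Server then generates change-table functions such as
-- cdc.fn_cdc_get_all_changes_dbo_Employee, which SSIS (or plain T-SQL)
-- can query between two LSNs to get just the inserts and updates.
```

The SQL Server 2012 CDC Source and CDC Splitter components in SSIS sit on top of exactly these generated change tables.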

Q: What happens when a row has both type 1 and type 2 updates?
A: If a record has both, it should be processed as a type 2; otherwise we will lose the history of the change. The SCD wizard takes this into account on its own, but it is important to consider when building your own components. We used the conditional split to identify whether the record was unchanged, type 1, or type 2. The key to remember with the conditional split is that a record goes out through the first output it matches. Therefore, be sure to list the Type 2 condition before the Type 1 condition in the Conditional Split transform. There are arrows on the right side of the screen to move the conditions up and down.
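The same classification logic can be sketched in T-SQL (column names here are illustrative; in the demo the split happens inside the Conditional Split transform). Note that the Type 2 test comes first, exactly as the condition ordering in the transform requires:

```sql
-- Classify each staged row against the current dimension row.
-- Historical (Type 2) attributes are checked BEFORE overwrite (Type 1)
-- attributes, so a row with both kinds of change is handled as Type 2.
SELECT s.EmployeeID,
       CASE
           WHEN s.DepartmentName <> d.DepartmentName
             OR s.Title          <> d.Title          THEN 'Type 2'
           WHEN s.Phone          <> d.Phone          THEN 'Type 1'
           ELSE 'Unchanged'
       END AS ChangeType
FROM stg.Employee AS s
JOIN dbo.DimEmployee AS d
  ON d.EmployeeAlternateKey = s.EmployeeID
 AND d.CurrentFlag = 1;   -- compare only against the active dimension row
```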

Q: What about using T-SQL Merge?
A: I have used this method in the past and I like it as an alternative to SSIS. Some places just aren’t SSIS shops and can’t support a large warehouse load process that is heavy in SSIS development. As with everything, be sure to test the performance and make sure it meets your needs. In order for this to work, all source data will need to be staged into a table on the same server as the warehouse. That is an extra step to land the data, but it’s usually not a big deal. I’ve found that Merge works well on some fairly large datasets, but depending on the amount of memory on the server it can hit a brick wall in performance, so as I mentioned: test, test, test. I have used it in the past with tables in the hundreds of millions of records on very large servers and it works just fine, but the same query on a development server with only a few gigs of memory has been painfully slow.
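As a minimal sketch, here is what a Type 1 dimension load with MERGE looks like once the source rows are staged on the warehouse server. Table and column names are illustrative, not the exact demo schema:

```sql
-- Type 1 upsert: overwrite changed attributes, insert brand-new members.
MERGE dbo.DimEmployee AS tgt
USING stg.Employee    AS src
   ON tgt.EmployeeAlternateKey = src.EmployeeID
WHEN MATCHED AND (tgt.Phone <> src.Phone
               OR tgt.Title <> src.Title) THEN
    UPDATE SET tgt.Phone = src.Phone,
               tgt.Title = src.Title
WHEN NOT MATCHED BY TARGET THEN
    INSERT (EmployeeAlternateKey, Phone, Title)
    VALUES (src.EmployeeID, src.Phone, src.Title);
```

Type 2 handling is possible with MERGE as well (typically by combining it with an OUTPUT clause to insert the new historical rows), but it is considerably more involved than the Type 1 case above.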

Q: What is the difference between fast load and non-fast load in the OLEDB destination?
A: The regular Table or View method performs row-by-row inserts, while Table or View - Fast Load performs bulk inserts. For warehouse-sized loads, fast load is almost always the better choice.

Q: How do you remove the inferred member from the dimension when the record shows up?
A: You won’t remove the inferred member record that was created; you will simply update it with the actual values when the real record comes through. If you use a column that indicates the record is inferred, be sure to switch it to false when making the update.
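That update can be sketched in T-SQL like this (column names, including the `InferredMember` flag, are illustrative):

```sql
-- When the real source row finally arrives, overwrite the placeholder
-- values on the inferred member and clear the inferred flag.
UPDATE d
SET    d.FirstName      = s.FirstName,
       d.LastName       = s.LastName,
       d.DepartmentName = s.DepartmentName,
       d.InferredMember = 0          -- the record is no longer inferred
FROM dbo.DimEmployee AS d
JOIN stg.Employee    AS s
  ON d.EmployeeAlternateKey = s.EmployeeID
WHERE d.InferredMember = 1;          -- touch only placeholder rows
```

Because the surrogate key on the inferred row never changes, any fact rows that already reference it stay valid after the update.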

Be sure to check the follow up about building your first SSAS cube.
