SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Slowly Changing dimension inside a ForEach Loop container


Slowly Changing dimension inside a ForEach Loop container

Author
Message
romina
romina
Valued Member
Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)

Group: General Forum Members
Points: 72 Visits: 234
Hi there folks,

Sorry for the lengthy topic but this is my first post - as you can tell!

I am having memory issues with my SSIS package.

I am running SQL Server 2005 Standard Edition on a Windows Server 2003 SP2 Enterprise Edition machine (Intel Xeon 5160 @ 3.00GHz, 4GB RAM) - yes, I am aware how obsolete and old the machine is!

Anyway, I have over 4 million records I need to load from a table that has been capturing weekly data from a HR system into a data warehouse.

The SSIS I have created goes through a ForEach Loop where it picks up one employee at a time (there are around 96,000 employees). There is another ForEach Loop that picks up the extract date. Inside the second ForEach Loop there is a Data Flow Task that has an OLE DB Source. I have used a SQL Command and I perform all the look ups in the SQL command. At this stage I am not concerned about null values from the lookup as I have done all this prior to loading the data. Therefore, I know that the SQL command with the lookups will return valid values. After the OLE DB source I have an SCD. I need to keep history for most of the columns in the table that I am loading. I am using 2 business keys in my SCD. The SCD works fine and does what it has to do.

The problem arises after some time. Normally at around 1 to 1.5 hrs the SSIS package fails with the following error (values differ from execution to execution):

Microsoft (R) SQL Server Execute Package Utility
Version 9.00.5000.00 for 32-bit
Copyright (C) Microsoft Corp 1984-2005. All rights reserved.

Started: 3:19:00 PM
Error: 2014-08-15 16:24:50.91
Code: 0xC0047012
Source: EmployeeFact with SCD DTS.Pipeline
Description: A buffer failed while allocating 9109056 bytes.
End Error
Error: 2014-08-15 16:24:50.91
Code: 0xC0047011
Source: EmployeeFact with SCD DTS.Pipeline
Description: The system reports 86 percent memory load. There are 4293980160 bytes of physical memory with 583577600 bytes free. There are 2147352576 bytes of virtual memory with 240123904 bytes free. The paging file has 9486200832 bytes with 4991451136 bytes free.
End Error
Error: 2014-08-15 16:24:50.91
Code: 0xC0047056
Source: EmployeeFact with SCD
Description: The Data Flow task failed to create a buffer to call PrimeOutput for output "OLE DB Source" (1) on component "OLE DB Source Output" (11). This error usually occurs due to an out-of-memory condition.
End Error
Error: 2014-08-15 16:24:50.91
Code: 0xC0047021
Source: EmployeeFact with SCD
Description: SSIS Error Code DTS_E_THREADFAILED. Thread "SourceThread0" has exited with error code 0x8007000E. There may be error messages posted before this with more information on why the thread has exited.
End Error
Error: 2014-08-15 16:24:50.91
Code: 0xC0047039
Source: EmployeeFact with SCD
Description: SSIS Error Code DTS_E_THREADCANCELLED. Thread "WorkThread0" received a shutdown signal and is terminating. The user requested a shutdown, or an error in another thread is causing the pipeline to shutdown. There may be error messages posted before this with more information on why the thread was cancelled.
End Error
Error: 2014-08-15 16:24:50.91
Code: 0xC0047039
Source: EmployeeFact with SCD
Description: SSIS Error Code DTS_E_THREADCANCELLED. Thread "WorkThread1" received a shutdown signal and is terminating. The user requested a shutdown, or an error in another thread is causing the pipeline to shutdown. There may be error messages posted before this with more information on why the thread was cancelled.
End Error
Error: 2014-08-15 16:24:50.91
Code: 0xC0047021
Source: EmployeeFact with SCD
Description: SSIS Error Code DTS_E_THREADFAILED. Thread "WorkThread0" has exited with error code 0xC0047039. There may be error messages posted before this with more information on why the thread has exited.
End Error
Error: 2014-08-15 16:24:50.91
Code: 0xC0047021
Source: EmployeeFact with SCD
Description: SSIS Error Code DTS_E_THREADFAILED. Thread "WorkThread1" has exited with error code 0xC0047039. There may be error messages posted before this with more information on why the thread has exited.
End Error
DTExec: The package execution returned DTSER_FAILURE (1).
Started: 3:19:00 PM
Finished: 4:24:52 PM
Elapsed: 3952 seconds


I monitor the server at all times when I am running this package and the CPU usage is always below 50%. I don't really have good counters to measure the memory usage but when I do a Window Task Manager I see the PF usage is hovering around 4 and 4.1 GB.

I also have played with the DefaultBufferMaxRows and DefaultBufferSize but I found that if I kept at the default values (10000 for DefaultBufferMaxRows and 10485760 for DefaultBufferSize) the package runs for longer than if I increase any of those values.

I am going nuts trying to find out what could be the reason for this memory failure? Is it the way the package is configured or is it the server (hardware)?

Please any input would be much appreciated!

Thanks
Romina
Phil Parkin
Phil Parkin
SSC-Forever
SSC-Forever (49K reputation)SSC-Forever (49K reputation)SSC-Forever (49K reputation)SSC-Forever (49K reputation)SSC-Forever (49K reputation)SSC-Forever (49K reputation)SSC-Forever (49K reputation)SSC-Forever (49K reputation)

Group: General Forum Members
Points: 49597 Visits: 21147
96,000 times round a Foreach loop sounds like a performance-killer to me.

It would be useful background to understand why you have decided to do things this way rather than using a set-based approach.


Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.

Please surround any code or links you post with the appropriate IFCode formatting tags. It helps readability a lot.
romina
romina
Valued Member
Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)

Group: General Forum Members
Points: 72 Visits: 234
Hi Phil,

Thank you for your reply.

I am not sure I get you when you say a set based approach.

I was just really trying to minimize the number of records entering the SCD with a ForEach Loop. If I just go ahead and don't use the ForEach loop my SSIS package will take hours to complete because of the SCD.

Thanks
Romina
Phil Parkin
Phil Parkin
SSC-Forever
SSC-Forever (49K reputation)SSC-Forever (49K reputation)SSC-Forever (49K reputation)SSC-Forever (49K reputation)SSC-Forever (49K reputation)SSC-Forever (49K reputation)SSC-Forever (49K reputation)SSC-Forever (49K reputation)

Group: General Forum Members
Points: 49597 Visits: 21147
The SCD transformation can perform badly when compared against other approaches.

What type of SCD are you using?

When I refer to a set-based approach, I am referring to performing actions on groups ('sets') of rows concurrently, rather than using any sort of looping. This is usually the standard and optimal way of working against data which is held in a SQL Server database.


Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.

Please surround any code or links you post with the appropriate IFCode formatting tags. It helps readability a lot.
Koen Verbeeck
Koen Verbeeck
SSC Guru
SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)

Group: General Forum Members
Points: 60131 Visits: 13297
I have an article on how to implement SCD in SSIS using different methods and I compare them against each other.

Four Methods for Implementing a Slowly Changing Dimension in Integration Services – Part II

Giving that you work with SSIS 2005, I would go with the last method: custom implementation (with custom I mean build it yourself using standard components in SSIS, instead of using 3rd party components).


How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?
My blog at SQLKover.

MCSE Business Intelligence - Microsoft Data Platform MVP
romina
romina
Valued Member
Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)

Group: General Forum Members
Points: 72 Visits: 234
Phil,

I am using the SCD wizard from SSIS 2005. Mostly Type 2 changes with very few Type 1.

What I am trying to achieve is retrieve all 4 million records from the old system to the new and the only way to group them is through the EmployeeID and the ExtractDate (a Datetime field that reflects when the extract was made from the old system to the new).

The data extract is done via a stored procedure run every week and literally "dumped" into a huge table that now has over 4 million records. Records for each employee for each date the extract was made. There are around 96,000 employees.

Initially the records will be mostly new but there are a lot of duplicated records that need to be cleaned out before loading into the new datawarehouse system. For example, for a particular employee there might be 141 records in this "huge dumping table" but when I perform the SCD, only 14 records make it to the DW after getting rid of duplicates and updating historical records.

Working with an employee at a time is fine, but with 96,000 employees the SCD simply takes way too long, and this is what I was trying to avoid with the ForEach loop.

I guess I will have to try Koen's approach and build a custom SCD.

Just one last question Phill, how much is the hardware (the server configuration) affecting performance? Or is it purely my SSIS that needs to be configured better?

Thanks again. I appreciate your help with this!
romina
romina
Valued Member
Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)

Group: General Forum Members
Points: 72 Visits: 234
Koen, thank you for this. Very helpful!

In the custom implementation example, is that a Merge or a Merge Join?

If it's simply a Merge, can I use the Union All instead?

In SSIS 2005 you need to sort the source before Merging it, which might cause performance issues. What do you think?

Thanks
Romina
Koen Verbeeck
Koen Verbeeck
SSC Guru
SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)

Group: General Forum Members
Points: 60131 Visits: 13297
romina (8/18/2014)
Koen, thank you for this. Very helpful!

In the custom implementation example, is that a Merge or a Merge Join?

Thanks
Romina


Neither.
Which number in the screenshot of the data flow are we talking about?


How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?
My blog at SQLKover.

MCSE Business Intelligence - Microsoft Data Platform MVP
romina
romina
Valued Member
Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)Valued Member (72 reputation)

Group: General Forum Members
Points: 72 Visits: 234
4 and 9
Koen Verbeeck
Koen Verbeeck
SSC Guru
SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)

Group: General Forum Members
Points: 60131 Visits: 13297
romina (8/18/2014)
4 and 9


Ah yes, those are UNION ALL components (designated by the (ALL) naming conventions).
I used "merge" in their names because they combine two streams of data together.


How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?
My blog at SQLKover.

MCSE Business Intelligence - Microsoft Data Platform MVP
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search