Slowly Changing dimension inside a ForEach Loop container


romina
Valued Member

Group: General Forum Members
Points: 64 Visits: 229
Thanks Koen and Phil,

I haven't got too far with this. I am still struggling to load the data and I am not sure what else to do; it's getting on my nerves.

Any other ideas?

Basically, I am trying to read a table with over 4 million records (96,000 employees' data, extracted weekly over the last 56 weeks, so a lot of duplicate data for the same employee!), apply an SCD (which will dramatically reduce the number of records by getting rid of duplicates) and load it into a data warehouse.

Your input will be highly appreciated ...

Thanks
Romina
MMartin1
SSCrazy

Group: General Forum Members
Points: 2785 Visits: 2031
As I understand it, you are loading the records from another system. Why not stage the records locally (on your own server) and perform the needed operations there while moving them from the staging area to your destination (applying an SCD transform)? Are you sure you need nested looping?
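Something along these lines is what I mean by staging locally; the table and column names below are only placeholders for whatever your weekly extract actually contains:

-- Hypothetical staging table; adjust the columns to match your source extract.
CREATE TABLE dbo.StagingEmployee
(
    EmployeeID   int           NOT NULL,  -- business key from the source system
    ExtractDate  date          NOT NULL,  -- week the snapshot was taken
    Department   nvarchar(100) NULL,
    JobTitle     nvarchar(100) NULL
    -- ... any other attributes you need history on
);

-- Bulk-load each weekly extract into this table first (no transformations),
-- then apply the SCD logic while moving rows from here into the warehouse.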

----------------------------------------------------
How to post forum questions to get the best help
Koen Verbeeck
One Orange Chip

Group: General Forum Members
Points: 27535 Visits: 13268
romina (8/19/2014)
Thanks Koen and Phil,

I haven't got too far with this. I am still struggling to load the data and I am not sure what else to do; it's getting on my nerves.

Any other ideas?

Basically, I am trying to read a table with over 4 million records (96,000 employees' data, extracted weekly over the last 56 weeks, so a lot of duplicate data for the same employee!), apply an SCD (which will dramatically reduce the number of records by getting rid of duplicates) and load it into a data warehouse.

Your input will be highly appreciated ...

Thanks
Romina


Do you have any control over the process that inserts the data into the staging table?
Anyway, 4 million rows is not that much for SQL Server. A few decently written T-SQL statements can take care of your problem in a few minutes.
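Something like the following is the kind of statement I mean. It is only a rough, untested sketch: the table and column names (dbo.StagingEmployee, EmployeeID, ExtractDate, Department, JobTitle) are made up, and it assumes SQL Server 2012 or later for LAG. The idea is to collapse consecutive identical weekly rows per employee into one row with a validity range before you go anywhere near the dimension:

;WITH Changes AS
(
    SELECT
        EmployeeID,
        ExtractDate,
        Department,
        JobTitle,
        -- flag the first week and every week where a tracked attribute
        -- differs from the previous week's row for the same employee
        CASE
            WHEN LAG(Department) OVER (PARTITION BY EmployeeID ORDER BY ExtractDate) = Department
             AND LAG(JobTitle)   OVER (PARTITION BY EmployeeID ORDER BY ExtractDate) = JobTitle
            THEN 0 ELSE 1
        END AS IsChange
    FROM dbo.StagingEmployee
),
Islands AS
(
    SELECT
        EmployeeID,
        ExtractDate,
        Department,
        JobTitle,
        -- a running total of the change flags numbers each unchanged stretch
        SUM(IsChange) OVER (PARTITION BY EmployeeID
                            ORDER BY ExtractDate
                            ROWS UNBOUNDED PRECEDING) AS GroupNo
    FROM Changes
)
SELECT
    EmployeeID,
    Department,
    JobTitle,
    MIN(ExtractDate) AS ValidFrom,
    MAX(ExtractDate) AS ValidTo  -- last week this version was seen; use NULL or
                                 -- '9999-12-31' for each employee's latest group
FROM Islands
GROUP BY EmployeeID, Department, JobTitle, GroupNo
ORDER BY EmployeeID, ValidFrom;

You would then insert (or MERGE) that result into the dimension. Nullable columns would need ISNULL/COALESCE handling in the comparison, but that is the general shape, and it runs as one set-based pass instead of row by row.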


How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?
My blog at SQLKover.

MCSE Business Intelligence - Microsoft Data Platform MVP
romina
Valued Member

Group: General Forum Members
Points: 64 Visits: 229
Koen Verbeeck (8/20/2014)
Do you have any control over the process that inserts the data into the staging table?
Anyway, 4 million rows is not that much for SQL Server. A few decently written T-SQL statements can take care of your problem in a few minutes.


Koen, I have full control of the process. I am the one designing the SSIS package to load the data into the staging table.

The problem I have is that I need to implement a type 2 SCD. It's a business requirement to keep a history of employee records.

The other problem I am running into is the hardware configuration. I am not normally one to blame the hardware - very easy to do - but I think this time the hardware limitations are making it difficult for me to run this process quickly and efficiently.

The latest thing I have done is create a very simple DFT with an OLE DB source (a SQL command) followed by the SCD transform. It has now been running for 12 hours to load those 4 million records, which is unacceptable to me. The next stage of my project is to load another table with 10 million records ... I don't want to even imagine how long that will take if I don't do it more efficiently.
romina
Valued Member

Group: General Forum Members
Points: 64 Visits: 229
MMartin1 (8/19/2014)
As I understand it, you are loading the records from another system. Why not stage the records locally (on your own server) and perform the needed operations there while moving them from the staging area to your destination (applying an SCD transform)? Are you sure you need nested looping?


Thank you MMartin1 for your reply.

I have already got my staging table locally with the 4 million records.

The nested looping was just a way to avoid passing so many records into the SCD at once. With the nested looping I was taking one employee at a time and processing all the records for all the weeks for which data has been extracted for that employee. At the moment I have 60 weeks' worth of weekly employee data in a staging table.

Other than the loop, I can't think of another way to approach this. I could do one week at a time, but each week has between 75,000 and 80,000 records, so I run into the same problem of long processing times.
MMartin1
SSCrazy

Group: General Forum Members
Points: 2785 Visits: 2031
Hi Romina,
Are you still using the looping? I don't think that will help. Just use the SCD transform and select 'Historical Attribute' as the change type for the columns you want history on, and Fixed or Changing (as appropriate) for the rest. You may have done this already, but I just want to be sure. Does this work any faster on the data load? Each change type maps to a different output from the SCD transform.

I've not really used this, only read about it, so my working knowledge is limited. But as I understand it, you need a derived column with the end date for the historically tagged records that changed (after the SCD transform), and then a Union All with the new records. (Are you using something like validFrom/validTo in your destination?) I just don't think the looping containers are needed, and they may be what's slowing you down.
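If it helps, what that end-dating step boils down to in T-SQL is roughly the following. The dimension and column names are invented, and @EmployeeID, @ChangeDate and so on stand for the per-row values the data flow supplies; as far as I understand it, the SCD wizard runs this per row through an OLE DB Command, which is part of why it is slow:

-- 1. End-date the current version of the employee whose attributes changed
UPDATE dbo.DimEmployee
SET    ValidTo   = @ChangeDate,
       IsCurrent = 0
WHERE  EmployeeID = @EmployeeID
AND    IsCurrent  = 1;

-- 2. Insert the changed record as the new current version
INSERT INTO dbo.DimEmployee
       (EmployeeID, Department, JobTitle, ValidFrom, ValidTo, IsCurrent)
VALUES (@EmployeeID, @Department, @JobTitle, @ChangeDate, NULL, 1);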

----------------------------------------------------
How to post forum questions to get the best help
romina
Valued Member

Group: General Forum Members
Points: 64 Visits: 229
MMartin1 (8/20/2014)
Hi Romina,
Are you still using the looping? I don't think that will help. Just use the SCD transform and select 'Historical Attribute' as the change type for the columns you want history on, and Fixed or Changing (as appropriate) for the rest. You may have done this already, but I just want to be sure. Does this work any faster on the data load? Each change type maps to a different output from the SCD transform.

I've not really used this, only read about it, so my working knowledge is limited. But as I understand it, you need a derived column with the end date for the historically tagged records that changed (after the SCD transform), and then a Union All with the new records. (Are you using something like validFrom/validTo in your destination?) I just don't think the looping containers are needed, and they may be what's slowing you down.


MMartin,

I stopped using the loop because it was overloading the server's memory after about an hour of processing.

I am now just using the SCD transform, but it has been running for nearly 17 hours. I have had no problems with memory or CPU usage, but 17 hours to process 4 million records is very inefficient.

Any thoughts?
MMartin1
SSCrazy

Group: General Forum Members
Points: 2785 Visits: 2031
There is a small piece in one of my books about this; I'll do some reading and will most likely get back to you tomorrow. The SCD transform is process-intensive because it has to hit the dimension table for each incoming row and cannot cache that table, so it is certainly not fast. But I think there is an alternative design that avoids the SCD transform in these situations.
In the meantime, I would also suggest indexing the PK and the associated business key in both tables, if that is not already done (clustered is best). But I'm sure you have that covered.
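Something like this is what I have in mind; the table, column and index names are invented, so substitute your actual staging table and dimension:

-- Clustering the staging table on business key + extract date helps any
-- set-based pass over the weekly snapshots.
CREATE CLUSTERED INDEX CIX_StagingEmployee_Key
    ON dbo.StagingEmployee (EmployeeID, ExtractDate);

-- An index on the dimension's business key supports the per-row lookups.
CREATE NONCLUSTERED INDEX IX_DimEmployee_EmployeeID
    ON dbo.DimEmployee (EmployeeID)
    INCLUDE (IsCurrent, ValidFrom, ValidTo);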

----------------------------------------------------
How to post forum questions to get the best help
Phil Parkin
SSCoach

Group: General Forum Members
Points: 18640 Visits: 20450
romina (8/20/2014)
MMartin1 (8/20/2014)
Hi Romina,
Are you still using the looping? I don't think that will help. Just use the SCD transform and select 'Historical Attribute' as the change type for the columns you want history on, and Fixed or Changing (as appropriate) for the rest. You may have done this already, but I just want to be sure. Does this work any faster on the data load? Each change type maps to a different output from the SCD transform.

I've not really used this, only read about it, so my working knowledge is limited. But as I understand it, you need a derived column with the end date for the historically tagged records that changed (after the SCD transform), and then a Union All with the new records. (Are you using something like validFrom/validTo in your destination?) I just don't think the looping containers are needed, and they may be what's slowing you down.


MMartin,

I stopped using the loop because it was overloading the server's memory after about an hour of processing.

I am now just using the SCD transform, but it has been running for nearly 17 hours. I have had no problems with memory or CPU usage, but 17 hours to process 4 million records is very inefficient.

Any thoughts?


Koen has already provided a link to an article he wrote that demonstrates how to do this without using the SCD component. I think you should invest your time in rewriting your package accordingly.


Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.

Please surround any code or links you post with the appropriate IFCode formatting tags. It helps readability a lot.
Koen Verbeeck
One Orange Chip

Group: General Forum Members
Points: 27535 Visits: 13268
At this point, it might be useful to show us some screenshots and maybe some table DDL so we can provide more specific assistance.


How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?
My blog at SQLKover.

MCSE Business Intelligence - Microsoft Data Platform MVP