Slowly Changing dimension inside a ForEach Loop container

Posted Tuesday, August 19, 2014 1:26 PM
SSC Rookie


Group: General Forum Members
Last Login: Wednesday, September 10, 2014 11:02 AM
Points: 47, Visits: 136
Thanks Koen and Phil,

I haven't got very far with this. I'm still struggling to load the data and am not sure what else to do. It's getting on my nerves.

Any other ideas?

Basically, I am trying to read a table with over 4 million records (96,000 employees' data extracted weekly over the last 56 weeks - a lot of duplicate data for the same employee!), apply an SCD (which will reduce the number of records dramatically by getting rid of duplicates), and load it into a data warehouse.

Your input will be highly appreciated ...

Thanks
Romina
Post #1605160
Posted Tuesday, August 19, 2014 5:37 PM


Old Hand


Group: General Forum Members
Last Login: Wednesday, August 27, 2014 3:05 PM
Points: 355, Visits: 870
I understand you are loading the records from another system. Why not stage the records locally (on your own server) and perform the needed operations there while moving from the staging area to your destination (applying an SCD transform)? Are you sure you need nested looping?
Post #1605244
Posted Wednesday, August 20, 2014 12:28 AM


SSChampion


Group: General Forum Members
Last Login: Today @ 4:05 AM
Points: 13,724, Visits: 10,680
romina (8/19/2014)
Thanks Koen and Phil,

I haven't got very far with this. I'm still struggling to load the data and am not sure what else to do. It's getting on my nerves.

Any other ideas?

Basically, I am trying to read a table with over 4 million records (96,000 employees' data extracted weekly over the last 56 weeks - a lot of duplicate data for the same employee!), apply an SCD (which will reduce the number of records dramatically by getting rid of duplicates), and load it into a data warehouse.

Your input will be highly appreciated ...

Thanks
Romina


Do you have any control on the process that inserts the data into the staging table?
Anyway, 4 million rows is not that much for SQL Server. A few decently written T-SQL statements can take care of your problem in a few minutes.
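
To illustrate the kind of set-based T-SQL I mean - table and column names here are hypothetical, since the actual schema hasn't been posted, and LAG requires SQL Server 2012 or later - the weekly snapshots can be collapsed into SCD2-style rows in a single query:

```sql
-- Hypothetical staging table: StagingEmployee(EmployeeID, SnapshotDate, Dept, Salary, ...)
-- Flag each snapshot that differs from the employee's previous week's snapshot,
-- then group consecutive identical snapshots into one row with a validity range.
WITH Changes AS (
    SELECT EmployeeID, SnapshotDate, Dept, Salary,
           CASE WHEN LAG(Dept)   OVER (PARTITION BY EmployeeID ORDER BY SnapshotDate) = Dept
                 AND LAG(Salary) OVER (PARTITION BY EmployeeID ORDER BY SnapshotDate) = Salary
                THEN 0 ELSE 1 END AS IsChange   -- first row per employee gets 1 (LAG is NULL)
    FROM dbo.StagingEmployee
),
Grouped AS (
    SELECT *,
           SUM(IsChange) OVER (PARTITION BY EmployeeID ORDER BY SnapshotDate
                               ROWS UNBOUNDED PRECEDING) AS GrpNo
    FROM Changes
)
SELECT EmployeeID, Dept, Salary,
       MIN(SnapshotDate) AS ValidFrom,
       MAX(SnapshotDate) AS ValidTo   -- last week this version was observed
FROM Grouped
GROUP BY EmployeeID, GrpNo, Dept, Salary;
```

Note that the equality comparisons above assume the tracked columns are NOT NULL; nullable columns would need ISNULL/COALESCE wrapping.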




How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at LessThanDot.

MCSA SQL Server 2012 - MCSE Business Intelligence
Post #1605295
Posted Wednesday, August 20, 2014 8:26 AM
SSC Rookie


Group: General Forum Members
Last Login: Wednesday, September 10, 2014 11:02 AM
Points: 47, Visits: 136
Koen Verbeeck (8/20/2014)
Do you have any control on the process that inserts the data into the staging table?
Anyway, 4 million rows is not that much for SQL Server. A few decently written T-SQL statements can take care of your problem in a few minutes.


Koen, I have full control of the process. I am the one designing the SSIS package to load the data into the staging table.

The problem I have is that I need to implement a type 2 SCD. It's a business requirement to keep a history of employee records.

The other problem I am running into is the hardware configuration. I am not normally one who blames the hardware - very easy to do - but I think this time the limitations on the hardware are making it a bit difficult for me to run this process in a quick and efficient way.

The latest thing I have done is create a very simple DFT with an OLE DB source (a SQL command) feeding straight into the SCD transform. It has now been running for 12 hours to load those 4 million records, which is unacceptable to me. The next stage of my project is to load another table with 10 million records ... I don't want to even imagine how long that will take if I don't do it more efficiently.
Post #1605465
Posted Wednesday, August 20, 2014 8:29 AM
SSC Rookie


Group: General Forum Members
Last Login: Wednesday, September 10, 2014 11:02 AM
Points: 47, Visits: 136
MMartin1 (8/19/2014)
I understood you are loading the records from another system. Why not stage the records locally (on your local server) and perform your needed operations there while moving from the staging area to your destination (applying a SCD transform)? Are you sure you need nested looping?


Thank you MMartin1 for your reply.

I have already got my staging table locally with the 4 million records.

The nested looping was just a way to avoid passing so many records into the SCD at once. With the nested looping I was taking one employee at a time and processing all the records for all the weeks the data has been extracted for that employee. I have 60 weeks' worth of weekly employee data in a staging table at the moment.

I can't think of any other way to do a set-based approach on this other than with the loop. I can do one week at a time, but each week will have between 75,000 and 80,000 records, and I run into the same problem of long processing time.
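
For what it's worth, one set-based alternative to the per-employee loop - sketched here with hypothetical table and column names, since the real schema isn't shown in the thread - is to process each weekly extract with two set-based statements instead of row-by-row SCD lookups:

```sql
-- Hypothetical names: DimEmployee(EmployeeID, Dept, Salary, ValidFrom, ValidTo, IsCurrent)
-- and StagingWeek holding one week's extract.

-- Step 1: expire the current dimension row for every employee whose attributes changed.
UPDATE d
SET    d.ValidTo = s.SnapshotDate,
       d.IsCurrent = 0
FROM   dbo.DimEmployee AS d
JOIN   dbo.StagingWeek AS s
       ON s.EmployeeID = d.EmployeeID AND d.IsCurrent = 1
WHERE  s.Dept <> d.Dept OR s.Salary <> d.Salary;   -- assumes NOT NULL columns

-- Step 2: insert a new current version for changed employees and brand-new employees.
-- After step 1, changed employees no longer have a current row, so the
-- anti-join catches both cases.
INSERT INTO dbo.DimEmployee (EmployeeID, Dept, Salary, ValidFrom, ValidTo, IsCurrent)
SELECT s.EmployeeID, s.Dept, s.Salary, s.SnapshotDate, NULL, 1
FROM   dbo.StagingWeek AS s
LEFT JOIN dbo.DimEmployee AS d
       ON d.EmployeeID = s.EmployeeID AND d.IsCurrent = 1
WHERE  d.EmployeeID IS NULL;
```

Run once per weekly batch (oldest week first), each 75,000-row extract becomes two set-based statements rather than 75,000 individual lookups.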
Post #1605466
Posted Wednesday, August 20, 2014 12:40 PM


Old Hand


Group: General Forum Members
Last Login: Wednesday, August 27, 2014 3:05 PM
Points: 355, Visits: 870
Hi Romina,
Are you still using the looping? I don't think that will help. Just use an SCD transform and select 'Historical Attribute' as the change type for the columns you want history on, and Fixed or Changing (as appropriate) for the rest. You may have done this already, but I just want to be sure. Does this work any faster on the data load? Each change type will map to a different output from the SCD transform.

I've not really used this, only read about it, so my working knowledge is limited. But I understand you need a derived column with the end date for the historically tagged records that changed (after the SCD transform), and then a Union All with the new records. (Are you using something like ValidFrom/ValidTo in your destination?) I just don't think the looping containers are needed, and they may be what's slowing you down.
Post #1605569
Posted Wednesday, August 20, 2014 12:45 PM
SSC Rookie


Group: General Forum Members
Last Login: Wednesday, September 10, 2014 11:02 AM
Points: 47, Visits: 136
MMartin1 (8/20/2014)
Hi Romina,
Are you still using the looping? I don't think that will help. Just use an SCD transform and select 'Historical Attribute' as the change type for the columns you want history on, and Fixed or Changing (as appropriate) for the rest. You may have done this already, but I just want to be sure. Does this work any faster on the data load? Each change type will map to a different output from the SCD transform.

I've not really used this, only read about it, so my working knowledge is limited. But I understand you need a derived column with the end date for the historically tagged records that changed (after the SCD transform), and then a Union All with the new records. (Are you using something like ValidFrom/ValidTo in your destination?) I just don't think the looping containers are needed, and they may be what's slowing you down.


MMartin,

I stopped using the loop. The reason is that it was overloading the server's memory after about an hour of processing.

I am now just using the SCD transform, but it has now been running for nearly 17 hours. I have had no problems with memory or CPU usage, but 17 hours to process 4 million records is very inefficient.

Any thoughts?
Post #1605571
Posted Wednesday, August 20, 2014 3:04 PM


Old Hand


Group: General Forum Members
Last Login: Wednesday, August 27, 2014 3:05 PM
Points: 355, Visits: 870
There is a small piece in one of my books about this; I'll do some reading and will likely get back to you tomorrow. The SCD transform is process intensive because it has to hit the table for each row and cannot cache that table. So it is certainly not fast, but I think there is an alternate design that avoids the SCD transform in these situations.
In the meantime I would also suggest indexing the PK and associated business key in both tables if not already done (clustered is best). But I'm sure you have that covered.
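
As a rough sketch of that indexing suggestion (names are hypothetical, since the real schema isn't shown; the filtered index requires SQL Server 2008 or later):

```sql
-- Cluster the staging table on business key + snapshot date, so per-employee,
-- per-week scans read contiguous pages:
CREATE CLUSTERED INDEX CIX_StagingEmployee
    ON dbo.StagingEmployee (EmployeeID, SnapshotDate);

-- On the dimension, index the business key filtered to current rows only,
-- covering the compared attributes so change-detection lookups avoid the base table:
CREATE NONCLUSTERED INDEX IX_DimEmployee_Current
    ON dbo.DimEmployee (EmployeeID)
    INCLUDE (Dept, Salary)
    WHERE IsCurrent = 1;
```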
Post #1605615
Posted Wednesday, August 20, 2014 11:59 PM


SSCertifiable


Group: General Forum Members
Last Login: Yesterday @ 7:23 AM
Points: 5,074, Visits: 11,852
romina (8/20/2014)
MMartin1 (8/20/2014)
Hi Romina,
Are you still using the looping? I don't think that will help. Just use an SCD transform and select 'Historical Attribute' as the change type for the columns you want history on, and Fixed or Changing (as appropriate) for the rest. You may have done this already, but I just want to be sure. Does this work any faster on the data load? Each change type will map to a different output from the SCD transform.

I've not really used this, only read about it, so my working knowledge is limited. But I understand you need a derived column with the end date for the historically tagged records that changed (after the SCD transform), and then a Union All with the new records. (Are you using something like ValidFrom/ValidTo in your destination?) I just don't think the looping containers are needed, and they may be what's slowing you down.


MMartin,

I stopped using the loop. The reason is that it was overloading the server's memory after about an hour of processing.

I am now just using the SCD transform, but it has now been running for nearly 17 hours. I have had no problems with memory or CPU usage, but 17 hours to process 4 million records is very inefficient.

Any thoughts?


Koen has already provided a link to an article he wrote that demonstrates how to do this without using the SCD component. I think you should invest your time in rewriting your package accordingly.



Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

When you ask a question (and please do ask a question: "My T-SQL does not work" just doesn't cut it), please provide enough information for us to understand its context.
Post #1605663
Posted Thursday, August 21, 2014 12:16 AM


SSChampion


Group: General Forum Members
Last Login: Today @ 4:05 AM
Points: 13,724, Visits: 10,680
At this point, it might be useful to show us some screenshots and maybe some table DDL so we can provide more specific assistance.



Post #1605666