How to handle "big data" or "large data" with SSIS?
blasto_max
SSC-Enthusiastic
Group: General Forum Members
Points: 193 Visits: 480
I am running dynamic queries that often return huge (300 MB - 1 GB) result sets initially. Later loads should not be this big (though I am not sure about that), because I will be using delta loading. These result sets are loaded into a C# DataTable. A script loops over the rows and generates a query (stored in an SSIS variable) to load them into the appropriate destination columns (determined by other scripts).

For small result sets, my package runs properly. But for big ones, it simply fails with an out-of-memory error.
How do I resolve this problem? Can you suggest some strategies? I guess I could fetch smaller parts of the data at a time and then load them into the target, but I am not sure how to go about it. Is there a recipe for this?

A brief intro to how the process works:

Execute SQL: get the big result set > Script: read each row and generate a SQL string like "INSERT INTO TableABC VALUES" + {all columns of one row}, then concatenate that SQL onto a string variable destinationInsert > Execute SQL: execute the SQL inside destinationInsert.

ETL process complete. Does that help?
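
In code terms, the script stage is roughly this (a rough sketch using the names from the description above, not the actual package code):

// Sketch of the current approach: the whole result set sits in a DataTable and one
// giant INSERT script is concatenated in memory before being executed.
using System.Data;
using System.Text;

static string BuildInsertScript(DataTable bigResultSet)
{
    var destinationInsert = new StringBuilder();
    foreach (DataRow row in bigResultSet.Rows)
    {
        destinationInsert.Append("INSERT INTO TableABC VALUES (");
        for (int i = 0; i < bigResultSet.Columns.Count; i++)
        {
            if (i > 0) destinationInsert.Append(", ");
            destinationInsert.Append("'").Append(row[i]).Append("'");
        }
        destinationInsert.AppendLine(");");
    }
    // Both the DataTable and this string are in memory at the same time,
    // which is where the out-of-memory error shows up on big loads.
    return destinationInsert.ToString();
}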
Matt Miller (#4)
SSCertifiable
Group: General Forum Members
Points: 7664 Visits: 18105
300 MB of data should load in no time flat, but you seem to be going at it the hard way.

If the input file is simply to be inserted into a table, why not use a flat file data source and bulk load it directly into a table? Going through and manually building out a huge series of single-row inserts is not going to scale, especially when you have facilities to dump the data directly.

Even if you end up having to clean up the data or manipulate other things, your best bet is to get the data into the DB itself as quickly as you can, clean it up, THEN move it around. Trying to clean it up row by row in transit is going to lead to pain.
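
If the source has to stay a dynamic query rather than a flat file, the same idea - dump the rows straight into a staging table instead of generating INSERT statements - can be done from a script with SqlBulkCopy. A rough sketch, with made-up connection strings and staging table name:

// Rough sketch (made-up names): stream the query result straight into a staging table
// with SqlBulkCopy instead of building one INSERT statement per row.
using System.Data.SqlClient;

static void BulkLoadToStaging(string sourceConnStr, string targetConnStr, string selectSql)
{
    using (var source = new SqlConnection(sourceConnStr))
    using (var target = new SqlConnection(targetConnStr))
    {
        source.Open();
        target.Open();

        using (var cmd = new SqlCommand(selectSql, source))        // the dynamically generated SELECT
        using (var reader = cmd.ExecuteReader())
        using (var bulk = new SqlBulkCopy(target))
        {
            bulk.DestinationTableName = "dbo.StagingGeneric";      // staging table matching the generic columns
            bulk.BatchSize = 10000;                                // commit in chunks rather than all at once
            bulk.EnableStreaming = true;                           // don't buffer the whole result set in memory
            bulk.WriteToServer(reader);                            // rows flow from the reader to the table
        }
    }
}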

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Phil Parkin
SSCrazy Eights
Group: General Forum Members
Points: 8411 Visits: 19514
Matt Miller (#4) (10/29/2013)
300 MB of data should load in no time flat, but you seem to be going at it the hard way.

If the input file is simply to be inserted into a table, why not use a flat file data source and bulk load it directly into a table? Going through and manually building out a huge series of single-row inserts is not going to scale, especially when you have facilities to dump the data directly.

Even if you end up having to clean up the data or manipulate other things, your best bet is to get the data into the DB itself as quickly as you can, clean it up, THEN move it around. Trying to clean it up row by row in transit is going to lead to pain.


+1, though doing some clean-up in non-blocking components such as Derived Column transformations should not slow things down too much. Just keep the data pipeline fairly simple and avoid looping and blocking components. By using .NET DataTables, you are throwing away a lot of the power of SSIS and may as well write the entire process as a C# app (that's not a recommendation :-)).


Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.

Please surround any code or links you post with the appropriate IFCode formatting tags. It helps readability a lot.
Koen Verbeeck
SSCoach
Group: General Forum Members
Points: 16546 Visits: 13210
Matt Miller (#4) (10/29/2013)

If the input file is simply to be inserted into a table, why not use a flat file data source and bulk load it directly into a table? Going through and manually building out a huge series of single-row inserts is not going to scale, especially when you have facilities to dump the data directly.


It seems the problem the OP is having is that the sources are dynamic, which won't work in a regular data flow.

Matt Miller (#4) (10/29/2013)

Trying to clean it up row by row in transit is going to lead to pain.


Synchronous components in SSIS have almost no performance impact.

@blasto_max: can you describe your sources a bit more? How come they are "dynamic"? Are they predictable?



How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at SQLKover.

MCSA SQL Server 2012 - MCSE Business Intelligence
blasto_max
SSC-Enthusiastic
Group: General Forum Members
Points: 193 Visits: 480
Koen Verbeeck (10/30/2013)
Matt Miller (#4) (10/29/2013)

If the input file is simply to be inserted into a table, why not use a flat file data source and bulk load it directly into a table? Going through and manually building out a huge series of single-row inserts is not going to scale, especially when you have facilities to dump the data directly.


It seems the problem the OP is having is that the sources are dynamic, which won't work in a regular data flow.

Matt Miller (#4) (10/29/2013)

Trying to clean it up row by row in transit is going to lead to pain.


Synchronous components in SSIS have almost no performance impact.

@blasto_max: can you describe your sources a bit more? How come they are "dynamic"? Are they predictable?


Thanks for those questions. I have dynamic sources. In brief: the source looks like SomeID, random column names... and the destination has sane column names.

In the source, for example, Column1 is just a value holder. For SomeID = 1, Column1 could go to Dest1, and for SomeID = 7, Column1 could go to Dest19. That is how it is. So I create mappings of what goes where, then generate SQL code to fetch the data and load it. For a given ID, Column1 is always mapped to a fixed destination.
After getting the mappings for ID = 1, I fetch huge amounts of data from the source tables for each Sub ID associated with that ID. It's a mess that I did not create.
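
In other words, the mapping is basically a lookup keyed on (SomeID, source column). A sketch with the example values above (made-up beyond that):

// Sketch of the mapping idea: for a given SomeID, each generic source column
// maps to one fixed destination column.
using System.Collections.Generic;

var columnMap = new Dictionary<(int SomeId, string SourceColumn), string>
{
    { (1, "Column1"), "Dest1"  },
    { (7, "Column1"), "Dest19" },
    // ... one entry per (SomeID, source column) pair
};

// Later, when generating SQL for SomeID = 1:
string dest = columnMap[(1, "Column1")];   // "Dest1"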

Is the situation clearer now?
Matt Miller (#4)
SSCertifiable
Group: General Forum Members
Points: 7664 Visits: 18105
Two ways to do this:
1. The SSIS route - use the flat file connector to make SSIS aware of the generic content, then use a conditional split based on column1 to define multiple mappings to multiple destinations based on your column1 values.
2. The SSIS + SQL route - still use the flat file to dump the data into staging (which matches the generic input columns you have), then run a series of INSERT statements which filter based on the value of column1, e.g.


Insert into MyDest19 (col1, col2, col3, col4, etc....)
Select col1, col2, col3, col4, etc....
from mystagingtable
where column1='Dest19'
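
If it helps, those per-destination statements could themselves be generated from the known destination values rather than written by hand. A sketch, reusing the names from the example above:

// Sketch (made-up beyond the example names): build one set-based INSERT per
// destination from the staging table, instead of one INSERT per source row.
using System.Collections.Generic;
using System.Text;

static string BuildDestinationInserts(IEnumerable<string> destinations)
{
    var sql = new StringBuilder();
    foreach (var dest in destinations)                 // e.g. "Dest1", "Dest19", ...
    {
        sql.AppendLine($"INSERT INTO My{dest} (col1, col2, col3, col4)");
        sql.AppendLine("SELECT col1, col2, col3, col4");
        sql.AppendLine("FROM mystagingtable");
        sql.AppendLine($"WHERE column1 = '{dest}';");
        sql.AppendLine();
    }
    return sql.ToString();
}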



----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
blasto_max
SSC-Enthusiastic
Group: General Forum Members
Points: 193 Visits: 480
Matt Miller (#4) (10/30/2013)
Two ways to do this:
1. The SSIS route - use the flat file connector to make SSIS aware of the generic content, then use a conditional split based on column1 to define multiple mappings to multiple destinations based on your column1 values.
2. The SSIS + SQL route - still use the flat file to dump the data into staging (which matches the generic input columns you have), then run a series of INSERT statements which filter based on the value of column1, e.g.


Insert into MyDest19 (col1, col2, col3, col4, etc....)
Select col1, col2, col3, col4, etc....
from mystagingtable
where column1='Dest19'



Can't I just somehow execute my big .SQL files instead?
Matt Miller (#4)
SSCertifiable
Group: General Forum Members
Points: 7664 Visits: 18105
blasto_max (10/30/2013)
Matt Miller (#4) (10/30/2013)
Two ways to do this:
1. The SSIS route - use the flat file connector to make SSIS aware of the generic content, then use a conditional split based on column1 to define multiple mappings to multiple destinations based on your column1 values.
2. The SSIS + SQL route - still use the flat file to dump the data into staging (which matches the generic input columns you have), then run a series of INSERT statements which filter based on the value of column1, e.g.


Insert into MyDest19 (col1, col2, col3, col4, etc....)
Select col1, col2, col3, col4, etc....
from mystagingtable
where column1='Dest19'



Can't I just somehow execute my big .SQL files instead?


I suppose it's possible with enough RAM and an idle server, but it's very inefficient and will always be subject to running into resource issues like you've already seen when you have a big day or something changes. You could possibly look at using a flat file OUTPUT and devising some custom streaming solution to build up the SQL statement in a file rather than in memory, but at that point you really aren't using SSIS at all.
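
For what it's worth, "build it up in a file rather than memory" would look roughly like the sketch below (made-up names, reusing TableABC from the original post) - though, as said, the server still ends up doing row-by-row inserts:

// Sketch (made-up names): write each generated INSERT straight to a file,
// so only one row's worth of SQL is held in memory at a time.
using System.Data.SqlClient;
using System.IO;

static void StreamInsertsToFile(string sourceConnStr, string selectSql, string outputPath)
{
    using (var connection = new SqlConnection(sourceConnStr))
    using (var command = new SqlCommand(selectSql, connection))
    using (var writer = new StreamWriter(outputPath))
    {
        connection.Open();
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                var values = new string[reader.FieldCount];
                for (int i = 0; i < reader.FieldCount; i++)
                    values[i] = "'" + reader.GetValue(i) + "'";

                writer.WriteLine("INSERT INTO TableABC VALUES (" + string.Join(", ", values) + ");");
            }
        }
    }
}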

Just for giggles I dummied up a version of option 1 above. It took about 6 minutes to wire it up, and 6 more minutes to shake out a few data type bugs. My first run of a 5.2M-row (600 MB) file took 53 seconds to load into 2 tables based on a filter, and that's before I've done any tweaks to turn on bulk inserts. Single-row inserts would be running for literally hours on that kind of operation.

SQL Server is designed to perform best on set operations (i.e. batches of rows at a time). If you can design your process that way, you will be better off in the long run.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?