SSIS caching error

  • I am running a pretty straightforward data flow:

    Source -> Lookup -> Destination

    I am in the process of trying to test run an incremental load. When I have no columns checked in the lookup, this process succeeds in roughly 5 minutes (4 million records cached).

    Now I need to check some of the columns in the lookup so I can push them out to a conditional split to check for changes in the source data. When I check some of the columns in the lookup and rerun the process, it appears to cache all of the records, but then the package fails with the following errors:

    Error: A buffer failed while allocating 10458336 bytes.

    Error: The system reports 96 percent memory load. There are 17179336704 bytes of physical memory with 526725120 bytes free. There are 4294836224 bytes of virtual memory with 159760384 bytes free. The paging file has 34356768768 bytes with 14195691520 bytes free.

    [SSIS.Pipeline] Error: The Data Flow task failed to create a buffer to call PrimeOutput for output "HZ_SCORE_EXPORT" (1) on component "OLE DB Source Output" (12). This error usually occurs due to an out-of-memory condition.

    Based on this, I am assuming that it's a RAM issue? Please let me know if that is not the case, and if so, what the error might be.

    Currently, this server has 16GB of RAM. If I need to add more, how much should I add?

    Thanks for the help.

  • I just added 8 GB of RAM and reran my data flow, and it still errored with the same message, so I guess RAM is not the true issue. Any thoughts?

  • I don't really have an answer for you, but I've seen this sort of thing on the forum before. The consensus seems to be that it's not a pure RAM issue but an SSIS package architecture/configuration issue.

    Can you alter the package to process the data in chunks? Can you try a SQL Server provider rather than OLE DB?

    Have you enabled logging in the package to get more detailed messages?

    Maybe someone else can help more.

  • Thanks for the response, DouglasH. The lookup is the destination table for this workflow. I'm not really sure how to redesign this mapping to use multiple lookups and achieve the same result. It seems kind of weird to me that this is failing, because the records being cached are not large in width and I am caching only roughly 15 columns. This just seems really odd.

    Any other suggestions?

  • Would it be an option to use raw files for the lookup?

    😎

  • skaggs.andrew (5/6/2014)


    Thanks for the response, DouglasH. The lookup is the destination table for this workflow. I'm not really sure how to redesign this mapping to use multiple lookups and achieve the same result. It seems kind of weird to me that this is failing, because the records being cached are not large in width and I am caching only roughly 15 columns. This just seems really odd.

    Any other suggestions?

    Why are you caching 15 columns? Do you need them all to do the lookup?

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • This process is going to be used on a nightly basis. I'm not sure how I could use a raw file unless I had a process that created a new file in a directory and then used that file for the lookup. How would using a flat file help? Sorry, I am unfamiliar with using flat files in a lookup.

  • Koen Verbeeck (5/6/2014)


    skaggs.andrew (5/6/2014)


    Thanks for the response, DouglasH. The lookup is the destination table for this workflow. I'm not really sure how to redesign this mapping to use multiple lookups and achieve the same result. It seems kind of weird to me that this is failing, because the records being cached are not large in width and I am caching only roughly 15 columns. This just seems really odd.

    Any other suggestions?

    Why are you caching 15 columns? Do you need them all to do the lookup?

    I am caching these 15 columns because, if there is a match (joining on 3 columns), I want those records to continue to a conditional split transformation. There I will check each column to see whether the column value from the source matches the column value from the target. This is how I determine which records need to be updated in the target table.

    In the lookup I am also already using a WHERE clause to filter out unnecessary records, so the lookup only contains the records/columns that I need.
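    For illustration, a lookup query narrowed this way might look roughly like the sketch below; the table name, column names, and filter are placeholders, not taken from the actual package:

        -- Hypothetical lookup query: return only the join keys and the
        -- columns the conditional split compares, filtered to rows that
        -- can actually match during the incremental load.
        SELECT  Key1, Key2, Key3,        -- the 3 join columns
                Col1, Col2, Col3         -- columns compared for changes
        FROM    dbo.TargetTable
        WHERE   IsActive = 1;            -- placeholder filter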

  • skaggs.andrew (5/6/2014)


    This process is going to be used on a nightly basis. I'm not sure how I could use a raw file unless I had a process that created a new file in a directory and then used that file for the lookup. How would using a flat file help? Sorry, I am unfamiliar with using flat files in a lookup.

    When you want to use a flat file as a cache, you need to use the cache connection manager.

    This cache needs to be created by a separate process.

    Lookup - Using the cache connection manager

    By the way, this cache is NOT the same as an SSIS RAW file.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (5/6/2014)


    skaggs.andrew (5/6/2014)


    This process is going to be used on a nightly basis. I'm not sure how I could use a raw file unless I had a process that created a new file in a directory and then used that file for the lookup. How would using a flat file help? Sorry, I am unfamiliar with using flat files in a lookup.

    When you want to use a flat file as a cache, you need to use the cache connection manager.

    This cache needs to be created by a separate process.

    Lookup - Using the cache connection manager

    By the way, this cache is NOT the same as an SSIS RAW file.

    Do you think this is the route I should take for this data flow? Is what I am attempting to do not suitable for SSIS?

  • The cache is still read into memory, so it won't reduce memory pressure.

    It is a bit odd, though, that you still get the out-of-memory error even after adding an extra 8 GB.

    There are other options in SSIS, but they are slower.

    You could use a partial cache in the lookup component, but this certainly slows things down.

    Or you could replace the lookup with a Merge Join, but since this needs sorted inputs, you hurt performance again.

    I was also thinking about making the data flow buffers smaller (the DefaultBufferMaxRows and DefaultBufferSize properties of the data flow task), so fewer rows are in memory at the same time. But this will again make your package run longer.

    If nothing helps, you could just stage the data and do it all in T-SQL.
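    As a rough sketch of that staged approach (all object and column names here are hypothetical): the data flow bulk-loads the source rows into a staging table, and set-based statements then apply only the changes:

        -- Update target rows whose compared columns differ from staging.
        -- Note: the <> comparisons assume the columns are NOT NULL;
        -- nullable columns need extra handling.
        UPDATE t
        SET    t.Col1 = s.Col1,
               t.Col2 = s.Col2
        FROM   dbo.TargetTable  AS t
        JOIN   dbo.StagingTable AS s
          ON   s.Key1 = t.Key1 AND s.Key2 = t.Key2 AND s.Key3 = t.Key3
        WHERE  t.Col1 <> s.Col1 OR t.Col2 <> s.Col2;

        -- Insert staged rows that do not yet exist in the target.
        INSERT INTO dbo.TargetTable (Key1, Key2, Key3, Col1, Col2)
        SELECT s.Key1, s.Key2, s.Key3, s.Col1, s.Col2
        FROM   dbo.StagingTable AS s
        WHERE  NOT EXISTS (SELECT 1
                           FROM   dbo.TargetTable AS t
                           WHERE  t.Key1 = s.Key1
                             AND  t.Key2 = s.Key2
                             AND  t.Key3 = s.Key3);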

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (5/6/2014)


    The cache is still read into memory, so it won't reduce memory pressure.

    It is a bit odd, though, that you still get the out-of-memory error even after adding an extra 8 GB.

    There are other options in SSIS, but they are slower.

    You could use a partial cache in the lookup component, but this certainly slows things down.

    Or you could replace the lookup with a Merge Join, but since this needs sorted inputs, you hurt performance again.

    I was also thinking about making the data flow buffers smaller (the DefaultBufferMaxRows and DefaultBufferSize properties of the data flow task), so fewer rows are in memory at the same time. But this will again make your package run longer.

    If nothing helps, you could just stage the data and do it all in T-SQL.

    This is very odd to me. I am testing in my dev environment and no one else uses this server except me, so there shouldn't be other processes taking resources away from what I am trying to do.

    Right now, this nightly process deletes the target table and then reloads it from the source. It takes a little over an hour, and it is just one step of my nightly schedule. I know this is not efficient at all. I need to reduce the nightly load time, so I thought implementing an incremental load might save a lot of time. Even if some of these other options are not as efficient as a full cache, I can still test them to see whether they would be worth using. After that, if nothing works or is faster, then maybe T-SQL is the way to go.

    Any suggestion as to which of the options you listed may be the best? How does the partial cache work? Does it actually read records one at a time and, if a record is new to the cache, store it? I do know that the records in my lookup are all unique, so I don't know if this is the best bet. I have never used the Merge Join, so I wouldn't know what to expect there.

    With the partial cache, the cache size is smaller (and configurable). Every time a row has no match in the cache, the lookup has to go to the database and fetch the row, and it then puts that row in the cache. Since there are possibly a lot of database round trips, this can become really slow, but the chances of crashing are reduced.
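    On each cache miss, the lookup runs a parameterized query against the reference table, along these lines (names are placeholders; the ? markers are filled from the pipeline row):

        -- Per-row query in partial-cache mode; a hit is added to the
        -- in-memory cache, and every repeated miss means another
        -- database round trip.
        SELECT Col1, Col2, Col3
        FROM   dbo.TargetTable
        WHERE  Key1 = ? AND Key2 = ? AND Key3 = ?;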

    The Merge Join is fine, but it needs sorted inputs on the join columns. This means that the initial sorting will take some time (especially if you have 4 million rows to read). A good workaround is to have a clustered index on the join columns (hopefully a single primary key), which means the data is already presorted when you read it. In that case you can mark the data as already sorted in the advanced editor of the source component; see the sketch after the link below.

    Sort Data for the Merge and Merge Join Transformations
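    A sketch of that presorted-source idea (hypothetical names again): have the source query ORDER BY the clustered index keys, then tell SSIS the output is sorted:

        -- Source query for the Merge Join: ORDER BY the join keys so the
        -- engine can stream rows off the clustered index without a sort.
        -- In the source's advanced editor, set IsSorted = True on the
        -- output and SortKeyPosition = 1, 2, 3 on Key1, Key2, Key3.
        SELECT Key1, Key2, Key3, Col1, Col2
        FROM   dbo.TargetTable
        ORDER BY Key1, Key2, Key3;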

    Whatever you do, do not use the SORT transformation of SSIS.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Thanks for the information. I will test these options. Just out of curiosity, why do you say not to use the SORT transformation in SSIS?

  • skaggs.andrew (5/7/2014)


    Thanks for the information. I will test these options. Just out of curiosity, why do you say not to use the SORT transformation in SSIS?

    It is a blocking transformation, meaning it will read ALL data before outputting a single row. As if you don't have enough memory issues 😀

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP
