SSIS package with Multiple Streams

  • I have a package with has multiple streams and calls the same SP.

    Why do streams are used in SSIS, like to multitask with the same SP?

    I do have a performance issue with this package. Will reducing the streams help? ( like less SP calls)

     

  • it depends.  Having multiple "streams" call the same stored procedure that is updating a table will create blocking on that table and each one will need to wait for the previous one to finish.

    Now if the stored procedure is just doing selects, that likely isn't going to be a problem.

    The first step is to determine the cause of the performance issues.  I would pull the SSIS package out to Visual Studio and try running it to get an idea of the performance per step in the SSIS package.  It MIGHT be the stored procedure cannot run multiple times at once on the server and each "stream" just sits waiting for the previous one, or it may be that you are pulling across 100TB of data in which case the bottleneck is likely going to be your disk and/or network.  Alternately, you could be pulling in 10 GB of data into SSIS memory with only 1 GB of RAM available and it is paging to disk.  Or you could be doing some crazy transforms on the data in SSIS and the CPU is running at 100%.

    Basically, I am saying load it up in visual studio and look at what it is doing at each step and figure out which step is the "slow" step and work on tuning that one.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Thanks again 🙂

    I did face blocking on the same SSIS package at one time and when I try to capture the query when it was running all I see was 14 SID's trying to talk to the same SSIS package with the query 'Create Procedure BalancePending As..with SET XACT_ABORT_O

    Could you please also let me know why steam design is opted then if we already know it creates blocking? Also, are there any effective alternatives? like what would you like to change in the package design when you face blocking?

    Thanks a lot!!

  • The procedure bascically selects a value where condition is NULL and updates in to the tbale with(updlock) with where condition.

     

  • Stream design doesn't introduce blocking.  Your stored procedure is creating the blocking.

    You even said it creates blocking - you are requesting an update lock on the table.  So anything that is trying to do anything on the table will be blocked.  Well, not quite "anything", but you are trying to run multiple streams to update the same table.  That'll create blocking.

    The idea behind stream design (at least from my understanding) is that you can do multiple reads from the same source at the same time or do multiple writes to multiple distinct sources at the same time.  But as soon as you start doing multiple writes to the same source (especially when you explicitly ask for a LOCK), you are going to get blocking.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • so should I change update lock to nolock? what is a good alternative here?

  • Yes, the blocking happens once in a while not on a regular basis.

  • I wouldn't bother with NOLOCK as that doesn't work with UPDATE queries.  Won't get a syntax error, the hint just gets ignored.

    As for a good alternative, if you NEED a lock on the table (which you will if your query is updating the table), do sequential processing on it.  I expect that blocking is happening every time, it is just short blocks so you don't see them or get notified of them.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Can you elaborate a little on how can I implement sequential processing.  please share any example which help me understand.

    is row level lock an option.?

    Thanks!!

     

  • maybe something like this. Dump the file/ records  into a temp table and have the row ID of the file appended to the record, then cycle throught the records sequentially in SQL Server using a cursor (order by file line number). Sort the file by IDs and then take the "latest" update. Break the multiple operations it separate files.?

  • What I mean by sequential is to just don't do the same task simultaneously.  So if you are calling a stored procedure, just make sure your package only calls that stored procedure once and waits for it to complete.

    Dumping the file/record into a temp table and having that happen multiple times at in your SSIS package (pulling all files from disk and pushing them to the same database table all at once) you will have blocking again.  This time, the blocking will be on the temp table though which will be less impact to end users, but still the same snags.

    Without seeing your SSIS package and knowing how it is set up, it is tricky to help with this, so I am just trying to describe the concepts for sequential processing instead of stream processing.

    But my approach would be if the file names are static, I would build it up as step 1 pulls file 1 into the database and does all work that needs to be done.  Once step 1 completes, move onto step 2 and repeat until all files are processed.  That is if all files need to do inserts/updates on the same table.  If I can break them out into isolated tables (such as file1 pushes data into the table file1 and file2 pushes data into the table file2 and so on), then I would run all of those at once.  If I needed to combine this data later, I would build up a VIEW for it.  This way I have multiple independent tables that are smaller and easier to work with and a view (likely an indexed view) for the larger data set.  This way I can do updates on all of the file tables at the same time without impacting the other file tables.  Probably turn off the index on the view prior to starting the data import and turn the index back on afterwards, but depends on how much data is changing.  If it is 10% of the data or less that is changing, I probably wouldn't bother with the index disable, but that is just a guess.  Best way to tell is to test it.  Using the multiple files to multiple tables approach should result in less blocking.  May still get blocking (ie will) if someone tries to SELECT from the view or tables during the update, but that should be short lived.

    Alternately, rather than rewriting your SSIS package, having some better indexes on the main table may make your updates faster in which case blocking would be shorter.  Or possibly you have too many indexes or duplicate indexes.  Having the data ordered in a way that makes your WHERE clause faster to search for will result in faster queries.  Having duplicate indexes or too many indexes will slow down the insert/update performance as each index will need to be modified with each insert/update.  As to which indexes to keep or how many to keep, it depends on your use cases and queries.  it might be that you can't remove any indexes and there are no good ones to add.  But if you have any duplicate indexes, those should be safe to remove and should help performance.  If you have unused indexes those MAY be safe to remove, or they may be used just very infrequently.  Be careful with index maintenance if you go that route.  You will need to review any indexes before removing them.  An index on column A and B may look like a duplicate to an index on column B and A, but they may (ie very likely) be ordered differently and thus may both be required.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply