SSIS 4 Minute Job Takes Over 19 *Hours* on New Server

  • I have an SSIS package which runs on SQL Server 2008 R2 in about 4 minutes.  It's been running successfully for years.  We are now migrating to SQL Server 2019.

    What Works: In addition to running on the old R2 server, the package runs perfectly on my little laptop in Visual Studio Data Tools with the connections pointing to the new server.  It also runs just fine in Visual Studio Data Tools when on the new 2019 server.

    What Fails:  I loaded the package into MSDB on SQL Server 2019 itself.  Then I scheduled an Agent job.  When run on Agent, the job appears to hang.  However, I have turned on Logging and log not only the Start and error events, but also the PreValidate and PostExecute events so that I can get a handle on where things might be going wrong.  When the job is running from Agent, I can run a query on the table MSDB.dbo.sysssislog to see what is happening within the package.  When I run the query on MSDB.dbo.sysssislog at different points in time, I can see that several tasks have completed.  Then there will be a pause of 1 to 5 hours.  Then a few more tasks run.  Then another pause.  Etc.

    I started the job (again) yesterday and this time decided to let it run to see how long it would take or if it would totally fail at some point.  But this morning we hit 100% CPU and the server basically came to a crashing halt.  I had to kill the job.   At that point, we were about 19 hours into it.  This is a 4 minute job when run on an old server...

    Stats On Our New Server:  (in case this helps?) Processor is listed as Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10 GHz, 2095 Mhz, 2 Core(s), 2 Logical Processors.  For memory, I don't know which lines to report.  Here are two pieces of info:  Installed physical memory is 12 GB, but available physical memory is listed only as 2.7 GB.  I don't know if that's relevant.  A consultant set this up as a virtual machine.

    HELP!  This is outside of my area of expertise.  I have no idea where to even look/begin on figuring this out.  I found other similar type posts, but they were either not quite the same or the answers didn't seem to apply to my situation?

  • When the SSIS package is running on the NEW server, and getting stuck at that 1-5 hour step (and further and pegging CPU and such), how much RAM is free?

    My first thought is that 2.7 GB of free ram does not sound like that much for SSIS to work with.  Once that is used up, you are going to start paging to disk which is SLOW!

    If that is not the case, my next thought is if the SSIS package is upgraded to SSIS 2019 or if you used the DTSX/ISPAC from the old server.  If so, I would start by upgrading that to 2019.

    Next, I would be looking at the configuration of the SSIS package - is it running against the local instance or a networked instance?  if it is a networked instance, is your network bandwidth being a bottleneck?

    Those are a few things to check out and probably my starting point with troubleshooting this.  That and check your server configurations.  Make sure your MAX MEMORY value is not set to default (2 PB) as that will not play nice with SSIS being on the same box.

  • Mr. Brian Gale:  Thank you soo, sooo much!  I will look into all of these things and let you know what happens.

  • My thought, based on past experiences, is that you are having memory pressure.

    We had an intermittent SSIS package.  Most of the time it completed in under an hour (40-ish minutes), but then some days it would still be running after 8+ hours and we'd need to force kill it as it was causing too much blocking.

    Did some digging and found that the memory on the server got pegged every time it was running slow, so adjsuted some MAX Memory values on a few databases and things started playing much nicer.

  • Another thought on this, what operations are happening in this SSIS package?

    If there are any SQL Tasks in it, have you compared the execution plans from the old server to new server?  Going from 2008 R2 to 2019 you are crossing the barrier where the new cardinality estimator was introduced, so there could be a difference in the way that the SQL statements are being executed.

  • Chris Harshman:  Thanks for the added input.  Much appreciated.

    I think I remember reading about that Cardinality Estimator thing you are talking about.  I didn't understand what the Estimator is, but if I understood the gist correctly, I could temporarily avoid worrying about the Estimator by keeping the database Compatibility Level at the old level.  I plan to turn on Query Store on go-live day.  Then after Query Store has collected a magical amount of data (I don't know how long to wait), I would upgrade the compatibility level of the database.  For any queries that suffer from the Cardinality Estimator after the compatibility level change, I could then theoretically/somehow use the Query Store to restore old query plans.   (All steps I was going to tackle after the migration.)

    If I understood all that correctly enough, then I don't think the Cardinality Estimator would be a concern right now?  Since I'm testing on my new server, but I'm testing against a database at the compatibility level of 100.  That sound about right?

    --------------------------

    That said, to answer your question about the types of tasks in the SSIS package:  This is a package that first clears data out of a local staging database.  That clearing task happens within a couple of seconds each time I run the package.  It is the following steps that seem to be the problem.  The following steps transfer data from a Sybase database somewhere else in the State into our local staging area database.

    The package has a fair number of data transfer tasks that can technically run in parallel.  I don't force those tasks to run serially.  What I see when I run the package in debugger mode in Visual Studio SSDT is that Visual Studio picks some tasks to run, completes them, and then picks some other tasks to run.  It works that way on our old server and also when pointing to the new server from Visual Studio 2017 when run on my laptop.  I bring this up, because when I tried to research this problem, I did see some talk (maybe) about tasks running in parallel being a problem?  But then I keep thinking: The job has run fine on the old server, so that can't really be my problem.  But then something is my problem.  Maybe I should look into it?  Argh.

  • To check out your concern about parallelism being an issue, I suggest that you amend the package property 'MaxConcurrentExecutables' to 1 and try running again.

    Parallelism should generally not be an issue if

    a) You have sufficient memory available, and

    b) None of the parallel tasks encroaches on any other. Don't, for example, have multiple INSERTs into the same table running concurrently.

    If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.
    See https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help/ for details of how to post T-SQL code-related questions.

  • Mr Brian Gale:  Thanks for the added clarification.

    I have some more info in case that sparks any more thoughts for you:

    You asked about whether the SSIS package had been updated. I had previously been working in BIDS that went with SQL Server 2008 R2.  When I started working on our migration project to the new server, I put Visual Studio 2017 (SSDT) on my laptop.  When I opened the SSIS package, it seemed to go through a conversion process.  It popped up a message about how I wouldn't be able to use the package in older versions (or something like that) any more.  I'm not sure if that is what you are talking about, but I think the package is updated.  Note that I also made sure that the "Target Server" for the package was set to SQL Server 2019.

    I didn't understand your point/question about networked instance vs local instance.  I was going to try to do some research on what that means before I got back to you on that one.

    The point about Max Memory though was so helpful!  I found out our Max Memory *is* set to the default 2 PB!  And as you said, the internet seems to agree with you that that is a bad practice.  No one at my agency knows about this though.  So, I have to do more research to figure out what I should set the value to.

    That said, while we should set Max Memory for best practices purposes, I'm not sure it is the problem we are facing now?  I say that for two reason: 1) Our old server also has the Max Memory set to the default 2 PB.  2) I just ran the job again for one of our network admins.  He wrote back: "I am seeing the RAM usage is well within comfort zones. I see about 40 – 50 KBps (fluctuating of course) for network traffic. CPU’s have maxed out again. I see a 20% decrease from maximum at around 1440 – 1442."   Maybe that means that memory is not a problem for this SSIS package???

  • Thank you Phil Parkin!  The package definitely does not have any encroaching tasks.  I will give your suggestion a try.

  • Phil Parkin:  I researched the MaxConcurrentExecutables property.  Thanks for bringing it to my attention.

    As you suggested, I changed the value to 1 and re-imported the package into the MSDB database.  I then scheduled the job to run again.  Sadly, the jobs has the same problem as before, but the good news is that now I can stop worrying about parallelism.   That was a quick/easy way to test it, and I learned something.  Thanks!

  • I am curious - why are you deploying your packages to MSDB?  As of SQL Server 2012 we now have the Integration Services Catalog and project deployments which is much better than using MSDB.

    I don't think that is your problem - but moving to a project deployment model using the catalog (SSISDB) would allow for much better control and includes reporting from the catalog that you don't have to enable.

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey Williams:  Oh that's simple: Because I had never really heard of Integration Services Catalog before your post.  (sigh)  I did see the text "Integration Services Catalog" in Object Explorer in SSMS for the first time a few weeks ago as part of the migration project, but the text didn't trigger me to think I had to worry about it.  It just sounds like a list.

    I have never used any version beyond SQL Server 2008 R2.  My process has historically been to do DTS/SSIS package development off a file on our LAN using BIDS.  When ready to deploy, I then loaded up into MSDB and set the agent job. (Lest that sounds like I do more with SSIS than I do: I made 4 or so SSIS packages over 10 years ago and haven't done anything with SSIS since then.  I learned just enough to meet our mostly SSIS stagnant business needs.)

    Your note has clued me in that I need to learn about the Integration Services Catalog.  However, since you don't think using MSDB is part of my problem, and I'm desperate to get us moved to the new server (our old server is ancient with parts I've been told would be hard to replace, oy), I'm not going to stress about the Catalog this moment.  I will definitely look into the Catalog in the future though.  Thank you for the head's up.  I have quite a list of things I will have to do post go-live day.

  • You can start here for the SSIS Catalog: https://docs.microsoft.com/en-us/sql/integration-services/catalog/ssis-catalog?view=sql-server-ver15

    Installing is quite simple - it really is just a matter of right-clicking on the item in Object Explorer and selecting the option to create the catalog.  The link above has the instructions...

    If you are the sysadmin - and the packages are all owned by you, then you don't even need to worry about permissions.  You just need to create a folder where you will deploy your packages.  Then - you export the packages out of MSDB and create a new project in Visual Studio (SSDT).  Import the packages into that project - upgrade as needed - and deploy to the catalog.

    You may need to do something like this just to be able to identify the issue...

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey Williams:  Thanks for the link!  Much appreciated.  I will definitely be looking at that.

  • Update For Anyone Interested

    I have spent the last couple of days trying to chase theories of memory problems and/or network problems.  I would be happy to share the memory research if anyone wants, but my Network Administrator and I are thinking that memory problems are not the issue in this case.  So, in this post, I wanted to share what I found regarding network activity in case this will spark an idea for someone.

    We turned on Windows "Resource Monitor" program on the server, called lets say MyServer, which is running the SQL Server instance in question.   I tested running the SSIS package from two places on MyServer: 1) Visual Studio 2017 (SSDT) and 2) SQL Server, with package loaded into MSDB.

    1. When running the SSIS package from Visual Studio on MyServer, the Resource Monitor shows that the "image" of "DtsDebugHost.exe" is what is running.  CPU ranges from about 50 to 80+ percent.  The "send" column shows about 6,000 to 7,000 bits per second.  The "receive" column shows 700,000 bits per second, give or take.  These numbers are fairly consistent until the job finishes running about 3 to 4 minutes after starting.
    2. When running the SSIS package as a scheduled Agent job on SQL Server (which is on the same MyServer as test #1 above), the Resource Manager show thats the "image" which runs is "DTExec.exe", not the DtsDebugHost.exe.  I think that is what we would expect, right?  CPU ranges from about 80 to 90+ percent.  In the first few seconds, the "send" column shows about 8,00o bits per second.  The "Receive" column shows about 350,000 bits per second, give or take. It is during this time that a few small tables *do* successfully download from the outside Sybase database into our local database.

    ---> Then after a few seconds, the DTExec.exe image still shows that CPU is at about 94%, but the image has completely *disappeared* from the Network box.  i.e.: There is just no network activity at all for the SSIS package while CPU usage remains ridiculously high.

    --->I didn't keep my eyes glued to the Resource Monitor the whole time after that, but I left the SSIS/Agent job running while I checked in with the Resource Monitor periodically.  What I would see sometimes is that DTExec.exe would appear in the Network box again.  I never saw the same volume that I would see in the first few seconds of the job running.  If DTExec.exe made an appearance again, I would see "send" values of often 0/zero (which makes sense if my query already went to Sybase and now it is just just waiting for data to come back) and "receive" values were maybe 8,000 to 12,000 bits per second.   After about 50 minutes, another table finished downloading.  After another hour after that, I stopped the job and no other tables had finished downloading.

    What does this mean???

    My Network Administrator thinks all of the above information means that this is not a network problem.  That it is a problem with the software within SQL Server.  Does that sound right to you?  Do you think this information is a clue to what may be going on or is this irrelevant information?  Does anyone have any thoughts on what I should try to look into next? 

    I can try looking into the SSIS Catalog, but if that isn't going to change how SSIS runs, I'd rather focus on a solution to this problem right now.  I took a quick look at the above link.  I think it is going to be helpful when I switch to using the Catalog, but nothing jumped out at me that it would affect how any package runs.

Viewing 15 posts - 1 through 15 (of 52 total)

You must be logged in to reply to this topic. Login to reply