SSIS 4 Minute Job Takes Over 19 *Hours* on New Server

  • Steve Jones:

    re: "If it hangs in SSIS, can you get it to hang in SSMS?"

    I think you mean to try this?: Open Integration Services in Object Explorer in SSMS.  Then scroll to the Stored Package, right click and Run Package.

    I had previously tried those steps from my laptop.  At that time, I ran into an error because SSIS is not installed on my laptop.  I then got distracted and never followed up on that line of research.  After seeing your post just now, I remoted into the server and opened SSMS from the server itself.  I connected to Integration Services and then right clicked on the Stored Package and ran it.  IT WORKED!!!

    Something To Note: When I looked in Resource Monitor while the package was running just now, the following executable was running: DTExecUI.exe.   When I run the package from Agent (when it fails), the executable is: DTExec.exe.  I find that interesting.

    My Thoughts: My understanding of the main thing to know when I run the package manually from Object Explorer in SSMS is that the package runs under my account.  Ie, the "Operator" on the log shows my account.  On the other hand, when a job is run from Agent, the Operator is the SSIS service account.  My Point: When I've read about situations where a job runs under a person's account, but not the Agent/service account, that generally seems to mean that there is a permissions problem.  However, I don't think I have a permissions problem right now because the job, even when run from Agent, is perfectly able to access the Sybase database and to query/download at least two tables.

    So, what could it mean that the job runs fine under DTexecUI.exe but not DTexec.exe?   I wonder what this tells us?

     

  • JJ B:  I'd say you've ruled out a disk I/O issue.  My rule of thumb is that the disk subsystem needs a closer look if you have a sustained disk queue of over 1.5 - 2.0.  Key word being sustained.  You don't so on to other potential issues.  Good luck!

  • No, I mean execute the query(ies) that the package runs from SSMS from the instance where the data will be stored.

  • The difference is related to the connection settings or the user.  To determine the difference you need to review what is happening on the Sybase system.

    Nothing you do within SQL Server is going to help determine why those queries are having a problem running on Sybase.

    To exclude a user issue, create a credential on SQL Server for your account.  Then setup a proxy in the agent using that credential and set the job step to run as that proxy.

    If that runs as expected, then the problem is related to the service account.  Of it does not run then it is related to the connection settings.

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey Williams:

    re: "I think you need to be looking at the Sybase system to see what is executing on that system."

    That possibility makes sense to me too.  However, I may have ruled that out?

    Two days ago, I broke down and asked our consultant for some help.  Yesterday, the consultant monitored Profiler while I ran the SSIS job.  The consultant watched Profiler when the job runs successfully on our old server and again when the job fails on our new server.  (And thank goodness the Profiler test happened yesterday - in time for me to answer Jeff's question above so I didn't look stupid for not having done that yet.  😉   And yes, I realize that I'm confessing to not having done it before yesterday and that looks stupid.  Sigh.  Profiler isn't something I know a lot about and I had been trying to figure this out without using the consultant.)

    Admittedly, there are some language barriers with the consultant.  The person is in another country, and I don't always understand what he says.  We were on a Teams call while all of this was happening.  However, if I did understand the consultant correctly, the consultant was telling me that when SQL Server sends a query request to Sybase, we get a response back from Sybase with the data right away.  We get a response right away *both* when the job is run in Agent on our old server and when the job is run in Agent on our new server.  The difference is that when the job gets to the point that it hangs on our new server, there's no query request going out to Sybase.  If that's true, then I believe that the problem can't be at the Sybase end.   (Unless I'm missing something?  This is really out of my depth...)

    Just to be sure I can rule out the Sybase end as being the problem, I will send an email to the consultant to confirm what he saw on profiler.  Thanks for pushing your thought.  I think getting confirmation is a good idea.  If the problem is at the Sybase end, I'm in big trouble, because that's not a server I have any control over.  (But the data is vital to my main applications.  I can't migrate without the nightly downloads of the Sybase data.)

  • Steve Jones and Jeffrey Williams:  Sorry, I've been trying to do tests and typing replies as fast as I can when your posts come in.  So, I'm not responding to your posts in a timely/orderly manner.

    Jeffrey: I have read about proxy accounts in passing but never tried it.  I will try to figure out how to set up a proxy account for myself and let you know what happens.

    Steve: I've read your latest post several times and tried to figure it out.  I'm sorry, I just don't know what you mean.  Do you mean to try to run one of the package's queries against the State's Sybase database from within the SQL Server Management Studio's query window?  You want me to run the SELECT query by clicking the "New Query" button on the toolbar in SSMS?  Is it possible for the query window in SSMS to link to a Sybase database that is outside our system?  If I'm just frustrating you with my ignorance, you can ignore this question.

  • As a thought - can you log into the server as the SSIS Service account?  I am thinking about taking the opposite approach as Jeffery Williams (creating a proxy) and logging in as the service account and starting visual studio and running the package from there.

    Doing this you can narrow it down to see if it might be a user permission issue or not.

    And possibly a dumb question but do you have resource governor configured on the new server?  My thoughts here are that it may be configured to restrict CPU/memory used by some users (like SSIS service account) or applications (dtexec.exe) but not for other users (such as yourself) or applications (such as DtsDebugHost.exe).  I've not set up or used resource governor before, so this may be something that doesn't need to be looked into.  Just tossing some other ideas out there.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Mr Brian Gale:  I love how you think outside the box.  To answer your questions:  I just asked and was emailed back, "Resource governor is not configured on [OurNewServer] and also we have witnessed 100% CPU usage."  I don't know what the 100% CPU usage has to do with it, but that was the response I got back.   As for logging into the server with the problem account and running the package in Visual Studio:  I really like that idea, but the account that the job is running under when the job hangs is "NT Service\SQLSERVERAGENT".   I just looked up information about SQLSERVERAGENT and it's my understanding that it is not a real account with a password.  So, I think that leaves me with going back and trying the proxy idea.

    Jeffrey Williams:  I wrote our consultant to ask him to verify what he saw on Profiler.  The consultant wrote back:  "That's right. With respect to [TheOldServer], connection is initiated and the query fetched results immediately from Sybase and loaded into staging database. But in case of [TheNewServer], SQL connection stays idle and no query was initiated nor running."  If you still think the problem could be at the Sybase end, I can continue to follow up on that.  I'm just not sure how I would do that since the queries are not being sent to Sybase in the first place when the job hangs...   I am still following up on your proxy idea though.  (Unless the proxy thing no longer makes sense to worry about?)

    Thank you both!

  • Is it the same account on the old system (NT Service\SQLSERVERAGENT) or do you have an actual account for it?

    The proxy account will likely be your only option on the new server to test permissions then as logging in as an "NT Service\*" account is not possible (as far as I know).

    If this is a test system (I wouldn't want to try this on a production system), you could try changing the service to run as you and see if things behave better, the same, or worse.  And if it does work better, then you have narrowed it down to permissions (at least VERY likely to be permissions).  You can get your AD team to make an SSIS service account in AD and then get the permissions set up to be similar to your account on the associated systems and the problem may be fixed.  The reason I wouldn't leave it as you is because as soon as you change your password, all SSIS related jobs will start to fail and after so many attempts may end up locking your account and it is a pain in the butt trying to figure out what automated system keeps locking you out especially when you can't remain unlocked long enough to investigate the issue.

    Just trying to think of other ways to test permissions.  Proxy account is probably the easiest and most reliable, but I am also not a fan of having local accounts handling my critical services.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Update For Anyone Interested

    I figured out how to create a 'Credential' on the server.  The Credential is linked to my Windows account.  Then I created a Proxy which is linked to the Credential.  Then I changed the Agent Job so that the Step in question used my Proxy to run instead of the original value "SQL Server Agent Service Account".  The result however, remains unchanged: When the package is run from the Agent job, the package downloads a couple tables and then appears to hang.  How can I be sure that the job was running under my account this time?  I confirmed that the Agent job *was* running under my account by running a query on the log table msdb.dbo.sysssislog.  The "operator" this time showed me (as opposed to "NT Service\SQLSERVERAGENT").  I believe this test rules out that I have a permissions or account/operator problem.

    Friendly Reminder: If the Profiler information from the consultant is correct, then we also know that when the job appears to hang, it hangs at a point where all queries send to Sybase have been answered by Sybase.  In other words, SSIS/Agent appears to just stop sending queries to Sybase.  I think that means we can also rule out Sybase as the problem???

    I wonder what else could be the difference (besides the operator) between running the SSIS package through DTexec.exe verses DTexecUI.exe.

    I don't know where that leaves me, but my brain is has flat lined for today.  I'll be very happy to take any more suggestions people have tomorrow.  (I know I still need to define exactly which tasks are failing per Steve's suggestion.  I got distracted with other ideas today.)  Good night everyone and thanks so much for your interest and help.

  • Just a thought... have you tried to change the driver on your connection in the package?

    If you're using OLE DB, then you're most likely using the old "SQL Server Native Client" aka SQLNCLI (especially since the package was developed years ago and just converted to SQL Server 2019 if I read your posts correctly). This driver is a rather old one and has been superseeded by the "Microsoft OLE DB Driver for SQL Server" aka MSOLEDBSQL (not to be confused with the ancient SQLOLEDB).

    It's a rather long shot, but maybe there's something that doesn't play well between an old driver generation and a sparkling new SQL Server 2019.

    https://docs.microsoft.com/en-us/sql/connect/oledb/oledb-driver-for-sql-server?view=sql-server-ver15

  • Another thought - Visual Studio is 32-bit so the debugger is 32-bit.  SSIS is going to be 64 bit.  Just thinking of other things that may affect performance...

    What happens to performance if you run the SSIS package in 32-bit mode?  To set this, go to the job step properties, select the Configure tab then the Advanced tab and check the box "32 bit runtime".

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • kaj: That was a great idea. I'm all about long shots at this point, but this particular shot didn't help this problem. Doesn't matter, because that is something I should have done anyway. I had noticed the new driver option in my earlier research, but I only thought about installing the new driver on user PCs for use by my front-end apps. I hadn't thought to change the driver in the SSIS packages. Thanks for that idea.

    Mr. Brian Gale: I would have given your idea a try too. (I appreciate you telling me how to do it.) However, I am now following a more promising lead. If my current lead doesn't pan out, I'll come back to this idea.

    Steve Jones: I'm not sure separating out tables to download has lead to a solution, but it did help me identify another problem on a table that SSIS hadn't gotten to before. I need to know about this new problem too, so it's good I started going through the process of picking out individual tables to download. I'm researching the new problem now to see if I can figure it out myself. If not, I'll post about it in a new thread.

    Everyone, especially TL: I had been told by people in my agency that my new server was at least good as the old one. However, when the consultant compared the old server to the new one yesterday morning, he shared this with me:

    OLD Server: 16 vCPU, 24 GB RAM

    NEW Server: 4 vCPU, 8 GB RAM

    Argh!  That had not been my understanding.  I'm guessing that this is what TL meant that he said that he saw this kind of problem when a new server is "smaller". Since my new server (compared to the old one which is a physical PC) is on a blade center, I understand that my network administrators can just change the server specifications at will. The consultant recommended trying to make the new server like the old server to see if the problem would resolve. My network administrators are not cooperating. They are only agreeing to change the CPUs from 4 vCPU to 8 vCPU and no change in RAM.

    I don't know how long this will take, but I feel like changing the server specification is probably going to be the answer, and I didn't want to leave people on this forum hanging for long.  (Hence this update.)  I'll report back again when I've thoroughly tested the idea of changing the server specs (if I can get enough cooperation from my own people to do that). If this works, I'll come back and mark TLs post as the answer.

    I try really hard not to waste people's time.  I feel bad that I didn't know this about our server specs sooner.  I truly do appreciate *everyone's* help. Thank you so much!

    • This reply was modified 3 years, 9 months ago by  JJ B. Reason: Fixed a formatting problem
  • Even if different server specs does turn out to be the reason, it seems that you have learned quite a lot of new things during the whole investigative process, which is helpful in its own right.

    While I can't speak for others, the fact that you took suggestions onboard and then went about investigating them and feeding back the results makes the provision of any advice so much more rewarding.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • I've fought the server spec battle many times when migrating from physical to virtual or cloud servers.  When moves like that are made there is often an effort by the server team to right-size the box for more efficient use of resources.  Noble goals, unfortunately this often is done by looking at the average usage on the old box.  Database servers which have big spikes in load often get a big hit on available resources to handle those spikes.

    I would push back on the memory requirements.  At least to get a temporary bump up to 16GB for a test.  Don't know that it is your ultimate problem, but on a VM it should be an easy change for them to do so you can test.  If it has no impact it should also be an easy change for them to drop the allocated memory back down.  Really though, for a modern (Windows 2016 or later SQL 2016 or later) box I generally say 4 cores / vCPUs and 16GB memory are the minimum requirements for a production box.  A production SQL server should be at least as powerful as my laptop.  Less than that and I have trouble justifying why this is on it's own server rather than co-locating on another SQL server and sharing it's SQL license.

Viewing 15 posts - 31 through 45 (of 51 total)

You must be logged in to reply to this topic. Login to reply