• bgrossnickle (12/11/2014)


    We have a nightly ETLV job that runs for each of our 30 clients. Depending on the size of the client, it can run from 1 to 7 hours. We have one client, clientX, where the ETLV job takes about 7 hours, but about 3 times a month it takes 20 hours. Looking at the logs, it is always the same proc, procx, that is the issue. Normally procx runs at clientX for 35 minutes, but on these occasions it runs 838, 839, 828, 835, or 825 minutes, and then finishes successfully! There does NOT seem to be a pattern in day of week, time of day procx starts, time of day procx finishes, amount of data, etc. But it is so strange that if it does not finish in 35 or so minutes, it finishes in 832 +/- 7 minutes.

    Yesterday, when the ETLV at clientX was running over by several hours, I could tell from our log that it was in procx. I ran sp_who2 and there was nothing in the BlkBy column for any SPID. Unfortunately I am not good at diagnosing performance problems and do not remember exactly what sp_who2 said about procx. I know that it was not taking any CPU, I believe the Status was SUSPENDED, and the Command said UPDATE.

    What can I do so that in the future I can better diagnose the issue? I know that it is procx, which has 3 insert and 10 update statements against the same db. I do not want to do something that will impact the entire ETLV job, which is already pushing its 7-hour limit.

    I want some diagnosis steps that I can run if I am called while procx is running, and I would like some diagnosis steps that I can set up in advance, because many times I am just told, "oh, yesterday we had that issue again."

    Thanks

Could be bad parameter sniffing, as the others have suggested. Here are a couple of things you could do to investigate the issue beyond the parameter sniffing angle.

First, you could get Adam Machanic's sp_WhoIsActive and run it while the ETLV is going. It is a custom procedure built on the ideas behind sp_who2, but it surfaces far more detail, such as wait information and blocking chains. Lots of people use it, so it is safe and you can easily get help understanding what it shows; a sketch of an interactive call is below.

Second, you could get sp_AskBrent from BrentOzar.com and run that to capture what is happening as well. You absolutely can set up either of these two procedures to run in a job and log data to a table for later review, which covers the "yesterday we had that issue again" case; see the logging sketch below.

A third option is to set up a job to capture long-running queries to a file. You can then review the trace file after the fact to pinpoint high-duration statements. If you do this, I would definitely set it up so that it also captures blocking information; a sketch is at the end. Keep in mind that this third option could have a much higher impact on your system than the first two.
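A minimal sketch of the interactive call, to run while procx is stuck, assuming a recent sp_WhoIsActive build (the parameters below come from its documentation; check the version you download):

-- Run from a separate session while procx is running.
EXEC dbo.sp_WhoIsActive
    @get_plans = 1,           -- include the query plan for each running statement
    @get_task_info = 2,       -- detailed wait information per task
    @find_block_leaders = 1;  -- count how many sessions each session is blocking

The wait_info column is the thing to note down: a session that sits SUSPENDED for hours with an empty BlkBy in sp_who2 will usually show its real bottleneck there.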
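For the scheduled, log-to-a-table setup, sp_WhoIsActive can generate its own destination table via its documented @return_schema and @destination_table parameters. A sketch, where dbo.WhoIsActive_Log is just an illustrative table name (the generated schema depends on the parameters passed, so use the same ones in both calls):

-- One-time setup: have sp_WhoIsActive emit its CREATE TABLE script, then run it.
DECLARE @schema VARCHAR(MAX);
EXEC dbo.sp_WhoIsActive @get_task_info = 2, @find_block_leaders = 1,
    @return_schema = 1, @schema = @schema OUTPUT;
SET @schema = REPLACE(@schema, '<table_name>', 'dbo.WhoIsActive_Log');
EXEC (@schema);

-- Job step, scheduled every minute or two during the ETLV window:
EXEC dbo.sp_WhoIsActive @get_task_info = 2, @find_block_leaders = 1,
    @destination_table = 'dbo.WhoIsActive_Log';

When the 800-minute run happens again, you can query the table the next morning and see what procx was waiting on, minute by minute.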
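For the third option, a server-side trace would do it, but if you are on SQL Server 2012 or later an Extended Events session is the lower-impact equivalent (on older versions, a server-side SQL Trace does the same job). A sketch, where the session name, file path, and 5-minute threshold are all illustrative:

-- blocked_process_report only fires if the threshold (in seconds) is configured.
EXEC sp_configure 'show advanced options', 1; RECONFIGURE;
EXEC sp_configure 'blocked process threshold', 30; RECONFIGURE;

CREATE EVENT SESSION LongEtlvStatements ON SERVER
ADD EVENT sqlserver.sql_statement_completed (
    ACTION (sqlserver.session_id, sqlserver.database_name, sqlserver.sql_text)
    WHERE duration > 300000000  -- duration is in microseconds; 300000000 = 5 minutes
),
ADD EVENT sqlserver.blocked_process_report
ADD TARGET package0.event_file (SET filename = N'C:\Traces\LongEtlv.xel')
WITH (MAX_DISPATCH_LATENCY = 30 SECONDS);

ALTER EVENT SESSION LongEtlvStatements ON SERVER STATE = START;

Afterwards you can read the file with sys.fn_xe_file_target_read_file(N'C:\Traces\LongEtlv*.xel', NULL, NULL, NULL) and filter for procx.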