Strange Job Slowdown 30 secs to 30 minutes

  • Stefan Krzywicki (8/8/2013)


    Brandie Tarvin (8/8/2013)


    Yeah, I caught that "other job at 4" comment after I posted.

    Try 3:45 instead.

    If the page file examination doesn't reveal anything, we will.

    I talked to the Server/VM guy. He said it isn't just one page file any more, so he's going to go through them all and add up their space.

    They can be fixed or dynamic for size. Used to be 1.5 to 2 times RAM, but that is outdated.

    With all services on the machine, the big thing is if 1 service starts pushing memory out of another.

    You mention the job before being intensive, so look for this and paging.

    At least R2 (both on Windows and SQL server) has much better memory management.

    Check some of the basics on configuration - possible that something either was installed with just the default and not set, or might need to be different as the VM is not the same as the physical it replaced.

  • Greg Edwards-268690 (8/9/2013)


    Stefan Krzywicki (8/8/2013)


    Brandie Tarvin (8/8/2013)


    Yeah, I caught that "other job at 4" comment after I posted.

    Try 3:45 instead.

    If the page file examination doesn't reveal anything, we will.

    I talked to the Server/VM guy. He said it isn't just one page file any more, so he's going to go through them all and add up their space.

    They can be fixed or dynamic for size. Used to be 1.5 to 2 times RAM, but that is outdated.

    With all services on the machine, the big thing is if 1 service starts pushing memory out of another.

    You mention the job before being intensive, so look for this and paging.

    At least R2 (both on Windows and SQL server) has much better memory management.

    Check some of the basics on configuration - possible that something either was installed with just the default and not set, or might need to be different as the VM is not the same as the physical it replaced.

    Our server guy said page files are done differently now. It isn't one page file, it is a bunch of smaller ones.

    How do I check services pushing memory? What we see is that SQL Server has nearly all the memory not reserved for the OS and no other jobs are running.

    How do I check for paging?

    If these are things I should just google, feel free to tell me. : -)

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • At this point, you need to run two things during this job running. A profiler trace and Performance Monitor.

    Performance Monitor is a Windows tool that has several SQL Server counters now added to it. When you run these at the same time, you can now pull the PerfMon trace into Profiler (with the profiler trace) and review them at the same time to connect spikes in one to queries in the other. YAY new technology. @=)

    Page file spikes can be tracked using the SQL memory counters in PerfMon.

    I think you'll want to run PerfMon in the VM while running the Profiler trace, just to see what access the VM has. Then the next day, run PerfMon on the physical box (the base of the VM) to see what it might be doing differently than what the VM does.

    Greg, As far as the outdated requirements on the Page File size, yeah, I know it no longer recommends 1 & 1/2 to 2 times the RAM, but every time I've tried to not do that for my systems, things did not end well because the file size ended up so much smaller than it needed to be.

    Stefan, My recommendation is that if the multiple paging files do not add up to the total RAM in the server that their sizes need to be increased to at least that size if not more. It won't hurt your box to test out an increase in the size and it will tell you if the paging files are the problem or not by how this job responds.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Brandie Tarvin (8/9/2013)


    At this point, you need to run two things during this job running. A profiler trace and Performance Monitor.

    Performance Monitor is a Windows tool that has several SQL Server counters now added to it. When you run these at the same time, you can now pull the PerfMon trace into Profiler (with the profiler trace) and review them at the same time to connect spikes in one to queries in the other. YAY new technology. @=)

    Page file spikes can be tracked using the SQL memory counters in PerfMon.

    I think you'll want to run PerfMon in the VM while running the Profiler trace, just to see what access the VM has. Then the next day, run PerfMon on the physical box (the base of the VM) to see what it might be doing differently than what the VM does.

    Greg, As far as the outdated requirements on the Page File size, yeah, I know it no longer recommends 1 & 1/2 to 2 times the RAM, but every time I've tried to not do that for my systems, things did not end well because the file size ended up so much smaller than it needed to be.

    Stefan, My recommendation is that if the multiple paging files do not add up to the total RAM in the server that their sizes need to be increased to at least that size if not more. It won't hurt your box to test out an increase in the size and it will tell you if the paging files are the problem or not by how this job responds.

    Thanks, I'll do some more research on PerMon.

    Your recommendation for the paging files is what we were planning on. Good to have that idea confirmed, thanks! Nice to have someone who knows what they're doing validate our stab in the dark. : -)

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • Given your other questions, it occurs to me that I clarify this before you start digging into your research.

    Performance Monitor = PerfMon (it's just a shorter way of saying it for most people).

    If you already did know that, then I apologize for the infodump assumption.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Brandie Tarvin (8/9/2013)


    Given your other questions, it occurs to me that I clarify this before you start digging into your research.

    Performance Monitor = PerfMon (it's just a shorter way of saying it for most people).

    If you already did know that, then I apologize for the infodump assumption.

    No need to apologize. I'd assumed that, but this isn't just for me, it is for other people as well and that's good information to have here for anyone that finds this through a search.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • I second Brandie on the PerfMon along with the SQL trace.

    What you will see is I/O, which can consume a lot of CPU.

    It will flush memory out of process (so the services will drop in consumed memory), while the page file grows.

    I had real good luck with putting the page file on another disk (not where Windows is installed).

    Running a smaller page file, at least in my case, caused a bit more agressive clearing of older cache.

    The idea, at least in the BI world, was to keep most used available.

    I found RS and SQL could live well togehter, but SSAS would cause issues when someone would try and extract low level details at times.

    I recall a whitepaper about performance, published by Microsoft.

    It covered many different topics - SQL Server, SSAS, Windows, disk, etc.

    Seems like it was centered around how they loaded a TB in 30 minutes.

    I'll see if I can find a link.

  • This should be it.

  • Have you had the server guys check the power saving settings on the VM and the host.

    See Brent Ozar's article on this issue:

    http://www.brentozar.com/archive/2010/10/sql-server-on-powersaving-cpus-not-so-fast/

    Also, I am curious why you are running this as a VM if it is the only one on the host?

  • arnipetursson (8/9/2013)


    Have you had the server guys check the power saving settings on the VM and the host.

    See Brent Ozar's article on this issue:

    http://www.brentozar.com/archive/2010/10/sql-server-on-powersaving-cpus-not-so-fast/

    Also, I am curious why you are running this as a VM if it is the only one on the host?

    There can be benefits, snapshot backups, easier to move to new hardware, easier to change resource allocation, etc...

    I'll take a look at the article. Thanks.

    I don't think that's it as the very next job runs at normal speed with no intervention. If I remember the power issue correctly, the cpus will slow and never speed up again until you reboot.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • Since there is a theory that SQL Server is being paged out, you should make sure that the service account for SQL Server has the Windows permission "Lock pages in memory". (Speaking of that, you did not post the output of "SELECT @@version, did you?)

    My gut feeling, though, is that the issue is related to the job itself. That is, the input at these wee hours of night is different.

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • Erland Sommarskog (8/9/2013)


    Since there is a theory that SQL Server is being paged out, you should make sure that the service account for SQL Server has the Windows permission "Lock pages in memory". (Speaking of that, you did not post the output of "SELECT @@version, did you?)

    My gut feeling, though, is that the issue is related to the job itself. That is, the input at these wee hours of night is different.

    The issue being related to the job itself would be my first inclination too, but nothing has changed about it or the data feeding into it. There's nothing special about the data or the job at that hour. Before VM it ran at 30 seconds to a minute, after 16 minutes to 30 minutes.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • I think part of it being suspect of paging lends itself to being run after that largest job of the day.

    At least that is what I get out of the description.

    Kind of sounds like it is waiting for something.

    By chance, is there some other machine involved?

  • Greg Edwards-268690 (8/9/2013)


    I think part of it being suspect of paging lends itself to being run after that largest job of the day.

    At least that is what I get out of the description.

    Kind of sounds like it is waiting for something.

    By chance, is there some other machine involved?

    Yeah, I'm thinking the timing after the largest job has something to do with it.

    It could also be a memory allocation problem. There was no free RAM on the machine, it was all taken by the OS and SQL Server. We were given 38Gig RAM, but it turns out it is running Win 2008 and that can only use 32. We'd given SQL Server 28 of the 38 thinking there was 10 left. We've reduced that to 22 to see what happens and now there's some free RAM.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • Stefan Krzywicki (8/9/2013)


    Greg Edwards-268690 (8/9/2013)


    I think part of it being suspect of paging lends itself to being run after that largest job of the day.

    At least that is what I get out of the description.

    Kind of sounds like it is waiting for something.

    By chance, is there some other machine involved?

    Yeah, I'm thinking the timing after the largest job has something to do with it.

    It could also be a memory allocation problem. There was no free RAM on the machine, it was all taken by the OS and SQL Server. We were given 38Gig RAM, but it turns out it is running Win 2008 and that can only use 32. We'd given SQL Server 28 of the 38 thinking there was 10 left. We've reduced that to 22 to see what happens and now there's some free RAM.

    Standard Edition of Win 2008 R2, same as I was running.

    Be sure to double check if SSAS settings do not also contribute.

    From my experience, SSAS seems more of a reference.

    Also take into account VM Host overhead.

    I think you're headed the right way.

    Less may mean it clears out old cache before tipping, if that's what it ends up being.

Viewing 15 posts - 16 through 30 (of 50 total)

You must be logged in to reply to this topic. Login to reply