DB server comes to a crawl every two weeks

  • We have had an on going issue with our ERP system that neither Microsoft nor three consulting forms can resolve.

    MS did diagnostics on the DB server (2008 R2 runing on server 2012). They found no issues with the server or the configuration.

    The same goes for consulting firms we've hired.

    Symptoms:

    Sales order entry comes to a crawl. It takes 12 seconds to retrieve a part description and price for a new line item.

    Existing orders with 200 lines take 10 minutes to retrieve.

    No other programs are impacted that we can tell.

    Current Solution:

    If we restart the sql server service the problem goes away for about two weeks.

    Today is the first time the problem came back the next day.

    Observations:

    All reports show very low usage of the server.

    The server (DELL R720) which has 128 gigs of ram and 98gigs have been allocated to SQL.

    Nothing indicates the server is working hard. CPU usage is <10%.

    Cache is perfect.

    No locks

    No Blocking

    We've monitored wait states and found nothing unusual.

    Nothing else is running on this server except SQL Server.

    Request:

    We are at our wits end trying to figure out what is causing this.

    Some people try to tell us it's the application.

    But we are the only company running this software that has this problem.

    The strange part is that it only effects the order entry system.

    Can anyone provide insight, ask the right question, or make the right comment that can help us?

    tia,

    Todd

  • This may or may not be of help. We had a similar problem, on an older version of windows, and this was related to not enabling "background memory trimming" as the MS technician called it.

    Lock Pages in Memory fixed it but as I recall there was a hotfix available. I've not found it yet but I think it was to do with large applications like SQL and Exchange.

  • How often are you updating statistics?

    How often are you rebuilding your indexes?

    Have you reviewed the 'slow' queries to see if they can be improved?

    Just a few thoughts...

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey Williams 3188 (12/31/2013)


    How often are you updating statistics?

    How often are you rebuilding your indexes?

    Have you reviewed the 'slow' queries to see if they can be improved?

    Just a few thoughts...

    I would hope these guys checked these things

    TC-416047 (12/31/2013)


    that neither Microsoft nor three consulting forms can resolve.

  • Jeffrey Williams 3188 (12/31/2013)


    How often are you updating statistics?

    How often are you rebuilding your indexes?

    Have you reviewed the 'slow' queries to see if they can be improved?

    Just a few thoughts...

    We update statistics every evening.

    We rebuild indexes every evening if fragmentation > 30%.

    The queries aren't slow except when they are.

    Yes....... I'm from Oklahoma where the fork in the road is taken.:)))

    Thank you for your questions. Your response is sincerely appreciated.

  • MysteryJimbo (12/31/2013)


    This may or may not be of help. We had a similar problem, on an older version of windows, and this was related to not enabling "background memory trimming" as the MS technician called it.

    Lock Pages in Memory fixed it but as I recall there was a hotfix available. I've not found it yet but I think it was to do with large applications like SQL and Exchange.

    I'm not sure what to do with this yet but thank you for responding.

    We'll certainly look into this.

  • Here's a couple of articles for you to look into it.

    http://bradmcgehee.com/2011/03/10/do-you-enable-lock-pages-in-memory/[/url]

    https://www.simple-talk.com/sql/database-administration/great-sql-server-debates-lock-pages-in-memory/[/url]

    In my case, the only reason this option wasn't enabled was based on a MS article at the time saying it was no longer required. As it turns out it was.

    FWIW the SQLCAT team now recommend this be enabled as standard. If you do need to enable this setting, ensure you set you MAX Server memory to allow for enough OS space and any other over heads the system has.

  • Grab actual query plans for whatever query is responsible for "Existing orders with 200 lines take 10 minutes to retrieve."

    One plan from when response is fast, one plan from when response is slow. Save the plans as .sqlplan files and post here.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • MysteryJimbo (1/2/2014)


    Here's a couple of articles for you to look into it.

    http://bradmcgehee.com/2011/03/10/do-you-enable-lock-pages-in-memory/[/url]

    https://www.simple-talk.com/sql/database-administration/great-sql-server-debates-lock-pages-in-memory/[/url]

    In my case, the only reason this option wasn't enabled was based on a MS article at the time saying it was no longer required. As it turns out it was.

    FWIW the SQLCAT team now recommend this be enabled as standard. If you do need to enable this setting, ensure you set you MAX Server memory to allow for enough OS space and any other over heads the system has.

    Lock Pages is a good start.

    If this truly impacts only Sales Orders in your ERP, has anything come up in a trace when you run a query?

    And any unusual disk activity at this time?

    Restarting clearing up the issue would tend to be a memory issue.

    Only SQL on the machine, no SSAS, correct?

  • After reading these links this sounds promising.

    Thank you both (Greg and MysteryJimbo).

    The next occurrence of the problem should be somewhere around the 15th +/- a day or two.

    We will have a chance to test this settings impact.

  • Chris,

    The only way I know of discovering the query is to set a trace, execute the action, and stop the trace.

    Then I could copy it from the trace file and run it to see the plan.

    Do you have any other ways to do this?

    We have no access to code.

    -tia,

    Todd

  • What type of storage do you have?

    Thin provisioning?

    Did you see disk queues go up during these events?

    Does it happen on the same days of week/time of day?

    Any maintenance going on on the storage side?

    Are your log files growing at all? At the time of the events?

    Best of luck..

  • TC-416047 (1/3/2014)


    Chris,

    The only way I know of discovering the query is to set a trace, execute the action, and stop the trace.

    Then I could copy it from the trace file and run it to see the plan.

    Do you have any other ways to do this?

    We have no access to code.

    -tia,

    Todd

    Search for "dmv and long running query".

    Might be a clue in there.

    We used SCOM to monitor the servers, and could run some reports to look for some patterns.

    Handy place to start if you have it. Even at 15 minute captures, some things show up.

    And it did have an SQL mgmt. pack available.

    Once had an issue where they were doing backups for the domain, and it was killing us at 5am.

    Kind of a pain to track down, as VM's were involved. Had to get up to the activity of the host to seem the bottleneck.

    No mention of VM's in your thread, but something to tuck in the back of your mind for the future.

  • You can export the execution plan from the buffer cache using this dmv. You can find many examples of the queries if you google this dmv.

    http://technet.microsoft.com/en-us/library/ms189747.aspx

    Export the execution plan of a proc now and compare with when performance is slow.

    Assuming your problem is all procs and queries I don't think this is going to make much difference. However it's a good tool for you to have in your arsenal.

  • arnipetursson (1/3/2014)


    What type of storage do you have?

    Thin provisioning?

    Did you see disk queues go up during these events?

    Does it happen on the same days of week/time of day?

    Any maintenance going on on the storage side?

    Are your log files growing at all? At the time of the events?

    Best of luck..

    Thanks for the response.

    We've been down all these roads several times the past 6 months.

    -Todd

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply