Random SQL Slowness once a month

  • For about 5-7 days in a month (usually around the 15th-22nd) we see SQL performance issues that are not normally there. Basically, things are just "slow". I think the slowness is most "visable" in our AR application. Posting batch payments (500-4,000 in a batch), which normally takes seconds to minutes, can take up to 3-4 hours when things are "slow". We run Sentryone to alert us on deadlocks and blocking = or > 2 seconds. I am not seeing any more blocks or deadlocks from the alerting when things are "slow".

    I have found some information that points to SQL execution plans changing. Honestly I'm not too familiar with troubleshooting these in SQL, but I have found some queries to look at the execution plans. Although I haven't been able to see what, if anything, has changed.

    Any tips that would point me in a general direction would be great!

    • This topic was modified 3 years, 9 months ago by  nbaer.
  • I'd be checking your maintenance schedules.  Things like statistics updates, index rebuilds, index reorganizations, etc.  Or an absolute worst case - flushing any caches.

    Since the problem happens around the same time every month, I expect you have something scheduled during that time.

    In the event that isn't the case, I'd be checking system resources on both the database side and the end user side. Just because people report "slowness" doesn't mean that SQL is the culprit; it is just easy to blame the DBA.  DBA = default blame acceptor after all.

    So my first step would be to check on any scheduled tasks that are running on the SQL instance(s) AND on the OS at this time.  Next, I'd check resources on the system around this time (perfmon, or I think SentryOne tracks that too) and try to find abnormalities.

    If there is a specific SELECT query that you know is normally fast but slow during this time according to end users, you could test that query.  I say SELECT because then you won't be changing any data and you can capture the execution plan during the "quick" times and during the "slow" times.

    My expectation is that it is some scheduled process that happens on the 15th.  Do you have your antivirus configured to NOT scan the SQL databases?

    If it is not a scheduled thing, I'd be checking disk I/O, CPU, free memory,and network I/O during "normal" times and during "slow" times to get an idea of what normal resource utilization looks like and what it looks like when things are slow.

    On top of that, I'd be asking end users what they are doing differently.  For example, if your IT department pushes out windows updates around the 15th and forces a reboot, but otherwise the machines are on 24/7 with the user logged in, it could be a local caching issue (although I'd expect that to last a day tops, not 5-7 days).  But it might not be database performance issues, but client side issues.

    Another way to test this is if you have permissions to one or more of the tools, run an extended events session to capture your SQL queries and their execution times, run the tool.  Then wait for the 15th when things slow down and repeat.  This way you can check if SQL is the bottleneck or if it seems to be something else.  Again, only do read-only operations to prevent data changes.

     

    My VERY last thought is are end users (1 or more) doing something different during this time period?  Like do they have a "month end" process that they run on the 15th through 22nd that looks at or changes large sets of data? This may block things but not be visible in SQL.  For example, it may put a "block" flag in a table and then all the clients are waiting on that block flag to get reset back to 0 and then whoever asks for the block flag after gets it.  This type of blocking from the application side would not be visible in SentryOne or any SQL monitoring tools without knowing about a blocking table and manually querying it.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Is your SQL Server installed on physical or virtual machine?

    _____________
    Code for TallyGenerator

  • If you're running SQL Server 2017, Query Store is your bestest friend in the world. Enable Query Store, You'll be able to look at a given query and see how many and what shape execution plans it has. You'll be able to see a history of recompiles and when the plans changed. This will tell you a ton.

    Also, because it's so scheduled, unless you've got some data load that's occurring at that spot each month that somehow changes statistics in an odd way, then, another few days later, another stats updates them back to "normal," this sounds odd. I'd add in looking at wait statistics before and during the change in behavior. Maybe there are outside processes affecting things.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Sergiy, the server is virtual now. Until last October our SQL server was a physical cluster, and we saw the issue when it was physical as well. We went from SQL server 2008 to 2017, FYI.

  • If the issue existed on physical server as well, and you're on SQL2017 now, then Grant's advice is the one to follow.

    You may also wish to trace tempdb usage. Might give you a hint.

     

    _____________
    Code for TallyGenerator

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply