• chreo - Thursday, December 14, 2017 7:59 AM

    @SuperCat:
    - SQL Profiler - we had it, but we fuc**d up and Trace file was overwritten. We will have info next time when disaster appear again
    - WhoIsActive - we run it every 30 seconds and save results to table, but when disaster started then WhoIsActive couldn't complete in 7 minutes and finally was killed.
                            (last WhoIsActive results 30 seconds before disaster didn't show any problematic queries, but we will analyze it again, because this is all we have now)

    @Grant:
    - QueryStore - we used this tool since day one after migration to SQL2016. But we have seen many CPU problems each time we tried to see stats in QueryStore.
                           Finally we decided to turn it off, which also decreased total CPU usage on server (not sure if that was 100% related, but it was much better without QueryStore)
                           also I've seen some forum posts where QueryStore was suspected of memory pressure
    - ExtendedEvents - no experience with this tool but I guess we need to start using it 🙂

    What if problem wasn't caused by some query? We think we know all bad queries in our system.
    What if some other process slowly eats memory and doesn't release it causing unexpected memory pressure after few days.
    Is there some info that memory pressure will appear soon (when we reach some limit)?

    It's possible that it's some weird memory leak, sure. However, you haven't eliminated through the data presented that it's a query or queries causing the problems. Most of the time, it's something to do with queries, data structures, indexes, missing indexes, and out of date or missing statistics. These things cause about 85% of all performance problems. You guys are just capturing system metrics. Those can tell you where a problem is, disk or memory or cpu, but not what. You need to drill down further. The majority of cases the problem is very straight forward, not obscure memory leaks that are only happening to you. That's just now how this stuff works.

    As to Query Store causing problems, it can, like any other monitoring tool. However, once more, you turned it off without any sort of indication that it was the problem. There are Query Store specific wait statistics that would give you an indication of what was wrong. Also, by turning it off, you've made it so that you're not monitoring queries (since you don't have extended events up & running). You're just not gathering a full set of metrics for making your decisions.

    In addition to capturing query metrics, what about wait statistics. Knowing you have memory pressure doesn't say where the slow down is occurring. You need more and better data.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning