Query Performance Standards - Overall Per DB

  • Good morning, fellow SQL Server folks!

    I have a general question regarding Query Performance standards.
    If you have them, what are your standards?
    When do you take action?

    For our Production Environment, I start investigating a query when it starts taking over 1 second to execute, but I set the standard at a 5-second average execution time over a 10-minute span.
    We use SolarWinds DPA to monitor our environments, which works well enough.

    Our transaction rate varies from 1,000 to 3,000 per second, Page Life Expectancy is around 2.5 million seconds, and disk read/write latency is about 1 millisecond each.
    Production DB size is just over 250 GB. We are running SignalR in a backplane configuration, with 3 web servers accessing data for over 2,500 users in the US and Canada.

    We are running SQL Server 2014 SP1 CU9, clustered with replication, on 2 sockets / 16 cores / 32 logical processors at 3.2 GHz, with 294 GB of RAM and a fair amount of storage (about 1.2 TB total just for SQL Server).

    We just added this beauty last year, and I am being told by some of our devs that the DB is the bottleneck when it comes to performance.
    I know there is no "right" answer to this, but I had to ask out of curiosity.

    Thanks!
    Chris

  • chris.munsell - Wednesday, April 26, 2017 8:03 AM

    It is literally impossible to state a definitive "if a query is over X, then tune it." Note that the metric behind X is a variable too: some need to tune for physical reads, others for duration, some for CPU, etc.

    In YOUR particular scenario, however, it is absolutely likely that a "small/quick" query IS a bottleneck. But before you go trying to find it, the obvious thing is to ask the devs WHICH queries THEY think need to be tuned - i.e., what is their basis for the statement that the DB is the bottleneck?

    Now, as for FINDING what to tune yourself - given your high call volume, you need to do an aggregate profiler analysis. Capture a workload for some period of time (I have done this to local disk on systems with thousands of calls per second with very little overhead). Then roll up the calls in a "normalized" fashion while aggregating the various work buckets (duration, CPU, reads, etc.). This will show you the queries that put the biggest total hit on the server (or the biggest average hit, either one). It lets you find the query that runs in 0.5 seconds but is HAMMERING your server with 200 calls per second (or more). If you can knock that back by 10% - or, as is often possible when I tune client code, by an order of magnitude or more - that can be a HUGE win for both the server and the application.
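
    A minimal sketch of one way to do that capture, using an Extended Events session in place of the old Profiler trace - the session name and file path here are made up for illustration:

    -- Capture completed batches and RPC calls to a local file. duration,
    -- cpu_time, and logical_reads are built-in fields of these two events.
    CREATE EVENT SESSION [WorkloadCapture] ON SERVER
    ADD EVENT sqlserver.rpc_completed
        (ACTION (sqlserver.sql_text, sqlserver.database_id)),
    ADD EVENT sqlserver.sql_batch_completed
        (ACTION (sqlserver.sql_text, sqlserver.database_id))
    ADD TARGET package0.event_file
        (SET filename = N'C:\Temp\WorkloadCapture.xel', max_file_size = 512)
    WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);

    ALTER EVENT SESSION [WorkloadCapture] ON SERVER STATE = START;
    -- ...let it run over a representative window, then stop it:
    ALTER EVENT SESSION [WorkloadCapture] ON SERVER STATE = STOP;

    You can then read the .xel file back with sys.fn_xe_file_target_read_file, normalize the statement text (strip out literals), and SUM/AVG the work buckets per normalized statement.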

    Oh, DUH! After writing all that I just noticed that you have SolarWinds DPA. IIRC that will do the same kind of aggregate analysis I mentioned above. :hehe:

    Another thing to do is a file IO stall and wait stats differential analysis. Believe it or not, a 1ms IO stall can add up. I have a client at the moment with 1.7ms average write times on the transaction log file, but they are doing upwards of 500,000 of them per 180-second differential monitoring interval. Do the math (500,000 × 1.7 ms ≈ 850 seconds of log-write time inside a 180-second window) and you will find that is more than 4.5 seconds of tlog write time for every clock second. They are stalling their data ingestion process because of this. Our first approach to helping that will be delayed durability, then moving to In-Memory OLTP.
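
    A bare-bones sketch of that differential, assuming a 180-second window as in the example above (the temp table name is illustrative):

    -- Snapshot the cumulative IO counters, wait, then diff the two snapshots.
    SELECT database_id, file_id, num_of_writes, io_stall_write_ms
    INTO   #io_snap
    FROM   sys.dm_io_virtual_file_stats(NULL, NULL);

    WAITFOR DELAY '00:03:00';  -- the 180-second monitoring window

    SELECT cur.database_id,
           cur.file_id,
           cur.num_of_writes     - snap.num_of_writes     AS writes_in_window,
           cur.io_stall_write_ms - snap.io_stall_write_ms AS write_stall_ms,
           (cur.io_stall_write_ms - snap.io_stall_write_ms) / 180.0
                                                          AS stall_ms_per_clock_second
    FROM   sys.dm_io_virtual_file_stats(NULL, NULL) AS cur
    JOIN   #io_snap AS snap
      ON   snap.database_id = cur.database_id
     AND   snap.file_id     = cur.file_id;

    DROP TABLE #io_snap;

    If log writes turn out to dominate, the delayed-durability switch Kevin mentions is ALTER DATABASE ... SET DELAYED_DURABILITY = FORCED, with the usual caveat that a crash can lose the most recent unflushed transactions.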

    Note that a 5-second average execution time is an ETERNITY when you are doing 3,000 calls per second!

    Sounds like a fun environment to work on! If your devs are like those at all of the clients I review, then I am certain there is some low-hanging fruit that will help out immensely.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru - Wednesday, April 26, 2017 9:08 AM

    Thanks for your reply Kevin.

    I certainly understand what you are saying - a single X threshold is somewhat misleading. There are truly too many variables to allow for a "fair" comparison.

    I guess I was just curious to see if anyone else had a standard that they worked within.

    One of our devs suggested that we replace our SQL storage with Intel Optane products, even though we just replaced our SAN late last year.
    Your reply gave me some other items to look at regarding performance, though, so thank you for that.
    I guess I get to have another meeting :crazy:

  • chris.munsell - Wednesday, April 26, 2017 8:03 AM

    For our Production Environment, I start investigating a query when it starts taking over 1 second to execute, but I set the standard at a 5-second average execution time over a 10-minute span.
    ...
    Our transaction rate varies from 1,000 to 3,000 per second, Page Life Expectancy is around 2.5 million seconds, and disk read/write latency is about 1 millisecond each.
    Production DB size is just over 250 GB.

    I tend to think of it not as "which queries take over 1 second to execute" but as a more holistic approach: look at which queries are taking the most time, whether by total duration, by CPU (worker) time, or by number of reads.  For example, I found a query that was only taking 2/10 of a second but was being called thousands of times a day, so improving it to run in hundredths of a second did improve overall performance.  Looking at the queries doing the most reads can also help find queries with bad or missing join criteria, improper or non-SARGable WHERE clauses, etc.

    Here's some good information discussing such an approach:

    http://www.sqlpassion.at/archive/2015/04/20/how-to-find-your-worst-performing-sql-server-queries/
    https://www.brentozar.com/responder/get-top-resource-consuming-queries/
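
    As a starting point, here is a small sketch in the spirit of those articles - top plan-cache statements by total logical reads (swap the ORDER BY to total_worker_time or total_elapsed_time to rank by CPU or duration instead):

    -- Top 20 cached statements by cumulative logical reads.
    SELECT TOP (20)
           qs.execution_count,
           qs.total_logical_reads,
           qs.total_worker_time,        -- microseconds of CPU
           qs.total_elapsed_time,       -- microseconds of duration
           SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
               ((CASE qs.statement_end_offset
                     WHEN -1 THEN DATALENGTH(st.text)
                     ELSE qs.statement_end_offset
                 END - qs.statement_start_offset) / 2) + 1) AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_logical_reads DESC;

    Keep in mind these DMV numbers only reflect what is currently in the plan cache, so they reset on restart or plan eviction.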

  • Kevin's answer is great (+1).

    We did have a 3-second standard that all queries had to meet. However, as Kevin notes, that doesn't mean we were done if a query ran in less than 3 seconds. We had some queries called hundreds of times a second that were taking about 15ms each, and that was causing problems. So we started tuning and got them down to 7ms. We still had complaints. A few more experiments got us down to 5ms, and we couldn't get any better.

    So, while we had that 3-second limit, when and what we tune is completely dependent on the system, the queries, the volume, the data, etc. Largely, you identify where the pain is and reduce it. That will unmask the next level of pain, which you then reduce, and so on.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • chris.munsell - Wednesday, April 26, 2017 9:26 AM

    One of our devs suggested that we replace our SQL storage with Intel Optane products, even though we just replaced our SAN late last year.
    Your reply gave me some other items to look at regarding performance, though, so thank you for that.
    I guess I get to have another meeting :crazy:

    Despite being a hard-core SQL Server consultant (and SQL Server relational engine ONLY, for about 45,000 professional hours now), I will be the first to tell a client that SQL Server is not the correct tool for their needs (or, more often, that a particular part of their app/process should not be done on SQL Server). In over 20 years of consulting I have had only ONE client whose entire application needed to be migrated off of SQL Server. That was a web-scale entity, back before federated databases and Hekaton were viable solutions in SQL Server.

    Speaking of Hekaton: with any client that says "we have thousands of transactions per second and things are slow," I immediately think of that feature (now known as In-Memory OLTP). This is especially true given that you are already on Enterprise Edition of SQL Server. BWin benchmarked their ASP.NET session state system using it at 1.2 MILLION TPS on a SINGLE 4-CPU BOX! Oh, and BTW, ASP.NET session state has a BLOB in it!
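
    For a sense of what is involved, a minimal sketch of trying In-Memory OLTP on SQL Server 2014 Enterprise - the database, filegroup, and table names here are all made up for illustration:

    -- A memory-optimized filegroup holds the checkpoint file containers
    -- (FILENAME here is a directory, not a file).
    ALTER DATABASE MyDb ADD FILEGROUP imoltp_fg CONTAINS MEMORY_OPTIMIZED_DATA;
    ALTER DATABASE MyDb ADD FILE
        (NAME = 'imoltp_container', FILENAME = 'C:\Data\imoltp_container')
        TO FILEGROUP imoltp_fg;

    -- A durable memory-optimized table with a hash index sized for ~1M sessions.
    -- Note: SQL 2014 caps memory-optimized rows at 8,060 bytes and does not
    -- support LOB columns; true BLOB payloads need SQL 2016+ or row splitting.
    CREATE TABLE dbo.SessionState
    (
        SessionId UNIQUEIDENTIFIER NOT NULL
            PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
        Payload   VARBINARY(8000)  NOT NULL,
        ExpiresAt DATETIME2        NOT NULL
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);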

    I will note that there are a BUNCH of things that can be put to use to help you scale if you switch to SQL Server 2016 too, including one that can double your TPS in some scenarios all by itself with ZERO application changes!

    I was serious when I said I have never come across a client that didn't have low-hanging fruit I could quickly identify and fix for often substantial performance (and usually concurrency) benefits. If you can find a few of those, you are likely to get acceptable performance while you and the dev team investigate more powerful fixes.

    I will add that you need to beware: there are some changes you can make (faster IO is one of them, believe it or not; there are others) that can CRUSH your application, even to the point of making it unusable. There's an interesting and quite memorable story there... 😎

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Heh... dollars to donuts, it's going to turn out to be an ORM whammy that takes a tiny time to execute, has a 2-to-29-second compile time, and recompiles EVERY time because of an insanely long WHERE clause.  Of course, the database will be accused of being the bottleneck, but it isn't.  It'll be the ORM code.

    When I get to work tomorrow, I'll try to remember to dig up the code I used to find this problem.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Wednesday, April 26, 2017 9:38 PM

    Any luck finding that information Jeff?
    I would love to find some gold like that....

    Thanks for all the feedback so far!

  • chris.munsell - Monday, May 1, 2017 8:11 AM

    Thanks for the reminder.  Yes, I did find it.  I had started to modify it for my own usage and hadn't completed that yet.  Of course, since it reads XML, I can't take any credit for any of it, so here's the original link I got the base code from.  Kudos to Jonathan Kehayias for it.
    https://www.sqlskills.com/blogs/jonathan/identifying-high-compile-time-statements-from-the-plan-cache/
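
    For quick reference, a stripped-down sketch of the idea from that post - pulling compile time out of the cached showplan XML (the linked version is more complete):

    -- CompileTime / CompileCPU are attributes (in milliseconds) on the
    -- QueryPlan element of the cached showplan XML. Shredding every cached
    -- plan can be expensive on a big cache, so run this off-hours.
    WITH XMLNAMESPACES
        (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
    SELECT TOP (20)
           cp.usecounts,
           n.value('@CompileTime', 'int') AS compile_time_ms,
           n.value('@CompileCPU',  'int') AS compile_cpu_ms,
           qp.query_plan
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
    CROSS APPLY qp.query_plan.nodes('//QueryPlan') AS q(n)
    ORDER BY compile_time_ms DESC;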

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you, Jeff. I will look into using this. I have long thought that ORM-generated queries - Entity Framework and LINQ in particular - were the cause of most of my headaches. Hopefully this will help narrow those items down.

  • chris.munsell - Monday, May 1, 2017 2:08 PM

    I have a SQL Saturday session entitled "Know What Your Code is Doing to SQL Server" that covers a number of the unpleasantries that EF (and other ORMs) do to/with SQL Server. NOT PRETTY!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service
