Premature Optimisation

  • davoscollective (9/23/2014)

    You can't just take 3 TIME IO measurements and average them and say that a faster query is necessarily better than a slower one.

    Sometimes it just seems like a dark art.

    First, use Profiler instead of STATISTICS TIME/IO, with Reads, Writes, CPU, and Duration as your go-to columns. I generally work at the SQL:BatchCompleted and RPC:Completed event level; you can also turn on the SQL:StmtCompleted event if that helps you, but always use the batch as your guide to results. You'll have a lot more information, _and_ STATISTICS TIME/IO doesn't report UDF usage correctly (i.e. at all).
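    Profiler's SQL:BatchCompleted and RPC:Completed events also have Extended Events equivalents; a minimal sketch of a session capturing the same information (the session name and target file path are illustrative) might look like:

    ```sql
    -- Capture batch- and RPC-completed events, which carry reads, writes,
    -- CPU, and duration; session name and file path are placeholders.
    CREATE EVENT SESSION BatchStats ON SERVER
        ADD EVENT sqlserver.sql_batch_completed,
        ADD EVENT sqlserver.rpc_completed
        ADD TARGET package0.event_file (SET filename = N'BatchStats.xel');

    ALTER EVENT SESSION BatchStats ON SERVER STATE = START;
    ```

    Extended Events sessions are generally lighter-weight than a Profiler trace, which matters when you're measuring performance in the first place.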

    Second, don't look at "faster" first. Look at reads, writes, and CPU first (i.e. the resources used), and know your systems (the whole system). On many systems IO is the general bottleneck, so trading more CPU for fewer reads and writes is a good trade. On some systems it's the opposite; on a few they're more or less equal. Duration can also be seriously affected by parallel plans, or the lack thereof; in some cases you may need to spend more resources to meet a duration limit... at the cost of being much more vulnerable to missing that limit if too many sessions run that batch at overlapping times (i.e. at "the same time"; for instance, if several users all schedule it for 9:00 a.m.).

    After a point, it is a dark art; like all arts, practice and experience are critical. Also, like some arts, you can simply keep experimenting: try writing queries three or five different ways, and watch the differences in Profiler. That's roughly where "premature optimization" begins, I'd say! Before that point, it's very scientific and simple:

    0) Know the data

    1) Know the business goal of the SQL you're looking at

    2) Remove everything not required by the business goal

    2a) If your screen is now blank, it never worked right

    3) Remove everything no longer required because it was required by something not required by the business goal

    4) Good query design - set based, simple, efficient, using only the rows and columns required

    5) Schema integrity

    6) Data integrity

    7) Intermediate result integrity - don't cause problems between the data and the results!

    7a) If there's a DISTINCT, there are probably incomplete or incorrect joins or join criteria resulting in too many rows being used... and some may be the wrong rows, or the query may have an intermediate cartesian product

    7b) If the DISTINCT is really required, use it at the finest granularity possible (in a derived table, #temp table, etc.)

    7c) Verify that if you expect a join to be 1:1, that you're getting 1:1 results on the _entire_ data set, not just a sample - many constructs (select @var = col1 from tab1) will return only the first row found without any warning that it found multiple rows.

    8) Final result integrity - return, accurately, exactly what is required. No more, no less.
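    The 1:1 check in step 7c can be scripted; here's a sketch against a hypothetical table (dbo.Tab1 and JoinKey are illustrative names):

    ```sql
    -- Any rows returned here mean the "one" side of the join isn't unique,
    -- so joining to it can multiply rows (dbo.Tab1/JoinKey are hypothetical).
    SELECT t.JoinKey, COUNT(*) AS RowsPerKey
    FROM dbo.Tab1 AS t
    GROUP BY t.JoinKey
    HAVING COUNT(*) > 1;

    -- Likewise, SELECT @var = col1 FROM dbo.Tab1 WHERE ... succeeds silently
    -- even when multiple rows match; it keeps one arbitrary value, which can
    -- mask exactly this problem.
    ```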

    After that come the dark arts, to be used only when the benefits outweigh the costs (or as a training exercise): writing the query three to five different ways and watching the differences between each, indexing, FK constraints, SSDs, etc. etc. Know that results can vary from environment to environment, version to version, and from one hour to the next (based, say, on what the buffers happen to contain at any given time). They are, in fact, scientific, but we rarely have enough information to plan them out purely scientifically.

    P.S. Either discard the first result out of a set, or, if the "first time" cost is what's important, after each try drop clean buffers for the relevant tables and clear the procedure cache for the relevant statements.
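    A sketch of those cache-clearing steps for "first time" measurements (note there is no per-table form of DBCC DROPCLEANBUFFERS, though DBCC FREEPROCCACHE can take a plan handle to target specific statements; run these only on a test server):

    ```sql
    CHECKPOINT;             -- write dirty pages to disk so the next command can drop them
    DBCC DROPCLEANBUFFERS;  -- empty the buffer pool, forcing cold-cache physical reads
    DBCC FREEPROCCACHE;     -- clear cached plans, forcing recompilation
    ```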

    P.P.S. Resource Monitor is very useful for a fast view of disk IO latency vs CPU vs memory vs network.

    ETA: scientific example.

    Adding to what Davos mentioned: I've run into several issues where I can't reproduce production problems on dev when dealing with a virtual server environment. It gets quite frustrating when everything runs fine for months and then all of a sudden some code takes a nosedive in performance, only in production, at random times. No clues, and no access to the host system to look at stats or history. So begin the full-time traces, trying to capture enough relevant stats that you can debate resource issues with infrastructure. Glorious.

    Aigle de Guerre!

  • You could always blame the storage or network teams, that seems to work well. 😀

    I would like to point out that sometimes the dark art of query optimization is just that: a dark art that occasionally seems to need either an index or a few more statistics to fix the poorly written query (queries... let's be serious, if there is one there are more).

    Yea, yea, you could fix the query, but if you are like me or most large government agencies buying off-the-shelf applications, changing the query would mean invalidating the warranty and thus screwing yourself. I do try to suggest better queries, but most vendors will put them into the "we will get to it" pile and leave it at just that.

    Again, thanks for the great article; it lets me know I am still on the right path and that my methods are not just crazy, but established craziness shared by other people who do similar things to get better results.

    Over 12yrs in IT and 10yrs happily stuck with SQL.
    - SQL 2008/R2/2012/2014/2016/2017
    - Oracle 8/9/10/11
    - MySQL 4/5 and MariaDB

  • Meow Now (9/23/2014)

    It gets quite frustrating when everything runs fine for months and then all of a sudden some code takes a nosedive in performance, only in production at random times.

    Out of date statistics.

    The 'falling off the bridge' behaviour is a strong indicator that it's statistics-related. It could also be that a bad execution plan got into cache, if you have queries prone to that kind of behaviour, but to be honest I'd look at statistics first.
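    Whether statistics are the culprit can be checked directly; a sketch using sys.dm_db_stats_properties (available from SQL Server 2008 R2 SP2 / 2012 SP1 onward; dbo.Orders is an illustrative table name):

    ```sql
    -- How old are the statistics, and how many rows have changed since
    -- the last update?
    SELECT s.name AS stat_name,
           sp.last_updated,
           sp.rows,
           sp.modification_counter
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE s.object_id = OBJECT_ID(N'dbo.Orders');

    -- If they're stale, refresh them:
    UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
    ```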

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

  • Very nice article, Gail.

    Nice way to put it: have a baseline of your environment, use it in troubleshooting scenarios, and take a "top X troubles" approach for the win. :hehe:


    "Tuning queries randomly without knowing whether or not they are a performance problem is an excellent waste of time."

    Excellent irony. Luv that honesty.


  • Lovely article, Gail.

    I think "premature optimization is the root of all evil" is the oldest and best-known statement that is commonly misinterpreted as a reason for developers not to use good practices. Of course, knowing that it is misused like that is also commonly used as an excuse to "optimize" things that need no optimisation, and even things that are already optimal. So it's great to see an article that points out both fallacies: that trying to design decent schemas and write decent code from the start is premature optimisation, and that all code has to be optimised.

