Dynamically Query a 100 Million Row Table - Efficiently

  • TheSQLGuru (5/27/2016)


    ChrisM@Work (2/24/2015)


    ... Two omissions stood out however - the cost of recompiling, ...

    I will jump through hoops to trade CPU ticks to avoid disastrously bad plans (and the horrible IO and concurrency issues they bring among other things). 🙂 This is clearly such a beneficial case. In the past 2 decades of my SQL Server consulting business I have had hundreds of such opportunities but can't recall a single situation where I removed an OPTION (RECOMPILE) from a query at a client that was causing bad things to happen.

    Amen.

    The exception might be a query that runs 1000's+ of times a day, takes a couple secs to compile and milliseconds after.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (5/27/2016)


    TheSQLGuru (5/27/2016)


    ChrisM@Work (2/24/2015)


    ... Two omissions stood out however - the cost of recompiling, ...

    I will jump through hoops to trade CPU ticks to avoid disastrously bad plans (and the horrible IO and concurrency issues they bring among other things). 🙂 This is clearly such a beneficial case. In the past 2 decades of my SQL Server consulting business I have had hundreds of such opportunities but can't recall a single situation where I removed an OPTION (RECOMPILE) from a query at a client that was causing bad things to happen.

    Amen.

    The exception might be a query that runs 1000's+ of times a day, takes a couple secs to compile and milliseconds after.

    Even that scenario becomes a loser very quickly under even mild "out-of-whack" plan situations. If a query that takes 2 seconds to compile (a LONG time btw - up to 6 BILLION CPU executions single-threaded) takes just 1 second to run for the "fast" parameter combos but takes 50 seconds every 20 executions and 100 seconds every 50-100 executions, doing the OPTION (RECOMPILE) thing is a decent win on total time (and that doesn't mention potential concurrency issues or server-overload scenarios where the whole box grinds to a crawl). And believe me that is a VERY mild case of unbalanced executions!! I often come across 3-6 ORDERS OF MAGNITUDE performance differences between good and bad plans that are avoided by OPTION (RECOMPILE). One of those a day or even week and it is a win.
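
    A rough back-of-envelope using those example numbers (assuming roughly 94 fast runs, 5 runs at 50 seconds and 1 run at 100 seconds per 100 executions):

    Cached plan:        94 × 1 s + 5 × 50 s + 1 × 100 s  ≈  444 s per 100 executions
    OPTION (RECOMPILE): 100 × (2 s compile + 1 s run)    =  300 s per 100 executions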

    Oh, and this doesn't cover the situation where the BAD plan (scan-hash for 1K out of 500M total touched rows for example) gets executed over and over either! In this scenario people who don't know what they are doing often do silly things such as update statistics (crushing the box further) then declare that that "fixed" the problem. 🙂

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I haven't seen bad things with proc recompiles since the SQL 2000 days. Every generation of SQL Server advances a little bit more, rendering old, hard-won knowledge and techniques obsolete or of reduced relevance.

    SQL 2016 gives us partition truncation, thereby rendering partition switching old knowledge, provided you have the relevant edition.
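
    For reference, the new syntax looks roughly like this (table name and partition numbers are illustrative):

    -- SQL Server 2016: truncate selected partitions in place,
    -- instead of switching them out to a staging table first.
    TRUNCATE TABLE dbo.FactSales
    WITH (PARTITIONS (2, 4 TO 6));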

    We could almost do with a register of the old performance tricks that are no longer relevant, or are even detrimental, today.

  • TheSQLGuru (5/27/2016)


    Even that scenario becomes a loser very quickly under even mild "out-of-whack" plan situations. If a query that takes 2 seconds to compile (a LONG time btw - up to 6 BILLION CPU executions single-threaded) takes just 1 second to run for the "fast" parameter combos but takes 50 seconds every 20 executions and 100 seconds every 50-100 executions, doing the OPTION (RECOMPILE) thing is a decent win on total time (and that doesn't mention potential concurrency issues or server-overload scenarios where the whole box grinds to a crawl). And believe me that is a VERY mild case of unbalanced executions!! I often come across 3-6 ORDERS OF MAGNITUDE performance differences between good and bad plans that are avoided by OPTION (RECOMPILE). One of those a day or even week and it is a win.

    Oh, and this doesn't cover the situation where the BAD plan (scan-hash for 1K out of 500M total touched rows for example) gets executed over and over either! In this scenario people who don't know what they are doing often do silly things such as update statistics (crushing the box further) then declare that that "fixed" the problem. 🙂

    I would really love it if a newer version of SQL Server would:

    1. Detect costly differences between estimated and actual values during execution and trigger improved plan (or plan-branch) generation.

    2. Allow plans to branch into new plan versions when, during execution, new facts come to light that are not derivable from statistics.

    3. For compiled query plans that executed in full, do a post-optimization pass using free CPU cycles and more accurate facts.

    Combining features 1 and 2 would allow complex plans to evolve on demand while executing.

    Feature 3 would allow for very optimized plans, using a wider range of strategies that would never be considered based on normal statistics and limited compile time.

  • TheSQLGuru (5/27/2016)


    David.Poole (5/27/2016)


    Some time ago I saw an article where the author had noticed a problem with a query whose principal condition was WHERE OrderDate BETWEEN @StartDate AND @EndDate.

    The problem observed was that if the query first ran to produce a weekly report, it performed well, but the daily and monthly reports then ran like snails.

    If the monthly report ran first, then that one ran fine but the others ran slowly.

    In the end the author decided to introduce a dummy variable into dynamic SQL executed with sp_executesql. The name of the variable depended on the difference in days between @StartDate and @EndDate. In effect, the daily, weekly and monthly reports each got a separate execution plan even though the working mechanics were identical.

    SELECT {column list} FROM {tables} WHERE OrderDate BETWEEN @StartDate AND @EndDate AND @Daily = 1

    SELECT ... WHERE OrderDate BETWEEN @StartDate AND @EndDate AND @Weekly = 1

    ...etc

    The approach solved the problem.
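
    A rough sketch of that trick as described (object, column and parameter names here are hypothetical):

    DECLARE @StartDate date = '20160101', @EndDate date = '20160107';
    DECLARE @rangeDays int = DATEDIFF(DAY, @StartDate, @EndDate);
    DECLARE @sql nvarchar(max);

    -- The statement text changes with the size of the range, so the daily,
    -- weekly and monthly variants each get (and keep) their own cached plan.
    SET @sql = N'SELECT OrderID, OrderDate, TotalDue FROM dbo.Orders
                 WHERE OrderDate BETWEEN @StartDate AND @EndDate AND '
             + CASE WHEN @rangeDays <= 1 THEN N'@Daily = 1'
                    WHEN @rangeDays <= 7 THEN N'@Weekly = 1'
                    ELSE N'@Monthly = 1' END + N';';

    EXEC sys.sp_executesql @sql,
         N'@StartDate date, @EndDate date, @Daily bit, @Weekly bit, @Monthly bit',
         @StartDate = @StartDate, @EndDate = @EndDate,
         @Daily = 1, @Weekly = 1, @Monthly = 1;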

    I'm nervous about an all-encompassing stored procedure with loads of "maybe" parameters. I had a query that qualified as such, but when I used SQL Profiler I found that just 3 permutations accounted for the vast majority of cases.

    What I did was to write procs for those 3 cases and a 4th that simply decided which of the 3 to call.

    There are downsides to this approach as well, but it met the SLAs and was easy to read.
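
    That routing pattern might look something like this (all object names are made up):

    CREATE PROCEDURE dbo.Orders_Search
        @ClientID  int  = NULL,
        @OrderDate date = NULL
    AS
    BEGIN
        -- Route the three common permutations to dedicated procedures,
        -- each of which keeps its own stable, well-fitting plan.
        IF @ClientID IS NOT NULL AND @OrderDate IS NOT NULL
            EXEC dbo.Orders_ByClientAndDate @ClientID, @OrderDate;
        ELSE IF @ClientID IS NOT NULL
            EXEC dbo.Orders_ByClient @ClientID;
        ELSE
            EXEC dbo.Orders_ByDate @OrderDate;
    END;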

    Report queries like you mention are one of the most common wins for OPTION (RECOMPILE). You clearly don't want the same plan used for a 1-day date range as you would for a 10-year date range (or vice versa). OPTION (RECOMPILE) in this case also protects you against actual data-value skew at the same time, in cases where you may also filter on a ClientID, for example, and one of your clients is General Electric while the rest are mom-and-pop corner stores.
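
    A minimal sketch of that, assuming made-up table, column and parameter names:

    -- One query text, but a fresh plan for the actual parameter values on every
    -- run, so a 1-day range and a 10-year range never share a plan.
    SELECT o.OrderID, o.OrderDate, o.TotalDue
    FROM   dbo.Orders AS o
    WHERE  o.OrderDate BETWEEN @StartDate AND @EndDate
      AND  o.ClientID = @ClientID
    OPTION (RECOMPILE);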

    Why bother with fake-variable workarounds when you have a perfect solution at hand? 😎

    Hi,

    Why not create a separate aggregation for each granularity you need? If there are no "distinct" measures, you can build on an already-prepared aggregation at a lower level of granularity, making preparation of the monthly, quarterly and yearly reports a blink of an eye.
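
    For example, a hypothetical daily summary that the coarser reports then roll up:

    -- Build the daily aggregate once...
    SELECT CAST(OrderDate AS date) AS OrderDay,
           SUM(TotalDue)           AS DayTotal,
           COUNT(*)                AS DayOrders
    INTO   dbo.DailySales
    FROM   dbo.Orders
    GROUP BY CAST(OrderDate AS date);

    -- ...then the monthly (or quarterly, or yearly) report only touches the small table.
    SELECT YEAR(OrderDay) AS OrderYear, MONTH(OrderDay) AS OrderMonth,
           SUM(DayTotal)  AS MonthTotal, SUM(DayOrders) AS MonthOrders
    FROM   dbo.DailySales
    GROUP BY YEAR(OrderDay), MONTH(OrderDay);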

  • Nice article!

    I usually try to avoid cases where the query predicates look like this:

    WHERE (ColumnX = @variableX OR @variableX IS NULL)

    AND (ColumnY = @variableY OR @variableY IS NULL)

    AND (ColumnZ = @variableZ OR @variableZ IS NULL)

    even when there is only one OR, because OR is usually a performance killer. The optimizer usually can't predict the number of rows, so it can't build an efficient plan and plays it "safe" with a scan. When there are just one or two ORs, I simply rewrite the query with UNION [ALL].
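
    A sketch of that rewrite for a single optional parameter (table and column names made up):

    -- Instead of: WHERE (ColumnX = @variableX OR @variableX IS NULL)
    SELECT ColumnX, ColumnY
    FROM   dbo.SomeTable
    WHERE  ColumnX = @variableX
      AND  @variableX IS NOT NULL

    UNION ALL

    SELECT ColumnX, ColumnY
    FROM   dbo.SomeTable
    WHERE  @variableX IS NULL;

    Each branch gets its own access path, so the equality branch is free to seek on ColumnX.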

    With multi-purpose procedures it is a whole different story, and you have chosen the hardest case, where the data distribution causes the query plan to change for each different set of input parameters.

    I prefer to use a dynamic query in such cases, because I don't like the look of a query with all those ORs and NULLs, and without the RECOMPILE option you won't get an efficient plan anyway.

    A dynamically built query with the exact input, and without all those NULLs, looks cleaner to me, and I don't have a problem with handling the strings.

    In fact, with dynamic queries you can even get better performance.

    If the distribution of the data is even for each set of parameters, use sp_executesql: it will cache a plan for each distinct statement and reuse it, so we get efficient query plans.

    If the distribution makes the query plan change, use EXEC and a new query plan will be created each time, just like your TVF with the RECOMPILE option.
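
    A minimal sketch of the sp_executesql flavour, with hypothetical names:

    DECLARE @variableX int = 42, @variableY int = NULL;
    DECLARE @sql nvarchar(max) = N'SELECT ColumnX, ColumnY FROM dbo.SomeTable WHERE 1 = 1';

    -- Append only the predicates whose parameters were actually supplied.
    IF @variableX IS NOT NULL SET @sql += N' AND ColumnX = @variableX';
    IF @variableY IS NOT NULL SET @sql += N' AND ColumnY = @variableY';

    -- One cached plan per distinct statement text, reused on later calls.
    EXEC sys.sp_executesql @sql,
         N'@variableX int, @variableY int',
         @variableX = @variableX, @variableY = @variableY;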

    About the dependency chain: I always use SQL Search by Redgate. No problem there finding who uses a specific object.

  • Excellent, well written, and very informative article. Keep up the great work.

