How does the query optimizer gets the statistics information

  • Good Morning Experts,

    How does the query optimizer(in the relational engine) gets the statistics information that it needs to create execution plan?

  • coolchaitu - Saturday, August 11, 2018 6:19 AM

    Good Morning Experts,

    How does the query optimizer(in the relational engine) gets the statistics information that it needs to create execution plan?

    Grab the books of Benjamin Navarez, Grant Fritchey and then some...
    😎

    The SQL Server's query optimizer is basically a cost based optimizer which uses statistics to find the path of least resistance. That is of course oversimplifying things and the full answer comes in book volumes, not forum posts.😉

  • coolchaitu - Saturday, August 11, 2018 6:19 AM

    Good Morning Experts,

    How does the query optimizer(in the relational engine) gets the statistics information that it needs to create execution plan?

    The answer is... sometimes "very poorly" unless you maintain statistics.  Also remember that rebuilding indexes is a very expensive method for updating statistics.  You can (and I have) gone quite literally for years without doing any index maintenance (except for the very rare need).  I've not rebuilt the bulk of my instances since 18 Jan 2016 and performance actually got better during the first 3 months and stayed there (CPU dropped from an average of 22% to just 8% across 16 CPUs).  Since that time, we've quadrupled the size of most of our databases (from new data... not wasted space) and performance is still great.  The big reason why is that I keep stats up to date. 

    The great experiment continues even today.  I have found that not-rebuilding indexes isn't the panacea that some would have you believe but I have found out that it's a heck of a lot better than doing index maintenance if you haven't correctly identified what the actual Fill Factor should be and then identify the KIND of index it actually is (evenly distributed, silo distributed, append only (ever increasing), DELETEable, and expansive UPDATE) and then tune both the Fill Factor AND whether or not you should actually do a REORG or REBUILD.  It turns out that with more indexes than you would expect, REORG perpetuates page splits, which perpetuates both logical and physical fragmentation which, like a bad drug habit, perpetuates the need for index maintenance.  In other words, the "Best Practice" methods of REORG at 10% and REBUILD at 30% are rarely a good recommendation if you actually want Fill Factor to work correctly and avoid page splits. 

    As an example, I did a test which simulated 1,000 individual row inserts per hour for ten hours per simulated day for a simulated year.... using GUIDs for the Clustered PK.  Reorganize didn't prevent page splits even when done on a daily basis (in fact, it forced them to happen) and REBUILDs allowed too many page splits even when the correct Fill Factor was used because the 10 and 30% recommendation is actually very wrong for such an index (and, as it turns out, most indexes).

    Heh... and yes, I have the fixes for all that.  Getting very close to completing all the testing I need to but you'll have to wait until I'm done testing.

    In the meantime, rebuild those stats!!!  If you're having major blocking issues the day after you run index maintenance, then stop doing index maintenance until you know the better way (that's what I went through in Jan 2016).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply