When do we manually update Statistics

  • We know we can create indexes and the value of it.

    By default AUTOSTATS seems to be set to ON

    meaning every 20 minutes or so, all statistcs are being updated.

    So the optimizer should be doing its job properly.

    everything gung ho!!!

    --===============================================================

    Now, we can also manually create statistics.

    The question is why do we need them?

    If I have to manually create statistics on a new table.field(s)

    I would create an index itself.

    If on a older table.field(s), I would drop and recreate or rebuild

    the index with autostats set to ON

    I don't see where REALLY does manually creating statistics help.

    Microsoft is giving the following explanation, which atleast I am not able to understand.

    A key benefit of creating statistics manually is that you can create statistics that contain densities of values for a combination of columns, the database engine could make a better estimate for query execution.

    A simple lay-man's explanation could help many of us, if it is really a helpful tool.

  • >>meaning every 20 minutes or so, all statistcs are being updated.

    In certain situations, you do not want this happening during a busy time of the day.  It may drain performance.

  • hankz much Sam. I am a dumbo. Many times I myself have turned off AUTOSTATS because of performance reasons. But yesterday could not catch up on that knowledge bank.

     

    But still any idea what this explanation of Microsoft is? No length explanation needed, just a lay-man's understanding.

    A key benefit of creating statistics manually is that you can create statistics that contain densities of values for a combination of columns, the database engine could make a better estimate for query execution.

  • Alex is my long time DB mentor/guru. Thought his feedback on this may be useful for others.

    Greetings,

    I'll try to elaborate on this one...

    As defined in MSDN - When statistics are created, the Database Engine sorts the values of the columns on which the statistics are being built and creates a histogram based on up to 200 of these values, separated by intervals. The histogram specifies how many rows exactly match each interval value, how many rows fall within an interval, and a calculation of the density of values, or the incidence of duplicate values, within an interval.

    Without getting into too much detail of the aforementioned, as well as what actually triggers an auto-update of statistics (which is not necessary time-based), the key point is that statistics do not have the overhead of indexes since only statistical sampling of the values in rows is kept. Which also translates to "... sometimes it is more efficient to use column statistics instead of an index to optimize query performance..."

    Another important point is that for auto-update statistics on non-indexed columns as well as statistcs for composite indexes ONLY uses the first column, which may not always provide the most optimal execution plan with respect to non-first columns on the list. This is where composite indexes really don't help at all, even if regenerated from scratch. Here's what MSDN shows as a n example:

    Creating statistics manually lets you create statistics that contain multiple column densities. These are average number of duplicates for the combination of columns. For example, a query contains the clause WHERE a = 7 and b = 9.

    Creating manual statistics on both columns together (a, b) could allow the Database Engine to make a better estimate for the query, because the statistics also contain the average number of distinct values for the combination of columns a and b.

    So the bottom line is that creating manual statistics would benefit queries that utilize multiple columns together. In other words, if all columns were indexed separately or as a composite, such structure may not provide the most efficient execution plan than manually created multi-column statistics. As with many other situations though, there's no one solution to all cases, and one would have to make controlled experiments to see what works best for each situation.

    References:

    http://www.sql-server-performance.com/statistics.asp

    http://msdn2.microsoft.com/en-us/library/ms190397.aspx

    Hopefully I was able to clarify a bit 🙂

    Thanks a lot,

    Alex

  • I don't know where you get the info about auto stats updating every 20 mins for all tables - absolutely not so - auto update stats will occur ONLY for a column where there are stats, this is at column level please note, not table level, at the predefined settings widely published by microsoft ( see elesewhere ) It's generally best practice to leave auto update stats on unless you find it causing problems - if you don't know how to find out if autostats are giving you problems then basically you can't post making such a claim!!

    So .. should you manually update stats - yes every day at least if you can, unless you're rebuilding all your indexes using a dbreindex( tablename ) every day which will also update all your stats, so no real point to duplicate work.

    The auto update stats is designed to give some measure of up to date stats but it doesn't better doing your own updates. I've always updated stats every night/day ( on oltp dbs ) where possible right from 6.0 days.

    As to manual stats - well I've never found any real reason to use them over indexes, and auto create stats will create loads of system stats on your tables anyway, if you're not using auto create stats then maybe you might want to create stats yourself.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • sorry that is a factual error:

    The frequency at which the statistical information is updated is determined by the volume of data in the column or index and the amount of changing data.

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply