Why we need to use UPDATE STATISTICS ?

  • Hi,

    I have a some questions here -

    1. Why we need to use UPDATE STATISTICS ?

    2. When we need to do this?

    3. How do we know Table need to run UPDATE STATISTICS ?

    I would be appreciated if you can explain me. I am starting to change my SQL Developer Role to DBA so that I really want to know more experience with SQL.

    Also what is the best practise for DBA?

    Thanks

  • I believe in a high traffic environment, update staistics can slow down processing, so some folks like to save it as a nightly maintenance plan, instead of letting it be done automatically.

    Second, I seem to remember the auto update of the statistics is based on a percentage of changes, i believe, say it's 1 percent of the sample size,

    not really noticable in a small table with a couple of hundred records, but if you had a billion rows in your table, it would take a long time to insert enough rows to trigger the auto-statistics....900K rows of inserts would be enough to screw up execution plans and make them run slower, but not enough to make the statistics auto-update, so you might need to do it manually...or as a nightly maintenance plan or something

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Lowell (10/27/2008)


    Second, I seem to remember the auto update of the statistics is based on a percentage of changes, i believe, say it's 1 percent of the sample size,

    20% of the table + 500 rows. (for a table with more than 500 rows)

    Especially on large tables, that's not often enough, so running the stats update manually is needed. Also sometimes the default sample size that the auto stats uses isn't good enough and it's necessary to update stats with full scan.

    There's no hard and fast rule for when you should update. Usually the auto-update is good enough, plus rebuilding indexes updates their stats. Generally what I've done in the past is add tables to a manual update job on a case-by-case basis. Anytime I notice a query running poorly because of bad stats, that table get added to the list.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (10/27/2008)


    Lowell (10/27/2008)


    Second, I seem to remember the auto update of the statistics is based on a percentage of changes, i believe, say it's 1 percent of the sample size,

    Anytime I notice a query running poorly because of bad stats, that table get added to the list.

    How do I check of bad stats? Do you have any command line to check the table?

    What is the useful command line for DBCC UPDATESTATS ?

    Thanks.

  • Leo (10/27/2008)


    How do I check of bad stats? Do you have any command line to check the table?

    It's not a command. It's a case of noticing queries that run well one day, poorly the next with no changed. If the exec plan of the query shows a large discrepancy between actual and estimated rows then it's very likely the result of bad stats. If a manual update statistics helps, then it's definitely the result of bad statistics.

    What is the useful command line for DBCC UPDATESTATS ?

    It's not a DBCC. Look up UPDATE STATISTICS in Books Online

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • OK Gail put your thinking hat on, because you've got to double check my work.

    I'm combining a couple of scripts i had laying around to try and find statistics that might need updating.

    I figured i'd assume a lowly 4% or 1400 rows as my personal goal for identifying "what" needs updated statistics.

    I'm getting more rows modified than i have total rows in the table in my query; was under the impression that sysindexes.rowmodctr was a count of rows modified, and when statistics got automatically updated, it was reset to zero.

    as you can see below, I show 5896 rows modified, on a table with only 1537 rows.

    here's a couple of typical rows of data:

    RB_TABLE TABLE_NAME_IDX 5896 1537 100.00

    SSFUNCAT PK__SSFUNCAT__19C17488 1400 2468 56.73

    here's my prototype sql:

    [font="Courier New"]SELECT

      sysobjects.name                  AS [Table Name],

      sysindexes.name                  AS [Index Name],

      sysindexes.rowmodctr             AS [Rows Modified],

      IndexTotals.NUMROWS              AS [Total Rows],

      CASE

        WHEN sysindexes.rowmodctr > IndexTotals.NUMROWS

        THEN 100.00

        WHEN IndexTotals.NUMROWS = 0

        THEN 100.00

        ELSE CONVERT(DECIMAL(10,2),(CONVERT(DECIMAL(10,2),sysindexes.rowmodctr ) / CONVERT(DECIMAL(10,2),IndexTotals.NUMROWS ) * 100))

      END AS [Percent Modified]

    FROM sysobjects

      INNER JOIN sysindexes  

       ON sysobjects.id = sysindexes.id

      INNER JOIN (SELECT

                    sysobjects.id,

                    sysobjects.name,  

                    MAX(sysindexes.rows) AS NUMROWS

                  FROM sysobjects  

                    INNER JOIN sysindexes

                      ON sysobjects.id = sysindexes.id

                  WHERE sysobjects.xtype = 'U'

                  GROUP BY sysobjects.id,sysobjects.name

                  )IndexTotals

        ON sysobjects.id = IndexTotals.id

    WHERE sysindexes.rowmodctr > 0  

      AND sysobjects.xtype = 'U'

      AND LEFT(sysindexes.name,7) <> '_WA_Sys'

    ORDER BY sysindexes.rowmodctr DESC

      

    [/font]

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Is RB_TABLE a heap by any chance? If so, it'll have no stats that could be updated.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • in my case RB_TABLE has two indexes; that table has a clustered index one column, and a non-clustered on the exact same column for some reason; but looking at all my tabbles, that is not the only table showing that behavior.

    I was hoping if you ran the same query, you might see simililar results and tell me what assumption I had was wrong. I'm going to fiddle with this today;I'll update stats, confirm whether or not sysindexes.rowmodctr updates the way I thought, and come up a decent detection proc today I think.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Just a little add on..

    If you want to get an overview of your stats ....

    SELECT object_name(id) as ObjectName

    , indid as index_id

    , name AS index_name

    , STATS_DATE(id, indid) AS statistics_update_date

    FROM sysindexes

    -- WHERE OBJECT_ID = OBJECT_ID('myschema.mytable')

    order by ObjectName, Index_id;

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Hi ALZDBA,

    Thanks for the Code, I just run it on one of my production database, I got the following -

    ObjectName, IndexID, indexName, statistics_update_Date

    Some of my table are 'NULL' in statistics_Update_Date column, is that mean UPDATE STATICS is never been run?

    Some are 2008-01-01 02:00:00 in it, is that mean last UPDATE STATICS is in January and never been done since?

    Thanks.

  • That would be a valid conclusion.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Leo (10/28/2008)


    Some of my table are 'NULL' in statistics_Update_Date column, is that mean UPDATE STATICS is never been run?

    More likely it means that the object in question is a heap (table without a clustered index). Heaps don't have stats. Even if the stats have never been updates since creation, stats_date will have a value - the date of creation of that stats set.

    Edit: It can also mean that the table is empty or that statistics don't exist for that index. I've seen that occationally, not sure how it happens. Possibly if the index is never used.

    Some are 2008-01-01 02:00:00 in it, is that mean last UPDATE STATICS is in January and never been done since?

    It means that the stats were last updated (by an index rebuild, an update statistics or an automatic stats update) on the 1st Jan. If the table hasn't been changed since then, that's nothing to be concerned about.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Lowell (10/28/2008)


    I was hoping if you ran the same query, you might see simililar results and tell me what assumption I had was wrong. I'm going to fiddle with this today;I'll update stats, confirm whether or not sysindexes.rowmodctr updates the way I thought, and come up a decent detection proc today I think.

    I'll try it later. Check that the index in question doesn't have auto-update stats disabled on it.

    I think you can get that with the INDEXPROPERTY function.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (10/28/2008)


    (10/28/2008...More likely it means that the object in question is a heap (table without a clustered index). Heaps don't have stats. Even if the stats have never been updates since creation, stats_date will have a value - the date of creation of that stats set.

    Edit: It can also mean that the table is empty or that statistics don't exist for that index. I've seen that occationally, not sure how it happens. Possibly if the index is never used.

    Actually yesterday I ran into a pk having NULL for stats date, and the table contained +/- 9000 rows.

    A secondary index did have a stats date...

    sp_updatestatistics fixed it.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • ALZDBA (10/29/2008)


    Actually yesterday I ran into a pk having NULL for stats date, and the table contained +/- 9000 rows.

    A secondary index did have a stats date...

    Coincidentally, I ran into the explanation in a book I was reading this morning. If the index is created before data is added to the table, the stats aren't created with the index. They may be created later, or they may not.

    While it's too late to check, I'm betting that if you'd done a DBCC Show_Statistics, there would have been no histogram.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply