Finding & Fixing Statistics Without Histograms

Statistics
38 Comments

Men Without Hats – now those guys were cool:

Statistics Without Histograms – not so much.

If you have a database that’s been passed along from one SQL Server to another, gradually upgraded over the years, or if you’ve had a table that’s been loaded but never queried, you can end up with a curious situation. For example, take the StackOverflow2010 database: I build it on SQL Server 2008, detach it, and then make it available for you to download and play with. The nice thing about that is you can attach it to any currently supported version – SQL Server 2017 attaches the database, runs through an upgrade process, and it just works.

Mostly.

But when you go to look at statistics:

We can leave your stats behind

That means when we ask SQL Server to guess how many rows are going to come back for queries, it’s going to act real rude and totally removed.

Finding This Problem in SQL Server 2016 & Newer

Starting with SQL Server 2016 Service Pack 1 Cumulative Update 2, you can run this in an affected database to find the tables with missing stats, and generate an UPDATE STATISTICS command to fix them.

Warning: before you run this, test it in development. In the comments on this post, we’re getting reports that querying this new DMV is causing some servers to restart! Michael J. Swart reports that it was fixed in 2017 Cumulative Update 8.

That gives you a list of fixes:

Everything’ll work out right

Run those in a way that makes you feel – what’s the word I’m looking for – safe. Depending on your server’s horsepower and the size of the objects, you may want to do it after hours.

Updating statistics can be a read-intensive operation since it’s going to scan the table, but at least it’s not write-intensive (since statistics are just single 8KB pages.) However, be aware that this can also cause recompilations of query plans that are currently cached.

Finding This Problem in Earlier Versions

That’s left as an exercise for the reader. Parsing DBCC SHOW_STATISTICS would be a bit of a pain in the rear, and I’m already dancing to a list of YouTube’s related videos. Good luck!

Previous Post
A Simple Stored Procedure Pattern To Avoid
Next Post
6 DBA Lessons I Wish Someone Would Have Taught Me Earlier

38 Comments. Leave new

  • The query is returning some false positives for me – when a column only has NULL values for every row (I know, I know…..) DBCC SHOW_STATISTICS returns a one row histogram, with RANGE_HI_KEY = NULL and EQ_ROWS= total number of rows in the table.
    But sys.dm_db_stats_histogram returns nothing.

    Can’t see anything in BOL that would call this out?

    Reply
  • you should exclude the case of having columnstore indexes : its stat is empty (built on the fly) but this leads your script to give the fix when there is any issue.
    maybe it could be better to have a statement for each stats with an empty histogram instead of giving simply the name of the overstanding table. i could open a pull request on GitHub 😛

    Reply
  • A bit fiddly but not too bad, would need some extra SQL to pull back table and statistics names and a cursor to test each one in turn.

    create table #st
    (
    RANGE_HI_KEY bigint,
    RANGE_ROWS bigint,
    EQ_ROWS bigint,
    DISTINCT_RANGE_ROWS bigint,
    AVG_RANGE_ROWS bigint
    )

    declare @statement varchar(1000)
    set @statement = ‘dbcc show_statistics (”[dbo].[yourtable]”, [PK_yourtable]) with histogram’

    insert #st (RANGE_HI_KEY, RANGE_ROWS, EQ_ROWS, DISTINCT_RANGE_ROWS, AVG_RANGE_ROWS)
    exec(@statement)

    select count(*) from #st

    Reply
  • I’m curious as to why you have the generated command update all stats on the table instead of just the stat without the histogram..

    Reply
  • Useful query. Thank you.

    Lots of duplicate UPDATE STATISTICS statements are generated. SELECT DISTINCT clears that up.

    Reply
  • Justin Hoffmann
    September 20, 2018 1:50 pm

    You may need to adjust the spacing a bit, but I copied the above and applied what it looks like Ola Hallengren does from one of my CommandLog tables:

    SELECT SCHEMA_NAME(o.schema_id) AS schema_name,
    o.name AS table_name,
    ‘UPDATE STATISTICS ‘ + QUOTENAME(SCHEMA_NAME(o.schema_id)) + ‘.’ + QUOTENAME(o.name) + ‘ ‘ +QUOTENAME(s.name) + ‘ WITH FULLSCAN;’ AS the_fix
    FROM sys.all_objects o
    INNER JOIN sys.stats s ON o.object_id = s.object_id AND s.has_filter = 0
    OUTER APPLY sys.dm_db_stats_histogram(o.object_id, s.stats_id) h
    WHERE o.is_ms_shipped = 0 AND o.type_desc = ‘USER_TABLE’
    AND h.object_id IS NULL
    AND 0 < (SELECT SUM(row_count) FROM sys.dm_db_partition_stats ps WHERE ps.object_id = o.object_id)
    ORDER BY 1, 2;

    Reply
  • I ran it on prod and found 70 missing histograms. Thank you for a script. Going to build missing stats this week and hope CPU will go down :))

    Reply
  • You mention SP1 CU2. What fix was in the CU that makes this happen?

    Reply
  • Not sure about Columnstore but for normal tables if I run the query with DISTINCT and then run the update stats the problem should disappear. We have Auto Create and Auto Update Stats on and run OLA on a regular basis.

    Reply
  • Anyone else getting a system assertion check failure when running this? It’s worked most places, but on two databases on 13.0.5201.2 instances it bombs with the following.

    Location: e:\b\s3\sources\sql\ntdbms\include\typesystem\_retypesbase.inl:544
    Expression: cbLen <= x_cbStringMost
    SPID: 59
    Process ID: 1456
    Msg 3624, Level 20, State 1, Line 1
    A system assertion check has failed. Check the SQL Server error log for details. Typically, an assertion failure is caused by a software bug or data corruption. To check for database corruption, consider running DBCC CHECKDB. If you agreed to send dumps to Microsoft during setup, a mini dump will be sent to Microsoft. An update might be available from Microsoft in the latest Service Pack or in a Hotfix from Technical Support.
    Msg 596, Level 21, State 1, Line 0
    Cannot continue the execution because the session is in the kill state.
    Msg 0, Level 20, State 0, Line 0
    A severe error occurred on the current command. The results, if any, should be discarded.

    DBCC CHECKDB comes back clean on the same db after.

    Reply
  • Changing the last WHERE condition to
    AND EXISTS (SELECT * FROM sys.dm_db_partition_stats ps WHERE ps.object_id = o.object_id AND ps.row_count > 0)
    will reduce the execution time on my database (with many partitionized tables) from 1 minute to 1 second -> never use SUM() or COUNT() when an EXISTS would be sufficient

    Reply
  • Kev Riley:

    To eliminate columns that contain all NULLs (and a 1 row histogram) from the result set, cross apply dm_db_stats_properties and only select results where dm_db_stats_properties.rows is NULL. Like this (which also incorporates Thomas Franz’ change and adds a command to only update the single stat instead of all stats on the table):

    SELECT SCHEMA_NAME(o.schema_id) AS schema_name,
    o.name AS table_name,
    s.[name] AS stats_name,
    ‘USE ‘ + QUOTENAME(DB_NAME()) + ‘;UPDATE STATISTICS ‘ + QUOTENAME(SCHEMA_NAME(o.schema_id)) + ‘.’
    + QUOTENAME(o.name) + ‘ WITH FULLSCAN;’ AS the_fix_all_table_stats,
    ‘USE ‘ + QUOTENAME(DB_NAME()) + ‘;UPDATE STATISTICS ‘ + QUOTENAME(SCHEMA_NAME(o.schema_id)) + ‘.’
    + QUOTENAME(o.name) + ‘ ‘ + QUOTENAME(s.name) + ‘ WITH FULLSCAN;’ AS the_fix_just_bad_stat
    FROM sys.all_objects o
    INNER JOIN sys.stats s
    ON o.object_id = s.object_id
    AND s.has_filter = 0
    OUTER APPLY sys.dm_db_stats_histogram(o.object_id, s.stats_id) h
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE o.is_ms_shipped = 0
    AND o.type_desc = ‘USER_TABLE’
    AND h.object_id IS NULL
    AND EXISTS
    (
    SELECT *
    FROM sys.dm_db_partition_stats ps
    WHERE ps.object_id = o.object_id
    AND ps.row_count > 0
    )
    AND sp.rows IS NULL
    ORDER BY 1,
    2;

    Reply
    • Thanks Shaun – works a treat!

      Reply
    • @Shaun:
      nearly perfect – since UPDATE STATISTICS is not allowed on columnstore indexes, we have to add a

      AND NOT EXISTS (SELECT * FROM sys.indexes AS i WHERE i.object_id = o.object_id AND i.name = s.name AND i.type_desc like ‘%COLUMNSTORE%’)

      to the WHERE condition

      Reply
  • Well Brent, you can start with the following for Pre-SQL Server 2016 (script is a bit long though), the below is good for a single database. I have a modified version that swings through all databases …
    set nocount on;
    declare
    @SQL varchar(2048),
    @MinID int = 1,
    @MaxID int
    ;
    if object_id(‘tempdb..#tmpStatsInfo’) is not null drop table #tmpStatsInfo;
    if object_id(‘tempdb..#tmpStatHeader’) is not null drop table #tmpStatHeader;

    declare @tmpObjTbl table (
    ID int identity(1,1),
    ObjSchema varchar(128) not null,
    TblName varchar(1024) not null,

    StatNm varchar(512) null,
    stats_id int not null,
    auto_created bit null,
    user_created bit null,
    no_recompute bit null,
    has_filter bit null,
    filter_definition varchar(max) null,
    is_temporary bit null,
    is_incremental bit null,

    last_updated datetime2 null,
    rows bigint null,
    rows_sampled bigint null,
    steps int null,
    unfiltered_rows bigint null,
    modification_counter bigint null
    );

    declare @tmpStatHeader table (
    Name varchar(1024) not null,
    Updated datetime null,
    Rows bigint null,
    RowsSampled bigint null,
    Steps int null,
    Density decimal (10, 7) null,
    AvgKeyLen int null,
    StringIndex varchar(3) null,
    FilterExpression varchar(4096) null,
    UnfilteredRows bigint null
    );

    create table #tmpStatHeader (
    ObjTblID int not null,
    Updated datetime null,
    Rows bigint null,
    RowsSampled bigint null,
    Steps int null,
    UnfilteredRows bigint null
    );

    create table #tmpStatsInfo (
    DBName varchar(512) not null,
    ObjSchema varchar(128) not null,
    TblName varchar(1024) not null,

    name varchar(512) null,
    stats_id int not null,
    auto_created bit null,
    user_created bit null,
    no_recompute bit null,
    has_filter bit null,
    filter_definition varchar(max) null,
    is_temporary bit null,
    is_incremental bit null,

    last_updated datetime2 null,
    rows bigint null,
    rows_sampled bigint null,
    steps int null,
    unfiltered_rows bigint null,
    modification_counter bigint null,

    ChangeCnt decimal(10,2) null,
    ExceededChangeCnt char(1) null
    );

    insert @tmpObjTbl (
    ObjSchema,
    TblName,
    StatNm,
    stats_id,
    auto_created,
    user_created,
    no_recompute,
    has_filter,
    filter_definition,
    is_temporary,
    is_incremental,
    rows,
    modification_counter
    )
    select distinct object_schema_name(o.id) as ObjSchema, o.name as TblName, s.name as StatNm,
    s.stats_id, s.auto_created, s.user_created, s.no_recompute, s.has_filter, s.filter_definition,
    s.is_temporary, s.is_incremental,
    si.rows, si.rowmodctr as modification_counter
    from sys.sysindexes si
    join sys.sysobjects o
    on o.id = si.id
    join sys.stats s
    on s.object_id = si.id
    order by o.name, s.name
    ;

    select @MaxID = max(id) from @tmpObjTbl;

    while (@MinID <= @MaxID)
    begin
    delete from @tmpStatHeader;

    select @SQL = 'dbcc show_statistics (''' + ObjSchema + '.' + TblName + ''', [' + StatNm + ']) with stat_header;' from @tmpObjTbl where ID = @MinID;

    if @SQL is not null
    begin
    insert @tmpStatHeader
    exec (@SQL);

    insert #tmpStatHeader (ObjTblID, Updated, Rows, RowsSampled, Steps, UnfilteredRows)
    select @MinID, Updated, Rows, RowsSampled, Steps, UnfilteredRows
    from @tmpStatHeader
    end

    set @MinID += 1;
    end

    update t
    set
    last_updated = h.Updated,
    rows = (case when t.rows = 0 and h.Rows 0 then h.Rows else t.rows end),
    rows_sampled = h.RowsSampled,
    steps = h.Steps,
    unfiltered_rows = h.UnfilteredRows
    from @tmpObjTbl t
    join #tmpStatHeader h
    on h.ObjTblID = t.ID;

    insert #tmpStatsInfo (
    DBName,
    ObjSchema,
    TblName,
    name,
    stats_id,
    auto_created,
    user_created,
    no_recompute,
    has_filter,
    filter_definition,
    is_temporary,
    is_incremental,
    last_updated,
    rows,
    rows_sampled,
    steps,
    unfiltered_rows,
    modification_counter,
    ChangeCnt,
    ExceededChangeCnt
    )
    select db_name(), ObjSchema, TblName, StatNm, stats_id, auto_created, user_created, no_recompute, has_filter,
    filter_definition, is_temporary, is_incremental, last_updated, rows, rows_sampled, steps, unfiltered_rows,
    modification_counter,
    case
    when (unfiltered_rows > 1000000 and unfiltered_rows 10000000 and unfiltered_rows 100000000 then [rows] * .05
    else [rows] * .2
    end as ChangeCnt,
    case
    when unfiltered_rows ([rows] * .2) then ‘Y’
    when (unfiltered_rows > 1000000 and unfiltered_rows ([rows] * .15) then ‘Y’
    when (unfiltered_rows > 10000000 and unfiltered_rows ([rows] * .1) then ‘Y’
    when unfiltered_rows > 100000000 and modification_counter > ([rows] * .05) then ‘Y’
    else ‘N’
    end as ExceededChangeCnt
    from @tmpObjTbl;

    select
    ‘use [‘ + DBName + ‘]; update statistics [‘ + ObjSchema + ‘].[‘ + TblName + ‘] ([‘ + [name] + ‘]) with ‘ +
    case
    when auto_created = 1 then ‘resample’
    else ‘fullscan’
    end +
    case when auto_created = 0 then ‘, index’ else ” end +
    case when no_recompute = 1 then ‘, norecompute’ else ” end +
    case when is_incremental = 1 then ‘, incremental = on’ else ” end +
    ‘;’ as Script, *
    from #tmpStatsInfo
    where ExceededChangeCnt = ‘Y’
    order by DBName, ObjSchema, TblName, [name];

    Reply
  • Just to point out that as for official documentation “Partitioned tables and indexes are not fully supported in this view(sys.sysindex); use the sys.indexes catalog view instead”.

    Reply
    • The above is only for a single database. When I do multi-database, I use the following:
      from sys.stats s
      cross apply sys.dm_db_stats_Properties(s.object_id, s.stats_id) p
      Additionally, sys.indexes does not give the row count or row modified that you can find in sys.sysindexes. If you can point me to someplace that contains that information, I would gladly switch it out. Already looking at switching to the sys.dm_db_stats_Properties as it contains a lot of the information I need.

      Reply
      • You can look at sys.dm_db_incremental_stats_properties 🙂

        Reply
        • Problem with the sys.dm_db_incremental_stats_properties is that it came about with SQL Server 2014 (which, thankfully, I am on). However, with the sys.dm_db_stats_Properties, you can use it as early as SQL Server 2008. How can we go back further, to say SQL Server 2005/2000? I think that is where Brent is trying to get back to.

          Reply
  • I have been using the following to find these since 2008. KISS
    SELECT *
    FROM sys.stats
    WHERE STATS_DATE(object_id, stats_id) IS NULL

    Reply
  • This is not a situation unique to old databases or tables that have never been queried. I just discovered you can lose your stats data if you run UPDATE STATISTICS with a sample size of zero. It says so right in BOL: “We recommend against specifying 0 PERCENT or 0 ROWS. When 0 PERCENT or ROWS is specified, the statistics object is updated but does not contain statistics data.”

    It amazing what you can learn when you read the manual..

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.