Too many statistics

  • Hi,

After reading Gail Shaw's articles about finding the causes of poor performance, I thought I'd give it a go myself.

    http://www.simple-talk.com/sql/performance/finding-the-causes-of-poor-performance-in-sql-server,-part-1/

    While playing around with different NC indexes to no avail, I wanted to change the PK to include more columns. Unfortunately I can't. The second column I want to include is nullable. I cannot alter the column because a statistic is dependent on it.
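Something like this should show which statistics reference the column and are blocking the ALTER (just a sketch; the table and column names are placeholders):

```sql
-- Sketch: find statistics that include a given column.
SELECT s.name AS stat_name
FROM sys.stats s
JOIN sys.stats_columns sc
    ON sc.[object_id] = s.[object_id]
    AND sc.stats_id = s.stats_id
JOIN sys.columns c
    ON c.[object_id] = sc.[object_id]
    AND c.column_id = sc.column_id
WHERE s.[object_id] = OBJECT_ID('dbo.MyTable')
    AND c.name = 'MyColumn'

-- Then, once identified:
-- DROP STATISTICS dbo.MyTable.<stat name>
```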

    In Object Viewer I noticed that there are a lot of statistics on this table.

SELECT COUNT(*)
FROM sys.stats
WHERE OBJECT_NAME([object_id]) = '<my table>'

Returns 246. The names start with _dta_ or _WA_Sys.

I'm pretty sure the _dta_ stats are created by the Database Tuning Advisor, but I don't know what the other ones are.

    So what I was wondering....

1) Is this a normal amount?

2) Can I safely drop all these?

3) If I do, do I have to create them all again?

4) How can I best go about altering my column?

5) Why did DTA create so many?

6) Where'd the _WA_Sys stats come from?

    Working with SQL 2005 on this dev box.

BTW, I don't know who ran DTA and committed the changes.



For better, quicker answers on T-SQL questions, read Jeff Moden's suggestions.

    "Million-to-one chances crop up nine times out of ten." ― Terry Pratchett, Mort

  • The _WA_Sys stats are automatically created by SQL Server.

You can drop them if you want; SQL will recreate them if it needs to. The _dta stats were created when someone ran the Database Tuning Advisor and accepted its suggestions.
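If you want to tell them apart, sys.stats flags the auto-created ones (a sketch; '<my table>' is a placeholder):

```sql
-- _WA_Sys stats show auto_created = 1; stats created via CREATE STATISTICS
-- (such as the _dta ones) show user_created = 1.
SELECT name, auto_created, user_created
FROM sys.stats
WHERE [object_id] = OBJECT_ID('<my table>')
ORDER BY auto_created, name
```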

Btw, I would not recommend adding more columns to your PK. The primary key is the unique key for the table; it should be just the column or set of columns that uniquely identifies a row, and nothing more.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for replying.

So is having this many statistics on a table (36 columns) normal, or at least not unusual?

    No bloat or overhead involved?

    As for the PK, one of the indexes I was trying was a unique index with the PK column (CompNr Char(14)!!) plus another column (CustNr Int) which is involved in 2 joins. I think this would be a suitable candidate for a composite PK.

Turns out just dropping the existing nonclustered index on CustNr gives me 3x better performance: 3,205 scans are now 1 (clustered index scan, 500,000+ rows, 76%), and reads dropped from 502,017 to 31,760.

Btw, do you know where I can find documentation on Retrace? My searches just lead me back to the blog you posted. In the meantime I just grab the left 25-50 characters of TextData and group by that.



For better, quicker answers on T-SQL questions, read Jeff Moden's suggestions.

    "Million-to-one chances crop up nine times out of ten." ― Terry Pratchett, Mort

  • MOC Ewez (11/29/2012)


    So having so many statistics on a table with (36 cols) is normal or at least not unusual?

    Depends. 🙂 If there are duplicate stats (between the dta and the auto-created), then you can drop the duplicate stats.

    As for the PK, one of the indexes I was trying was a unique index with the PK column (CompNr Char(14)!!) plus another column (CustNr Int) which is involved in 2 joins. I think this would be a suitable candidate for a composite PK.

If CompNr is the column that uniquely identifies the row, then the PK should be on CompNr, not CompNr + other columns. That's database design principles. Widen the PK unnecessarily and you introduce the possibility of incorrect data.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Ahhh, the dreaded but expected "Depends". :hehe:

    How does one find duplicate statistics?

    I've got a basic start.

SELECT OBJECT_NAME(S.[object_id]), S.name, SC.*
FROM sys.stats S
JOIN sys.stats_columns SC
    ON S.stats_id = SC.stats_id
    AND S.[object_id] = SC.[object_id]
JOIN sys.columns C
    ON SC.column_id = C.column_id
    AND SC.[object_id] = C.[object_id]
WHERE OBJECT_NAME(S.[object_id]) = 'myTable'
ORDER BY SC.[object_id], SC.stats_id DESC

I guess I need a recursive CTE, or a pivot into a # table and then a GROUP BY with COUNT...?

    These are still weak spots for me. I think an overly dramatic cursor would be my best bet.

    Do you happen to have a script for this up your sleeve?

    .....I found one at : http://sqlserver-online.blogspot.nl/2010/11/multiple-statistics-sharing-same.htm

    But it returns more rows than I have statistics on this object.

Btw, how do you alias URLs so they display as "click here"?



For better, quicker answers on T-SQL questions, read Jeff Moden's suggestions.

    "Million-to-one chances crop up nine times out of ten." ― Terry Pratchett, Mort

  • MOC Ewez (11/29/2012)


I guess I need a recursive CTE, or a pivot into a # table and then a GROUP BY with COUNT...?

    These are still weak spots for me. I think an overly dramatic cursor would be my best bet.

    Personally I prefer just eyeballing it. Far less effort and more accurate.

    Query for the stats and their columns with an appropriate order by and read over the list.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • MOC Ewez (11/29/2012)


    As for the PK, one of the indexes I was trying was a unique index with the PK column (CompNr Char(14)!!) plus another column (CustNr Int) which is involved in 2 joins. I think this would be a suitable candidate for a composite PK.

I think you are thinking about it the other way around. A PK provides a clustered index by default, so when thinking about the PK you should think about the uniqueness of records; only then should the next thought be about indexes. If the joins get help from the PK (clustered index), which they certainly will in most cases, that's perfect. If not, other column combinations (including the key column) should be considered for additional indexes.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Personally I prefer just eyeballing it. Far less effort and more accurate.

    :Wow:

I've been eyeballing for a good hour now. Even tried tilting my head and closing one eye for a while.

I reckon this eyeballing skill of yours isn't a natural trait built into human DNA. Well, at least not mine anyway. 😛

    Do you do this with or without shoes?:-)



For better, quicker answers on T-SQL questions, read Jeff Moden's suggestions.

    "Million-to-one chances crop up nine times out of ten." ― Terry Pratchett, Mort

  • MOC Ewez (11/29/2012)


    Do you do this with or without shoes?:-)

    Without of course. 🙂

What you want to do is compare the columns the stats are on and see if you have any subsets. So: do you have a stats object on Col1 and another on Col1, Col2? If you do, the first is redundant.

    I have a script that pulls all stats with a delimited list of columns, but it doesn't work on SQL 2005, so not much point in posting it here.
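A rough 2005-compatible alternative, though, could build the delimited column list with FOR XML PATH (a sketch, untested here; 'MyTable' is a placeholder):

```sql
-- Each stat with a comma-delimited list of its columns, for eyeballing subsets.
SELECT s.name AS stat_name,
       STUFF((SELECT ', ' + c.name
              FROM sys.stats_columns sc
              JOIN sys.columns c
                  ON c.[object_id] = sc.[object_id]
                  AND c.column_id = sc.column_id
              WHERE sc.[object_id] = s.[object_id]
                AND sc.stats_id = s.stats_id
              ORDER BY sc.stats_column_id
              FOR XML PATH('')), 1, 2, '') AS stat_columns
FROM sys.stats s
WHERE s.[object_id] = OBJECT_ID('dbo.MyTable')
ORDER BY stat_columns
```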

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
I think you are thinking about it the other way around. A PK provides a clustered index by default, so when thinking about the PK you should think about the uniqueness of records; only then should the next thought be about indexes. If the joins get help from the PK (clustered index), which they certainly will in most cases, that's perfect. If not, other column combinations (including the key column) should be considered for additional indexes.

What is wise for a PK seems to be highly debated among the pros. Some say it should be as short a "description" of what your table is about as possible; others say keep it as small and contiguous as possible (often an identity) because it gets appended to other indexes. (And more.)

    Indeed you are right, I was thinking about the point of using a unique index when I could just make that index the PK. But testing things out with so many statistics stumped me.

It looks as if all my inherited DBs are chock-a-block with stats produced by both DTA and auto-created statistics. Is there a point at which too many (non-duplicate) stats is too much?



For better, quicker answers on T-SQL questions, read Jeff Moden's suggestions.

    "Million-to-one chances crop up nine times out of ten." ― Terry Pratchett, Mort

  • MOC Ewez (11/29/2012)


    What is wise for a PK seems to be highly debated amongst the pros. Some say it should be a short as possible "Description" of what your table is about, others will say keep it as small and contiguous possible (Often Identity) due to it being appended to other indexes. (And more)

    Small and contiguous are considerations for the clustered index. PK != clustered index.

    The primary key is a logical database design decision, it is a column or set of columns that uniquely identifies the row (used in foreign key relationships). That is all.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Small and contiguous are considerations for the clustered index. PK != clustered index.

    The primary key is a logical database design decision, it is a column or set of columns that uniquely identifies the row (used in foreign key relationships). That is all.

Thanks for pointing that out! I had always thought it was a clustered index, as I always create a PK when creating a table, which creates a clustered index if one doesn't already exist. I've never seen a table with a clustered index that is not the same as the PK.



For better, quicker answers on T-SQL questions, read Jeff Moden's suggestions.

    "Million-to-one chances crop up nine times out of ten." ― Terry Pratchett, Mort

CREATE TABLE Blah (
    ID int IDENTITY,
    SomeDate DATETIME
    -- other fields
)

CREATE CLUSTERED INDEX idx_Blah ON Blah (SomeDate)

ALTER TABLE Blah ADD CONSTRAINT pk_blah PRIMARY KEY NONCLUSTERED (ID)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
Insightful example; I had made the same one myself.

Found some helpful info to correct my mindset. Also found out you can use a unique constraint as the target of an FK reference. Didn't know that either. This is turning out to be very educational. 🙂
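For instance (made-up tables, just to illustrate what I mean):

```sql
CREATE TABLE Parent (
    ParentID int IDENTITY CONSTRAINT PK_Parent PRIMARY KEY,
    AltKey char(10) NOT NULL CONSTRAINT UQ_Parent_AltKey UNIQUE
)

-- The FK can reference the unique constraint rather than the PK:
CREATE TABLE Child (
    ChildID int IDENTITY CONSTRAINT PK_Child PRIMARY KEY,
    ParentAltKey char(10) NOT NULL
        CONSTRAINT FK_Child_Parent REFERENCES Parent (AltKey)
)
```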

PKs are a heated topic: http://social.msdn.microsoft.com/Forums/en/transactsql/thread/158d77f7-3029-43bc-bba6-a8a12374f00c

    Anyway, back to the stats business.

I found and read Kendal Van Dyke's blog on overlapping statistics.

    http://www.sqlservercentral.com/blogs/kendalvandyke/2010/09/09/more-on-overlapping-statistics/ Which leads to another and another.

    He's talking about overlapping statistics, is that what you mean by duplicates?

    Here's the code to find my duplicate statistics.

Start by creating a function that sorts a delimited string of values (for the Keys column returned by sp_HelpStats).

-- Reorder a set of values in a string.
CREATE FUNCTION ufn_SortStrings (@Base Varchar(MAX), @Splitter Char(1) = ',', @RetSplitter Char(1) = ',')
RETURNS Varchar(MAX)
AS
BEGIN
    DECLARE -- Variables and initialize
        @RetVal Varchar(MAX),
        @Tmp Varchar(MAX)
    SET @RetVal = ''
    SET @Tmp = ''
    DECLARE @TmpVals TABLE (Val Varchar(50))

    -- Get rid of any preceding @Splitter
    WHILE LEFT(@Base, 1) = @Splitter
        SET @Base = SUBSTRING(@Base, 2, LEN(@Base) - 1)

    -- Get rid of any trailing @Splitter
    WHILE RIGHT(@Base, 1) = @Splitter
        SET @Base = LEFT(@Base, LEN(@Base) - 1)

    -- Loop through the base string
    WHILE LEN(@Base) > 0
    BEGIN
        IF CHARINDEX(@Splitter, @Base) > 0 -- Check if the splitter is there
        BEGIN
            SET @Tmp = LTRIM(RTRIM(SUBSTRING(@Base, 1, CHARINDEX(@Splitter, @Base) - 1))) -- Grab the first part and trim any spaces
            SET @Base = SUBSTRING(@Base, CHARINDEX(@Splitter, @Base), LEN(@Base) - LEN(@Tmp)) -- Remove the split value from the base string
            IF LEFT(@Base, 1) = @Splitter -- Remove the leading @Splitter if it's there
                SET @Base = SUBSTRING(@Base, 2, LEN(@Base) - 1)
        END
        ELSE -- No splitter, must be the last part.
        BEGIN
            SET @Tmp = LTRIM(RTRIM(@Base))
            SET @Base = REPLACE(@Base, @Tmp, '')
        END

        IF REPLACE(@Tmp, ' ', '') <> '' -- Don't insert any empty values
            INSERT INTO @TmpVals VALUES (@Tmp) -- Enter the split value into a table
    END

    -- Loop through the ordered split values and add them to a string
    DECLARE cVals CURSOR FOR SELECT Val FROM @TmpVals ORDER BY 1
    OPEN cVals
    FETCH NEXT FROM cVals INTO @Tmp
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @RetVal = @RetVal + @Tmp + @RetSplitter
        FETCH NEXT FROM cVals INTO @Tmp
    END
    CLOSE cVals
    DEALLOCATE cVals

    IF RIGHT(@RetVal, 1) = @RetSplitter
        SET @RetVal = LEFT(@RetVal, LEN(@RetVal) - 1) -- Get rid of the trailing @RetSplitter

    RETURN @RetVal
END

    And the code to create an SP to return only duplicates from the specified table.

--EXEC usp_FindDupStats '<myTable>'
CREATE PROCEDURE usp_FindDupStats @TblName Varchar(150)
AS
BEGIN
    --DECLARE -- Parameters (testing)
    --@TblName Varchar(150)
    --SET @TblName = '<my Table>'

    DECLARE -- Variables
        @SQL NVarchar(500)

    CREATE TABLE #TblStats
    (
        StatName Varchar(100),
        Keys Varchar(MAX)
    )

    CREATE TABLE #TblStatsExt
    (
        TblName Varchar(150),
        StatName Varchar(100) NOT NULL,
        Keys Varchar(MAX),
        OrderedKeys Varchar(MAX)
    )
    ALTER TABLE #TblStatsExt ADD CONSTRAINT PK_Tmp PRIMARY KEY (StatName)

    SET @SQL = 'INSERT INTO #TblStats EXEC sp_HelpStats @Tbl, ''ALL'''
    EXEC sp_ExecuteSQL @SQL, N'@Tbl Varchar(150)', @Tbl = @TblName

    INSERT INTO #TblStatsExt
    SELECT @TblName, StatName, Keys, Sandbox.dbo.ufn_SortStrings(Keys, ',', ',') --<<== Change DB
    FROM #TblStats

    SELECT T1.TblName, T1.StatName, T1.Keys, T1.OrderedKeys, RowNr
    FROM #TblStatsExt T1
    JOIN
    (
        SELECT StatName, OrderedKeys,
               ROW_NUMBER() OVER (PARTITION BY OrderedKeys ORDER BY StatName) RowNr
        FROM #TblStatsExt
    ) T2
        ON T1.StatName = T2.StatName
    WHERE RowNr > 1

    DROP TABLE #TblStats
    DROP TABLE #TblStatsExt
END

    Now I can use this to loop through and drop the duplicates.

    I haven't done this yet as I don't (Yet) know enough about statistics to know if this could be dangerous.
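When I do get to it, something like this could generate the DROP statements from the proc's output instead of a cursor (a sketch; #Dups is just a hypothetical holding table, and it assumes the table lives in dbo):

```sql
-- Capture the duplicates, then build DROP STATISTICS statements from them.
CREATE TABLE #Dups
(
    TblName Varchar(150),
    StatName Varchar(100),
    Keys Varchar(MAX),
    OrderedKeys Varchar(MAX),
    RowNr Bigint
)

INSERT INTO #Dups
EXEC usp_FindDupStats '<myTable>'

-- Adjust the schema if the table isn't in dbo.
SELECT 'DROP STATISTICS dbo.' + QUOTENAME(TblName) + '.' + QUOTENAME(StatName)
FROM #Dups

DROP TABLE #Dups
```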



For better, quicker answers on T-SQL questions, read Jeff Moden's suggestions.

    "Million-to-one chances crop up nine times out of ten." ― Terry Pratchett, Mort
