Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Ctrl-alt-geek

Matt Bowler is a DBA at Trade Me, and teaches database design and administration courses at the local institute of technology. A recent but avid discoverer of SQL Server, Matt is a regular contributor to forums at MSDN, Experts Exchange and SSC and he blogs at mattsql.wordpress.com.

Duplicate Statistics

The Setup:

Standard best practise is to have auto create and auto update statistics set for SQL Server databases. But there is no corresponding setting to remove statistics that are no longer necessary. Unless explicitly managed by the DBA these statistics could be causing excess resource usage.

Demo:

Create a new database and port some data across from the AdventureWorks sample database.

if db_id('stats_duplicate') is not null
	drop database stats_duplicate
go

create database stats_duplicate
go

use stats_duplicate
go

select * into dbo.only_table
from AdventureWorks2012.Sales.SalesOrderDetail

Verify that auto create and auto update statistics are set as expected.

select 
	is_auto_create_stats_on
	, is_auto_update_stats_on
from sys.databases 
where name = 'stats_duplicate'

There is a single index (the heap) on the new table and no statistics.

select * from sys.indexes
where [object_id] = object_id('dbo.only_table')

select * from sys.stats
where [object_id] = object_id('dbo.only_table')

no_stats

Now we run a query with a predicate.

select * from dbo.only_table
where ProductID = 715

The query optimiser will automatically create a statistic on the ProductID column.

auto_stat

And we can examine the statistic:

dbcc show_statistics('dbo.only_table',_WA_Sys_00000005_0EA330E9)

stat_header1

We can see that the statistic is created on the ProductID column. Take note of when the statistic was last updated. (I know – I should have better things to do at 9:45pm!)

Now if we create an index on the ProductID column this will also create a statistic on the column.

create nonclustered index ix_productid 
on dbo.only_table(ProductID)

Which we can examine:

dbcc show_statistics('dbo.only_table',ix_productid)

stat_header2

Again – take note of the updated time.

Now let’s run an update of the ProductID column against all rows. This will mark the statistics as stale. If we run a query that will use the statistics this will trigger an automatic update.

update dbo.only_table
set ProductID += 1

select * from dbo.only_table
where ProductID = 715

And look at the stat headers.

dbcc show_statistics('dbo.only_table',_WA_Sys_00000005_0EA330E9) 
with stat_header

dbcc show_statistics('dbo.only_table',ix_productid) 
with stat_header

stat_header3

We can see by the dates that both statistics were updated even though only one will ever be used. We can confirm this by using a technique from Paul White that can show what statistics are considered, and what statistics are used for a given query.

DBCC FREEPROCCACHE
go
select * from dbo.only_table
where ProductID = 715
OPTION
(
    QUERYTRACEON 3604,
    QUERYTRACEON 9292,
    QUERYTRACEON 9204
)

Conclusion: Automatically created statistics can be rendered obsolete by subsequent index creations but they will still be updated and managed by SQL Server – consuming resources. I’ll leave with a script that I’ve based on ideas from Erin Stellato┬áthat will pick out any duplicate statistics in a given database.

with stats_on_indexes([object_id],[table_column_id],[index_name])
	as(
		select
			o.[object_id] as [object_id],
			ic.[column_id] as [table_column_id],
			i.name
		from sys.indexes i
		join sys.objects o on i.[object_id] = o.[object_id]
		join sys.stats st on i.[object_id] = st.[object_id] and i.name = st.name
		join sys.index_columns ic on i.index_id = ic.index_id and i.[object_id] = ic.[object_id]
		where o.is_ms_shipped = 0
		and i.has_filter = 0
		and ic.index_column_id = 1
	)
select
	o.[object_id] as [ID],
	o.name as [Table],
	c.name as [Column],
	s.name as [AutoCreatedStatistic],
	stats_on_indexes.index_name as [Index]
from sys.stats s
	join sys.stats_columns sc
		on s.stats_id = sc.stats_id and s.[object_id] = sc.[object_id]
	join sys.objects o 
		on sc.[object_id] = o.[object_id]
	join sys.columns c 
		on sc.[object_id] = c.[object_id] and sc.column_id = c.column_id
	join stats_on_indexes 
		on o.[object_id] = stats_on_indexes.[object_id] and stats_on_indexes.table_column_id = c.column_id
where s.auto_created = 1
and s.has_filter = 0

Comments

Leave a comment on the original post [mattsql.wordpress.com, opens in a new window]

Loading comments...