Auto update stats not good enough
Posted Monday, October 5, 2009 10:17 AM
SSC Journeyman
Hello - long-winded post, but performance gurus should be interested, so please read. I would appreciate any feedback.

We have some clients who deploy our product (and the standard database footprint on SQL 2005 that goes with it), and we are finding that SQL Server's auto update stats feature just doesn't cut it. When a high volume of processing is pushed through it, SQL Server's built-in auto update stats feature is not fast enough to keep the stats up to date and quickly falls behind the speed with which we sometimes delete / update / insert into the tables. The result is very unpredictable run times for our processing, which is very intricate and very complex.

We built our own update stats proc, which scans all tables touched during our processing, checks rowmodctr in sys.sysindexes and, based on our own preset threshold, selectively updates stats on the tables that need it. We then tried two possible approaches:

1. Run this proc on a set schedule of, say, every five minutes; or
2. Run this proc from within our processing logic whenever we have completed one "unit" of processing (I am not at liberty to discuss the details of what the "unit of processing" is), which we know will do a certain amount of insert/delete/update against the tables concerned.

The first approach gives somewhat improved stability, but still leaves open the chance of occasional bad performance (run times).

The second approach is also somewhat consistent, but worse than the first: it seems to exacerbate the possibility of deadlocks (another known issue that we are working on) and increases our processing times. (We do handle deadlocks by coming back and retrying, but that increases our processing times.)
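For reference, the retry-on-deadlock handling mentioned above follows roughly this pattern (a minimal sketch only; dbo.usp_ProcessUnit is a made-up name standing in for our real processing code):

DECLARE @retry int
SET @retry = 3
WHILE @retry > 0
BEGIN
    BEGIN TRY
        EXEC dbo.usp_ProcessUnit                  -- hypothetical "unit of processing" call
        SET @retry = 0                            -- success, stop looping
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() = 1205 AND @retry > 1   -- 1205 = chosen as deadlock victim
        BEGIN
            SET @retry = @retry - 1
            WAITFOR DELAY '00:00:01'              -- brief back-off before retrying
        END
        ELSE
        BEGIN
            SET @retry = 0                        -- not a deadlock, or out of retries
            SELECT 'Processing failed: ' + ERROR_MESSAGE()
        END
    END CATCH
END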
Post #798040
Posted Monday, October 5, 2009 11:02 AM


SSChampion
The problem is that SQL uses a hard 20% rule to determine whether statistics are auto-updated or not; on a million-row-plus table, that's not often enough.

Here's a work in progress I was fiddling with; point it at a database and, *in theory*, it will identify anything that statistics should be updated on, based on the logic you see inside... small tables at 20 percent, rows with lots of changes much more often.
See if this gives you any ideas... one of the problems I was having is that even after updating statistics, some tables would still report a lot of changes, and I don't think I've fixed that issue yet:
--The Analysis: my crappy assumptions:
--UPDATE STATISTICS dbo.GMPVDET
--tables under 1000 rows, I'll leave them at 20%
--tables with more rows than that, I'll use an arbitrary sliding scale formula,
--to be modified based on analysis
SELECT  X.*,
        ISNULL(
          CASE
            WHEN X.[Total Rows] <= 1000
              THEN CASE
                     WHEN [Percent Modified] >= 20.0
                       THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN --20% Small Table Rule'
                   END
            WHEN [Percent Modified] = 100.00
              THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN --100% No real Stats Rule'
            --WHEN X.[Rows Modified] > 1000
            --  THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN --1000 Rows Modified Rule'
            ELSE CASE
                   WHEN X.[Total Rows] > 1000000000 --billion rows
                     THEN CASE
                            WHEN [Percent Modified] > 0.1
                              THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN -- 1B Big Table Rule'
                          END
                   WHEN X.[Total Rows] > 100000000 --hundred million rows
                     THEN CASE
                            WHEN [Percent Modified] > 1.0
                              THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN -- 100M Big Table Rule'
                          END
                   WHEN X.[Total Rows] > 10000000 --ten million rows
                     THEN CASE
                            WHEN [Percent Modified] > 2.0
                              THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN -- 10M Big Table Rule'
                          END
                   WHEN X.[Total Rows] > 1000000 --million rows
                     THEN CASE
                            WHEN [Percent Modified] > 5.0
                              THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN -- 1M Big Table Rule'
                          END
                   WHEN X.[Total Rows] > 100000 --hundred thousand rows
                     THEN CASE
                            WHEN [Percent Modified] > 10.0
                              THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN -- 100K Big Table Rule'
                          END
                   WHEN X.[Total Rows] > 10000 --ten thousand rows
                     THEN CASE
                            WHEN [Percent Modified] > 20.0
                              THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' WITH ALL, FULLSCAN -- 10K Big Table Rule'
                          END
                 END
          END, '') AS [Statistics SQL]
FROM (
      --one row per heap/clustered index, with the change counter from sysindexes
      SELECT DISTINCT
             DB_NAME()   AS [Database],
             S.name      AS [Schema Name],
             T.name      AS [Table Name],
             I.rowmodctr AS [Rows Modified],
             P.rows      AS [Total Rows],
             CASE
               WHEN I.rowmodctr > P.rows
                 THEN 100
               ELSE CONVERT(decimal(8,2), (I.rowmodctr * 100.0) / P.rows)
             END         AS [Percent Modified]
      FROM sys.partitions P
           INNER JOIN sys.tables  T ON P.object_id = T.object_id
           INNER JOIN sys.schemas S ON T.schema_id = S.schema_id
           INNER JOIN sysindexes  I ON P.object_id = I.id
      WHERE P.index_id IN (0,1)
        AND I.rowmodctr > 0
     ) X
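The query above only generates the UPDATE STATISTICS commands as a column; to actually run them you still have to execute each one. A rough sketch of doing that (not part of Lowell's script; it assumes the query's output was dumped into a hypothetical temp table called #StatsToUpdate via SELECT ... INTO):

DECLARE @cmd nvarchar(max)
DECLARE stats_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT [Statistics SQL]
    FROM #StatsToUpdate                 -- hypothetical temp table holding the query's output
    WHERE [Statistics SQL] <> ''
OPEN stats_cur
FETCH NEXT FROM stats_cur INTO @cmd
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC sp_executesql @cmd             -- e.g. UPDATE STATISTICS dbo.SomeTable WITH ALL, FULLSCAN
    FETCH NEXT FROM stats_cur INTO @cmd
END
CLOSE stats_cur
DEALLOCATE stats_cur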




Lowell

--There is no spoon, and there's no default ORDER BY in sql server either.
Actually, Common Sense is so rare, it should be considered a Superpower. --my son
Post #798067
Posted Monday, October 5, 2009 11:09 AM


SSC-Forever
Guru Nagabhushan (10/5/2009)
We have some clients who deploy our product (and the standard database footprint on SQL 2005 that goes with it), and we are finding that SQL Server's auto update stats feature just doesn't cut it.


Hardly unusual on larger tables, especially ones where the clustered index is on ascending columns and you're always querying for the latest rows.

Add an update stats ... with full scan on a regular interval. Daily if the DB has steady changes, more often if necessary. Every 5 min is probably a little too often, unless you're doing bursts of updates. Generally I prefer not to do the shotgun approach of updating all stats that may be wrong (or all stats regardless) but to target manual updates on just the tables/indexes that are known to have a problem.

I've done your second approach before, very successfully. Add an update stats with fullscan after finishing loading the day's data. Update stats shouldn't be able to cause deadlocks: it runs in read uncommitted isolation, it takes no locks other than schema stability, and only a schema modification can block it.
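As a rough illustration of the targeted approach described above (the table and index names here are just placeholders, not from the original posts), a scheduled job step can be as simple as:

-- Placeholder object names; run on whatever schedule matches the rate of change.
UPDATE STATISTICS dbo.FinancialResults WITH FULLSCAN
UPDATE STATISTICS dbo.Transactions WITH FULLSCAN
-- or target a single problem index/statistic rather than everything on the table:
UPDATE STATISTICS dbo.FinancialResults IX_FinancialResults_EffectiveDate WITH FULLSCAN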



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #798070
Posted Monday, October 5, 2009 11:17 AM


SSC-Dedicated
I tend to agree with Gail's approach. Don't update everything if you don't need to.

There's a counter to keep track of # of rows changed. Lowell's script seems to make good use of it. If you can tailor your updates, I'd do that. It will save some resources on the server.







Follow me on Twitter: @way0utwest

Forum Etiquette: How to post data/code on a forum to get the best help
Post #798076
Posted Monday, October 5, 2009 11:33 AM
SSC Journeyman
Appreciate your feedback, guys. I like the look of Lowell's script. Actually, I use logic very similar to that myself. I don't update stats blindly; I check to see whether they need to be updated, based on a percentage-of-rows-changed threshold, and then proceed to update stats. Here is what my script looks like:

--(This is the body of a stored procedure; @p_threshold and @p_listOnly are its parameters.)
SET NOCOUNT ON
SET ARITHABORT OFF
SET ANSI_WARNINGS OFF   --with these two settings OFF, divide-by-zero returns NULL instead of erroring

DECLARE @sql_string NVARCHAR(512)  --To hold the dynamic SQL string
DECLARE @tblname NVARCHAR(256)     --To hold the cursor variable
DECLARE @rc int                    --To hold the return code value when calling another proc
DECLARE @ERROR INT

--If no threshold has been passed, use 3.00%
IF @p_threshold IS NULL
    SET @p_threshold = 3.00

BEGIN TRY
    --Get a list of all tables that qualify for stats to be updated based on the threshold
    --and dump it into a temp table
    --(multiplying by 100.0 rather than 100 so the percentage isn't truncated by integer division)
    SELECT DISTINCT o.name AS TABLENAME,
           (i.rowmodctr * 100.0) / i.rowcnt AS PERCENTROWCHANGE,
           i.rowmodctr, i.rowcnt
    INTO #TABLES_THAT_NEED_UPDATESTATS
    FROM sys.objects o
         INNER JOIN sys.sysindexes i ON o.object_id = i.id
    WHERE o.name LIKE 'xxx_%'
      AND i.name IS NOT NULL
      AND i.rowmodctr > 0
      AND i.rowcnt > 0
      AND ((i.rowmodctr * 100.0) / i.rowcnt) > @p_threshold
      AND o.name IN (SELECT name FROM sys.objects
                     WHERE type = 'U'
                       AND --- Filters based on Product specific table prefixes
                       AND --- Filters based on Product specific table prefixes
                       AND --- Filters based on Product specific table prefixes
                    )
    UNION
    --Tables with pending modifications but a zero row count
    --(PERCENTROWCHANGE comes back NULL here because of the ARITHABORT/ANSI_WARNINGS settings above)
    SELECT DISTINCT o.name AS TABLENAME,
           (i.rowmodctr * 100.0) / i.rowcnt AS PERCENTROWCHANGE,
           i.rowmodctr, i.rowcnt
    FROM sys.objects o
         INNER JOIN sys.sysindexes i ON o.object_id = i.id
    WHERE o.name LIKE 'xxx_%'
      AND i.name IS NOT NULL
      AND i.rowmodctr > 0
      AND i.rowcnt = 0
      AND o.name IN (SELECT name FROM sys.objects
                     WHERE type = 'U'
                       AND --- Filters based on Product specific table prefixes
                       AND --- Filters based on Product specific table prefixes
                       AND --- Filters based on Product specific table prefixes
                    )
    ORDER BY PERCENTROWCHANGE DESC

    --If the parameter has been passed to only list the tables, list the contents and exit
    IF @p_listOnly = 1
    BEGIN
        SELECT * FROM #TABLES_THAT_NEED_UPDATESTATS
        RETURN 0
    END
    ELSE
    --If not, create a cursor and loop through all the tables that qualify for an update of
    --stats, using sp_executesql to execute the dynamic SQL
    BEGIN
        DECLARE CR_TABLES_THAT_NEED_UPDATESTATS INSENSITIVE CURSOR
        FOR SELECT TABLENAME FROM #TABLES_THAT_NEED_UPDATESTATS

        OPEN CR_TABLES_THAT_NEED_UPDATESTATS

        FETCH NEXT FROM CR_TABLES_THAT_NEED_UPDATESTATS
        INTO @tblname

        WHILE @@FETCH_STATUS = 0
        BEGIN
            SELECT @sql_string = 'UPDATE STATISTICS ' + @tblname
            EXEC sp_executesql @sql_string

            FETCH NEXT FROM CR_TABLES_THAT_NEED_UPDATESTATS
            INTO @tblname
        END

        --close and deallocate cursor
        CLOSE CR_TABLES_THAT_NEED_UPDATESTATS
        DEALLOCATE CR_TABLES_THAT_NEED_UPDATESTATS
    END
END TRY
--Handle any unexpected errors
BEGIN CATCH
    SELECT @ERROR = ERROR_NUMBER()   --ERROR_NUMBER() is reliable anywhere inside the CATCH block
    --Only clean up the cursor if it was actually opened
    IF CURSOR_STATUS('global', 'CR_TABLES_THAT_NEED_UPDATESTATS') >= 0
    BEGIN
        CLOSE CR_TABLES_THAT_NEED_UPDATESTATS
        DEALLOCATE CR_TABLES_THAT_NEED_UPDATESTATS
    END
    PRINT @ERROR
    SELECT 'An unexpected error occurred'
    RETURN -1
END CATCH

RETURN 0
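(For context, since the proc's actual name wasn't posted, a call to it would look something like the following; the name is made up:)

EXEC dbo.usp_SelectiveUpdateStats @p_threshold = 3.00, @p_listOnly = 1   -- just list the candidate tables
EXEC dbo.usp_SelectiveUpdateStats @p_threshold = 3.00, @p_listOnly = 0   -- actually update their stats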

Post #798084
Posted Monday, October 5, 2009 11:41 AM
SSC Journeyman
The other thing I wanted to point out is that this is a high-volume transactional environment (financial data). It's not a warehouse where you load data at fixed time intervals. We can have a user come in and say, "I want to process my financials based on my investments and the financial transactions entered, say for a month's worth of daily data." At the same time, we have another user who comes in and says, "Reprocess all my data for the past month" (a different set of data, same tables). We would then go and wipe out his financial results for last month (delete) and go back and reprocess the data based on some modifications to transactions he entered effective for last month. We store results on a daily basis, for months and years on end. Like this, we can have about 20 users on the system doing a myriad of processing, and it begins to give you an idea of the volume. Each period (a day) can have results worth hundreds or thousands of rows of data. So, in other words, on demand, hundreds if not thousands of rows of data are being deleted, inserted, and updated every few minutes...
Post #798088
Posted Monday, October 5, 2009 11:52 AM


SSC-Forever
Just bear in mind that sysindexes is deprecated and will be removed in a future version of SQL Server, and that rowmodctr is no longer 'accurate'. It's a calculation based off the hidden column modification counters that are in one of the system tables.


Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #798097
Posted Monday, October 5, 2009 11:58 AM


SSChampion

Ouch. That sysindexes is handy, and I haven't found any view that still tracks [rows] and [rowmodctr] yet.

GilaMonster (10/5/2009)
Just bear in mind that sysindexes is deprecated and will be removed in a future version of SQL Server, and that rowmodctr is no longer 'accurate'. It's a calculation based off the hidden column modification counters that are in one of the system tables.


Lowell

--There is no spoon, and there's no default ORDER BY in sql server either.
Actually, Common Sense is so rare, it should be considered a Superpower. --my son
Post #798101
Posted Monday, October 5, 2009 12:28 PM


SSC-Forever
Lowell (10/5/2009)

Ouch. That sysindexes is handy, and I haven't found any view that still tracks [rows] and [rowmodctr] yet.


Rows are in sys.partitions, though you should aggregate by object and index in case there's a partitioned table.

There's no view that exposes the column modification counters (modifications are tracked by column in SQL 2005, not just by table as they were in SQL 2000). I could tell you the system table, but since system tables can't be read without jumping through hoops, it won't help much. Besides, the system table is not documented and not very clear about what's what.
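Following on from the sys.partitions point above, getting the row counts the "new" way looks roughly like this (a sketch; it sums across partitions so a partitioned table is counted once):

-- Row count per table from sys.partitions, summed across partitions.
SELECT s.name AS [Schema Name],
       t.name AS [Table Name],
       SUM(p.rows) AS [Total Rows]
FROM sys.partitions p
     INNER JOIN sys.tables  t ON p.object_id = t.object_id
     INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
WHERE p.index_id IN (0, 1)              -- heap or clustered index only
GROUP BY s.name, t.name
ORDER BY [Total Rows] DESC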



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #798109
Posted Monday, October 5, 2009 12:52 PM
SSC Journeyman
Thanks, Gail and Lowell. Yes, I know sysindexes is being deprecated, but as Lowell pointed out, it is very handy, and having worked with SQL Server since 6.5, I am much handier with all the old system objects than with what 2005 brought to the party. I am hoping that SQL 2008 doesn't have these issues and has a better algorithm for managing stats.

SQL 2005, in my view, is the buggiest SQL Server release to date. There are other problems that I have talked to MS about (and they claim it's a feature!), and these problems (aka "features") have mysteriously disappeared in SQL 2008.

The trouble, though, is that the consistency is still not where we want it to be. It will be fine 9 times out of 10 when we update stats ourselves at a frequency of 5 minutes, and the 10th round of processing will again slow down, because the processing started, say, 2 seconds before the next scheduled run of my update stats script kicked off, and the processing of financials in the intervening 4 1/2 minutes or so has rendered the stats out of date. The bigger picture here is that we are not able to scale with the hardware. We have clients bringing online HP G3s and G4s with 4 quad cores and 64 GB of memory, with the expectation that they can push through a higher volume of processing, and once they present that higher volume to SQL, stats are not kept up to date and slowdowns occur.

Very uncomfortable sessions with client DBAs result, having to explain that their most recent purchase of a $20,000 server doesn't really help their users. It is a very sticky situation and an awkward conversation to have, not to mention seeing the same in our own production environments.
Post #798121