There's only one term that comes to mind after all the incredible tutelage I've received on this thread and what it has resulted in, and that term is UNBELIEVABLE!

    When I first ran across Fabiano's post, I gave it a try.  It took more than 10 minutes just to return the names of the stats that came into play, and I still couldn't use that information because stats names are only unique within a database, not across all the databases the code necessarily has to cover.  To top it off, that was on only about 17,000 stored plans at the time.

    To summarize what you good folks have helped me figure out...

    1.  The pre-processing into a semi-permanent working table took only 00:08:29, and that was on a whopping 84,817 execution plans.  That's 5 times the number of execution plans that I had started on, and it actually returns useful information now.  (If you didn't catch the pre-processing code in my previous post, there's a rough sketch of its general shape just after this list.)

    2.  Then, using everything else you good folks taught me, the code at the bottom of this post produces a list of distinct statistics names, along with the database and table each stat is associated with, in just 00:01:05, and that's without doing any XML indexing.

    3.  The total time for the whole shebang is 00:09:24, and it produces nearly no load on the server at all because the target tables are in a scratch database that uses the Simple Recovery Model, which allows SELECT INTO to be minimally logged and about twice as fast as it would otherwise be.  (There's a quick sanity check for that setting just after this list.)

    4.  Because of the very light load and the comparatively low run time involved, I can run the code once an hour without being obtrusive.  Over time (a month or so, so that I can pick up even on month-end stats usage), this will allow me to create a list of stats that are actually used.  There's a sketch of how I'll accumulate that history just after this list.

    So, what am I going to do with that?  My big problem has been rebuilding stats on a regular basis.  We have some insane tables that are more than 100 columns wide (not my design, I assure you) and a couple of hundred million rows long, along with some ridiculous stats rebuild code that the well-meaning folks before me used.  The thing that makes it ridiculous is that it examines all available stats and compares them to all available columns and, if a column has no column stats on it, it automatically creates them.  Even doing a "clean" stats rebuild of only the stats that need it takes more than 10 hours for just 1 of the databases on our primary production server.  This code will allow me to find all unused stats (which can also naturally accumulate over time) and safely drop them.
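    For anyone who lands on this page without reading my previous post, here's a minimal sketch of the general shape of that pre-processing step.  Only the target table and column names (Scratch_Prod.scratch.StatsPreProcess and XMLFragment) are taken from the query at the bottom of this post; the rest is an assumption on my part rather than my actual code, and the p:Recompile fragments only show up in the plan XML when the trace flag from Fabiano's post (8666, if memory serves) has been enabled.

    --===== SKETCH ONLY... capture just the "Recompile" fragments from every
         -- cached plan into the scratch database so the later shredding works
         -- against small XML fragments instead of whole execution plans.
       WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' AS p)
     SELECT XMLFragment = qp.query_plan.query('//p:Recompile')
       INTO Scratch_Prod.scratch.StatsPreProcess
       FROM sys.dm_exec_cached_plans cp
      CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
      WHERE qp.query_plan IS NOT NULL
    ;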
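    Speaking of the Simple Recovery Model, it's worth a quick sanity check before you count on the minimal logging, because the speed claim in item 3 depends on it.  Nothing exotic here:

    --===== Verify the scratch database really is in SIMPLE (BULK_LOGGED would
         -- also allow SELECT INTO to be minimally logged).
     SELECT name, recovery_model_desc
       FROM sys.databases
      WHERE name = N'Scratch_Prod'
    ;
    --===== If it's not, one command fixes it... assuming, of course, that you
         -- don't need point-in-time recovery on a scratch database.
      ALTER DATABASE Scratch_Prod SET RECOVERY SIMPLE
    ;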
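    As for accumulating the history, each hourly run just needs to be folded into a long-term table.  This is a sketch under the assumption of a hypothetical Scratch_Prod.scratch.StatsUsageHistory table with the same four key columns plus FirstSeen/LastSeen dates; it's not my production code.

    --===== SKETCH ONLY... stamp anything we've seen before, then add anything new.
         -- The StatsUsageHistory table is hypothetical... create it once with the
         -- four key columns from StatsDbTableInfo plus FirstSeen and LastSeen.
     UPDATE h
        SET LastSeen = GETDATE()
       FROM Scratch_Prod.scratch.StatsUsageHistory h
       JOIN Scratch_Prod.scratch.StatsDbTableInfo  s
         ON s.DbName     = h.DbName
        AND s.SchemaName = h.SchemaName
        AND s.TableName  = h.TableName
        AND s.StatName   = h.StatName
    ;
     INSERT INTO Scratch_Prod.scratch.StatsUsageHistory
            (DbName, SchemaName, TableName, StatName, FirstSeen, LastSeen)
     SELECT s.DbName, s.SchemaName, s.TableName, s.StatName
           ,GETDATE(), GETDATE()
       FROM Scratch_Prod.scratch.StatsDbTableInfo s
      WHERE NOT EXISTS ( SELECT 1
                           FROM Scratch_Prod.scratch.StatsUsageHistory h
                          WHERE h.DbName     = s.DbName
                            AND h.SchemaName = s.SchemaName
                            AND h.TableName  = s.TableName
                            AND h.StatName   = s.StatName
                       )
    ;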

    We have more than 17,000 column-based stats just in one of our main databases.  The code at the bottom of this post identified that fewer than 5,000 were used in the last 48 hours for the entire server!  Yes, I do realize that's partially due to execution plans dropping out of the proc cache but, at < 10 minutes a pop, I can build a true history over a month and do the stats drops surgically instead of dropping them willy-nilly and letting the server rebuild them when a query needs them... for some of the larger tables, that on-the-fly rebuild was taking more than 10 minutes.  Even if I don't drop the stats, I can still do a surgical rebuild until I get enough history to drop the truly unused stats.
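    And here's the shape of the "surgical" part.  Once the history covers a full business cycle, finding the drop candidates is just an anti-join of sys.stats against what was actually used.  Again, this is only a sketch of the idea... it assumes it's run from the target database, it only goes after the auto-created column stats, it reads the snapshot table (swap in the history table once that exists), and it generates the DROP commands for manual review rather than executing anything.

    --===== SKETCH ONLY... list the auto-created column stats in the current
         -- database that never appeared in the collected usage, and build the
         -- DROP STATISTICS commands for manual review.
     SELECT DropCmd = N'DROP STATISTICS '
                    + QUOTENAME(SCHEMA_NAME(o.schema_id)) + N'.'
                    + QUOTENAME(o.name)                   + N'.'
                    + QUOTENAME(s.name)
       FROM sys.stats   s
       JOIN sys.objects o
         ON o.object_id = s.object_id
        AND o.type      = 'U'       --User tables only
      WHERE s.auto_created = 1      --Only the auto-created column stats
        AND NOT EXISTS ( SELECT 1
                           FROM Scratch_Prod.scratch.StatsDbTableInfo u
                          WHERE u.DbName     = DB_NAME()
                            AND u.SchemaName = SCHEMA_NAME(o.schema_id)
                            AND u.TableName  = o.name
                            AND u.StatName   = s.name
                       )
    ;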

    Here's the code that pulls the necessary stats info for all of this from the working table in my previous post.  It looks damned simple now but, 3 days ago, I was at a total loss for how to pull this off.  Thank you, thank you, one and all.  I might even be able to get a good night's sleep tonight knowing that the hard part of this fire drill is now behind me.


    --===== If the working table already exists, drop it.
         IF OBJECT_ID('Scratch_Prod.scratch.StatsDbTableInfo','U') IS NOT NULL
      DROP TABLE Scratch_Prod.scratch.StatsDbTableInfo
    ;
    --===== Find all used stats by StatName, along with the database and table each is associated with, and store
         -- the unique occurrences in a working table for comparisons over time.
       WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' AS p)
     SELECT DISTINCT
             DbName     = db.DbInfo.value    ('(p:Field[@FieldName="wszDb"])       [1] /@FieldValue','SYSNAME')
            ,SchemaName = db.DbInfo.value    ('(p:Field[@FieldName="wszSchema"])   [1] /@FieldValue','SYSNAME')
            ,TableName  = db.DbInfo.value    ('(p:Field[@FieldName="wszTable"])    [1] /@FieldValue','SYSNAME')
            ,StatName   = si.StatsInfo.value ('(p:Field[@FieldName="wszStatName"]) [1] /@FieldValue','SYSNAME')
            ,ColumnName = si.StatsInfo.value ('(p:Field[@FieldName="wszColName"])  [1] /@FieldValue','SYSNAME')
       INTO       Scratch_Prod.scratch.StatsDbTableInfo
       FROM       Scratch_Prod.scratch.StatsPreProcess tbl
      OUTER APPLY tbl.XMLFragment.nodes ('p:Recompile')       db (DbInfo)
  OUTER APPLY db.DbInfo.nodes       ('p:ModTrackingInfo') si (StatsInfo) --3 part name using the alias from the APPLY above
  WHERE si.StatsInfo.value ('(p:Field[@FieldName="wszStatName"]) [1] /@FieldValue','SYSNAME') > N'' --Rejects both NULLs and empty strings
    ;

    And, yeah... you can bet there is going to be a small series of articles on this. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
        Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.

