September 20, 2011 at 11:19 am
GilaMonster (9/20/2011)
Ninja's_RGR'us (9/20/2011)
Not 100% sure if the plan in cache has that info, I would guess it does (I don't usually use that to pull out plans).
No, it doesn't.
If the cached plan was reused 5 times, which one's run-time info should it have? Add to that, enabling actual plans adds overhead to query execution. SQL's not going to do that unless asked (via a Profiler trace or a SET statement).
No, I was thinking about the compile-time parameters. You could at least compare those.
September 20, 2011 at 11:32 am
I'd be looking at a few things:
Widen the index [OLAP].[dbo].[W_SRVREQ_F].[W_SRVREQ_F_F10], add ACCNT_WID, SR_WID and DURATION_MINS as include columns.
Change the where clause:
case when T68314.[STATUS] = N'Closed' then N'RESOLVED' when T68314.[STATUS] = N'Cancelled' then N'CANCELLED' else N'OPEN' end = N'RESOLVED'
becomes
T68314.[STATUS] = N'Closed'
and change the index W_SRVREQ_D_D3, moving Status into the key (it's currently an include column).
Widen the index [OLAP].[dbo].[W_PARTY_ORG_D].[W_PARTY_ORG_D_D5], include INTEGRATION_ID
Add an index on [OLAP].[dbo].[WC_EPCR_GA_OWNERSHIP_H] on GA_ACCNT_ID
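Rough DDL sketches of the above (the existing key columns aren't listed here, so the commented-out statement is a placeholder to be filled in from the current definition; the new index's name is arbitrary):

CREATE INDEX IX_WC_EPCR_GA_OWNERSHIP_H_GA_ACCNT_ID
ON OLAP.dbo.WC_EPCR_GA_OWNERSHIP_H (GA_ACCNT_ID);

-- Widen W_SRVREQ_F_F10, keeping its current key columns:
-- CREATE INDEX W_SRVREQ_F_F10 ON OLAP.dbo.W_SRVREQ_F (/* existing keys */)
--     INCLUDE (ACCNT_WID, SR_WID, DURATION_MINS) WITH (DROP_EXISTING = ON);

-- Likewise for W_PARTY_ORG_D_D5 with INCLUDE (INTEGRATION_ID), and for
-- W_SRVREQ_D_D3 with Status moved from the INCLUDE list into the key.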
Lastly, these two predicates:
and T70332.[PER_NAME_MONTH] in (N'2004 / 12', N'2005 / 05', N'2005 / 06', N'2006 / 03', N'2006 / 05', N'2006 / 06', N'2006 / 12', N'2007 / 03', N'2007 / 04', N'2007 / 05', N'2007 / 06', N'2007 / 08', N'2007 / 09', N'2007 / 10', N'2007 / 11', N'2007 / 12', N'2008 / 01', N'2008 / 02', N'2008 / 03', N'2008 / 04', N'2008 / 05', N'2008 / 06', N'2008 / 09', N'2009 / 02', N'2009 / 03', N'2009 / 04', N'2009 / 05', N'2009 / 06', N'2009 / 07', N'2009 / 08', N'2009 / 09', N'2009 / 10', N'2009 / 11')
and T70332.[PER_NAME_YEAR] in (N'2004', N'2005', N'2006', N'2007', N'2008', N'2009') )
Are both of these predicates required? The second seems to be redundant in light of the first (if my assumptions about the data are correct). If possible, remove the second. SQL will be assuming these are independent predicates, which will very likely lead to a lower estimated row count.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 20, 2011 at 11:33 am
Ninja's_RGR'us (9/20/2011)
GilaMonster (9/20/2011)
Ninja's_RGR'us (9/20/2011)
Not 100% sure if the plan in cache has that info, I would guess it does (I don't usually use that to pull out plans).
No, it doesn't.
If the cached plan was reused 5 times, which one's run-time info should it have? Add to that, enabling actual plans adds overhead to query execution. SQL's not going to do that unless asked (via a Profiler trace or a SET statement).
No, I was thinking about the compile-time parameters. You could at least compare those.
The parameter values will be there, but the only row counts will be the estimated row counts that we currently have.
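You can pull both from cache with something like this (a rough sketch; the LIKE filter is just an example, and the compile-time values sit in the plan XML's ParameterList element):

SELECT st.text, qp.query_plan
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
WHERE st.text LIKE N'%W_SRVREQ_F%';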
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 20, 2011 at 11:56 am
Hi Guys,
Thanks for all of your insights so far. As a side note, the query in both environments returns zero records. The way I know is that if I force the query in PROD to use the execution plan from UAT, I get results in a split second.
Server settings are identical, but there are differences in the architecture. UAT has 16 CPUs and 54 GB of RAM, while PROD has 24 CPUs and 96 GB of RAM. MAXDOP is set to 8 on both. PROD is clustered; UAT isn't.
Does anyone have a good script that takes a list of tables as an argument and outputs all statistics on the tables and their indexes?
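Something like this is what I have in mind (a rough sketch, SQL Server 2005+; the table list is just an example):

DECLARE @tables nvarchar(max);
SET @tables = N'W_PARTY_ORG_D,W_SRVREQ_F'; -- comma-separated list of tables

SELECT t.name AS table_name,
       s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated,
       s.auto_created,
       s.user_created
FROM sys.stats AS s
JOIN sys.tables AS t ON t.object_id = s.object_id
WHERE N',' + @tables + N',' LIKE N'%,' + t.name + N',%'
ORDER BY t.name, s.name;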
Gail, I'll try out your suggestions and let you know if it works. You are right about the redundant predicates. Also, if I remove the second one, the query completes in no time and gives a good execution plan. Unfortunately, the SQL is generated by the application (Oracle Business Intelligence, in case anyone is curious).
I can take a look at rejigging the metadata for the next release to avoid the redundant predicates, but I'm looking for a short-term solution.
I'll post any breakthroughs I get. Thanks again.
September 20, 2011 at 12:07 pm
M_E_K (9/20/2011)
You are right about the redundant predicates. Also, if I remove the second one, the query completes in no time and gives a good execution plan.
I suspected something like that....
The optimiser's good, but when there's correlation between two columns that are both filtered on, the plan can go bad fast.
Essentially the optimiser assumes that all columns are independent of each other. To take a car example, let's say we have a table of cars with a Make column and a Model column, and a query like this:
WHERE Make = 'Mazda' and Model = 'MX6'
Let's further say that the optimiser estimates 20% of rows will qualify for the first predicate (selectivity of 0.2) and 5% of rows will qualify for the second (0.05). It will then estimate that 1% of rows (0.2 * 0.05 = 0.01) qualify for both, even though all MX6s are Mazdas (so 5% of the table is actually returned).
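A quick repro sketch of that arithmetic (the table and numbers are entirely made up):

CREATE TABLE dbo.Cars
(
    CarID int IDENTITY(1,1) PRIMARY KEY,
    Make  varchar(30) NOT NULL,
    Model varchar(30) NOT NULL
);

-- 10,000 rows: 2,000 Mazdas (selectivity 0.2), 500 of them MX6s (0.05 overall)
WITH Numbers AS
(
    SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_columns a CROSS JOIN sys.all_columns b
)
INSERT dbo.Cars (Make, Model)
SELECT CASE WHEN n <= 2000 THEN 'Mazda' ELSE 'Other' END,
       CASE WHEN n <= 500  THEN 'MX6'   ELSE 'Misc'  END
FROM Numbers;

UPDATE STATISTICS dbo.Cars WITH FULLSCAN;

-- The actual plan estimates roughly 100 rows (10,000 * 0.2 * 0.05)
-- while the query really returns 500:
SELECT * FROM dbo.Cars WHERE Make = 'Mazda' AND Model = 'MX6';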
I'll stick that onto my to-blog list (for some time).
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 20, 2011 at 8:12 pm
Ok, thanks everyone for the excellent advice and suggestions.
I have finally found the variable between the two environments that triggers this problem. However, it's counter-intuitive to me, so if anyone can shed some light on the following, I would be much obliged...
I can trigger this problem at will by running an update statistics with full scan on W_PARTY_ORG_D:
UPDATE STATISTICS W_PARTY_ORG_D WITH FULLSCAN
after the fullscan update is done, I get the *bad* query plan consistently in both environments.
Then, if I run the default update stats (which I guess uses a sample)...
UPDATE STATISTICS W_PARTY_ORG_D
then try the query again, and I get split-second performance with the good query plan.
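The difference shows up in the stats header, Rows vs. Rows Sampled (I'm assuming here that the statistic carries the same name as the index mentioned earlier in the thread):

DBCC SHOW_STATISTICS ('dbo.W_PARTY_ORG_D', W_PARTY_ORG_D_D5) WITH STAT_HEADER;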
None of the other tables in the query seem to be affected by whether stats are updated with fullscan or the default sample; it is really just this table that triggers the behavior.
To me this doesn't make sense... I would expect the optimizer to choose a better plan after a fullscan stats update, but it chooses the better plan only after the default sample.
Has anyone seen this behavior before?
I know that if you asked 10 SQL DBAs you could get 10 different opinions and reasons why, but in general, for an OLAP DB with large tables, is it better to update stats with fullscan or with a sample?
Thanks again everyone!
P.S.... I don't see a way to award points in this forum, so if there is and I'm missing it, just let me know how to do this 🙂
September 20, 2011 at 8:34 pm
Nah, no points, don't sweat it.
Basically, you're hitting a situation that's not far removed from parameter sniffing. Your query is better off with an averaged, sampled set of values than with a more specific, fully scanned set. It's not unheard of, but this seems a little extreme. I still think that with some restructuring of your indexes this will resolve itself in a better fashion.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
September 20, 2011 at 8:44 pm
My 2 bits - I'd add the indexes that Grant, Remi, and Gail have been suggesting.
I am curious why the indexes can't be added. Policy? Red tape?
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
September 21, 2011 at 5:36 am
SQLRNNR (9/20/2011)
My 2 bits - I'd add the indexes that Grant, Remi, and Gail have been suggesting.
I am curious why the indexes can't be added. Policy? Red tape?
Thanks for the unwarranted credit ;-).
And yes, I'd keep chugging on those indexes, especially since you seem to be right on a tipping point.
September 21, 2011 at 5:41 am
Grant:
Thanks for your feedback. I still have a feeling we are hitting some bug, and I'm tempted to raise an SR with Microsoft. Even with the poor execution plan, I don't think it should result in 24 CPUs spiking and the query never finishing... Anyway, your insights have been much appreciated.
SQLRNNR:
A strict change control policy 🙂 (which is a good thing really)
We'll definitely evaluate the suggested index changes, but I think you have to be careful. In our case there is this one poorly performing query, but overall the system performs very well. I don't want to introduce something that fixes this query but causes 10 others to start running slowly.
September 21, 2011 at 5:46 am
I don't think this is a bug.
You say one query hits the CPUs at 30%. It takes 3-4 of them to kill the server.
I've seen a powerful server (way overkill for the needs) be murdered for hours by a single query that was working fine just 2 days ago (a mini dashboard in SSRS).
I agree it's annoying, but it's also your job to help the optimizer do a better job. You might call PSS if you're really sure you have a new edge case, but I think it's a wasted $300 at this point.
September 21, 2011 at 6:43 am
M_E_K (9/21/2011)
Grant: Thanks for your feedback. I still have a feeling we are hitting some bug, and I'm tempted to raise an SR with Microsoft. Even with the poor execution plan, I don't think it should result in 24 CPUs spiking and the query never finishing
You're not hitting a bug. You're hitting a case where poor estimates (caused by the optimiser assuming something that is not true) are causing a massively sub-optimal plan.
The query will finish. Eventually. I've seen a similar case where the query went from 45 minutes to not finishing in 8 hours. Purely a plan change caused by an incorrect estimation of the rows involved in the query.
Going forward, to fix this properly you need to tune your indexes (they are not as optimal as they could be, and the changes I suggested are highly unlikely to have a noticeable impact on the other queries) and remove the redundant predicates (which are at the root of the problem, for the reasons I explained).
The less accurate statistics are helping in this case by producing a higher-than-accurate estimate of the row counts, which counters the optimiser's assumption that the predicates are independent. The next time the stats update (automatically or on a schedule), that may change and you'll have the query back to running for a few days.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 21, 2011 at 1:50 pm
This thread was an awesome read!
@M_E_K, if this query is consistently pegging your DB instance to the point where it affects other activity, consider a defensive move... add a MAXDOP hint to the problem query to prevent it from saturating all available CPUs.
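For illustration, the hint goes on the statement itself (the query here is a stand-in; 4 is just an example value below the instance-level MAXDOP of 8 mentioned earlier):

SELECT COUNT(*)
FROM OLAP.dbo.W_SRVREQ_F
OPTION (MAXDOP 4); -- caps this statement at 4 parallel workers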
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
September 21, 2011 at 5:04 pm
opc.three (9/21/2011)
This thread was an awesome read!
@M_E_K, if this query is consistently pegging your DB instance to the point where it affects other activity, consider a defensive move... add a MAXDOP hint to the problem query to prevent it from saturating all available CPUs.
That would be a second-to-last resort.
Only after the indexes have been tuned.
September 21, 2011 at 5:08 pm
Ninja's_RGR'us (9/21/2011)
opc.three (9/21/2011)
This thread was an awesome read!
@M_E_K, if this query is consistently pegging your DB instance to the point where it affects other activity, consider a defensive move... add a MAXDOP hint to the problem query to prevent it from saturating all available CPUs.
That would be a second-to-last resort.
Only after the indexes have been tuned.
I was thinking more along the lines of "until the indexes can be tuned"...
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato