Unique Index Causing Slow Read Performance

  • Hi All

    This one's for the gurus out there.

I am a pretty darn experienced SQL developer. This problem cuts right to the root of my beefs with SQL, but it also has the potential to open up major new tuning skills if it can be solved. I hope the answer isn't simply that I'm using SQL Server 2000 for this (as are many of our clients) and it's a bug.

    OK, on with the scenario.

    I have a fairly big query (short SQL but involving about 5 views or nested sub-views, probably about 15-20 tables in all), taking about 17 seconds to run on my unloaded dev PC.

Mucking around with it a bit, I found that removing one seemingly inconsequential table from the query improved response to 4 seconds. Note that production machines at client sites are likely to take substantially longer due to load, so this improvement could make the difference between 2 mins and 25 sec for a user. Ah-ha, I thought, poor indexing on the table.

    After considerably more experimentation, I found that no, it wasn't a lack of, or incorrect, indexing. Rather, adding this table into the query was causing a major change to the execution plan, which was resulting in the much slower performance. There was no clear way to jog the (quite complex) execution plan back in the speedier direction.

    After leaving this alone for several days, I came back to it and noticed that there was a UNIQUE constraint on the table in question.

    Let's get specific here:

CREATE TABLE [tblSummLocation] (
    [SummaryXID] [numeric](19, 0) NOT NULL,
    [LocationID] [int] NOT NULL
) ON [PRIMARY]
GO

    This is a 'glue' or 'many-to-many' table, storing relationships between two other tables.

    Here is the indexing:

ALTER TABLE [tblSummLocation] WITH NOCHECK ADD
    CONSTRAINT [PK_tblSummLocation] PRIMARY KEY CLUSTERED
    ([SummaryXID], [LocationID]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO

ALTER TABLE [dbo].[tblSummLocation] ADD
    CONSTRAINT [IX_tblSummLocation_SummaryXID] UNIQUE NONCLUSTERED
    ([SummaryXID]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO

CREATE INDEX [IX_tblSummLocation_LocationID] ON [tblSummLocation]([LocationID]) ON [PRIMARY]
GO

    I noticed when I dropped the unique constraint [IX_tblSummLocation_SummaryXID] and replaced it with

CREATE INDEX [IX_tblSummLocation_SummaryXID1] ON [dbo].[tblSummLocation]([SummaryXID]) ON [PRIMARY]
GO

    we got back to the 4 second response.

This table was one of a group of about 12 tables of very similar design, but is different from most of the others in that the SummaryXID is actually unique (in the other tables it's only the combination of the two columns that is unique). So I realised at this point that I could simplify things and just make SummaryXID the PK:

ALTER TABLE tblSummLocation DROP CONSTRAINT PK_tblSummLocation
ALTER TABLE tblSummLocation DROP CONSTRAINT IX_tblSummLocation_SummaryXID
GO

ALTER TABLE [tblSummLocation] WITH NOCHECK ADD
    CONSTRAINT [PK_tblSummLocation] PRIMARY KEY CLUSTERED
    ([SummaryXID]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO

    whoops ... back to 17 seconds.

    So, to cut a long story short, whenever I have a unique index defined on SummaryXID on this table, whether it be an index, constraint or PK, I get the 17 seconds response.

    With the composite PK and a nonunique index, it's 4 seconds.

    So the questions that have been bugging me are:

(1) Why is this so much slower? I can appreciate that insertions and updates might be substantially slower under a unique index, but I wouldn't imagine reads would be any different. Perhaps even faster.

(2) If, as I suspect, the change is not actually to do with the indexing per se, but with the execution plan being mangled, how can I jog it back to the faster setting?

    I have tried table hints in the underlying query specifying WITH (INDEX (IX_tblSummLocation_SummaryXID1)) for the table, but no dice.
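For the record, the attempt looked roughly like this (heavily abbreviated; the real statement goes through the views, and the WHERE clause here is just a placeholder):

SELECT sl.SummaryXID, sl.LocationID
FROM tblSummLocation sl WITH (INDEX (IX_tblSummLocation_SummaryXID1))
WHERE sl.LocationID = 1   -- placeholder predicate, not the real filter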

    I also mucked around with JOIN hints, specifying the join algorithms, but that was a desperate and silly idea to begin with and didn't get anywhere.

    I really need to speed this up, so my only option at present is to drop the UNIQUE constraint on the column. I simply don't want to do that, as it circumvents a business rule of the database.

    I find it really frustrating when I know the query can run faster, and I even know exactly what is slowing it down and where, but I just don't seem to have the control over the execution plan to make it run the quick way.

    Ben

It's all down to the query plans...

See this article on how to post them:

    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/



    Clear Sky SQL
My Blog

  • Need execution plans to say anything useful here.

    What's the output of STATISTICS IO and STATISTICS TIME for the query with the unique index and the one without?
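If you haven't used them before, they're just session settings; turn both on, run the query, and post what comes back on the Messages tab:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO
-- run the problem query here; logical reads per table and CPU/elapsed
-- times are printed as messages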

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for link. I'll post the plans in another day or so.

  • OK, here are the plans and a database diagram (attached).

    I have simplified the query a bit, but unfortunately as I take bits out, it all runs fast and the problem disappears.

    Hopefully you can work out the difference between the plans without having to know too much else. The index on tblSummLocation is definitely one of the triggers of the major change in plans (removing bits of the query also does this).

I simplified a bit before, so I'll give the details again below:

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[tblSummLocation]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
    drop table [dbo].[tblSummLocation]
GO

CREATE TABLE [dbo].[tblSummLocation] (
    [SummaryXID] [numeric](19, 0) NOT NULL,
    [LocationID] [int] NOT NULL,
    [BayID] [int] NOT NULL
)
GO

    Have a look at the diagram to get a feel for what's happening.

    Note that all the 'ID' columns in the DB are int, and all the 'XID' columns are [numeric](19, 0).

This is a stock control database and uses an abstract SummaryXID to represent a group of stock, so we can apply characteristics to anything from a pallet down to individual cartons or pieces of fruit. This means records only need to be broken down to the extent required rather than to the smallest unit (clients pack 10 million cartons a year). Characteristics (such as product type, location and pack type) are all applied by linking through the 'Summ' tables (tblSummProduce, tblSummItems, tblSummLocation, ...).
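To illustrate the shape of it (every name here except tblSummLocation is hypothetical, since only that table has been scripted so far), reading one characteristic for a group of stock is a join like:

SELECT s.SummaryXID, l.LocationID
FROM tblSummary s                                           -- hypothetical parent table
INNER JOIN tblSummLocation sl ON sl.SummaryXID = s.SummaryXID
INNER JOIN tblLocation l ON l.LocationID = sl.LocationID    -- hypothetical lookup table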

    We're only talking 50,000 rows here.

    This indexing is applied in the 'Unique' case:

ALTER TABLE tblSummLocation DROP CONSTRAINT PK_tblSummLocation
GO
ALTER TABLE tblSummLocation DROP CONSTRAINT IX_tblSummLocation_SummaryXID
GO
DROP INDEX tblSummLocation.[IX_tblSummLocation_LocationID]
GO
DROP INDEX tblSummLocation.[IX_tblSummLocation_BayID]
GO
DROP INDEX tblSummLocation.[IX_tblSummLocation_SummaryXID1]
GO

ALTER TABLE [dbo].[tblSummLocation] WITH NOCHECK ADD
    CONSTRAINT [PK_tblSummLocation] PRIMARY KEY CLUSTERED
    ([SummaryXID], [LocationID]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO

ALTER TABLE [dbo].[tblSummLocation] ADD
    CONSTRAINT [IX_tblSummLocation_SummaryXID] UNIQUE NONCLUSTERED
    ([SummaryXID]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO

CREATE INDEX [IX_tblSummLocation_BayID] ON [dbo].[tblSummLocation]([BayID]) ON [PRIMARY]
GO

CREATE INDEX [IX_tblSummLocation_LocationID] ON [dbo].[tblSummLocation]([LocationID]) ON [PRIMARY]
GO

CREATE INDEX [IX_tblSummLocation_SummaryXID1] ON [dbo].[tblSummLocation]([SummaryXID]) ON [PRIMARY]
GO

And the same in the 'NonUnique' case, but with the unique constraint commented out:

--ALTER TABLE [dbo].[tblSummLocation] ADD
--    CONSTRAINT [IX_tblSummLocation_SummaryXID] UNIQUE NONCLUSTERED
--    ([SummaryXID]) WITH FILLFACTOR = 90 ON [PRIMARY]
GO

    Nonunique: 5 sec. Unique: 17 sec.

    The actual query is

SELECT vSes_PivotA.*, vrpIPD.*
FROM (vSes_PivotA_6385 vSes_PivotA
    INNER JOIN vrp_ItemProduceData1 vrpIPD ON vSes_PivotA.SP_SummXID = vrpIPD.SummaryXID)
WHERE vrpIPD.RecType = 0;

    but as I have said, that's a long piece of string (about 6 views and 20 tables); I'm really only interested in why the unique index changes the query plan so profoundly.

Happy to script up any more objects if you need them.

    cheers

    Ben

  • STATISTICS IO and STATISTICS TIME

    See Attachments

  • Just wanted to say thanks in advance, I really appreciate any time anyone spends on this one.

    I've combed the internet for years over problems like this and been completely unable to find any resource to help.

Perhaps this is the point of initiation into the undocumented 'black arts' side of SQL?

3:52 am? I'm in Australia. It's 12:30 in the arvo over here 🙂

    B

  • Hmmm ....

    The silence in this thread is almost deafening.

    Was it what I said about being Australian ? Those bloody 'Flight of the Conchords' guys have been giving us a bad name ...

    I assume everyone's been going 'OMG ... LOL ... he wants us to look at THOSE query plans !?!? Maybe when I can use 6pt font on a 24" monitor !'

    Yeah, they are big. Unfortunately, when I make them smaller the problem disappears.

    Anyway, the PROBLEM IS SOLVED.

    In final desperation I got around to firing up both these cases in SQL2005. Both come in at just over 1 sec.

    Read it and weep, folks (well, I almost did, after probably 14 hours spent trying to optimise this single query, including forum posting time).

    So the final analysis: SQL 2000 has problems drawing up good query plans for complex queries. I would rate this as a bug, and I'm pretty darn certain that there is no workaround for it (in terms of hints or tuning). I tried everything.

    SQL Server 2005 fixes this problem.

    I'm currently in the process of contacting all our clients and recommending the upgrade to SQL2005 (even the express version would outperform the paid SQL2000 on this database by the looks).

I've always been a bit slow on the upgrade, with an 'if it ain't broke, don't fix it' attitude. Particularly since the SQL2005 Management Studio is woeful compared to the old tools (Enterprise Manager); it alone cuts my productivity by a good percentage.

    But anyway, it IS broke, so here I go.

    cheers

    Ben McIntyre

  • ben.mcintyre (9/3/2009)


    Hmmm ....

    The silence in this thread is almost deafening.

I was going to dig into this over the weekend. Been rather busy prepping and delivering a presentation and then catching up with real work.

    So the final analysis: SQL 2000 has problems drawing up good query plans for complex queries. I would rate this as a bug, and I'm pretty darn certain that there is no workaround for it (in terms of hints or tuning). I tried everything.

    SQL Server 2005 fixes this problem.

2005 is better than 2000, but it's still not perfect. The optimiser in 2000 could make some really dumb decisions, but then so can the 2005 one. When that happens, often the best approach is 'divide and conquer': split the query up, use temp tables, that kind of thing. Sometimes helps, sometimes doesn't.

Do you still want this investigated?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
1) It doesn't have to be a bug if the optimizer for 2000 derives a suboptimal query plan for a complex query involving 15 tables in a mishmash of views.

2) Gail is correct in that it can often truly benefit the optimizer to subdivide large queries such as this into smaller interim steps using temp tables (NOT table variables), because each step along the way can then have its own set of statistics, which prevents small skews from becoming large query performance problems.
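As an untested sketch of that pattern, using the views from the query posted earlier (whether this particular split actually helps here is something only testing will show):

-- Step 1: materialise the pivot view; #pivot then gets its own statistics
SELECT *
INTO #pivot
FROM vSes_PivotA_6385;

-- Step 2: the optimiser now only has to plan a simple two-way join
SELECT p.*, ipd.*
FROM #pivot p
INNER JOIN vrp_ItemProduceData1 ipd ON p.SP_SummXID = ipd.SummaryXID
WHERE ipd.RecType = 0;

DROP TABLE #pivot;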

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (9/4/2009)


    1) it doesn't have to be a bug if the optimizer for 2000 derives a suboptimal query plan for a complex query involving 15 tables in a mishmash of views.

Agreed. The optimiser's job is not to find the best plan. Never has been. Its job is to find a good plan fast, and it's only allowed a certain amount of time to find said plan. If it doesn't identify a 'good enough' plan by the time the limit is up, it'll return the best plan found up to that point. Which can be very bad indeed.
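(As a side note, and only once you're on 2005: the XML showplan records when this happens; nothing equivalent is exposed in 2000's plans, as far as I know.)

SET SHOWPLAN_XML ON;
GO
-- submit the query here; it is not executed, but the estimated plan comes
-- back as XML, and its StmtSimple element carries
-- StatementOptmEarlyAbortReason="TimeOut" if the optimiser ran out of time
GO
SET SHOWPLAN_XML OFF;
GO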

    2) Gail is correct in that it can often truly benefit the optimizer to subdivide large queries such as this into smaller interim steps using temp tables (NOT table variables) because each step along the way can then have it's own set of statistics that prevents small skews from becoming large query performance problems.

    Also because the optimiser gets to work on several smaller queries rather than one massive great big one. The search-space for the plans is smaller (much smaller) and the chance that it'll find a good enough plan is a lot better.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks, again, for your replies.

Yes, I was worried that I might be swapping one set of fast and slow plans for a new, different set of fast and slow plans on the basis of one which just happened to be fast in SQL2005.

The main reason I converted to SQL2005 was to try to give you the more 'user friendly' set of plans, but clearly I can't do that.

    Gail, I am still interested in what is going on here, specifically that the single change in an index from unique to nonunique can flip the query plan on its head.

    I'm also interested in how you analyse the plan for this. I suspect there is not any way to 'nudge' the query plan back, but if you worked one out I would be most impressed. This is probably about as complex as it gets, so if it can be fixed, I'd guess just about anything could be. That would be a trick worth knowing.

    But obviously, not urgent.

    If you feel it's a better use of time, I could give you my skype ID and do a verbal rundown rather than try and type it all.

    Re the recent comments (TheSQLGuru), I understand that this is not strictly a 'bug'.

    The point is that the optimiser CAN derive a good plan, and even does derive a good plan for the tables and the views (ie. indexing is adequate, it's not a 'bad design' issue), but change this one index and it all goes down the gurgler for no apparent reason.

    We can't expect that the optimiser will _always_ find a good plan, but when it doesn't it would be great to be able to feed it some hints to nudge it in the right direction.

But it seems that the hints are exactly that - only hints which may be ignored - and what I'm trying to find out is whether there is some way to do this more reliably rather than hint and pray, ie: am I missing something?

    I have tried the hints which would be logical here with no success.

Is there any way of giving the optimiser a bit more time to work out its plan?

    I realise the temp tables are a viable workaround (and one I have used in places), but this is for reporting and it means rather than simply constructing a query for a report I have to deal with instantiating and dropping possibly several temp tables with dynamic names (since I have multiple sessions and each name needs to have the sessionid added to the end).

    It raises the complexity of things greatly and given I have about 200 reports in the database, you can understand what this means in terms of time.

    Now, if only I could get that pesky query plan to work ...

    Ben

  • >>but change this one index and it all goes down the gurgler for no apparent reason.

    Maybe it simply isn't apparent to you. You have changed the information and options available to the optimizer in a very complex construct. It does what it can with that information.

    >>We can't expect that the optimiser will _always_ find a good plan, but when it doesn't it would be great to be able to feed it some hints to nudge it in the right direction.

    But it seems that the hints are exactly that - only hints which may be ignored - and what I'm trying to find out is whether there is some way to do this more reliably rather than hint and pray, ie: am I missing something ?

    I have tried the hints which would be logical here with no success.

Hints are not a panacea, and again, just because they may seem logical to you does not mean they will be so 'logical' to the optimizer engine. 🙂 BTW, have you tried unwinding all of the views in your query and making them into a single statement, and trying to work from that from a 'hint' perspective?

    >>Is there any way of giving the optimiser a bit more time to work out its plan ?

Not to my knowledge.

    >>I realise the temp tables are a viable workaround (and one I have used in places), but this is for reporting and it means rather than simply constructing a query for a report I have to deal with instantiating and dropping possibly several temp tables with dynamic names (since I have multiple sessions and each name needs to have the sessionid added to the end). It raises the complexity of things greatly and given I have about 200 reports in the database, you can understand what this means in terms of time.

    Actually I don't understand this. Why do you need dynamic temp table names with the sessionid added to the end of the names?? You can execute the same code in 50 different SSMS windows or 50 different calls of the same sproc at the same time without any temp table name collisions.
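For instance (table and column names invented for the example), every session that runs this gets its own private copy; tempdb uniquifies the internal name of a local temp table, so no sessionid suffix is needed:

CREATE TABLE #ReportStage (SummaryXID numeric(19, 0) NOT NULL);

INSERT INTO #ReportStage (SummaryXID) VALUES (1);
SELECT SummaryXID FROM #ReportStage;

DROP TABLE #ReportStage;   -- also dropped automatically when the session ends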

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • ben.mcintyre (9/4/2009)


    Gail, I am still interested in what is going on here, specifically that the single change in an index from unique to nonunique can flip the query plan on its head.

    I'm also interested in how you analyse the plan for this. I suspect there is not any way to 'nudge' the query plan back, but if you worked one out I would be most impressed.

    Same way you'd analyse any other plan, and same fix as for any other case of poor plans - rewrite the query, maybe add hints

    If you feel it's a better use of time, I could give you my skype ID and do a verbal rundown rather than try and type it all.

    I don't use skype, don't have the bandwidth to support it.

    But it seems that the hints are exactly that - only hints which may be ignored - and what I'm trying to find out is whether there is some way to do this more reliably rather than hint and pray, ie: am I missing something ?

In my experience, hints are directives that the optimiser has to obey. That said, with tonnes of views and extra tables, the hints will probably have to be applied in the views, not in the outer query, plus it will be very hard to work out an optimal set of hints, especially when working with just the ones available in SQL 2000.

    I have to deal with instantiating and dropping possibly several temp tables with dynamic names (since I have multiple sessions and each name needs to have the sessionid added to the end).

    Why? SQL gives temp tables dynamic names itself. There's no need for you to do its work for it.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
> Maybe it simply isn't apparent to you. You have changed the information and options available to the optimizer in a very complex construct. It does what it can with that information.

True, but I have changed only one thing: a unique index to a nonunique index. Now call me old-fashioned, but I believe there should be either no difference in how these are implemented for a read-only scenario, or the unique index should be more efficient, so this shouldn't cause a significant change in the plan.

    This is why I have the audacity to call it a bug.

The way I see it, the plan is big, but that's not necessarily the same as complex. Big things that can be broken down into small, simple chunks are not necessarily that complex, and the indexing and constraints are the fuel that drives the problem solving at a micro level. This should scale.

    I'm kinda wanting a little more consistency and control than I'm getting here.

    But I do acknowledge that I'm very much out on my own here, and I concede that you guys, being some of the most experienced around here, are clearly right. Thank you.

> In my experience, hints are directives that the optimiser has to obey. That said, with tonnes of views and extra tables, the hints will probably have to be applied in the views, not in the outer query, plus it will be very hard to work out an optimal set of hints, especially when working with just the ones available in SQL 2000.

    Totally agree - low likelihood a hint which is good for one query will be good for the next. I tried applying a hint to an inner view just as an experiment and it didn't work. I felt relieved once I'd got it out of there.

> have you tried unwinding all of the views in your query and making them into a single statement ...

This is a solution, but we are wandering OT a bit - let's get back to the point of the OP.

    I badly need my views here. Due to the size of the data chunks I'm dealing with I have to wrap them in views to stay sane. eg. Produce details are needed frequently, so there's a view wrapping the produce table and the associated brands, grades, classes, product type, varieties, pack types, pack weights, colours, etc (about 10 tables). If I had to deal with these individually in every query, my head would explode, and I would have to upgrade to that 26" monitor.

I'm one programmer and this is a large database (and we haven't even touched on the actual app ...).

    Similarly, with temporary tables for partial results, I generally need to make them part of a view 'upstream' to avoid horribly complicated queries.

    Can't do this with temp tables (they can't be incorporated into views).

    Now, I HAVE re-worked some critical reports with partial result tables etc, and it took nearly a day for about six reports, and it turned the code into a maintenance nightmare.

    Then I noticed that I could dramatically alter query speeds with a few little modifications, and as I mentioned in the OP, this was found to be due to the query plan not the indexing.

    But results were highly inconsistent (adding and removing seemingly inconsequential tables has an almost random effect), and I can never get back to my original query without the speed hit.

    So what I'm asking in this thread is can I nudge this query plan specifically WITHOUT significant changes to the structure of any of my current SQL, views, or application.

    Seems the answer is 'not reliably'.

    Anyway, I'm very happy with SQL 2005's performance in this regard (so far ...), and believe this thread is pretty much wrapped up.

    If you still want to compare the query plans and see if you can work out what has caused the shift, I'd be intrigued.

I would love it if I could have more control, but I suspect that it would require some sort of grammar/language for describing join behaviour far more sophisticated than what is there at present.

It is interesting, though: as we inevitably move towards RAM-based database servers and work on the 'object/relational impedance mismatch', I think a well-written API-interactive query parser/optimiser is going to be a major part of the solution in the long term.

    Regarding the original reasons for the session-related views/tables, it took me a while to remember the reasoning - this project was coded mainly in 2001-2003.

I believe I originally used the sessionIDs because I had SPs to do the very complex charges calculations, and these used dynamic SQL. The cursors had to be global to be visible to the dynamic SQL, which meant one SP per session. I also wanted to use the temp views to feed selected data into the charge calculation SPs, and because temp tables/views aren't visible from dynamic SQL, these again needed to be declared as non-temp and hence needed the SessionID.

(Hey, please, I know this site is pretty much dedicated to the hunting down and elimination of dynamic SQL. I know what RBAR is and who Jeff Moden is. I respect and agree with all of that. The dynamic SQL is actually quite necessary 🙂 )

Add to this the fact that you can't reference temp tables in views, and that pretty much clinched it.

    Thanks for the reminder though - true temp tables could be a useful tool to throw in the mix.
