Huge speed difference with clause "If exists"

  • I have a fairly simple stored procedure that checks to see if a value is present in a view. It's horribly slow when it's executed one way, and reasonably quick another way. I don't understand why there is such a difference.

    Here they are:

    Slow:

    if exists (select 1 From BotCBO.vwClovekStrings_PrZk Where ClovekString = '[gmelin]')
        select 1 As Existuje
    else
        select 0 As Existuje

    Fast:

    select 1 From BotCBO.vwClovekStrings_PrZk Where ClovekString = '[gmelin]'

    In production, the literal text string is a variable supplied to the stored procedure (a rough sketch of that call follows the view definition below), but the speed difference is the same, variable or literal. The core is the select from the view. The view itself is a fairly inefficient grouping view, shown below, which I would love to make persisted for speed, but SQL Server doesn't allow such a view to be persisted. It isn't executed very often, so that doesn't matter all that much, and anyway, my problem is not the view, but the enormous difference in how it's called. When called alone, it's fairly quick (about one second), despite the inefficiency and the dataset size, but when bundled inside the Exists clause, it can take up to a minute for the same results. What's going on?

    ALTER VIEW [BotCBO].[vwClovekStrings_PrZk]
    WITH SCHEMABINDING
    AS
    SELECT VLTO.SkupinaID,
           CAST(REPLACE(REPLACE(REPLACE(STUFF(
                   (SELECT ',' + RTRIM(LTRIM(ISNULL(BPAuthorAbbreviation, '')))
                      FROM dbo.SouhrnyCloveks VLTI
                      LEFT JOIN (SELECT ClovekAutoID, BPAuthorAbbreviation
                                   FROM dbo.TableOfCloveks) tCN
                        ON VLTI.ClovekAutoID = tCN.ClovekAutoID
                     WHERE VLTI.SkupinaID = VLTO.SkupinaID
                     ORDER BY VLTI.SkupinaID, VLTI.Poradi
                       FOR XML PATH('')), 1, 1, ''),
                 '&amp;', '&'),   -- undo the XML encoding of ampersands
                 ',&,', ' & '),
                 ',,,', ', ') AS varchar(200)) AS ClovekString
      FROM dbo.SouhrnyCloveks VLTO
     GROUP BY VLTO.SkupinaID
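
    In production it is wrapped in a procedure with a parameter, roughly like this (a sketch only - the real procedure and parameter names are different and not shown here):

    -- Sketch of the production call; the procedure and parameter names are made up.
    CREATE PROCEDURE BotCBO.upClovekStringExistuje
        @Hledany varchar(200)
    AS
    BEGIN
        IF EXISTS (SELECT 1 FROM BotCBO.vwClovekStrings_PrZk
                   WHERE ClovekString = @Hledany)
            SELECT 1 AS Existuje
        ELSE
            SELECT 0 AS Existuje
    END
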
  • Get and compare the execution plans for the two queries. That's your best bet when the question "What's happening with this query" comes up.

    Also, why XML for an existence check? You're chewing up a lot of memory to find out if there's a row from a query.
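
    If all you actually need is the yes/no, a direct check against the base tables would skip the concatenation entirely. Something like this (just a sketch based on the view's definition, and note it tests for the abbreviation itself rather than for the whole concatenated ClovekString, so it's not an exact substitute):

    -- Sketch only: checks whether the abbreviation exists at all, which is NOT
    -- the same as matching a complete concatenated author string like '[gmelin]'.
    IF EXISTS (SELECT 1
                 FROM dbo.SouhrnyCloveks sc
                 JOIN dbo.TableOfCloveks tc
                   ON tc.ClovekAutoID = sc.ClovekAutoID
                WHERE tc.BPAuthorAbbreviation = 'gmelin')
        SELECT 1 AS Existuje
    ELSE
        SELECT 0 AS Existuje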

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I looked at the execution plans, and of course, they are different, but I'm not that good at reading them. Or more properly, I can read them okay, but understanding all the details of what they mean for performance sometimes still leaves me with a blank look on my face, despite the numerous tutorials and videos (many of yours among them, thank you, BTW) that I have read, studied, watched, experimented with, etc.

    SSMS suggested I put an index on the ID field, so I tried that, and it did help. But I don't like just slapping an index on without understanding what I'm doing and why I'm doing it. It's too easy to just plaster them all over the place, and then I'll get stuck later on wondering why I have all those indexes and whether they are important (and probably, why my insert performance is in the toilet).
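
    For reference, the suggestion was roughly of this shape (the index name and column list here are only my guess at what SSMS proposed; it flagged the ID column):

    -- Hypothetical shape of the SSMS missing-index suggestion; the actual
    -- column list is not shown in this post.
    CREATE NONCLUSTERED INDEX IX_SouhrnyCloveks_ClovekAutoID
        ON dbo.SouhrnyCloveks (ClovekAutoID)
        INCLUDE (SkupinaID, Poradi);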

    The XML format is used to assemble names. Each principal item in the database has an associated string of zero or more names, from a table of names, in a particular order. The order is not important for this particular query (the one giving me the speed problems), but it is important in normal operation. (It's a list of authors for a botanical database - each taxon has a set of authors, and I need to retrieve that set in the order given in the taxon's published information.) I'm using this view because I already had it done, and it was fast enough for my needs, until I stuck it inside the Exists test.

    I can write a faster procedure, using just the IDs, and I may have to if I can't get this fixed, but I'd still like to understand why such a seemingly simple thing as wrapping the select from a view into an Exists clause causes such massive performance degradation, when the view itself is fairly quick.

    I can send along the execution plans, if you're willing to look at them and maybe point out what obvious thing I'm overlooking.

  • The real underlying problem here is that you're trying to filter on an aggregated (XML concatenation IS a form of aggregation) column, which is never a good idea and is one of the problems I have with people using views.  To be clear, the ENTIRE view must be materialized for your filter query to work.  There has to be a quicker way to isolate what you're looking for.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Certainly, this is not a model of efficiency - I wrote that myself in the initial post.

    However, when I execute the select alone, the entire view is materialized and the whole statement (which returns either a single "1" or an empty recordset) is completely done executing within a second or so. Given that, why does the enclosing If Exists cause such an enormous delay?

    All I want from the If Exists is to tell me whether the embedded select does or does not return a record. I can tell that myself, at a glance, when the embedded select finishes - again, within a second or so. The select that does all the (admittedly unnecessary) work is quick. If Exists should be pretty much instantaneous after the embedded select finishes.

    If I can tell manually, in a fraction of a second, whether the result set does or does not contain a single record, it is borderline insanity for the SQL Server engine to need around a minute to perform that same task.

  • If you can post the plans, do that (SentryOne Plan Explorer can make them anonymous).

    It's likely that Jeff has nailed it (Jeff is right a lot). Most of the time, a query is just a query. Even if it has a common table expression (not the recursive ones, different story there), a correlated sub-query, or most anything else, the optimizer resolves it all as a query. So, your straight XML query resolves fast. But, now we're adding an additional filter to the outside of that query and that query returns as XML. So, it has to do all that work in two steps (the plan likely shows this), with late filtering after it resolves all the data.

    The horror of T-SQL is that it really doesn't lend itself well to code reuse. The view you have does what you want, or close enough, so why not use it with some additional filters? That's how any programming language would work. However, T-SQL isn't programming and what's going on is the optimizer is failing to see the ease of your logic and doing all sorts of crazy stuff to resolve it.
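
    For what it's worth, you can make those two steps explicit yourself by spooling the view into a temp table and testing that (a sketch, not tested against your schema; it still pays the full cost of the view once, but the existence check afterwards is trivial):

    -- Sketch: materialize the view once, then do the cheap existence test.
    SELECT ClovekString
    INTO   #strings
    FROM   BotCBO.vwClovekStrings_PrZk;

    IF EXISTS (SELECT 1 FROM #strings WHERE ClovekString = '[gmelin]')
        SELECT 1 AS Existuje
    ELSE
        SELECT 0 AS Existuje;

    DROP TABLE #strings;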

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks, that explains it. I thought, as you write, that the engine would evaluate the inner select individually and then apply the outer If Exists. But if it's generating an entirely new plan for the query as a whole, then all bets are off. Come to think of it, both you and Steve told me in person at a SQL in the City event several years ago that code reuse was often problematic. I've gotten it to work sometimes, but sometimes I run into stuff like this that, on the surface, seems to make absolutely no sense.

    Three plans enclosed - one with the index that SSMS suggested. It helps, but it's still noticeably slower than the bare select from the view. I guess I'll have to get to work and write the entire query properly from scratch.

  • Looking at the "Plain" version of the execution plans, it would appear that the IF EXISTS is trivializing the estimated row counts to <2 in most cases.  That is preventing the parallelism that shows up when you run the inner query separately.  There's also a substantial difference in the memory grants for the overall queries.

    I was going to suggest that it could be the fault of the new "Cardinality Estimator" but I see you're using 2008, long before the new CE came to bear.

    The only reason the index helped a little is that it gave the engine narrower information to look at, so less data was read.  It actually made things a bit worse in that it did tens of thousands of individual seeks on the one index instead of just doing the much-needed scan.

    The trivialization of the row counts by IF EXISTS might also be the reason why the query optimizer elected to do 44 thousand RID lookups (when the estimate was only 18 executions), and those are as bad as doing tens of thousands of individual index seeks.

    Speaking of wicked smart people, I can't figure out from the execution plans why the IF EXISTS trivialized the expected row counts.  I'm not sure anyone could, but Grant is much smarter than I am in the area of execution plans.  If he can't figure it out, then it's going to take a Ninja from MS (or a super-internals wire-head like Paul White) to suss it.

    The real key here is, I wouldn't waste my time.  I'd find a different way that doesn't require a view of more than 44 thousand rows to materialize even if it did run "quickly" (and, to be sure, 1+ seconds for such a thing is not "quickly" nor resource friendly IMHO).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I think you're giving me WAY too much credit on this one Jeff. Let me look the plans over...

    I'm not sure.

    I assume that the EXISTS operator must be pushing down a row estimate, because all it takes is a single match and EXISTS stops executing. So instead of hash joins, it's doing loops so that it can get one row quickly... and then it ends up having to process all 34,096 rows anyway.
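
    If you want to see that row goal on its own, compare the plan for the bare view query against one like this; the TOP (1) will usually get the same low estimates and loop joins that the IF EXISTS form does (a test query only, not a fix):

    -- Test query only: TOP (1) hands the optimizer the same "one row is enough"
    -- goal that EXISTS does, so the plan shape should look similar.
    SELECT TOP (1) 1 AS Existuje
    FROM   BotCBO.vwClovekStrings_PrZk
    WHERE  ClovekString = '[gmelin]';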


    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Jeff Moden wrote:

    1+ seconds for such a thing is not "quickly" nor resource friendly IMHO

    True, one second is not great, but for my purposes, in this application, it was good enough - it doesn't get used all that much, and there are other things that need my attention more.

  • Thank you both for looking at this. Another spanking for me by SQL Server - what else is new? But it's not going to be that much work to make a new procedure that does a clean lookup. I was just trying to save myself a bit of effort.

    That didn't work out, but I learned something, so it's by no means a total loss.

    Cheers.

  • Grant Fritchey wrote:

    I assume that the EXISTS operator must be pushing down a row estimate because all it takes is a single match and EXISTS stops executing. So instead of hash joins, it's doing loops so that it can get one row quick... and then ends up having to parse all 34,096.

    That totally agrees with my thoughts on it.  I just don't know how to prove it in the execution plan and, with that, I'll also admit I've not looked at the XML behind the scenes to find out.  To be honest, I'm not sure it's worth the effort because, even if one could find it, I don't believe that someone could find a way around it even with the "result set expectation setting" trick of using TOP 2 Billion in the inner query.  Even if that worked, you would still have an extremely resource-intensive bit of code to return, what... one row?
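
    For anyone curious, that trick would look something like this (a sketch only; as I said, it may well make no difference here):

    -- Sketch of the "TOP 2 billion" trick: the huge TOP is meant to talk the
    -- optimizer out of its one-row expectation.  It may not change anything.
    IF EXISTS (SELECT TOP (2000000000) 1
                 FROM BotCBO.vwClovekStrings_PrZk
                WHERE ClovekString = '[gmelin]')
        SELECT 1 AS Existuje
    ELSE
        SELECT 0 AS Existuje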

    PDanes has stated that the code is used very infrequently.  You know how that goes against my nature, but I also agree that sometimes there are bigger, more important fish to fry.  My personal problem would be that it would always be in the back of my mind once I'd found such a problem.  Someday, I'll get over that... maybe... 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • pdanes wrote:

    Thank you both for looking at this. Another spanking for me by SQL Server - what else is new? But it's not going to be that much work to make a new procedure that does a clean lookup. I was just trying to save myself a bit of effort.

    That didn't work out, but I learned something, so it's by no means a total loss.

    Cheers.

    Heh... SQL Server is why I keep a book in my britches.  😀 And absolutely correct.  I'm pretty sure that Grant will agree that we both learned something new (we like to avoid SQL spankings, as well).  I've never seen this particular issue before.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yeah, first time I've seen this kind of filtering done against an XML result set. It actually worked WAY better than I would have imagined.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I'm still unclear about something, though. Grant, you wrote "...all it takes is a single match and EXISTS stops executing...", which is exactly what I thought EXISTS does, which is why I used it here. But in the next sentence, you write "...and then ends up having to parse all 34,096..."

    Why does it "end up having to"? Why doesn't it quit as soon as one record is found? Thinking more about it now, it seems that wrapping the EXISTS function around the select should make it run faster, not slower.

    The bare select necessarily materializes the entire view: "Select 1 From x Where something". This should produce a record with a 1 in it for EVERY instance that matches the Where condition in the materialized view, possibly a very large number, theoretically, up to the entire dataset. But adding the EXISTS function should make it quit as soon as one match is found, ESPECIALLY if SQL Server is generating an entirely new plan that takes the EXISTS function into account right from the start.

    Jeff, you mentioned that the slow version does not go parallel, which I also don't understand. Does parallelism negate the ability to stop when a condition is met? That is, cannot ANY ONE of the parallel threads announce success, with the result that all threads of the entire query are then halted, because the query as a whole has what it wanted?

    I'm still confused.
