sql performance question

Question

sql performance question

samantha.sqldba

SSC Journeyman

Points: 80
More actions
November 28, 2018 at 11:44 pm

#369203
Hi All,
Recently, I have undergone a computer based exam where in they asked below question.Need some inputs from experts.
1. Below correlated query is causing performance issue. Please explain why it is causing performance issue and optimize the query.
SELECT DISTINCT col1 FROM tab1 WHERE col2 = 'A' AND NOT EXISTS (SELECT 'x’ from tab2 where tab2.col2 = 'B’ and tab2.col1 = tab1.col1)
I don't any idea on how many rows each table has or what indexes does exist on the tables. Looking at the above question can anyone tell this is a bad query or how can we re-write the query to perform better ?
Thanks,
Sam

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Thom A SSC Guru Points: 99126 More actions · Answer 1

Maybe the DISTINCT, which just gets more expensive the more rows and columns you have? More than likely the lack of indexes which we have no details on. I can't really see why the correlated query would be the cause of the performance issue unless there weren't indexes on the tables; and then the problem is that there aren't good indexes, not the correlated query.

You could, however, alternatively, right is as a LEFT JOIN query:
SELECT DISTINCT T1.Col1 FROM dbo.Tab1 T1 LEFT JOIN dbo.Tab2 T2 ON T1.Col1 = T2.Col1 AND T2.Col2 = 'B' WHERE T1.Col1 = 'A' AND T2.Col1 IS NULL;

There's very little between these 2 execution plans though.

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 2

Thom A - Thursday, November 29, 2018 2:21 AM
Maybe the DISTINCT, which just gets more expensive the more rows and columns you have? More than likely the lack of indexes which we have no details on. I can't really see why the correlated query would be the cause of the performance issue unless there weren't indexes on the tables; and then the problem is that there aren't good indexes, not the correlated query.
You could, however, alternatively, right is as a LEFT JOIN query:
SELECT DISTINCT T1.Col1 FROM dbo.Tab1 T1 LEFT JOIN dbo.Tab2 T2 ON T1.Col1 = T2.Col1 AND T2.Col2 = 'B' WHERE T1.Col1 = 'A' AND T2.Col1 IS NULL;
There's very little between these 2 execution plans though.

I'd agree with Thom here. Often the plans for the two query forms are indistinguishable.
There's an optimization in the NOT EXISTS version which under some circumstances can make this form of the query much faster than the LEFT JOIN version.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

samantha.sqldba SSC Journeyman Points: 80 More actions · Answer 3

samantha.sqldba

SSC Journeyman

Points: 80

November 29, 2018 at 4:08 am

#2014914

Thank you all.

Grant Fritchey SSC Guru Points: 398690 More actions · Answer 4

I'd experiment with making it a JOIN instead of an exists, but there's nothing inherently evil in the code that I can see. Did they give you a "correct" answer?

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

samantha.sqldba SSC Journeyman Points: 80 More actions · Answer 5

Grant Fritchey - Thursday, November 29, 2018 6:21 AM
I'd experiment with making it a JOIN instead of an exists, but there's nothing inherently evil in the code that I can see. Did they give you a "correct" answer?

No Sir.

Thom A SSC Guru Points: 99126 More actions · Answer 6

samantha.sqldba - Thursday, November 29, 2018 6:38 AM
Grant Fritchey - Thursday, November 29, 2018 6:21 AM
I'd experiment with making it a JOIN instead of an exists, but there's nothing inherently evil in the code that I can see. Did they give you a "correct" answer?
No Sir.

Did you ask them for one, or could you? Considering how open ended the question is, one would hope that their answer is quite broad.

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

ScottPletcher SSC Guru Points: 100947 More actions · Answer 7

To avoid having to lookup every duplicate tab1.col1 value in tab2, which is pointless, I'd try this query:

SELECT t1.col1 FROM ( SELECT DISTINCT col1 FROM tab1 WHERE col2 = 'A' ) AS t1 LEFT OUTER JOIN ( SELECT col1 FROM tab2 WHERE col2 = 'B' ) AS t2 ON t2.col1 = t1.col1 WHERE t2.col1 IS NULL

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 8

Just my humble opinion but (provided I'm not missing some awesome trick) my initial thought is that this is is a "manhole cover" question. It's littered with supposed "worst practices" (DISTINCT, not using SELECT * in the correlated subquery, which are both just a distraction when it comes to this particular code). If the correct indexes aren't available... or if they are... I believe you can write this code 7 different ways from Sunday and still not improve the performance so as to not be labeled a performance problem because, as it stands, it's not a performance problem to begin with.

My answer on the test would have been...

"The code is fine and requires no changes. Whomever reported the problem needed to check on whether or not the correct indexes are available to increase the speed of this already fast code. Once that's done (which would substantially improve the performance and reduce resource usage), you could squeeze out a tiny bit more duration performance (or not and with a bit more CPU usage) and seriously increase the readability using the following nicely symmetrical code, which is functionally equivalent to the original code. The bottom line is you shouldn't waste your time trying to fix something you've not tested. And, no... doing things like adding a DISTINCT to the subquery, changing the "X" to a * (NOT EXISTS doesn't even look at those "placeholders"), using a TOP 1 in the subquery, or trying to pre-isolate the rows of tab1 or tab2 won't improve anything and can sometimes make them worse because the existing code actually already does that. And, no... an outer join with NULL detection will be slower, as well."

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jonathan AC Roberts SSCoach Points: 18263 More actions · Answer 9

samantha.sqldba - Wednesday, November 28, 2018 11:44 PM
Hi All,
Recently, I have undergone a computer based exam where in they asked below question.Need some inputs from experts.
Below correlated query is causing performance issue. Please explain why it is causing performance issue and optimize the query.
SELECT DISTINCT col1 FROM tab1 WHERE col2 = 'A' AND NOT EXISTS (SELECT 'xâ€™ from tab2 where tab2.col2 = 'Bâ€™ and tab2.col1 = tab1.col1)
I don't any idea on how many rows each table has or what indexes does exist on the tables. Looking at the above question can anyone tell this is a bad query or how can we re-write the query to perform better ?
Thanks,
Sam

I'd say that's a good query, in fact it's almost exactly how I'd write it.
If there is a performance problem with it it will be due to missing indexes.
The following indexes would help with this query.
CREATE NONCLUSTERED INDEX IX_tab1_col2_col1 ON [dbo].[tab1] ([col2],[col1]) CREATE NONCLUSTERED INDEX IX_tab2_col2_col1 ON [dbo].[tab2] ([col2],[col1])
Some other possible ways to rewrite the query are:

SELECT DISTINCT col1
FROM tab1
WHERE col2 = 'A'
EXCEPT
SELECT col1
FROM tab2
WHERE tab2.col2 = 'B'

GO

SELECT DISTINCT tab1.col1
FROM tab1
LEFT JOIN tab2
ON tab2.col2 = 'B'
AND tab2.col1 = tab1.col1
WHERE tab2.col1 IS NULL

GO

SELECT DISTINCT tab1.col1
FROM tab1
WHERE tab1.col2 = 'A'
AND tab1.col1 NOT IN (SELECT tab2.col1 FROM tab2 WHERE tab2.col2 = 'B')