November 28, 2018 at 11:44 pm
Hi All,
Recently, I have undergone a computer based exam where in they asked below question.Need some inputs from experts.
SELECT DISTINCT col1 FROM tab1 WHERE col2 = 'A' AND NOT EXISTS (SELECT 'x’ from tab2 where tab2.col2 = 'B’ and tab2.col1 = tab1.col1)
I don't any idea on how many rows each table has or what indexes does exist on the tables. Looking at the above question can anyone tell this is a bad query or how can we re-write the query to perform better ?
Thanks,
Sam
November 29, 2018 at 2:21 am
Maybe the DISTINCT, which just gets more expensive the more rows and columns you have? More than likely the lack of indexes which we have no details on. I can't really see why the correlated query would be the cause of the performance issue unless there weren't indexes on the tables; and then the problem is that there aren't good indexes, not the correlated query.
You could, however, alternatively, right is as a LEFT JOIN query:SELECT DISTINCT T1.Col1
FROM dbo.Tab1 T1
LEFT JOIN dbo.Tab2 T2 ON T1.Col1 = T2.Col1
AND T2.Col2 = 'B'
WHERE T1.Col1 = 'A'
AND T2.Col1 IS NULL;
There's very little between these 2 execution plans though.
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
November 29, 2018 at 3:04 am
Thom A - Thursday, November 29, 2018 2:21 AMMaybe the DISTINCT, which just gets more expensive the more rows and columns you have? More than likely the lack of indexes which we have no details on. I can't really see why the correlated query would be the cause of the performance issue unless there weren't indexes on the tables; and then the problem is that there aren't good indexes, not the correlated query.You could, however, alternatively, right is as a LEFT JOIN query:
SELECT DISTINCT T1.Col1
FROM dbo.Tab1 T1
LEFT JOIN dbo.Tab2 T2 ON T1.Col1 = T2.Col1
AND T2.Col2 = 'B'
WHERE T1.Col1 = 'A'
AND T2.Col1 IS NULL;There's very little between these 2 execution plans though.
I'd agree with Thom here. Often the plans for the two query forms are indistinguishable.
There's an optimization in the NOT EXISTS version which under some circumstances can make this form of the query much faster than the LEFT JOIN version.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
November 29, 2018 at 4:08 am
Thank you all.
November 29, 2018 at 6:21 am
I'd experiment with making it a JOIN instead of an exists, but there's nothing inherently evil in the code that I can see. Did they give you a "correct" answer?
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
November 29, 2018 at 6:38 am
Grant Fritchey - Thursday, November 29, 2018 6:21 AMI'd experiment with making it a JOIN instead of an exists, but there's nothing inherently evil in the code that I can see. Did they give you a "correct" answer?
No Sir.
November 29, 2018 at 6:47 am
samantha.sqldba - Thursday, November 29, 2018 6:38 AMGrant Fritchey - Thursday, November 29, 2018 6:21 AMI'd experiment with making it a JOIN instead of an exists, but there's nothing inherently evil in the code that I can see. Did they give you a "correct" answer?No Sir.
Did you ask them for one, or could you? Considering how open ended the question is, one would hope that their answer is quite broad.
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
November 29, 2018 at 3:36 pm
To avoid having to lookup every duplicate tab1.col1 value in tab2, which is pointless, I'd try this query:
SELECT t1.col1
FROM (
SELECT DISTINCT col1
FROM tab1
WHERE col2 = 'A'
) AS t1
LEFT OUTER JOIN (
SELECT col1
FROM tab2
WHERE col2 = 'B'
) AS t2 ON t2.col1 = t1.col1
WHERE t2.col1 IS NULL
SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.
December 3, 2018 at 10:21 pm
Just my humble opinion but (provided I'm not missing some awesome trick) my initial thought is that this is is a "manhole cover" question. It's littered with supposed "worst practices" (DISTINCT, not using SELECT * in the correlated subquery, which are both just a distraction when it comes to this particular code). If the correct indexes aren't available... or if they are... I believe you can write this code 7 different ways from Sunday and still not improve the performance so as to not be labeled a performance problem because, as it stands, it's not a performance problem to begin with.
My answer on the test would have been...
"The code is fine and requires no changes. Whomever reported the problem needed to check on whether or not the correct indexes are available to increase the speed of this already fast code. Once that's done (which would substantially improve the performance and reduce resource usage), you could squeeze out a tiny bit more duration performance (or not and with a bit more CPU usage) and seriously increase the readability using the following nicely symmetrical code, which is functionally equivalent to the original code. The bottom line is you shouldn't waste your time trying to fix something you've not tested. And, no... doing things like adding a DISTINCT to the subquery, changing the "X" to a * (NOT EXISTS doesn't even look at those "placeholders"), using a TOP 1 in the subquery, or trying to pre-isolate the rows of tab1 or tab2 won't improve anything and can sometimes make them worse because the existing code actually already does that. And, no... an outer join with NULL detection will be slower, as well."
--Jeff Moden
Change is inevitable... Change for the better is not.
December 4, 2018 at 9:15 am
samantha.sqldba - Wednesday, November 28, 2018 11:44 PMHi All,
Recently, I have undergone a computer based exam where in they asked below question.Need some inputs from experts.
- Below correlated query is causing performance issue. Please explain why it is causing performance issue and optimize the query.
SELECT DISTINCT col1 FROM tab1 WHERE col2 = 'A' AND NOT EXISTS (SELECT 'x’ from tab2 where tab2.col2 = 'B’ and tab2.col1 = tab1.col1)
I don't any idea on how many rows each table has or what indexes does exist on the tables. Looking at the above question can anyone tell this is a bad query or how can we re-write the query to perform better ?
Thanks,
Sam
I'd say that's a good query, in fact it's almost exactly how I'd write it.
If there is a performance problem with it it will be due to missing indexes.
The following indexes would help with this query.CREATE NONCLUSTERED INDEX IX_tab1_col2_col1 ON [dbo].[tab1] ([col2],[col1])
CREATE NONCLUSTERED INDEX IX_tab2_col2_col1 ON [dbo].[tab2] ([col2],[col1])
Some other possible ways to rewrite the query are:
SELECT DISTINCT col1
FROM tab1
WHERE col2 = 'A'
EXCEPT
SELECT col1
FROM tab2
WHERE tab2.col2 = 'B'
GO
SELECT DISTINCT tab1.col1
FROM tab1
LEFT JOIN tab2
ON tab2.col2 = 'B'
AND tab2.col1 = tab1.col1
WHERE tab2.col1 IS NULL
GO
SELECT DISTINCT tab1.col1
FROM tab1
WHERE tab1.col2 = 'A'
AND tab1.col1 NOT IN (SELECT tab2.col1 FROM tab2 WHERE tab2.col2 = 'B')
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply