SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


The Pitfall of "Not Equal To" Operator in Queries!


The Pitfall of "Not Equal To" Operator in Queries!

Author
Message
Ian Griffiths
Ian Griffiths
SSC Veteran
SSC Veteran (281 reputation)SSC Veteran (281 reputation)SSC Veteran (281 reputation)SSC Veteran (281 reputation)SSC Veteran (281 reputation)SSC Veteran (281 reputation)SSC Veteran (281 reputation)SSC Veteran (281 reputation)

Group: General Forum Members
Points: 281 Visits: 2
DISTINCT is definitely a bad idea, performance-wise. Avoid if at all possible as it usually involves having to sort the result set and then scan through it, making lots of unnecessary overhead that can often be avoided.
Amin Sobati
Amin Sobati
SSC Rookie
SSC Rookie (41 reputation)SSC Rookie (41 reputation)SSC Rookie (41 reputation)SSC Rookie (41 reputation)SSC Rookie (41 reputation)SSC Rookie (41 reputation)SSC Rookie (41 reputation)SSC Rookie (41 reputation)

Group: General Forum Members
Points: 41 Visits: 1
Thanks indeed!
Certainly there can be variations depending on the experience of the developer. I have tried to BOLD a situation that can lead a developer to strange result. This is a real case and I have seen this mistake by some developers during the years on my IT career.

Amin Sobati
Amin Sobati
SSC Rookie
SSC Rookie (41 reputation)SSC Rookie (41 reputation)SSC Rookie (41 reputation)SSC Rookie (41 reputation)SSC Rookie (41 reputation)SSC Rookie (41 reputation)SSC Rookie (41 reputation)SSC Rookie (41 reputation)

Group: General Forum Members
Points: 41 Visits: 1
Thank Duray!
I agree! But I intended to show a mistranslation between what a human can mean and what a developer might write in TSQL! This really depends on the experience...

Mike C
Mike C
SSCertifiable
SSCertifiable (6.4K reputation)SSCertifiable (6.4K reputation)SSCertifiable (6.4K reputation)SSCertifiable (6.4K reputation)SSCertifiable (6.4K reputation)SSCertifiable (6.4K reputation)SSCertifiable (6.4K reputation)SSCertifiable (6.4K reputation)

Group: General Forum Members
Points: 6445 Visits: 1172

"Actually comparisons of a NULL (or a non-NULL value) to a NULL results in UNKNOWN. Your point is taken though: The WHERE clause only returns results for rows comparisons that evaluate to TRUE. Comparisons that evaluate to FALSE or UNKNOWN are dropped from the results."

Well now we're splitting hairs!

Actually UNKNOWN represents a possible result of a logical comparison while NULL represents a missing/unknown value. There can be some confusion with the terminology, but NULL is actually not the same thing as UNKNOWN. Part of the confusion pops up because the word "unknown" is so often used in the definition of NULLs.

As an example (and to add to the confusion) consider the ANSI SQL:1999 standard which added a BOOLEAN data type to SQL. The BOOLEAN data type can hold one of three possible values: TRUE, FALSE and UNKNOWN: SQL three-valued logic and all, so far so good.

However it also has to be nullable, and to be consistent with the rest of the SQL model, NULL would have to indicate that you don't know whether the BOOLEAN column or variable is TRUE, FALSE or UNKNOWN. I'll give a quick example. Before someone points it out, I know this won't actually run on T-SQL... this is "pseudo-SQL" to illustrate what it might look like if ANSI BOOLEANs were implemented in T-SQL:

DECLARE @x BOOLEAN;
DECLARE @y BOOLEAN;
DECLARE @z BOOLEAN;

SELECT @x = UNKNOWN; -- UNKNOWN
SELECT @y = (10 < NULL); -- evaluates to UNKNOWN
SELECT @z = NULL; -- NULL value

According to this example @x is equal to @y (they are both UNKNOWN). However, is @x equal to @z? We don't know the value of @z (is it TRUE, FALSE or UNKNOWN?), but we do know that @x is UNKNOWN. So is UNKNOWN the same as NULL? According to the SQL logical model the result of this comparison should be UNKNOWN, since we don't know if @z is TRUE, FALSE or UNKNOWN. According to SQL:1999 the answer could be TRUE since they say to treat UNKNOWN as equivalent to NULL.

To implement this correctly you actually have to re-work the SQL logical model from three-valued logic to four-valued logic everywhere The ANSI standard just says to treat UNKNOWNs the same as NULLs, which means you're starting to introduce inconsistent exceptions into the logical model rather than expanding it to accomodate four-valued logic.

As for the ANSI_NULLS setting, I highly recommend leaving it set to the ON position and using the IS NULL and IS NOT NULL operators (or CASE and COALESCE) to check for NULLs.


evaleah
evaleah
Old Hand
Old Hand (312 reputation)Old Hand (312 reputation)Old Hand (312 reputation)Old Hand (312 reputation)Old Hand (312 reputation)Old Hand (312 reputation)Old Hand (312 reputation)Old Hand (312 reputation)

Group: General Forum Members
Points: 312 Visits: 52

I didn't test it but is seems this should work and is a bit simpler:

SELECT s.* FROM Students s
LEFT JOIN StudentExam se
ON s.StID=se.StID
WHERE ISNULL(se.ExamName,'') <> 'SQL Server'

This way any NULLs would be replaced with '' and therefore would not include 'SQL Server'.

Eva





Amin Sobati
Amin Sobati
SSC Rookie
SSC Rookie (41 reputation)SSC Rookie (41 reputation)SSC Rookie (41 reputation)SSC Rookie (41 reputation)SSC Rookie (41 reputation)SSC Rookie (41 reputation)SSC Rookie (41 reputation)SSC Rookie (41 reputation)

Group: General Forum Members
Points: 41 Visits: 1

Thanks Eva,

All of students appear again! You have replaced the NULL with new value, but the logic is the same!


john hill
john hill
Grasshopper
Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)

Group: General Forum Members
Points: 24 Visits: 68

If you are running SQL 2005 add the index below and see what plan the engine picks...

create nonclustered index IX_Exam_name on StudentExam(ExamName) include(StID)

SELECT s.* FROM Students s

JOIN StudentExam se

ON s.StID=se.StID

WHERE se.ExamName<>'SQL Server'

Here's the plan with the index for the original != query...

SELECT s.* FROM Students s JOIN StudentExam se ON s.StID=se.StID WHERE se.ExamName<>'SQL Server'
|--Nested Loops(Inner Join, OUTER REFERENCES[se].[StID]))
|--Index Seek(OBJECT[TestingDB].[dbo].[StudentExam].[ix_Exam_name] AS [se]), SEEK[se].[ExamName] < 'SQL Server' OR [se].[ExamName] > 'SQL Server') ORDERED FORWARD)
|--Clustered Index Seek(OBJECT[TestingDB].[dbo].[Students].[PK__Students__345EC57D] AS [s]), SEEK[s].[StID]=[TestingDB].[dbo].[StudentExam].[StID] as [se].[StID]) ORDERED FORWARD)


Dave F-425609
Dave F-425609
Old Hand
Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)

Group: General Forum Members
Points: 370 Visits: 37

Can anyone explain, in detail, why the outer join solution is faster than the Subquery solution in this example? Is the Exists method fastest of all?


Dave F-425609
Dave F-425609
Old Hand
Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)

Group: General Forum Members
Points: 370 Visits: 37

Sorry, let me rephrase the question:

Can anyone explain, in detail, why the outer join solution is faster than the Subquery solution in this example? Is the Exists method fastest of all?

I've examined all three on a limited number of rows and the performance and the plans are identical. Why do you reckon one will be faster than the next with table growth?

Correct me if I'm wrong, but when you do a left outer join the left part of the join has a full table scan. How is this faster than a subquery based on an index compared to another column based upon an index? Also, I can't see the difference in using EXISTS over a standard subquery as it still is effectively comparing values inside the EXISTS subquery?

Also the LEFT JOIN thing looks nice, but how would you work it if (as it would seem a more reasonable requirement) you wanted the people who hadn't taken the SQL Server exam BUT HAD TAKEN another?


Mike C
Mike C
SSCertifiable
SSCertifiable (6.4K reputation)SSCertifiable (6.4K reputation)SSCertifiable (6.4K reputation)SSCertifiable (6.4K reputation)SSCertifiable (6.4K reputation)SSCertifiable (6.4K reputation)SSCertifiable (6.4K reputation)SSCertifiable (6.4K reputation)

Group: General Forum Members
Points: 6445 Visits: 1172

Can anyone explain, in detail, why the outer join solution is faster than the Subquery solution in this example? Is the Exists method fastest of all?

It depends... One could be faster than another in different situations. It all boils down to which query plan SQL Server decides to use for a given query and a given set of data. As a side not, on SQL 2000 EXISTS tends to be faster than the IN operator. The explanation I've seen for this is that the IN operator needs to serialize the entire subquery before it can perform the comparison whereas the EXISTS operator does not. This behavior may be changed in SQL 2005, although I can't test it myself right now.

I've examined all three on a limited number of rows and the performance and the plans are identical. Why do you reckon one will be faster than the next with table growth?

Basically it's all up to the optimizer. It takes a lot of factors into consideration, including amount of data, when generating query plans. Changing the amount of data could change the query plan it chooses.

Correct me if I'm wrong, but when you do a left outer join the left part of the join has a full table scan. How is this faster than a subquery based on an index compared to another column based upon an index? Also, I can't see the difference in using EXISTS over a standard subquery as it still is effectively comparing values inside the EXISTS subquery?

Not necessarily a "full table scan", if it's properly indexed. You could end up with an (clustered) index scan, which would be more efficient in most cases. The rationale against the subquery versus EXISTS might be the same as for EXISTS versus IN. If SQL Server has to serialize the entire subquery before using it, it could be less efficient than using EXISTS.


Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search