The Pitfall of "Not Equal To" Operator in Queries!

Question

Post reply

The Pitfall of "Not Equal To" Operator in Queries!

Yaroslav Pentsarskyy-353753

Old Hand

Points: 333
More actions
August 16, 2006 at 9:25 am

#173684

Comments posted to this topic are about the content posted at temp
Regards,
Yaroslav

Viewing 15 posts - 1 through 15 (of 34 total)

You must be logged in to reply to this topic. Login to reply

nileshsane Default port Points: 1465 More actions · Answer 1

Nice article.

Another way of writing the same, for those who prefer the old convention (I dont recommend it).

SELECT

s.*, se.* FROM Students s ,

(SELECT * FROM StudentExam WHERE ExamName='SQL Server') se

where

s.StID !=se.StID

What I hear I forget, what I see I remember, what I do I understand

Vladan SSC-Insane Points: 22072 More actions · Answer 2

I'm not sure why one would use precisely this form... IMHO it is more complicated than necessary, and the complication does not bring any advantages :

SELECT s.* FROM Students s

LEFT JOIN (SELECT StID FROM StudentExam WHERE ExamName='SQL Server') se ON s.StID=se.StID

WHERE se.StID IS NULL

I would prefer to use one of these variants:

variant 1) - simplest

SELECT s.* FROM Students s

LEFT JOIN StudentExam E ON E.stID=s.stID AND ExamName='SQL Server'

WHERE E.StID IS NULL

variant 2) - to avoid possible duplicates

SELECT s.* FROM Students s

LEFT JOIN (SELECT DISTINCT StID FROM StudentExam WHERE ExamName='SQL Server') se

ON s.StID=se.StID

WHERE se.StID IS NULL

Is there any reason why to use the code as shown in the article?

Yes I know that if condition is NOT, there can't be any duplicities because I only take those that don't have any corresponding rows... but generally when speaking about similar JOINs, duplicities are things that can cause problems - so I thought I'll mention that... also because it is the main reason why/when I would use the derived table. Otherwise, variant 1 should be good enough.

Duray AKAR SSC Veteran Points: 261 More actions · Answer 3

SELECT s.* FROM Students s

JOIN StudentExam se

ON s.StID=se.StID

WHERE se.ExamName<>'SQL Server'

is definitely NOT the query to be used to retrieve:

The students that has not taken "SQL Server" exam

in the given data architecture...

It is a query to get:

The students that has taken an exam that is NOT "SQL Server".

And it successfully delivers the expected result.

What if there are students that has not taken ANY exams ?

So, I don't think this is a pitfall of "<>" operator at all !

The query should be in the final form that the author suggests to begin with....

Tatsu SSCertifiable Points: 7824 More actions · Answer 4

Here is how I would answer the original question:

select s.* from Students s
where not exists
(select 1 from StudentExam se
where se.ExamName = 'SQL Server'
and s.StID=se.StID)

While I have seen some performance issues when using an in (select ...) clause exists seems to work great.

[font="Tahoma"]Bryant E. Byrd, BSSE MCDBA MCAD[/font]
Business Intelligence Administrator
MSBI Administration Blog

Mike C SSC-Insane Points: 23224 More actions · Answer 5

Here's your sample tables with indexes on them. Should eliminate the table scans in your query plan.

CREATE TABLE Students (

StID INT NOT NULL PRIMARY KEY NONCLUSTERED,

StName NVARCHAR(50) NOT NULL)

GO

CREATE CLUSTERED INDEX IX_Students

ON Students (StName)

GO

INSERT Students VALUES (1,'Jack')

INSERT Students VALUES (2,'Anna')

INSERT Students VALUES (3,'Bob')

GO

CREATE TABLE StudentExam (

StID INT NOT NULL,

ExamName VARCHAR(50) NOT NULL,

PRIMARY KEY (StID, ExamName))

GO

INSERT StudentExam VALUES (1,'SQL Server')

INSERT StudentExam VALUES (2,'VB.NET')

INSERT StudentExam VALUES (2,'C#.NET')

INSERT StudentExam VALUES (1,'XML')

GO

Then run your query and see if it comes back with a more efficient query plan.

Jason Hopkins SSC Enthusiast Points: 117 More actions · Answer 6

Subqueries are usually a bad idea as a first option- how about a left join, perhaps with a DISTINCT in the SELECT clause?

Dan Knowlton SSC Rookie Points: 39 More actions · Answer 7

The point of the article is excellent. It reinforces the fact that we as SQL programmers need to be careful and be sure of the results that our queries will return. It's a reminder to us veterans to not be complacent and whip out some crappy code when we're in a hurry. It's a nice lesson for new programmers who could easily fall into the trap of using the "Not Equal To" operator when they should really be using a sub-query or outer join.

Donald Eberhart SSC Rookie Points: 45 More actions · Answer 8

Ehsan,

Bob does not show up in your results.

Regards,

Don

Andy DBA SSCommitted Points: 1776 More actions · Answer 9

If you "Check out" Ehsan's example you'll find Jack is retrieved even though he did take the class and, even though Bob did not take the class, he is not retrieved.

"Stupid people compose a query and expect other unusual result "

Vladan's variant 1 example returns the correct results because the ExamName criteria is in the join, NOT the where clause. This causes all StudentExam columns in the join's result set to be NULL for students without a 'SQL Server' exam. The example then filters out the non-null rows from the result set with the where clause.

In Ehsan's example, putting the ExamName <> 'SQL Server' criteria in the where clause filters out all rows where ExamName = 'SQL Server'. This allows Jack to still be retrieved because there is a row in the result set where ExamName <> 'SQL Server'.

This also causes Bob to be filtered out because all of the StudentExam columns in the result set are NULL for Bob (who didn't take any exams). ANY COMPARISON OF A NON-NULL TO A NULL IS NULL. So, ExamName <> 'SQL Server' evaluates to NULL even though it seems like it should evaluate to TRUE and the row is filtered out. Bye bye Bob.

There's more good stuff in BOL on NULL comparisons in the "Comparison Search Conditions" subtopic under "null values" in the index.

Finally, Ehsan's example will return duplicates for students taking more than one exam other than 'SQL Server'. The beauty of Vladan's example is that you only get one row in the result set for each row in the "left" table without a match in the "right" table so "select distinct" is not required.

Dennis D. Allen Say Hey Kid Points: 699 More actions · Answer 10

The question is one of membership, and the IN clause seems to be a very clear way to pose that question. After proper indexing, it yields the same query plan as the methods proposed, but in a more understandable query.

/* Students that have taken the SQL Server test */

SELECT s.*

FROM dbo.Students AS s

WHERE s.StID IN (SELECT StID FROM dbo.StudentExam WHERE ExamName = 'SQL Server')

/* Students that have not taken the SQL Server test */

SELECT s.*

FROM dbo.Students AS s

WHERE s.StID NOT IN (SELECT StID FROM dbo.StudentExam WHERE ExamName = 'SQL Server')

YSLGuru SSC-Insane Points: 21632 More actions · Answer 11

Regardless of the material, a thanks to Amin Sobati for writing it. I often wonder why any of these contributing authors would take the time to produce material when too often that maetrial recieves not just constructive criticism but out right crude & rude behaivor. Is it necessary to sue workds like 'stupid' in a constructive criticism?

Kindest Regards,

Just say No to Facebook!

Scalability Doug SSCertifiable Points: 5351 More actions · Answer 12

This is a classic example of an exclusion list and point sout the limitiations of the inequality operator. What is presented in this example is that members in one are are to be excluded from from others. (Note that the example is students who have not taken the SQL Server exam. Bob gets omitted. Does he get omitted because he has not taken an exam?). And as the author points out, "you can simply use a subquery to build the list of students who have taken SQL Server exam" and then build your exclusion using the NOT EXISTS clause:

SELECT s.*

FROM Students s

where

not exists ( select stid from studentExam se where s.StID = stid and examname = 'SQL Server' )

Mike C SSC-Insane Points: 23224 More actions · Answer 13

"ANY COMPARISON OF A NON-NULL TO A NULL IS NULL"

Actually comparisons of a NULL (or a non-NULL value) to a NULL results in UNKNOWN. Your point is taken though: The WHERE clause only returns results for rows comparisons that evaluate to TRUE. Comparisons that evaluate to FALSE or UNKNOWN are dropped from the results.

Tim Chapman-218780 Ten Centuries Points: 1051 More actions · Answer 14

I 100% agree with the comment made by Duray AKAR. The query listed will never retrieve the results "the students that has not taken the "SQL Server" exam...will only return those results for students that have taken an exam that is not SQL Server. Very good point Duray AKAR.

Tim