Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 1234»»»

The Pitfall of "Not Equal To" Operator in Queries! Expand / Collapse
Author
Message
Posted Wednesday, August 16, 2006 9:25 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Sunday, March 15, 2009 7:51 PM
Points: 5, Visits: 3
Comments posted to this topic are about the content posted at temp


Regards,

Yaroslav

Post #302141
Posted Wednesday, August 23, 2006 4:51 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Thursday, November 24, 2011 1:55 AM
Points: 139, Visits: 43

Nice article.

Another way of writing the same, for those who prefer the old convention (I dont recommend it).

 

SELECT s.*, se.* FROM Students s ,

(SELECT * FROM StudentExam WHERE ExamName='SQL Server') se

where s.StID !=se.StID




What I hear I forget, what I see I remember, what I do I understand
Post #303450
Posted Wednesday, August 23, 2006 5:14 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Thursday, May 5, 2011 1:38 AM
Points: 1,636, Visits: 604

I'm not sure why one would use precisely this form... IMHO it is more complicated than necessary, and the complication does not bring any advantages :
SELECT s.* FROM Students s
LEFT JOIN (SELECT StID FROM StudentExam WHERE ExamName='SQL Server') se ON s.StID=se.StID
WHERE se.StID IS NULL

I would prefer to use one of these variants:
variant 1) - simplest
SELECT s.* FROM Students s
LEFT JOIN StudentExam E ON E.stID=s.stID AND ExamName='SQL Server'
WHERE E.StID IS NULL

variant 2) - to avoid possible duplicates
SELECT s.* FROM Students s
LEFT JOIN (SELECT DISTINCT StID FROM StudentExam WHERE ExamName='SQL Server') se
  ON s.StID=se.StID
WHERE se.StID IS NULL

Is there any reason why to use the code as shown in the article?

Yes I know that if condition is NOT, there can't be any duplicities because I only take those that don't have any corresponding rows... but generally when speaking about similar JOINs, duplicities are things that can cause problems - so I thought I'll mention that... also because it is the main reason why/when I would use the derived table. Otherwise, variant 1 should be good enough.




Post #303454
Posted Wednesday, August 23, 2006 7:10 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, August 30, 2006 11:06 AM
Points: 11, Visits: 1

SELECT s.* FROM Students s
JOIN StudentExam se
ON s.StID=se.StID
WHERE se.ExamName<>'SQL Server'

is definitely NOT the query to be used to retrieve:
The students that has not taken "SQL Server" exam
in the given data architecture...

It is a query to get:
The students that has taken an exam that is NOT "SQL Server".

And it successfully delivers the expected result.
What if there are students that has not taken ANY exams ?

So, I don't think this is a pitfall of "<>" operator at all !

The query should be in the final form that the author suggests to begin with....

 

Post #303484
Posted Wednesday, August 23, 2006 7:19 AM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Monday, July 7, 2014 12:50 PM
Points: 292, Visits: 264

Here is how I would answer the original question:

select s.* from Students s
where not exists
  (select 1 from StudentExam se
   where se.ExamName = 'SQL Server'
     and s.StID=se.StID)

While I have seen some performance issues when using an in (select ...) clause exists seems to work great.



Bryant E. Byrd, BSSE MCDBA MCAD
Business Intelligence Administrator
MSBI Administration Blog
Post #303489
Posted Wednesday, August 23, 2006 7:39 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, April 14, 2014 4:18 PM
Points: 1,276, Visits: 1,132

Here's your sample tables with indexes on them.  Should eliminate the table scans in your query plan.

CREATE TABLE Students (
StID INT NOT NULL PRIMARY KEY NONCLUSTERED,
StName NVARCHAR(50) NOT NULL)
GO

CREATE CLUSTERED INDEX IX_Students
ON Students (StName)
GO

INSERT Students VALUES (1,'Jack')
INSERT Students VALUES (2,'Anna')
INSERT Students VALUES (3,'Bob')
GO

CREATE TABLE StudentExam (
StID INT NOT NULL,
ExamName VARCHAR(50) NOT NULL,
PRIMARY KEY (StID, ExamName))
GO

INSERT StudentExam VALUES (1,'SQL Server')
INSERT StudentExam VALUES (2,'VB.NET')
INSERT StudentExam VALUES (2,'C#.NET')
INSERT StudentExam VALUES (1,'XML')
GO

Then run your query and see if it comes back with a more efficient query plan.

Post #303501
Posted Wednesday, August 23, 2006 8:36 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, August 23, 2006 8:46 AM
Points: 3, Visits: 1
Subqueries are usually a bad idea as a first option- how about a left join, perhaps with a DISTINCT in the SELECT clause?
Post #303515
Posted Wednesday, August 23, 2006 8:50 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Friday, May 6, 2011 1:05 PM
Points: 3, Visits: 22
The point of the article is excellent. It reinforces the fact that we as SQL programmers need to be careful and be sure of the results that our queries will return. It's a reminder to us veterans to not be complacent and whip out some crappy code when we're in a hurry. It's a nice lesson for new programmers who could easily fall into the trap of using the "Not Equal To" operator when they should really be using a sub-query or outer join.
Post #303526
Posted Wednesday, August 23, 2006 8:56 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, August 11, 2009 8:42 AM
Points: 1, Visits: 9

Ehsan,

Bob does not show up in your results.

Regards,

Don

Post #303531
Posted Wednesday, August 23, 2006 9:20 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, June 6, 2014 2:13 PM
Points: 147, Visits: 533

If you "Check out" Ehsan's example you'll find Jack is retrieved even though he did take the class and, even though Bob did not take the class, he is not retrieved.

"Stupid people compose a query and expect other unusual result "

Vladan's variant 1 example returns the correct results because the ExamName criteria is in the join, NOT the where clause.  This causes all StudentExam columns in the join's result set to be NULL for students without a 'SQL Server' exam.  The example then filters out the non-null rows from the result set with the where clause.

In Ehsan's example, putting the ExamName <> 'SQL Server' criteria in the where clause filters out all rows where ExamName = 'SQL Server'.  This allows Jack to still be retrieved because there is a row in the result set where ExamName <> 'SQL Server'.

This also causes Bob to be filtered out because all of the StudentExam columns in the result set are NULL for Bob (who didn't take any exams).  ANY COMPARISON OF A NON-NULL TO A NULL IS NULL.  So, ExamName <> 'SQL Server' evaluates to NULL even though it seems like it should evaluate to TRUE and the row is filtered out.  Bye bye Bob.

There's more good stuff in BOL on NULL comparisons in the "Comparison Search Conditions" subtopic under "null values" in the index.

Finally, Ehsan's example will return duplicates for students taking more than one exam other than 'SQL Server'.  The beauty of Vladan's example is that you only get one row in the result set for each row in the "left" table without a match in the "right" table so "select distinct" is not required.




Post #303545
« Prev Topic | Next Topic »

Add to briefcase 1234»»»

Permissions Expand / Collapse