Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««1234»»

The Pitfall of "Not Equal To" Operator in Queries! Expand / Collapse
Author
Message
Posted Thursday, August 24, 2006 2:18 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Monday, February 11, 2008 2:10 AM
Points: 167, Visits: 2
DISTINCT is definitely a bad idea, performance-wise. Avoid if at all possible as it usually involves having to sort the result set and then scan through it, making lots of unnecessary overhead that can often be avoided.
Post #303701
Posted Thursday, August 24, 2006 2:30 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, December 26, 2006 12:11 PM
Points: 7, Visits: 1
Thanks indeed!
Certainly there can be variations depending on the experience of the developer. I have tried to BOLD a situation that can lead a developer to strange result. This is a real case and I have seen this mistake by some developers during the years on my IT career.
Post #303702
Posted Thursday, August 24, 2006 4:02 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, December 26, 2006 12:11 PM
Points: 7, Visits: 1
Thank Duray!
I agree! But I intended to show a mistranslation between what a human can mean and what a developer might write in TSQL! This really depends on the experience...
Post #303711
Posted Thursday, August 24, 2006 7:56 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, April 14, 2014 4:18 PM
Points: 1,276, Visits: 1,132

"Actually comparisons of a NULL (or a non-NULL value) to a NULL results in UNKNOWN.  Your point is taken though:  The WHERE clause only returns results for rows comparisons that evaluate to TRUE.  Comparisons that evaluate to FALSE or UNKNOWN are dropped from the results."

Well now we're splitting hairs! 

Actually UNKNOWN represents a possible result of a logical comparison while NULL represents a missing/unknown value.  There can be some confusion with the terminology, but NULL is actually not the same thing as UNKNOWN.  Part of the confusion pops up because the word "unknown" is so often used in the definition of NULLs.

As an example (and to add to the confusion) consider the ANSI SQL:1999 standard which added a BOOLEAN data type to SQL.  The BOOLEAN data type can hold one of three possible values:  TRUE, FALSE and UNKNOWN:  SQL three-valued logic and all, so far so good.

However it also has to be nullable, and to be consistent with the rest of the SQL model, NULL would have to indicate that you don't know whether the BOOLEAN column or variable is TRUE, FALSE or UNKNOWN.  I'll give a quick example.  Before someone points it out, I know this won't actually run on T-SQL... this is "pseudo-SQL" to illustrate what it might look like if ANSI BOOLEANs were implemented in T-SQL:

DECLARE @x BOOLEAN;
DECLARE @y BOOLEAN;
DECLARE @z BOOLEAN;

SELECT @x = UNKNOWN;     -- UNKNOWN
SELECT @y = (10 < NULL); -- evaluates to UNKNOWN
SELECT @z = NULL;        -- NULL value

According to this example @x is equal to @y (they are both UNKNOWN).  However, is @x equal to @z?  We don't know the value of @z (is it TRUE, FALSE or UNKNOWN?), but we do know that @x is UNKNOWN.  So is UNKNOWN the same as NULL?  According to the SQL logical model the result of this comparison should be UNKNOWN, since we don't know if @z is TRUE, FALSE or UNKNOWN.  According to SQL:1999 the answer could be TRUE since they say to treat UNKNOWN as equivalent to NULL.

To implement this correctly you actually have to re-work the SQL logical model from three-valued logic to four-valued logic everywhere   The ANSI standard just says to treat UNKNOWNs the same as NULLs, which means you're starting to introduce inconsistent exceptions into the logical model rather than expanding it to accomodate four-valued logic.

As for the ANSI_NULLS setting, I highly recommend leaving it set to the ON position and using the IS NULL and IS NOT NULL operators (or CASE and COALESCE) to check for NULLs.

Post #303781
Posted Thursday, August 24, 2006 11:41 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Wednesday, February 1, 2012 11:33 AM
Points: 68, Visits: 52

I didn't test it but is seems this should work and is a bit simpler:

SELECT s.* FROM Students s
LEFT JOIN StudentExam se
ON s.StID=se.StID
WHERE ISNULL(se.ExamName,'') <> 'SQL Server'

This way any NULLs would be replaced with '' and therefore would not include 'SQL Server'.

Eva




Post #303898
Posted Thursday, August 24, 2006 12:53 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, December 26, 2006 12:11 PM
Points: 7, Visits: 1

Thanks Eva,

All of students appear again! You have replaced the NULL with new value, but the logic is the same!

Post #303925
Posted Thursday, August 24, 2006 6:12 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Saturday, February 15, 2014 10:40 AM
Points: 14, Visits: 66

If you are running SQL 2005 add the index below and see what plan the engine picks...

 

create nonclustered index IX_Exam_name on StudentExam(ExamName) include(StID)

SELECT s.* FROM Students s

JOIN StudentExam se

ON s.StID=se.StID

WHERE se.ExamName<>'SQL Server'

 

Here's the plan with the index for the original != query...

SELECT s.* FROM Students s   JOIN StudentExam se   ON s.StID=se.StID    WHERE se.ExamName<>'SQL Server'
  |--Nested Loops(Inner Join, OUTER REFERENCES[se].[StID]))
       |--Index Seek(OBJECT[TestingDB].[dbo].[StudentExam].[ix_Exam_name] AS [se]), SEEK[se].[ExamName] < 'SQL Server' OR [se].[ExamName] > 'SQL Server') ORDERED FORWARD)
       |--Clustered Index Seek(OBJECT[TestingDB].[dbo].[Students].[PK__Students__345EC57D] AS [s]), SEEK[s].[StID]=[TestingDB].[dbo].[StudentExam].[StID] as [se].[StID]) ORDERED FORWARD)

Post #304006
Posted Thursday, August 23, 2007 3:38 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Monday, February 9, 2009 7:44 PM
Points: 282, Visits: 37

Can anyone explain, in detail, why the outer join solution is faster than the Subquery solution in this example?  Is the Exists method fastest of all?

Post #393132
Posted Thursday, August 23, 2007 3:47 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Monday, February 9, 2009 7:44 PM
Points: 282, Visits: 37

Sorry, let me rephrase the question:

Can anyone explain, in detail, why the outer join solution is faster than the Subquery solution in this example?  Is the Exists method fastest of all?

I've examined all three on a limited number of rows and the performance and the plans are identical.  Why do you reckon one will be faster than the next with table growth?

Correct me if I'm wrong, but when you do a left outer join the left part of the join has a full table scan.  How is this faster than a subquery based on an index compared to another column based upon an index?  Also, I can't see the difference in using EXISTS over a standard subquery as it still is effectively comparing values inside the EXISTS subquery?

Also the LEFT JOIN thing looks nice, but how would you work it if (as it would seem a more reasonable requirement) you wanted the people who hadn't taken the SQL Server exam BUT HAD TAKEN another?

Post #393134
Posted Thursday, August 23, 2007 9:46 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, April 14, 2014 4:18 PM
Points: 1,276, Visits: 1,132

Can anyone explain, in detail, why the outer join solution is faster than the Subquery solution in this example?  Is the Exists method fastest of all?

It depends...  One could be faster than another in different situations.  It all boils down to which query plan SQL Server decides to use for a given query and a given set of data.  As a side not, on SQL 2000 EXISTS tends to be faster than the IN operator.  The explanation I've seen for this is that the IN operator needs to serialize the entire subquery before it can perform the comparison whereas the EXISTS operator does not.  This behavior may be changed in SQL 2005, although I can't test it myself right now.

I've examined all three on a limited number of rows and the performance and the plans are identical.  Why do you reckon one will be faster than the next with table growth?

Basically it's all up to the optimizer.  It takes a lot of factors into consideration, including amount of data, when generating query plans.  Changing the amount of data could change the query plan it chooses.

Correct me if I'm wrong, but when you do a left outer join the left part of the join has a full table scan.  How is this faster than a subquery based on an index compared to another column based upon an index?  Also, I can't see the difference in using EXISTS over a standard subquery as it still is effectively comparing values inside the EXISTS subquery?

Not necessarily a "full table scan", if it's properly indexed.  You could end up with an (clustered) index scan, which would be more efficient in most cases.  The rationale against the subquery versus EXISTS might be the same as for EXISTS versus IN.  If SQL Server has to serialize the entire subquery before using it, it could be less efficient than using EXISTS.

Post #393358
« Prev Topic | Next Topic »

Add to briefcase «««1234»»

Permissions Expand / Collapse