April 28, 2005 at 9:25 am
I noticed the following SQL behavior and want to share my concerns with you.
I believe that this should be considered as a bug in setting the execution plan in SQL Server!
CREATE TABLE [Table1] (
[C1] [int] IDENTITY (1, 1) NOT NULL ,
[C2] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[C3] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[C4] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
CONSTRAINT [PK_Table1] PRIMARY KEY CLUSTERED
( [C1] ) ON [PRIMARY]
) ON [PRIMARY]
--GO
INSERT INTO [Table1] ([C2],[C3],[C4])VALUES('a','b','c')
INSERT INTO [Table1] ([C2],[C3],[C4])VALUES('d','e','f')
INSERT INTO [Table1] ([C2],[C3],[C4])VALUES('g','h','i')
-- APPLY THE BELOW WITH SHOW EXECUTION PLAN ON
DECLARE @X INT
SET @X = 2
SELECT C1, C2, C3, C4
FROM Table1
WHERE C1 = @X OR @X = -1
-- CLUSTERED INDEX SCAN!!!! 85.46%
SELECT C1, C2, C3, C4
FROM Table1
WHERE C1 = @X --OR @X = -1
-- CLUSTERED INDEX SEEK 14.54%
What do you think?
I know that I can solve this issue by implementing dynamic queries, but all my stored procedures are built on this principle which is making them generic.
Most of my WHERE conditions look like:
WHERE (Table.Col1 = @Col1 OR @Col1 = -1)
AND (Table.Col2 = @Col2 OR @Col2 = -1)
AND (Table.Col3 = @Col3 OR @Col3 = -1)
AND (Table.Col4 = @Col4 OR @Col4 = -1)
AND (Table.Col5 = @Col5 OR @Col5 = -1)
And so when I call this SP I can do the filter in the way I like, I can send -1 for the column I don’t like to apply the filter on.
Unfortunately, I noticed that the performance is so bad! And discovered that Indexes Scan is being done always!
April 28, 2005 at 9:10 pm
I wouldn't call this a bug, SQL is evaluating @X = -1 for each row so it is using a scan. It can't use the primary key to find the rows since "or @X = -1" doesn't have anything to do with the primary key. Generally, using dynamic SQL is frowned upon if you can avoid it. But it is a business decision you have to make. If the proc is used a lot and performance is a big issue, you might have to go that route.
The only other solutions I can think of would be so extremely tedious that they wouldn't be worth it. For instance, creating a separate stored proc for each possible combination of parameters. For 5 columns that would be 120 combinations if I did the math right.
Aunt Kathi Data Platform MVP
Author of Expert T-SQL Window Functions
Simple-Talk Editor
April 29, 2005 at 3:08 am
In effect your query is one of the type "give me every record or give me some records". However the Plan must deal with both situations....and a "give me every record" query invokes a SCAN....be that INDEX scan or TABLE SCAN.
Your 2 examples are not equivalent....and thus the results are not equivalent either.
Reexamining the need to have such flexable code may be the route to go.
April 29, 2005 at 7:44 am
Break the SQL statement into two statements with a UNION operator. This should allow for an index seek on ends of the sql statement.
April 29, 2005 at 8:08 am
Unless I misunderstand your code, @x is a value determined *before* the select and presumably does not change during the process!!!!, so it would make much more sense to evaluate @x in logic, and conditionally call the select in an 'if' statement.
...
-- FORTRAN manual for Xerox Computers --
April 30, 2005 at 5:29 am
its a bad behaviour! scanning the table searching nothing! do you agree?
April 30, 2005 at 5:34 am
Two examples are 100% equivilant in case you send value other than -1. see in my example i am sending 2 as a values and the returned dataset is only 1 row! scan should not occure.
April 30, 2005 at 5:37 am
i agree with you. but what if you have a query where you have 15 filtering condtions! you will have too many if conds in your code!
if you have condtions: c1, c2, c3 then the if will look like:
if c1 is true and c2 is false and c3 is false
then ...
if c1 is true and c2 is true and c3 is false
then ...
if c1 is true and c2 is true and c3 is true
then ...
if c1 is false and c2 is true and c3 is false
then ...
etc ...
April 30, 2005 at 5:41 pm
SQL Server could perform the clustered index scan for the @x = -1 and then perform an index seek for C1 = @X, and then perform a union on the result sets for both queries. But it does not. It takes the most efficient route, of only performing a single clustered index scan, which can evaluate and satisfy both conditions.
You have to understand that SQL Server cannot ignore a clause just because you know it means nothing. Every clause is a filtering condition (good or bad) so it has to be evaluated.
If you want to eliminate the scans, remove the need to evaluate open conditions.
And to reiterate what the others have said, your queries are not equivalent. If you reread your queries, 1 has two filter conditions. The other has one filter condition. just because the result set is the same it doesn't mean the queries generating them are.
May 2, 2005 at 5:32 am
People!
SELECT SID
FROM Sales
WHERE (SID = 15) or (1=2)
Clustered Index Seek!
declare @x int
set @x = 1
SELECT SID
FROM Sales
WHERE (SID = 15) or (@x=2)
Clustered Index Scan!
Is that fear???
May 3, 2005 at 7:53 pm
rata,
no it is not fear, fear is a feeling of anxiety, apprehenion or agitation, usually due to the presence or imminence of danger or the imagined thought of.
There are several things that need to be understood to resolve your problem.
1st, the Query Optimizer produces the Execution Plans
2nd, the Query Optimizer does not resolve @variables, the Relational Engine does, but not until after the Query Optimizer has been run (the Optimizer is but one step inside the Relational Engine).
3rd, the Query Optimizer determines the BEST execution plan for ALL possible combinations of execution
4th, the Query Optimizer runs PRIOR to the execution of the statements.
EDIT: Changed 4th item reference from the "Relational Engine" to "execution of the statements". Added bracketed stuff to 2nd item.
With your query examples, this means that when the Query Optimizer is determining the best plan, with the C1 = @X, C1 is a known value, @X is an unknown value (remembering that the Optimizer doesn't know what the value of @X will be but it does know the possible range of values due to the type), and can only be one instance of C1 (due to the =), thus an Index Seek is the best plan. With C1 = @X OR @X = -1 there are several possible plans, but the most efficient is a Clustered Index Scan, why?, because @X is an unknown value. @X is an integer so falls somewhere in the range of -2,147,483,648 through 2,147,483,647, the Optimizer knows this, and can also see that the clause COULD return all rows from the table IF (@X = -1) = True. If you write the query
SELECT C1, C2, C3, C4
FROM Table1
WHERE @X = -1
you will get an execution plan of a Clustered Index Scan, due to the fact that the query will either return all rows, or no rows. Combined this with C1 = @X by OR and the best plan to support both filter requirements is a Clustered Index Scan. Simply put, the Optimizer will return a Clustered Index Scan as the best choice, because the Optimizer does not know the value of @X
With your second set of examples
SID = 15 OR 1 = 2
If you read the execution plan that is used for this query, you would see that the Optimizer has evaluated the literal 1=2 and determined that is has no impact (result is false and OR false means the other side of the OR determines result), and as such is DROPPED from the Query and Filter, thus you get a Clustered Index Seek. If you try running this with SID = 2 OR 2 = 2, then you will get a Clustered Index Scan, because 2=2 is true and anything OR true, will return all possible results.
The second item is a Clustered Index Scan purely for the reasons outlined above.
May 3, 2005 at 9:30 pm
Where the H3ll did you learn that?
May 3, 2005 at 9:37 pm
fear? looked it up in the dictionary.
seriously, its how a static compiler works.
May 4, 2005 at 6:16 am
Any book I can get to learn more about that?
May 4, 2005 at 9:21 am
Dear Grasshopper,
Appreciating all what you mentioned, it’s all correct, BUT...
My friend: The part “OR @x = -1” is being evaluated for each row in the table! The optimizer should be smart enough to evaluate this part one time only as it will never change between a row and another... this is simply my point.
if you take the part “Table1.Col1 = 5” for example, the result of this condition differs from a row to another,, the rows which have Col1 = 5 will affect the condition to be TRUE, while other values will affect the condition to be FALSE, so for each row in he table the condition should be evaluated or in other words:
This condition is DEPENDENT ,, it depends on each row in the table
This is not valid for the my condition @x = -1 it will be either true or false for all rows.
This condition is INDEPENDENT ,, it does not depend on any row in the table.
I swear I am saying the truth, regardless how Microsoft implemented the optimizer, but it should be smart enough to evaluate the condition for each row or evaluate it one time only.
Regards,
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply