June 28, 2011 at 2:48 am
Hi all,
I'm having a debate with a colleague regarding the performance of greater than equal vs greater than query.
Of the follow queries which one would perform more efficiently / better for the optimizer?
declare @dateVar datetime
set @dateVar= 'to some date'
select*
fromtableA
wherecolA >= @dateVar
declare @dateVar datetime
set @dateVar= dateadd(d, -1, 'to some date')
select*
fromtableA
wherecolA > @dateVar
-- assume we don't care about the time component
I tested both these queries out and found no difference in performances - however my colleague is still convinced that in certain circumstances the >= operator will perform worse that the > operator - what are other your opinion on this?
Is this just merely down to peference?
June 28, 2011 at 4:04 am
i dont think it makes a difference, i have tested it and don't see anything different in performance.
would like to see evidence to the contary if there is any..
June 28, 2011 at 4:09 am
Another vote for no difference.
I can't think of a scenario where access paths would be different.
An internals expert such as Paul White would probably expand the answer with implementation details.
-- Gianluca Sartori
June 28, 2011 at 4:37 am
kwhytee (6/28/2011)
I tested both these queries out and found no difference in performance - however my colleague is still convinced that in certain circumstances the >= operator will perform worse that the > operator...
Your colleague is perhaps thinking that >= might require two comparisons (one for equality, and then if that fails, for greater-than). That's not the case - even at the microprocessor level, greater than and greater-than-or-equal are implemented as single operations with the same cycle cost. As far as SQL Server is concerned, the operations are also atomic. For a scan, you will see one of the following XML show plan elements:
<Compare CompareOp="GT">
<Compare CompareOp="GE">
For a seek, you will see one of these:
<StartRange ScanType="GE">
<StartRange ScanType="GT">
Where GT represents greater-than and GE represents greater-than-or-equal. There will be no advantage to rewriting GE as GT, and you might even introduce a bug. For example, the code example given would break if we are looking for DATETIME values greater than '1753-01-01'. There are also subtle problems around finding the 'next lowest' value for some types like REAL and FLOAT.
I encourage people to write SQL in the most natural form possible, and only tweak it to work around specific (and measurable!) issues. If you are looking for a value greater than or equal to 'x', use >=.
June 28, 2011 at 5:54 am
One more vote 🙂 !
Logically & practically there won't be any performance impact as such.
The only differenece between using >= and > i can see here is one result will retrive the values which are greater than equal to specific datetime and another will retrive only greater than specific datetime
Abhijit - http://abhijitmore.wordpress.com
June 28, 2011 at 6:48 am
I found the below article on the usage of greater than vs greater than or equal to
http://www.lcard.ru/~nail/sybase/perf/18295.htm
The logic seems sound but i doubt if i will consider it when I write my queries
June 28, 2011 at 6:49 am
Thanks for clearing that up chaps.
June 28, 2011 at 7:10 am
Jayanth_Kurup (6/28/2011)
I found the below article on the usage of greater than vs greater than or equal tohttp://www.lcard.ru/~nail/sybase/perf/18295.htm
The logic seems sound but i doubt if i will consider it when I write my queries
there is something about the way that article is written that does not really fill me with confidence..
June 28, 2011 at 2:41 pm
steveb. (6/28/2011)
Jayanth_Kurup (6/28/2011)
I found the below article on the usage of greater than vs greater than or equal tohttp://www.lcard.ru/~nail/sybase/perf/18295.htm
The logic seems sound but i doubt if i will consider it when I write my queries
there is something about the way that article is written that does not really fill me with confidence..
It could be the fact that the article is about Sybase SQL Server 11.0 and not Microsoft SQL Server 😛
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 28, 2011 at 3:18 pm
That and the total lack of proof. The author just says do this instead of that because it is faster but presents no facts to back up the claims. Jeff Moden would roast somebody who posted that kind of stuff as fact with no actual stats as proof.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
June 28, 2011 at 3:38 pm
Sean Lange (6/28/2011)
That and the total lack of proof. The author just says do this instead of that because it is faster but presents no facts to back up the claims. Jeff Moden would roast somebody who posted that kind of stuff as fact with no actual stats as proof.
yeah pretty much this..
June 28, 2011 at 3:54 pm
Like it matters...its referencing query optimizations for Sybase 11.0 (the current version is 15.5) and the article was written around the time when MSSQL 6.5 was coming out...years after the Sybase and MSSQL codebases officially diverged.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 28, 2011 at 9:04 pm
Jayanth_Kurup (6/28/2011)
I found the below article on the usage of greater than vs greater than or equal tohttp://www.lcard.ru/~nail/sybase/perf/18295.htm
The logic seems sound but i doubt if i will consider it when I write my queries
The section on "Greater Than" Queries:
This query, with an index on int_col:
select * from table where int_col > 3
uses the index to find the first value where int_col equals 3, and then scans forward to find the first value greater than 3. If there are many rows where int_col equals 3, the server has to scan many pages to find the first row where int_col is greater than 3.
It is probably much more efficient to write this query like this:
select * from table where int_col >= 4
This optimization is easier with integers, but more difficult with character strings and floating-point data. You need to know your data.
I doubt this was ever true, frankly, but it certainly isn't today:
SELECT TOP (10000000)
id = ISNULL(CONVERT(INTEGER, 3), 0),
padding = SPACE(100)
INTO #n
FROM
master.sys.columns AS c,
master.sys.columns AS c2,
master.sys.columns AS c3;
INSERT #n (id) VALUES (CONVERT(INTEGER, 4));
CREATE INDEX nc1 ON #n (id) INCLUDE (padding);
SET STATISTICS IO, TIME ON;
SELECT * FROM #n AS n WHERE id > 3;
SELECT * FROM #n AS n WHERE id >= 4;
SET STATISTICS IO, TIME OFF;
DROP TABLE #n;
Both forms of the query complete immeasurably quickly, in 4 logical reads, using an index seek. If you choose to run that code yourself, be warned it might take several minutes to create the 10 million row table!
June 29, 2011 at 4:29 am
(I'm kwhytee's colleague.)
I'm pretty certain I've seen significant performance improvements by replacing ">= x" with "> x-1ms" when nothing else has been changed, in certain cases but not always. We haven't done rigorous tests but it has made several complex reports feasible that simply wouldn't run otherwise. One of the conditions that improved was (don't ask about the business logic - a pretty unique FY change requirement):
dbo.Trade.TradeDate > (CASE WHEN MONTH(GETDATE()) < 4 THEN DATEADD(year, YEAR(GETDATE()) - 2002, (CAST('2000-03-31' AS DATETIME))) ELSE DATEADD(year, YEAR(GETDATE()) - 2001, (CAST('2000-03-31' AS DATETIME))) END) AND dbo.Trade.TradeDate < (CASE WHEN MONTH(GETDATE()) < 4 THEN DATEADD(year, YEAR(GETDATE()) - 2001, (CAST('2000-04-01' AS DATETIME))) ELSE DATEADD(year, YEAR(GETDATE()) - 2000, (CAST('2000-04-01' AS DATETIME))) END)
(TradeDate is not indexed.)
So I think it could be an anomoly of the optimization, given I can't repeat the performance improvement in simpler conditions.
In parallel I have also found that these sort of calculations on getdate() are treated as deterministic when wrapped in a user-defined function, (UDF), (so calculated once for the query), but non-deterministic when explicitly used as above, (so calculated once per row), (in SQL Server 2005+ but perhaps not SQL Server 2000). This is very much work-in-progress and I'm not sure whether it is related to how ">=" is optimized, but I have sped up a number of reports by a factor exceeding 100 by wrapping getdate() in a simple UDF. (I say exceeding 100 because we never waited until the former query finished as it took over 2 hours on a few million rows.)
Sorry to muddy the picture even more, hope this triggers a thought in someone more knowledgable than me.
Thanks enormously for your thoughts.
PS. SQLkiwi's point that the comparison is atomic is interesting, I thought perhaps a byte-by-byte ">" could give up before the last byte more often than a ">=" could. But even if the server is 32 bit, meaning two machine instructions per datetime, both variants need to check all 64 bits for every datetime where the date matches the constant, and only 32 bits otherwise. (A datetime is made of a 16-bit int for the date, then a 16 bit int for the time.)
PPS. Re the article, I don't think it holds water either. If you had a ordered index and you were looking for value 5 in a list ranging from 1 to 10, you wouldn't start at item 1, you'd start in the middle, and if you didn't have an ordered index you'd have to compare every value anyway. Besides, it can't be that simple as if it were, surely the optimizer would internally convert the comparison.
June 29, 2011 at 4:42 am
Qualification: Asked another colleague with better memory than me of when we first found this, and we found that the execution plan produced when the >= was replaced with > differed by minutes (on a different query to that above).
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply