﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / Article Discussions / Article Discussions by Author / Discuss content posted by Terry Steadman  / A Faster BETWEEN Dates / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Tue, 21 May 2013 10:28:44 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>Wouldn't you be better off using temporary tables rather than table variables, i was under the impression I should only use table variables for very small (500 records) amounts of data?Sorry I have read more replies and see that table variables were being used as examples and other replies have covered the table variables angleChris</description><pubDate>Fri, 27 May 2011 02:58:40 GMT</pubDate><dc:creator>Chanson 54862</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>I just tried this technique on a moderate table size - BETWEEN was around twice as fast. I have a feeling that the improvement of BETWEEN would scale as the data does (indexing and statistics come in to play as well). I also concur with Hugo's statements (imagine that Hugo, we agree!). </description><pubDate>Tue, 09 Nov 2010 08:35:27 GMT</pubDate><dc:creator>james_luetkehoelter</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>[quote][b]sql.monkey (11/4/2010)[/b][hr]I have a table with an unindexed date column in a table of billions of rowsI collect the lowest and highest primary keys between those dates into variables I set two variablesdeclare @min int declare @max intselect @min = min(primarykey)  where datecol =&amp;gt; 'begindate'select @max= max(primarykey)  where datecol &amp;lt;= 'enddate'select primarykey, datecol, x,y,z from table where primarykey between @min and @maxworks for me[/quote]You'll get a syntax error - no FROM clause in the two queries that set @min and @max.After fixing that, if the datecol column is indeed unindexed, you get two complete table scans for setting the @min and @max variables. You can reduce that to one scan by usingSELECT @min = min(primarykey), @max=max(primarykey)FROM tableWHERE datecol BETWEEN @begindate AND @enddate;But it's still a scan. The same scan you would get if you throws away all the unnecessary logic and useSELECT primarykey, datecol, x,y,zFROM tableWHERE datecol BETWEEN @begindate AND @enddate;If you check the execution plan for your query, you will probably find that is uses an index on the datecol column that you had forgotten existed.And the objection posted by GPO is valid as well - this (useless) technique only gives the correct results if ascending key order and ascending datecol order match up completely. Which is probably only the case if one column is an IDENTITY and the other has a DEFAULT(CURRENT_TIMESTAMP) and is never ever manually changed.</description><pubDate>Fri, 05 Nov 2010 02:38:45 GMT</pubDate><dc:creator>Hugo Kornelis</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>[quote][b]Michael Ebaya (11/3/2010)[/b][hr][quote][b]happycat59 (11/3/2010)[/b][hr]Not only is the original article of interest (and it is great to have someone prepared to write about their findings...thanks Terry)[/quote]Does no one actually care the entire article is wrong, top to bottom?[/quote]The reason for my interest is not just the original post.  The discussion that it has generated really does show how much interest there is in this topic.  Yes, there have been concerns expressed about whether the OP's original solution is equivalent to the original code.  The fact that there are so many replies that have corrected the error or, at least, pointed it out means that I am not concerned.  It has made people think and that is more important than anything else</description><pubDate>Fri, 05 Nov 2010 02:09:20 GMT</pubDate><dc:creator>happycat59</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>@sql.monkeyI'm probably missing something here - wouldn't be the first time, but what do you do when your PK and your date column are in a different order? There's no reason why they should be in the same order is there?PK   datecol1    200707032    200707023    200707034    200707095    200707106    200707067    200707068    20070705select @min = min(primarykey) where datecol &amp;gt;= '20070702'--@min = 2select @max= max(primarykey) where datecol &amp;lt;= '20070705'--@max = 8select primarykey, datecol from table where primarykey between @min and @maxSo it looks like you end up missing records you do want and including records that you don't want.CheersGPO</description><pubDate>Fri, 05 Nov 2010 00:47:12 GMT</pubDate><dc:creator>GPO</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>[quote][b]sql.monkey (11/4/2010)[/b][hr]It works, and yes there are tens of billions of rows.[/quote]Sorry, I don't buy ocean-front property in Kansas.  If you want to convince us your server manages to violate the laws of the universe, however, post the output of a SHOWPLAN against a test table that size.</description><pubDate>Thu, 04 Nov 2010 20:12:39 GMT</pubDate><dc:creator>Michael Ebaya</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>[quote][b]Michael Ebaya (11/4/2010)[/b][hr][quote][b]sql.monkey (11/4/2010)[/b][hr]The point was to use the index on the primary key column, which is usually a clustered index, and actually the tables I work with have tens of billions of rows.[/quote]What you don't understand is that to find the min and max key values to use, you have to table scan the date column.  You think aggregate functions like min() or max() come for free?I'll say it again.  If you're getting good performance on that script, then either you have an index on the date column you don't know about, your tables are 1/1000 the size you say they are, or you simply made up the entire thing.  There is no magic fairy dust that lets you quickly range scan a few billion values without an index.[/quote]It works, and yes there are tens of billions of rows.</description><pubDate>Thu, 04 Nov 2010 20:01:06 GMT</pubDate><dc:creator>ohmygoshitsbig</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>[quote][b]sql.monkey (11/4/2010)[/b][hr]The point was to use the index on the primary key column, which is usually a clustered index, and actually the tables I work with have tens of billions of rows.[/quote]What you don't understand is that to find the min and max key values to use, you have to table scan the date column.  You think aggregate functions like min() or max() come for free?I'll say it again.  If you're getting good performance on that script, then either you have an index on the date column you don't know about, your tables are 1/1000 the size you say they are, or you simply made up the entire thing.  There is no magic fairy dust that lets you quickly range scan a few billion values without an index.</description><pubDate>Thu, 04 Nov 2010 19:55:14 GMT</pubDate><dc:creator>Michael Ebaya</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>[quote][b]Michael Ebaya (11/4/2010)[/b][hr][quote][b]sql.monkey (11/4/2010)[/b][hr]select @min = min(primarykey)  where datecol =&amp;gt; 'begindate'select @max= max(primarykey)  where datecol &amp;lt;= 'enddate'select primarykey, datecol, x,y,z from table where primarykey between @min and @maxworks for me[/quote]Yes, because you've missed the entire point, and your query is nothing close to what we're even discussing here.  You're finding ONE date column between TWO static values.  A B-tree index works fine for that.  Finding one STATIC value between two date columns is an entirely different problem.Further, if your column isn't indexed and you have "billions of rows", you're going to be table scanning, which means performance is not going to be acceptable.  Either you have an index on the column that you don't know about, or the table is orders or magnitude smaller than "billions", or the entire scenario was fabricated to make a nice-sounding post.[/quote]The point was to use the index on the primary key column, which is usually a clustered index, and actually the tables I work with have tens of billions of rows.I may have missed the point about the original discussion youre right.</description><pubDate>Thu, 04 Nov 2010 19:19:38 GMT</pubDate><dc:creator>ohmygoshitsbig</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>[quote][b]sql.monkey (11/4/2010)[/b][hr]select @min = min(primarykey)  where datecol =&amp;gt; 'begindate'select @max= max(primarykey)  where datecol &amp;lt;= 'enddate'select primarykey, datecol, x,y,z from table where primarykey between @min and @maxworks for me[/quote]Yes, because you've missed the entire point, and your query is nothing close to what we're even discussing here.  You're finding ONE date column between TWO static values.  A B-tree index works fine for that.  Finding one STATIC value between two date columns is an entirely different problem.Further, if your column isn't indexed and you have "billions of rows", you're going to be table scanning, which means performance is not going to be acceptable.  Either you have an index on the column that you don't know about, or the table is orders or magnitude smaller than "billions", or the entire scenario was fabricated to make a nice-sounding post.</description><pubDate>Thu, 04 Nov 2010 19:14:12 GMT</pubDate><dc:creator>Michael Ebaya</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>[quote][b]TheSQLGuru (11/1/2010)[/b][hr]2) Personally I like dynamic SQL (when available) for widely varying date inputs because the optimizer always gets the best possible chance to create the optimum plan for each execution.  This is even better in databases if you have the ad hoc optimization enabled.[/quote]The query optimizer also gets the opportunity to dynamically screw up with a wacky execution plan at an inopportune time when youre asleep dreaming of wacky ways of writing wacky code and processing has stopped and your business has stopped functioning and the poor old production dba is woken from his sleep again, and to trawl through your wacky code and fix it, again. The way you write code today may take your business down tomorrow, because its not usually possible to test for all scenarios. Datasets grow and execution plans change so keeps things simple and predictable I say. Dont try to be too clever.</description><pubDate>Thu, 04 Nov 2010 18:48:28 GMT</pubDate><dc:creator>ohmygoshitsbig</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>I have a table with an unindexed date column in a table of billions of rowsI collect the lowest and highest primary keys between those dates into variables I set two variablesdeclare @min int declare @max intselect @min = min(primarykey)  where datecol =&amp;gt; 'begindate'select @max= max(primarykey)  where datecol &amp;lt;= 'enddate'select primarykey, datecol, x,y,z from table where primarykey between @min and @maxworks for me</description><pubDate>Thu, 04 Nov 2010 18:33:22 GMT</pubDate><dc:creator>ohmygoshitsbig</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>[quote][b]Michael Ebaya (11/3/2010)[/b][hr][quote][b]happycat59 (11/3/2010)[/b][hr]Not only is the original article of interest (and it is great to have someone prepared to write about their findings...thanks Terry)[/quote]Does no one actually care the entire article is wrong, top to bottom?[/quote]The speed increase is probably caused by the inadvertent change of the meaning of the query, that has been shown painfully clearly. Other discussion that has ensued, taught me a trick or two. So thanks Terry!But the core of the article, the technique of using CASE expressions to tame NULL values is sometimes very handy, especially when [i]either[/i] side of the comparison may be NULL.The CASE expression makes the intent crystal clear to first-time reader of the code, or the writer after a few years--&amp;gt;weeks. Changing the polarity of the comparison and switching the "Boolean" values 0 and 1 makes it easy to change which way NULLs lean.The major caveat in the discussion, placing the CASE expressions in OUTER JOINs, not WHERE clauses where they easily bite with their INNER nature, was spot-on. My own bite-marks? Much better now, thank you.</description><pubDate>Thu, 04 Nov 2010 08:59:42 GMT</pubDate><dc:creator>Arto Ahlstedt</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>[quote][b]happycat59 (11/3/2010)[/b][hr]Not only is the original article of interest (and it is great to have someone prepared to write about their findings...thanks Terry)[/quote]Does no one actually care the entire article is wrong, top to bottom?</description><pubDate>Wed, 03 Nov 2010 19:37:12 GMT</pubDate><dc:creator>Michael Ebaya</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>Thank you to all of you who have contributed to this discussion.  I am kinda glad that I was too busy to read the article and the discussions until now.Not only is the original article of interest (and it is great to have someone prepared to write about their findings...thanks Terry) but the discussions show that this topic is of great interest to many of us, myself included.</description><pubDate>Wed, 03 Nov 2010 19:27:35 GMT</pubDate><dc:creator>happycat59</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>Interesting approach and thanks for sharing.</description><pubDate>Tue, 02 Nov 2010 14:32:25 GMT</pubDate><dc:creator>SQLRNNR</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>[quote][b]Adam Machanic (11/2/2010)[/b][hr][quote][b]david.avraamides (11/1/2010)[/b][hr]I would consider using a sentinel date such as the maximum T-SQL datetime (9999-12-31) for two reasons: 1) your queries become simpler since you don't always need that "or end_date is null"....[/quote]Sometimes.  Sometimes they become more complex, since you need to add an "...or date is not (sentinel)" to them.  The overarching concern in data modelling should be correctness of expression, not how to simplify a query.[quote]How should you index for this query? There are basically two options: (startDate, endDate), and (endDate, startDate). [/quote]Without a two-dimensional index such as a Quadtree, you can't do fully-indexed 2D range searches.  However, there are a few tricks you can do to get close, by exploiting information the QO doesn't know about your data.  I should probably write an article about it, as its a bit more complex than one can easily cover in a post.Also, you *can* implement a quadtree index in Sql Server, with a little work.</description><pubDate>Tue, 02 Nov 2010 08:29:09 GMT</pubDate><dc:creator>Michael Ebaya</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>[quote][b]david.avraamides (11/1/2010)[/b][hr]I would consider using a sentinel date such as the maximum T-SQL datetime (9999-12-31) for two reasons: 1) your queries become simpler since you don't always need that "or end_date is null" part of the clause, and 2) you can create an index on end_date since it is not nullable.[/quote]Agreed, 9999-12-31 or a similar future day is preferable to a NULL end date. The queries are much simpler to write that way. However, you can create an index either way; NULLs are perfectly acceptable in all non-primary key indexes (with some restrictions on unique keys).The problem with start date/end date schemes is that creating a perfect index is nearly impossible with or without the NULLs. Let's say that you want all rows that were "active" as of 2010-11-02:SELECT *FROM YourTblWHERE '2010-11-02' BETWEEN startDate AND endDateHow should you index for this query? There are basically two options: (startDate, endDate), and (endDate, startDate). Given the first index, the QP has to find the first row where startDate &amp;lt;= '2010-11-02', then it must do a scan from there to evaluate all of the endDate values. The reverse is true if you have endDate first.So which index is better? The best index for a given query is generally ordered by the most selective columns. So if you have fewer rows that start before 2010-11-02 than those that end after 2010-11-02, startDate should be first. Otherwise, endDate should be first. But what if you have an equal number of rows that meet each criteria? And what if there are 50,000,000,000 rows that start before 2010-11-02 but end earlier, and 50,000,000,000 rows that both start and end after 2010-11-02? Now you're stuck scanning 50,000,000,000 rows either way, even if there is only a single row that's actually valid for that day.Solution? I haven't figured it out yet. Some other DBMS products have multidimensional index types (R and Quad trees) and I suspect that these indexes can be exploited to solve these problems, but for now, in SQL Server, we're in a bit of a bind... Unless someone out there has a solution? Speak up if so :-)</description><pubDate>Tue, 02 Nov 2010 08:05:36 GMT</pubDate><dc:creator>Adam Machanic</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>Which is more efficient, individual indexes on Proj_Start and Proj_End or a composite index on  Proj_Start and Proj_End?</description><pubDate>Tue, 02 Nov 2010 07:54:28 GMT</pubDate><dc:creator>cjones-956169</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>[quote][b]Paul Muharsky-474732 (11/1/2010)[/b][hr]Absolutely would put a rediculous enddate in place.  NULL values are stored as a bitmap at the leaf level of on index (clustered or otherwise), but are not used in the B-Tree portion of an index.  In some cases you will get seek-like behavior out of NULLS, because the optimizer can basically do an inverse search.  Dates are a fixed length fields, so NULL or not, you are still taking the same amount of physical space.  By specifying a rediculous date as the End_Date_Time, your query eliminates the NULL check, and can always rely on the B-Tree portion of an index that uses End_Date_Time in the key and perform a SEEK.[/quote]Several misconceptions.a) [b]All[/b] data is stored in the leaf level of an index.  It's not just NULLs.  b) If by "the B-tree part of the index", you mean non-leaf pages, then yes, NULL values are not stored there.   But why would you want them there?  Intermediate levels are to allow quick location of a specific value ... but we [i]already know[/i] where the NULLS in the index are.  c) The belief that NULLs always hurt performance is wrong.  It depends on several factors, including cardinality/null ratio of the column, the query structure, etc.    Many times they can increase performance.   Unless you know the specific cases where eliminating NULLS can give you a large performance boost  (such as left-join elimination), it's better to use them where they appropriately model your data.d) The belief that NULLs always complicate queries is wrong.  Since they match nothing, they can simplify many queries. For instance, this query as to which events have not yet ended:WHERE EndDate &amp;gt; GETDATE() OR EndDate IS NULL Would indeed be simplified by a sentinel value instead of a NULL.  But using a sentinel means this query to retrieve all events that end after next year now becomes more complex:WHERE EndDate &amp;gt; '20110101' AND EndDate &amp;lt;&amp;gt; @SentinelDateWhen using aggregates in particular, sentinel values can be a real nightmare.</description><pubDate>Mon, 01 Nov 2010 20:30:17 GMT</pubDate><dc:creator>Michael Ebaya</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>[quote][b]ebaya (11/1/2010)[/b][hr][quote][b]david.avraamides (11/1/2010)[/b][hr][quote]I would consider using a sentinel date such as the maximum T-SQL datetime (9999-12-31) for two reasons: 1) your queries become simpler since you don't always need that "or end_date is null" part of the clause, and 2) you can create an index on end_date since it is not nullable.[/quote]Eh?  You can index on a nullable column.[/quote]You can, but NULLS are stored as a bimap at the leaf level, and are not part of the organization of the B-Tree itself, so tend not to optimize correctly.</description><pubDate>Mon, 01 Nov 2010 16:45:10 GMT</pubDate><dc:creator>Paul Muharsky-474732</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>[quote][b]ebaya (11/1/2010)[/b][hr][quote][b]jpenniman (11/1/2010)[/b][hr]Did you clear the buffers between tests?  Using: dbcc dropcleanbuffers.  If not, it would explain your "performance gain"[/quote]His illusory performance gain is explained entirely by the fact that it's an entirely [i]different[/i]query, that will return different results.  You don't need to invoke any buffering or anything else to explain it.[quote]...as the result is already in memory.  Your "faster" query would perform a table scan every time in real life, as your condition is an expression... and a case statement at that.[/quote]Eh?  You can buffer an expression as you can anything else.   You simply can't index off it (in technical terms, its not SARGable).[/quote]Not sure what I was thinking there... agreed the plan wouldn't be in memory.  Still in shock I guess.Agreed it's not SARGable... that's why I noted the scan.</description><pubDate>Mon, 01 Nov 2010 16:42:15 GMT</pubDate><dc:creator>jpenniman</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>[quote]similarly to find all events that overlapped with a period of time, you might go:select blah from events ev where ev.start_date_time &amp;lt;= '20100701'   and (ev.end_date_time &amp;gt; '20100101' or ev.end_date_time is null)If it was your job to design these tables, would you be tempted to have an end_date_time column that could not be null, and instead substitute a ridiculous future date for all events that have not yet ended (like 22000101)? Are there indexing or other performance advantages to making your end_date_time column not null? [/quote]Absolutely would put a rediculous enddate in place.  NULL values are stored as a bitmap at the leaf level of on index (clustered or otherwise), but are not used in the B-Tree portion of an index.  In some cases you will get seek-like behavior out of NULLS, because the optimizer can basically do an inverse search.  Dates are a fixed length fields, so NULL or not, you are still taking the same amount of physical space.  By specifying a rediculous date as the End_Date_Time, your query eliminates the NULL check, and can always rely on the B-Tree portion of an index that uses End_Date_Time in the key and perform a SEEK.</description><pubDate>Mon, 01 Nov 2010 16:40:39 GMT</pubDate><dc:creator>Paul Muharsky-474732</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>Although I like the idea of being creative on 'older' hardware (my (far from new) portable runs the test queries in sub-second time frames), I too think there are a bit of issues with  the article's content :* as pointed out already, the LEFT OUTER JOIN has been 'transformed' into a 'disguised' INNER JOIN* the use of table-variables for performance testing is a bit of a no-no in my book, they suffer from strangenessitis too often. Additionally, you can't quite index them the way you want and they don't have statistics  (that I know of)* there are no indexes on the@ProjectList table, nor on the @WorkTable ... probably worth a try (using temp-tables; sequence : put data in first, then add the indexes)* @CalTable does have indexes, but for reasons I don't get the CLUSTERED PRIMARY KEY gets another UNIQUE INDEX on the same field. (**)I would suggest to look into indexing the working tables first, if that's not already the case, and stay clear of table-variables for this kind of tuning. In my personal opinion table variables are useful for VERY small datasets, for recursive stuff (no clashes when dealing with "shared" temp-tables) and when you want to work around the Transaction model. Otherwise I'm quite happy with 'ordinary' temp-tables as -in my experience (***)- they seem to result in much better query plans. That said, YMMV off course.. if this change seems to get you better performance then it might be the way forward on your particular system... but frankly I think you'll be back in the near future once the unexplainable benefit wears off...That said, although I do not quite agree with the contents of the article, I do applaud the way it was written : nicely structured, clear language and clean example code ! Please do not be put-off by the fact people have doubts about your findings ... finding the reason behind your observations might show you're in some kind of borderline situation; but as there is none given it's hard to accept something /counter-logic/ that turns out to be false when tested by other people.**: query plan indicates that the optimizer takes the clustered index in both situations.***: I tend to work on largish data-sets most of the time; for small things, table-variables might work as light-weight temptables and do show some performance benefit.. then again, if it's small it should be fast by definition =)</description><pubDate>Mon, 01 Nov 2010 14:43:19 GMT</pubDate><dc:creator>deroby</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>[quote][b]jpenniman (11/1/2010)[/b][hr]Did you clear the buffers between tests?  Using: dbcc dropcleanbuffers.  If not, it would explain your "performance gain"[/quote]His illusory performance gain is explained entirely by the fact that it's an entirely [i]different[/i]query, that will return different results.  You don't need to invoke any buffering or anything else to explain it.[quote]...as the result is already in memory.  Your "faster" query would perform a table scan every time in real life, as your condition is an expression... and a case statement at that.[/quote]Eh?  You can buffer an expression as you can anything else.   You simply can't index off it (in technical terms, its not SARGable).</description><pubDate>Mon, 01 Nov 2010 14:34:51 GMT</pubDate><dc:creator>Michael Ebaya</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>I would agree with earlier comments that if your table is indexed properly dates shouldn't be an issue. Our main order table has many tens of millions of rows and the Order Date field is one of the indexes which we use in many reports for BETWEEN checks - and speed is never an issue.A novel approach perhaps, but I would prefer to bug a standard BETWEEN statement to something more unusual.</description><pubDate>Mon, 01 Nov 2010 13:42:42 GMT</pubDate><dc:creator>jts_2003</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>Did you clear the buffers between tests?  Using: dbcc dropcleanbuffersIf not, it would explain your "performance gain", as the result is already in memory.  Your "faster" query would perform a table scan every time in real life, as your condition is an expression... and a case statement at that.I agree with the many other replies; with proper indexing, between/and or a simple &amp;gt;=/&amp;lt;= to test the range is extremely performant.  We have a table with 80,000,000 rows and growing that is always queried by a date range, and the queries execute in 1-3ms.  If partitioned, it's less than 1ms.</description><pubDate>Mon, 01 Nov 2010 13:31:27 GMT</pubDate><dc:creator>jpenniman</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>[quote][b]ebaya (11/1/2010)[/b][hr][quote][b]UMG Developer (11/1/2010)[/b][hr]You might as well make the query an INNER join so that it is more readable, you get the same results:[/quote]Actually, that doesn't give the same results either.  He's split the join predicate into two halves; one of which uses an outer join, the other an inner.  Clearly not the results he wanted,  and one that invalidates the "optimization" itself.   Moving to an inner join is the entire reason the query runs faster; it has nothing at all to do with him splitting the BETWEEN into two checks (SQL Server's engine does that internally anyway).I'll also add that the boolean substitution he performs is entirely spurious, and further serves to cloud the issues here.[/quote]I was trying to say that the INNER JOIN version is the same as his second/alternate query and gives the same result as that. It certainly doesn't give the same results as the original query.</description><pubDate>Mon, 01 Nov 2010 12:36:09 GMT</pubDate><dc:creator>UMG Developer</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>[quote][b]UMG Developer (11/1/2010)[/b][hr]You might as well make the query an INNER join so that it is more readable, you get the same results:[/quote]Actually, that doesn't give the same results either.  He's split the join predicate into two halves; one of which uses an outer join, the other an inner.  Clearly not the results he wanted,  and one that invalidates the "optimization" itself.   Moving to an inner join is the entire reason the query runs faster; it has nothing at all to do with him splitting the BETWEEN into two checks (SQL Server's engine does that internally anyway).I'll also add that the boolean substitution he performs is entirely spurious, and further serves to cloud the issues here.</description><pubDate>Mon, 01 Nov 2010 12:32:09 GMT</pubDate><dc:creator>Michael Ebaya</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>I agree with the previous comments, your second query is no longer a real LEFT JOIN and won't always return the same results as the first query. Maybe in your case it does, but according to your comment of not leaving rows out, you might have a problem.You might as well make the query an INNER join so that it is more readable, you get the same results:[code="sql"]SELECT   *  FROM Test1   INNER JOIN Test2     ON Date &amp;gt;= startDate     AND CASE       WHEN Date &amp;lt;= endDate THEN 1       ELSE 0       END = 1[/code]If you really want the left join you could try this:[code="sql"]SELECT   *  FROM Test1   LEFT JOIN Test2     ON Date &amp;gt;= startDate     AND CASE       WHEN Date &amp;lt;= endDate THEN 1       ELSE 0       END = 1[/code]But I suspect it won't get the speed benefit you want.</description><pubDate>Mon, 01 Nov 2010 11:13:08 GMT</pubDate><dc:creator>UMG Developer</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>I too had a process that took 16 minutes using the standard between.  Using your method resulted in 34 minutes (using 2008).[quote]As an example, I had a script that was taking over 5 minutes to process. Adding this one change to the script reduced its processing time down to 30 seconds. Definitely a happy increase in speed. When I had ran this test script 3 times, I had the process time of 5 seconds, 3 seconds, and 3 seconds respectively.[/quote]Thanks you for your alternative method, but I might suggest you display explain plans in the future to prove your finds.  Also, your quote above mentions going from 30 seconds, then to 5,3,3.  I'm sure you're aware that you likely have cached results there.</description><pubDate>Mon, 01 Nov 2010 11:09:10 GMT</pubDate><dc:creator>Jeff Graham-162174</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>[quote][b]david.avraamides (11/1/2010)[/b][hr][quote]I would consider using a sentinel date such as the maximum T-SQL datetime (9999-12-31) for two reasons: 1) your queries become simpler since you don't always need that "or end_date is null" part of the clause, and 2) you can create an index on end_date since it is not nullable.[/quote]Eh?  You can index on a nullable column.</description><pubDate>Mon, 01 Nov 2010 09:24:00 GMT</pubDate><dc:creator>Michael Ebaya</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>Terry,Here's a short, simple script that shows your "improved" query doesn't yield the same results as the original.    The first query returns 3 rows; the second only two, for the results I explained in my previous post.[code]CREATE TABLE Test1(	Date datetime)INSERT INTO Test1 SELECT '01/01/2010' UNION ALL SELECT '02/01/2010'UNION ALL SELECT '03/01/2010'CREATE TABLE Test2(	startDate datetime,	endDate datetime)INSERT INTO Test2 SELECT '01/01/2010', '02/01/2010'-- Query 1: Returns 3 rowsSELECT * FROM Test1 LEFT JOIN Test2 ON Date BETWEEN startDate and enddate-- Query 2: Returns only 2 rows: a portion of the join predicate is now an inner join, not a left joinSELECT * FROM Test1 LEFT JOIN Test2 ON Date &amp;gt;= startDateWHERE  (    CASE      WHEN Date &amp;lt;= endDate THEN 1      ELSE 0    END  ) = 1[/code]</description><pubDate>Mon, 01 Nov 2010 09:18:37 GMT</pubDate><dc:creator>Michael Ebaya</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>[quote]If it was your job to design these tables, would you be tempted to have an end_date_time column that could not be null, and instead substitute a ridiculous future date for all events that have not yet ended (like 22000101)?[/quote]I would consider using a sentinel date such as the maximum T-SQL datetime (9999-12-31) for two reasons: 1) your queries become simpler since you don't always need that "or end_date is null" part of the clause, and 2) you can create an index on end_date since it is not nullable.</description><pubDate>Mon, 01 Nov 2010 09:14:00 GMT</pubDate><dc:creator>david.avraamides</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>i know it's a big no-no, but i have a few tables for logging/admin duties and i just created the clustered index on the date column on those. most of the toughest queries on those tables select the last few days of data or data between 2 days. in the case where this is not the case, putting the clustered index on the date columns also causes other records that are joined between 2 tables to be on the same pages making the join faster as well.why not just cluster the date column instead of creating a date table? doesn't work in all cases like when you need to calculate specific weeks and days, but if you need to select a few days of data it's a pretty good solution</description><pubDate>Mon, 01 Nov 2010 09:10:46 GMT</pubDate><dc:creator>alen teplitsky</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>Hi Terry,Thanks for the article!In a data warehousing environment with larger data I am always looking for ways to enhance the query performance. I have to put your approach into test and see how it works.Thanks again.</description><pubDate>Mon, 01 Nov 2010 09:05:58 GMT</pubDate><dc:creator>Mohammad Meimandi</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>[quote][b]terrance.steadman (11/1/2010)[/b][hr]Ok, some of you are complaining about the use of the LEFT JOIN.  Please allow me to explain.  When I am running some reports on the data, I am requested to show which ones fall into certain date ranges and which ones fall into different date ranges.  The final set of records will end up having some flags on them for reporting purposes.  None of the records are to be excluded just because they don't fall within the date check.  So, I can't just throw them away in the script by doing a regular JOIN.[/quote]But you ARE doing a 'regular' join in your "new" query.   You're left joining on one end date, and inner-joining on the other.  That's the entire point.   The two queries don't yield identical results in all cases.   And you're surprised by their having differing run times?   The entire test (and the story resulting from it) is meaningless.Does anyone verify these stories before they're posted?</description><pubDate>Mon, 01 Nov 2010 09:04:58 GMT</pubDate><dc:creator>Michael Ebaya</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>[quote][b]GPO (11/1/2010)[/b][hr]If it was your job to design these tables, would you be tempted to have an end_date_time column that could not be null, and instead substitute a ridiculous future date for all events that have not yet ended (like 22000101)? Are there indexing or other performance advantages to making your end_date_time column not null?[/quote]I would not be the least bit tempted to do that. Fake data is the bane of my existence.</description><pubDate>Mon, 01 Nov 2010 08:44:00 GMT</pubDate><dc:creator>Ron Porter</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>Greetings All,Ok, some of you are complaining about the use of the LEFT JOIN.  Please allow me to explain.  When I am running some reports on the data, I am requested to show which ones fall into certain date ranges and which ones fall into different date ranges.  The final set of records will end up having some flags on them for reporting purposes.  None of the records are to be excluded just because they don't fall within the date check.  So, I can't just throw them away in the script by doing a regular JOIN.  A record that does not match one criteria may match another and so must still be included.This example used temporary tables and created a calendar table.  My regular script DOES NOT!!!  That was done only to show the logic behind the idea here and to create some test data.  If you want to test this better, then you will need to create your own regular table that has a start and end date in it and a ton of records.  I also have to be careful what I post for scripts and data.  As many of you already know, security and privacy are very important.  So, I need to show this with completely different tables and names but still show the logic.  I can't show you the real scripts or real data, it has to be completely different or else I get into serious trouble (lose job and/or jail time).  I really don't want to have to create a whole test database just to show one idea.  If I did that, it would make the attachments on my articles a lot larger.I am also dealing with 4 or 5 different tables that are linked together for my reports.  These tables will be using "Primary Keys" that range between 4 to 7 fields in size.  Not a lot of fun to write scripts on in my book.  But, I inherited this database when I received this job from the prior DBA.  I can't just change the layout of the database as I wish.  I have to take what exists and just make it work.  Sort of like when the street crew has your main highway closed for construction and you have to find a way through the back roads to get to work.  Your boss usually doesn't care about the transportation hurdles, he just wants you at your desk on time.  You can't just use the fast highway like you want so you have to be inventive with what you have.This logic was only a small portion of a much larger script I had to do a major report.  But, this also was the biggest slowdown in that report, (the BETWEEN dates).  This one change made the report run a lot faster.  Everything else I had tried, including adding several different indexes, did not.  The report had to analyze the data on several levels so sometimes it's not easy to find just the right index.  When I did find a way that worked very well, I wanted to share that idea with others to help them too.  Showing the original script would not be good as this one idea would have been lost in the complexity of the entire script.  So, splitting it out into a smaller script to highlight the one idea and adding some test data was hopefully a bit better.  If you have some ideas on how to write this better, then fine, please share.  I will always be a learning DBA and am willing to accept constructive advise.Have a good day.Terry Steadman</description><pubDate>Mon, 01 Nov 2010 08:40:21 GMT</pubDate><dc:creator>terrance.steadman</dc:creator></item><item><title>RE: A Faster BETWEEN Dates</title><link>http://www.sqlservercentral.com/Forums/Topic1013494-2758-1.aspx</link><description>1) the fact that table variables are used in the demos is likely creating execution plans that will not play out in the real world where actual tables are used.  Table variables usually cause the optimizer to create vastly different plans.  2) Personally I like dynamic SQL (when available) for widely varying date inputs because the optimizer always gets the best possible chance to create the optimum plan for each execution.  This is even better in databases if you have the ad hoc optimization enabled.</description><pubDate>Mon, 01 Nov 2010 08:01:51 GMT</pubDate><dc:creator>TheSQLGuru</dc:creator></item></channel></rss>