
A Faster BETWEEN Dates
Posted Monday, November 1, 2010 2:43 PM
SSC Rookie


Group: General Forum Members
Last Login: Wednesday, December 17, 2014 9:24 AM
Points: 27, Visits: 193
Although I like the idea of being creative on 'older' hardware (my (far from new) portable runs the test queries in sub-second time frames), I too think there are a few issues with the article's content:
* as pointed out already, the LEFT OUTER JOIN has been 'transformed' into a 'disguised' INNER JOIN
* the use of table-variables for performance testing is a bit of a no-no in my book; they suffer from strangenessitis too often. Additionally, you can't quite index them the way you want and they don't have statistics (that I know of)
* there are no indexes on the @ProjectList table, nor on the @WorkTable ... probably worth a try (using temp-tables; sequence: put the data in first, then add the indexes)
* @CalTable does have indexes, but for reasons I don't get the CLUSTERED PRIMARY KEY gets another UNIQUE INDEX on the same field. (**)

I would suggest looking into indexing the working tables first, if that's not already the case, and steering clear of table-variables for this kind of tuning. In my personal opinion table variables are useful for VERY small datasets, for recursive stuff (no clashes when dealing with "shared" temp-tables), and when you want to work around the transaction model. Otherwise I'm quite happy with 'ordinary' temp-tables as -in my experience (***)- they seem to result in much better query plans. That said, YMMV of course... if this change seems to get you better performance then it might be the way forward on your particular system... but frankly I think you'll be back in the near future once the unexplainable benefit wears off...
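A minimal sketch of the temp-table sequence suggested above (table and column names are hypothetical, not from the article): load the data first, then create the index, so the index is built in one pass and the optimizer gets statistics on the full set.

```sql
-- Hypothetical work table: load first, index afterwards
CREATE TABLE #WorkTable
(
    Proj_ID    int      NOT NULL,
    Proj_Start datetime NOT NULL,
    Proj_End   datetime NOT NULL
);

INSERT INTO #WorkTable (Proj_ID, Proj_Start, Proj_End)
SELECT Proj_ID, Proj_Start, Proj_End
FROM dbo.Projects;  -- source table is an assumption

-- Index added after the load; unlike a table variable,
-- a temp table gets real statistics the optimizer can use
CREATE CLUSTERED INDEX IX_WorkTable_Dates
    ON #WorkTable (Proj_Start, Proj_End);
```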

That said, although I do not quite agree with the contents of the article, I do applaud the way it was written: nicely structured, clear language and clean example code! Please do not be put off by the fact people have doubts about your findings ... finding the reason behind your observations might show you're in some kind of borderline situation; but as there is none given, it's hard to accept something counter-logical that turns out to be false when tested by other people.

**: query plan indicates that the optimizer takes the clustered index in both situations.
***: I tend to work on largish data-sets most of the time; for small things, table-variables might work as light-weight temp-tables and do show some performance benefit.. then again, if it's small it should be fast by definition =)



Post #1014139
Posted Monday, November 1, 2010 4:40 PM
SSC Rookie


Group: General Forum Members
Last Login: Tuesday, November 18, 2014 11:59 AM
Points: 32, Visits: 108

similarly to find all events that overlapped with a period of time, you might go:

select blah
from events ev
where ev.start_date_time <= '20100701'
and (ev.end_date_time > '20100101' or ev.end_date_time is null)

If it was your job to design these tables, would you be tempted to have an end_date_time column that could not be null, and instead substitute a ridiculous future date for all events that have not yet ended (like 22000101)? Are there indexing or other performance advantages to making your end_date_time column not null?


Absolutely would put a ridiculous end date in place. NULL values are stored as a bitmap at the leaf level of an index (clustered or otherwise), but are not used in the B-Tree portion of an index. In some cases you will get seek-like behavior out of NULLs, because the optimizer can basically do an inverse search. Dates are fixed-length fields, so NULL or not, you are still taking the same amount of physical space. By specifying a ridiculous date as the End_Date_Time, your query eliminates the NULL check, and can always rely on the B-Tree portion of an index that uses End_Date_Time in the key and perform a SEEK.
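A sketch of the sentinel approach being argued for here (table and constraint names are hypothetical): end_date_time is declared NOT NULL with a far-future default, so the overlap query loses its OR ... IS NULL branch.

```sql
-- Hypothetical events table using a sentinel instead of NULL
CREATE TABLE dbo.events
(
    event_id        int      NOT NULL PRIMARY KEY,
    start_date_time datetime NOT NULL,
    end_date_time   datetime NOT NULL
        CONSTRAINT DF_events_end DEFAULT ('99991231')  -- sentinel: "not yet ended"
);

CREATE INDEX IX_events_end ON dbo.events (end_date_time);

-- The overlap query no longer needs the OR ... IS NULL branch,
-- so the predicate on end_date_time is a plain range condition
SELECT event_id
FROM dbo.events
WHERE start_date_time <= '20100701'
  AND end_date_time   >  '20100101';
```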
Post #1014188
Posted Monday, November 1, 2010 4:42 PM
Forum Newbie


Group: General Forum Members
Last Login: Thursday, June 7, 2012 12:02 PM
Points: 7, Visits: 31
ebaya (11/1/2010)
jpenniman (11/1/2010)
Did you clear the buffers between tests? Using: dbcc dropcleanbuffers. If not, it would explain your "performance gain"
His illusory performance gain is explained entirely by the fact that it's an entirely different query, one that will return different results. You don't need to invoke any buffering or anything else to explain it.

...as the result is already in memory. Your "faster" query would perform a table scan every time in real life, as your condition is an expression... and a case statement at that.
Eh? You can buffer an expression as you can anything else. You simply can't index off it (in technical terms, it's not SARGable).



Not sure what I was thinking there... agreed the plan wouldn't be in memory. Still in shock I guess.

Agreed it's not SARGable... that's why I noted the scan.
Post #1014190
Posted Monday, November 1, 2010 4:45 PM
SSC Rookie


Group: General Forum Members
Last Login: Tuesday, November 18, 2014 11:59 AM
Points: 32, Visits: 108
ebaya (11/1/2010)
david.avraamides (11/1/2010)
I would consider using a sentinel date such as the maximum T-SQL datetime (9999-12-31) for two reasons: 1) your queries become simpler since you don't always need that "or end_date is null" part of the clause, and 2) you can create an index on end_date since it is not nullable.
Eh? You can index on a nullable column.


You can, but NULLs are stored as a bitmap at the leaf level, and are not part of the organization of the B-Tree itself, so tend not to optimize correctly.
Post #1014192
Posted Monday, November 1, 2010 8:30 PM
Valued Member


Group: General Forum Members
Last Login: Saturday, November 27, 2010 10:41 PM
Points: 58, Visits: 113
Paul Muharsky-474732 (11/1/2010)
Absolutely would put a ridiculous end date in place. NULL values are stored as a bitmap at the leaf level of an index (clustered or otherwise), but are not used in the B-Tree portion of an index. In some cases you will get seek-like behavior out of NULLs, because the optimizer can basically do an inverse search. Dates are fixed-length fields, so NULL or not, you are still taking the same amount of physical space. By specifying a ridiculous date as the End_Date_Time, your query eliminates the NULL check, and can always rely on the B-Tree portion of an index that uses End_Date_Time in the key and perform a SEEK.

Several misconceptions.

a) All data is stored in the leaf level of an index. It's not just NULLs.

b) If by "the B-tree part of the index", you mean non-leaf pages, then yes, NULL values are not stored there. But why would you want them there? Intermediate levels are to allow quick location of a specific value ... but we already know where the NULLS in the index are.

c) The belief that NULLs always hurt performance is wrong. It depends on several factors, including cardinality/null ratio of the column, the query structure, etc. Many times they can increase performance. Unless you know the specific cases where eliminating NULLS can give you a large performance boost (such as left-join elimination), it's better to use them where they appropriately model your data.

d) The belief that NULLs always complicate queries is wrong. Since they match nothing, they can simplify many queries. For instance, this query for events that have not yet ended:

WHERE EndDate > GETDATE() OR EndDate IS NULL

Would indeed be simplified by a sentinel value instead of a NULL. But using a sentinel means this query to retrieve all events that end after next year now becomes more complex:

WHERE EndDate > '20110101' AND EndDate <> @SentinelDate

When using aggregates in particular, sentinel values can be a real nightmare.
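To illustrate that aggregate problem with a sketch (table name assumed, not from the thread): a sentinel value participates in MAX like any real date, so every aggregate over the column has to filter it out, whereas NULLs are simply ignored.

```sql
-- With NULL end dates, MAX ignores the open-ended rows
-- and returns the latest real end date:
SELECT MAX(EndDate) AS LastRealEnd
FROM dbo.Events;

-- With a sentinel, the sentinel wins the MAX, so the
-- query must exclude it explicitly every single time:
SELECT MAX(EndDate) AS LastRealEnd
FROM dbo.Events
WHERE EndDate <> '99991231';
```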
Post #1014248
Posted Tuesday, November 2, 2010 7:54 AM
Forum Newbie


Group: General Forum Members
Last Login: Wednesday, February 19, 2014 10:44 AM
Points: 7, Visits: 282
Which is more efficient, individual indexes on Proj_Start and Proj_End or a composite index on Proj_Start and Proj_End?
Post #1014488
Posted Tuesday, November 2, 2010 8:05 AM


Ten Centuries


Group: General Forum Members
Last Login: Wednesday, December 24, 2014 9:27 AM
Points: 1,113, Visits: 708
david.avraamides (11/1/2010)
I would consider using a sentinel date such as the maximum T-SQL datetime (9999-12-31) for two reasons: 1) your queries become simpler since you don't always need that "or end_date is null" part of the clause, and 2) you can create an index on end_date since it is not nullable.


Agreed, 9999-12-31 or a similar future day is preferable to a NULL end date. The queries are much simpler to write that way. However, you can create an index either way; NULLs are perfectly acceptable in all non-primary key indexes (with some restrictions on unique keys).

The problem with start date/end date schemes is that creating a perfect index is nearly impossible with or without the NULLs. Let's say that you want all rows that were "active" as of 2010-11-02:

SELECT *
FROM YourTbl
WHERE '2010-11-02' BETWEEN startDate AND endDate

How should you index for this query? There are basically two options: (startDate, endDate), and (endDate, startDate). Given the first index, the QP has to find the first row where startDate <= '2010-11-02', then it must do a scan from there to evaluate all of the endDate values. The reverse is true if you have endDate first.

So which index is better? The best index for a given query is generally ordered by the most selective columns. So if you have fewer rows that start before 2010-11-02 than those that end after 2010-11-02, startDate should be first. Otherwise, endDate should be first. But what if you have an equal number of rows that meet each criteria? And what if there are 50,000,000,000 rows that start before 2010-11-02 but end earlier, and 50,000,000,000 rows that both start and end after 2010-11-02? Now you're stuck scanning 50,000,000,000 rows either way, even if there is only a single row that's actually valid for that day.
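For concreteness, the two candidate indexes discussed above might be sketched as follows (index names are hypothetical):

```sql
-- Option 1: seek to the range startDate <= @day,
-- then scan it checking each endDate
CREATE INDEX IX_YourTbl_StartEnd ON dbo.YourTbl (startDate, endDate);

-- Option 2: seek to the range endDate >= @day,
-- then scan it checking each startDate
CREATE INDEX IX_YourTbl_EndStart ON dbo.YourTbl (endDate, startDate);

-- Either way, the second key column only narrows the scan;
-- it cannot turn the range-within-a-range into a single seek:
SELECT *
FROM dbo.YourTbl
WHERE startDate <= '20101102'
  AND endDate   >= '20101102';
```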

Solution? I haven't figured it out yet. Some other DBMS products have multidimensional index types (R and Quad trees) and I suspect that these indexes can be exploited to solve these problems, but for now, in SQL Server, we're in a bit of a bind... Unless someone out there has a solution? Speak up if so


--
Adam Machanic
SQL Server MVP
SQLblog.com: THE SQL Server Blog Spot on the Web
Post #1014513
Posted Tuesday, November 2, 2010 8:29 AM
Valued Member


Group: General Forum Members
Last Login: Saturday, November 27, 2010 10:41 PM
Points: 58, Visits: 113
Adam Machanic (11/2/2010)
david.avraamides (11/1/2010)
I would consider using a sentinel date such as the maximum T-SQL datetime (9999-12-31) for two reasons: 1) your queries become simpler since you don't always need that "or end_date is null"....
Sometimes. Sometimes they become more complex, since you need to add an "...or date is not (sentinel)" to them. The overarching concern in data modelling should be correctness of expression, not how to simplify a query.

How should you index for this query? There are basically two options: (startDate, endDate), and (endDate, startDate).
Without a two-dimensional index such as a Quadtree, you can't do fully-indexed 2D range searches. However, there are a few tricks you can do to get close, by exploiting information the QO doesn't know about your data. I should probably write an article about it, as it's a bit more complex than one can easily cover in a post.

Also, you *can* implement a quadtree index in SQL Server, with a little work.

Post #1014546
Posted Tuesday, November 2, 2010 2:32 PM


SSCoach


Group: General Forum Members
Last Login: Yesterday @ 6:12 PM
Points: 18,083, Visits: 16,117
Interesting approach and thanks for sharing.



Jason AKA CirqueDeSQLeil
I have given a name to my pain...
MCM SQL Server, MVP


SQL RNNR

Posting Performance Based Questions - Gail Shaw
Post #1014897
Posted Wednesday, November 3, 2010 7:27 PM
Hall of Fame


Group: General Forum Members
Last Login: Thursday, December 4, 2014 7:54 PM
Points: 3,087, Visits: 2,686
Thank you to all of you who have contributed to this discussion. I am kinda glad that I was too busy to read the article and the discussions until now.

Not only is the original article of interest (and it is great to have someone prepared to write about their findings...thanks Terry) but the discussions show that this topic is of great interest to many of us, myself included.



Post #1015653