• david.avraamides (11/1/2010)


    I would consider using a sentinel date such as the maximum T-SQL datetime (9999-12-31) for two reasons: 1) your queries become simpler since you don't always need that "or end_date is null" part of the clause, and 2) you can create an index on end_date since it is not nullable.

    Agreed, 9999-12-31 or a similar future day is preferable to a NULL end date. The queries are much simpler to write that way. However, you can create an index either way; NULLs are perfectly acceptable in all non-primary key indexes (with some restrictions on unique keys).

    The problem with start date/end date schemes is that creating a perfect index is nearly impossible with or without the NULLs. Let's say that you want all rows that were "active" as of 2010-11-02:

    SELECT *

    FROM YourTbl

    WHERE '2010-11-02' BETWEEN startDate AND endDate

    How should you index for this query? There are basically two options: (startDate, endDate), and (endDate, startDate). Given the first index, the QP has to find the first row where startDate <= '2010-11-02', then it must do a scan from there to evaluate all of the endDate values. The reverse is true if you have endDate first.

    So which index is better? The best index for a given query is generally ordered by the most selective columns. So if you have fewer rows that start before 2010-11-02 than those that end after 2010-11-02, startDate should be first. Otherwise, endDate should be first. But what if you have an equal number of rows that meet each criteria? And what if there are 50,000,000,000 rows that start before 2010-11-02 but end earlier, and 50,000,000,000 rows that both start and end after 2010-11-02? Now you're stuck scanning 50,000,000,000 rows either way, even if there is only a single row that's actually valid for that day.

    Solution? I haven't figured it out yet. Some other DBMS products have multidimensional index types (R and Quad trees) and I suspect that these indexes can be exploited to solve these problems, but for now, in SQL Server, we're in a bit of a bind... Unless someone out there has a solution? Speak up if so 🙂

    --
    Adam Machanic
    whoisactive