A Faster BETWEEN Dates

  • Comments posted to this topic are about the item A Faster BETWEEN Dates

  • Novel approach!

    It would also be interesting to compare these two approaches using the Calendar table, but with an integer PK (as per a data warehouse DIM table), carrying that integer in the FACT table as an FK - something like the sketch below.
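
    A minimal sketch of that integer-key design (the DimDate/FactEvent names and columns are invented for illustration, not from the article):

    -- Calendar DIM table keyed by a yyyymmdd integer.
    CREATE TABLE DimDate (
        DateKey int      NOT NULL PRIMARY KEY,  -- e.g. 20100101
        CalDate datetime NOT NULL UNIQUE
    );

    -- FACT table carries the integer keys as FKs.
    CREATE TABLE FactEvent (
        EventId      int NOT NULL PRIMARY KEY,
        StartDateKey int NOT NULL REFERENCES DimDate (DateKey),
        EndDateKey   int NOT NULL REFERENCES DimDate (DateKey)
    );

    -- The range test then compares integers instead of dates.
    SELECT f.EventId
    FROM FactEvent AS f
    WHERE f.StartDateKey <= 20100101
      AND f.EndDateKey >= 20100101;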

    Well done & kind regards

    Geoff Johns

  • Hi Terry,

    Thanks for the article.

    It's definitely an interesting approach - but I'm missing an explanation. WHY does the second version run faster than the first? I would have expected the reverse, but your measurements disagree; the article really should have included a paragraph or two explaining this.

    I tried to verify your findings on my computer, and I found that each version ran in subsecond duration; your tests must have been done on much older hardware, or on much larger tables. I'm a bit short on time now, but I plan (if I find the time) to do some more experiments, maybe on larger tables. For now, I can only say that the execution plans do not support your observation that the second query is faster, and that the execution times are too short to be sure.
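
    For anyone re-running the comparison, a minimal timing harness looks something like this (the DBCC commands flush caches, so use a test server only; paste each variant where indicated):

    DBCC FREEPROCCACHE;      -- discard cached plans
    DBCC DROPCLEANBUFFERS;   -- discard cached data pages

    SET STATISTICS TIME ON;  -- report CPU and elapsed time per statement
    SET STATISTICS IO ON;    -- report logical/physical reads per table

    -- run variant 1 here, then repeat the flush and run variant 2,
    -- and compare the elapsed times and reads in the Messages output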


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hi Terry,

    You begin by saying "Finding records that exist between two dates can sometimes be slow". In my experience this has never been true when the table in question is properly indexed.

    That aside, however, as with Hugo's observation, I see no difference between your final two statements: the execution plans are identical.

    Instead of using a table variable (which I would never use for large data sets anyway), try creating a temp table with suitable indexes on the date fields.

    You could then avoid creating a calendar table altogether and query the one table directly, with a simple BETWEEN in the WHERE clause, which should be even faster - see the sketch below.
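
    A minimal sketch of that suggestion, reusing the Work_Start/Work_End column names from the article (#WorkTable, Work_Id, and @SearchDate are illustrative):

    -- Temp table with an index on the date columns, in place of the table variable.
    CREATE TABLE #WorkTable (
        Work_Id    int      NOT NULL PRIMARY KEY,
        Work_Start datetime NOT NULL,
        Work_End   datetime NOT NULL
    );

    CREATE INDEX IX_WorkTable_Dates ON #WorkTable (Work_Start, Work_End);

    -- ... load #WorkTable here ...

    DECLARE @SearchDate datetime;
    SET @SearchDate = '20100101';

    -- No calendar table needed: a plain range test on the indexed columns.
    SELECT Work_Id
    FROM #WorkTable
    WHERE @SearchDate BETWEEN Work_Start AND Work_End;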

  • Hello Terry

    Afraid I see the same results as Hugo. The elapsed time for the second approach seems much higher, and far more records are read. Looking at the plan, the problem seems to be that SQL Server is generating an intermediate table and then scanning it for each candidate record in a nested loop. It's a suboptimal plan, probably resulting from the relatively small number of records in the driving table.

    Interesting approach though, thanks for the article.

    Pete

    As a slight aside, I have to do this sort of thing a lot at the moment, and it's common to have tables with start_date_time and end_date_time columns. You'll often want to select all the records that were extant at a point in time. In other words, if you wanted to retrieve all the events that had started but not finished on 1 Jan 2010, you might go (untested freehand follows):

    select blah
    from events ev
    where ev.start_date_time <= '20100101'
    and (ev.end_date_time > '20100101' or ev.end_date_time is null)

    Similarly, to find all events that overlapped with a period of time (here 1 Jan to 1 Jul 2010), you might go:

    select blah
    from events ev
    where ev.start_date_time <= '20100701'
    and (ev.end_date_time > '20100101' or ev.end_date_time is null)

    If it were your job to design these tables, would you be tempted to make the end_date_time column NOT NULL, and instead substitute a ridiculous future date for all events that have not yet ended (like 22000101)? Are there indexing or other performance advantages to making your end_date_time column NOT NULL?
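
    A minimal sketch of that sentinel-date design (the table shape follows the example above; the index name and the '22000101' default are assumptions):

    -- end_date_time is NOT NULL, with a far-future placeholder for open events,
    -- so the 'or ... is null' branch disappears from every query.
    CREATE TABLE events (
        event_id        int      NOT NULL PRIMARY KEY,
        start_date_time datetime NOT NULL,
        end_date_time   datetime NOT NULL DEFAULT '22000101'
    );

    CREATE INDEX IX_events_dates ON events (start_date_time, end_date_time);

    -- The point-in-time query becomes a pure range test:
    select ev.event_id
    from events ev
    where ev.start_date_time <= '20100101'
    and ev.end_date_time > '20100101'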

    ...One of the symptoms of an approaching nervous breakdown is the belief that one's work is terribly important.... Bertrand Russell

    I'd have to agree with earlier posts. In the setup we have here, I don't see any real benefits. I've tried it on a larger set of data, but it could be our hardware setup that is hiding the gains you see. It would be interesting to know your setup and a bit more detail on the results you got.

  • Hi Terry,

    It was indeed a very interesting approach. We tried it with larger volumes. Our result: the BETWEEN statement was 7 times faster (we are running on MSSQL 2008).

    Still, I will always look with interest at anything that can make a faster BETWEEN.

    Thanks for the demo.

    Best regards

    Marcel

    Hello, Terrance

    Could you explain this syntax?

    I think it is very dangerous to move the search condition into the WHERE clause,

    because you can alter the result of the LEFT JOIN so that it becomes a pure (inner) JOIN:

    FROM @WorkTable AS wt
    LEFT JOIN @CalTable AS ct ON
        wt.Work_Start <= ct.CalDate
    WHERE
        (CASE WHEN ct.CalDate <= wt.Work_End THEN 1 ELSE 0 END) = 1
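
    If the outer-join behaviour is actually wanted, the end-date test can be moved into the ON clause instead; a sketch using the same aliases (not from the article):

    FROM @WorkTable AS wt
    LEFT JOIN @CalTable AS ct ON
        wt.Work_Start <= ct.CalDate
        AND ct.CalDate <= wt.Work_End
    -- unmatched wt rows are kept, with the ct columns returned as NULL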

    Doesn't it defeat the purpose of a 4th-generation language (specify desired results, not the exact method) if you have to use such non-intuitive ways of specifying the query in order to gain speed? If you can't trust the 4GL, you might as well use a 3rd-generation language.

    We have a number of servers which often run into tight-memory situations, and we've tried the above approach quite exhaustively. The frequent creation and destruction of temporary tables can make a subsecond SP end up taking over 30 seconds waiting for object memory to become available. If this query were frequently run on a busy server, I'd expect to start seeing vast slow-downs. If the server's idling, then we found this approach can help in certain situations. However, a good index is just as fast.

    If your app is running very slowly because of date comparison, a good solution is to add another int column, index it, and copy the dates over into it. Then cast any query date to int and use the new column(s) instead - see the sketch below. Everything runs super-fast, especially if the table is transactional, so the index fill-factor can be set to 100% to make it compact and take fewer IOs. The increase in table size is a good trade-off of hard-disk usage versus the memory usage of table variables.

    🙂
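
    A minimal sketch of that int-column idea, borrowing the events table from the earlier post (the column and index names, and the yyyymmdd encoding, are assumptions):

    -- Shadow int columns holding the dates as yyyymmdd integers.
    ALTER TABLE events ADD start_date_int int NULL, end_date_int int NULL;

    UPDATE events
    SET start_date_int = CONVERT(int, CONVERT(char(8), start_date_time, 112)),
        end_date_int   = CONVERT(int, CONVERT(char(8), end_date_time, 112));

    -- Compact index: fill-factor 100 packs the pages fully.
    CREATE INDEX IX_events_date_ints
        ON events (start_date_int, end_date_int) WITH (FILLFACTOR = 100);

    -- Cast the query date the same way before comparing:
    SELECT event_id
    FROM events
    WHERE start_date_int <= 20100101
      AND end_date_int >= 20100101;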

  • Greetings All,

    Due to time constraints I have not been able to reply to your messages individually, so I will try to shed some light here.

    The calendar and temporary tables were used in this example only to create test data - nothing more. For me, I am applying this method against a regular data table that has a start and end date in it.

    The environment is on an older (slower) server using SQL 2005. The database and table layout is not optimal, but I did not design it and changing it to be better is not always an option.

    Since I can't just change things for the better, I have to find more inventive ways to make it work better; hence this start date check and a boolean flag for the end date check.

    Put this against a faster server and on SQL 2008 and, yes, these speed gains may disappear. But when walking into a database situation not of your design, this may help you.

    I had tried different ways of applying an index to the data, but none of them were giving me the speed enhancement that this method did. But then, maybe I just did not get the order of the fields correct or something. I will look at the suggestions listed here to see if any of them work too.

    I enjoy learning new and different ways of solving a problem. I can list the ways I have solved them here and learn from others who have better knowledge than I do. I certainly am not so proud as to claim that my script doesn't stink, but this trick may still benefit someone, as may the suggestions given.

    Have a good day.

    Terry Steadman

  • terrance.steadman (11/1/2010)


    I enjoy learning new and different ways of solving a problem. I can list the ways I have solved them here and learn from others who have better knowledge than I do. I certainly am not so proud as to claim that my script doesn't stink, but this trick may still benefit someone, as may the suggestions given.

    I personally like to compare the author's logic with logic I've used to solve similar situations. SQL Server as a common ground makes for better reading and understanding than, say, reading about or even watching a chess match between two other players (*yawn*). As much as I like to play chess, I don't seem to learn much watching a match I'm not playing (probably an attention thing). Put this reply in the "for what it's worth" category. 😀

    Thanks for the article.

    First of all, did you compare the query plans of the two approaches? Or flush the caches?

    Secondly, why use a boolean flag when the approach below is simpler and should yield the same plan? (I'll try to verify that with your test data):

    LEFT JOIN @CalTable AS ct ON
        wt.Work_Start <= ct.CalDate
    WHERE ct.CalDate <= wt.Work_End

    Third, as Carlo points out, your approach is not identical to the original query. It partially transforms a left join into an inner join, which may go a long way to explaining why it ran faster.

    1) The fact that table variables are used in the demos is likely creating execution plans that will not play out in the real world, where actual tables are used. Table variables usually cause the optimizer to create vastly different plans.

    2) Personally, I like dynamic SQL (when available) for widely varying date inputs, because the optimizer always gets the best possible chance to create the optimum plan for each execution. This works even better if the 'optimize for ad hoc workloads' option is enabled. A sketch follows below.
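
    A minimal sketch of that dynamic SQL idea (dbo.WorkTable, Work_Id, and @SearchDate are invented for illustration; the date is embedded as a literal so the optimizer sees the actual value):

    DECLARE @SearchDate datetime;
    SET @SearchDate = '20100101';

    DECLARE @sql nvarchar(max);
    -- Build the statement with a literal yyyymmdd date; CONVERT style 112
    -- keeps the literal unambiguous and safe for datetime input.
    SET @sql = N'SELECT wt.Work_Id FROM dbo.WorkTable AS wt'
             + N' WHERE wt.Work_Start <= ''' + CONVERT(char(8), @SearchDate, 112) + N''''
             + N' AND wt.Work_End >= ''' + CONVERT(char(8), @SearchDate, 112) + N'''';

    EXEC (@sql);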

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service
