Last day of month in a WHERE clause

  • Ah yes - good spot! It still gives totally different results from the other queries, though - possibly because of those pesky time portions that you mentioned.

    John

  • twin.devil (8/12/2016)


    There are few issues regarding this solution:

    1. This logic will not work if MyDate column have TIME data in it, So MyDate should be cast as date.

    I agree it should be cast as date, if times are included. However, some datetime columns are legacies in tables created before there was such a character type as DATE. This is why I asked the question.

    2. You need to have distinct value in Eom to avoid getting the same result.

    I disagree. Since Tally generates a set of unique values for N, the EOM values will also be unique. No SELECT DISTINCT is required.

    3. Its would be ending up like a Sub Query written Select statement ( Old School Style).

    Yes. And? The point is that generating the list of end-of-month dates is dirt cheap, and using it to drive the subqueries cuts out all the unnecessary reads. Divide and conquer is a proven design strategy.

    Lastly, the query that I posted works correctly for the column names as described by the op. Perhaps a view could be created over the test table to conform to those original names? Then no tweaking would be necessary. Also, it's convenient to implement the tally table as a view, so you don't have to retype the ctes over and over. Mine uses Itzik Ben-Gaan's exponentiation method.

    CREATE VIEW [dbo].[Tally]

    AS

    WITH -- Itzik Ben-Gaan's virtual tally runs fastest and I don't know why.

    L0 AS (SELECT convert(int, 1) AS C UNION ALL SELECT convert(int,1)),--2 rows

    L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows

    L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows

    L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rowshttp://www.sqlservercentral.com/Forums/Skins/Classic/Images/RichTextBoxTable/tbl_bottomleft.gif

    L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rows

    L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B)

    SELECT convert(int,ROW_NUMBER() OVER(ORDER BY (SELECT NULL))) as N from L5

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • After thinking about times being included, we don't need to waste cycles converting Check_Date from datetime to date. We just need to go a little bit more old school in the subquery. Also EOMONTH() has a lovely little second parameter that makes the cte cleaner, and possibly quicker by a few milliseconds.

    Reworked to use EOMONTH parm2 and handle possibility of times being included.

    ;with Eomonths (Eom) as (select top(500) EoMonth('1/1/2016',(N-1)) as Eom from tally)

    select Eom as Check_date, Total

    from Eomonths

    cross apply (select sum(1) as Total from a_dates where check_date >= eom and check_date < dateadd(day,1,eom)) ca

    Once it is working correctly in the test harness, could you include it in the performance comparisons please?

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • The Dixie Flatline (8/12/2016)


    After thinking about times being included, we don't need to waste cycles converting Check_Date from datetime to date. We just need to go a little bit more old school in the subquery. Also EOMONTH() has a lovely little second parameter that makes the cte cleaner, and possibly quicker by a few milliseconds.

    Reworked to use EOMONTH parm2 and handle possibility of times being included.

    ;with Eomonths (Eom) as (select top(500) EoMonth('1/1/2016',(N-1)) as Eom from tally)

    select Eom as Check_date, Total

    from Eomonths

    cross apply (select sum(1) as Total from a_dates where check_date >= eom and check_date < dateadd(day,1,eom)) ca

    Once it is working correctly in the test harness, could you include it in the performance comparisons please?

    CROSS APPLY favours index seeks, and using my 36 million row table again, this query runs in minutes rather than seconds.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • WITH -- Itzik Ben-Gaan's virtual tally runs fastest and I don't know why.

    L0 AS (SELECT con vert(int, 1) AS C UN ION ALL SELECT con vert(int,1)),--2 rows

    L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows

    L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows

    L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows

    L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rows

    L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B)

    ,tally AS (SELECT con vert(int,ROW_NUMBER() OVER(ORDER BY (SELECT NULL))) as N from L5)

    , Eomonths (Eom) as (select EoMonth('1/1/2016',(N-1)) as Eom from tally)

    select Eom as Check_date, Total

    from Eomonths

    cross apply (select sum(1) as Total from Dates where MyDate >= eom and MyDate < dateadd(day,1,eom)) ca

    Works well with the TOP 500, but in my test harness over 13000 rows are returned. If I take out the TOP clause, this happens:Msg 517, Level 16, State 1, Line 1

    Adding a value to a 'date' column caused an overflow.

    If I change it to TOP 14000, I get lots of rows with NULL in the second column. If it's because my test harness is slightly different from the original table, I'm happy for anyone to change it or create a view on it. The code is all out there.

    John

  • Guys, I'm trying to figure out the disconnect here. I am running against a table of almost 41 million rows using the OPs table description. It has a clustered index on [ID] and a nonclustered index on [check_date] only.

    It has been populated with dates from Jan1, 2016 through July 7, 2125 with rows more or less evenly distributed across all days of the month.

    From a cold start, so we get maximum physical reads, the query at the bottom returns the following, without having to go parallel.

    Table 'x_dates'. Scan count 4000, logical reads 18995, physical reads 34, read-ahead reads 8096, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Tally'. Scan count 1, logical reads 13, physical reads 3, read-ahead reads 1625, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 438 ms, elapsed time = 1096 ms.

    set statistics time,io on;

    ;with Eomonths (Eom) as (select top(4000) EoMonth('1/1/2016',(N-1)) as Eom from tally)

    select Eom as Check_date, Total

    from Eomonths

    cross apply (select sum(1) as Total from x_dates where check_date >= eom and check_date < dateadd(day,1,eom)) ca

    where total is not null

    set statistics time,io off;

    Help me out here. What am I missing?

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • The Dixie Flatline (8/12/2016)


    Guys, I'm trying to figure out the disconnect here. I am running against a table of almost 41 million rows using the OPs table description. It has a clustered index on [ID] and a nonclustered index on [check_date] only.

    It has been populated with dates from Jan1, 2016 through July 7, 2125 with rows more or less evenly distributed across all days of the month.

    From a cold start, so we get maximum physical reads, the query at the bottom returns the following, without having to go parallel.

    Table 'x_dates'. Scan count 4000, logical reads 18995, physical reads 34, read-ahead reads 8096, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Tally'. Scan count 1, logical reads 13, physical reads 3, read-ahead reads 1625, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 438 ms, elapsed time = 1096 ms.

    set statistics time,io on;

    ;with Eomonths (Eom) as (select top(4000) EoMonth('1/1/2016',(N-1)) as Eom from tally)

    select Eom as Check_date, Total

    from Eomonths

    cross apply (select sum(1) as Total from x_dates where check_date >= eom and check_date < dateadd(day,1,eom)) ca

    where total is not null

    set statistics time,io off;

    Help me out here. What am I missing?

    Absolutely nothing, Bob, it works a treat - now that I've created an index on just the date column.

    330ms, which is about 5 times faster than any of the other options. That's using my 36 mill row table.

    I had to change EoMonth('1/1/2016',(N-1)) to EoMonth('1/1/2016',(1-n)) for it to work with dates in the past. Top work. Now I'll just slink under the table and wait until nightfall.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I still can't get it to work on my data set - I've clearly done something wrong, and I don't really have time to investigate. Looks like I've been outcoded, though. Interesting discussion!

    John

    Edit - I think it's because my data goes hundreds of years into the future, whereas yours is set up to work across the ten years of dates you have in your table.

  • This -

    select top(185) EoMonth('1/1/2016',(1-n)) as Eom from tally

    - using Bob's tally view covers roughly 2000 to 2016, the entire range of dates in my table of interest.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I had to change EoMonth('1/1/2016',(N-1)) to EoMonth('1/1/2016',(1-n)) for it to work with dates in the past.

    An alternative is to just back up the starting date to cover your date range. If your data starts back in the year 1995, use EoMonth('1/1/1995',N-1). It might be a nice wrinkle to use min(check_date) for the starting date of EOMONTH() and set the TOP() value to be the number of months between the min/max dates in the table.

    declare @MinDate date

    ,@MaxDate date

    ,@Months int

    select @MinDate = min(check_date),@MaxDate = max(check_date) from x_dates

    set @months = datediff(month, @MinDate, @MaxDate) + 1

    select top(@months) EoMONTH(@MinDate, N-1) as EoM from tally

    Fun stuff 😀

    P.S. After all this discussion, we just saved maybe a second from the runtime on this particular query. How many seconds have we all spent thinking about how to do it best? How many times will it have to run to make up for the time we spent? The OP was the wise man: He took the first answer, ran his query and moved on. Philosophically, sometimes good enough is good enough. But then again, twice as fast is twice as fast. If we could make everything run twice as fast, why shouldn't we?

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • The Dixie Flatline (8/12/2016)


    I had to change EoMonth('1/1/2016',(N-1)) to EoMonth('1/1/2016',(1-n)) for it to work with dates in the past.

    An alternative is to just back up the starting date to cover your date range. If your data starts back in the year 1995, use EoMonth('1/1/1995',N-1). It might be a nice wrinkle to use min(check_date) for the starting date of EOMONTH() and set the TOP() value to be the number of months between the min/max dates in the table.

    declare @MinDate date

    ,@MaxDate date

    ,@Months int

    select @MinDate = min(check_date),@MaxDate = max(check_date) from x_dates

    set @months = datediff(month, @MinDate, @MaxDate) + 1

    select top(@months) EoMONTH(@MinDate, N-1) as EoM from tally

    Fun stuff 😀

    P.S. After all this discussion, we just saved maybe a second from the runtime on this particular query. How many seconds have we all spent thinking about how to do it best? How many times will it have to run to make up for the time we spent? The OP was the wise man: He took the first answer, ran his query and moved on. Philosophically, sometimes good enough is good enough. But then again, twice as fast is twice as fast. If we could make everything run twice as fast, why shouldn't we?

    You, John and I now have a new quantum of knowledge. Whilst it's not priceless, it was certainly worth the cost of the time involved in arriving at the final solution. The OP doesn't seem to care. No doubt Kevin will be along in a year or two to fix their perf problems 😉

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 11 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply