Aggregate Data for the Last Day of the Month

  • Comments posted to this topic are about the item Aggregate Data for the Last Day of the Month

  • Hello

    Firstly, I appreciate anyone who takes the time to write helpful articles on SQL as I am mostly self-taught. I'm still a bit of an SQL novice so apologies if this question has an obvious answer. If all you need to run the last day of the month CTE is a list of IDs from 1-12, why do we need to go through the bother and processing time/power of having to create the list from an arbitrary source like the 1st CTE?

    Thanks
    Drew

  • Thank you Adam for this effort but I must criticize the inefficiency of the query which can easily be vastly improved.
    😎 

    Here is a version that is more than 10 more efficient than the one posted in the article


    ;WITH T(N) AS (SELECT X.N FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)) X(N))
    ,EOM(EOM_DATE) AS
    (
      SELECT
       EOMONTH(DATEFROMPARTS(2016,TD.N,1)) AS EOM_DATE
      FROM  T TD
    )
    SELECT
      ED.EOM_DATE     AS LastDayDate
     ,SUM(DS.TotalSalePrice) AS TotalDailySales
    FROM  EOM  ED
    INNER JOIN Data.Sales  DS
    ON    ED.EOM_DATE = CONVERT(DATE,DS.SaleDate,0)
    GROUP BY ED.EOM_DATE
    ORDER BY ED.EOM_DATE ASC;

  • Apart from technical inefficiency, there is an issue with the logical shortcomings.

    Sale amounts on the last days of a month don't say much, unless they are compared with sales on any other day of a month.

    Here is my version of the query, much shorter, more informative and not using highly inefficient (and actually useless) functions DATEFROMPARTS and EOMONTH:
    SELECT DATEADD(mm, DATEDIFF(MM, 0, SLS.SaleDate), 0) MonthBeginning,
        SUM(CASE WHEN SLS.SaleDate >= DATEADD(mm, DATEDIFF(MM, 0, SLS.SaleDate)+1, 0)-1 THEN SLS.SalePrice ELSE 0 END) AS TotalSales_LastDay,
        SUM(CASE WHEN SLS.SaleDate < DATEADD(mm, DATEDIFF(MM, 0, SLS.SaleDate)+1, 0)-1 THEN SLS.SalePrice ELSE 0 END) AS TotalSales_AnyOtherDay
    FROM Data.SalesByCountry SLS
    GROUP BY DATEADD(mm, DATEDIFF(MM, 0, SLS.SaleDate), 0)
    ORDER BY MonthBegin

    _____________
    Code for TallyGenerator

  • Sale amounts on the last days of a month don't say much, unless they are compared with sales on any other day of a month.

     This may be the case but taking a requirement and producing something you think they might want is somewhat arrogant 🙂

  • Sergiy - Thursday, August 16, 2018 6:41 AM

    Apart from technical inefficiency, there is an issue with the logical shortcomings.

    Sale amounts on the last days of a month don't say much, unless they are compared with sales on any other day of a month.

    Here is my version of the query, much shorter, more informative and not using highly inefficient (and actually useless) functions DATEFROMPARTS and EOMONTH:
    SELECT DATEADD(mm, DATEDIFF(MM, 0, SLS.SaleDate), 0) MonthBeginning,
        SUM(CASE WHEN SLS.SaleDate >= DATEADD(mm, DATEDIFF(MM, 0, SLS.SaleDate)+1, 0)-1 THEN SLS.SalePrice ELSE 0 END) AS TotalSales_LastDay,
        SUM(CASE WHEN SLS.SaleDate < DATEADD(mm, DATEDIFF(MM, 0, SLS.SaleDate)+1, 0)-1 THEN SLS.SalePrice ELSE 0 END) AS TotalSales_AnyOtherDay
    FROM Data.SalesByCountry SLS
    GROUP BY DATEADD(mm, DATEDIFF(MM, 0, SLS.SaleDate), 0)
    ORDER BY MonthBegin

    You may want to thing again about the DATEFROMPARTS and the EOMONTH functions, those are some of the fastest options and certainly faster than cascading DATEADD and DATEDIFF
    😎 

    Here is a quick example

    USE TEEST;
    GO
    SET NOCOUNT ON;
    DECLARE @SAMPLE_SIZE  BIGINT  = 1000000;
    DECLARE @DATE_RANGE  INT   = 126144000;
    DECLARE @VALUE_RANGE  INT   = 1000000;
    DECLARE @FIRST_DATE  DATETIME  = '2015-01-01';

    --/*
    IF OBJECT_ID(N'dbo.TBL_TEST_DATE_AGGREGATE') IS NOT NULL DROP TABLE dbo.TBL_TEST_DATE_AGGREGATE;
    CREATE TABLE dbo.TBL_TEST_DATE_AGGREGATE
    (
      TDA_ID  INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_DATE_AGGREGATE_TDA_ID PRIMARY KEY CLUSTERED (TDA_ID ASC)
     ,SALE_DATE DATETIME    NOT NULL
     ,SALE_VALUE NUMERIC(12,2)   NOT NULL
    )

    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)
    INSERT INTO dbo.TBL_TEST_DATE_AGGREGATE WITH (TABLOCKX) (SALE_DATE,SALE_VALUE)
    SELECT
      DATEADD(SECOND,ABS(CHECKSUM(NEWID())) % @DATE_RANGE,@FIRST_DATE)       AS SALE_DATE
     ,CONVERT(NUMERIC(12,2),((ABS(CHECKSUM(NEWID())) % @VALUE_RANGE) + 0.0) / 100.0,0)  AS SALE_VALUE
    FROM  NUMS  NM
    ORDER BY SALE_DATE ASC;

    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DATE_AGGREGATE_SALE_DATE_INCLUDE_SALE_VALUE_INCL_SALE_VALUE ON dbo.TBL_TEST_DATE_AGGREGATE(SALE_DATE ASC) INCLUDE (SALE_VALUE);

    -- */

    DECLARE @timer TABLE (T_TXT VARCHAR(50) NOT NULL, T_TD DATETIME2(7) NOT NULL DEFAULT (SYSDATETIME()));
    DECLARE @DATETIME_BUCKET  DATETIME   = 0;
    DECLARE @NUMERIC_BUCKET  NUMERIC(12,2) = 0.0;

    INSERT INTO @timer (T_TXT) VALUES ('DRY RUN');
    SELECT
      @DATETIME_BUCKET = TDA.SALE_DATE
     ,@NUMERIC_BUCKET = TDA.SALE_VALUE
    FROM  dbo.TBL_TEST_DATE_AGGREGATE  TDA
    OPTION (MAXDOP 1);
    INSERT INTO @timer (T_TXT) VALUES ('DRY RUN');

    INSERT INTO @timer (T_TXT) VALUES ('DATEDIFF');
    SELECT
      @DATETIME_BUCKET = DATEADD(mm, DATEDIFF(MM, 0, SLS.SALE_DATE), 0)
    FROM dbo.TBL_TEST_DATE_AGGREGATE SLS
    OPTION (MAXDOP 1);
    INSERT INTO @timer (T_TXT) VALUES ('DATEDIFF');

    INSERT INTO @timer (T_TXT) VALUES ('EOMONTH');
    SELECT
      @DATETIME_BUCKET = EOMONTH(SLS.SALE_DATE)
    FROM dbo.TBL_TEST_DATE_AGGREGATE SLS
    OPTION (MAXDOP 1);
    INSERT INTO @timer (T_TXT) VALUES ('EOMONTH');

    SELECT
      T.T_TXT
     ,DATEDIFF(MICROSECOND,MIN(T.T_TD),MAX(T.T_TD)) AS DURATION
    FROM  @timer
    GROUP BY T.T_TXT
    ORDER BY DURATION ASC;

    Results on a latest gen i5 laptop


    T_TXT  DURATION
    --------- --------
    DRY RUN    179125
    EOMONTH    232185
    DATEDIFF   247174

  • Also, the author did say that this is just one solution.  And the article also stated that the requirement was for sales on the last day of each month.

  • Eirikur Eiriksson - Thursday, August 16, 2018 4:40 AM

    Thank you Adam for this effort but I must criticize the inefficiency of the query which can easily be vastly improved.
    😎 

    Here is a version that is more than 10 more efficient than the one posted in the article


    ;WITH T(N) AS (SELECT X.N FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)) X(N))
    ,EOM(EOM_DATE) AS
    (
      SELECT
       EOMONTH(DATEFROMPARTS(2016,TD.N,1)) AS EOM_DATE
      FROM  T TD
    )
    SELECT
      ED.EOM_DATE     AS LastDayDate
     ,SUM(DS.TotalSalePrice) AS TotalDailySales
    FROM  EOM  ED
    INNER JOIN Data.Sales  DS
    ON    ED.EOM_DATE = CONVERT(DATE,DS.SaleDate,0)
    GROUP BY ED.EOM_DATE
    ORDER BY ED.EOM_DATE ASC;

    With a couple of minor changes - you can eliminate the datefromparts altogether and allow for the past 12 months of data from any given date.

    Declare @endDate date = '2016-12-31';

    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11)) X(N))
    ,EOM(EOM_DATE) AS
    (
    SELECT
      EOMONTH(@endDate,-TD.N) AS EOM_DATE
    FROM T TD
    )
    SELECT
    ED.EOM_DATE  AS LastDayDate
    FROM EOM ED
    INNER JOIN Data.Sales  DS
    ON    ED.EOM_DATE = CONVERT(DATE,DS.SaleDate,0)
    GROUP BY ED.EOM_DATE
    ORDER BY ED.EOM_DATE ASC;

    Or better yet...

    Declare @endDate date = '2017-01-15';

    ;WITH T(N) AS (SELECT X.N FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)) X(N))
    ,EOM(EOM_DATE) AS
    (
    SELECT
      EOMONTH(@endDate,-TD.N) AS EOM_DATE
    FROM T TD
    )
    SELECT
    ED.EOM_DATE  AS LastDayDate
    FROM EOM ED
    INNER JOIN Data.Sales  DS
    ON    ED.EOM_DATE = CONVERT(DATE,DS.SaleDate,0)
    GROUP BY ED.EOM_DATE
    ORDER BY ED.EOM_DATE ASC;

    The second version gives you the prior 12 months from the given date - not including the month of the given date.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Lynn Pettis - Thursday, August 16, 2018 11:27 AM

    Also, the author did say that this is just one solution.  And the article also stated that the requirement was for sales on the last day of each month.

    yes, but why should it be the worst possible one which gets published (apart from a cursor approach)?

    _____________
    Code for TallyGenerator

  • Sergiy - Thursday, August 16, 2018 7:22 PM

    Lynn Pettis - Thursday, August 16, 2018 11:27 AM

    Also, the author did say that this is just one solution.  And the article also stated that the requirement was for sales on the last day of each month.

    yes, but why should it be the worst possible one which gets published (apart from a cursor approach)?

    Why isn't the grass blue and the sky green?

    Because that is the one the author published in the article.

  • Lynn Pettis - Thursday, August 16, 2018 11:25 PM

    Sergiy - Thursday, August 16, 2018 7:22 PM

    Lynn Pettis - Thursday, August 16, 2018 11:27 AM

    Also, the author did say that this is just one solution.  And the article also stated that the requirement was for sales on the last day of each month.

    yes, but why should it be the worst possible one which gets published (apart from a cursor approach)?

    Why isn't the grass blue and the sky green?

    Because that is the one the author published in the article.

    Heh... yeah, tell that to the users when the code grows or someone uses the same techniques on something much larger. 😉

    One of the things that this community is VERY good about is coming up with better ways in the discussions regardless of what the author states in the article.  Would you be so quick with your comment if the author had actually used a cursor or recursive CTE or While Loop?  Of course not.  You'd have explained a better way and that's exactly what's happening on this thread.

    A bigger and very real problem is that this article was derived from a book that the author wrote and is advertising on every article he's posted in this series.  Considering that the code does need some help and a book is being pushed in the process, I think it's especially important in this case.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Eirikur Eiriksson - Thursday, August 16, 2018 11:25 AM

    Sergiy - Thursday, August 16, 2018 6:41 AM

    Apart from technical inefficiency, there is an issue with the logical shortcomings.

    Sale amounts on the last days of a month don't say much, unless they are compared with sales on any other day of a month.

    Here is my version of the query, much shorter, more informative and not using highly inefficient (and actually useless) functions DATEFROMPARTS and EOMONTH:
    SELECT DATEADD(mm, DATEDIFF(MM, 0, SLS.SaleDate), 0) MonthBeginning,
        SUM(CASE WHEN SLS.SaleDate >= DATEADD(mm, DATEDIFF(MM, 0, SLS.SaleDate)+1, 0)-1 THEN SLS.SalePrice ELSE 0 END) AS TotalSales_LastDay,
        SUM(CASE WHEN SLS.SaleDate < DATEADD(mm, DATEDIFF(MM, 0, SLS.SaleDate)+1, 0)-1 THEN SLS.SalePrice ELSE 0 END) AS TotalSales_AnyOtherDay
    FROM Data.SalesByCountry SLS
    GROUP BY DATEADD(mm, DATEDIFF(MM, 0, SLS.SaleDate), 0)
    ORDER BY MonthBegin

    You may want to thing again about the DATEFROMPARTS and the EOMONTH functions, those are some of the fastest options and certainly faster than cascading DATEADD and DATEDIFF
    😎 

    Here is a quick example

    USE TEEST;
    GO
    SET NOCOUNT ON;
    DECLARE @SAMPLE_SIZE  BIGINT  = 1000000;
    DECLARE @DATE_RANGE  INT   = 126144000;
    DECLARE @VALUE_RANGE  INT   = 1000000;
    DECLARE @FIRST_DATE  DATETIME  = '2015-01-01';

    --/*
    IF OBJECT_ID(N'dbo.TBL_TEST_DATE_AGGREGATE') IS NOT NULL DROP TABLE dbo.TBL_TEST_DATE_AGGREGATE;
    CREATE TABLE dbo.TBL_TEST_DATE_AGGREGATE
    (
      TDA_ID  INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_DATE_AGGREGATE_TDA_ID PRIMARY KEY CLUSTERED (TDA_ID ASC)
     ,SALE_DATE DATETIME    NOT NULL
     ,SALE_VALUE NUMERIC(12,2)   NOT NULL
    )

    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)
    INSERT INTO dbo.TBL_TEST_DATE_AGGREGATE WITH (TABLOCKX) (SALE_DATE,SALE_VALUE)
    SELECT
      DATEADD(SECOND,ABS(CHECKSUM(NEWID())) % @DATE_RANGE,@FIRST_DATE)       AS SALE_DATE
     ,CONVERT(NUMERIC(12,2),((ABS(CHECKSUM(NEWID())) % @VALUE_RANGE) + 0.0) / 100.0,0)  AS SALE_VALUE
    FROM  NUMS  NM
    ORDER BY SALE_DATE ASC;

    CREATE NONCLUSTERED INDEX NCLIDX_DBO_TBL_TEST_DATE_AGGREGATE_SALE_DATE_INCLUDE_SALE_VALUE_INCL_SALE_VALUE ON dbo.TBL_TEST_DATE_AGGREGATE(SALE_DATE ASC) INCLUDE (SALE_VALUE);

    -- */

    DECLARE @timer TABLE (T_TXT VARCHAR(50) NOT NULL, T_TD DATETIME2(7) NOT NULL DEFAULT (SYSDATETIME()));
    DECLARE @DATETIME_BUCKET  DATETIME   = 0;
    DECLARE @NUMERIC_BUCKET  NUMERIC(12,2) = 0.0;

    INSERT INTO @timer (T_TXT) VALUES ('DRY RUN');
    SELECT
      @DATETIME_BUCKET = TDA.SALE_DATE
     ,@NUMERIC_BUCKET = TDA.SALE_VALUE
    FROM  dbo.TBL_TEST_DATE_AGGREGATE  TDA
    OPTION (MAXDOP 1);
    INSERT INTO @timer (T_TXT) VALUES ('DRY RUN');

    INSERT INTO @timer (T_TXT) VALUES ('DATEDIFF');
    SELECT
      @DATETIME_BUCKET = DATEADD(mm, DATEDIFF(MM, 0, SLS.SALE_DATE), 0)
    FROM dbo.TBL_TEST_DATE_AGGREGATE SLS
    OPTION (MAXDOP 1);
    INSERT INTO @timer (T_TXT) VALUES ('DATEDIFF');

    INSERT INTO @timer (T_TXT) VALUES ('EOMONTH');
    SELECT
      @DATETIME_BUCKET = EOMONTH(SLS.SALE_DATE)
    FROM dbo.TBL_TEST_DATE_AGGREGATE SLS
    OPTION (MAXDOP 1);
    INSERT INTO @timer (T_TXT) VALUES ('EOMONTH');

    SELECT
      T.T_TXT
     ,DATEDIFF(MICROSECOND,MIN(T.T_TD),MAX(T.T_TD)) AS DURATION
    FROM  @timer
    GROUP BY T.T_TXT
    ORDER BY DURATION ASC;

    Results on a latest gen i5 laptop


    T_TXT  DURATION
    --------- --------
    DRY RUN    179125
    EOMONTH    232185
    DATEDIFF   247174

    You see, Eirikur, to me ineffectiveness of EOMONTH is defined not by those 5 or so percent in the speed of computation, but by all the stuff which comes with it.
    The function is pretty much useless for all of the scenarios when event date+time recorded in the same column. Except, probably that one case which was used in the article, when you need data for that single last day of the month.
    In every other case you need rather beginning of the next ,month, and EOMONTH has no use case.
    And if you have date and time stored separately, the overall overhead of such approach will be so big that you'll forget about that tiny difference in date computations.

    _____________
    Code for TallyGenerator

  • I am getting the same output with this query

    Select convert(date,saledate),sum(saleprice) from Data.SalesbyCountry
    where convert(date,saledate) = eomonth(saledate) and year(saledate)='2016'
    group by convert(date,saledate)

    Am I missing something ???

  • Ombir wrote:

    I am getting the same output with this query

    Select convert(date,saledate),sum(saleprice) from Data.SalesbyCountry
    where convert(date,saledate) = eomonth(saledate) and year(saledate)='2016'
    group by convert(date,saledate)

    Am I missing something ???

    Yes, Sir.  You're missing the point of the entire article.

    The whole point was not to merely get the sum of what's there, which is easy as you have demonstrated, but how to get the sum of the last day of every month to demonstrate how to present the fact that not all months in the range of dates had sales.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply