Returning 0's in a SUM (SQL Spackle)

  • I feel the need to add a personal opinion here but please don't think I am saying there is a right or a wrong way. In the words of Dirty Harry "opinions are like Ar*eho les - everyone has one"

    As I stated earlier, I prefer the method of creating a calendar table and then linking the data in to populate it, which results in NULL's for the unmatched data. What I did not state was why. So here's why...

    It is readable and does not contain functions.

    I have been managing teams of sql and application developers for years and I have seen many brilliant and highly effective but totally un-maintainable chunks of sql. Breaking the problem down in to smaller, more manageable and sometime bleedin' obvious parts is generally a good thing as the next developer to work on the code may not have the same level of coding competence or ability to understand complexities quickly. Sure the clever code might run slightly faster and if performance is a problem then perhaps it is the way forward but ease of change should always be the top goal (IMHO)

    I also hate the use of functions where they can be avoided. Adding a function means that people often have to lookup what the function does and how it works (sometimes having to ask "Uncle Google"). Don't get me wrong here, I love functions but only where required. Functions can make a large chunk of SQL much easier to read and use in some cases. Personally I don't think they do here.

    I have a number table called Numbers on my system with a single column called Number and indexed values from 0 to 99,999. I use this table for all sorts of string and date manipulations. I also have a permanent and fully indexed calendar table too. These two tables are used all over the place and are highly effective.

    The other great feature of a permanent calendar table is that you can add special dates and slowly changing dimension tables to them too.

    So there are my reasons.

    Hope this doesn't offend. Please remember this is only an opinion.

    Dave

    David Bridge
    David Bridge Technology Limited
    www.davidbridgetechnology.com

  • David,

    I also have a calendar table in our production system and it performs very well and it makes our code consistent across the board. I agree there are many ways to accomplish this and what works best in one system might not be the best for another system.

  • DavidBridgeTechnology.com (6/13/2011)


    I have a number table called Numbers on my system with a single column called Number and indexed values from 0 to 99,999. I use this table for all sorts of string and date manipulations. I also have a permanent and fully indexed calendar table too. These two tables are used all over the place and are highly effective.

    You're a man after my own heart. See the following link and you'll see why I certainly don't take offense to what you posted. 🙂

    http://www.sqlservercentral.com/articles/T-SQL/62867/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • harry9katz (6/13/2011)


    Hi

    You have to add hour to both tables

    SELECT cast(floor(cast(SaleTime as float)) as datetime)Saledate,DATEPART(hour,SaleTime) saleshour

    ,sum([SaleQty]) sumsales

    into #salessum

    FROM #SalesTestData

    group by cast(floor(cast(SaleTime as float)) as datetime),DATEPART(hour,SaleTime)

    order by cast(floor(cast(SaleTime as float)) as datetime),DATEPART(hour,SaleTime)

    here U get the first table that the article started with

    U get salesum but not on every hour

    Then U add to the #Clendertest for every day all the hours

    an now it is simple

    SELECT dbo.CalendarTest.hourid, dbo.CalendarTest.StartDate, ISNULL(SUM(dbo.saleoneday.sumsales), 0) AS sumsales

    FROM dbo.saleoneday RIGHT OUTER JOIN

    dbo.CalendarTest ON dbo.saleoneday.Saledate = dbo.CalendarTest.StartDate AND dbo.saleoneday.saleshour = dbo.CalendarTest.hourid

    GROUP BY dbo.CalendarTest.hourid, dbo.CalendarTest.StartDate

    ORDER BY dbo.CalendarTest.StartDate, dbo.CalendarTest.hourid

    Is the dbo.CalendarTest table the same as the #CalendarTest table in the article? Also, what's the dbo.saleoneday table because I can find anything named even close to that in the article and you don't build it anywhere in your code example.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • berzat.museski (6/13/2011)


    No need for uncle Google in this case, spt_values is quite enough, unless you are querying some large data warehouse.

    However, I agree that GetNums is a cool trick to generate large order numbers.

    I was thinking of the following solution:

    I first add one computed column in the fact table (I assume that it is ok to do that) in order to keep the query SARGable. The new column will have the sales datetime rounded up to hour percision.

    ALTER TABLE #SalesTestData

    ADD SaleTimeHour AS CONVERT(DATETIME, CONVERT(VARCHAR,SaleTime,112)+' '+CAST(DATEPART(HOUR,SaleTime) AS VARCHAR(2))+':00:00.000',112) PERSISTED

    The report will then be formed by this single query:

    DECLARE @StartDate DATETIME, @EndDate DATETIME

    SET @StartDate='2011-01-01 00:00:00.000'

    SET @EndDate='2011-12-01 00:00:00.000'

    SELECT CAST(AllDates.ReportDayHour AS DATE) AS ReportDay, CAST(AllDates.ReportDayHour AS TIME) AS ReportHour,

    ISNULL(SUM(s.SaleQty),0) AS TotalSaleQty

    FROM (

    SELECT DATEADD(hh,h.number,DATEADD(dd,d.number,DATEADD(mm,m.number,@StartDate))) AS ReportDayHour

    FROM master..spt_values m

    CROSS JOIN master..spt_values d

    CROSS JOIN master..spt_values h

    WHEREm.type='p' AND m.number BETWEEN 0 AND DATEDIFF(mm,@StartDate,@EndDate)

    AND d.type='p' AND d.number BETWEEN 0 AND 30

    AND h.type='p' AND h.number BETWEEN 0 AND 23

    ) AS AllDates

    LEFT JOIN #SalesTestData s ON s.SaleTimeHour=AllDates.ReportDayHour

    GROUP BY AllDates.ReportDayHour

    ORDER BY AllDates.ReportDayHour

    The parameters are the start and end moments of the reports, as any manager would want 🙂

    Hi, Berzat,

    That's a very clever solution but, comparatively speaking, it's a bit slow compared to the methods offered in the article. After adding your computed column to the test table (as you did before), run the following code and see the difference in the time it takes...

    --===== Identify the run

    SET STATISTICS TIME OFF;

    PRINT REPLICATE('=',80);

    PRINT 'Method similar to article';

    PRINT REPLICATE('=',80);

    SET STATISTICS TIME ON;

    --===== Declare some obviously named variables. The ones with the @p

    -- prefix could be parameters in a stored procedure or function

    DECLARE @pStartMonth DATETIME,

    @pEndMonth DATETIME

    ;

    --===== Assign values to the input parameters

    SELECT @pStartMonth = '2011-01-15',

    @pEndMonth = 'Dec 2011'

    ;

    --===== "Normalize" the dates

    SELECT @pStartMonth = DATEADD(mm,DATEDIFF(mm,0,@pStartMonth),0),

    @pEndMonth = DATEADD(mm,DATEDIFF(mm,0,@pEndMonth)+1,0)

    ;

    --===== Create and populate the "hours" table

    DECLARE @Hours TABLE (HourStart DATETIME, HourNext DATETIME)

    ;

    INSERT INTO @Hours

    (HourStart, HourNext)

    SELECT TOP (DATEDIFF(hh, @pStartMonth, @pEndMonth))

    HourStart = DATEADD(hh,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1,@pStartMonth),

    HourNext = DATEADD(hh,ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) ,@pStartMonth)

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    ;

    SELECT ReportDay = CONVERT(CHAR(10),MAX(bin.HourStart),120),

    ReportHour = DATEPART(hh,MAX(bin.HourStart)),

    TotalSales = SUM(ISNULL(sale.SaleQty,0))

    FROM @Hours bin

    LEFT OUTER JOIN #SalesTestData sale

    ON sale.SaleTime >= bin.HourStart

    AND sale.SaleTime < bin.HourNext

    GROUP BY bin.HourStart

    ORDER BY bin.HourStart

    ;

    GO

    -------------------------------------------------------------------------------

    --===== Identify the run

    SET STATISTICS TIME OFF;

    PRINT REPLICATE('=',80);

    PRINT 'Berzat''s method';

    PRINT REPLICATE('=',80);

    SET STATISTICS TIME ON;

    DECLARE @StartDate DATETIME, @EndDate DATETIME

    SET @StartDate='2011-01-01 00:00:00.000'

    SET @EndDate='2011-12-01 00:00:00.000'

    SELECT CAST(AllDates.ReportDayHour AS DATE) AS ReportDay, CAST(AllDates.ReportDayHour AS TIME) AS ReportHour,

    ISNULL(SUM(s.SaleQty),0) AS TotalSaleQty

    FROM (

    SELECT DATEADD(hh,h.number,DATEADD(dd,d.number,DATEADD(mm,m.number,@StartDate))) AS ReportDayHour

    FROM master..spt_values m

    CROSS JOIN master..spt_values d

    CROSS JOIN master..spt_values h

    WHERE m.type='p' AND m.number BETWEEN 0 AND DATEDIFF(mm,@StartDate,@EndDate)

    AND d.type='p' AND d.number BETWEEN 0 AND 30

    AND h.type='p' AND h.number BETWEEN 0 AND 23

    ) AS AllDates

    LEFT JOIN #SalesTestData s ON s.SaleTimeHour=AllDates.ReportDayHour

    GROUP BY AllDates.ReportDayHour

    ORDER BY AllDates.ReportDayHour

    ;

    SET STATISTICS TIME OFF;

    Here's the output on my ol' 9 year old, single CPU box. The method from the article is more than 12 times faster.

    ================================================================================

    Method similar to article

    ================================================================================

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 94 ms, elapsed time = 106 ms.

    (8760 row(s) affected)

    (8760 row(s) affected)

    SQL Server Execution Times:

    [font="Arial Black"] CPU time = 390 ms, elapsed time = 731 ms.[/font]

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 1 ms.

    ================================================================================

    Berzat's method

    ================================================================================

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Warning: Null value is eliminated by an aggregate or other SET operation.

    (8760 row(s) affected)

    SQL Server Execution Times:

    [font="Arial Black"] CPU time = 9328 ms, elapsed time = 12354 ms.[/font]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, thank you for your effort measuring the two methods!

    It is true, my solution is a way slower as it is now. That is so because there is no index over the computed coumn in the example (how can it be SARGable without any indexes).

    After adding it, your query was only twice faster, which is ok I guess having the code optimized as it is 🙂

  • Jeff, I must tell you that your solution is briliant as well!

  • What's this COALESCE(SUM(SalesTestData.SaleQty),0) ?

    You do not know ISNULL ?

    ISNULL(SUM(SalesTestData.SaleQty),0)

  • Alexey Voronin (6/14/2011)


    What's this COALESCE(SUM(SalesTestData.SaleQty),0) ?

    You do not know ISNULL ?

    ISNULL(SUM(SalesTestData.SaleQty),0)

    They are interchangeable. Both are the same thing except COALESCE() allows for more than one evaluation.

    /* Anything is possible but is it worth it? */

  • Gatekeeper (6/14/2011)


    Alexey Voronin (6/14/2011)


    What's this COALESCE(SUM(SalesTestData.SaleQty),0) ?

    You do not know ISNULL ?

    ISNULL(SUM(SalesTestData.SaleQty),0)

    They are interchangeable. Both are the same thing except COALESCE() allows for more than one evaluation.

    Actually, they're not interchangeable at the semantic level. COALESCE is slower than ISNULL (although it takes a substantial number of rows to tell). COALESCE is in the ANSI standard where ISNULL is proprietary. COALESCE can change the datatype of the result based on which non-null element is selected (which can cause a major performance problem, BTW). ISNULL will keep the datatype of the return the same as the datatype of the first operand. COALESCE takes more keystrokes to type than ISNULL (BIG selling point for me. :-P)

    Heh... other than that, they're interchangeable when two operands are necessary. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • berzat.museski (6/14/2011)


    Jeff, I must tell you that your solution is briliant as well!

    Thanks for the compliment but, nah... I just used a variation of what is in the article. 🙂 That was the whole point, really. The code in the article is some of the fastest code you can find to do this and you don't need a custom column (frequently disallowed on real tables) nor an extra index to support it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (6/14/2011)


    COALESCE can change the datatype of the result based on which non-null element is selected (which can cause a major performance problem, BTW). ISNULL will keep the datatype of the return the same as the datatype of the first operand

    I've never run into that but it's good to learn something new! Thanks for the clarification.

    /* Anything is possible but is it worth it? */

Viewing 12 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply