Averages and Medians without a cube

  • So I've got this nice little report that extracts data from my database and shows a particular set of data grouped by region with averages for certain things based on a particular day.

    The problem is, TPTB want one for ranges of dates, like the last day of the month for an entire year or three. Great I understand the business value of it and such and I know what they want. The problem is we don't have analysis service running and since we're about a month or 2 away from moving from SQL Server 2000 to 2005 I'm not looking to setup analysis services just for this one report.

    Here's some sample data obviously this is just a snippet, I have about 200k rows in the summary table about 20000 per year, and I just included the avg logic, the median logic is bulky complex and I figure if I can do it with the avg logic I can do it with the Med logic as well...

    CREATE TABLE #Summary (

    CaseNo INT IDENTITY PRIMARY KEY,

    Office INT,

    DateCaseOpened DATETIME,

    DateCaseClosed DATETIME

    )

    INSERT INTO #summary VALUES (1, '2007-01-01','2007-02-15')

    INSERT INTO #summary VALUES (3, '2006-07-20','2007-04-25')

    INSERT INTO #summary VALUES (3, '2006-10-19','2007-03-11')

    INSERT INTO #summary VALUES (2, '2007-02-12','2007-03-09')

    INSERT INTO #summary VALUES (1, '2007-08-01',NULL)

    INSERT INTO #summary VALUES (2, '2007-03-01','2007-06-18')

    SELECT * FROM #summary

    --DROP TABLE #summary

    DECLARE @PendingDate datetime

    SELECT @PendingDate = '2007-02-15'

    SELECT Office, AVG(DATEDIFF(d, DateCaseOpened, @PendingDate))

    FROM #summary

    WHERE DateCaseOpened <= @PendingDate

    AND (DateCaseClosed >= @PendingDate OR DateCaseClosed IS NULL)

    GROUP BY Office

    Now what they want is the same data but with the Pending Date of the last day of the month (easy enough to calculate) and show 3 years worth of data.

    The only way I could think to do with is with a cursor or while loop and loop through the months writing the Office, PendingDate, and Avg out to a temp table and then querying that. The problem is performance is really slow, as I suspected it would be.

    I'm just looking for some fresh eyes that might be able to suggest a set based approach to this. And, no I don't have a calendar table in my database, and no I'm not against putting one in there, I just haven't had a need yet perhaps this is the need.

    TIA,

    Luke.

    To help us help you read this[/url]For better help with performance problems please read this[/url]

  • I did a nasty fast median for someone else (see the following URL)... perhaps you could modify it to suit your needs.

    http://www.sqlservercentral.com/Forums/FindPost352218.aspx

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I hope this is a solution, or at least points the way to one! Unfortunately, I'm having real difficulty posting code into this forum for some reason, so please excuse the lack of formatting. The 'nanny' editing system takes out non-breaking spaces for a reason I can't fathom.

    I'm creating a table of 'end of months' here, though you'd do better with a permanent table

    __________________________________________________________

    [font="Courier New"]--truncate table #month

    CREATE TABLE #month (monthend DATETIME)

    DECLARE @Month DATETIME

    SELECT @Month=GETDATE()

    WHILE @Month>'1 Jan 2006'

    BEGIN

    INSERT INTO #month

    SELECT DATEADD(minute,-1,'01 '+CONVERT(CHAR(3),DATENAME(MONTH,@month))+' '+ DATENAME(YEAR,@month))

    SELECT @Month= DATEADD(MONTH,-1,@Month)

    END

    CREATE TABLE #Summary (

    CaseNo INT IDENTITY PRIMARY KEY,

    Office INT,

    DateCaseOpened DATETIME,

    DateCaseClosed DATETIME

    )

    INSERT INTO #summary VALUES (1, '2007-01-01','2007-02-15')

    INSERT INTO #summary VALUES (3, '2006-07-20','2007-04-25')

    INSERT INTO #summary VALUES (3, '2006-10-19','2007-03-11')

    INSERT INTO #summary VALUES (2, '2007-02-12','2007-03-09')

    INSERT INTO #summary VALUES (1, '2007-08-01',NULL)

    INSERT INTO #summary VALUES (2, '2007-03-01','2007-06-18')

    --average days case has been open

    SELECT

    [month]=COALESCE(RIGHT(CONVERT (CHAR(11),monthend,113),8),'All'),

    [Office 1]=SUM(CASE WHEN office=1 THEN [Av days open] ELSE 0 END),

    [Office 2]=SUM(CASE WHEN office=2 THEN [Av days open] ELSE 0 END),

    [Office 3]=SUM(CASE WHEN office=3 THEN [Av days open] ELSE 0 END),

    [all]=SUM(CASE WHEN office IS NULL THEN COALESCE([Av days open],0) ELSE 0 END)

    FROM

    (SELECT Office, monthend, [cases]=COUNT(caseno),

    [Av days open]=AVG(DATEDIFF(d, DateCaseOpened, monthend))

    FROM #month LEFT OUTER JOIN #summary

    ON monthend BETWEEN DateCaseOpened AND COALESCE(DateCaseClosed,GETDATE())

    GROUP BY Office,monthend WITH CUBE)f

    GROUP BY monthend

    ORDER BY COALESCE(monthend,GETDATE())

    --no ov open cases

    SELECT

    [month]=COALESCE(RIGHT(CONVERT (CHAR(11),monthend,113),8),'All'),

    [Office 1]=SUM(CASE WHEN office=1 THEN [cases] ELSE 0 END),

    [Office 2]=SUM(CASE WHEN office=2 THEN [cases] ELSE 0 END),

    [Office 3]=SUM(CASE WHEN office=3 THEN [cases] ELSE 0 END),

    [all]=SUM(CASE WHEN office IS NULL THEN COALESCE([cases],0) ELSE 0 END)

    FROM

    (SELECT Office, monthend, c,

    [Av days open]=AVG(DATEDIFF(d, DateCaseOpened, monthend))

    FROM #month LEFT OUTER JOIN #summary

    ON monthend BETWEEN DateCaseOpened AND COALESCE(DateCaseClosed,GETDATE())

    GROUP BY Office,monthend WITH CUBE)f

    GROUP BY monthend

    ORDER BY COALESCE(monthend,GETDATE())

    [/font]

    __________________________________________________

    Sorry about these results, but the 'code' IFCode seems to be broken. It is putting double returns into what one types in. C'mon Steve, Steve and Steve. We need a fix.

    month Office 1 Office 2 Office 3 all

    -------- ----------- ----------- ----------- -----------

    Dec 2005 0 0 0 0

    Jan 2006 0 0 0 0

    Feb 2006 0 0 0 0

    Mar 2006 0 0 0 0

    Apr 2006 0 0 0 0

    May 2006 0 0 0 0

    Jun 2006 0 0 0 0

    Jul 2006 0 0 11 11

    Aug 2006 0 0 42 42

    Sep 2006 0 0 72 72

    Oct 2006 0 0 57 57

    Nov 2006 0 0 87 87

    Dec 2006 0 0 118 118

    Jan 2007 30 0 149 109

    Feb 2007 0 16 177 123

    Mar 2007 0 30 254 142

    Apr 2007 0 60 0 60

    May 2007 0 91 0 91

    Jun 2007 0 0 0 0

    Jul 2007 0 0 0 0

    Aug 2007 30 0 0 30

    Sep 2007 60 0 0 60

    All 40 49 111 89

    Warning: Null value is eliminated by an aggregate or other SET operation.

    (23 row(s) affected)

    Best wishes,
    Phil Factor

  • Thanks for the quick replies guys, I'll have a look at it them a bit more indepth on Tuesday when I get back into the office.

    And Phil I totally agree with the formatting bit on the IFCode window. I appreciate it being there, but the double returns are quite annoying.

    I just wish I could still get the Simple Talk text formatter to still work from my office. It blows up our host based IPS system everytime I try to use it with all kinds of supposed SQL injection alerts and won't let me actually get to my formatted data.

    You'd think that since Steve and Steve both work for redgate it's be ncie to see that SQL formatter ported to the IFCode window or something...

    Anyhow thanks again guys I'll have a deeper look into the code on Tuesday.

    -Luke.

    To help us help you read this[/url]For better help with performance problems please read this[/url]

  • Yes, Redgate have a proper prettifier code in C# that is used in SQL Doc to produce HTML output which is much better than mine. I expect they could use that inside a special SQL tag.

    The Simple-Talk prettifier is nothing to do with Red-Gate. It was something I wrote especially for doing forum work where 'proper' XHTML wasn't allowed. It was sheer cussedness that made me write it in TSQL, but it makes it very adaptable for prettifying other languages as everything is table-driven. I've done an IFCODE version which is what I was using for the posting and it works, barring the pesky removal of non-breaking space characters.

    If the current prettifier is causing promlems, please let me have details. The one on the Simple-Talk site is used by the editors to do all the code formatting including the Workbenches, so it is pretty reliable.

    Best of luck with the cube. I reckon you'll be OK without having to use cursors or Analysis services.

    Best wishes,
    Phil Factor

  • Jeff, I took a look at what you did for that other median and I guess I just don't understand what's going on. My original Ugly slow SQL was basically the same as the SQLmag article listing 3 (as that's the one I found most useful so far). However due to my data set size it's still a bit on the slow side (that and it pegs my CPU utilization for a bit). I don't think I really understand what you did to speed it up so much. It looks like you took her data and put it in a temp table with indexes and that solved the majority of the problem. I've yet to get to that, as I'm not sure just how much of my data I need to pull out to make that happen. Based on your code from the other thread it looks like perhaps I'd want to query for the data I want, the 2 dates and caseno for each month and add that to the temp table, or perhaps the caseno and make DeltaT the value of my datediff per case and then solve for the median with appropriate indexes and such.

    I'll have to give that a go and see what I come out with.

    If you have any other pointers let me know and I'll see if I can incorporate them.

    -Luke.

    To help us help you read this[/url]For better help with performance problems please read this[/url]

  • Phil, I'm curious about the code you posted. I changed a bit of the script and incorporated the join as opposed to the loop to get the dates, but I was wondering about the idea of calculating the "more correct" average. It seems you average and then Sum, whereas I'm summing and averaging at the same point and we come up with different results. The biggest issue is that niether of our averages equal what was originally shown in my Cognos report that tallied all of this to begin with. My numbers were closer (only off by a day or 2 here and there)

    I know mine returns NULLS instead of zeros and doesn't return each month, but that's not a requirement as there will always be data for each month. I'm wondering if that might have something to do with it.

    Here's the code the way I have changed it.

    CREATE TABLE #Summary (

    CaseNo INT IDENTITY PRIMARY KEY,

    Office INT,

    DateCaseOpened DATETIME,

    DateCaseClosed DATETIME

    )

    INSERT INTO #summary VALUES (1, '2007-01-01','2007-02-15')

    INSERT INTO #summary VALUES (3, '2006-07-20','2007-04-25')

    INSERT INTO #summary VALUES (3, '2006-10-19','2007-03-11')

    INSERT INTO #summary VALUES (2, '2007-02-12','2007-03-09')

    INSERT INTO #summary VALUES (1, '2007-08-01',NULL)

    INSERT INTO #summary VALUES (2, '2007-03-01','2007-06-18')

    CREATE TABLE #TempDates (

    monthend DATETIME

    )

    DECLARE @DateVar DATETIME

    SELECT @DateVar = '2004-07-01'

    WHILE @DateVar < GETDATE()

    BEGIN

    INSERT INTO #TempDates

    SELECT DATEADD(D, -1,DATEADD(mm, DATEDIFF(mm,0,@DateVar), 0))

    SELECT @DateVar = DATEADD(M, 1, @DateVar)

    END

    ---========Mine =====--

    SELECT TD.monthend,

    [CO] = AVG(CASE WHEN office = 1 THEN DATEDIFF(d, cs.DateCaseOpened , TD.monthend)END),

    [RO1] =AVG(CASE WHEN office = 2 THEN DATEDIFF(d, cs.DateCaseOpened , TD.monthend)END),

    [RO2] =AVG(CASE WHEN office = 3 THEN DATEDIFF(d, cs.DateCaseOpened , TD.monthend)END),

    [RO3] =AVG(CASE WHEN office = 4 THEN DATEDIFF(d, cs.DateCaseOpened , TD.monthend)END)

    FROM #Summary CS,

    #TempDates TD

    WHERE cs.DateCaseOpened <= TD.monthend

    AND (cs.DateCaseClosed > TD.monthend OR cs.DateCaseClosed IS NULL)

    GROUP BY TD.monthend

    ORDER BY td.monthend

    ---==== Yours ===---

    SELECT

    [month]=COALESCE(RIGHT(CONVERT (CHAR(11),monthend,113),8),'All'),

    [Office 1]=SUM(CASE WHEN office=1 THEN [Av days open] ELSE 0 END),

    [Office 2]=SUM(CASE WHEN office=2 THEN [Av days open] ELSE 0 END),

    [Office 3]=SUM(CASE WHEN office=3 THEN [Av days open] ELSE 0 END),

    [Office 4]=SUM(CASE WHEN office=4 THEN [Av days open] ELSE 0 END),

    [all]=SUM(CASE WHEN office IS NULL THEN COALESCE([Av days open],0) ELSE 0 END)

    FROM

    (SELECT Office, monthend, [cases]=COUNT(caseno),

    [Av days open]=AVG(DATEDIFF(d, DateCaseOpened, monthend))

    FROM #TempDates LEFT OUTER JOIN #summary

    ON monthend BETWEEN DateCaseOpened AND COALESCE(DateCaseClosed,GETDATE())

    GROUP BY Office,monthend WITH CUBE)f

    GROUP BY monthend

    ORDER BY COALESCE(monthend,GETDATE())

    DROP TABLE #Summary

    DROP TABLE #TempDates

    for right now this is what I'm going to run with.

    thanks for the help guys.

    -Luke.

    To help us help you read this[/url]For better help with performance problems please read this[/url]

  • Hmm. Interesting about the averages. The averages in the example come straight from the CUBE (I hope) and the SUM() calculation is merely to do the pivot rotation. I suspect the AVG() you changed it to would give the same result since the output from the CUBE operation should give you only one entry per aggregation.

    To check the figures, the best idea would be to do the raw CUBE and match that with the Cognos report. It is not beyond the bounds of possibility that I could have made a mistake somewhere. However, I'm pretty sure that the algorithm is reasonable.

    If it has given you a few ideas, then I'm glad.

    Best wishes,
    Phil Factor

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply