Averages and Medians without a cube

Question

Post reply

Averages and Medians without a cube

Luke L

SSC-Dedicated

Points: 35547
More actions
October 5, 2007 at 10:53 am

#70135

So I've got this nice little report that extracts data from my database and shows a particular set of data grouped by region with averages for certain things based on a particular day.
The problem is, TPTB want one for ranges of dates, like the last day of the month for an entire year or three. Great I understand the business value of it and such and I know what they want. The problem is we don't have analysis service running and since we're about a month or 2 away from moving from SQL Server 2000 to 2005 I'm not looking to setup analysis services just for this one report.
Here's some sample data obviously this is just a snippet, I have about 200k rows in the summary table about 20000 per year, and I just included the avg logic, the median logic is bulky complex and I figure if I can do it with the avg logic I can do it with the Med logic as well...
CREATE TABLE #Summary (
CaseNo INT IDENTITY PRIMARY KEY,
Office INT,
DateCaseOpened DATETIME,
DateCaseClosed DATETIME
)
INSERT INTO #summary VALUES (1, '2007-01-01','2007-02-15')
INSERT INTO #summary VALUES (3, '2006-07-20','2007-04-25')
INSERT INTO #summary VALUES (3, '2006-10-19','2007-03-11')
INSERT INTO #summary VALUES (2, '2007-02-12','2007-03-09')
INSERT INTO #summary VALUES (1, '2007-08-01',NULL)
INSERT INTO #summary VALUES (2, '2007-03-01','2007-06-18')
SELECT * FROM #summary
--DROP TABLE #summary
DECLARE @PendingDate datetime
SELECT @PendingDate = '2007-02-15'
SELECT Office, AVG(DATEDIFF(d, DateCaseOpened, @PendingDate))
FROM #summary
WHERE DateCaseOpened <= @PendingDate
AND (DateCaseClosed >= @PendingDate OR DateCaseClosed IS NULL)
GROUP BY Office
Now what they want is the same data but with the Pending Date of the last day of the month (easy enough to calculate) and show 3 years worth of data.
The only way I could think to do with is with a cursor or while loop and loop through the months writing the Office, PendingDate, and Avg out to a temp table and then querying that. The problem is performance is really slow, as I suspected it would be.
I'm just looking for some fresh eyes that might be able to suggest a set based approach to this. And, no I don't have a calendar table in my database, and no I'm not against putting one in there, I just haven't had a need yet perhaps this is the need.
TIA,
Luke.
To help us help you read this[/url]For better help with performance problems please read this[/url]

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 1

I did a nasty fast median for someone else (see the following URL)... perhaps you could modify it to suit your needs.

http://www.sqlservercentral.com/Forums/FindPost352218.aspx

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Phil Factor SSC-Insane Points: 20244 More actions · Answer 2

I hope this is a solution, or at least points the way to one! Unfortunately, I'm having real difficulty posting code into this forum for some reason, so please excuse the lack of formatting. The 'nanny' editing system takes out non-breaking spaces for a reason I can't fathom.

I'm creating a table of 'end of months' here, though you'd do better with a permanent table

__________________________________________________________

[font="Courier New"]--truncate table #month

CREATE TABLE #month (monthend DATETIME)

DECLARE @Month DATETIME

SELECT @Month=GETDATE()

WHILE @Month>'1 Jan 2006'

BEGIN

INSERT INTO #month

SELECT DATEADD(minute,-1,'01 '+CONVERT(CHAR(3),DATENAME(MONTH,@month))+' '+ DATENAME(YEAR,@month))

SELECT @Month= DATEADD(MONTH,-1,@Month)

END

CREATE TABLE #Summary (

CaseNo INT IDENTITY PRIMARY KEY,

Office INT,

DateCaseOpened DATETIME,

DateCaseClosed DATETIME

)

INSERT INTO #summary VALUES (1, '2007-01-01','2007-02-15')

INSERT INTO #summary VALUES (3, '2006-07-20','2007-04-25')

INSERT INTO #summary VALUES (3, '2006-10-19','2007-03-11')

INSERT INTO #summary VALUES (2, '2007-02-12','2007-03-09')

INSERT INTO #summary VALUES (1, '2007-08-01',NULL)

INSERT INTO #summary VALUES (2, '2007-03-01','2007-06-18')

--average days case has been open

SELECT

[month]=COALESCE(RIGHT(CONVERT (CHAR(11),monthend,113),8),'All'),

[Office 1]=SUM(CASE WHEN office=1 THEN [Av days open] ELSE 0 END),

[Office 2]=SUM(CASE WHEN office=2 THEN [Av days open] ELSE 0 END),

[Office 3]=SUM(CASE WHEN office=3 THEN [Av days open] ELSE 0 END),

[all]=SUM(CASE WHEN office IS NULL THEN COALESCE([Av days open],0) ELSE 0 END)

FROM

(SELECT Office, monthend, [cases]=COUNT(caseno),

[Av days open]=AVG(DATEDIFF(d, DateCaseOpened, monthend))

FROM #month LEFT OUTER JOIN #summary

ON monthend BETWEEN DateCaseOpened AND COALESCE(DateCaseClosed,GETDATE())

GROUP BY Office,monthend WITH CUBE)f

GROUP BY monthend

ORDER BY COALESCE(monthend,GETDATE())

--no ov open cases

SELECT

[month]=COALESCE(RIGHT(CONVERT (CHAR(11),monthend,113),8),'All'),

[Office 1]=SUM(CASE WHEN office=1 THEN [cases] ELSE 0 END),

[Office 2]=SUM(CASE WHEN office=2 THEN [cases] ELSE 0 END),

[Office 3]=SUM(CASE WHEN office=3 THEN [cases] ELSE 0 END),

[all]=SUM(CASE WHEN office IS NULL THEN COALESCE([cases],0) ELSE 0 END)

FROM

(SELECT Office, monthend, c,

[Av days open]=AVG(DATEDIFF(d, DateCaseOpened, monthend))

FROM #month LEFT OUTER JOIN #summary

ON monthend BETWEEN DateCaseOpened AND COALESCE(DateCaseClosed,GETDATE())

GROUP BY Office,monthend WITH CUBE)f

GROUP BY monthend

ORDER BY COALESCE(monthend,GETDATE())

[/font]

__________________________________________________

Sorry about these results, but the 'code' IFCode seems to be broken. It is putting double returns into what one types in. C'mon Steve, Steve and Steve. We need a fix.

month Office 1 Office 2 Office 3 all

-------- ----------- ----------- ----------- -----------

Dec 2005 0 0 0 0

Jan 2006 0 0 0 0

Feb 2006 0 0 0 0

Mar 2006 0 0 0 0

Apr 2006 0 0 0 0

May 2006 0 0 0 0

Jun 2006 0 0 0 0

Jul 2006 0 0 11 11

Aug 2006 0 0 42 42

Sep 2006 0 0 72 72

Oct 2006 0 0 57 57

Nov 2006 0 0 87 87

Dec 2006 0 0 118 118

Jan 2007 30 0 149 109

Feb 2007 0 16 177 123

Mar 2007 0 30 254 142

Apr 2007 0 60 0 60

May 2007 0 91 0 91

Jun 2007 0 0 0 0

Jul 2007 0 0 0 0

Aug 2007 30 0 0 30

Sep 2007 60 0 0 60

All 40 49 111 89

Warning: Null value is eliminated by an aggregate or other SET operation.

(23 row(s) affected)

Best wishes,
Phil Factor

Luke L SSC-Dedicated Points: 35547 More actions · Answer 3

Thanks for the quick replies guys, I'll have a look at it them a bit more indepth on Tuesday when I get back into the office.

And Phil I totally agree with the formatting bit on the IFCode window. I appreciate it being there, but the double returns are quite annoying.

I just wish I could still get the Simple Talk text formatter to still work from my office. It blows up our host based IPS system everytime I try to use it with all kinds of supposed SQL injection alerts and won't let me actually get to my formatted data.

You'd think that since Steve and Steve both work for redgate it's be ncie to see that SQL formatter ported to the IFCode window or something...

Anyhow thanks again guys I'll have a deeper look into the code on Tuesday.

-Luke.

To help us help you read this[/url]For better help with performance problems please read this[/url]

Phil Factor SSC-Insane Points: 20244 More actions · Answer 4

Yes, Redgate have a proper prettifier code in C# that is used in SQL Doc to produce HTML output which is much better than mine. I expect they could use that inside a special SQL tag.

The Simple-Talk prettifier is nothing to do with Red-Gate. It was something I wrote especially for doing forum work where 'proper' XHTML wasn't allowed. It was sheer cussedness that made me write it in TSQL, but it makes it very adaptable for prettifying other languages as everything is table-driven. I've done an IFCODE version which is what I was using for the posting and it works, barring the pesky removal of non-breaking space characters.

If the current prettifier is causing promlems, please let me have details. The one on the Simple-Talk site is used by the editors to do all the code formatting including the Workbenches, so it is pretty reliable.

Best of luck with the cube. I reckon you'll be OK without having to use cursors or Analysis services.

Best wishes,
Phil Factor

Luke L SSC-Dedicated Points: 35547 More actions · Answer 5

Jeff, I took a look at what you did for that other median and I guess I just don't understand what's going on. My original Ugly slow SQL was basically the same as the SQLmag article listing 3 (as that's the one I found most useful so far). However due to my data set size it's still a bit on the slow side (that and it pegs my CPU utilization for a bit). I don't think I really understand what you did to speed it up so much. It looks like you took her data and put it in a temp table with indexes and that solved the majority of the problem. I've yet to get to that, as I'm not sure just how much of my data I need to pull out to make that happen. Based on your code from the other thread it looks like perhaps I'd want to query for the data I want, the 2 dates and caseno for each month and add that to the temp table, or perhaps the caseno and make DeltaT the value of my datediff per case and then solve for the median with appropriate indexes and such.

I'll have to give that a go and see what I come out with.

If you have any other pointers let me know and I'll see if I can incorporate them.

-Luke.

To help us help you read this[/url]For better help with performance problems please read this[/url]

Luke L SSC-Dedicated Points: 35547 More actions · Answer 6

Phil, I'm curious about the code you posted. I changed a bit of the script and incorporated the join as opposed to the loop to get the dates, but I was wondering about the idea of calculating the "more correct" average. It seems you average and then Sum, whereas I'm summing and averaging at the same point and we come up with different results. The biggest issue is that niether of our averages equal what was originally shown in my Cognos report that tallied all of this to begin with. My numbers were closer (only off by a day or 2 here and there)

I know mine returns NULLS instead of zeros and doesn't return each month, but that's not a requirement as there will always be data for each month. I'm wondering if that might have something to do with it.

Here's the code the way I have changed it.

CREATE TABLE #Summary (

CaseNo INT IDENTITY PRIMARY KEY,

Office INT,

DateCaseOpened DATETIME,

DateCaseClosed DATETIME

)

INSERT INTO #summary VALUES (1, '2007-01-01','2007-02-15')

INSERT INTO #summary VALUES (3, '2006-07-20','2007-04-25')

INSERT INTO #summary VALUES (3, '2006-10-19','2007-03-11')

INSERT INTO #summary VALUES (2, '2007-02-12','2007-03-09')

INSERT INTO #summary VALUES (1, '2007-08-01',NULL)

INSERT INTO #summary VALUES (2, '2007-03-01','2007-06-18')

CREATE TABLE #TempDates (

monthend DATETIME

)

DECLARE @DateVar DATETIME

SELECT @DateVar = '2004-07-01'

WHILE @DateVar < GETDATE()

BEGIN

INSERT INTO #TempDates

SELECT DATEADD(D, -1,DATEADD(mm, DATEDIFF(mm,0,@DateVar), 0))

SELECT @DateVar = DATEADD(M, 1, @DateVar)

END

---========Mine =====--

SELECT TD.monthend,

[CO] = AVG(CASE WHEN office = 1 THEN DATEDIFF(d, cs.DateCaseOpened , TD.monthend)END),

[RO1] =AVG(CASE WHEN office = 2 THEN DATEDIFF(d, cs.DateCaseOpened , TD.monthend)END),

[RO2] =AVG(CASE WHEN office = 3 THEN DATEDIFF(d, cs.DateCaseOpened , TD.monthend)END),

[RO3] =AVG(CASE WHEN office = 4 THEN DATEDIFF(d, cs.DateCaseOpened , TD.monthend)END)

FROM #Summary CS,

#TempDates TD

WHERE cs.DateCaseOpened <= TD.monthend

AND (cs.DateCaseClosed > TD.monthend OR cs.DateCaseClosed IS NULL)

GROUP BY TD.monthend

ORDER BY td.monthend

---==== Yours ===---

SELECT

[month]=COALESCE(RIGHT(CONVERT (CHAR(11),monthend,113),8),'All'),

[Office 1]=SUM(CASE WHEN office=1 THEN [Av days open] ELSE 0 END),

[Office 2]=SUM(CASE WHEN office=2 THEN [Av days open] ELSE 0 END),

[Office 3]=SUM(CASE WHEN office=3 THEN [Av days open] ELSE 0 END),

[Office 4]=SUM(CASE WHEN office=4 THEN [Av days open] ELSE 0 END),

[all]=SUM(CASE WHEN office IS NULL THEN COALESCE([Av days open],0) ELSE 0 END)

FROM

(SELECT Office, monthend, [cases]=COUNT(caseno),

[Av days open]=AVG(DATEDIFF(d, DateCaseOpened, monthend))

FROM #TempDates LEFT OUTER JOIN #summary

ON monthend BETWEEN DateCaseOpened AND COALESCE(DateCaseClosed,GETDATE())

GROUP BY Office,monthend WITH CUBE)f

GROUP BY monthend

ORDER BY COALESCE(monthend,GETDATE())

DROP TABLE #Summary

DROP TABLE #TempDates

for right now this is what I'm going to run with.

thanks for the help guys.

-Luke.

To help us help you read this[/url]For better help with performance problems please read this[/url]

Phil Factor SSC-Insane Points: 20244 More actions · Answer 7

Hmm. Interesting about the averages. The averages in the example come straight from the CUBE (I hope) and the SUM() calculation is merely to do the pivot rotation. I suspect the AVG() you changed it to would give the same result since the output from the CUBE operation should give you only one entry per aggregation.

To check the figures, the best idea would be to do the raw CUBE and match that with the Cognos report. It is not beyond the bounds of possibility that I could have made a mistake somewhere. However, I'm pretty sure that the algorithm is reasonable.

If it has given you a few ideas, then I'm glad.

Best wishes,
Phil Factor