Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Calculating Median, Max, Min, Lower Quartile and Upper Quartile for a Boxchart? Expand / Collapse
Author
Message
Posted Thursday, January 17, 2013 12:45 PM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: 2 days ago @ 7:49 AM
Points: 92, Visits: 341
Hi All - My boss asked me to create a boxchart using SSRS for some of our data. I read articles all morning but honestly, I'm at a loss on how accomplish this. I think I'm growing in my SQL knowledge, but this one has me stumped. I created a sample table that looks like what I'll be working with below. If anyone could lend a hand it would really help me!

The output should be something like below with values filled in for x:
Year   Median   Max   Min  LowerQ   UpperQ
2010 x x x x x
2011 x x x x x
2012 x x x x x


CREATE TABLE EStats    
(
PersonID VARCHAR(30) NOT NULL,
Grade VARCHAR(25) NOT NULL,
CourseDate Date NOT NULL
)
;

INSERT INTO EStats
(
PersonID, Grade, CourseDate
)

VALUES
('100', '91', '2010-03-01'),
('101', '96', '2010-03-01'),
('102', '88', '2010-03-01'),
('103', '92', '2010-03-01'),
('104', '81', '2010-03-01'),
('105', '85', '2010-03-01'),
('106', '91', '2010-03-01'),
('107', '89', '2010-03-01'),
('108', '99', '2010-03-01'),
('109', '88', '2010-03-01'),
('110', '81', '2011-03-02'),
('111', '77', '2011-03-02'),
('112', '88', '2011-03-02'),
('113', '76', '2011-03-02'),
('114', '69', '2011-03-02'),
('115', '70', '2011-03-02'),
('116', '75', '2011-03-02'),
('117', '88', '2011-03-02'),
('118', '76', '2011-03-02'),
('119', '95', '2012-03-01'),
('120', '96', '2012-03-01'),
('121', '90', '2012-03-01'),
('122', '80', '2012-03-01'),
('123', '85', '2012-03-01'),
('124', '94', '2012-03-01'),
('125', '89', '2012-03-01'),
('126', '97', '2012-03-01'),
('127', '94', '2012-03-01'),
('128', '72', '2012-03-01'),
('129', '88', '2012-03-01'),
('130', '91', '2012-03-01')

Post #1408572
Posted Thursday, January 17, 2013 1:22 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 9:50 AM
Points: 5,595, Visits: 24,985
Assume it is the median value that has you stumped, try reading this, it works for myself.

http://www.mssqltips.com/sqlservertip/2523/script-to-calculate-the-median-value-for-sql-server-data/?utm_source=dailynewsletter&utm_medium=email&utm_content=headline&utm_campaign=20111030


If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read

Before posting a performance problem please read
Post #1408597
Posted Thursday, January 17, 2013 1:35 PM


Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Thursday, July 17, 2014 2:01 PM
Points: 687, Visits: 3,002
I had to do this as well, so you get my supply of bookmarks to help you on your way:
http://www.sqlmag.com/article/tsql3/calculating-the-median-gets-simpler-in-sql-server-2005
http://sqlblog.com/blogs/adam_machanic/archive/2006/12/18/medians-row-numbers-and-performance.aspx
http://sqlblog.com/blogs/peter_debetta/archive/2006/12/20/Medians_Actual_Query_Cost_and_Statistics.aspx

The first one by Itzik Ben-Gan is the one I used.

And lastly, a link to one of my prior posts here on SSC. It includes some add'l explanatory notes I wrote to remind myself how Ben-Gan's logic actually works:
http://www.sqlservercentral.com/Forums/Topic923660-338-1.aspx#bm1105837

HTH,
Rich
Post #1408604
Posted Thursday, January 17, 2013 2:27 PM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: 2 days ago @ 7:49 AM
Points: 92, Visits: 341
Awesome. Thanks a ton!
Post #1408618
Posted Thursday, January 17, 2013 4:40 PM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: 2 days ago @ 7:49 AM
Points: 92, Visits: 341
Actually, I'm having some trouble converting over the example to my sample table above. I just took the grades for 03-01-2012. The median should be the average between 90 and 91 (so 90.5), correct? This gets that result set:

SELECT grade 
FROM estats
WHERE CourseDate = '2012-03-01'
ORDER BY grade DESC

But when I run the below (which is an attempt to convert the example to my sample data above, I get 89.25. I'm sure I'm making a dumb mistake somewhere, but any help is appreciated.

WITH cte AS
(
SELECT EStats.PersonID, EStats.Grade,
ROW_NUMBER() OVER(PARTITION BY EStats.PersonID ORDER BY EStats.GRADE) AS RN,
COUNT(*) OVER(PARTITION BY EStats.PersonID) AS Cnt
FROM EStats
WHERE EStats.CourseDate = '2012-03-01'
)

SELECT AVG(CAST(cte.Grade AS Numeric)) AS Median
FROM cte
WHERE RN IN((cte.Cnt + 1) / 2, (Cnt + 2) / 2)

Post #1408669
Posted Thursday, January 17, 2013 6:24 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: 2 days ago @ 8:30 PM
Points: 3,627, Visits: 5,273
Maybe you need to:

PARTITION BY LEFT(CourseDate, 4)

?



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1408687
Posted Tuesday, January 22, 2013 7:38 AM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: 2 days ago @ 7:49 AM
Points: 92, Visits: 341
Alright, its time to confess I'm still not getting this. Would anyone mind taking my original DDL and calculating median and lower/upper quartiles? I think if I can see it with my example data it would click for me. Any help is really appreciated!
Post #1410037
Posted Tuesday, January 22, 2013 8:33 AM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: 2 days ago @ 7:49 AM
Points: 92, Visits: 341
Below another attempt, but I seem to be running into the same problem. For instance, 2012-03-01 should produce 90.5, but the below gives me 89.25.

SELECT
CourseDate,
AVG(CAST(Grade AS Numeric))

FROM
(
SELECT
CourseDate,
Grade,
ROW_NUMBER() OVER (
PARTITION BY PersonID
ORDER BY Grade ASC, CourseDate ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY PersonID
ORDER BY Grade DESC, CourseDate DESC) AS RowDesc
FROM EStats
)x
WHERE
RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY CourseDate
ORDER BY CourseDate

UPDATE: I think the problem is my PARTITION BY isn't working. In other words, the inner query produces a row_number with all 1's so there is nothing to match. Any ideas on a fix?
Post #1410068
Posted Tuesday, January 22, 2013 8:43 AM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: 2 days ago @ 7:49 AM
Points: 92, Visits: 341
Okay, using Dwain's suggestion I'm close to getting everything working. My only problem is the row_number in my inner query isn't working well when the numbers are the same. For instance, when you run my inner query...

		SELECT
CourseDate,
Grade,
ROW_NUMBER() OVER (
PARTITION BY LEFT(CourseDate, 4)
ORDER BY Grade ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY LEFT(CourseDate, 4)
ORDER BY Grade DESC) AS RowDesc
FROM EStats

...you'll notice it does this for 2010-03-01:

RowAsc
10
9
8
6
7
5
3
4
2
1

I think that is throwing off the snippet I added which Dwain suggested (THANKS, by the way!). Any suggestions on how to fix this?

For those interested, here's what I have. I'm sure its about the worst way I could do this, but its all I've got at this stage in my SQL knowledge! Once I get the row_number ordering issue worked out, I think it will work.

WITH Q3 AS
(
SELECT
CourseDate,
AVG(CAST(Grade AS Numeric)) AS Median

FROM
(
SELECT
CourseDate,
Grade,
ROW_NUMBER() OVER (
PARTITION BY LEFT(CourseDate, 4)
ORDER BY Grade ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY LEFT(CourseDate, 4)
ORDER BY Grade DESC) AS RowDesc
FROM EStats
)x
WHERE
RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY CourseDate
--ORDER BY CourseDate
),

Q2 AS
(
SELECT
x.CourseDate,
AVG(CAST(Grade AS Numeric)) AS LowerQuartile

FROM
(
SELECT
Estats.CourseDate,
Estats.Grade,
ROW_NUMBER() OVER (
PARTITION BY LEFT(EStats.CourseDate, 4)
ORDER BY Grade ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY LEFT(Estats.CourseDate, 4)
ORDER BY Grade DESC) AS RowDesc
FROM EStats JOIN Q3 on EStats.CourseDate = Q3.CourseDate
WHERE EStats.Grade < Q3.Median
)x
WHERE
RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY x.CourseDate
),

Q4 AS
(
SELECT
x.CourseDate,
AVG(CAST(Grade AS Numeric)) AS UpperQuartile

FROM
(
SELECT
Estats.CourseDate,
Estats.Grade,
ROW_NUMBER() OVER (
PARTITION BY LEFT(EStats.CourseDate, 4)
ORDER BY Grade ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY LEFT(Estats.CourseDate, 4)
ORDER BY Grade DESC) AS RowDesc
FROM EStats JOIN Q3 on EStats.CourseDate = Q3.CourseDate
WHERE EStats.Grade > Q3.Median
)x
WHERE
RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY x.CourseDate
)

SELECT Q3.CourseDate, Q3.Median AS Median, Q2.LowerQuartile, Q4.UpperQuartile, MIN(EStats.Grade) AS Min, MAX(EStats.Grade) AS Max
FROM Q3
JOIN Q2 ON Q3.CourseDate = Q2.CourseDate
JOIN Q4 ON Q3.CourseDate = Q4.CourseDate
JOIN EStats ON Q3.CourseDate = EStats.CourseDate
GROUP BY Q3.CourseDate, Q3.Median, Q2.LowerQuartile, Q4.UpperQuartile
ORDER BY Q3.CourseDate

Post #1410075
Posted Tuesday, January 22, 2013 1:37 PM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: 2 days ago @ 7:49 AM
Points: 92, Visits: 341
I'm just going to bump this because I rewrote the above post and added what I've got to date...
Post #1410252
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse