SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Calculating Median, Max, Min, Lower Quartile and Upper Quartile for a Boxchart?


Calculating Median, Max, Min, Lower Quartile and Upper Quartile for a Boxchart?

Author
Message
DataAnalyst011
DataAnalyst011
Old Hand
Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)

Group: General Forum Members
Points: 352 Visits: 429
Hi All - My boss asked me to create a boxchart using SSRS for some of our data. I read articles all morning but honestly, I'm at a loss on how accomplish this. I think I'm growing in my SQL knowledge, but this one has me stumped. I created a sample table that looks like what I'll be working with below. If anyone could lend a hand it would really help me!

The output should be something like below with values filled in for x:
Year   Median   Max   Min  LowerQ   UpperQ
2010 x x x x x
2011 x x x x x
2012 x x x x x


CREATE TABLE EStats    
(
PersonID VARCHAR(30) NOT NULL,
Grade VARCHAR(25) NOT NULL,
CourseDate Date NOT NULL
)
;

INSERT INTO EStats
(
PersonID, Grade, CourseDate
)

VALUES
('100', '91', '2010-03-01'),
('101', '96', '2010-03-01'),
('102', '88', '2010-03-01'),
('103', '92', '2010-03-01'),
('104', '81', '2010-03-01'),
('105', '85', '2010-03-01'),
('106', '91', '2010-03-01'),
('107', '89', '2010-03-01'),
('108', '99', '2010-03-01'),
('109', '88', '2010-03-01'),
('110', '81', '2011-03-02'),
('111', '77', '2011-03-02'),
('112', '88', '2011-03-02'),
('113', '76', '2011-03-02'),
('114', '69', '2011-03-02'),
('115', '70', '2011-03-02'),
('116', '75', '2011-03-02'),
('117', '88', '2011-03-02'),
('118', '76', '2011-03-02'),
('119', '95', '2012-03-01'),
('120', '96', '2012-03-01'),
('121', '90', '2012-03-01'),
('122', '80', '2012-03-01'),
('123', '85', '2012-03-01'),
('124', '94', '2012-03-01'),
('125', '89', '2012-03-01'),
('126', '97', '2012-03-01'),
('127', '94', '2012-03-01'),
('128', '72', '2012-03-01'),
('129', '88', '2012-03-01'),
('130', '91', '2012-03-01')


bitbucket-25253
bitbucket-25253
SSCertifiable
SSCertifiable (7.9K reputation)SSCertifiable (7.9K reputation)SSCertifiable (7.9K reputation)SSCertifiable (7.9K reputation)SSCertifiable (7.9K reputation)SSCertifiable (7.9K reputation)SSCertifiable (7.9K reputation)SSCertifiable (7.9K reputation)

Group: General Forum Members
Points: 7857 Visits: 25280
Assume it is the median value that has you stumped, try reading this, it works for myself.

http://www.mssqltips.com/sqlservertip/2523/script-to-calculate-the-median-value-for-sql-server-data/?utm_source=dailynewsletter&utm_medium=email&utm_content=headline&utm_campaign=20111030

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read

Before posting a performance problem please read
Rich Mechaber
Rich Mechaber
SSCommitted
SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)

Group: General Forum Members
Points: 1555 Visits: 3665
I had to do this as well, so you get my supply of bookmarks to help you on your way:
http://www.sqlmag.com/article/tsql3/calculating-the-median-gets-simpler-in-sql-server-2005
http://sqlblog.com/blogs/adam_machanic/archive/2006/12/18/medians-row-numbers-and-performance.aspx
http://sqlblog.com/blogs/peter_debetta/archive/2006/12/20/Medians_Actual_Query_Cost_and_Statistics.aspx

The first one by Itzik Ben-Gan is the one I used.

And lastly, a link to one of my prior posts here on SSC. It includes some add'l explanatory notes I wrote to remind myself how Ben-Gan's logic actually works:
http://www.sqlservercentral.com/Forums/Topic923660-338-1.aspx#bm1105837

HTH,
Rich
DataAnalyst011
DataAnalyst011
Old Hand
Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)

Group: General Forum Members
Points: 352 Visits: 429
Awesome. Thanks a ton!
DataAnalyst011
DataAnalyst011
Old Hand
Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)

Group: General Forum Members
Points: 352 Visits: 429
Actually, I'm having some trouble converting over the example to my sample table above. I just took the grades for 03-01-2012. The median should be the average between 90 and 91 (so 90.5), correct? This gets that result set:

SELECT grade 
FROM estats
WHERE CourseDate = '2012-03-01'
ORDER BY grade DESC



But when I run the below (which is an attempt to convert the example to my sample data above, I get 89.25. I'm sure I'm making a dumb mistake somewhere, but any help is appreciated.

WITH cte AS
(
SELECT EStats.PersonID, EStats.Grade,
ROW_NUMBER() OVER(PARTITION BY EStats.PersonID ORDER BY EStats.GRADE) AS RN,
COUNT(*) OVER(PARTITION BY EStats.PersonID) AS Cnt
FROM EStats
WHERE EStats.CourseDate = '2012-03-01'
)

SELECT AVG(CAST(cte.Grade AS Numeric)) AS Median
FROM cte
WHERE RN IN((cte.Cnt + 1) / 2, (Cnt + 2) / 2)


dwain.c
dwain.c
SSCertifiable
SSCertifiable (7.4K reputation)SSCertifiable (7.4K reputation)SSCertifiable (7.4K reputation)SSCertifiable (7.4K reputation)SSCertifiable (7.4K reputation)SSCertifiable (7.4K reputation)SSCertifiable (7.4K reputation)SSCertifiable (7.4K reputation)

Group: General Forum Members
Points: 7357 Visits: 6431
Maybe you need to:


PARTITION BY LEFT(CourseDate, 4)



?


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
My temporal SQL musings: Calendar Tables, an Easter SQL, Time Slots and Self-maintaining, Contiguous Effective Dates in Temporal Tables
DataAnalyst011
DataAnalyst011
Old Hand
Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)

Group: General Forum Members
Points: 352 Visits: 429
Alright, its time to confess I'm still not getting this. Would anyone mind taking my original DDL and calculating median and lower/upper quartiles? I think if I can see it with my example data it would click for me. Any help is really appreciated!
DataAnalyst011
DataAnalyst011
Old Hand
Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)

Group: General Forum Members
Points: 352 Visits: 429
Below another attempt, but I seem to be running into the same problem. For instance, 2012-03-01 should produce 90.5, but the below gives me 89.25.

SELECT
CourseDate,
AVG(CAST(Grade AS Numeric))

FROM
(
SELECT
CourseDate,
Grade,
ROW_NUMBER() OVER (
PARTITION BY PersonID
ORDER BY Grade ASC, CourseDate ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY PersonID
ORDER BY Grade DESC, CourseDate DESC) AS RowDesc
FROM EStats
)x
WHERE
RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY CourseDate
ORDER BY CourseDate



UPDATE: I think the problem is my PARTITION BY isn't working. In other words, the inner query produces a row_number with all 1's so there is nothing to match. Any ideas on a fix?
DataAnalyst011
DataAnalyst011
Old Hand
Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)

Group: General Forum Members
Points: 352 Visits: 429
Okay, using Dwain's suggestion I'm close to getting everything working. My only problem is the row_number in my inner query isn't working well when the numbers are the same. For instance, when you run my inner query...

      SELECT
CourseDate,
Grade,
ROW_NUMBER() OVER (
PARTITION BY LEFT(CourseDate, 4)
ORDER BY Grade ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY LEFT(CourseDate, 4)
ORDER BY Grade DESC) AS RowDesc
FROM EStats



...you'll notice it does this for 2010-03-01:

RowAsc
10
9
8
6
7
5
3
4
2
1

I think that is throwing off the snippet I added which Dwain suggested (THANKS, by the way!). Any suggestions on how to fix this?

For those interested, here's what I have. I'm sure its about the worst way I could do this, but its all I've got at this stage in my SQL knowledge! Once I get the row_number ordering issue worked out, I think it will work.

WITH Q3 AS
(
SELECT
CourseDate,
AVG(CAST(Grade AS Numeric)) AS Median

FROM
(
SELECT
CourseDate,
Grade,
ROW_NUMBER() OVER (
PARTITION BY LEFT(CourseDate, 4)
ORDER BY Grade ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY LEFT(CourseDate, 4)
ORDER BY Grade DESC) AS RowDesc
FROM EStats
)x
WHERE
RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY CourseDate
--ORDER BY CourseDate
),

Q2 AS
(
SELECT
x.CourseDate,
AVG(CAST(Grade AS Numeric)) AS LowerQuartile

FROM
(
SELECT
Estats.CourseDate,
Estats.Grade,
ROW_NUMBER() OVER (
PARTITION BY LEFT(EStats.CourseDate, 4)
ORDER BY Grade ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY LEFT(Estats.CourseDate, 4)
ORDER BY Grade DESC) AS RowDesc
FROM EStats JOIN Q3 on EStats.CourseDate = Q3.CourseDate
WHERE EStats.Grade < Q3.Median
)x
WHERE
RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY x.CourseDate
),

Q4 AS
(
SELECT
x.CourseDate,
AVG(CAST(Grade AS Numeric)) AS UpperQuartile

FROM
(
SELECT
Estats.CourseDate,
Estats.Grade,
ROW_NUMBER() OVER (
PARTITION BY LEFT(EStats.CourseDate, 4)
ORDER BY Grade ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY LEFT(Estats.CourseDate, 4)
ORDER BY Grade DESC) AS RowDesc
FROM EStats JOIN Q3 on EStats.CourseDate = Q3.CourseDate
WHERE EStats.Grade > Q3.Median
)x
WHERE
RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY x.CourseDate
)

SELECT Q3.CourseDate, Q3.Median AS Median, Q2.LowerQuartile, Q4.UpperQuartile, MIN(EStats.Grade) AS Min, MAX(EStats.Grade) AS Max
FROM Q3
JOIN Q2 ON Q3.CourseDate = Q2.CourseDate
JOIN Q4 ON Q3.CourseDate = Q4.CourseDate
JOIN EStats ON Q3.CourseDate = EStats.CourseDate
GROUP BY Q3.CourseDate, Q3.Median, Q2.LowerQuartile, Q4.UpperQuartile
ORDER BY Q3.CourseDate


DataAnalyst011
DataAnalyst011
Old Hand
Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)Old Hand (352 reputation)

Group: General Forum Members
Points: 352 Visits: 429
I'm just going to bump this because I rewrote the above post and added what I've got to date...
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search