RE: Calculating Median, Max, Min, Lower Quartile and Upper Quartile for a Boxchart?

SSC Guru

Points: 86908

January 23, 2013 at 6:25 pm

roryp 96873 (1/23/2013)
Well, definitely not the prettiest thing I've ever written...

Are you kidding? Ugly is my middle name... 😛

DataAnalyst011 (1/23/2013)
Year Median Max Min LowerQ UpperQ 2010 90 99 81 88 92 2011 76 88 69 72.5 84.5 2012 90.5 97 72 86.5 94.5

I don't think the value in bold is correct. If I understand your computation correctly you're trying to find the median of the lower half of the grades for 2011, which would be in this set from your input stream:

114 69.0 2011-03-02

115 70.0 2011-03-02

116 75.0 2011-03-02

118 76.0 2011-03-02

113 76.0 2011-03-02

And that value is 116's score of 75.

So here is my solution, my ugly baby. Gotta love it cause it's my baby!

;WITH [Stats] AS (

SELECT PersonID, Grade, CourseDate, [Year]

,Median=CASE WHEN Count1%2 = 0 AND rn1 IN (Count1/2, (Count1/2)+1) THEN Grade

WHEN Count1%2 = 1 AND rn1 = (Count1/2)+1 THEN Grade END

-- Used to establish median over quartiles

,rn2=ROW_NUMBER() OVER (PARTITION BY [Year], Quartile ORDER BY Grade)

,Count2=COUNT(*) OVER (PARTITION BY [Year], Quartile)

,Quartile

FROM (

SELECT PersonID, Grade=CAST(Grade AS DECIMAL(10,1)), CourseDate, [Year]

-- Used to establish median over years

,rn1=ROW_NUMBER() OVER (PARTITION BY [Year] ORDER BY Grade)

,Count1=COUNT(*) OVER (PARTITION BY [Year])

-- Break the grades into two groups to establish "median" in a quartile

,Quartile=NTILE(2) OVER (PARTITION BY [Year] ORDER BY Grade)

FROM EStats

CROSS APPLY (SELECT [Year]=LEFT(CourseDate, 4)) a) a

)

SELECT [Year]

,Median=AVG(Median)

,[Max]=MAX(Grade)

,[Min]=MIN(Grade)

,LowerQ=AVG(CASE

WHEN Quartile = 1 AND Count2%2 = 0 AND rn2 IN (Count2/2, (Count2/2)+1) THEN Grade

WHEN Quartile = 1 AND Count2%2 = 1 AND rn2 = (Count2/2)+1 THEN Grade END)

,UpperQ=AVG(CASE

WHEN Quartile = 2 AND Count2%2 = 0 AND rn2 IN (Count2/2, (Count2/2)+1) THEN Grade

WHEN Quartile = 2 AND Count2%2 = 1 AND rn2 = (Count2/2)+1 THEN Grade END)

FROM [Stats]

GROUP BY [Year]

Let me know if this helps.

Edit: Tidied up my solution a bit.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St