Calculate the median

Question

Calculate the median

Saurabh.D

Ten Centuries

Points: 1104
More actions
March 8, 2019 at 6:07 pm

#371106

What is the fastest way to calculate the median
ThanksSaurabh.D

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

pietlinden SSC Guru Points: 63421 More actions · Answer 1

Take the nice barometer to the building manager and ask him?
https://sqlsunday.com/2016/02/11/median-and-percentile-in-tsql/

Clocker SSC Enthusiast Points: 117 More actions · Answer 2

is this in relation to a median over a very large dataset and what is the most performant way to return the aggregate?

Jonathan AC Roberts SSCoach Points: 18257 More actions · Answer 3

I'd use SQL Server's built in PERCENTILE_CONT function. Just call it with the numeric_literal = 0.5 and it will return the median.
If you have a large number values you and you want a very slightly faster time you might want to use PERCENTILE_DISC, this will pick a value from the set but if you have an even number of items it won't take the average of the middle two.
drop table #TempTest go create table #TempTest ( id int identity(1,1) not null primary key clustered, myLetter char(1) not null, myNumber int not null ) go insert into #TempTest(myLetter,myNumber) select top(103) PartitionBy.x,ABS(CHECKSUM(NewId())) % 1000000 from master.sys.all_columns a,master.sys.all_columns b,master.sys.all_columns c cross apply(values('A'),('B'),('C'),('D'),('E'),('F')) PartitionBy(x)

go select a.myLetter,a.myNumber, PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY myNumber) OVER (PARTITION BY myLetter) MedianDISC, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY myNumber) OVER (PARTITION BY myLetter) MedianCONT from #TempTest a

jcelko212 32090 SSCrazy Eights Points: 9294 More actions · Answer 4

Saurabh.D - Friday, March 8, 2019 6:07 PM
What is the fastest way to calculate the median

At one time in the SQL world, the â€œprogramming problem du jourâ€ was finding the median. This was because there were two newsstand database magazines, DBMS and Database Programming & Design, in the 1990s with editorial columns by myself and Chris respectively. The magazines had the same publisher, so to boost sales of both, Chris and I would do â€œdueling SQLâ€ columns. You had to buy both to get the whole story. We each did several versions of the median with the SQL available at the time starting in 1992 and these articles ran for months. Date and I posted several versions and Rory Murchison and Philip Vaughan of San Jose, CA, and The team of Anatoly Abramovich, Yelena Alexandrova, and Eugene Birger presented a series of articles in SQL Forum magazine on computing the median (SQL Forum 1993, 1994). The late Ken Henderson also added a streamlined version of this code to the discussion.

The Weighted Median

The weighted median is an even better measure of central tendency than the plain median. It is also more â€œset-orientedâ€ than the plain median. It factors in the number of times the two values in the middle subset of a table with an even number of rows appear. The table with (1, 2, 2, 3, 3, 3) has a median of 3, the middle value. It returns the subset (2, 2, 3, 3, 3) in the middle, which gives us (13/5) = 2.6 for the weighted median. The weighted median is a more accurate description of the skew of the data.

23.07.02. Modern Median

We now have ROW_NUMBER() and can that function to do the sorting without the complicated self-joins, unions and views of the older methods. Here is an elegant solution from Peso:

SELECT AVG(X.part_wgt) AS part_wgt_median
FROM (SELECT part_wgt,
(2 * ROW_NUMBER() OVER (ORDER BY part_wgt)
- COUNT(*) OVER ()) AS middle_subset
FROM Parts) AS X
WHERE X.middle_subset BETWEEN 0 AND 2;

This can be modified to give us the weighted median.

SELECT (1.0 * SUM(X.y)/SUM(X.t)) AS part_wgtmedian
FROM (SELECT SUM(part_wgt) OVER (PARTITION BY part_wgt) AS y,
   (2 * ROW_NUMBER() OVER (ORDER BY part_wgt)
   - COUNT(*) OVER ()) AS middle_subset,
   COUNT(*) OVER (PARTITION BY part_wgt) AS t
FROM Parts) AS X
WHERE X.middle_subset BETWEEN 0 AND 2;
The 1.0 factor is an old trick to switch from integer math to decimal, but you could use a CAST().
Today, the PERCENTILE functions are probably better but try it.

Please post DDL and follow ANSI/ISO standards when asking for help.

Jeff Moden SSC Guru Points: 1003853 More actions · Answer 5

I've not tested it personally but the OFFSET/FETCH option that came out in 2012 seems to be both the most effecient and fastest method. Please see the following article written by one of the Masters of SQL.
https://www.itprotoday.com/sql-server/sql-server-2012-solutions-median-calculation

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

michael.cole 47030 SSC Eights! Points: 804 More actions · Answer 6

Jeff Moden - Sunday, March 10, 2019 1:39 PM
I've not tested it personally but the OFFSET/FETCH option that came out in 2012 seems to be both the most effecient and fastest method. Please see the following article written by one of the Masters of SQL.
https://www.itprotoday.com/sql-server/sql-server-2012-solutions-median-calculation

I have tested it and used in production, and (at least for my situation) it was the most effecient and fastest method.