The "Median" Function

  • Comments posted to this topic are about the item The "Median" Function

  • Nice question to end the week on, thanks Kathi

    Space, the final frontier? not any more...
    All limits henceforth are self-imposed.
    “libera tute vulgaris ex”

  • From the docs:

    For example, PERCENTILE_DISC (0.5) will compute the 50th percentile (that is, the median) of an expression.

    So I think you should have allowed this, given that the question simply asked "Which function can be used to calculate a median?"

  • The median is actually trickier than you might think. The first question to ask is whether or not the value returned as the median has to exist in the data. If the set of data has an odd number of elements, then it's easy to find the middle of it. The problems come when the set has an even number of elements. Let's create a set of data to look at {1,2,2,3,3,3}.

    Do you notice this set has one 1, two 2's and three 3's?

    The usual way of computing this median would be to take the subset {2,3}, sum it, and divide by 2, which would give us 2.5. However, it's just as legitimate to go either low or high and return {2} or {3} in some cases. However, a more representative measure of central tendency is the weighted median. We take the duplicates around the middle {2,2,3,3,3} subset, and average that set (2+2+3+3+3)/5 = 13/5 = 2.6 instead. This result shows there's a slight skew toward the high-end of the values.

    If anyone is interested in a bit of SQL history, many years ago there were two competing database magazines. One was DBMS and the other one was Database Programming and Design. Chris Date had a column in one and I had a column in the other, but both magazines wound up being published by the same company! We did a "feud" in our columns so that people would buy both magazines follow up; Date would write something and I disagreed with it or Date would propose a problem and I would give another solution, and so forth. One of these topics went back and forth for about three issues with how to compute the median. Some other people chimed in and made their contributions, too. The code we came up with is obsolete today, but it was good fun developing it.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Thanks for that - however, the question was pretty vague, and I believe that a perfectly valid answer to the question that was posed is PERCENTILE_DISC.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply