• dkschill (7/30/2013)


    Is this an effective way to find median?

    CREATE TABLE #T (i TINYINT);

    GO

    INSERT INTO #T

    SELECT ABS(CHECKSUM(NEWID())) % 250

    GO 1000

    ;WITH t AS (SELECT i, ROW_NUMBER() OVER (ORDER BY i) AS n, count(*) OVER (PARTITION BY (SELECT NULL)) AS c FROM #T)

    SELECT

    AVG(i) AS Median

    FROM t

    WHERE

    ( c % 2 = 1 AND n = ((c-1)/2)+1 )

    OR (c % 2 = 0 AND (n = c / 2 OR n = c / 2 + 1))

    First and foremost, for your sample data, don't use GO 1000. You will find a tally table works much better for this kind of thing.

    WITH t(i) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM sys.all_objects)

    INSERT INTO #T

    SELECT ABS(CHECKSUM(NEWID()))%200

    FROM t

    WHERE i<=1000;

    My non-2012 solution is:

    ;WITH t(i,n,c) AS

    ( SELECT i,

    ROW_NUMBER() OVER (ORDER BY i),

    COUNT(*) OVER (PARTITION BY (SELECT NULL))

    FROM #T)

    SELECT

    AVG(i) AS Median

    FROM t

    WHERE (c%2=1 AND n=((c-1)/2)+1) OR (c%2=0 AND (n=c/2 OR n=c/2+1));

    I tested this against Louis' solution (Brilliant use of NTILE BTW) and, though it appears to produce a better query plan (more parallelism, lower est. query cost) it runs about the same speed as what Louis put together (~7 seconds/1 million rows on my crappy test box [4-cpu, 4gb ram test]).

    Edit: code formatting messed up.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001