"median" stored procedure

  • i need to compute the median values for several columns of a bunch of tables, so i'm trying to write a stored procedure to compute medians as t-sql doesn't already have one. i've already dug up the code to do this from this site:

    -----------------------------------

    DECLARE @cnt INT

    DECLARE @fpos INT

    DECLARE @output FLOAT

    SELECT @cnt = COUNT(*) FROM bocomp where debt is not null

    IF ( @cnt % 2 ) = 0

      BEGIN

        SET @fpos = @cnt / 2

        SELECT @output = avg(cast(debt as float)) FROM (

          SELECT (

            select count(*) + 1 from bocomp b where b.debt < a.debt and b.debt is not null

          ) AS ROW, a.debt FROM bocomp a where a.debt is not null

        ) AS Built

        WHERE ROW IN (@fpos, @fpos + 1)

      END

    ELSE

      BEGIN

        SET @fpos = ( ( ( @cnt - 1 ) / 2 ) + 1 )

        SELECT @output = debt FROM (

          SELECT (

            select count(*) + 1 from bocomp b where b.debt < a.debt and b.debt is not null

          ) AS ROW, a.debt FROM bocomp a where a.debt is not null

        ) AS Built

        WHERE ROW = @fpos

      END

    SELECT @output AS MEDIAN

    -----------------------------------

    what i want to do is change each "bocomp" to "@cTable" (some kind of varchar variable) and each "debt" to "@cField", so i can put this in a stored procedure with those two parameters. when i try this directly, it doesn't work. also, when i try building each line as a string and executing it that way, it doesn't work because of the other variable names being worked with.

    i'm a bit rusty with my t-sql (haven't worked with this in about a year), so any help is appreciated!

  • I've tried another method of grabbing the middle rows. Using TOP to get the top half (and a bit) of the rows, and TOP again to get the bottom 1 or 2 rows of that sample.

    It's a script at the moment (for testing purposes), but you can play with the @col and @tbl variables to see if it comes up with your expected values.

     
    
    declare @rows int, @topn int, @samplen int, @sql nvarchar(1000), @median float

    declare @col sysname set @col = 'id'
    declare @tbl sysname set @tbl = 'sysobjects'

    select @sql = 'select @rows = count(' + @col + ') from ' + @tbl + ' where ' + @col + ' is not null'
    exec sp_executesql @sql, N'@rows int output', @rows OUTPUT

    select @topn = case when @rows % 2 = 1 then 1 else 2 end
    select @samplen = (@rows / 2) + 1

    select @topn = case when @rows % 2 = 1 then 1 else 2 end

    select @rows AS [Total Candidate Rows], @topn AS [Number of Median Rows], @samplen AS [Rows to Sample]

    select @sql =
    'select @median = avg(cast(' + @col + ' as float)) from
    (select top ' + cast(@topn as varchar) + ' ' + @col + ' from
    (select top ' + cast(@samplen as varchar) + ' ' + @col + ' from ' + @tbl + ' where ' + @col + ' is not null order by ' + @col + ') [half]
    order by ' + @col + ' desc) [middle]'
    exec sp_executesql @sql, N'@median float output', @median OUTPUT
    select @median as MEDIAN

    Hope it helps.

    Cheers,

    - Mark


    Cheers,
    - Mark

Viewing 2 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply