WANTED: UDF for Least Squares estimate

  • I have an application with known X and Y points, I'd like to run a function that would calculate m & b in y = mx +b then determine and return a desired Y (for a given X).

    Is there perhaps a source for statistical functions in SQL that I could use for this and other stats calculations?

    I basically worked this up by hand but it's not elegant... I'd also like r^2... and other things like 3rd order polynomials.

    Perhaps a Freeware or commercial UDF set?? :w00t:

    Thanks!

    -Brian

  • You're almost certainly going to be better off looking for .NET versions, and plugging them in as CLR functions.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I wrote a UDF to do Least Squares Linear Regression back in 2001 but the performance was so bad compared to simple SQL that I threw it away. The run time difference for SQL was 1/20th of a UDF.

    Try the below SQL replacing these strings:

    "dbo.T1" with the real table name.]

    "X1" with the X column name

    "Y1" with the Y column name

    SELECTSlopeCoefficient

    ,(Y1_Mean - ( SlopeCoefficient * X1_Mean ) ) AS SlopeIntercept

    FROM(

    SELECT X1_Mean

    ,Y1_Mean

    ,( ( PopCnt * SUM(X1_Y1_Product) ) - ( SUM(X1) * SUM(X1) ) )

    / ( ( PopCnt * SUM(X1_DeviationSquared) ) - POWER(SUM(X1),2 ) )

    as SlopeCoefficient

    FROM(

    SELECTT1.X1

    ,T1.Y1

    ,T1_Base.PopCnt

    ,T1_Base.X1_Mean

    ,T1_Base.Y1_Mean

    ,T1.X1 - X1_Mean AS X1_Deviation

    ,T1.Y1 - Y1_Mean AS Y1_Deviation

    ,(T1.X1 - X1_Mean ) * ( T1.Y1 - Y1_Mean ) AS X1_Y1_Product

    ,POWER( (T1.X1 - X1_Mean ) , 2 ) AS X1_DeviationSquared

    ,POWER( ( T1.Y1 - Y1_Mean ) , 2 ) AS Y1_DeviationSquared

    FROM(

    select AVG( CAST( T1.X1 as float ) ) as X1_Mean

    ,AVG( CAST( T1.Y1 as float ) ) as Y1_Mean

    ,COUNT( * ) as PopCnt

    fromdbo.T1

    ) as T1_Base

    CROSS JOIN dbo.T1

    ) as T1_D

    GROUP BY X1_Mean

    ,Y1_Mean

    )as t1_S

    SQL = Scarcely Qualifies as a Language

  • What would you like the UDF interface to look like?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Thanks. The CLR approach sounds like the best option.

  • Yes, I think that CLR is probably your best solution, but not because it is necessarily faster for any direct solution (I frankly doubt that it is), but because it is practically impossible to efficiently generalize any solution for this in T-SQL.

    For instance, probably the most desirable form for a general least-squares solution would be as an aggregate function, but we cannot write custom aggregate functions in T-SQL, so the only way to present such a solution is through CLR.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply