• Nice work, Mark.

    A couple of points, though. First, the word "variance" is frequently misused when people mean "variation". This can cause ambiguity in math and statistics, where "variance" has a precise definition: the mean of the squared deviations from the mean. In a regression setting, the residual variance is the mean of the squared residuals (a residual being the difference between the raw data value Yi and the value of the regression line at Xi). The square root of the variance is the more familiar "standard deviation".
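    To make the distinction concrete, here is a minimal Python sketch. The data and the helper names fit_line and variance are hypothetical, chosen only for illustration. It fits y = a + bx with the usual closed-form least-squares sums, then computes the variance of y, the mean of the squared residuals, and their square roots (standard deviations).

```python
import math

# Hypothetical sample data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x using closed-form sums."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    a = (sy - b * sx) / n                          # intercept
    return a, b


def variance(values):
    """Population variance: mean of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


a, b = fit_line(xs, ys)

# Residual = observed y minus the fitted line's value at the same x.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
residual_variance = sum(r * r for r in residuals) / len(residuals)

std_dev_y = math.sqrt(variance(ys))            # spread of the raw data
residual_std = math.sqrt(residual_variance)    # spread around the line

print(f"a={a:.4f} b={b:.4f}")
print(f"variance(y)={variance(ys):.4f} std_dev(y)={std_dev_y:.4f}")
print(f"residual variance={residual_variance:.4f} residual std={residual_std:.4f}")
```

    Both helpers divide by n (the population form); many texts divide by n - 1 for a sample variance, or by n - 2 for regression residuals, to get an unbiased estimate. Because a least-squares fit with an intercept leaves residuals that sum to zero, the mean of the squared residuals here equals the variance of the residuals.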

    Second, I can explain why the notation used in linear regression is at odds with that of trigonometry and Cartesian geometry, where a line is typically written "y = mx + b". The reason for writing "y = a + bx" is generality. Suppose you wanted to fit a 2nd-order polynomial to a data series that had curvature, not just slope; you would then be solving for a, b and c in "y = a + bx + cx^2". You could keep adding powers of x if you had good reason to believe that the underlying phenomenon possessed many degrees of freedom.
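    For the quadratic case, a sketch of where the coefficients come from: minimizing the sum of squared residuals S = Σ (yi - a - b xi - c xi^2)^2 and setting the partial derivatives of S with respect to a, b and c to zero gives three linear equations (the normal equations), written here in LaTeX:

```latex
\begin{pmatrix}
n            & \sum x_i    & \sum x_i^2 \\
\sum x_i     & \sum x_i^2  & \sum x_i^3 \\
\sum x_i^2   & \sum x_i^3  & \sum x_i^4
\end{pmatrix}
\begin{pmatrix} a \\ b \\ c \end{pmatrix}
=
\begin{pmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \end{pmatrix}
```

    Dropping the last row and column recovers the straight-line case, so each extra power of x simply adds one more row and column of sums.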

    And, as if that wasn't enough, in the general problem (where the approximating function can be any weighted sum of chosen functions of x), you would write y = a0 f0(x) + a1 f1(x) + a2 f2(x) + ..., where a0, a1, a2, ... are the coefficients and f0(x), f1(x), f2(x), ... are the "basis functions". For linear regression, f0(x) = 1 and f1(x) = x.
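    The general form can be sketched in Python as below, using only the standard library; fit_basis and solve are hypothetical helper names, and the data are made up for illustration. It builds the normal equations (F^T F) a = F^T y, where F[i][k] = fk(xi), and solves them by Gaussian elimination, so the same code fits a straight line, a quadratic, or any other list of basis functions.

```python
import math


def fit_basis(xs, ys, basis):
    """Least-squares coefficients a_k for y ~ sum_k a_k * f_k(x).

    `basis` is a list of callables f_0, f_1, ...; the model must be
    linear in the coefficients, but the f_k can be any functions of x.
    """
    m = len(basis)
    # Design matrix F: one row per data point, one column per basis function.
    rows = [[f(x) for f in basis] for x in xs]
    # Normal equations: (F^T F) a = F^T y.
    ftf = [[sum(r[j] * r[k] for r in rows) for k in range(m)] for j in range(m)]
    fty = [sum(r[j] * y for r, y in zip(rows, ys)) for j in range(m)]
    return solve(ftf, fty)


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting on a small dense system."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (aug[r][n] - s) / aug[r][r]
    return coeffs


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 7.0, 13.0, 21.0]  # exactly 1 + x + x^2

line = [lambda x: 1.0, lambda x: x]               # f0 = 1, f1 = x
quad = [lambda x: 1.0, lambda x: x, lambda x: x * x]
trig = [lambda x: 1.0, math.sin]                  # basis need not be powers of x

line_coeffs = fit_basis(xs, ys, line)
quad_coeffs = fit_basis(xs, ys, quad)
trig_coeffs = fit_basis(xs, [2.0 + 3.0 * math.sin(x) for x in xs], trig)
print(line_coeffs, quad_coeffs, trig_coeffs)
```

    Forming F^T F explicitly is fine for a handful of well-behaved basis functions; with many basis functions, or nearly collinear ones, a QR-based least-squares solver is numerically safer.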

    Of course, there are weird and wonderful techniques for fitting data to functions that are nonlinear in their coefficients (such as y = a*exp(bx)), but they might not be practical to implement in SQL.

    Cheers!

    - Al