Forecasting with SQL

  • Comments posted to this topic are about the item Forecasting with SQL

  • SSAS is completely capable of handling forecasting tasks utilizing a variety of different econometric algorithms.

    I would STRONGLY urge readers to consult with a professional statistician or economist before undertaking this type of project.

    There are a number of different data afflictions that can have serious adverse impacts on the accuracy of an econometric forecasting solution.

    Autocorrelated error terms, unit roots, non-linear time series and others that can cause issues when building ARIMA or regression models and result in spurrious relationships. A trained practitioner will know what to look for and, more importantly, how to correct for these conditions. The spurrious relationships that can result from improper econometric data analysis can result in using independent variables that really have no impact on the dependent variable or throwing out independent variables that actually do have a statistically significant relationship but just need a little prep work that a trained econometrician would understand.

    When these forecasting tools are used to base critical business decisions that impact people's lives and livelihoods, PLEASE be sure and leave this type of work to the experts.

  • Martin Vrieze (2/9/2010)


    When these forecasting tools are used to base critical business decisions that impact people's lives and livelihoods, PLEASE be sure and leave this type of work to the experts.

    Point of Order: Those who created the credit default swap catastrophe using these types of methods were experts. Having an expert consultant is no substitute to understanding at least the underlying principles of the models you're using. Don't just leave the work to the experts; work with them. If it's important enough to pay someone to do, it's important enough for you to understand it.

  • Point of Order: Those who created the credit default swap catastrophe using these types of methods were experts. Having an expert consultant is no substitute to understanding at least the underlying principles of the models you're using. Don't just leave the work to the experts; work with them. If it's important enough to pay someone to do, it's important enough for you to understand it.

    Spoken with true wisdom. Well said.

  • Hi Martin,

    I appreciate your comments and words of pause. It is important to realize that picking up this forecast model will not get you a job as a forecaster ;P

    This article was not intended to be a replacement for a forecast system or to ignore the tools available in analysis services. It was intended to be an entry point into the world of forecasting, in the case where analysis services is not an option. You are certainly aware that linear regression is only one of many models to apply and that each model differs in it's usefulness for each application.

    Hopefully, folks will find this exercise useful in understanding the some of the basic language of forecasting and, fundamentally, what this model is and how it works. Learning about a new skill can be daunting, but I hope that we will not be discouraged from starting merely by the possible depth of the subject.

    Mark

  • Hey! That's just good stuff.

    Thanks for a well written article.

    Tom Garth
    Vertical Solutions[/url]

    "There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers
  • sknox (2/9/2010)


    Martin Vrieze (2/9/2010)


    When these forecasting tools are used to base critical business decisions that impact people's lives and livelihoods, PLEASE be sure and leave this type of work to the experts.

    Point of Order: Those who created the credit default swap catastrophe using these types of methods were experts. Having an expert consultant is no substitute to understanding at least the underlying principles of the models you're using. Don't just leave the work to the experts; work with them. If it's important enough to pay someone to do, it's important enough for you to understand it.

    IIRC most of the debacle was the result of ONE FORMULA that EVERYONE USED. Just took it for granted that it worked. But it was flawed. Businesses WANTED to use it though (serious bias) because it gave essentially all buckets of risk an AAA rating.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Great topic. I had to dig through all of my fearful stats days to do trending a few years back with SSRS 2005 as we were not going to be buying add on tools... So there I was building my own trend lines.

    It isn't just trending where lack of understanding can bite us. To many people write reports with no understanding of the data and far more consume the reports with even less understanding. It is so frightening the number of business decisions that are made with invalid, incorrect, and incomplete data.

  • Nice article. Well written.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Nice work, Mark.

    A couple of points, though. First, the word "variance" is frequently misused when people mean to say "variation". This can cause ambiguity in math/stats where "variance" is defined as the mean of the squared residuals (a residual being the difference between the regression line at point Xi and the raw data value Yi). The square root of variance is actually the more familiar "standard deviation".

    Second, I can explain why the notation used in linear regression is at odds with that of trigonometry and Cartesian geometry. The reason for writing "y = a + bx" is generality. Suppose you wanted to fit a 2nd-order poynomial to a data series that that had curvature, not just slope; you would then be solving "y = a + bx + cx^2". You could keep adding powers of x if you had good reason to believe that the underlying phenomenon possessed many degrees of freedom.

    And, as if that wasn't enough, in the general problem (where any kind of approximating function is possible), you would actually write the coefficients as a0, a1, a2,... and the "basis functions" as f0(x), f1(x), f2(x),... . For linear regression, f0(x) = 1 and f1(x) = x.

    Of course, there are weird and wonderful techniques for fitting data to nonlinear functions, but it might not be practical to do it with SQL.

    Cheers!

    - Al

  • Thanks for the clarifications, Al. I appreciate the finer points and I am sure this will be helpful to any readers as well.

    Mark

  • Do you have any useful info on calculating multiple linear regression with T-SQL?

  • piet_dj (2/10/2010)


    Do you have any useful info on calculating multiple linear regression with T-SQL?

    Here http://developerworkshop.net/software.html

    It also handles logarithmic, exponential and power regression.


    N 56°04'39.16"
    E 12°55'05.25"

  • Very good article, thanks. When I picked up the code snippet and tried to use it, I saw one problem. In the last section of code, labeled "Create Forecast", the term that's multiplied by the seasonality factor is missing a parenthesis. That is,

    MAX(A) + (MAX(B) * MAX(Forecastkey) + 1) should be

    (MAX(A) + (MAX(B) * MAX(Forecastkey) + 1))

    Correct me if I'm wrong, but this is the change that it took to make my forecast look correct.

  • ltaylor 73774 (2/11/2010)


    Very good article, thanks. When I picked up the code snippet and tried to use it, I saw one problem. In the last section of code, labeled "Create Forecast", the term that's multiplied by the seasonality factor is missing a parenthesis. That is,

    MAX(A) + (MAX(B) * MAX(Forecastkey) + 1) should be

    (MAX(A) + (MAX(B) * MAX(Forecastkey) + 1))

    Correct me if I'm wrong, but this is the change that it took to make my forecast look correct.

    Yes, you are correct. When the trend is multiplied against seasonality, the parenthese are needed.

    Thank you for finding that error! I will get the download updated.

    Mark

Viewing 15 posts - 1 through 15 (of 34 total)

You must be logged in to reply to this topic. Login to reply