Calculating Moving Averages with T-SQL

  • I was wondering why you haven't used the CROSS APPLY operator for the first query (see below). Any special reason for that?

    SELECT a.quote_date
         , a.close_price
         , dat.[20_day_sma]
    FROM #google_stock a
    CROSS APPLY (
        SELECT AVG(b.close_price) [20_day_sma]
        FROM #google_stock b
        WHERE DATEDIFF(d, a.quote_date, b.quote_date) BETWEEN -20 AND 0) dat
    ORDER BY a.quote_date

  • Chris Morris-439714 (3/4/2010)


    Gabriel P (3/3/2010)


    Comments posted to this topic are about the item Calculating Moving Averages with T-SQL (/articles/Moving+Average/69389/)

    Excellent article, Gabriel. I reckon it's time to start messing with those pesky CTE's again.

    << and therefore isn't a good indicator of what the immediate future might hold. >>

    Have a word with Gail about DBCC TIMEWARP, without too much effort you should get the last ten days and the next ten days.

    Bwahahaha

    Great one Chris.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • Nice article and well presented.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • A very nice read. One more step: as we need running totals every time, why not precalculate and store them - that will speed it up even further. And we can use constraints to guarantee that running totals are correct. I described it here a while ago.
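
    A minimal sketch of the idea (the table and column names here are assumptions, and the constraint-based validation described in the linked post is not reproduced): with the running total stored once, each 20-day average becomes a lookup of two rows and a subtraction.

    create table dbo.google_stock_rt
    (
        n             int           not null primary key clustered, -- trading-day sequence number
        quote_date    datetime      not null,
        close_price   decimal(8,2)  not null,
        running_total decimal(18,2) not null -- sum of close_price for rows 1..n, computed once
    )

    select a.quote_date
          ,a.close_price
          ,cast((a.running_total - b.running_total) / 20.0 as decimal(8,2)) [20_day_sma]
    from dbo.google_stock_rt a
    join dbo.google_stock_rt b
      on b.n = a.n - 20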

  • Oddly enough, I've been thinking about this problem off and on for the last couple of days and then I saw your article today. I have to agree with what Dean Cochrane said in his post on this thread... it's one of the best articles I've seen on this or any other SQL site in a very long while.

    You started out by clearly identifying the need and exactly how the problem worked. If there were any doubt, the excellent use of thoughtful and accurate graphics quickly removed all doubt.

    You provided everything in code including test data. Lots of folks forget to do that. And code is what people are really after.

    I started reading the article and by the time I started thinking "I know how I might do it", there you were with ...

    At this point, I'm guessing most readers of this article already have the gears churning on how to solve this problem.

    ... so I'll add 10 points for your "mind reading" capabilities, as well.

    Your formulas and explanations of the formulas were quite clear and, yeah, the more I read, the more I was sure I knew how I'd do it. I kept thinking to myself "I wonder if he's going to use a recursive CTE, triangular join, looped update, or what?" Then I saw what you used and you made my day. Thanks for the leg up and the honorable mention. 🙂 After all the heat I took both on the original article and the rewrite, it was a real pleasure to see someone use the method, especially since it was for something totally different to anything demonstrated in the article.

    As a bit of a sidebar, I do have to admit that I've not done enough testing on "heap" Temp tables to be comfortable using the method on a "heap". I'll also admit that I haven't been able to break one yet but I'd probably add the normal clustered index to the temp table just to be on the safe side... just like you did in the first example but I believe you just forgot to do in the second example.

    Sorry... got carried away. I'll just finish by saying that even if the article didn't mention anything about the Quirky Update method, this would still be an outstanding article in my eyes. Very well done, Gabriel.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • sushil9313333532 (3/4/2010)


    hi.......... I'm Aldon. How do I improve my SQL technique?

    By reading very well written articles like this one and by running and studying the code in it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • TimvdH (3/4/2010)


    Great work, thank you! I would love to read more about technical analysis and SQL. Or even better: portfolio theory and SQL. BTW: does someone know if using MDX / Analysis Services makes sense here (query-performance-wise)? I guess it could make sense if all you need is the most recent moving average, but I know too little about AS to really have an idea. I was thinking about trying to use AS to calculate securities price correlations. Maybe someone knows if it makes sense to try this at all. After all, you need to import the daily securities data into the data mart every day (I hope it is possible to add only the most recent day after an initial import). If someone has any hands-on experience with (tens of thousands of) time series in AS, I would be most interested...

    Regards

    Tim

    Tim, my knowledge of Analysis Services is limited at best; T-SQL is my bread and butter.

    kamban (3/4/2010)


    Yes, but how can we make a connection to some live data source so we can do live trading analysis?

    F

    That is beyond the scope of T-SQL and the article. You would have to develop an application that uses the live feed's API to insert the data into SQL Server.

    AdrianMNastase (3/4/2010)


    I was wondering why you haven't used the CROSS APPLY operator for the first query (see below). Any special reason for that?

    SELECT a.quote_date
         , a.close_price
         , dat.[20_day_sma]
    FROM #google_stock a
    CROSS APPLY (
        SELECT AVG(b.close_price) [20_day_sma]
        FROM #google_stock b
        WHERE DATEDIFF(d, a.quote_date, b.quote_date) BETWEEN -20 AND 0) dat
    ORDER BY a.quote_date

    I did, and it had the same number of logical reads, so I figured that behind the scenes the engine was executing the query in a similar fashion (if not the same).
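
    For anyone who wants to repeat the comparison, the logical reads and timings quoted in this thread can be captured with the standard SET options; a minimal sketch:

    set statistics io on
    set statistics time on

    -- run the query being measured here, e.g. the CROSS APPLY version above

    set statistics io off
    set statistics time off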

    Alexander Kuznetsov (3/4/2010)


    A very nice read. One more step: as we need running totals every time, why not precalculate and store them - that will speed it up even further. And we can use constraints to guarantee that running totals are correct. I described it here a while ago.

    Alexander - thank you for the reply. Allow me some time to read & analyze your blog post before I reply.

    Jeff Moden (3/4/2010)


    As a bit of a sidebar, I do have to admit that I've not done enough testing on "heap" Temp tables to be comfortable using the method on a "heap". I'll also admit that I haven't been able to break one yet but I'd probably add the normal clustered index to the temp table just to be on the safe side... just like you did in the first example but I believe you just forgot to do in the second example.

    Thank you for the reply, Jeff. You are absolutely correct - I did not mean to leave that temp table as a heap. Thank you for pointing that out. In the code box underneath Figure 3, there should be an addendum made at the bottom:

    create clustered index ix_n on #mod_goog_data(n)

  • It's a nice article, and the concept is easy to understand.

    Good Job....

  • Thanks for the excellent article!

    You did a terrific job explaining a complex topic in simple terms.

    The graphical explanations were also very helpful.

    I will be passing this one on as a "must read."

  • Great article!!! However, you didn't clarify when the right time to buy GOOG is :hehe:

  • May I suggest some improvements to the outlined solution?

    First, I would prefer to store the stock quotes as integers (multiplied by 100) - less storage space is needed, which also translates to more speed.

    Secondly, I would precalculate the running sums in an indexed view.

    Thirdly, I would use an 'Integer table' to avoid doing a self-join.

    Might look like this:

    create view test.google_sums
    with schemabinding
    as
    select
        B.IntNumber as [DayNumber],
        sum(A.close_price) as [20_day_sum],
        COUNT_BIG(*) as [#Days]
    from
        test.google_stock_2 A
        join Aux.Integers B on B.IntNumber >= A.DayCount
                           and B.IntNumber < A.DayCount + 20
    group by
        B.IntNumber
    go

    CREATE UNIQUE CLUSTERED INDEX [PK_Google_Sums] ON [test].[google_sums]
    (
        [DayNumber] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [DrugMapping_Indx]

    And then we can just join this sums table to the original data table like this:

    select
        quote_date,
        close_price,
        case
            when [#Days] < 20 then null
            else cast([20_day_sum] as float) / 2000
        end as [20_day_sma]
    from
        test.google_stock_2 G2
        join test.google_sums GS on GS.DayNumber = G2.DayCount
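
    One more note on querying the view: on editions other than Enterprise, the optimizer generally won't match an indexed view automatically, so a NOEXPAND hint may be needed to read from its clustered index directly. A sketch of the same join with the hint added:

    select
        quote_date,
        close_price,
        case
            when [#Days] < 20 then null
            else cast([20_day_sum] as float) / 2000
        end as [20_day_sma]
    from
        test.google_stock_2 G2
        join test.google_sums GS with (noexpand) on GS.DayNumber = G2.DayCount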

  • sven.purbe (3/8/2010)


    select
        B.IntNumber as [DayNumber],
        sum(A.close_price) as [20_day_sum],
        COUNT_BIG(*) as [#Days]
    from
        test.google_stock_2 A
        join Aux.Integers B on B.IntNumber >= A.DayCount
                           and B.IntNumber < A.DayCount + 20
    group by
        B.IntNumber

    That has all the problems the code in the article was designed to avoid, the big one being that you end up "touching" most rows 20 times.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Well Jeff, if you look at the problem in a real-world situation with thousands of stocks, thousands of data points, and thousands of accesses to those, I would suggest that my solution is much more efficient. The sum for a day is automatically calculated once (assuming that stock quotes don't change retroactively) and then used directly by any client needing it. In the other case, the average will be calculated over and over again for every client.

  • sven.purbe (3/8/2010)


    Well Jeff, if you look at the problem in a real-world situation with thousands of stocks, thousands of data points, and thousands of accesses to those, I would suggest that my solution is much more efficient. The sum for a day is automatically calculated once (assuming that stock quotes don't change retroactively) and then used directly by any client needing it. In the other case, the average will be calculated over and over again for every client.

    I missed it... my apologies. I only looked at the SELECT and not the fact that it was an indexed view. I agree that the indexed view provides a "calculated once" solution. I'll have to try it out and see what daily single row updates do to it. It will also be interesting to see what the first run on a couple of million rows does.

    Thanks for the feedback, Sven.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • sven.purbe (3/8/2010)


    May I suggest some improvements to the outlined solution.

    First I would prefer to store the stock quotes as integers (multiply by 100) - less storage space needed which also transforms to more speed.

    You are correct. If you are capable of making such a modification in your environment, then yes, feel free to do so.

    However, the goal of the article is to show the efficiency gained from the technique itself. If you still need more performance, there are a number of ways (including yours) to achieve better results (e.g. change quote_date in #google_stock to a date field and halve the space used by that column). I recognize the performance gain, and I think it is worth pointing out, but for the sake of "keeping it simple" I didn't want to add any more layers of complexity - the performance savings gained by using the technique in this article far surpass any data type compression techniques we have at our disposal.
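
    For illustration only, the narrower layout being described might look something like this (a sketch; the date type assumes SQL Server 2008 or later, and the article keeps the wider types for readability):

    create table #google_stock_narrow
    (
        quote_date  date not null, -- 3 bytes instead of 8 for datetime
        close_price int  not null  -- price * 100, 4 bytes instead of 5 for decimal(8,2)
    )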

    sven.purbe (3/8/2010)


    Well Jeff, if you look at the problem in a real-world situation with thousands of stocks, thousands of data points, and thousands of accesses to those, I would suggest that my solution is much more efficient. The sum for a day is automatically calculated once (assuming that stock quotes don't change retroactively) and then used directly by any client needing it. In the other case, the average will be calculated over and over again for every client.

    You raise a valid point (and Alexander Kuznetsov brought it up in an earlier post) - the solution was designed with temp tables for the purpose of providing examples to the readers. However if you want a more permanent solution, there's an easy way to fix this:

    1) If you convert the final temp tables to static tables, this issue of re-running the calculations on user request goes away and you can use the article's solution to just run once and save the data into a new table or another field in the existing table.

    2) For both of our solutions, data would have to be added daily. For the solution in the article, all we would need is to change the #mod_goog_data (now called mod_goog_data1) table to store the 20 Day Sum instead of the 20 Day SMA. First the changes must be made to the two code boxes following Figure 3 in the article:

    with t1 as (select row_number() over (order by quote_date) n
                      ,quote_date
                      ,close_price
                from #google_stock)
    select a.n
          ,a.quote_date
          ,a.close_price
          ,CAST(null as decimal(8,2)) [20_day_sum] --Change from [sma] to [20_day_sum]
          ,CAST(b.close_price as decimal(8,2)) [20_day_old_close]
    into mod_goog_data1 --Remove # to store in a regular table instead of temp table
    from t1 a
    left join t1 b
      on a.n - 20 = b.n

    create clustered index ix_ix on mod_goog_data1(n)

    declare @intervals int, @initial_sum decimal(8,2)
    declare @anchor int, @moving_sum decimal(8,2)

    set @intervals = 20

    select @initial_sum = sum(close_price)
    from mod_goog_data1
    where n <= @intervals

    update t1
    set @moving_sum = case when n < @intervals then null
                           when n = @intervals then @initial_sum
                           when n > @intervals then
                                @moving_sum + [close_price] - [20_day_old_close]
                      end,
        [20_day_sum] = @moving_sum, --Change to [20_day_sum] and remove the divisor @intervals
        @anchor = n
    from mod_goog_data1 t1 with (TABLOCKX)
    OPTION (MAXDOP 1)

    select quote_date
          ,close_price
          ,Cast([20_day_sum]/(@intervals+.00) as decimal(8,2)) [20_day_avg]
    from mod_goog_data1

    Now all we have to do is the insert, and it would look something like this:

    declare @quote_date date, @close_price decimal(8,2)

    select @quote_date = '20100125'
    select @close_price = 540.00

    --the field [20_day_sum] replaces [sma]
    insert into mod_goog_data1 (n, quote_date, close_price, [20_day_sum], [20_day_old_close])
    select top 1 a.n + 1 n
          ,@quote_date quote_date
          ,@close_price close_price
          ,a.[20_day_sum] + @close_price - b.close_price [20_day_sum]
          ,b.close_price [20_day_old_close]
    from mod_goog_data1 a
    left join mod_goog_data1 b on a.n - 19 = b.n
    order by a.n desc

    This takes 6 logical reads and always runs in 1 ms on my machine (excluding parse/compile time). Since it is doing 2 separate seeks on the same table, the number of logical (page) reads will always be approximately 2 * (height of the clustered index + 1). When I tried the clustered indexed view solution you suggested, it took 95 logical reads and averaged 3 ms, varying between 2 and 5 ms (excluding parse/compile time). And that is only on 4 months of stock data.
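
    If anyone wants to verify that on their own data, the height of the clustered index can be read from the index physical stats DMV (a sketch, assuming the clustered index created above on mod_goog_data1 is index_id 1):

    select index_depth
    from sys.dm_db_index_physical_stats(db_id(), object_id('mod_goog_data1'), 1, null, 'DETAILED')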

    In reality, for 85% of the people out there, whichever one you choose won't matter, as they both take a relatively short time on small to medium-sized data sets. However, if you're dealing with very large tables of historical data, I don't think you can get much better performance than the solution I posted above.
