Updating a Column Using a Recursive Definition

  • I am seeking to use SQL to find an exponential weighted average, which is defined recursively.

If V_n is the value on day n, the exponential weighted average can be defined as:

    ExpAvg_n := 0.03 * V_n + 0.97 * ExpAvg_(n-1)
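For example, taking the first value as the seed, the series V = (100, 200, 300) gives ExpAvg_1 = 100, ExpAvg_2 = 0.03 * 200 + 0.97 * 100 = 103, and ExpAvg_3 = 0.03 * 300 + 0.97 * 103 = 108.91.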

The problem is, I don't know how to deal with this kind of recursive definition in SQL, and I can't seem to find any good resources.

    Has anyone encountered (and conquered) this problem before?

    Any help here is greatly appreciated.

    Thanks!

DECLARE @Sample TABLE
    (
        ID INT PRIMARY KEY CLUSTERED,
        Number INT,
        expAvg FLOAT
    )

-- Fill the table with 10 rows of sample data
-- (master..spt_values is just a convenient source of numbers)
INSERT @Sample
    (
        ID,
        Number
    )
SELECT TOP 10 Number,
    ABS(CHECKSUM(NEWID())) % 50000
FROM master..spt_values
WHERE Type = 'P'
ORDER BY Number

-- Seed the running average, then let the UPDATE carry it forward:
-- each row assigns the new value to both the variable and the column
DECLARE @expAvg FLOAT
SET @expAvg = 0

UPDATE @Sample
SET @expAvg = ExpAvg = 0.03E * Number + 0.97E * @expAvg

SELECT *
FROM @Sample


    N 56°04'39.16"
    E 12°55'05.25"

  • Nicely done, Peter.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Have a look at Tally/Number Table methods also

  • Thank you Jeff!

The use of master..spt_values or a Tally/Number table is ONLY for generating sample data.

The working piece of the algorithm is the UPDATE part.
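Stripped of the sample-data setup, the same pattern against a permanent table would look something like this (a sketch only: dbo.YourTable and its columns are placeholders, and the approach assumes the UPDATE walks a clustered index on the ordering column, which is why the sample above gives @Sample a clustered primary key):

DECLARE @expAvg FLOAT
SET @expAvg = 0

-- Each row assigns the running value to both the variable and the column,
-- so the previous row's result feeds the next row's calculation
UPDATE dbo.YourTable
SET @expAvg = ExpAvg = 0.03E * Number + 0.97E * @expAvg
OPTION (MAXDOP 1) -- single-threading the update helps keep the row order predictable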


    N 56°04'39.16"
    E 12°55'05.25"

Heh... that bit of info should be a comment in the code. 😉

    Anyway, works good.

    --Jeff Moden



  • johncyriac (12/10/2008)


    Have a look at Tally/Number Table methods also

    Sure... got any links? 😉 And, how do they apply for this particular problem?

    --Jeff Moden



  • Jeff Moden (12/11/2008)


    johncyriac (12/10/2008)


    Have a look at Tally/Number Table methods also

    Sure... got any links? 😉 And, how do they apply for this particular problem?

Tip: next time, add 'from the great articles by Jeff Moden' 😉 And I have to say, you would be correct. Though I still don't fully understand when and how I can use it. Jeff, can you write an article 'My first Tally table' or 'Tally tables for Dummies'? :D

Ronald Hensbergen

Help us, help yourself... Post data so we can read and use it: http://www.sqlservercentral.com/articles/Best+Practices/61537/
2+2=5 for sufficiently large values of 2

Thanks so much for the help. I've been able to implement your suggestions, but I had to add quite a few workaround statements to skip over missing/NULL values (otherwise, due to the recursion, everything after a NULL would have been NULL).

I thought I had the query working, and then found a problem that I can't seem to avoid. When it moves on to each subsequent SymbolID, it's still initializing that symbol with the initial value from the first symbol. (I tried to avoid this with a @first variable to keep track, but I'm banging my head against the wall with this.) Can anyone see where my mistake is? In the results below, I've marked with << the rows where you can see the mistake happening; the initialization logic at the end of the UPDATE (the @initialExpAvgVol assignment) is where those values are being set.

    The relevant columns in the table I'm running it on are like:

CREATE TABLE [dbo].[Test_HistoricalData](
    [RowID] [bigint] PRIMARY KEY NOT NULL,
    [MarketDate] [datetime] NULL,
    [Volume] [decimal](18, 0) NULL,
    [SymbolID] [int] NOT NULL,
    [ExpAvgVol] [decimal](18, 4) NULL
) ON [PRIMARY]

Each row represents one stock symbol on a given market date. The test table that I'm running it on right now has ~700 different symbols and their respective data over the last 2 years. I need to implement this on my production table, with ~70K symbols over the last 5 years. Right now (I think due to all my checks), this isn't running fast enough for me to feel comfortable doing that.

    If anyone has any suggestions on how I might be able to speed this up a bit, I would really appreciate it.

    Thanks!

PS: The PRINT statements in there were just for my troubleshooting, to see if I was getting any hang-ups.

    Results look like:

SymbolID | MarketDate              | Volume    | PrNonNullMktDate        | PrNonNullVolume | ExpAvgVol        | firstTime | initialExpAvgVol
9        | 2007-01-03 16:00:00.000 | 2575800   | NULL                    | NULL            | NULL             | 1         | 2575800
9        | 2007-01-04 16:00:00.000 | 2073700   | 2007-01-03 16:00:00.000 | 2575800         | 2575800          | 0         | 0
9        | 2007-01-05 16:00:00.000 | 2676600   | 2007-01-04 16:00:00.000 | 2073700         | 2560737          | 0         | 0
9        | 2007-01-08 16:00:00.000 | 1557200   | 2007-01-05 16:00:00.000 | 2676600         | 2564212.89       | 0         | 0
....
9        | 2007-01-30 16:00:00.000 | 1436200   | 2007-01-29 16:00:00.000 | 2023800         | 2488137.07714951 | 0         | 0
9        | 2007-01-31 16:00:00.000 | 2263100   | 2007-01-30 16:00:00.000 | 1436200         | 2456578.96483502 | 0         | 0
10       | 2007-01-01 16:00:00.000 | NULL      | NULL                    | NULL            | NULL             | 1         | NULL
10       | 2007-01-02 16:00:00.000 | 28575     | NULL                    | NULL            | NULL             | 1         | 2575800   <<
10       | 2007-01-03 16:00:00.000 | 216733    | 2007-01-02 16:00:00.000 | 28575           | 2499383.25       | 0         | 0         <<
10       | 2007-01-04 16:00:00.000 | 238721    | 2007-01-03 16:00:00.000 | 216733          | 2430903.7425     | 0         | 0
10       | 2007-01-05 16:00:00.000 | 380583893 | 2007-01-04 16:00:00.000 | 238721          | 2365138.260225   | 0         | 0
10       | 2007-01-08 16:00:00.000 | 126863    | 2007-01-05 16:00:00.000 | 380583893       | 13711700.9024182 | 0         | 0
10       | 2007-01-09 16:00:00.000 | 68356     | 2007-01-08 16:00:00.000 | 126863          | 13304155.7653457 | 0         | 0

    =============CODE BELOW=========================================

-- ---------------------------------------------------------------------------------
-- PROCEDURE TO CALCULATE AND STORE EXPONENTIAL WEIGHTED AVERAGE OF VOLUME
-- Last Updated December 11, 2008
-- ---------------------------------------------------------------------------------

-- INITIALIZING RUN CRITERIA: Date and SymbolID ranges ------------------------------
DECLARE @startDate DATETIME
DECLARE @endDate DATETIME
DECLARE @startSymbolID INT
DECLARE @endSymbolID INT

SET @startDate = '2007-01-01'
SET @endDate = '2007-02-01'
SET @startSymbolID = 1
SET @endSymbolID = 11

-- DECLARING A TEMPORARY TABLE FOR STORAGE -------------------------------------------
DECLARE @TemporaryTable TABLE
    (SymbolID INT --NVARCHAR(50)
    , MarketDate DATETIME
    , Volume DECIMAL(18, 0)
    , PrNonNullMkDate DATETIME
    , PrNonNullVolume DECIMAL(18, 0)
    , ExpAvgVol FLOAT
    , firstTime INT
    , initialExpAvgVol FLOAT
    )

-- FILLING THE TEMPORARY TABLE WITH KNOWNS -------------------------------------------
INSERT @TemporaryTable
    (SymbolID
    , MarketDate
    , Volume
    , PrNonNullMkDate
    , PrNonNullVolume
    , ExpAvgVol
    )
SELECT T1.SymbolID
    , T1.MarketDate
    , T1.Volume
    -- Most recent prior date with Volume IS NOT NULL
    , T3.PrNonNullMktDate
    -- Value of the Volume on that date
    , (SELECT T4.Volume
       FROM Test_HistoricalData T4
       WHERE T4.MarketDate = T3.PrNonNullMktDate
         AND T4.SymbolID = T1.SymbolID) AS PrNonNullVolume
    , T1.ExpAvgVol
FROM Test_HistoricalData T1
LEFT JOIN (SELECT T2.SymbolID
                , T2.MarketDate
                , (SELECT MAX(D.MarketDate)
                   FROM Test_HistoricalData D
                   WHERE D.MarketDate < T2.MarketDate
                     AND D.SymbolID = T2.SymbolID
                     AND D.Volume IS NOT NULL) AS PrNonNullMktDate
           FROM Test_HistoricalData T2
           WHERE T2.SymbolID BETWEEN @startSymbolID AND @endSymbolID
             AND T2.MarketDate BETWEEN @startDate AND @endDate
          ) AS T3 ON T3.MarketDate = T1.MarketDate AND T1.SymbolID = T3.SymbolID
WHERE T1.SymbolID BETWEEN @startSymbolID AND @endSymbolID
  AND T1.MarketDate BETWEEN @startDate AND @endDate
ORDER BY T1.SymbolID, T1.MarketDate

-- DECLARING A TEMPORARY VARIABLE TO HOLD THE ExpAvgVol AND INITIALIZING IT ----------
DECLARE @symbolID INT
DECLARE @expAvgVol FLOAT
DECLARE @initialExpAvgVol FLOAT
DECLARE @first INT

SET @symbolID = (SELECT MIN(SymbolID) FROM @TemporaryTable)
SET @first = 1

-- If ExpAvgVol has an earlier value in the stored table, initialize to that,
-- else set to the earliest volume in this time period
SET @initialExpAvgVol =
    (CASE WHEN
        -- Checking to see if there is an earlier NOT NULL value for ExpAvgVol on this SymbolID
        (SELECT ExpAvgVol
         FROM Test_HistoricalData
         WHERE SymbolID = @symbolID
           AND MarketDate = (SELECT MAX(MarketDate)
                             FROM Test_HistoricalData
                             WHERE SymbolID = @symbolID
                               AND MarketDate < @startDate
                               AND ExpAvgVol IS NOT NULL)) IS NOT NULL
    THEN
        -- Use this earlier value as an initial value
        (SELECT ExpAvgVol
         FROM Test_HistoricalData
         WHERE SymbolID = @symbolID
           AND MarketDate = (SELECT MAX(MarketDate)
                             FROM Test_HistoricalData
                             WHERE SymbolID = @symbolID
                               AND MarketDate < @startDate
                               AND ExpAvgVol IS NOT NULL))
    ELSE
        -- Grab the earliest volume in this time period
        (SELECT Volume
         FROM @TemporaryTable
         WHERE SymbolID = @symbolID
           AND MarketDate = (SELECT MIN(MarketDate)
                             FROM @TemporaryTable
                             WHERE SymbolID = @symbolID))
    END)

SET @expAvgVol = @initialExpAvgVol

PRINT 'Variables initialized. Beginning computation of ExpAvgVol.'

-- CALCULATING THE ExpAvgVol AND STORING IN TempTable --------------------------------
UPDATE @TemporaryTable
SET @expAvgVol = ExpAvgVol =
        CASE WHEN @first = 1
             THEN 0.03E * PrNonNullVolume + 0.97E * @initialExpAvgVol
             ELSE 0.03E * PrNonNullVolume + 0.97E * @expAvgVol
        END
  , @first = firstTime =
        CASE WHEN @symbolID != SymbolID OR PrNonNullVolume IS NULL
             THEN 1
             ELSE 0
        END
  , @symbolID = SymbolID
  , @initialExpAvgVol = initialExpAvgVol =
        CASE WHEN @first = 1
             THEN (CASE WHEN
                      -- WHEN there is an earlier NOT NULL value for ExpAvgVol on this SymbolID
                      (SELECT T1.ExpAvgVol
                       FROM Test_HistoricalData T1
                       WHERE T1.SymbolID = @symbolID
                         AND T1.MarketDate = (SELECT MAX(T.MarketDate)
                                              FROM Test_HistoricalData T
                                              WHERE T.SymbolID = @symbolID
                                                AND T.MarketDate < @startDate
                                                AND T.ExpAvgVol IS NOT NULL)) IS NOT NULL
                   THEN
                      -- THEN use this earlier value as an initial value
                      (SELECT T1.ExpAvgVol
                       FROM Test_HistoricalData T1
                       WHERE T1.SymbolID = @symbolID
                         AND T1.MarketDate = (SELECT MAX(T.MarketDate)
                                              FROM Test_HistoricalData T
                                              WHERE T.SymbolID = @symbolID
                                                AND T.MarketDate < @startDate
                                                AND T.ExpAvgVol IS NOT NULL))
                   ELSE
                      -- If not, then grab the earliest volume in this time period
                      (SELECT Volume
                       FROM @TemporaryTable
                       WHERE SymbolID = @symbolID
                         AND MarketDate = (SELECT MIN(T.MarketDate)
                                           FROM @TemporaryTable T
                                           WHERE T.SymbolID = @symbolID
                                             AND T.Volume IS NOT NULL))
                   END)
             ELSE 0
        END

PRINT '@symbolID'
PRINT @symbolID
PRINT '@initialExpAvgVol'
PRINT @initialExpAvgVol
PRINT 'Completed Calculating ExpAvgVol and storing in TempTable. Last Row:'

-- DISPLAYING THE RESULTS -------------------------------------------------------------
SELECT *
FROM @TemporaryTable

PRINT 'Completed displaying temporary table, moving on to updating actual table.'

-- UPDATING IN THE ACTUAL TABLE -------------------------------------------------------
UPDATE a
SET a.ExpAvgVol = T.ExpAvgVol
FROM Test_HistoricalData a
LEFT JOIN @TemporaryTable T ON T.SymbolID = a.SymbolID AND T.MarketDate = a.MarketDate
WHERE a.SymbolID BETWEEN @startSymbolID AND @endSymbolID
  AND a.MarketDate BETWEEN @startDate AND @endDate

-- DISPLAYING THE ACTUAL TABLE FOR COMPARISON ------------------------------------------
PRINT 'Completed Update to actual table. Displaying Results.'

SELECT a.SymbolID
    , a.MarketDate
    , a.Volume
    , a.ExpAvgVol
FROM Test_HistoricalData a
WHERE a.SymbolID BETWEEN @startSymbolID AND @endSymbolID
  AND a.MarketDate BETWEEN @startDate AND @endDate
ORDER BY a.SymbolID ASC, a.MarketDate ASC

  • It's nice to see some documented code for a change. Thanks. 🙂

I think you've made things just a little difficult on yourself, and I think I can help, but I don't know for sure because I don't have your data to play with. If you would, take a look at the link in my signature below and see if you can build some data that way so I can give 'er a try. It's important that the data be formatted as an INSERT statement so I don't have to do the formatting.

    A couple of hundred lines would be really good... don't post it... just attach it as a file. Thanks.

I could be wrong, but I think the principal code for the "Running Total" solution would work very well here. If you'd like to read up on it, here's an article on a very high speed method:

Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5

    ... heh... yeah... I happen to know the author pretty well.

    --Jeff Moden



  • r.hensbergen (12/11/2008)


    Jeff Moden (12/11/2008)


    johncyriac (12/10/2008)


    Have a look at Tally/Number Table methods also

    Sure... got any links? 😉 And, how do they apply for this particular problem?

    Tip: add next time 'from the great articles by Jeff Moden' 😉 And I have to say, you would be correct. Though I still don't fully understand when and how I can use it. Jeff, can you write an article 'My first Tally table' or 'Tally tables for Dummies'?:D

Heh... Thanks Ron. I wasn't trying to be a smart guy about this... I just hate it when someone makes some off-the-wall suggestion and offers no code and no links to support it. To top it off, I don't believe a Tally table will have any bearing on this problem whatsoever.

Shifting gears... thanks for the great compliment. In answer to your question, I have a question... have you read the following article? I'm asking only because I don't know whether you have or haven't. It falls into the same vein as your suggested article names.

    [font="Arial Black"]The "Numbers" or "Tally" Table: What it is and how it replaces a loop.[/font][/url]

    --Jeff Moden



  • Jeff,

I'm attaching the file with the formatted test data. It should generate about 150 rows of data for you. And because it makes things more interesting 🙂, I made sure to include some situations that can throw the recursion off (skipped days, or days where various values are NULL).

    Please let me know if you have any problems receiving the data.

    Thanks again for all your help!

Looks like Peso nailed the concept of using the "quirky update" to get at the data. However, there's another issue that hasn't been mentioned, and that is accuracy. If there is even one missing value in a sequence of values, then an exponential weighted average isn't going to be accurate, and the number of missing values for any given symbol will have an impact on the overall accuracy of the number derived. The question thus becomes: how important is the accuracy of the number? Some alternatives:

    1.) Exclude any symbols where there's even 1 missing value - kind of extreme, but might be necessary.

2.) Exclude any symbols with more than X missing values, where you decide what X is.

3.) Interpolate for missing values - in other words, assume the missing value probably wasn't far from the average of the two trading days on either side of it. If you have symbols with multiple consecutive missing values, this method might well have to exclude them. Even so, interpolation will produce somewhat inaccurate results, though probably not too far off, depending on how close the interpolated value is to the actual one (see the sketch after this list).

Also, you can combine 2 and 3 in lots of ways. Again, the primary concern is just how accurate these numbers need to be. It would only take one oddball actual value, missing early in the sequence for a symbol, for an interpolation method to end up way off.
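To put option 3 into code against the Test_HistoricalData table posted above, something along these lines might do it (just a sketch, untested; a row with no non-NULL neighbor on one side would simply stay NULL, since NULL + anything is NULL):

-- Fill each missing Volume with the average of the nearest non-NULL
-- values before and after it for the same symbol
UPDATE h
SET h.Volume =
    ( (SELECT TOP 1 p.Volume
       FROM dbo.Test_HistoricalData p
       WHERE p.SymbolID = h.SymbolID
         AND p.MarketDate < h.MarketDate
         AND p.Volume IS NOT NULL
       ORDER BY p.MarketDate DESC)
    + (SELECT TOP 1 n.Volume
       FROM dbo.Test_HistoricalData n
       WHERE n.SymbolID = h.SymbolID
         AND n.MarketDate > h.MarketDate
         AND n.Volume IS NOT NULL
       ORDER BY n.MarketDate ASC) ) / 2.0
FROM dbo.Test_HistoricalData h
WHERE h.Volume IS NULL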

    Steve

    (aka smunson)

🙂 🙂 🙂

Another thought just occurred to me. Perhaps an additional value needs to be stored: the number of values contributing to the given exponential weighted average, with the code starting the sequence fresh every time there's a missing value. I'm not very good at the "quirky update" technique, so I don't think I could effectively code it for this scenario, but you'd also need to count how many values are contributing to the average at any given point, and probably store that in the table as well. That way, for any given symbol, the most recent weighted average relies only on a continuous data sequence, and you know exactly how far back the unbroken chain goes.
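Borrowing the shape of Peso's quirky update, roughly what I have in mind might look like this (a sketch only: the ChainLength column is hypothetical, it ignores SymbolID boundaries, it leans on the SET assignments evaluating left to right, and I haven't tested it):

DECLARE @expAvg FLOAT, @chain INT
SELECT @expAvg = NULL, @chain = 0

UPDATE dbo.Test_HistoricalData
SET -- a missing value resets the chain; otherwise the chain grows by one
    @chain = ChainLength = CASE WHEN Volume IS NULL THEN 0 ELSE @chain + 1 END
    -- restart the average at each break, seeding with the first value of the new chain
  , @expAvg = ExpAvgVol = CASE WHEN Volume IS NULL THEN NULL
                               WHEN @chain = 1 THEN Volume
                               ELSE 0.03E * Volume + 0.97E * @expAvg
                          END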

    Steve

    (aka smunson)

🙂 🙂 🙂

  • Steve,

    You've definitely got a good point that I had thought about as well. (Though for the moment, I'm most concerned with getting everything implemented correctly.)

But to address accuracy/conservatism in models: this is actually going to be compared with and/or combined with several other measures of moving-average volume (e.g., a moving average over the last week or the last month, as well as further exponential weighted averages with different constants, such as 0.05 and 0.95 instead of 0.03 and 0.97).

Also, the final idea for the exponential weighted average is that I start these moving averages far enough in the past that, by the time I get to more current data, any past anomalies will have been smoothed out.

For example:

With the constants (0.97, 0.03), it takes ~37 trading days for an anomaly's contribution to fade to 1% of its value in the weighted average (an anomaly enters with weight 0.03 and decays by a factor of 0.97 each trading day, so 0.03 * 0.97^n drops below 0.01 at n ≈ 37), but for (0.95, 0.05) it takes only ~31 days to do the same thing.

Different combinations of these averages, plus a healthy dose of conservatism in my models, are so far the best way I've found to address these concerns. If you've got any other ideas/suggestions, I'd love to hear them.

    Thanks!
