Identifying a record n-rows from the current, on a rolling basis, in a non-contiguous data set

  • Hi folks

    I need to derive a measure using a value from the current record, and another value from a (related) record n-rows away. The data set is non-contiguous and contains nearly 30,000 records.

    My attempts to resolve this have gravitated towards cursors - which I understand should be a last resort. I know this is possible using sets, but my T-SQL skills are falling short.

    The formula, with source data, and expected results follow.

    Formula: Current close minus close n (trading) days ago. For this example n = 5

    Example (with reference to sample data below)

    For COMM_SYMB = CL

    CLS on day 717 = 62,79. CLS (five trading days earlier, day 710) = 61.22

    Therefore momentum on day 717 is: 62.79 - 61.22 = 1.57

    Thanks in advance,

    Wayne

    -- Source data

    DROP TABLE dbo.MOM_TEST

    CREATE TABLE dbo.MOM_TEST

    (

    TIME_KEY INT NOT NULL,

    COMM_SYMB NVARCHAR(3) NOT NULL,

    CLS FLOAT NOT NULL,

    MOM FLOAT

    )

    INSERT INTO dbo.MOM_TEST(TIME_KEY, COMM_SYMB, CLS, MOM)

    SELECT 703, 'BO', 29.34, NULL UNION ALL

    SELECT 703, 'CL', 62.44, NULL UNION ALL

    SELECT 703, 'HO', 1.8089, NULL UNION ALL

    SELECT 704, 'BO', 29.34, NULL UNION ALL

    SELECT 704, 'CL', 62.43, NULL UNION ALL

    SELECT 704, 'HO', 1.7983, NULL UNION ALL

    SELECT 705, 'BO', 28.78, NULL UNION ALL

    SELECT 705, 'CL', 62.19, NULL UNION ALL

    SELECT 705, 'HO', 1.794, NULL UNION ALL

    SELECT 706, 'BO', 28.73, NULL UNION ALL

    SELECT 706, 'CL', 62.49, NULL UNION ALL

    SELECT 706, 'HO', 1.7788, NULL UNION ALL

    SELECT 707, 'BO', 28.63, NULL UNION ALL

    SELECT 707, 'CL', 62.03, NULL UNION ALL

    SELECT 707, 'HO', 1.7573, NULL UNION ALL

    SELECT 710, 'BO', 29.03, NULL UNION ALL

    SELECT 710, 'CL', 61.22, NULL UNION ALL

    SELECT 710, 'HO', 1.7243, NULL UNION ALL

    SELECT 711, 'BO', 29.14, NULL UNION ALL

    SELECT 711, 'CL', 61.02, NULL UNION ALL

    SELECT 711, 'HO', 1.7224, NULL UNION ALL

    SELECT 712, 'BO', 29.02, NULL UNION ALL

    SELECT 712, 'CL', 62.17, NULL UNION ALL

    SELECT 712, 'HO', 1.732, NULL UNION ALL

    SELECT 713, 'BO', 28.79, NULL UNION ALL

    SELECT 713, 'CL', 63.33, NULL UNION ALL

    SELECT 713, 'HO', 1.7765, NULL UNION ALL

    SELECT 714, 'BO', 28.59, NULL UNION ALL

    SELECT 714, 'CL', 64.09, NULL UNION ALL

    SELECT 714, 'HO', 1.7817, NULL UNION ALL

    SELECT 717, 'BO', 28.15, NULL UNION ALL

    SELECT 717, 'CL', 62.79, NULL UNION ALL

    SELECT 717, 'HO', 1.7552, NULL UNION ALL

    SELECT 718, 'BO', 28.5, NULL UNION ALL

    SELECT 718, 'CL', 63.46, NULL UNION ALL

    SELECT 718, 'HO', 1.7592, NULL UNION ALL

    SELECT 719, 'BO', 28.53, NULL UNION ALL

    SELECT 719, 'CL', 63.72, NULL UNION ALL

    SELECT 719, 'HO', 1.7693, NULL UNION ALL

    SELECT 720, 'BO', 28.58, NULL UNION ALL

    SELECT 720, 'CL', 62.66, NULL UNION ALL

    SELECT 720, 'HO', 1.74, NULL UNION ALL

    SELECT 721, 'BO', 28.84, NULL UNION ALL

    SELECT 721, 'CL', 62.41, NULL UNION ALL

    SELECT 721, 'HO', 1.7213, NULL

    -- end source data

    -- Expected Results

    DROP TABLE dbo.MOM_ER

    CREATE TABLE dbo.MOM_ER

    (

    TIME_KEY INT NOT NULL,

    COMM_SYMB NVARCHAR(3) NOT NULL,

    CLS FLOAT NOT NULL,

    MOM FLOAT

    )

    INSERT INTO dbo.MOM_ER(TIME_KEY, COMM_SYMB, CLS, MOM)

    SELECT 703, 'BO', 29.34, 0 UNION ALL

    SELECT 703, 'CL', 62.44, 0 UNION ALL

    SELECT 703, 'HO', 1.8089, 0 UNION ALL

    SELECT 704, 'BO', 29.34, 0 UNION ALL

    SELECT 704, 'CL', 62.43, 0 UNION ALL

    SELECT 704, 'HO', 1.7983, 0 UNION ALL

    SELECT 705, 'BO', 28.78, 0 UNION ALL

    SELECT 705, 'CL', 62.19, 0 UNION ALL

    SELECT 705, 'HO', 1.794, 0 UNION ALL

    SELECT 706, 'BO', 28.73, 0 UNION ALL

    SELECT 706, 'CL', 62.49, 0 UNION ALL

    SELECT 706, 'HO', 1.7788, 0 UNION ALL

    SELECT 707, 'BO', 28.63, 0 UNION ALL

    SELECT 707, 'CL', 62.03, 0 UNION ALL

    SELECT 707, 'HO', 1.7573, 0 UNION ALL

    SELECT 710, 'BO', 29.03, -0.309999999999999 UNION ALL

    SELECT 710, 'CL', 61.22, -1.22 UNION ALL

    SELECT 710, 'HO', 1.7243, -0.0846 UNION ALL

    SELECT 711, 'BO', 29.14, -0.199999999999999 UNION ALL

    SELECT 711, 'CL', 61.02, -1.41 UNION ALL

    SELECT 711, 'HO', 1.7224, -0.0759000000000001 UNION ALL

    SELECT 712, 'BO', 29.02, 0.239999999999998 UNION ALL

    SELECT 712, 'CL', 62.17, -0.019999999999996 UNION ALL

    SELECT 712, 'HO', 1.732, -0.0620000000000001 UNION ALL

    SELECT 713, 'BO', 28.79, 0.0599999999999987 UNION ALL

    SELECT 713, 'CL', 63.33, 0.839999999999996 UNION ALL

    SELECT 713, 'HO', 1.7765, -0.00229999999999997 UNION ALL

    SELECT 714, 'BO', 28.59, -0.0399999999999991 UNION ALL

    SELECT 714, 'CL', 64.09, 2.06 UNION ALL

    SELECT 714, 'HO', 1.7817, 0.0244 UNION ALL

    SELECT 717, 'BO', 28.15, -0.880000000000003 UNION ALL

    SELECT 717, 'CL', 62.79, 1.57 UNION ALL

    SELECT 717, 'HO', 1.7552, 0.0309000000000001 UNION ALL

    SELECT 718, 'BO', 28.5, -0.640000000000001 UNION ALL

    SELECT 718, 'CL', 63.46, 2.44 UNION ALL

    SELECT 718, 'HO', 1.7592, 0.0368000000000002 UNION ALL

    SELECT 719, 'BO', 28.53, -0.489999999999998 UNION ALL

    SELECT 719, 'CL', 63.72, 1.55 UNION ALL

    SELECT 719, 'HO', 1.7693, 0.0373000000000001 UNION ALL

    SELECT 720, 'BO', 28.58, -0.210000000000001 UNION ALL

    SELECT 720, 'CL', 62.66, -0.670000000000002 UNION ALL

    SELECT 720, 'HO', 1.74, -0.0365 UNION ALL

    SELECT 721, 'BO', 28.84, 0.25 UNION ALL

    SELECT 721, 'CL', 62.41, -1.68000000000001 UNION ALL

    SELECT 721, 'HO', 1.7213, -0.0604

    -- End expected Results

  • did you test using row_number or ranking functions ?

    That in combination with a CTE or temp table may help out on this kind of stuff.

    Check books online or SSC for examples:w00t:

    You should at least read Jeff or Lynns article on running totals,...

    (Start with Jeffs, because that one is also refered to by Lynn)

    Jeffs: http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

    Lynns: http://www.sqlservercentral.com/articles/T-SQL/65522/

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution ๐Ÿ˜€

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • And, sorry... the running total article I wrote is in the process of being rewritten.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Actually, in SQL Server 2k5, we don't need to solve the running total problem for this because it's just a ranking problem... this should do it...

    ;WITH

    cteGroup AS

    (

    SELECT DENSE_RANK() OVER (ORDER BY Time_Key) AS Time_KeyGroup,

    Time_Key, Comm_Symb, CLS

    FROM dbo.Mom_Test

    )

    SELECT t1.Time_Key, t1.Comm_Symb, t1.CLS,

    t1.CLS - t2.CLS AS Mom

    FROM cteGroup t1

    LEFT OUTER JOIN cteGroup t2

    ON t1.Time_KeyGroup-5 = t2.Time_KeyGroup

    AND t1.Comm_Symb = t2.Comm_Symb

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Many thanks Jeff, much appreciated.

    I was hammering away with the row number function, but never got it to work successfully.

    Wayne

  • Thanks for the feedback, Wayne. I appreciate you taking the time as well posting your data in such a readily consumable format.

    As a sidebar, even though it makes for code that's very short and easy on the eyes, a self referencing CTE may actually cause a bit of a performance problem compared to what I consider to be optimal performance. The reason for the performance problem is two fold...

    Calling the CTE twice actually causes the CTE to execute twice much like it's near cousin, the derived table, would (you'd actually need to list the code twice there). The other problem may occur if the result set from the CTE is large... since part of the join is being done on a calculated column, there's virtually no chance of an index being used in the join between the two result sets of the CTE. It would likely be better to put the result set, with an additional column of precalculated column of Time_KeyGroup minus 5 (called Time_KeyGroup5) into a Temp table, index both Time_KeyGroup and the Time_KeyGroup5 columns, and then do the join. The use of SELECT/INTO will make the creation of the Temp table very, very fast even for some of the larger result sets I can see this being used for.

    The only method even faster than that, would be to simply add the Time_KeyGroup columns, along with the appropriate indexes, to the orginal table and keep them up to date.

    For all those getting ready to hammer me about using SELECT/INTO and the myth about the table locks it causes, it's not been a problem since version 6.5 sp1. Please don't take my word for it, though... instead, verify what I've just said through the following link, please...

    http://support.microsoft.com/kb/153441/EN-US/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (3/7/2009)


    Actually, in SQL Server 2k5, we don't need to solve the running total problem for this because it's just a ranking problem... this should do it...

    ;WITH

    cteGroup AS

    (

    SELECT DENSE_RANK() OVER (ORDER BY Time_Key) AS Time_KeyGroup,

    Time_Key, Comm_Symb, CLS

    FROM dbo.Mom_Test

    )

    SELECT t1.Time_Key, t1.Comm_Symb, t1.CLS,

    t1.CLS - t2.CLS AS Mom

    FROM cteGroup t1

    LEFT OUTER JOIN cteGroup t2

    ON t1.Time_KeyGroup-5 = t2.Time_KeyGroup

    AND t1.Comm_Symb = t2.Comm_Symb

    Surely it doesn't need to be this complex? This generates exactly the same results as are found in the Expected Results table:

    SELECT now.*, ISNULL(now.CLS-old.CLS, 0) AS calcMOM

    FROM #MOM_ER now

    LEFT JOIN #MOM_ER old

    ON old.COMM_SYMB = now.COMM_SYMB AND old.TIME_KEY + 7 = now.TIME_KEY

    Unless I'm missing something?

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris Morris (3/9/2009)


    Jeff Moden (3/7/2009)


    Actually, in SQL Server 2k5, we don't need to solve the running total problem for this because it's just a ranking problem... this should do it...

    ;WITH

    cteGroup AS

    (

    SELECT DENSE_RANK() OVER (ORDER BY Time_Key) AS Time_KeyGroup,

    Time_Key, Comm_Symb, CLS

    FROM dbo.Mom_Test

    )

    SELECT t1.Time_Key, t1.Comm_Symb, t1.CLS,

    t1.CLS - t2.CLS AS Mom

    FROM cteGroup t1

    LEFT OUTER JOIN cteGroup t2

    ON t1.Time_KeyGroup-5 = t2.Time_KeyGroup

    AND t1.Comm_Symb = t2.Comm_Symb

    Surely it doesn't need to be this complex? This generates exactly the same results as are found in the Expected Results table:

    SELECT now.*, ISNULL(now.CLS-old.CLS, 0) AS calcMOM

    FROM #MOM_ER now

    LEFT JOIN #MOM_ER old

    ON old.COMM_SYMB = now.COMM_SYMB AND old.TIME_KEY + 7 = now.TIME_KEY

    Unless I'm missing something?

    Haven't tested yours but I don't believe they generate the same thing... your's looks like it picks up the Time_Keys that are a week apart... mine picks up the Time_keys from whatever data was available 5 entries ago.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff is right. I have to cater for non-trading days other than weekends (which are not included in my sample data) so having a 7 day difference from the time key will not always work. I also vary these windows, some being up to 40 periods which will include many non-trading days.

    Thanks,

    Wayne

  • Jeff,

    Thanks for posting the link about SELECT INTO. I've been given grief by other DBA's for using this technique for years - claiming that it locked sysobjects in tempdb. I use it because, as you stated, it's absolutely the fastest way to create a temp table.

    I bench marked a report procedure running it many times consecutively so cache wasn't a consideration and the CREATE TABLE/INSERT INTO ran from 1.5 to 1.9 seconds each time. The SELECT INTO version ran in 300 ms each time.

    Todd Fifield

  • You bet, Todd... Thanks for the feedback. Heh... It's amazing how long the myth of Select/Into has managed to survive.

    I'll have to look for it, but I've got a bit of test code hanging around somewhere that proves that there are no blocking locks and that multiple parallel routines can all be doing SELECT/INTO all at the same time.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (3/10/2009)


    You bet, Todd... Thanks for the feedback. Heh... It's amazing how long the myth of Select/Into has managed to survive.

    I'll have to look for it, but I've got a bit of test code hanging around somewhere that proves that there are no blocking locks and that multiple parallel routines can all be doing SELECT/INTO all at the same time.

    It's easy enough to prove. Open two connections in Management Studio. Write a script that selects something into a temp table that will take a humanly-measurable amount of time to finish. Call it 10 seconds to be on the safe side. Copy the script into both connections, and run them at the same time.

    Doesn't have the mathematical proof factor to it, but it sure does make it very, very real to people.

    If they still doubted after that, I'd also open the Management Studio Current Activity window and watch that with them while it was running. Shows blocks in there.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Wayne (3/9/2009)


    Jeff is right. I have to cater for non-trading days other than weekends (which are not included in my sample data) so having a 7 day difference from the time key will not always work. I also vary these windows, some being up to 40 periods which will include many non-trading days.

    Thanks,

    Wayne

    Thanks guys, thought I was losing me marbles. Nah don't answer that ๐Ÿ˜›

    โ€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.โ€ - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • GSquared (3/10/2009)


    Jeff Moden (3/10/2009)


    You bet, Todd... Thanks for the feedback. Heh... It's amazing how long the myth of Select/Into has managed to survive.

    I'll have to look for it, but I've got a bit of test code hanging around somewhere that proves that there are no blocking locks and that multiple parallel routines can all be doing SELECT/INTO all at the same time.

    It's easy enough to prove. Open two connections in Management Studio. Write a script that selects something into a temp table that will take a humanly-measurable amount of time to finish. Call it 10 seconds to be on the safe side. Copy the script into both connections, and run them at the same time.

    Doesn't have the mathematical proof factor to it, but it sure does make it very, very real to people.

    If they still doubted after that, I'd also open the Management Studio Current Activity window and watch that with them while it was running. Shows blocks in there.

    That's kinda what my code example does... except one takes 60 seconds to run and the other one takes 10 seconds to run and starts 10 seconds after the first one with log marks for start and end time. Don't depend on when stuff get's returned to the screen, either. Screen returns have about the lowest priority in the world in 2k and not much faster in 2k5. Ya gotta mark start and end times for both scripts.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If you set nocount on, and all both scripts do is the select into, there's nothing to show on the screen except that it completed.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply