Converting Daily OHLCV data into weekly and monthly OHLCV

  • With the understanding that you'll eventually verify what I asked, the following code does the trick. I went for logical understanding and ease of maintenance rather than code brevity. Performance doesn't drop much even with the added request. BTW... you do have a Tally table by now, yes?

    --===== Show how to do the weekly stuff. The monthly stuff is almost identical

    -- except for a couple of column names and one little formula.

    --===== This simulates the input parameters to a sproc

    DECLARE @StartDate DATETIME,

    @EndDate DATETIME

    ;

    SELECT @StartDate = '1 Feb 2010', --30 jan to 5 feb

    @EndDate = '28 Feb 2010'

    ;

    WITH

    cteDateSpan AS

    ( --=== Calculate 1 date range per week (per row) to include both the start and end date inputs

    SELECT WeekOpenDate = DATEADD(dd,(DATEDIFF(dd,-2,@StartDate)/7+(t.N-1))*7 ,-2),

    NextWeekOpenDate = DATEADD(dd,(DATEDIFF(dd,-2,@StartDate)/7+(t.N-1))*7+7,-2)

    FROM dbo.Tally t

    WHERE t.N BETWEEN 1 AND (DATEDIFF(dd,-2,@EndDate)/7+1)-(DATEDIFF(dd,-2,@StartDate)/7) --Number of weeks

    )

    ,

    ctePreAgg AS

    ( --=== This calculates the things that can be calculated at this time.

    -- The Open/Close dates will be necessary to find the open and close values for the week.

    -- This also takes dates outside the requested range because they may not land on a week

    -- start or week end date. In other words, the requested start and end dates WILL be

    -- included in the span of weeks below.

    SELECT [Symbol],

    WeekOpenDate = MIN([Date]),

    WeekCloseDate = MAX([Date]),

    [High] = MAX([High]),

    [Low] = MIN([Low]),

    [Volume] = SUM([Volume])

    FROM [dbo].[DIM_Data_OHLC]

    WHERE [Date] >= DATEADD(dd,(DATEDIFF(dd,-2,@StartDate)/7)*7 ,-2) --FirstWeekOpenDate

    AND [Date] < DATEADD(dd,(DATEDIFF(dd,-2,@EndDate) /7)*7+7,-2) --LastWeekOpenDate + 1 Week (7 days)

    GROUP BY [Symbol], DATEDIFF(dd,-2,[Date])/7 --Week number

    )

    ,

    cteFinalAgg AS

    ( --=== Get the weekly open/close info. The correlated subqueries work very much like CROSS APPLY in this case

    -- and are very fast.

    SELECT cte.Symbol,

    cte.WeekOpenDate,

    cte.WeekCloseDate,

    [Open] = (SELECT [Open] FROM [dbo].[DIM_Data_OHLC] t3 WHERE t3.Symbol = cte.Symbol AND t3.Date = cte.WeekOpenDate),

    cte.High,

    cte.Low,

    [Close] = (SELECT [Close] FROM [dbo].[DIM_Data_OHLC] t3 WHERE t3.Symbol = cte.Symbol AND t3.Date = cte.WeekCloseDate),

    cte.Volume

    FROM ctePreAgg cte

    ) --=== Do the presentation stuff. This is where we put the full week dates in where the data may not have them.

    SELECT final.Symbol,

    span.WeekOpenDate,

    WeekCloseDate = DATEADD(dd,-1,span.NextWeekOpenDate),

    [Open] = STR(final.[Open],9,2),

    High = STR(final.High,9,2),

    Low = STR(final.Low,9,2),

    [Close] = STR(final.[Close],9,2),

    Change = STR(final.[Close] - final.[Open],9,2),

    Swing = STR(final.High - final.Low,9,2),

    final.Volume

    FROM cteFinalAgg final

    RIGHT JOIN cteDateSpan span

    ON final.WeekOpenDate >= span.WeekOpenDate AND final.WeekOpenDate < span.NextWeekOpenDate

    ORDER BY Symbol,span.WeekOpenDate

    ;

    GO

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • @Digs:

    I'd be interested to see the performance results of Jeffs latest version compared to my approach.

    I'd like to see the effect of multiple sort operations (like I do) vs. multiple scan counts/logical reads (like in Jeffs version) on your real data. Just curious, though...

    @jeff:

    Your latest version include results for 2010-02-27 to 2010-03-05. But those are out of the date range to be queried.

    I guess Digs need to decide where to place the "week break" and how to handle data of a week where the start or end date is not a Friday. Currently it seems like there are no data for the weekend which makes it a little easier assuming monday as the beginning of a new week: If a start or end date will be on a weekend, the code wouldn't include data for the following week.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • LutzM (10/4/2010)


    @jeff:

    Your latest version include results for 2010-02-27 to 2010-03-05. But those are out of the date range to be queried.

    I guess Digs need to decide where to place the "week break" and how to handle data of a week where the start or end date is not a Friday. Currently it seems like there are no data for the weekend which makes it a little easier assuming monday as the beginning of a new week: If a start or end date will be on a weekend, the code wouldn't include data for the following week.

    The desired end date was 2010-02-28. According to the "rules" I built into the code, that's in the week for the 2010-02-27 to 2010-03-05 which is why I listed it so it's not really "out of range" for my understanding of the problem.

    However, you're absolutely correct. Shoud dates that occur on the weekend even be considered? Guess we'll find out as soon as Digs replies.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for the effort lads.

    1) NO data on weekends, as NYSE, AMEX and NASD are closed on these days. Lets hope they remain so !

    2) Weekly data must end with a FRIDAYS date, of the week in question, (if data exists within the 5 day period of a week). If NO data for the week, then the friday for that week is not required.

    3) I have 1000 symbols, and 2.5 million records in one OHLC table.

    I will play with your results thanks !:-)

  • Jeff and or Lutz

    If I have data in a table, with a Date as the Primary key.

    And data is currently IN ORDER ASCENDING.

    If I do a

    SELECT * FROM <TABLE>

    ORDER BY [DATE] ASC

    Question: As data is already in order, does this mean the cpu demands on sorting the data is less or not required, therefore the cpu usage of would be the same for these two queries: 😉

    SELECT * FROM <TABLE>

    ORDER BY [DATE] ASC

    and

    SELECT * FROM <TABLE>

    ATTENTION: Jeff, you can be on my payroll anyday, just need a turnover to pay you !

  • Jeff I get an Error when I run your code

    FROM dbo.Tally t

    INVALID OBJECT !

  • Lutz,

    With your method, how does one get monthly data ???

  • Digs (10/4/2010)


    Jeff I get an Error when I run your code

    FROM dbo.Tally t

    INVALID OBJECT !

    Assuming Jeff is busy at the moment, check my signature for the dbo.Tally. You'll want to build one of these. They're small, but incredibly powerful.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Roger that !:-)

  • Digs (10/4/2010)


    Lutz,

    With your method, how does one get monthly data ???

    Version A:

    Step 1: Verify that the current solution will return the requested result.

    Step 2: Understand how it works. When in doubt, ask for clarification.

    Step 3: Clarify the requirements in terms of input data and expected results (e.g. What are the expected results for a monthly report with start_date = 2010/02/15 and end_date = 2010/03/15?)

    Step 4: Try to modify the code to meet the requirement (most probably the change from DATADD(wk,DATEDIFF(wk)) stuff to DATADD(mm,DATEDIFF(mm)) will be involved 😉 )

    Step 5: When in doubt, post what you have so far and where you get stuck.

    Version B:

    Find someone at SSC to post a ready to use solution.

    Version B probably will be faster. But version A will actually help you to solve such issues by yourself in the future so you won't have to rely on a forum reply. It'll actually help you to help yourself. And it's more fun anyway.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Digs (10/4/2010)


    Jeff I get an Error when I run your code

    FROM dbo.Tally t

    INVALID OBJECT !

    I told you... hopefully you have a Tally Table by now. Since you don't, you really need to read the following article and play with the code to see what is possible...

    http://www.sqlservercentral.com/articles/T-SQL/62867/

    ... if you don't learn what the Tally Table is, you're missing out on a huge amount of what SQL Server can actually do.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Digs (10/4/2010)


    Thanks for the effort lads.

    1) NO data on weekends, as NYSE, AMEX and NASD are closed on these days. Lets hope they remain so !

    2) Weekly data must end with a FRIDAYS date, of the week in question, (if data exists within the 5 day period of a week). If NO data for the week, then the friday for that week is not required.

    3) I have 1000 symbols, and 2.5 million records in one OHLC table.

    I will play with your results thanks !:-)

    No... based on that, it's time to stop playing. Post the CREATE TABLE script for the table including all indexes, keys, and triggers. Also post 10 lines of real data. It's time to get serious.

    And read that article on the Tally Table and how it works... you're really going to need it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 12 posts - 16 through 27 (of 27 total)

You must be logged in to reply to this topic. Login to reply