October 3, 2010 at 2:01 pm
Please download the attached files, Example.xls and 'MSFT.txt', and set up the table:
CREATE TABLE [dbo].[DIM_Data_OHLC](
[RecID] [int] IDENTITY(1,1) NOT NULL,
[Date] [datetime] NOT NULL,
[Open] [float] NULL,
[High] [float] NULL,
[Low] [float] NULL,
[Close] [float] NULL,
[Volume] [float] NULL,
CONSTRAINT [PK_DIM_Data_OHLC] PRIMARY KEY NONCLUSTERED
(
[Date] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
go
BULK INSERT [dbo].[DIM_Data_OHLC]
FROM 'c:\MSFT.txt'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\r'
)
Once you have the data in, run this SELECT as your base data.
SELECT CONVERT(CHAR(10), [Date], 101) AS 'Date'
,[Open]
,[High]
,[Low]
,[Close]
,[Volume]
,DATEPART(weekday,[date]) AS 'DayPerWeek'
,(YEAR([Date])*100)+ MONTH([Date]) AS 'MonthNumber'
FROM [dbo].[DIM_Data_OHLC]
Let's start with the WEEKLY OHLCV data-extraction logic from the daily data above:
How-tos:
1) The date for weekly data is always the FRIDAY date. This is tricky when a week has no Friday's data in the daily format. So if you only have Mon, Tue, Wed data, you need to calculate Friday's date from the last day you have, so use Wed to determine Friday's date.
This is due to holidays in the OHLCV data.
2) You can tell with the DayPerWeek field when a week starts and ends:
when DayPerWeek(n) is less than DayPerWeek(n-1), you have a new week. You can't use 6 as the end for the reasons given in (1).
3) One needs to pull OHLCV for weekly data (as per the example in the spreadsheet).
4) Open is always the first open of the DayPerWeek count (i.e. the yellow area in the spreadsheet), and Close is the last close of the DayPerWeek count. High is the highest value of the High field and Low is the lowest value of the Low field. The Date allocation is NOT taken from the last record; see (1) above. Volume is the sum of all data in the DayPerWeek count.
That's the weekly; the monthly is the same except for:
1) Use the MonthNumber field instead of the DayPerWeek field.
2) The month date is the last data date in the month's data set, just like the close value.
3) You can tell with the MonthNumber field when a month starts and ends: when MonthNumber(n) is greater than MonthNumber(n-1), you have a new month.
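The weekly rules above can also be sketched outside T-SQL. Here is a minimal Python illustration of the same aggregation (a hypothetical helper for checking the logic, not part of the actual solution; it assumes rows arrive in date order):

```python
from datetime import date, timedelta

def weekly_bars(rows):
    """Aggregate daily (date, open, high, low, close, volume) rows into
    weekly OHLCV bars. The bar date is the Friday of each Mon-Sun week,
    even when no Friday row exists (rule 1 above). Rows must be date-ordered."""
    bars = {}
    for d, o, h, l, c, v in rows:
        # Friday of the Mon-Sun week containing d (weekday(): Mon=0 .. Sun=6)
        friday = d - timedelta(days=d.weekday()) + timedelta(days=4)
        if friday not in bars:
            bars[friday] = [o, h, l, c, v]   # first row of the week sets Open
        else:
            bar = bars[friday]
            bar[1] = max(bar[1], h)          # High = highest High
            bar[2] = min(bar[2], l)          # Low = lowest Low
            bar[3] = c                       # last row seen sets Close
            bar[4] += v                      # Volume = sum of Volumes
    return [(d, *vals) for d, vals in sorted(bars.items())]
```

For the monthly bars the grouping key would simply be (year, month) and the bar date the last data date, per the rules above.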
I need to be able to SELECT data in Daily, Weekly, and Monthly format within a date range, so the stored procedure inputs would be
@DateStart DATETIME
@DateEnd DATETIME
I currently do this in VB.NET; I have no idea if it can be done on the T-SQL side??
Please help, thanks! :-D:-):-D
Ref: http://www.sqlmag.com/article/tsql3/t-sql-s-datetime-data-type.aspx
October 3, 2010 at 4:12 pm
Please provide ready to use sample data as described in the first link in my signature.
It's unlikely you'll find somebody converting the data for you.
I would use something like SELECT DATEADD(dd,4,DATEADD(ww,DATEDIFF(ww,0,Date-1),0))
to assign the Friday of each week to a given date. The Date - 1 shifts the week boundary so the week starts on Monday instead of Sunday; each date then maps to the Friday of its Mon-Sun week. A similar concept can be used for monthly data.
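For anyone who wants to sanity-check that formula outside SQL Server, the same arithmetic looks like this in Python (a sketch only; Python's weekday() already counts from Monday, so the Date - 1 trick isn't needed):

```python
from datetime import date, timedelta

def friday_of_week(d):
    """Python equivalent of DATEADD(dd,4,DATEADD(ww,DATEDIFF(ww,0,Date-1),0)):
    back up to the Monday of d's Mon-Sun week, then add 4 days."""
    return d - timedelta(days=d.weekday()) + timedelta(days=4)
```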
Side note: I would not rely on data of a "previous row" since usually the order of rows shouldn't matter (unless a running total or the like is needed). Instead I try to determine the values I need based on the columns available in a single row and apply it to all rows at once.
In order to get the values for Open and Close I'd probably use the ROW_NUMBER() function ordered by Date for Open and ordered by Date desc for Close, so I can query the data using Open_row=1 or Close_row=1.
To get High, Low and Volume I'd use aggregate functions.
October 3, 2010 at 4:20 pm
..."Please provide ready to use sample data as described in the first link in my signature."..
But I have; I don't know what you mean, please be specific? :w00t:
October 3, 2010 at 4:30 pm
Digs (10/3/2010)
..."Please provide ready to use sample data as described in the first link in my signature."..But I have, I dont know what you mean, please be specific ?:w00t:
Did you really read the article I pointed you at?
"The Correct Way to Post Data" section describe the format as
SELECT '4','Oct 17 2007 12:00AM','5.1709','8','1' UNION ALL
If the data are presented that way, we can simply copy and paste them and work on the solution instead of reformatting the data, or downloading a file and using import scripts/tasks.
October 3, 2010 at 4:36 pm
My confusion comes from the fact that I have posted many threads with BULK INSERT data setups, with no trouble at all.
But I take your point; easier is best!
October 3, 2010 at 4:49 pm
Digs (10/3/2010)
My confusion comes from the fact that I have posted many threads with BULK INSERT data setups, with no trouble at all.
But I take your point; easier is best!
I usually don't like to download a file and store it on my disc just for test purposes. Just a personal preference...
I would rate your version of posting sample data as 2nd best. 😉
As a side note: you might want to check your BULK INSERT statement (table names don't match).
October 3, 2010 at 7:06 pm
Digs (10/3/2010)
Please down load text file attached Example.xls, 'MSFT.txt', and set up table.
CREATE TABLE [dbo].[DIM_Data_OHLC](
[RecID] [int] IDENTITY(1,1) NOT NULL,
[Date] [datetime] NOT NULL,
[Open] [float] NULL,
[High] [float] NULL,
[Low] [float] NULL,
[Close] [float] NULL,
[Volume] [float] NULL,
CONSTRAINT [PK_DIM_Data_OHLC] PRIMARY KEY NONCLUSTERED
(
[Symbol] ASC,
[Date] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
go
BULK INSERT [dbo].[DIM_Data_OHLC]
FROM 'c:\MSFT.txt'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\r'
)
I agree with Lutz... especially since the code won't actually work. For example, there is no "Symbol" column in the table to create the PK on, and the BULK INSERT has nothing in it to bypass the IDENTITY column. Although I've got to take my hat off in respect for a hell of a try, it's not so easy to accommodate.
That notwithstanding, the data file is in good shape and I believe I can easily modify it to a more readily consumable format using MS Word. I'll be back.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 3, 2010 at 7:21 pm
Hey Digs,
What's the max number of data rows you expect to handle at one time?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 3, 2010 at 7:36 pm
HI Jeff,
Max records : 40000
Average : 6000
October 3, 2010 at 8:40 pm
Digs (10/3/2010)
HI Jeff,Max records : 40000
Average : 6000
Cool... this should do nicely with the indexes you currently have, then, because they're going to cause table scans the way they are. See the final paragraph of this post for an index recommendation.
First, here's how to build the test data in the format according to the article you were asked to read. You really tried in your post, but we'd prefer it this way AND we'd really like you to test your own work before posting to make sure it works...
--===== Create the test table. This is NOT a part of the solution.
CREATE TABLE [dbo].[DIM_Data_OHLC]
(
[RecID] [int] IDENTITY(1,1) NOT NULL,
[Date] [datetime] NOT NULL,
Symbol VARCHAR(5) NOT NULL DEFAULT 'AAAAA',
[Open] [float] NULL,
[High] [float] NULL,
[Low] [float] NULL,
[Close] [float] NULL,
[Volume] [float] NULL,
CONSTRAINT [PK_DIM_Data_OHLC] PRIMARY KEY NONCLUSTERED
([Symbol] ASC, [Date] ASC)
);
GO
--===== Populate the test table. This is not a part of the solution.
INSERT INTO [dbo].[DIM_Data_OHLC]
([Date], [Open], [High], [Low], [Close], [Volume])
SELECT '01/04/2010','30.62','31.1','30.59','30.95','38409100' UNION ALL
SELECT '01/05/2010','30.85','31.1','30.64','30.96','49749600' UNION ALL
SELECT '01/06/2010','30.88','31.08','30.52','30.77','58182400' UNION ALL
SELECT '01/07/2010','30.63','30.7','30.19','30.45','50559700' UNION ALL
SELECT '01/08/2010','30.28','30.88','30.24','30.66','51197400' UNION ALL
SELECT '01/11/2010','30.71','30.76','30.12','30.27','68754700' UNION ALL
SELECT '01/12/2010','30.15','30.4','29.91','30.07','65912100' UNION ALL
SELECT '01/13/2010','30.26','30.52','30.01','30.35','51863500' UNION ALL
SELECT '01/14/2010','30.31','31.1','30.26','30.96','63228100' UNION ALL
SELECT '01/15/2010','31.08','31.24','30.71','30.86','79913200' UNION ALL
SELECT '01/19/2010','30.75','31.24','30.68','31.1','46575700' UNION ALL
SELECT '01/20/2010','30.81','30.94','30.31','30.59','54849500' UNION ALL
SELECT '01/21/2010','30.61','30.72','30','30.01','73086700' UNION ALL
SELECT '01/22/2010','30','30.2','28.84','28.96','102004600' UNION ALL
SELECT '01/25/2010','29.24','29.66','29.1','29.32','63373000' UNION ALL
SELECT '01/26/2010','29.2','29.85','29.09','29.5','66639900' UNION ALL
SELECT '01/27/2010','29.35','29.82','29.02','29.67','63949500' UNION ALL
SELECT '01/28/2010','29.84','29.87','28.89','29.16','117513700' UNION ALL
SELECT '01/29/2010','29.9','29.92','27.66','28.18','193888500' UNION ALL
SELECT '02/01/2010','28.39','28.48','27.92','28.41','85931100' UNION ALL
SELECT '02/02/2010','28.37','28.5','28.14','28.46','54413700' UNION ALL
SELECT '02/03/2010','28.26','28.79','28.12','28.63','61397900' UNION ALL
SELECT '02/04/2010','28.38','28.5','27.81','27.84','77850000' UNION ALL
SELECT '02/05/2010','28','28.28','27.57','28.02','80960100' UNION ALL
SELECT '02/08/2010','28.01','28.08','27.57','27.72','52820600' UNION ALL
SELECT '02/09/2010','27.97','28.34','27.75','28.01','59195800' UNION ALL
SELECT '02/10/2010','28.03','28.24','27.84','27.99','48591300' UNION ALL
SELECT '02/11/2010','27.93','28.4','27.7','28.12','65993700' UNION ALL
SELECT '02/12/2010','27.81','28.06','27.58','27.93','81117200' UNION ALL
SELECT '02/16/2010','28.13','28.37','28.02','28.35','51935600' UNION ALL
SELECT '02/17/2010','28.53','28.65','28.36','28.59','45882900' UNION ALL
SELECT '02/18/2010','28.59','29.03','28.51','28.97','42856500' UNION ALL
SELECT '02/19/2010','28.79','28.92','28.69','28.77','44451800' UNION ALL
SELECT '02/22/2010','28.84','28.94','28.65','28.73','36707100' UNION ALL
SELECT '02/23/2010','28.68','28.83','28.09','28.33','52266200' UNION ALL
SELECT '02/24/2010','28.52','28.79','28.38','28.63','43165900' UNION ALL
SELECT '02/25/2010','28.27','28.65','28.02','28.6','48735300' UNION ALL
SELECT '02/26/2010','28.65','28.85','28.51','28.67','40370600' UNION ALL
SELECT '03/01/2010','28.77','29.05','28.53','29.02','43805400' UNION ALL
SELECT '03/02/2010','29.08','29.3','28.24','28.46','93123900' UNION ALL
SELECT '03/03/2010','28.51','28.61','28.35','28.46','48442100' UNION ALL
SELECT '03/04/2010','28.46','28.65','28.27','28.63','42890600' UNION ALL
SELECT '03/05/2010','28.66','28.68','28.42','28.59','56001800' UNION ALL
SELECT '03/08/2010','28.52','28.93','28.5','28.63','39414500' UNION ALL
SELECT '03/09/2010','28.56','29.11','28.55','28.8','50271600' UNION ALL
SELECT '03/10/2010','28.86','29.11','28.8','28.97','44891400' UNION ALL
SELECT '03/11/2010','28.89','29.19','28.85','29.18','35349700' UNION ALL
SELECT '03/12/2010','29.32','29.38','29.04','29.27','31700200' UNION ALL
SELECT '03/15/2010','29.18','29.37','29.01','29.29','37512000' UNION ALL
SELECT '03/16/2010','29.42','29.49','29.2','29.37','36723500' UNION ALL
SELECT '03/17/2010','29.5','29.87','29.4','29.63','50385700' UNION ALL
SELECT '03/18/2010','29.63','29.72','29.5','29.61','43845200' UNION ALL
SELECT '03/19/2010','29.76','29.9','29.35','29.59','81332100' UNION ALL
SELECT '03/22/2010','29.5','29.7','29.39','29.6','37718200' UNION ALL
SELECT '03/23/2010','29.59','29.9','29.41','29.88','42026600' UNION ALL
SELECT '03/24/2010','29.72','29.85','29.6','29.65','33987700' UNION ALL
SELECT '03/25/2010','29.83','30.57','29.8','30.01','73168700' UNION ALL
SELECT '03/26/2010','30.09','30.2','29.59','29.66','55595500' UNION ALL
SELECT '03/29/2010','29.71','29.82','29.55','29.59','33336000' UNION ALL
SELECT '03/30/2010','29.63','29.86','29.5','29.77','34954800' UNION ALL
SELECT '03/31/2010','29.64','29.72','29.17','29.29','63760000'
;
Now... self-education time on your part. You need to check out the formulas in the following code and make sure you understand what they're doing. Seriously, take the time... they'll make stuff like this real easy in the future...
--===== Demo a couple of useful formulas
DECLARE @StartDate DATETIME,
@EndDate DATETIME
SELECT @StartDate = '1 Feb 2010',
@EndDate = '28 Feb 2010'
SELECT [RecID], [Date], [Symbol], [Open], [High], [Low], [Close], [Volume],
WeekNumber = DATEDIFF(dd,-2,Date)/7, -- (-2) is a Saturday (1899-12-30)
MonthNumber = DATEDIFF(mm,0,Date) -- (0) is a Monday (1900-01-01)
INTO #DailyInfo
FROM [dbo].[DIM_Data_OHLC]
WHERE [Date] >= @StartDate AND [Date] < @EndDate + 1
;
GO
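If it helps, the two formulas reduce to simple integer arithmetic. Here's a Python rendering of them, offered only as a cross-check of the comments above (SQL Server's day 0 is 1900-01-01 and day -2 is Saturday 1899-12-30, so the week number increments every Saturday and each Sat-Fri week keeps its Friday):

```python
from datetime import date

def week_number(d):
    # DATEDIFF(dd,-2,Date)/7: whole weeks since Saturday 1899-12-30
    return (d - date(1899, 12, 30)).days // 7

def month_number(d):
    # DATEDIFF(mm,0,Date): whole months since January 1900
    return (d.year - 1900) * 12 + (d.month - 1)
```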
Here's what the weekly table code would look like. Now, if you did your homework above, you'll realize that, except for a couple of column name changes and the change of just one formula, the same code can be used for the monthly report. That's provided that you also understand that I haven't done any validation on the start and end date inputs. 😉
--===== Show how to do the weekly stuff. The monthly stuff is almost identical
-- except for a couple of column names and one little formula.
DECLARE @StartDate DATETIME,
@EndDate DATETIME
;
SELECT @StartDate = '1 Feb 2010',
@EndDate = '28 Feb 2010'
;
WITH
ctePreAgg AS
(
SELECT [Symbol],
WeekOpenDate = MIN([Date]),
WeekCloseDate = MAX([Date]),
[High] = MAX([High]),
[Low] = MIN([Low]),
[Volume] = SUM([Volume])
FROM [dbo].[DIM_Data_OHLC]
WHERE Symbol > '' AND [Date] >= @StartDate AND [Date] < @EndDate + 1
GROUP BY [Symbol], DATEDIFF(dd,-2,[Date])/7 --Week number
)
,
cteFinalAgg AS
(
SELECT cte.Symbol,
cte.WeekOpenDate,
cte.WeekCloseDate,
[Open] = (SELECT [Open] FROM [dbo].[DIM_Data_OHLC] t3 WHERE t3.Symbol = cte.Symbol AND t3.Date = cte.WeekOpenDate),
cte.High,
cte.Low,
[Close] = (SELECT [Close] FROM [dbo].[DIM_Data_OHLC] t3 WHERE t3.Symbol = cte.Symbol AND t3.Date = cte.WeekCloseDate),
cte.Volume
FROM ctePreAgg cte
)
SELECT Symbol,
WeekOpenDate,
WeekCloseDate,
[Open] = STR([Open],9,2),
High = STR(High,9,2),
Low = STR(Low,9,2),
[Close] = STR([Close],9,2),
Change = STR([Close]-[Open],9,2),
Swing = STR(High-Low,9,2),
Volume
FROM cteFinalAgg
ORDER BY Symbol,WeekCloseDate
As a recommendation, you can get the best of both worlds if you reverse the order of the column names in the PK and change it to a clustered index. Because it will be date oriented, you probably won't get any page splits, and because it's clustered, the code above and all like it will scream AND you'll have the added benefit of a much less fragmented table that hardly ever needs to be rebuilt, except for where the non-clustered indexes come into play. And there really shouldn't be any of those for this table unless you're doing something totally off the wall.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 3, 2010 at 8:42 pm
Heh... P.S.... don't worry about the correlated subqueries. The only time such a thing will become a problem is when an aggregated inequality is involved, which will make for a triangular join that will eat your computer's disk platters and memory chips for lunch.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 3, 2010 at 9:27 pm
Thanks Jeff,
1) Yes, week and month use the same logic; just the [Date] used is different:
a) Weekly must be a Friday
b) Monthly is the last date of the data set.
2) I will have a play... don't leave town (ha)..
Thanks for your box of tricks..
October 3, 2010 at 9:40 pm
Jeff, a little issue with the Date used for weekly data
The daily data can be nasty.
For example, I can get data for a week like these three cases:
1) Tue, Wed, Thu
2) Mon, Tue, Wed
3) Wed, Thu, Fri
Case 3 is OK, as there is a data record for the Friday of the week in question. However, for cases 1 and 2 there is NO Friday data record for the week's data.
What I need to do for cases 1 and 2 is take Thu for case 1 and Wed for case 2, determine their DayOfWeekNumber, and then determine the FRIDAY DATE for these weeks.
I believe the way you have done it just gives me the MAX or MIN date for the week's data, in which case this DATE may not be a Friday, when it must be.
So how do I do that nasty trick!
NOTE: The way you have done it at the moment is fine for monthly.
October 4, 2010 at 3:32 am
Thanx Jeff for cleaning up the sample data!
And here's my approach:
I'm not sure how it'll perform on a larger data set due to the double sort operation, but that might be compensated by needing just one pass over the table (or an index seek on a larger data volume?).
DECLARE @StartDate DATETIME,
        @EndDate DATETIME
;
SELECT @StartDate = '1 Feb 2010', -- same test range as above
       @EndDate = '28 Feb 2010'
;
WITH cte AS
(
SELECT
DATE,
Symbol,
[OPEN],
high,
low,
[CLOSE],
volume,
DATEADD(dd,4,DATEADD(ww,DATEDIFF(ww,0,DATE-1),0)) AS wk,
ROW_NUMBER() OVER(PARTITION BY Symbol,DATEADD(dd,4,DATEADD(ww,DATEDIFF(ww,0,DATE-1),0)) ORDER BY DATE ) AS row_open,
ROW_NUMBER() OVER(PARTITION BY Symbol,DATEADD(dd,4,DATEADD(ww,DATEDIFF(ww,0,DATE-1),0)) ORDER BY DATE DESC) AS row_close
FROM [dbo].[DIM_Data_OHLC]
WHERE Symbol > '' AND [DATE] >= @StartDate AND [DATE] < @EndDate + 1
)
SELECT
symbol,
wk AS WeekCloseDate,
MAX(CASE WHEN row_open = 1 THEN [OPEN] ELSE NULL END) AS [OPEN],
MAX(high) AS High,
MIN(low) AS Low,
MAX(CASE WHEN row_close = 1 THEN [CLOSE] ELSE NULL END) AS [CLOSE],
SUM(volume) AS Volume
FROM cte
GROUP BY symbol,wk
October 4, 2010 at 4:40 am
Digs (10/3/2010)
Jeff, a little issue with the Date used for weekly data
The daily data can be nasty.
For example, I can get data for a week like these three cases:
1) Tue, Wed, Thu
2) Mon, Tue, Wed
3) Wed, Thu, Fri
Case 3 is OK, as there is a data record for the Friday of the week in question. However, for cases 1 and 2 there is NO Friday data record for the week's data.
What I need to do for cases 1 and 2 is take Thu for case 1 and Wed for case 2, determine their DayOfWeekNumber, and then determine the FRIDAY DATE for these weeks.
I believe the way you have done it just gives me the MAX or MIN date for the week's data, in which case this DATE may not be a Friday, when it must be.
So how do I do that nasty trick!
NOTE: The way you have done it at the moment is fine for monthly.
Ummm... you mean that what you really want is the calendar dates for the week start and end and not the actual dates of the data for the week start and end? Can do. Just verify that's what you really want.
--Jeff Moden
Change is inevitable... Change for the better is not.