May 17, 2017 at 7:23 am
Hi,
I'm using a time series table with a SMALLDATETIME field.
The table stores sensors' data every 15 minutes, so the date values are: 01-01-2017 00:00, 01-01-2017 00:15, ....
The data can be searched in some scenarios like this (and other variations with different group by and sums):
SELECT sensor, MONTH(date), SUM(value) FROM sensorData WHERE YEAR(date) = 2017 GROUP BY sensor, MONTH(date)
This will do an index scan and not seek... I thought the date type was already stored in SQL by its parts (year, month, day, ...).
What's the best way to do this type of query? Create computed columns with YEAR(date), MONTH(date), DAY(date), ... and create indexes over those columns?
Thanks,
Pedro
May 17, 2017 at 7:36 am
PiMané - Wednesday, May 17, 2017 7:23 AM
Hi,
I'm using a time series table with a SMALLDATETIME field.
The table stores sensors' data every 15 minutes, so the date values are: 01-01-2017 00:00, 01-01-2017 00:15, ....
The data can be searched in some scenarios like this (and other variations with different group by and sums):
SELECT sensor, MONTH(date), SUM(value) FROM sensorData WHERE YEAR(date) = 2017 GROUP BY sensor, MONTH(date)
This will do an index scan and not seek... I thought the date type was already stored in SQL by its parts (year, month, day, ...).
What's the best way to do this type of query? Create computed columns with YEAR(date), MONTH(date), DAY(date), ... and create indexes over those columns?
Thanks,
Pedro
Try this, Pedro. Wrapping the column in YEAR() makes the predicate non-SARGable, so the optimizer can't seek on it; an open-ended range against the bare column can use the index:
DECLARE @StartRange DATE = '20170101', @EndRange DATE = '20180101';
WHERE [Date] >= @StartRange AND [Date] < @EndRange
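Put together with the grouping from the original post, the rewritten query might look like this (a sketch only; it assumes the sensorData table and column names Pedro showed, and an index that leads on the date column):

```sql
DECLARE @StartRange date = '20170101',
        @EndRange   date = '20180101';

-- The bare [date] column in the range predicate is SARGable,
-- so an index leading on (date) can be used for a seek.
SELECT   sensor,
         MONTH([date]) AS [month],
         SUM(value)    AS total
FROM     sensorData
WHERE    [date] >= @StartRange
AND      [date] <  @EndRange
GROUP BY sensor, MONTH([date]);
```

The half-open range (`>=` start, `<` next start) also avoids any edge-case trouble with the time portion of SMALLDATETIME at the end of the year.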
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY (Part I and Part II) - Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop - Jeff Moden
May 17, 2017 at 7:50 am
Without even changing the parameter type:
-- Build a sample table of random smalldatetime values
CREATE TABLE #SampleDates(
    SomeDate smalldatetime
);
CREATE CLUSTERED INDEX CISampleDates ON #SampleDates(SomeDate);

INSERT INTO #SampleDates
SELECT DATEADD(hh, ABS(CHECKSUM(NEWID())) % 100000, '2015')
FROM sys.all_columns;
GO
-- An int year still produces a seekable range: '2017' and '2018'
-- convert implicitly to January 1 of each year
DECLARE @Year int = 2017;
SELECT *
FROM #SampleDates
WHERE SomeDate >= CAST(@Year AS char(4))
  AND SomeDate <  CAST(@Year + 1 AS char(4));
GO
DROP TABLE #SampleDates;
May 17, 2017 at 7:52 am
Thanks,
That was my first thought... build a function that would return the date range for a given year, month, day, hour, minute....
But since every sensor has 96 rows a day, the system has over 2,000 sensors, and there are 5 years of data, wouldn't GROUP BY MONTH(date) be faster with an indexed computed month column?
(I'm still working on this huge database scenario; I haven't run tests yet, but I'll be making them later to see all the possibilities...)
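For reference, the computed-column approach being considered could be sketched like this (hypothetical column and index names, assuming the sensorData schema from the original post; PERSISTED stores the computed values in the base table rather than evaluating them at read time):

```sql
-- Hypothetical sketch: computed year/month columns on sensorData,
-- then a covering index over them. Names are illustrative only.
ALTER TABLE sensorData
    ADD dateYear  AS YEAR([date])  PERSISTED,
        dateMonth AS MONTH([date]) PERSISTED;

CREATE INDEX IX_sensorData_YearMonth
    ON sensorData (dateYear, dateMonth, sensor)
    INCLUDE (value);
```

Whether this beats a plain date-range seek would need to be measured; the range predicate on the bare SMALLDATETIME column already seeks, so the computed columns mainly change how the GROUP BY keys are derived.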
Thanks,
Pedro
May 17, 2017 at 9:29 am
Just an fyi... the number of rows you'll have is just over 350 million. Large, yes. Gigantor, not really. This is typically the kind of thing that is best served by throwing hardware at it, as long as your query is efficient and you have a good database design. It doesn't sound like you'll have any way to avoid that number of rows, so I don't think you'll have any other alternatives on the row count.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
May 17, 2017 at 9:37 am
sgmunson - Wednesday, May 17, 2017 9:29 AM
Just an fyi... the number of rows you'll have is just over 350 million. Large, yes. Gigantor, not really. This is typically the kind of thing that is best served by throwing hardware at it, as long as your query is efficient and you have a good database design. It doesn't sound like you'll have any way to avoid that number of rows, so I don't think you'll have any other alternatives on the row count.
That's right...
For a year I have 28 million rows.... and this is for one building only... They manage energy efficiency for several buildings and are thinking of making this solution multi-tenant... even more rows...
I just recommended the best storage money can buy, since this is going to be very write-intensive (all buildings will send data from their 2,000 sensors every 15 minutes), and that data will have to be "processed" into the final table (which will also see lots of writes and reads).
I made some tests with computed columns, but the outcome was the same as using MONTH(date), so there's no point in having those columns.
Thanks for your help... My next fight will be the ETL to convert the raw data into the final data....
Pedro