June 3, 2009 at 11:52 am
Hi,
I have this log table having around 100 millions records. Those logs are for the last year web activity.
If I query for few days, a week or even a month, the dataset is relatively small. A million at the most. This query is pretty fast. The query looks like this:
select count(distinct fk_tbl_visits_id), count(*), sum(nbPages),
DATEADD(dd, DATEDIFF(dd, 0, dateadd(hh, 0, date)), 0)
from tbl_logs date between '2009-05-18' and '2009-05-25'
group by DATEADD(dd, DATEDIFF(dd, 0, dateadd(hh, 0, date)), 0)
order by count(*) desc
But if a run this query for 6 month of even the year, the dataset is several millions and the query is really slow.
The query execution plan is the same with the same indexes used and so on. What slows down the query is the aggregate functions like count(*), sum or even group by.
So, is there any trick to speed up group by or aggregate functions of several millions of rows?
Thanks for any tips,
Stephane
June 3, 2009 at 11:58 am
That's the kind of thing that data warehouses are used for.
Simplest version of that for you would be to have a table that you populate daily, that has the counts for the prior day in it. Catch it up once, run it daily. Then, if you need to know the number of entries in a six-month period, you're summing up about 180 rows, instead of counting millions. Will be faster.
However, that kind of thing isn't as flexible as a real data cube. It doesn't do you much good if you have to add something else to your Where clause. You can pick the criteria you use most often, and use those as columns and as part of your group by in the pseudo-warehouse, to gain some flexibility at a slight loss of performance. That will get your halfway to a well-designed cube.
Beyond that, it's take a look at Analysis Services, run yourself through the tutorials on it, and then start building a cube based on your tables. It's easier than you might think, and the benefits can be quite significant, in terms of both speed and flexibility.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
June 3, 2009 at 12:10 pm
As a matter of fact, I heard of the cube thing and it sounded pretty complicated to me. I'll have a deeper look then.
Thanks a lot.
Stephane
June 4, 2009 at 7:23 am
It's definitely one of those things that looks complicated till you've done a few, and then it suddenly "clicks" and becomes very simple to understand. (Not necessarily simple to implement, but the concept becomes simple.)
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Viewing 4 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply