Regarding: count distinct query

  • Hi,

    I've tried everything I know; if anyone can help me with this it would be very helpful.

    The application is a web server hits reporting tool. The platform is ASP/VB/SQL 2000/Windows 2000. The main database consists of records from the web server log files and grows by about 100,000 records per day. There are currently 1,679,606 records in the main table, t_ReportLogSpecials, which holds all the data to be reported.

    I've given the query and table structure below. The problem is that the query takes 30 seconds to execute. It takes under 3 seconds when I have only COUNT(*), but 30 seconds when I add COUNT(DISTINCT CookieID) and COUNT(DISTINCT ASPSessionID). When I looked at the execution plan, it showed an index seek in both cases.

    There was another problem I noted: when I ran the query in Query Analyzer as a direct SQL statement it took 23 seconds, but when I ran it as a stored procedure it took 30 seconds.

    The breakdown of the execution plans:

    1) Query

    Total time = 23 sec; index scan cost 19%, index I/O cost 5.8; two table scans, each costing 22%, I/O cost 8.34

    2) Stored proc

    Total time = 30 sec; index scan cost 37%, index I/O cost 3.87; two table scans, each costing 21%, I/O cost 0.662

    I need to know 2 things

    1) How can I tune this aggregation query so that the COUNT(DISTINCT ...) values come back faster? I need this query to return data quickly; this is the main priority.

    2) Why does the plain query take less time to execute than the stored procedure, and even if that is expected, why is the difference so large?

    Awaiting reply

    Mohan

    ========

    The query is

    SELECT  DAY(RequestDT) AS timeperiod,
            COUNT(*) AS requestcount,
            COUNT(DISTINCT CookieID) AS ucount,
            COUNT(DISTINCT ASPSessionID) AS visitcount
    FROM    t_ReportLogSpecials
    WHERE   (RequestDT > '10/31/2001') AND (RequestDT < '12/01/2001')
            AND (FSIAdvertiserID = 246 OR ROPAdvertiserID = 246)
            AND CookieID <> '' AND ASPSessionID <> ''
    GROUP BY DAY(RequestDT)
    ORDER BY COUNT(*) DESC

    The table structure is

    CREATE TABLE [dbo].[t_ReportLogSpecials] (
        [ID] [bigint] IDENTITY (1, 1) NOT NULL ,
        [RequestDT] [datetime] NULL ,
        [NewspaperID] [int] NULL ,
        [FSIAdvertiserID] [int] NULL ,
        [ROPAdvertiserID] [int] NULL ,
        [FSIID] [int] NULL ,
        [PageID] [int] NULL ,
        [AdID] [int] NULL ,
        [LocationID] [int] NULL ,
        [SubregionID] [int] NULL ,
        [ManufacturerID] [int] NULL ,
        [CategoryID] [int] NULL ,
        [ClientIP] [varchar] (16) NULL ,
        [ServerName] [varchar] (16) NULL ,
        [HostName] [varchar] (32) NULL ,
        [CookieID] [varchar] (24) NULL ,
        [UserAgent] [varchar] (50) NULL ,
        [ASPSessionID] [varchar] (33) NULL ,
        [List] [bit] NULL ,
        [Qstring] [varchar] (256) NULL
    ) ON [PRIMARY]
    GO

    ALTER TABLE [dbo].[t_ReportLogSpecials] WITH NOCHECK ADD
        CONSTRAINT [PK_t_ReportLogSpecials] PRIMARY KEY CLUSTERED
        (
            [ID]
        ) ON [PRIMARY]
    GO

    CREATE INDEX [IX_Calendar] ON [dbo].[t_ReportLogSpecials]
        ([RequestDT], [FSIAdvertiserID], [ROPAdvertiserID], [CookieID], [ASPSessionID]) ON [PRIMARY]
    GO

    ===============

  • Usually when a proc performs worse than a dynamic query it means the compiled plan is bad: either the statistics are out of date, or the data is such that the compiled plan was optimal for the initial run but bad for later ones.

    Start by running sp_updatestats to make sure your statistics are good. If you want to see for sure, you can profile your proc to see if it is recompiling, but I'd suggest just using sp_recompile procname. Then do your comparison again; it should be the same or slightly better for the proc. If you find the original problem cropping up with different queries, alter the proc to use the WITH RECOMPILE option, which forces it to generate a plan each time. Potentially slower, but if you keep encountering the problem it's the best way to fix it.
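    Something like this is what I mean (usp_SpecialsReport and its parameters are just placeholders for however your proc is actually named and defined; the body is your query from the first post):

    -- Refresh statistics on every table in the current database
    EXEC sp_updatestats
    GO

    -- Flag the proc so a fresh plan is built on its next execution
    -- (substitute your actual proc name)
    EXEC sp_recompile 'usp_SpecialsReport'
    GO

    -- If the stale-plan problem keeps coming back, force a new plan on every call
    ALTER PROCEDURE usp_SpecialsReport
        @FromDT datetime,
        @ToDT   datetime,
        @AdvID  int
    WITH RECOMPILE
    AS
    SELECT  DAY(RequestDT) AS timeperiod,
            COUNT(*) AS requestcount,
            COUNT(DISTINCT CookieID) AS ucount,
            COUNT(DISTINCT ASPSessionID) AS visitcount
    FROM    t_ReportLogSpecials
    WHERE   RequestDT > @FromDT AND RequestDT < @ToDT
            AND (FSIAdvertiserID = @AdvID OR ROPAdvertiserID = @AdvID)
            AND CookieID <> '' AND ASPSessionID <> ''
    GROUP BY DAY(RequestDT)
    ORDER BY COUNT(*) DESC
    GO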

    The index doesn't look bad to start with. The trick (if there is one!) is to kill the table scans. I imagine you've tried the tuning wizard? One thing I always look at first is whether the clustered index is being used well. In this situation it looks OK; possibly you could try it on RequestDT, but you don't need the records sorted that way, so there may be no gain (a rough sketch of that experiment follows below).
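    If you do want to experiment with the clustered index, this is roughly what it would look like (experiment only; rebuilding the clustered index on a 1.6 million row table takes time and space):

    -- Drop the clustered PK, cluster on RequestDT, then re-add the PK as nonclustered
    ALTER TABLE [dbo].[t_ReportLogSpecials]
        DROP CONSTRAINT [PK_t_ReportLogSpecials]
    GO

    CREATE CLUSTERED INDEX [IX_RequestDT]
        ON [dbo].[t_ReportLogSpecials] ([RequestDT]) ON [PRIMARY]
    GO

    ALTER TABLE [dbo].[t_ReportLogSpecials] WITH NOCHECK ADD
        CONSTRAINT [PK_t_ReportLogSpecials] PRIMARY KEY NONCLUSTERED ([ID]) ON [PRIMARY]
    GO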

    Let's look at the results of the stats update (it doesn't hurt) and the recompile, then go from there.

    One other thing you might want to consider is how often you need to run this - I'd suggest running something like this once a day just after midnight for the previous day and logging the results to a table; then you can do your reporting from the summary table. If that would work, you don't need to spend a lot of time tuning it.

    Andy

  • Hi Andy,

    Thanks for your tips. I did the update statistics and now both the query and the stored proc take exactly the same time.

    I'm looking into indexed views. I've never worked with them. I'll try to create an indexed view and check how it works out; the sketch below is roughly what I plan to try. If that doesn't help I'll have to use the summary table you suggested. If you can give me some minimal tips on handling large data and the design of a summary table for this issue, it would be of immense help.
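    (This is pieced together from the docs, so the view name and grouping columns are just my guesses. From what I've read, COUNT(DISTINCT ...) isn't allowed in an indexed view - only COUNT_BIG(*) and SUM - so it could only pre-aggregate the plain request count, not ucount/visitcount.)

    SET ANSI_NULLS ON
    SET QUOTED_IDENTIFIER ON
    GO

    -- Indexed views need SCHEMABINDING, two-part table names,
    -- COUNT_BIG(*) with GROUP BY, and deterministic expressions
    CREATE VIEW dbo.v_DailyRequestCounts
    WITH SCHEMABINDING
    AS
    SELECT  FSIAdvertiserID,
            ROPAdvertiserID,
            YEAR(RequestDT)  AS RequestYear,
            MONTH(RequestDT) AS RequestMonth,
            DAY(RequestDT)   AS RequestDay,
            COUNT_BIG(*)     AS RequestCount
    FROM    dbo.t_ReportLogSpecials
    GROUP BY FSIAdvertiserID, ROPAdvertiserID,
             YEAR(RequestDT), MONTH(RequestDT), DAY(RequestDT)
    GO

    -- Materialize the view with a unique clustered index on the grouping columns
    CREATE UNIQUE CLUSTERED INDEX IX_v_DailyRequestCounts
        ON dbo.v_DailyRequestCounts
           (FSIAdvertiserID, ROPAdvertiserID, RequestYear, RequestMonth, RequestDay)
    GO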

    thanx again

    mohan

  • Large data isn't necessarily bad or slow. Keep the table narrow, use the smallest data types that will work, and always normalize. Beyond that, decent indexes are all you need. Access time is complicated - it's a combination of how selective your indexes are, how much data is already in cache, how fast the drives are if it isn't cached, etc.

    For the summary table, I'd suggest a table that mirrors the output of the summary query you have now. The only difference is that you'll want to store the whole date (minus any time component) instead of just the day number. I'd recommend making the date unique. Then in your summary job, just have it check for an existing record for that date and delete it if it exists; that will let you re-run the job if needed. Once that is done you can build reports from the summary table - a rough sketch follows below. Think of it as a cheap OLAP solution!
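    Roughly what I mean (the table and column names are just placeholders; the SELECT is your query from the first post, restricted to a single day):

    -- Summary table mirroring the reporting query output, keyed on the whole date
    CREATE TABLE dbo.t_ReportLogSpecialsSummary (
        ReportDate    datetime NOT NULL,
        AdvertiserID  int      NOT NULL,
        RequestCount  int      NOT NULL,
        UCount        int      NOT NULL,
        VisitCount    int      NOT NULL,
        CONSTRAINT PK_t_ReportLogSpecialsSummary
            PRIMARY KEY CLUSTERED (ReportDate, AdvertiserID)
    )
    GO

    -- Nightly job, run just after midnight for the previous day.
    -- Deleting any existing row for that date first makes the job re-runnable.
    DECLARE @Day datetime
    SET @Day = CONVERT(varchar(10), DATEADD(day, -1, GETDATE()), 101)  -- yesterday at midnight

    DELETE FROM dbo.t_ReportLogSpecialsSummary
    WHERE ReportDate = @Day AND AdvertiserID = 246

    INSERT INTO dbo.t_ReportLogSpecialsSummary
            (ReportDate, AdvertiserID, RequestCount, UCount, VisitCount)
    SELECT  @Day,
            246,
            COUNT(*),
            COUNT(DISTINCT CookieID),
            COUNT(DISTINCT ASPSessionID)
    FROM    t_ReportLogSpecials
    WHERE   RequestDT >= @Day AND RequestDT < DATEADD(day, 1, @Day)
            AND (FSIAdvertiserID = 246 OR ROPAdvertiserID = 246)
            AND CookieID <> '' AND ASPSessionID <> ''
    GO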

    Andy

  • I agree with Andy. Large data isn't bad. If the query performs badly, your design might need some rethinking, or you can use Andy's summary option.

    Steve Jones

    steve@dkranch.net
