Speed Up SELECT DISTINCT Queries

  • Hi,

    I have a table that logs each visits on a web site. I would like to know the number of unique visitors. So I use a select distinct, but for a table of 25 M rows, the query take 3 to 5 seconds. Is there a way to speed that up?

    The table :

    CREATE TABLE [dbo].[tbl_visits](

    [id] [int] IDENTITY(1,1) NOT NULL,

    [startTime] [datetime] NULL CONSTRAINT [DF_tbl_visits_startTime] DEFAULT (getdate()),

    [fk_tbl_ip_address_id] [int] NULL,

    CONSTRAINT [PK_tbl_visits] PRIMARY KEY CLUSTERED

    (

    [id] ASC

    )WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]

    ) ON [PRIMARY]

    CREATE NONCLUSTERED INDEX [idx_ip] ON [dbo].[tbl_visits]

    (

    [startTime] ASC,

    [fk_tbl_ip_address_id] ASC

    ) ON [PRIMARY]

    The query :

    select count(distinct fk_tbl_ip_address_id) from tbl_visits

    and startTime between '2007-01-11 00:00:00' and '2008-01-11 23:59:00'

    The index is used and an seek is made. When I remove the distinct, it runs really fast. I try to think about a table to store each unique visitor, but it's not possible since the number of unique visitors changes depending of the date range (a visitor can be counted once in a day, but 5 times in a week)

    Any idea how to speed that up?

    Thanks for any help

    Stephane

  • Some other ideas:

    it sounds like you have to be able to report down to visits per day so you could

    create a reporting table that gets updated each night for the ip activity by day

    you could also insert / update to it on the fly instead of nightly based on needs

  • you could also insert the non distinct hits to a temp table and aggregate against that as a test.

  • Joel Ewald (1/24/2008)


    you could also insert the non distinct hits to a temp table and aggregate against that as a test.

    Hmm... how would you do that?

  • CREATE TABLE #Test ( fk_tbl_ip_address_id varchar(15) )

    INSERT INTO #Test

    select fk_tbl_ip_address_id

    from tbl_visits

    and startTime between '2007-01-11 00:00:00' and '2008-01-11 23:59:00'

    select count(distinct fk_tbl_ip_address_id ) FROM #Test

    you might even want to create an index against the temp table

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply