Performance problem with SQL queries having a specific column mentioned in WHERE or GROUP BY clause.

  • Performance problem with SQL queries having a specific column mentioned in WHERE or GROUP BY clause.

    **Table structure:**

    CREATE TABLE [dbo].[RPT_LOG_SEARCH_WEB_T](

    [SEARCH_DATE] [smalldatetime] NULL,

    127.0.0.1 [nvarchar](20) NULL,

    [BAR_ID] [uniqueidentifier] NULL,

    [QUERY] [nvarchar](2000) NULL,

    [QUERY_STRING] [nvarchar](3000) NULL,

    [SRC] [nvarchar](6) NULL,

    [HTTP_STATUS] [int] NULL,

    [CS_USER_AGENT] [nvarchar](4000) NULL,

    [COUNTRY] [nvarchar](6) NULL,

    [PID] [nvarchar](6) NULL,

    [TEST_ID] [nvarchar](6) NULL,

    [TEST_GROUP_ID] [nvarchar](500) NULL,

    [S_COMPUTERNAME] [nvarchar](50) NULL,

    [BROWSER_TYPE] [nvarchar](15) NULL,

    [BROWSER_TYPE_ID] [smallint] NULL,

    [BROWSER_VERSION] [varchar](10) NULL

    ) ON [RPT_LOG_SEARCH_WEB_PS]([SEARCH_DATE])

    GO

    SET ANSI_PADDING OFF

    GO

    /****** Object: Index [RPT_LOG_SEARCH_WEB_CI] Script Date: 11/05/2009 18:19:33 ******/

    CREATE CLUSTERED INDEX [RPT_LOG_SEARCH_WEB_CI] ON [dbo].[RPT_LOG_SEARCH_WEB_T]

    (

    [SEARCH_DATE] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [RPT_LOG_SEARCH_WEB_PS]([SEARCH_DATE])

    **Row Count:**

    4,137,403,549

    **The following query runs fine:**

    select query, count(*) from dbo.RPT_LOG_SEARCH_WEB_T

    where SEARCH_DATE > '20091105'

    group by query

    **The following query RUNS FOREVER:**

    select ip, count(*) from dbo.RPT_LOG_SEARCH_WEB_T

    where SEARCH_DATE > '20091105'

    group by ip

    Any help on the subject will be appretiated...

  • Is that 4 billion records?!

    Without having a clue about the data I would think that [Query] column would have a smaller number of distinct values (how different can be [Query] values from one another?). Hence there is more "counting" work to do with 127.0.0.1 values. You may need an index on 127.0.0.1 field to help you with the second query.

  • I'll agree with Al-279884 and add that a compare of the execution plans between the 2 queries may shed some light on the subject.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • I tried to get an execution plan for the problematic query.

    However, once the "IP" field is mentioned in the query I can not get an execution plan for the query as well.

  • Yes - 4 billion records. One of our smaller tables :-D.

    Actually, the number of distinct values for "IP" is much lower than the one for "QUERY"

  • daniell-1020276 (11/5/2009)


    I tried to get an execution plan for the problematic query.

    However, once the "IP" field is mentioned in the query I can not get an execution plan for the query as well.

    SSMS outputs the execution plan after the query completes, but if you use Profiler to capture the query, it will display the execution plan first, then process the query.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • daniell-1020276 (11/5/2009)


    I tried to get an execution plan for the problematic query.

    However, once the "IP" field is mentioned in the query I can not get an execution plan for the query as well.

    Try to get Estimated Execution Plan rather than Actual Execution Plan (if it isn't what you tried already).

    Also 4 billion records with some wide nvarchar fields must be some serious volume of data. Since these seem to be logging/reporting tables, have you considered partitioning the data?

  • Al-279884 (11/5/2009)


    daniell-1020276 (11/5/2009)


    I tried to get an execution plan for the problematic query.

    However, once the "IP" field is mentioned in the query I can not get an execution plan for the query as well.

    Try to get Estimated Execution Plan rather than Actual Execution Plan (if it isn't what you tried already).

    Also 4 billion records with some wide nvarchar fields must be some serious volume of data. Since these seem to be logging/reporting tables, have you considered partitioning the data?

    I'll try to get an Estimated Execution Plan.

    The table is partitioned by SEARCH_DATE.

  • Also You can try dropping the index and check whether the table scan is faster.If its faster with 4 billion records then it will surely be faster with 4 billion + records considering that You might insert more records into the table in the future.

  • The only index on the table is the clustered index. I would not drop the clustered index on a table w/ 4 billion records just to see what happens. Check the query plan before you do anything with your indexing.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply