consider WHERE, GROUP, or SELECT first when coming up with a nonclustered index?

  • Ok, so I want to create a (potentially covering) index on a table.  Columns from the table (fact_customer_package below) are referenced in all 3 of: the select list, the where clause, the group by clause.

    My question is...  What order should the columns referenced be used to define the index?  should I make the where clause columns the first in the index in the order that they're referenced (plan_star_date_id, entered_by_datetime), then the cust_id from the select?  Does the group by column have any relevance in the index?  That is, can an index help the group by clause?

    Also, will the convert() around fcp.entered_by_datetime foul up the query plan?  Should I create an index with convert(char(8),fcp.entered_by_datetime,1)?  Can I even do that in SQL 2000?

    select fcp.cust_id,

      min(dd1.date) as plan_start_date,

     min(fp.fiscal_period_id) as fiscal_period_id

    from fact_customer_package fcp

    inner join DIM_DATE dd1

     on fcp.plan_start_date_id = dd1.date_id

    inner join dim_date dd3

     on convert(char(8),fcp.entered_by_datetime,1) = convert(char(8),dd3.date,1)

    inner join DIM_FISCAL_PERIOD fp on dd3.year_month = fp.fiscal_period

    group by fcp.cust_id

    THANKS!

  • A Covering Index should be constructed with the Most unique (meaning highest selectivity) column first, followed by next most unique, down to the least unique.

    Yes Using the Convert around your fcp.Entered_by_datetime will "Mess" up your execution plan. you'll have to play with it, but if your tables are large I'd suspect you'll always have performance issues with a structure as this.

  • Thanks Ray, but I'm really looking for a "rule of thumb" when it comes to creating an index when you already have an exact query in mind.

    I understand that higher cardinality columns should come first in an index that might be used by any number of queries, but when there's a specific query in mind, wouldn't the order of the where clause take precedence?  Should the column(s) mentioned in the SELECT list come before the where clause columns or after?

    Thanks

  • Greg, the rule of thumb should be that columns that appear ONLY in the SELECT list are trivial, and should not be the first column(s) in your index.

    If those same columns, or different ones, appear in a JOIN or WHERE clause, or a GROUP BY or ORDER BY, then they have more influence on the query.

    It's difficult to generalize much more than that - it all depends on your data distribution, and as Ray has already stated the most restrictive should be the first in the index definition. However, I personally would tend to give columns in a WHERE clause more significance in the index than those that appear in a JOIN clause, since the ability to eliminate as much redundant data as possible as soon as possible (via a WHERE clause) is something the optimizer will always look for; that way, the subsequent join clauses have less data to manipulate.

  • nice.  Thanks.  that's what I'm looking for Philip. 

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply