Single or Compound Clustered Index for Large Fact Table (SQL 2008)

  • [we're actually moving to SQL Server 2008 but this 2005 forum seemed best for this question]

    The question of Clustered indexes comes up a lot, we know we need one, we will be partitioning the tables by date. But the question is how wide to make it for maximum query benefit.

    We're building a dimensional data mart database and web reporting application with very large fact tables for sales, they are all like this one:

    TABLE [dbo].[DAILY_SALES](

    [date_key] [int] NOT NULL, [e.g. 20091108]

    [product_id] [int] NOT NULL,

    [organization_id] [int] NOT NULL,

    [geography_id] [int] NOT NULL,

    [sales_qty] [bigint] NOT NULL,

    [etl_id] [smallint] NOT NULL)

    Most of the MSDN Fast Track documents seem to recommend clustering *only* on the date column or other columns "that are used in query restrictions for the majority of the query workload".

    http://msdn.microsoft.com/en-us/library/dd459178.aspx

    All of our application queries must include both date and product id, with about 75% that would also filter additionally or group by organization and geography, or on other columns of the Product table.

    So we are asking ourselves,

    Which is better:

    Clustered Index on date_key only;

    Clustered Index on date_key AND product_id;

    Clustered Index on all IDs date, product, org. and geo.

    then in addition to one of those clustered indices

    a Covering Indices on any IDs not included in the clustered one?

    Thank you everyone for allowing a revisit to this frequent but important question.

    Any suggestions or leads would be welcome.

    - jh

    Skål - jh

  • I would cluster the date column only and create a non-clustered index on your key columns linked to dimensional tables.

    When you query the data, filter the query based on the date and data attributes in the dimension tables, this will narrow the fact data by the non-clustered index from your dimension and the clustered index on your date.

    Keep in mind that SQL Server 2008 has improvemed the query optimizer to work with querying star-schemas.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Top recommendations for a clustered index - narrow, unique, non-changing (not an issue in a data warehouse), ever-increasing

    Based on what you've said, I'd start with cluster on date_key (the partition key) and use nonclusters for queries, first a NC on ProductID and/or ProductID and Geography and/or ProductID and Organisation with the sales_qty as an include column on all the NC indexes. Then test, make sure that's working well, add, expand or drop the nonclustered indexes as necessary.

    As always, test carefully, make sure it works for you.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Great thoughts thanks everyone! You're helping clarify the next choices.

    much appreciated!

    Skål - jh

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply