Order of columns in Clustered index, performance at insert vs. performance at query

  • I have some problems with determing how i should make the clustered Index on my table. First of all some background information (unfortunately this is pseudocode because i am not allowed to give you the real code, but i think you'll get the idea) :

    CREATE TABLE Items (

    ID int,

    IDParent int,

    Name varchar(100) )

    CREATE TABLE Statistics (

    ID int,

    IDItem int,

    Date smallDateTime,

    clicks int,

    impressions int )

    Every night we put around 100.000 new records in the Statistics table, all with the date of yesterday. So it seems logical to put de Clustered index on Date, IDItem.

    BUT: When we query the data (which is done a lot at daytime, and these are quite havy queries) we execute queries somewhere like this:

    SELECT i.Name,

    , SUM(stat.Clicks)

    , SUM(stat.impressions)

    FROM Statistics stat

    JOIN Items i ON i.ID = stat.IDItem

    WHERE stat.Date BETWEEN '2008-07-01' AND '2008-10-01'

    AND i.IDParent = 123

    GROUP BY i.Name

    The "i.IDParent = 123" narrows the number of items to 10 (of the 10 million), the "stat.Date between..." narrows the statistics to 90 records of 4 years of data.

    So as you can see: The Statistics.IDItem is deffinately the most narrowing column and should be the first in the index. However: all data is inserted with a certain column, and for that reason you want Date to be the first column....

    What can i do best now?? query performance is MUCH more important than the performance of the inserts, so i'm thinking of putting IDItem first. But when i do that my indexes will get fragmented every time i put new data in it.

  • "query performance is MUCH more important than the performance of the inserts, so i'm thinking of putting IDItem first. But when i do that my indexes will get fragmented every time i put new data in it." If you choose this you might consider rebuilding your index on regular bases..

    ------------------------------------------------------------
    Application Architect and Developer @ Traffic4u
    Visit my profile on LinkedIN

  • thnx!! It is indeed an option to drop the index, add the data and build the indexes again.

    However: there are about 100.000.000 rows of statistics, so rebuilding the index will cost more than an hour. And during this time the tables are not accesible, if the insert of data is done at 5AM this won't be a problem, but sometimes the system is done at 8AM, and at this time the customers are busy with using the data....

    What if i set the fillfactor to somewhere around 70?? In that way we can add some data before the index gets fragmented. And if we rebuild de index every weekend it gets back to the 70% filling so it can handle some inserts for the next week. Is this a good idea ?

  • Wandrag (10/23/2008)


    Is the ID field in Stats table unique?

    Rather create the clustered index on the ID field, and a non-clustered index on (IDItem,Date).

    You can re-organize the non-clustered index online daily (if needs be).

    Yes, the ID is unique, and currently this IS the clusterd index

    However: alle queries are done on iditem, date so the clusterd index on ID is not very usefull. Until now we did indeed have an non-clustered index on IDItem, Date. But the queryplan shows that 75% is used for Key lookups (and 75% of 10 seconds is quite a lot). A nonclusterd, covered index would make no sence because in real world there is not only impressions and clicks, but somewhere around 10 columns which are all queried. So the only way to prevent these key lookups is a better clustered index.

  • If you really want to query fast put an clustered index on the columns you want to query on. If you want your clustered index to be nice and healthy and performing the way you want it is nescesary to keep it healthy by keeping it unfragmented by eg. rebuilding it. Playing around with the fillfactor could be a good solution for you, i´d say make some calculations to find the correct fragmentation level.

    Keep in mind that for rebuilding your index online Enterprise Edition is needed. And have you thought about table partitioning? With table partitioning rebuilding your clustered index at night might be a solution. (rebuilding the index won't take up to an hour anymore)

    @Wandrag i disagree with you by putting the clustered index on the column ID. Sander is not quering ID, so a clustered index (on IDItem and Date) is way faster than a normal index.

    ------------------------------------------------------------
    Application Architect and Developer @ Traffic4u
    Visit my profile on LinkedIN

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply