Very large tables w/o PKs, clustered indexes

  • Hi.

    My client has a very large table without a PK and without a clustered index (CI).  The only thing they've got in there are six nonclustered indexes (NCI) on searched fields.  Needless to say, performance is substandard.

    The table has the same structure as the data file it is populated with.  They drop the NCIs, import the data, and then rebuild the NCIs.

    To (potentially dramatically) increase performance, my current thought is to merely add an IDENTITY(1,1) PK as and use this column as the clustered index.

    Any thoughts on this?  Just add that column as PK/CI?  There are three existing fields that taken as a composite index could be used as a unique CI, but it would be a lot wider than an INT PK/CI. 

    When implementing, would it be better to create a new table and move the data over (and then build the indexes) or just add the column to the existing table and create the CI and then rebuild all the NCIs?

    Thoughts?  (and thanks)

  • What uniquely identifies the rows??

  • If you are adding an IDENTITY field to an existing table this suggests that the table doesn't already have a primary key and doesn't link to anything. It may be linked to, but not from.

    I wouldn't waste a clustered index on a false identity in this case.

    Their import procedure suggests static data so why not put your clustered index on a field which is commonly used in the WHERE clause of queries?

    For example, if there is some sort of entry date field such as OrderDate then this may be a suitable candidate simply because most queries will have some form of date range qualifier and they will be in natural sequence order.

    Clustered indexes are brilliant at range queries.

  • Unfortunately, there's nothing that uniquely identifies the rows within the current data which is why I was thinking of adding a PK/CI.  It's phone-call data: so originally I was thinking that I could use the number-called and a datestamp as a unique composite CI, but the date portion is just a datestamp, not a true timestamp.

    Would it be worth doing a nonunique composite clustered index on datestamp, number-called (in that order)?  Or just add the PK/CI?  Use just the datestamp?

    I think their NCIs are there for searches but again, there's nothing unique about the data within these NCI columns.

  • If you want to increase performance, creating an index that will never be used makes no sense.

    I would replace one of the non clustered indexes by a clustered index. But again, if there are a lot of updates on the table, sometimes a non clustered index can be better. If the table is (more or less) read-only, you could use a fillfactor of 90 up to 100 %, ...

    I would first start a profiler trace to identify the type of queries that use this table. You can only try to improve performance by adding indexes if you know how the data is used ....

     

    Bert

  • That makes sense.

    I think I'll convert the NCIs on the call-date and subscriber-number into a composite CI on call-date, subscriber-number.  And since the data is read-only (and the indexes are dropped/recreated between data loads), a fill factor of 100% makes sense.

    This'll store the data in more or less chronological order by month.

    Thanks!

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply