Can it help to index a field that is part of a multi-field primary key?

  • I have a table with a primary key consisting of a document name string (normally fairly short) and an integer (for uniqueness).

    When I populate the document I want to pull out all the records relating to that document (often 50 or so) and work with them. I am thinking that the process would work faster if I also defined a non-unique index on the document name.

    Would this actually help or is it sufficient that a “covering” primary key already exists?

    Here is the current definition of the table (call it Links):

    CREATE TABLE dbo.Links(

    TemplateName nvarchar(30) NOT NULL,-- name of document template

    FieldSeq smallint NOT NULL,-- seq. no. of field in document,

    -- fairly arbitrary

    FieldString nvarchar(255) NULL,-- text of SQL expression to generate

    -- the value for the field

    DocInsertionPoint nvarchar(30) NULL,-- name of bookmark/cell in document

    -- where value will be placed

    CONSTRAINT PK_Links PRIMARY KEY NONCLUSTERED

    (

    TemplateName ASC,

    FieldSeq ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON PRIMARY

    ) ON PRIMARY

    GO

    Many thanks in advance for enlightenment.

    MarkD

  • You have defined the "TemplateName" column as the first in the composite key (clustered index), so the clustered index will be used if you use the "TemplateName" in the WHERE clause. An additional non-clustered index on "TemplateName" will not be used because it is a duplicate of the current clustered index.

    ** Don't mistake the ‘stupidity of the crowd’ for the ‘wisdom of the group’! **
  • HanShi (10/25/2016)


    You have defined the "TemplateName" column as the first in the composite key (clustered index), so the clustered index will be used if you use the "TemplateName" in the WHERE clause. An additional non-clustered index on "TemplateName" will not be used because it is a duplicate of the current clustered index.

    Ah, but the primary key is non-clustered!

    Should I make it clustered, as I think you imply, especially as the table will be almost exclusively read-only?

    Sorry if I am missing something elementary - I'm pretty new to this.

    MarkD

  • Mark Dalley (10/25/2016)


    HanShi (10/25/2016)


    You have defined the "TemplateName" column as the first in the composite key (clustered index), so the clustered index will be used if you use the "TemplateName" in the WHERE clause. An additional non-clustered index on "TemplateName" will not be used because it is a duplicate of the current clustered index.

    Ah, but the primary key is non-clustered!

    Oh sorry, I overlooked. But it doesn't really matter unless the new non-clustered index will be a covering index for the query and the current primary key isn't.

    To be sure you can look at the query plan before and after the creation of the index you want to add.

    1. execute a (many used) query on the table and generate a query plan

    2. create the additional non-clustered index

    3. re-execute the query from step 1 and generate a query plan

    4. compare the query plan from step 1 and 3 to see if the index usage has changed

    ** Don't mistake the ‘stupidity of the crowd’ for the ‘wisdom of the group’! **
  • Mark, I don't think you should make it clustered.

    Your table, as it is now, is a heap because you don't have a clustered index. As such, you rows are in no particular order. When you create a clustered index, that's the order of the rows in the leaf level, so it's the order of rows in the table. If your clustered index is by TemplateName, any new rows will have to be inserted to be in proper order, which means it could have to go in between 2 existing rows. If the page where those two rows are stored is full, then the page will have to be split to make room for the new row. An age-old analogy is the phone book - if the page is already full and you want to add a new name in the middle of it, you wouldn't shift the rest of the phone book down one row; you'd split the content of the page so half of it stays in place and half of it goes to a new page. The page split takes time and is fully logged, so you don't want to encourage them.

    Also, each NCI (nonclustered index) you create inherits the bytes of the clustering key for each row - it's the pointer to the physical row. With an nvarchar(30), you have 62 bytes - 2 * 30 Unicode characters plus 2 for the length of the string. Add the 2 bytes of the smallint and your clustering key is now 64 bytes wide. This is, in my opinion, way too wide for a clustering key. I prefer integers for my clustering keys - 4 bytes. I've heard the argument that disk space is cheap, but your SAN admin might disagree with you. Skipping the cost of disk space, however, look at the number of reads required to read 1M rows.

    Hope this helps.

  • Thanks Ed, that makes it a good deal clearer.

    The thought behind my original question was, in effect: Where a primary key contains multiple fields, can I use it (or its index) to quickly access those records where the first component field is the same? The answer, fairly clearly from what you and HanShi have said, is Yes, so there is no point in having an extra index for just that field.

    Re the clustered vs. non-clustered aspect though, because (a) I will want to quickly pull out all the records relating to a given document at one time; and (b) this will happen quickest when the records are in the correct physical order; and (c) given your answers, I don't expect to need any more indexes; and finally (d) the table, once populated, will rarely be updated... I am inclined to go for a clustered index. Does this make sense?

    Feel free to shoot holes in any mistakes in my reasoning.

    MarkD

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply