An Indexing Discussion

Question

An Indexing Discussion

Bobby Glover

SSCarpal Tunnel

Points: 4143
More actions
September 21, 2012 at 11:55 am

#268023

Don't have a problem as such apart from a poorly indexed db. Thought I would open up a discussion and get views on the following.
I have a couple of large tables 7m rows each. (NOT DESIGNED BY ME)Each table has a PK Non-Clustered and a variety of other Non-clustered indexes.
Both of these tables do not show up in my missing indexes report from the DMV. The general rule is to have a CLUSTERED PK on all tables especially large ones.
This is my point and lets see what the view is out there.
We should try and avoid SQL going to disk, as it seems all current queries are covered by the indexes on the table. SQL should have no need to go to disk it will pick the data up from the NON-CLUSTERED indexes. However if queries are changed or new queries are added to read from these tables which are not covered by the current indexes they will be forced to retrieve data from the heap and all sorts of scanning will be done.
If the PK is turned to a clustered index this would then be written to all the non-clustered indexes making them larger in size and longer to write to. Also if I now change the PK to clustered will it affect the query optimiser as I have changed the unique index and these are used by the optimiser (even if it doesn't actually read from the index) to find the quickiest way to the data.
Question convert to CLUSTERED PK or Not.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 1

The clustered index does not have to be the primary key. Yes, Microsoft recommends that all tables have a custered index. It is even suggested that this index should be an ever increasing unique value, but it doesn't have to be. Choosing a clustered index should be an exercise in validation of the way the data is used. For instance, it may make more sense to have a clustered index on a date column, even if it isn't unique due to the number of range searches done using that column.

You also need to look at architectual considerations. If you are using logshipping or database mirroring, you may want to be sure that the clustered index is relatively stable with regards to fragmentation as rebuilding the clustered index will have an impact on both of these technologies.

Orlando Colamatteo SSC Guru Points: 182276 More actions · Answer 2

The tables sound like good candidates for havaing their non-clustered PK converted to a clustered PK.

I'd like to expound on a couple points in your post.

If the PK is turned to a clustered index this would then be written to all the non-clustered indexes making them larger in size and longer to write to.

This is not always true. The first part is, i.e. when a table has a clustered index the clustering key is written to the leaf pages of all non-clustered indexes so that if more data than is in the non-clustered index leaf page is needed to satisfy the query the engine can get back to the data row (i.e. the clustered index entry) to retrieve it. However the latter part is not always true. With heaps the leaf pages of the non-clustered index still need a way to get back to the entire data row (in this case a page in the heap) if more data is needed to satisfy the query, but since there is no clustering key SQL Server stores what is called the RID (Row Identifier) from the row in the heap in the leaf page of the non-clustered index. RIDs are actually 8 bytes so if your PK is less than 8 bytes then you would actually have a space savings and not a space increase in your non-clustered indexes.

Also if I now change the PK to clustered will it affect the query optimiser as I have changed the unique index and these are used by the optimiser (even if it doesn't actually read from the index) to find the quickiest way to the data.

Any query plans that reference the table will be negated by the schema change. This means queries against these tables will need new plans to be compiled the next time they are executed. On a busy system lots of recompiles can hurt until all commonly used plans are again cached. SQL Server will definitely use some different query plan operators (e.g. RID Lookups become Bookmark Lookups, Table Scans become Clustered Index Scans, etc.) after the table has had its PK converted to a clustered PK although I would not expect there to be a drastic change in performance one way or the other for queries that were already covered by the set of indexes prior to the change.

One other benefit of switching to a clustered index will be that you no longer need to worry about forwarded records. In a heap, when an existing row is updated and that row no longer fits on the page where it lives a forwarding record is created to hold that row and a pointer to that new record is placed in the old page. If the same record is updated again later and the same process occurs again you now have to visit 3 pages to get to that row. Long chains of forwarded record pointers can affect performance and there is no great way to alleviate them. In SQL 2008 you would use ALTER TABLE REBUILD but if you find yourself wanting to do that to a heap then you should be considering adding a clustered index.

Run this during off-hours or a maintenance window as it can impact performance and see the forwarded_record_count column:

SELECT *

FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.YourHeap'), NULL, NULL, 'DETAILED');

If you decide to add the clustered index I would recommend doing it in this order:

- drop all non-clustered indexes and the PK

- add the PK as clustered

- re-add the rest of the non-clustered indexes

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 3

opc.three (9/23/2012)
In a heap, when an existing row is updated and that row no longer fits on the page where it lives a forwarding record is created to hold that row and a pointer to that new record is placed in the old page. If the same record is updated again later and the same process occurs again you now have to visit 3 pages to get to that row. Long chains of forwarded record pointers can affect performance and there is no great way to alleviate them.

Just need to correct one point...

There will never be chains of forwarding pointers. If a forwarded row moves again, the original forwarding pointer is updated to point to the new location, there's not a second or third ... level of redirection. Forwarding pointers are bad enough already without multiple levels of forwarding.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Orlando Colamatteo SSC Guru Points: 182276 More actions · Answer 4

GilaMonster (9/23/2012)
opc.three (9/23/2012)
In a heap, when an existing row is updated and that row no longer fits on the page where it lives a forwarding record is created to hold that row and a pointer to that new record is placed in the old page. If the same record is updated again later and the same process occurs again you now have to visit 3 pages to get to that row. Long chains of forwarded record pointers can affect performance and there is no great way to alleviate them.
Just need to correct one point...
There will never be chains of forwarding pointers. If a forwarded row moves again, the original forwarding pointer is updated to point to the new location, there's not a second or third ... level of redirection. Forwarding pointers are bad enough already without multiple levels of forwarding.

Thanks for the pointing that out. It's good to know it doesn't lengthen the chain. All the docs I have read thus far would simply say something to the effect of "new record needed, pointer is added." I just found this post that plainly says that in the beginning but unquivocally confirms what you said later:

Forwarding and forwarded records, and the back-pointer size by Paul Randal

In a heap it is possible to get forwarding and forwarded records. They occur when a record in a heap expands such that it no longer fits on the page it currently resides on. In this case, the record is moved to a new page, and a small forwarding record is left in the original location. The forwarding record points to the new location of the record, which is known as a forwarded record. This is done as a performance optimization so that all the nonclustered indexes on the heap do not have to be altered with the new location of the heap record.

The above makes it sound like chains can result in the case of a second split. Then later:

Ok, really back to the point of the post. What happens if the original record grows again and has to move again? Does it leave ANOTHER forwarding record when it moves to the second new location - creating a chain of forwarding records?
The answer is no. The *original* forwarding record is updated with the new location of the forwarded record. This can only be done if the forwarded record points *back* to the forwarding record - which it does.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Paul White SSC Guru Points: 150467 More actions · Answer 5

Bobby Glover (9/21/2012)
I have a couple of large tables 7m rows each. Each table has a PK Non-Clustered and a variety of other Non-clustered indexes [...] Question convert to CLUSTERED PK or Not.

It's really impossible to say based on the information given. It depends on the width of the table and existing indexes, whether the existing PK is a good candidate for clustering (narrow, stable, monotonically increasing), the typical access patterns (singleton lookups versus range scans), level and type of update/delete activity and more.

Do you have a problem right now that you are looking to solve, or do you just wonder if making the clustered table would improve something (and what might that something be)?

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi