Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


An Indexing Discussion


An Indexing Discussion

Author
Message
Bobby Glover
Bobby Glover
SSC-Enthusiastic
SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)SSC-Enthusiastic (185 reputation)

Group: General Forum Members
Points: 185 Visits: 751
Don't have a problem as such apart from a poorly indexed db. Thought I would open up a discussion and get views on the following.

I have a couple of large tables 7m rows each. (NOT DESIGNED BY ME)Each table has a PK Non-Clustered and a variety of other Non-clustered indexes.

Both of these tables do not show up in my missing indexes report from the DMV. The general rule is to have a CLUSTERED PK on all tables especially large ones.

This is my point and lets see what the view is out there.

We should try and avoid SQL going to disk, as it seems all current queries are covered by the indexes on the table. SQL should have no need to go to disk it will pick the data up from the NON-CLUSTERED indexes. However if queries are changed or new queries are added to read from these tables which are not covered by the current indexes they will be forced to retrieve data from the heap and all sorts of scanning will be done.

If the PK is turned to a clustered index this would then be written to all the non-clustered indexes making them larger in size and longer to write to. Also if I now change the PK to clustered will it affect the query optimiser as I have changed the unique index and these are used by the optimiser (even if it doesn't actually read from the index) to find the quickiest way to the data.

Question convert to CLUSTERED PK or Not.
Lynn Pettis
Lynn Pettis
SSC-Insane
SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)

Group: General Forum Members
Points: 24280 Visits: 37987
The clustered index does not have to be the primary key. Yes, Microsoft recommends that all tables have a custered index. It is even suggested that this index should be an ever increasing unique value, but it doesn't have to be. Choosing a clustered index should be an exercise in validation of the way the data is used. For instance, it may make more sense to have a clustered index on a date column, even if it isn't unique due to the number of range searches done using that column.

You also need to look at architectual considerations. If you are using logshipping or database mirroring, you may want to be sure that the clustered index is relatively stable with regards to fragmentation as rebuilding the clustered index will have an impact on both of these technologies.

Cool
Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Orlando Colamatteo
Orlando Colamatteo
SSCrazy Eights
SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)

Group: General Forum Members
Points: 8287 Visits: 14368
The tables sound like good candidates for havaing their non-clustered PK converted to a clustered PK.

I'd like to expound on a couple points in your post.

If the PK is turned to a clustered index this would then be written to all the non-clustered indexes making them larger in size and longer to write to.

This is not always true. The first part is, i.e. when a table has a clustered index the clustering key is written to the leaf pages of all non-clustered indexes so that if more data than is in the non-clustered index leaf page is needed to satisfy the query the engine can get back to the data row (i.e. the clustered index entry) to retrieve it. However the latter part is not always true. With heaps the leaf pages of the non-clustered index still need a way to get back to the entire data row (in this case a page in the heap) if more data is needed to satisfy the query, but since there is no clustering key SQL Server stores what is called the RID (Row Identifier) from the row in the heap in the leaf page of the non-clustered index. RIDs are actually 8 bytes so if your PK is less than 8 bytes then you would actually have a space savings and not a space increase in your non-clustered indexes.

Also if I now change the PK to clustered will it affect the query optimiser as I have changed the unique index and these are used by the optimiser (even if it doesn't actually read from the index) to find the quickiest way to the data.

Any query plans that reference the table will be negated by the schema change. This means queries against these tables will need new plans to be compiled the next time they are executed. On a busy system lots of recompiles can hurt until all commonly used plans are again cached. SQL Server will definitely use some different query plan operators (e.g. RID Lookups become Bookmark Lookups, Table Scans become Clustered Index Scans, etc.) after the table has had its PK converted to a clustered PK although I would not expect there to be a drastic change in performance one way or the other for queries that were already covered by the set of indexes prior to the change.

One other benefit of switching to a clustered index will be that you no longer need to worry about forwarded records. In a heap, when an existing row is updated and that row no longer fits on the page where it lives a forwarding record is created to hold that row and a pointer to that new record is placed in the old page. If the same record is updated again later and the same process occurs again you now have to visit 3 pages to get to that row. Long chains of forwarded record pointers can affect performance and there is no great way to alleviate them. In SQL 2008 you would use ALTER TABLE REBUILD but if you find yourself wanting to do that to a heap then you should be considering adding a clustered index.

Run this during off-hours or a maintenance window as it can impact performance and see the forwarded_record_count column:

SELECT  *
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.YourHeap'), NULL, NULL, 'DETAILED');



If you decide to add the clustered index I would recommend doing it in this order:

- drop all non-clustered indexes and the PK
- add the PK as clustered
- re-add the rest of the non-clustered indexes

__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
GilaMonster
GilaMonster
SSC-Forever
SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)

Group: General Forum Members
Points: 47415 Visits: 44405
opc.three (9/23/2012)
In a heap, when an existing row is updated and that row no longer fits on the page where it lives a forwarding record is created to hold that row and a pointer to that new record is placed in the old page. If the same record is updated again later and the same process occurs again you now have to visit 3 pages to get to that row. Long chains of forwarded record pointers can affect performance and there is no great way to alleviate them.


Just need to correct one point...

There will never be chains of forwarding pointers. If a forwarded row moves again, the original forwarding pointer is updated to point to the new location, there's not a second or third ... level of redirection. Forwarding pointers are bad enough already without multiple levels of forwarding.


Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass


Orlando Colamatteo
Orlando Colamatteo
SSCrazy Eights
SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)

Group: General Forum Members
Points: 8287 Visits: 14368
GilaMonster (9/23/2012)
opc.three (9/23/2012)
In a heap, when an existing row is updated and that row no longer fits on the page where it lives a forwarding record is created to hold that row and a pointer to that new record is placed in the old page. If the same record is updated again later and the same process occurs again you now have to visit 3 pages to get to that row. Long chains of forwarded record pointers can affect performance and there is no great way to alleviate them.


Just need to correct one point...

There will never be chains of forwarding pointers. If a forwarded row moves again, the original forwarding pointer is updated to point to the new location, there's not a second or third ... level of redirection. Forwarding pointers are bad enough already without multiple levels of forwarding.

Thanks for the pointing that out. It's good to know it doesn't lengthen the chain. All the docs I have read thus far would simply say something to the effect of "new record needed, pointer is added." I just found this post that plainly says that in the beginning but unquivocally confirms what you said later:

Forwarding and forwarded records, and the back-pointer size by Paul Randal

In a heap it is possible to get forwarding and forwarded records. They occur when a record in a heap expands such that it no longer fits on the page it currently resides on. In this case, the record is moved to a new page, and a small forwarding record is left in the original location. The forwarding record points to the new location of the record, which is known as a forwarded record. This is done as a performance optimization so that all the nonclustered indexes on the heap do not have to be altered with the new location of the heap record.


The above makes it sound like chains can result in the case of a second split. Then later:

Ok, really back to the point of the post. What happens if the original record grows again and has to move again? Does it leave ANOTHER forwarding record when it moves to the second new location - creating a chain of forwarding records?

The answer is no. The *original* forwarding record is updated with the new location of the forwarded record. This can only be done if the forwarded record points *back* to the forwarding record - which it does.


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Paul White
Paul White
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10352 Visits: 11350
Bobby Glover (9/21/2012)
I have a couple of large tables 7m rows each. Each table has a PK Non-Clustered and a variety of other Non-clustered indexes [...] Question convert to CLUSTERED PK or Not.

It's really impossible to say based on the information given. It depends on the width of the table and existing indexes, whether the existing PK is a good candidate for clustering (narrow, stable, monotonically increasing), the typical access patterns (singleton lookups versus range scans), level and type of update/delete activity and more.

Do you have a problem right now that you are looking to solve, or do you just wonder if making the clustered table would improve something (and what might that something be)?



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search