Cluster That Index!

Question

Cluster That Index!

Chris Hedgate

One Orange Chip

Points: 25041
More actions
March 30, 2003 at 12:00 am

#159784

Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/chedgate/clusterthatindex.asp
--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

virasol SSC-Addicted Points: 488 More actions · Answer 1

Thanks for the great article.

Btw: the link to script 1 is broken, there is a space in '.t xt' and the URL for script2.txt does not appear to exist... probably because script2 seems to be included at the bottom of script1.txt

Edited by - virasol on 03/30/2003 3:49:51 PM

Edited by - virasol on 03/30/2003 3:51:00 PM

Chris Hedgate One Orange Chip Points: 25041 More actions · Answer 2

quote:
Thanks for the great article.
Btw: the link to script 1 is broken, there is a space in '.t xt' and the URL for script2.txt does not appear to exist... probably because script2 seems to be included at the bottom of script1.txt

Thanks. I've notified Andy about the broken links, hopefully they'll be fixed later today.

--

Chris Hedgate @ Apptus Technologies (http://www.apptus.se)

http://www.sql.nu

--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/

sushila SSC-Dedicated Points: 35293 More actions · Answer 3

Christoffer,

I really enjoyed reading this article - it was very well written, everything explained in simple but comprehensive terms - by the end of it all I could almost hear the grunts and groans of a non-clustered index!

However, I do have one question about the suggestion of dropping an ID column as clustered so we can actually use a column that will come in more handy and make sense to use as clustered - generally speaking - indexes are known to improve performance - but at the same time, if we have too many indexes on a table, it would also slow it down as each index means more overheads on inserts, updates and deletes! How does this fit in with the clustered/non-clustered pros and cons ?!

**ASCII stupid question, get a stupid ANSI !!!**

jgoodwin SSChasing Mays Points: 648 More actions · Answer 4

Great article. One disadvantage to a clustered index that I hear a lot of people claim is disk space. How much disk space does a clustered index actually take up compared to the table it is indexing?

malman SSC Rookie Points: 40 More actions · Answer 5

A very useful article. I would just like to comment on the area of Page splits that was discussed. To get around the page split problem to a degree use the FILL FACTOR and PAD INDEX to set the initial level of data in the pages or when rebuilding. Levels are from 0-100%. Depending on whether the table is heavy on reads or writes as to the% of the page that is initially filled.

I was unaware that heaps could not be fragmented and the explanation on how SQL Server uses forward pointers was very interesting. Thanks for the article

Mark

Chris Hedgate One Orange Chip Points: 25041 More actions · Answer 6

quote:
I really enjoyed reading this article - it was very well written, everything explained in simple but comprehensive terms - by the end of it all I could almost hear the grunts and groans of a non-clustered index!

Thanks. Note however that I am not saying that non-clustered indexes are bad in any way, just that I think you should always have a clustered index. Plus, if you need more, some non-clustered indexes.

quote:
However, I do have one question about the suggestion of dropping an ID column as clustered so we can actually use a column that will come in more handy and make sense to use as clustered - generally speaking - indexes are known to improve performance - but at the same time, if we have too many indexes on a table, it would also slow it down as each index means more overheads on inserts, updates and deletes! How does this fit in with the clustered/non-clustered pros and cons ?!

Well, if you have a table with say one incrementing int column 'id' and one varchar column 'name', then you are probably going to need an index on both these columns, right? The id column is primary key so it will be indexed anyway, and the name is probably a column that you will use as the search argument in where clauses, so you'll need an index for that as well.

In general I would recommend that if you're just going to have one index on a table, make it a clustered one. As I said, although a clustered index really shows it's strength in range queries, it still like a non-clustered index for 'normal' queries. So it won't hurt performance to have a clustered index as opposed to a non-clustered index as the only index on a table. The problems of not having a clustered index that I described should also show that you should choose a clustered index.

--

Chris Hedgate @ Apptus Technologies (http://www.apptus.se)

http://www.sql.nu

--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/

Chris Hedgate One Orange Chip Points: 25041 More actions · Answer 7

quote:
Great article. One disadvantage to a clustered index that I hear a lot of people claim is disk space. How much disk space does a clustered index actually take up compared to the table it is indexing?

Thanks for the compliments and for commenting. First of all, I'd say that if I had to choose between extra disk space or the problems described in the article, I'd choose disk space any time. Now, to answer your question, I'd say that extra disk space for a clustered index is normally not going to be that much of an issue.

Remember, the leaves are the actual data, even if you drop the clustered index these pages will still be (more or less) the same. So it it only the levels of the index tree above the leaves that are interesting. How much space they use of course depends on the data in your table, how much data you have and how wide your index key is. Since you need one index row for every data page (i.e. the leaves of the clustered index), this means that to determine the number of index pages at the level above the leaf level you can use this formula:

number of pages devided by (8096 bytes per index page divided by the index key row size)

This gives us the number of index pages at the level above the leaf level. You can then take this number of index pages and use that as the first part of the formula (number of pages) to determine the number of index pages needed at the next level. When you end up with just a single page you've found the root level.

In Inside SQL Server 2000 Kalen Delaney has an example of a table with 10.000 data pages (remember each page contains 8KB of data), with a clustered index on a fixed-length character column of 5 bytes. With overhead this makes each index key row 12 bytes.

10000 / (8096 / 12) = 15

This means that at the level above the leaves 15 index pages are necessary. The index rows for these 15 pages can then be contained on a single page at the level above, so that is the root page. So, the extra size for this table with a clustered index on the char(5) column is 16/10000, i.e. less than 1%. This number (1% extra size for a clustered index) is normally a good estimate, normally even a bit high as shown in the example. As always you must remember to keep index key size as small as possible of course.

PS. Note that fillfactor and padindex configurations are not taken into account here. DS.

--

Chris Hedgate @ Apptus Technologies (http://www.apptus.se)

http://www.sql.nu

--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/

Chris Hedgate One Orange Chip Points: 25041 More actions · Answer 8

quote:
A very useful article. I would just like to comment on the area of Page splits that was discussed. To get around the page split problem to a degree use the FILL FACTOR and PAD INDEX to set the initial level of data in the pages or when rebuilding. Levels are from 0-100%. Depending on whether the table is heavy on reads or writes as to the% of the page that is initially filled.

Thanks Mark, good comments on fillfactor and padindex. I explicitly didn't go into them in the article as it would have taken too much focus from the article. They could of course be discussed in a follow-up article.

--

Chris Hedgate @ Apptus Technologies (http://www.apptus.se)

http://www.sql.nu

--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/

Razvan Socol SSCarpal Tunnel Points: 4968 More actions · Answer 9

quote:
Great article. One disadvantage to a clustered index that I hear a lot of people claim is disk space. How much disk space does a clustered index actually take up compared to the table it is indexing?

Jgoodwin,

from Chris Hedgate's response we should notice that the extra space needed for the clustered index is highly dependent on the key size. For the table with 10000 data pages, if the key size was 5, only 16 pages (0.2%) were needed; but if the key size were 1000, it would take about 1430 pages for the index (that's 15% more space): 8096/1007=8, 10000/8 + 10000/8/8 + 10000/8/8/8 + ... + 1 = 1430. And this is only for the clustered index itself, but the other non-clustered indexes will need extra space too.

Conclusion: you SHOULD use a clustered index, but the key for the index must be small (less than 100 bytes, ideally 4 or 8 bytes).

Razvan

Mike Dougherty Ten Centuries Points: 1112 More actions · Answer 10

I usually change SQL2000 default clustered index on the primary key to the foreign key of the parent table. We are usually selecting children from a parent-child relationship. By having the child clustered on the parentID, the matching child records for a given parent are located sequentially on the index page. This makes a huge difference if the children are inserted far apart in time (where they would otherwise have ended up on many different index pages) We have not been concerned about page splits since inserts are done during nightly batch processing while lookups are done in real-time throughout the day. A small extra overhead during an unattended process to secure a faster user experience is an obvious tradeoff.

I agree with the author: always have a clustered index. The real decision is which index should be clustered.