Introduction to Indexes: Part 2 – The clustered index

Question

Post reply

Introduction to Indexes: Part 2 – The clustered index

Viewing 15 posts - 31 through 45 (of 123 total)

You must be logged in to reply to this topic. Login to reply

bob 54859 SSC Enthusiast Points: 108 More actions · Answer 1

Ok,

So... here is a question we have debated in our shop for quite a while.

The question is...

Given a table where they is no good candidate for a clustered index, for example we use client generated GUIDs for primary keys and most of the data does not meet the for criteria you outline above... Which approach should we take:

1. Keep the table as a heap.

2. Create an identity field on the table which is used for no other reason that to create a clustered index upon it.

I believe Kim Tripp recommended number 2 in a podcast iirc. however, we have a dev here who is adamant that this is a mistake.

I'm am scared that the answer will be "it depends".

HELP!

Thanks,

BOb

hlam-1032421 SSC Rookie Points: 42 More actions · Answer 2

Some considerations:

1.Size of table and growth rate.

2.Primary needed for speed of replication?

3.OLTP vs. Data Warehouse like

4.3rd normal form vs. Demoralization?

5.Contention with other servers and process?

Sometimes you may want to ask:

Why and when do I need to swing this thing?

Instead of

How do I swing this thing?

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 3

You really do need to know your data access patterns in order to be able to answer the question you pose. One big question is how many forwarded records are there. If lots, then a surrogate CI could be helpful.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Nicole Bowman SSCommitted Points: 1555 More actions · Answer 4

Thanks for another very clear article Gail. I am looking forward to part 3!

Cheers,

Nicole Bowman

Nothing is forever.

bob 54859 SSC Enthusiast Points: 108 More actions · Answer 5

hlam-1032421 (11/11/2009)
Some considerations:
1.Size of table and growth rate.
2.Primary needed for speed of replication?
3.OLTP vs. Data Warehouse like
4.3rd normal form vs. Demoralization?
5.Contention with other servers and process?
Sometimes you may want to ask:
Why and when do I need to swing this thing?
Instead of
How do I swing this thing?

So, in my case we are talking about a large table which is constantly growing. For example the GL transaction table in a financial app. Insert perf is very important, but so is select perf. This is an OLTP app with no replication. It is mostly normalized (till it hurts) and denormalized (till it works).

I agree, asking WHY and When to be able to apply to certain situations.

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 6

You made a very important piece of information public there: GL transactions. In my experience with financial data GL rows are almost never updated/deleted. That means record forwarding is not an issue. Also most of the access is point lookups so NC indexes against a heap are fine. The one possibility could be if you do extensive reporting by date period in which case you may consider transaction date as the clustered index - but I bet this would not be best unless you do a lot of reporting by date range. And if you do you may want a datamart for reporting anyway (if you don't have one already).

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

swain_sabita Grasshopper Points: 11 More actions · Answer 7

<a href="http://www.atoncer.com/trading-cards/ccg/yugioh.htm"> yugioh cards </a>

<a href="http://www.atoncer.com/trading-cards/ccg/ccg.htm"> ccg collecting </a>

<a href="http://www.atoncer.com/comics/platinum-age/main.htm"> platinum age comics </a>

<a href="http://www.atoncer.com/comics/magazines/main.htm"> comic magazines </a>

<a href="http://www.atoncer.com/comics/comic-art/main.htm"> comic art collecting </a>

Anipaul SSC-Insane Points: 24681 More actions · Answer 8

Very well explained. Very few articles I found that explained the basics so well. This articles really make this site unique...

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 9

hlam-1032421 (11/11/2009)
With comments like this I run for the exit:
… good clustering. One school says to put the clustered index on the column or set of columns that would be most useful for queries, either frequently run queries or ones doing queries of large ranges of data. The other school says to use the clustered index primarily to organise the table and leave data access to the nonclustered indexes.I hold to the second school…

Cheers. Nice meeting you. 😉

The difference between hierarchy database and relational database should tell you why no one should be in that school #2.

Are you referring to the relational requirement that the user doesn't need to concern themselves with storage details? That may well be fine in theory, but unfortunately reality isn't that clean. There is a physical layout to a table and there are good and bad ways to do that.

Since multiple nonclustered indexes can be created, I'd rather use those (mostly) for queries and select the cluster so that my nonclustered indexes aren't wider than they need to be, that rows aren't moving around, that pages aren't splitting all over the place.

If you hold to the other view (that the cluster should be selected for data access), why don't you write an article discussing (and contrasting) the two views? It would be an interesting read.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 10

TheSQLGuru (11/11/2009)
You made a very important piece of information public there: GL transactions. In my experience with financial data GL rows are almost never updated/deleted. That means record forwarding is not an issue.

If that is the case, the heap may well be fine. What I'd suggest (if you have the time and space) is to mock up this table (and related tables) on a dev box, do some tests of insert and select performance with it as a heap (representative of the workload in your production environment), then add the identity column, make it the cluster and repeat said tests.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

INNOCENT GUMBO Old Hand Points: 378 More actions · Answer 11

TheSQLGuru

I am not quite sure about many Clustered Indexes here. Since the concept tells us that they are part of the table and determine the Ordering of data,then with 4 or 5 of them how physically data will be aligned?

2. We all know that a clustered index and table strictly reside in the same file and filegroup, some data "clouding" may be found in one FileGroup.

Gail help me if this should be possible. TheSQLGuru, maybe you can shed some light there for me.

Otherwise Gail the article is very fantastic, Easy to follow especially on the issue os SARGable.

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 12

INNOCENT GUMBO (11/12/2009)
I am not quite sure about many Clustered Indexes here. Since the concept tells us that they are part of the table and determine the Ordering of data,then with 4 or 5 of them how physically data will be aligned?

Precisely why there can be only one

2. We all know that a clustered index and table strictly reside in the same file and filegroup, some data "clouding" may be found in one FileGroup.

Well, the clustered index is the table. Whichever filegroup the clustered index is created on is where the table is.

Clouding?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

INNOCENT GUMBO Old Hand Points: 378 More actions · Answer 13

Thank you Gail for that because I read somewhere where TheSQLGuru was "asking" for more Clustered Indexes on a single table

NHM Old Hand Points: 365 More actions · Answer 14

Great article, thanx.

Assuming the data to load is sorted, can anyone tell me which is quicker when loading a table with a clustered index; 1) With the index defined and in place or 2) Create the clustered index after the data is loaded?

Thanx,

John Colby SSC Enthusiast Points: 155 More actions · Answer 15

I routinely work with tables where the entire table is a PK, name and address and five other fields. these are just mailing lists, but HUGE - to me anyway - 50 million names is typical.

The tables are address validated monthly, other than that they are stable, no insertions or deletions ever, only the monthly update. The monthly address validation updates about 1 to 2% of the records monthly.

I had been (up until reading this article) creating a clustered index that was also a cover index, i.e. the index had every single field. Obviously a WIDE index, however I do not need any other index in this case. Because there is never an insertion or deletion, that discussion goes away.

So is there any reason (in THIS case) NOT to do things this way?

Also, given the lack of future inserts, is there any reason NOT to create the index with the Zip5 / Zip4 as the leading fields in the index. I assume that would make the data physically sort on zip instead of physically sort on PK. Again I join on PK but use where and order by on zip.

Or should I still create the clustered index on PK (Unique) and then add a cover index of every field (they are all used in various processing).

Gail, I have to agree - "run for the doors" comments are unhelpful and in poor taste. "Expound in detail or write your own article about why".

jwcolby54