Blog Post

Webcast Follow-Up: Introduction to Clustered Indexes and Heaps

,

WebcamLast month, I presented a session for Pragmatic Works Training on the T’s titled Introduction to Clustered Indexes and Heaps.  I’d meant to get the Q&A and session materials posted earlier, but somehow getting this post out got a bit “corrupted”, but at least it is finally coming out.  In the session, I discussed the differences between clustered indexes and heaps and the value of one over the other.

The abstract for the sessions was:

It’s the age old question, do I put a clustered index on the table or leave it as a heap? While a simple question, this can have serious impacts on the performance of the table in queries. In this session, we’ll review the basics of both clustered indexes and heaps; identifying key differences and cases where you might choose one over another.

Webcast Recording & Materials

If you missed the webcast, you’ll can watch the recording here.  For those that want to download the slide deck, you can click the following link –

Introduction to Clustered Indexes and Heaps.

Also, if you just want to flip through the slide deck again, here you go.

Session Q&A

Can a table with a foreign key have a clustered index?

Yes, a table that will act as a foreign key in another table should have a clustered index on it’s primary key.  The foreign key columns in tables, should almost always be indexed, usually with a non-clustered index.  Sometimes, depending on the database design, the foreign key column may be the same column that is clustered.

If only non-clustered index in a table, it is a heap also?

That is correct, it’s the choice between clustered indexes and heaps.  Having a non-clustered index on the heap does not affect the heap itself.

Clustered index always unique? but non-clustered index may not?

At the most base level, clustered indexes are always unique.  If you don’t specifically enforce this, SQL Server adds a uniquifier value to the row.  Non-clustered indexes do not need to be unique.

As a DBA, how can I find good column to create a clustered index on a table given that I don’t know my workload on the table.

I would start with the primary key on the table.  That’s often one of the best places to start.  From there it’s a matter of how you are using the table, or plan to.  A great resource would be the session I did last year, Choosing Your Clustered Index.

As we know Cluster index is a default setup of PK – primary keys. When is recommended to leave the PK index cluster and when we should remove the cluster option?

When the table is a child to another table, like a detail table related to a header record table.  Then the PK on the table may not be the best clustered index choice.  As a similar answer, check out Choosing Your Clustered Index for some more information on these scenarios.

Can you please discuss the difference between clustered and non-clustered indexes?

Non-clustered indexes represent secondary access patterns for accessing your data, where clustered indexes are the primary path for data access.  The webcast How Do Non-Clustered Indexes Improve Performance? can provide some additional information that would be useful.

Could you please provide a good book we could use as reference/review?

The best book out there on indexes would be my book for SQL Server 2012 titled, Expert Performance Indexing for SQL Server 2012.

I build dimensional data marts here and the fact builds need to do lookups into the dimension tables to find the surrogate key during the staging process to put in the facts. Should a clustered index be put on the business key of the dimension table to look it up faster or should it be on the surrogate key? I would think it needs to be the business key thus going to the record quicker and return the surrogate. Now, I know it should be on the surrogate key in the mart that is used for reporting itself so that the join between the fact and the dimensions would be faster for reporting. For example, the source data has part number ABCD-1234 and a quantity sold. There is a Parts dimension table with a record that has the business key of ABCD -1234 and it has been assigned a surrogate key of 9876. The lookup would search the dimension table for ABCD-1234 and should return 9876 as the surrogate. I would think the clustered index would be on ABCD-1234 so the fact build can find it faster and return 9780.

You should have the clustered indexes on the surrogate keys.  The business keys are primarily used during the ETL process, which probably runs daily.  Where the queries on your data mart run numerous times a day and will be leveraging the surrogate key, and that clustered index, time and again.

If we archive data from main table and delete data then what will be impact on cluster index

Main impact will be that there will be gaps in the b-tree where the deleted rows were previously.  Any unused pages will be deallocated from the clustered index.

Is primary Key constraint on a table variable similar to clustered index on a regular table?

They are different types of objects but can be the same.  The clustered index defines how the b-tree for the index that stored the data will layout the data.  While the primary key enforce the uniqueness on the table and identifies key columns for the optimizer.

Is there any real time scenario, where heap is better than clustered index?

If you plan to load a table and then query the data once or twice.  It often takes longer to import and maintain the clustered index that it would be to just use a clustered index.

Is there a general rule or typical scenarios when would you consider using a non-clustered index, or is there one?

Non-clustered indexes are intended as alternative paths to data on a row.  The clustered index key won’t always be the best way to find every row every time.  For more information, check out - How Do Non-Clustered Indexes Improve Performance?.

What is the advantage/disadvantage table w/o clustered index but has non-cluster index?

If the time to build a clustered index on a table and then query the table exceeds the time to add the non-clustered index and query the table from a heap, then a heap and non-clustered index may be the right way to go.  This typically only will pay off if the table and data will only be queries a few times, like in staging tables.  Over time, a clustered index will eventually provide a better performance profile.

What is the harm in having a wide Clustered Index , say , consisting of 4-5 columns?

Wide clustered indexes provide a problem with space in your database.  With non-clustered indexes, in order to identify the correct clustered index row for look-up operations, all of the columns from the clustered index are included in non-clustered indexes.  Including all of the columns, has a large impact on the space required for an index.

Are heaps always bad?

No, for staging tables and temporary tables they can often make a lot of sense.

What I would like to see in SQLServer is dynamic Indexing. The engine creates, modifies and drops indexes by itself depending on the workload running on it.  Is that a practical thing?

It likely isn’t something we’ll be seeing.  There’s a lot of variable that can come into play with this type of a feature.  Especially on large databases where a dynamic indexing feature could decide to create some significantly large indexes.  While at the same time dropping indexes that are critical, but maybe only used once a week or month.

Is clustered index suitable on a GUID? If I cannot add a identity or I cant choose a natural or surrogate key?

In nearly every case, the answer here is no.  GUIDs provide wide clustered indexes and their very nature helps ensure that your clustered indexes will become fragmented quickly.  There’s nearly always a better choice to any GUID option that is being considered for clustered indexes.

Thanks for Attending

Thanks to those that attended the webcast.  If you have any questions on the webcast or suggestions for improvements, please leave a comment on this post.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating