SQL Server Clustered Tables vs Heap Tables

By:   |   Comments (13)   |   Related: 1 | 2 | 3 | > Indexing


Problem

One very important design aspect when creating a new table is the decision to create or not create a clustered index.  A table that does not have a clustered index is referred to as a HEAP and a table that has a clustered index is referred to as a clustered table. 

A clustered table provides a few benefits over a heap such as controlling how the data is sorted and stored, the ability to use the index to find rows quickly and the ability to reorganize the data by rebuilding the clustered index.  Depending on INSERT, UPDATE and DELETE activity against your tables, your physical data can become very fragmented.  This fragmentation can lead to wasted space in your database, because of partly full pages as well as the need to read more pages in order to satisfy the query.  So what can be done?

Solution

The primary issue that we want to address is the fragmentation that occurs with normal database activity.  Depending on whether your table has a clustered index or not will determine if you can easily address the fragmentation problem down to the physical data level.  Because a heap or a clustered index determines the physical storage of your table data, there can only be one of these per table.  So a table can either have one heap or one clustered index.

Let's take a look at the differences between a heap and clustered indexed table.

SQL Server HEAP

  • Data is not stored in any particular order
  • Specific data can not be retrieved quickly, unless there are also non-clustered indexes
  • Data pages are not linked, so sequential access needs to refer back to the index allocation map (IAM) pages
  • Since there is no clustered index, additional time is not needed to maintain the index
  • Since there is no clustered index, there is not the need for additional space to store the clustered index tree
  • These tables have a index_id value of 0 in the sys.indexes catalog view
IAM pages retrieve data in a single partition heap

source: SQL Server books online

SQL Server Clustered Indexed Table

  • Data is stored based on the clustered index key
  • Data can be retrieved quickly based on the clustered index key, if the query uses the indexed columns
  • Data pages are linked for faster sequential access
  • Additional time is needed to maintain the clustered index based on INSERT, UPDATE and DELETE activity
  • Additional space is needed to store the clustered index tree
  • These tables have a index_id value of 1 in the sys.indexes catalog view
Levels of a clustered index

source: SQL Server books online

So based on the above, you can see there are a few fundamental differences on when a table has a clustered index or does not.

SQL Server Fragmentation

A problem that occurs on all tables is the issue of becoming fragmented.  Depending on the activity performed such as DELETEs, INSERTs and UPDATEs, your heap tables and clustered tables can become fragmented.  A lot of this depends on the activity as well as the key values that are used for your clustered index. 

  • If your heap table only has INSERTs occurring, your table will not become fragmented, since only new data is written to new data pages.
  • If your clustered index key is sequential, such as an identity value, and you only have INSERTs, again this will not become fragmented since the new data is always written at the end of the clustered index.
  • But if your table is either a heap or a clustered table and there are a lot of INSERTs, UPDATEs and DELETEs the data pages can become fragmented.  This results in wasted space as well as additional data pages to read to satisfy queries. 
    • When a table is created as a heap, SQL Server does not force where the new data pages are written.  Whenever new data is written, this data is always written at the end of the table or on an available page that is assigned to this table.  When data is deleted the space becomes free in the data pages and these pages may or may not be reused when new data is written. Also, when you delete data from a heap SQL Server does not always deallocate the space for the free pages.
    • With a clustered index, depending on the index key, new records may be written to existing pages where free space exists or there may be need to split a page into multiple pages in order to insert the new data.  When deletes occur the same issue occurs as with a heap, but this free space may be used again if data needs to be inserted into one of the existing pages that has free space.
    • So based on this, your heap table could become more fragmented then your clustered table.

Identifying SQL Server Fragmentation

To identify whether your clustered table or heap table is fragmented you need to either run DBCC SHOWCONTIG (2000 or 2005) or use the DMV sys.dm_db_index_physical_stats.  These commands will give you insight into the fragmentation problems that may exist in your table.  For further information take at these Index Fragmentation and Maintenance Tips.

Resolving SQL Server Fragmentation

Below we will look at some ways to resolve fragmentation.

SQL Server Clustered Indexed Tables

Resolving the fragmentation for a clustered table can be done easily by rebuilding or reorganizing your clustered index.  Take a look a this tip, Script to Manage SQL Server Rebuilds and Reorganize for Index Fragmentation.

SQL Server Heap Tables

For heap tables this is not as easy.  The following are different options you can take to resolve the fragmentation:

  1. Create a clustered index and then drop the clustered index if you don't want to keep the clustered index.
  2. Create a new table and insert data from the heap table into the new table based on some sort order
  3. Export the data, truncate the table and import the data back into the table

Additional Info

When creating a new table via SQL Server Management Studio when you specify a primary key for the table, SSMS automatically makes this a clustered index, but this can be overridden and it doesn't always make sense that the primary key is the clustered index.  Take a look at this tip Finding a better candidate for your SQL Server clustered indexes for more information.  When creating a new table via scripts you need to identify that the table be created with a clustered index.  So based on this, most of your tables are going to have a clustered index, because of the primary key, but if you do not specify a primary key or build a clustered index the data will be stored as a heap.

Next Steps
  • Keeping table and index fragmentation under control is a key process to maintain optimum performance out of your database.  Now that you can see how a heap vs. a clustered table differs and what needs to be done to address the fragmentation, take a look at your table structures to see if you need to address these issues.
  • Even if you are doing a complete index rebuild on all of your tables once a week or whenever, your heap tables will never be de-fragmented, so you will need to come up with another strategy to handle fragmentation issues with heap tables.
  • Based on the above, it seems that all tables should have a clustered index. For the most part this is the case, but there may be some reason that you do not want to have a clustered index.  One reason could be a table that only has INSERTs, such as a log file.  But if in doubt, it would be better to have a clustered index then to not have one.


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Greg Robidoux Greg Robidoux is the President and founder of Edgewood Solutions, a technology services company delivering services and solutions for Microsoft SQL Server. He is also one of the co-founders of MSSQLTips.com. Greg has been working with SQL Server since 1999, has authored numerous database-related articles, and delivered several presentations related to SQL Server. Before SQL Server, he worked on many data platforms such as DB2, Oracle, Sybase, and Informix.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Friday, December 29, 2017 - 11:39:00 PM - Ragav Back To Top (74590)

 Thanks for your article. It is very helpful and easy to understand. You have explained the reasoning behind the concepts very well.

 


Tuesday, January 13, 2015 - 10:52:35 AM - Greg Robidoux Back To Top (35921)

Hi Emmanuel,

The growth of the log is caused by large transactions in conjunction with other processes that may not clear the log such as replication, a database in FULL recovery model and no transaction log backups are being performed, database mirroring is setup but the databases are not synchronizing., etc.

If you have large tables and therefore large indexes when rebuilding the index it will take up a lot of space in the transaction log, so this would need to be monitored. 

As far as your steps below go I would rebuild and reorg indexes where needed.  There are several scripts on MSSQLTips.com that you can find as well as plenty others on the web to only do index maintenance when needed.

As far as shrinking goes I would only do this if the log gets way out of control.  The best thing to do is to run your normal transactions and maintenance and see how large the log file gets.  This should then be your baseline to how large your log needs to be to perform all of the operations for your database.  I would advise against constantly having the log grow and then shrinking it again.  This should only be done if an anomaly occurs and therefore this is needed to keep things in check.

Hope this answers your question.


Tuesday, January 13, 2015 - 8:45:36 AM - Emmanuel Opara Back To Top (35916)

Hi Greg,

Before I proceed, God bless you for the nice article.

I resumed a new job recently and an issue was reported to me about the database transaction log file(.ldf) expanding very fast. This also in a way affects the system memory, and delays transaction operations at the particular hour when a whole lot of transactions are performed.

I discovered they do shrink the database whenever this occurs so I began thinking of a permanent solution.

What came to my mind at first was:

1. index rebuilding/ reorganizing.

2. "Is database shrinking a good idea?"

So while searching for articles on that online, I came across yours. After going through your comprehensive article, I decided a periodic shrink and index reorganizing while I also advice for an increase in the system RAM and Disc space.

Please is this a good idea (Expecially for an Enterprise app)?

 

 


Wednesday, July 3, 2013 - 11:18:11 AM - Greg Robidoux Back To Top (25701)

@Cameron - the Clustered index contains all of the data for a table including the non-indexed columns. So by specifiying columns for the Clustered index, when the index is created the data will be stored in that order.  As new data is added SQL Server needs to add the rows to existing pages or get new data pages which introduces fragmentation and the data is not physically stored together.

 


Wednesday, July 3, 2013 - 4:02:37 AM - Cameron Townshend Back To Top (25686)

Hi under the "Clustered Table" headin you have the following

  • Data is stored in order based on the clustered index key

 

Could you expand on this more. What exactly do you mean by "stored in order"? Do you mean on disk or what exactly. I thought that data was not necessarily stored in order.


Thursday, June 13, 2013 - 2:59:15 AM - Sateesh Back To Top (25418)

Very usefull.......I found what I was looking

Many Thanks


Thursday, May 30, 2013 - 11:12:39 AM - syed arshad Back To Top (25201)

I like this document and very good explanation. thanks for sharing the knowledge.


Wednesday, May 29, 2013 - 1:31:07 PM - Greg Robidoux Back To Top (25187)

@Leo - yes that is correct.  If there is a clustered index you will see a Clustered Index Scan and if there is not you will see a Table Scan.

 

http://www.mssqltips.com/sqlservertutorial/277/index-scans-and-table-scans/


Wednesday, May 29, 2013 - 12:29:46 PM - Leo Back To Top (25186)

Hi Greg,

I think I found what I was looking for:

"As soon as you create a clustered index, that becomes the table. All the data is stored at the leaf level of the index. So when you have a clustered index, and you get a scan, you'll only ever see a clustered index scan and never a table scan. This is because the cluster is the table."
 
I hope that is correct.
 
Thank you very much.
 
 

 

 


Wednesday, May 29, 2013 - 12:06:41 PM - Leo Back To Top (25185)

Hi Greg,

Thank you for the response.

>>a clustered index is the same as the table

So, Is there only one object? What is not clear to me is the logical architecture of the clustered table. Is it correct to say that all records are stored inside de clustered index? In this case, FULL TABLE SCAN and INDEX SCAN are the same in a clustered table?

Thank you

 


Wednesday, May 29, 2013 - 11:30:46 AM - Greg Robidoux Back To Top (25184)

@Leo - a clustered index is the same as the table.  The clustered index just tells SQL Server how to sort the data for storage.

If there is a not a clustered index the table is called a heap and there is no specfic ordering of the data.


Wednesday, May 29, 2013 - 10:54:13 AM - Leo Back To Top (25182)

Hi Greg,

The clustered index and the table are separeted objects or they are the same object inside the Data File?

Thanks


Thursday, March 8, 2012 - 4:47:02 PM - Timbilt Back To Top (16308)

Very userful post.  Cleared up a lot for me.  I was confused as to whether or not a heap is created for each non clustered index.  I was wondering why it would be necessary to use possibly double the space simply to make queries faster. "Because a heap or a clustered index determines the physical storage of your table data, there can only be one of these per table. So a table can either have one heap or one clustered index."  You cleared it up for me - Thanks.















get free sql tips
agree to terms