Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Which is Better: Heaps or Tables with Clustered Indexes?

I have always been of the personal opinion that all SQL Server tables have a clustered index. As part of the research I am doing on a new book on High Performance Index Maintenance, and on heaps specifically, I ran across this SQL Server Best Practices Article from Microsoft.

This article describes a series of tests that Microsoft did using SQL Server 2005 SP1 in 2007. While the article is a little dated, it does a great job of “proving” that every table should have a clustered index. It goes into great detail explaining why using a clustered index instead of a heap is virtually always the best choice.

If you are not already convinced that heaps should be avoided, you will be after reading this paper.

 

 

Share this post :

Comments

Posted by Anonymous on 13 February 2009

Pingback from  SQL Server Central | usproxylist.com

Posted by DAvid on 18 February 2009

For the data access profile in question sure the results are fine.

Try it again where the update is to part of the record key (clustered or not) Specifically where there are concurrent updates to this key and selects by this key and watch the non-clustered - heap become the clear choice.

Posted by mpalecek on 20 February 2009

I thought this was supposed to be comparing clustered indexes vs. heap tables. I can see the point of the paper indicating that tables should consider a clustered index before a non-clustered index, but what about plain simple heap table. With no index at all. it woud seem to me that this would be a the most useful configuration to test. That there might be benefits to a heap table that do not exist if there is a non-clustered index present.

Posted by Dexter on 20 February 2009

Depending on the usage of a heap I still think it has a place. For eaxmple:

1. Querying only the columns that make up the non clustered index on a heap - selects should be faster.

2. Inserting rows into a heap where it does not have any non clustered indexes - inserts should be much faster.

Posted by bisupport on 20 February 2009

I find the time it takes to create clustered indexes often (significantly) outweighs the benefits.  Remember also that SQL Server Standard edition throttles back clustered index creation to utilising a single processor core.

Posted by Brad M. McGehee on 20 February 2009

I know the Microsoft paper was not a complete test of the pros and cons of heaps. On the other hand, who knows of any "good quality" research that clearly indicates where heaps are superior to clustered indexes. I am sure that there must be places where heaps might have an advantage, but since I was born in Missouri, I want someone to "show me."

Posted by R L Reid on 20 February 2009

Let's forget performance for a moment.  Why should we even be concerned with the question if we are running dataservers (as opposed to designing the engines)?

Clustered indices are one of the few decent constraint mechanisms provided for "SQL databases", but that's orthogonal to any of this discussion.

The point of relational and even of SQL databases is to hide the details of physical access from the developer - and as much as possible from the DBA as well, I'd say.

And in the horserace of technology - the old notions of relative speed of CPU vs disk vs network all being upended - what gives better performance changes.  What wins today, loses tomorrow.  

We shouldn't need to be in the business of deciding on the storage format of an index these days.  That's the engines business.  SQL Server has had great abilities since 2005 to make evolutionary decisions on how best to index, and will practically tell you what to do.  Let it use its smarts and just do it.  If it makes a poor choice, it will discover that on its own too.  

A well normalized db I've maintained for 25 years went through a 10 year period where a clustered index on the unique key gave poor performance because of the clumped distribution of the keys.   Then the vendor changed the histograms and suddenly a clustered index was the way to go again.  

Then there's the issue of what happens when a non-trivial data-centeric program optimized to run in 64K runs on a machine with 64 CPUs - but that's another story.

One annoying thing about clustered indices - is that most developers INSIST that forces the results to be returned in the same order as the index, without an explicit "order by".   And ask me to "fix the server".  

Non-clustered indices used to be a nice performance mechanism when you could manage "index coverage" of high volume queries.  But these days, take advantage of the cache and the multi-channel IO and the parallel threads and avoid "tricks" that work until the next version or the new hardware or the port to a different vendor.

Posted by R L Reid on 20 February 2009

Sorry - one more note (and believe me, I have no wish to denigrate clustered indices or to push heaps).  

You can't test everything.  I notice the hardware they used for this test was pretty puny - 2 quad cores and 192 MB memory; and a single bus JBOD with no self balancing or assured writes to onboard cache (if I read it correctly).

All of which is fine.  These days, we can't hope that then generality of the results hold as you scale the machines.  64 GB memory, 64 (or even 8) cpus, multibus io, smart iSCSI arrays (e.g. Equalogics) change the equation so drastically that you can't even assume the generalities will hold.  

But we're database folks, so we already knew "It depends", right?!

It would be interesting to see that same test on a broad range of hardware and vendors - if anyone's pockets we deep enough for it.

roger

Leave a Comment

Please register or log in to leave a comment.