Stairway to SQL Server Indexes: Step 3, Clustered Indexes

Question

Stairway to SQL Server Indexes: Step 3, Clustered Indexes

David Durant

Default port

Points: 1422
More actions
February 14, 2011 at 10:37 am

#246686

Comments posted to this topic are about the item Stairway to SQL Server Indexes: Step 3, Clustered Indexes

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

amenjonathan SSCrazy Points: 2482 More actions · Answer 1

I have heard from more than one person that all columns in the clustered index also show up in each non-clustered index on that same table, which is why you should keep your clustered index columns to the bare minimum you can get away with.

Is this true?

-------------------------------------------------------------------------------------------------
My SQL Server Blog

sknox SSChampion Points: 12388 More actions · Answer 2

amenjonathan (6/22/2011)
I have heard from more than one person that all columns in the clustered index also show up in each non-clustered index on that same table, which is why you should keep your clustered index columns to the bare minimum you can get away with.
Is this true?

Yes and no*.

The clustered index (plus a uniqueifier if necessary) is the row pointer for non-clustered indexes (See http://msdn.microsoft.com/en-us/library/ms177484.aspx) So a smaller clustered index will mean smaller non-clustered indexes, all else being equal.

BUTif you have few non-clustered indexes, and you have a column that is almost always included in criteria for queries, you can gain performance if that's part of the clustered index at a small cost of space in the non-clustered indexes (indeed, you'd likely be moving that column from its own non-clustered index to the clustered index, so you could actually save space.)

Note: When I speak about changing the clustered index, I'm speaking of conceptually changing it preferably before any data is in the table, and definitely before moving it to production. Changing a clustered index in production can be difficult, to say the least. This is one of those areas where planning is key.

* This is ALWAYS the answer to any Yes or No question. Always.

amenjonathan SSCrazy Points: 2482 More actions · Answer 3

Very cool. Thanks!

-------------------------------------------------------------------------------------------------
My SQL Server Blog

Steven Willis SSCrazy Eights Points: 9893 More actions · Answer 4

Here's two sample queries that will run against the default DotNetNuke schema. Note the only difference is in the second query in which I've added to the where clause a requirement to make the value of the primary key greater than zero. Adding this line to the second query seems to force an index seek on the table. Is this really doing what it seems to be doing? If so, it's a very valuable technique for forcing seeks instead of scans.

SELECT

up.PropertyValue

,ppd.PortalID

FROM

dbo.UserProfile AS up

INNER JOIN dbo.ProfilePropertyDefinition AS ppd

ON up.PropertyDefinitionID = ppd.PropertyDefinitionID

WHERE

up.UserID = 2345 -- Put a real UserID here

SELECT

up.PropertyValue

,ppd.PortalID

FROM

dbo.UserProfile AS up

INNER JOIN dbo.ProfilePropertyDefinition AS ppd

ON up.PropertyDefinitionID = ppd.PropertyDefinitionID

WHERE

ppd.PropertyDefinitionID > 0

AND up.UserID = 2345 -- Put a real UserID here

[For some reason the image is not displaying, but you can click on the link and download it to view in SSMS.]

Steven Willis SSCrazy Eights Points: 9893 More actions · Answer 5

Now a second question...I personally avoid using UNIQUEIDENTIFIER cols as part of a primary key. But sometimes I have no choice when I've inherited a schema from someone else. I've not found a way to avoid index scans on these cols. Any advice?

sknox SSChampion Points: 12388 More actions · Answer 6

Steven Willis (6/22/2011)
Here's two sample queries that will run against the default DotNetNuke schema. Note the only difference is in the second query in which I've added to the where clause a requirement to make the value of the primary key greater than zero. Adding this line to the second query seems to force an index seek on the table. Is this really doing what it seems to be doing? If so, it's a very valuable technique for forcing seeks instead of scans.

Yes and no.*

Note in the query plan that each query has exactly the same cost, and each element of the two queries has the same cost as its counterpart (even the scan/seek). This plus the actual number of rows and row sizes indicates that the data you're querying against is simple enough that there's not much difference in performance between a seek and a scan.

Here we get into the "black box" nature of the query optimizer. It does appear that adding a requirement on the primary key forced an index seek, but that is not guaranteed to happen whenever you do so. Sometimes a scan is more efficient than a seek, and in my experience the optimizer is usually, but not always, better than I am at figuring that out.

* (see my previous post on the usefulness of this answer)

tmulherin Valued Member Points: 61 More actions · Answer 7

I know it's trivial but the table names used in some of your SQL statements do not match the the table names reflected in the results. For instance in step3:

SELECT *

FROM SalesOrderDetail

WHERE SalesOrderID = 43671

AND SalesOrderDetailID = 120

Heap (1 row(s) affected)

Table 'SalesOrderDetail_noindex'. Scan count 1, logical reads 1495.

This is true so far in steps 2 and 3.

sknox SSChampion Points: 12388 More actions · Answer 8

Steven Willis (6/22/2011)
Now a second question...I personally avoid using UNIQUEIDENTIFIER cols as part of a primary key. But sometimes I have no choice when I've inherited a schema from someone else. I've not found a way to avoid index scans on these cols. Any advice?

Yes. Determine if the schema requires the primary key to be the clustered index. The two are not the same thing.

Check out http://ask.sqlservercentral.com/questions/12/should-my-primary-key-be-clustered for starters.

Vedran Kesegic SSCrazy Eights Points: 9304 More actions · Answer 9

Comparison of clustered index table with unindexed heap in this article is not fair, IMHO. Clustered index means we have both data and index. HEAP means we only have data and no index. Author should add an index to the heap to make fair comparison.

Author should also show examples of operations where heap beats clustered index table in performance.

_____________________________________________________
Microsoft Certified Master: SQL Server 2008
XDetails Addin - for SQL Developers
blog.sqlxdetails.com - Transaction log myths

Tom Wickerath Valued Member Points: 53 More actions · Answer 10

<Quote>
After creating the clustered index shown above, the resulting table / clustered index would look like this:
SalesOrderID SalesOrderDetailID ProductID OrderQty UnitPrice
26 rows of data shown
As you look at the sample data shown above, you may notice that each SalesOrderDetailID value is unique. Do not be confused; SalesOrderDetailID is not the primary key of the table. The combination of SalesOrderID / SalesOrderDetailID is the primary key of the table; as well as the index key for the clustered index.
</Quote>

I believe the sentence I have shown in bold font is incorrect. This table does not have a primary key defined at this point in the tutorial, as it was created via a SELECT INTO statement. The original source table, Sales.SalesOrderDetail, does indeed include a primary key, using the combination of SalesOrderID / SalesOrderDetailID, but not the new table that we just created the clustered index on.

This reply was modified 6 years, 6 months ago by Tom Wickerath.

Tom Wickerath Valued Member Points: 53 More actions · Answer 11

I'm curious about one more thing....

In Level 2 and most of Level 3, the number of logical reads I observed matches very close with the values given by the author. However, in the last example, "Retrieving all rows for a single Product", I am getting a lot more logical reads. I'm wondering if anyone can help me understand why this is so.

select *
 from SalesOrderDetail_NoIndex
where ProductID = 755;

select *
 from SalesOrderDetail_Index
where ProductID = 755;

Table 'SalesOrderDetail_NoIndex'. Scan count 1, logical reads 1494
Table 'SalesOrderDetail_Index'. Scan count 1, logical reads 1860

Select @@Version:
Microsoft SQL Server 2017 (RTM-CU18) (KB4527377) - 14.0.3257.3 (X64)   Nov 16 2019 01:14:50   Copyright (C) 2017 Microsoft Corporation  Developer Edition (64-bit) on Windows 10 Enterprise 10.0 <X64> (Build 17763: )

The author states:

"IO slightly greater for the clustered index version; 1513 reads versus 1495 reads."

--> except I see a much more pronounced difference of 1860 reads versus 1494 reads, with less rows returned than advertised (164 versus 228)!

"Without a nonclustered index on the ProductID column to help find the rows for a single Product, both versions had to be scanned. Because of the overhead of having a clustered index, the clustered index version is the slightly larger table; therefore scanning it required a few more reads than scanning the heap."