Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««910111213»»

Introduction to Indexes: Part 2 – The clustered index Expand / Collapse
Author
Message
Posted Friday, July 6, 2012 10:13 AM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 6:05 AM
Points: 40,258, Visits: 36,681
ScottPletcher (7/6/2012)
An integer column defined as identity (a very good and popular choice for clustered index) is 4 bytes.



It's certainly popular, but most often not good. The no-thought identity as clustered key likely causes more performance problems than any other single thing.


We'll have to agree to disagree on that matter. I hold to the school of thought that the clustered index should first organise the data (narrow, unchanging, ever-increasing, unique). If it can also support queries that's a bonus.

There's only one clustered index, multiple nonclustered indexes, clustered index to organise the table, nonclusters to support the queries.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1326194
Posted Friday, July 6, 2012 10:49 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 11:44 AM
Points: 2,217, Visits: 3,336
GilaMonster (7/6/2012)
There's only one clustered index, multiple nonclustered indexes, clustered index to organise the table, nonclusters to support the queries.



Every column added to a query invalidates the existing covering index and will revert back to a table scan.

How do you keep up with every single-column change to thousands of queries for thousands of databases and tens of thousands, or even millions, of tables, to keep all your non-clus indexes as truly covering?

Just not possible, practical or worthwhile.

The most important performance aspect of a table is determining the correct clustered index.


SQL DBA,SQL Server MVP('07, '08, '09)

Carl Sagan said: "There is no such thing as a dumb question." Sagan obviously never watched a congressional hearing!
Post #1326210
Posted Friday, July 6, 2012 10:57 AM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 6:05 AM
Points: 40,258, Visits: 36,681
ScottPletcher (7/6/2012)
GilaMonster (7/6/2012)Every column added to a query invalidates the existing covering index and will revert back to a table scan.


Not necessarily a table scan. Maybe a seek + lookup, maybe index intersection, maybe an index scan. SQL has lots of options for indexes.

How do you keep up with every single-column change to thousands of queries for thousands of databases and tens of thousands, or even millions, of tables, to keep all your non-clus indexes as truly covering?


I don't. I make covering indexes for critical queries, ensure that not-so-critical queries perform within requirements and make sure that new developments are correctly performance tested and that the developers have made any necessary index creations or modifications before the change are deployed to production.

How do you keep up with changes to all the queries across thousands of databases and tens of thousands, or even millions tables to keep all your nonclustered indexes up to date and useful for the queries that don't use the clustered index, seeing as I have very seldom seen tables with one and only one access path? (and the cluster can only support one access path)

The most important performance aspect of a table is determining the correct clustered index.


Absolutely, and what I want is one that is narrow - doesn't make nonclustered indexes larger than they need to be -, unique - doesn't require a uniquifier column - , unchanging - that isn't going to require that every nonclustered index on the table be updated when the clustering key value is changed - and ever-increasing - that isn't going to cause huge fragmentation due to inserts.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1326217
Posted Friday, July 6, 2012 12:16 PM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 11:44 AM
Points: 2,217, Visits: 3,336
The most important performance aspect of a table is determining the correct clustered index.

Absolutely, and what I want is one that is narrow - doesn't make nonclustered indexes larger than they need to be -, unique - doesn't require a uniquifier column - , unchanging - that isn't going to require that every nonclustered index on the table be updated when the clustering key value is changed - and ever-increasing - that isn't going to cause huge fragmentation due to inserts.



The same rote quote, and, yes, those are good general rules, but not enough by themselves.

Please, those of you learning about clustered indexes, look beyond the no-thought approach.

Instead, consider the actual requirements of your specific table before determining the clustered index key(s) for that table.

You'll save yourself huge performance problems and the extra work required to re-index the table properly later.


SQL DBA,SQL Server MVP('07, '08, '09)

Carl Sagan said: "There is no such thing as a dumb question." Sagan obviously never watched a congressional hearing!
Post #1326270
Posted Sunday, July 8, 2012 2:59 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Tuesday, August 19, 2014 6:53 PM
Points: 1,932, Visits: 1,589
As a matter of fact, you can’t avoid the scans (table / index) in real life or practical scenarios but you can minimize those and that’s what Gail is trying to explain here.

Every columns that you add to query as predicate may or may not be covering at the moment. It needs not to be covering especially in ad-hoc query environments it’s very common. If a query that executes rarely, I don’t see any reason to worry much on it unless an explicit weight / priority is assign to that query.

Also, you would not like to include all the columns in non-clustered index just to make it covering. Would you?


Dev

Devendra Shirbad | BIG Data Architect / DBA | Ex-Microsoft CSS (SQL 3T)
*** Open Network for Database Professionals ***

LinkedIn: http://www.linkedin.com/in/devendrashirbad
Post #1326540
Posted Monday, July 9, 2012 9:20 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 11:44 AM
Points: 2,217, Visits: 3,336
Dev (7/8/2012)
As a matter of fact, you can’t avoid the scans (table / index) in real life or practical scenarios but you can minimize those and that’s what Gail is trying to explain here.

Every columns that you add to query as predicate may or may not be covering at the moment. It needs not to be covering especially in ad-hoc query environments it’s very common. If a query that executes rarely, I don’t see any reason to worry much on it unless an explicit weight / priority is assign to that query.

Also, you would not like to include all the columns in non-clustered index just to make it covering. Would you?



Not at all. That's why I clearly stated, the (vast) majority of the queries should be using the clustered index to limit rows.

What Gail stated is:

clustered index to organise the table, nonclusters to support the queries.

I make covering indexes for critical queries


If nonclus are to support the queries, they need to be:
1) covering indexes OR
2) select only a very small % of the total rows

Since you can't directly control #2, aren't you then forced to make covering indexes?

If the query restricts to a minimum range of rows based on the clustered index, I don't have to make any adjustments then even if they select/compare every column in the table.


SQL DBA,SQL Server MVP('07, '08, '09)

Carl Sagan said: "There is no such thing as a dumb question." Sagan obviously never watched a congressional hearing!
Post #1326920
Posted Monday, July 9, 2012 9:23 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: 2 days ago @ 1:45 AM
Points: 1,605, Visits: 5,658
What does that have to do with the clustered index key, though? You certainly can't cover every possible query with the clustered index (unless you have an incredibly simple database), so you have to have nonclustered ones--and if you have them, then having the clustering key as small as possible (since it needs to be repeated in every NC index) is surely a good idea, which is pretty much what Gail said?
Post #1326923
Posted Monday, July 9, 2012 9:57 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Tuesday, August 19, 2014 6:53 PM
Points: 1,932, Visits: 1,589
Let me put this way; I have a table XYZ and an ID (Primary, Clustered Index). I typically don’t query on ID column but it’s used in many queries for joins. I typically search based on Description (Non-Clustered index).

Now in this case, assume I don’t have Non-Clustered index in place and I query based on Description column, it will scan the cluster. Will it be a good plan? If I have Non-Clustered index (on description) in place, based on selectivity optimizer may pick the index. Isn’t it a good plan?

In continuation to the same, if the query selects few non-key columns as well, I may include couple of columns (mostly queried) in it. If this index can satisfy 60-80% queries on that table I will consider it a good index.

The major fact, we can have only one Clustered Index on a table (ignoring workaround with INCLUDE). ‘select only a very small % of the total rows’ is applicable to Clustered Index as well for seek operation else it will scan the cluster.


Dev

Devendra Shirbad | BIG Data Architect / DBA | Ex-Microsoft CSS (SQL 3T)
*** Open Network for Database Professionals ***

LinkedIn: http://www.linkedin.com/in/devendrashirbad
Post #1326949
Posted Monday, July 9, 2012 10:04 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 11:44 AM
Points: 2,217, Visits: 3,336
Dev (7/9/2012)‘select only a very small % of the total rows’ is applicable to Clustered Index as well for seek operation else it will scan the cluster.


False, of course. If, for example, you have a table with two years' worth of data, 2010 and 2011, clustered on date, and you specify >= '20110101', SQL will read only the 2011 data.

That is another main reason you should leverage the clus index as much as possible.


SQL DBA,SQL Server MVP('07, '08, '09)

Carl Sagan said: "There is no such thing as a dumb question." Sagan obviously never watched a congressional hearing!
Post #1326953
Posted Monday, July 9, 2012 10:17 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Tuesday, August 19, 2014 6:53 PM
Points: 1,932, Visits: 1,589
ScottPletcher (7/9/2012)
Dev (7/9/2012)‘select only a very small % of the total rows’ is applicable to Clustered Index as well for seek operation else it will scan the cluster.


False, of course. If, for example, you have a table with two years' worth of data, 2010 and 2011, clustered on date, and you specify >= '20110101', SQL will read only the 2011 data.

That is another main reason you should leverage the clus index as much as possible.


It’s just an optimization added to SCAN operation but it won’t be good anyways if the data is skewed and ‘>= 20110101’ returns 90% of rows.


Dev

Devendra Shirbad | BIG Data Architect / DBA | Ex-Microsoft CSS (SQL 3T)
*** Open Network for Database Professionals ***

LinkedIn: http://www.linkedin.com/in/devendrashirbad
Post #1326962
« Prev Topic | Next Topic »

Add to briefcase «««910111213»»

Permissions Expand / Collapse