what's the best column for clustered indexes

Question

what's the best column for clustered indexes

mpalaparthi

Say Hey Kid

Points: 688
More actions
September 22, 2011 at 5:59 pm

#248581

Hello All,
I have a table and the DDL is show below:
CREATE
TABLE [dbo].[XYZ]
([cacheId] [numeric](10, 0) NULL, [TrackingNumber] [varchar](255) NULL, [FormID] [varchar](255) NULL, [CCIPortID] [varchar](255) NULL,
[OriginStationID] [varchar](255) NULL, [DestinationStationID] [varchar]
(255) NULL, [DestinationZipCode] [varchar](255) NULL, [CarrierOID] [varchar](255) NULL, [SentToSPSS] [bit]NULL, [ProcessedBySPSS] [bit]
NULL, [SpecialSorted] [varchar](255) NULL, [SepUID] [varchar](255) NULL, [UpdatedTrackingIndex] [bit]NULL, [OkToSendToSorter] [bit]
NULL, [RouteID] [varchar](255) NULL, [SentToSorterCount] [numeric]
(10, 0) NULL, [time_created$] [datetime]NULL, [time_last_modified$] [datetime]NULL, [parent$_id$] [numeric](19, 0) NULL, [id$] [numeric]
(19, 0) NOT NULL, [extId$] [varchar](255) NULL, [state$] [char]
(1) NULL) ON [PRIMARY]
GO
Here is the index info:
Index Type Column Name
id$ clustered index
sendtoSPSS filtered index
OkToSendSortor filtered index
trackingindex NC
Fill factor: 80% and pad index is enabled on all indexes
here is the DMLs that are frequently executed on the XYZ table
Select Statement:
select T.cacheId, T.id$, T.extId$, T.state$, T.time_created$, T.time_last_modified$, T.parent$_id$, T.TrackingNumber, T.FormID, T.CCIPortID, T.OriginStationID, T.DestinationStationID, T.DestinationZipCode, T.CarrierOID, T.SentToSPSS, T.ProcessedBySPSS, T.SpecialSorted, T.SepUID, T.UpdatedTrackingIndex, T.OkToSendToSorter, T.RouteID, T.SentToSorterCount from XYZ T
Update Statement:
update XYZ set cacheId=?, id$=?, extId$=?, state$=?, time_created$=?, time_last_modified$=?, parent$_id$=?, TrackingNumber=?, FormID=?, CCIPortID=?, OriginStationID=?, DestinationStationID=?, DestinationZipCode=?, CarrierOID=?, SentToSPSS=?, ProcessedBySPSS=?, SpecialSorted=?, SepUID=?, datedTrackingIndex=?, OkToSendToSorter=?, RouteID=?, SentToSorterCount=?
where id$=?
Delete Statement: delete from XYZ where id$=?
Insert Statement:
insert into XYZ(cacheId, id$, extId$, state$, time_created$, time_last_modified$, parent$_id$, TrackingNumber, FormID, CCIPortID, OriginStationID, DestinationStationID, DestinationZipCode, CarrierOID, SentToSPSS, ProcessedBySPSS, SpecialSorted, SepUID, UpdatedTrackingIndex, OkToSendToSorter, RouteID,
SentToSorterCount) values(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
My Question is: My Index are frequently fragmented and I can understand that there should be non-clustered index on id$ column and if so which column is best for Clustered index.
here is one more select statement that executes on same XYZ table:
select t4.TrackingNumber, t6.EffectiveInterceptZipCode, t6.EffectiveDiverterCode from XYZ t4 with (nolock), table-1 t5 with (nolock), table-2 t6 with (nolock),(Select distinct t1.TrackingNumber from XYZ t1 with (nolock), table-1 t2 with (nolock),table-2 t3 with (nolock) where exists (select * from (select count(1) as shipcount from XYZ tt1where tt1.TrackingNumber=t1.TrackingNumber) tt2 where tt2.shipcount>1)and t2.valPid$=0 and t1.SentToSPSS = 0 and t1.OkToSendToSorter = 1and t3.StatusExpirationTimeStamp_tm > GETDATE() and t1.id$=t2.pid$and t2.pid$=t3.parent$_id$ and t3.id$=t2.id$ ) t7
where t5.valPid$=0 and t6.StatusExpirationTimeStamp_tm > GETDATE()
and t4.TrackingNumber=t7.TrackingNumber and t4.id$=t5.pid$
and t5.pid$=t6.parent$_id$ and t6.id$=t5.id$
order by TrackingNumber, DuplicateRecordRanking
The reason I create the Filtered indexes on sendtoSPSS and OkToSendToSorter colums is the data in those columns is either 0 or 1.
Please advice
Thanks

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 1

Have a read through this: http://www.sqlservercentral.com/articles/Indexing/68563/

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

spaghettidba SSC Guru Points: 105732 More actions · Answer 2

You're not giving enough information to get sensible advice.

First of all, give yourself a chance to get an answer and format the code you post. You can make it more readable using the IFCode tags [ code]your code goes here[/code]. I intentionally put a space between "[" and "code]" to make it display, but you should not include it.

However, your code, once formatted and surrounded in IFCode looks like this:

CREATE TABLE [dbo].[XYZ]

(

[cacheId] [numeric](10, 0) NULL,

[TrackingNumber] [varchar](255) NULL,

[FormID] [varchar](255) NULL,

[CCIPortID] [varchar](255) NULL,

[OriginStationID] [varchar](255) NULL,

[DestinationStationID] [varchar](255) NULL,

[DestinationZipCode] [varchar](255) NULL,

[CarrierOID] [varchar](255) NULL,

[SentToSPSS] [bit] NULL,

[ProcessedBySPSS] [bit] NULL,

[SpecialSorted] [varchar](255) NULL,

[SepUID] [varchar](255) NULL,

[UpdatedTrackingIndex] [bit] NULL,

[OkToSendToSorter] [bit] NULL,

[RouteID] [varchar](255) NULL,

[SentToSorterCount] [numeric](10, 0) NULL,

[time_created$] [datetime] NULL,

[time_last_modified$] [datetime] NULL,

[parent$_id$] [numeric](19, 0) NULL,

[id$] [numeric](19, 0) NOT NULL,

[extId$] [varchar](255) NULL,

[state$] [char](1) NULL

)

ON [PRIMARY]

GO

Statements:

--Select Statement:

SELECT T.cacheId,

T.id$,

T.extId$,

T.state$,

T.time_created$,

T.time_last_modified$,

T.parent$_id$,

T.TrackingNumber,

T.FormID,

T.CCIPortID,

T.OriginStationID,

T.DestinationStationID,

T.DestinationZipCode,

T.CarrierOID,

T.SentToSPSS,

T.ProcessedBySPSS,

T.SpecialSorted,

T.SepUID,

T.UpdatedTrackingIndex,

T.OkToSendToSorter,

T.RouteID,

T.SentToSorterCount

FROM XYZ T

--Update Statement:

UPDATE XYZ

SET cacheId = ?,

id$ = ?,

extId$ = ?,

state$ = ?,

time_created$ = ?,

time_last_modified$ = ?,

parent$_id$ = ?,

TrackingNumber = ?,

FormID = ?,

CCIPortID = ?,

OriginStationID = ?,

DestinationStationID = ?,

DestinationZipCode = ?,

CarrierOID = ?,

SentToSPSS = ?,

ProcessedBySPSS = ?,

SpecialSorted = ?,

SepUID = ?,

datedTrackingIndex = ?,

OkToSendToSorter = ?,

RouteID = ?,

SentToSorterCount = ?

WHERE id$ = ?

--Delete Statement:

DELETE

FROM XYZ

WHERE id$=?

--Insert Statement:

INSERT INTO XYZ

(

cacheId,

id$,

extId$,

state$,

time_created$,

time_last_modified$,

parent$_id$,

TrackingNumber,

FormID,

CCIPortID,

OriginStationID,

DestinationStationID,

DestinationZipCode,

CarrierOID,

SentToSPSS,

ProcessedBySPSS,

SpecialSorted,

SepUID,

UpdatedTrackingIndex,

OkToSendToSorter,

RouteID,

SentToSorterCount

)

VALUES (

?,

?

)

--here is one more select statement that executes on same XYZ table:

SELECT t4.TrackingNumber,

t6.EffectiveInterceptZipCode,

t6.EffectiveDiverterCode

FROM XYZ t4 WITH ( NOLOCK ),

[table-1] t5 WITH ( NOLOCK ),

[table-2] t6 WITH ( NOLOCK ),

( SELECT DISTINCT

t1.TrackingNumber

FROM XYZ t1 WITH ( NOLOCK ),

[table-1] t2 WITH ( NOLOCK ),

[table-2] t3 WITH ( NOLOCK )

WHERE EXISTS ( SELECT *

FROM ( SELECT COUNT(1) AS shipcount

FROM XYZ tt1

WHERE tt1.TrackingNumber = t1.TrackingNumber

) tt2

WHERE tt2.shipcount > 1 )

AND t2.valPid$ = 0

AND t1.SentToSPSS = 0

AND t1.OkToSendToSorter = 1

AND t3.StatusExpirationTimeStamp_tm > GETDATE()

AND t1.id$ = t2.pid$

AND t2.pid$ = t3.parent$_id$

AND t3.id$ = t2.id$

) t7

WHERE t5.valPid$ = 0

AND t6.StatusExpirationTimeStamp_tm > GETDATE()

AND t4.TrackingNumber = t7.TrackingNumber

AND t4.id$ = t5.pid$

AND t5.pid$ = t6.parent$_id$

AND t6.id$ = t5.id$

ORDER BY TrackingNumber,

DuplicateRecordRanking

Some general advice:

1) Don't use NOLOCK: you may end up with inconsistent data

2) Don't use $ and other symbols in your column names

3) Don't use "Table-1" as a table name: use an entity name. You already know it's a table, you don't need to state it again in its name.

4) A good clustering key is small, unique and ever-increasing. The column id$ looks small and (maybe) unique. Is it ever-increasing? If so, use it.

Generally speaking, the primary key is a good clustering key. Does this table have a primary key? Use it.

Hope this helps

Gianluca

-- Gianluca Sartori

okbangas SSChampion Points: 11773 More actions · Answer 3

Generally speaking, the primary key is a good clustering key. Does this table have a primary key? Use it.

Generally speaking, I'ld say this is not a good advice. I find all sorts of primary keys, including composit primary keys and uniqueidentifier (guid). Neither are good candidates for a clustering key. You cannot assume that an identity column (or other ever-increasing int) is the primary key.

Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

Concatenating Row Values in Transact-SQL[/url]

spaghettidba SSC Guru Points: 105732 More actions · Answer 4

okbangas (9/23/2011)
Generally speaking, the primary key is a good clustering key. Does this table have a primary key? Use it.
Generally speaking, I'ld say this is not a good advice. I find all sorts of primary keys, including composit primary keys and uniqueidentifier (guid). Neither are good candidates for a clustering key. You cannot assume that an identity column (or other ever-increasing int) is the primary key.

A composite key can be a good clustering key.

Also a uniqueidentifier can be a good clustering key.

Neither are perfect, but both can be good.

-- Gianluca Sartori

mpalaparthi Say Hey Kid Points: 688 More actions · Answer 5

Hello Gianluca Sartori,

Thanks for the reply

The name table-1 is just a dummy name, it's not the real name of the table

We are using Nolocks because in my application select, Insert, delete and Updates happens contineously and some times simultaniously too and the business is okay with, not getting the latest data when we use NOLOCKS

The symbol $ is not recommneded by me and this application is going into production soon and I guess I can't remove $ symbol at this point of time

That id$ is not the identity column that we normaly use. That is a system generated integer and is always unique and here is the sample data in that id$ column

(3888088106,691956266,9855616042,10318775338,11919868970,11987993642,13390480426,27798766634,31674041386)

That id$ is the only column which has the unique data and the other columns has some duplicate data.

Thats where I am confused, ID$ column has clustered index and it is the only column which has the unique data but that id$ column is used in WHERE clause as (where id$ = "some data") but any column which has the where condition as (where id$ = "some data") it is good if it has nonclustered index on it.

But it has clustered index. thats where I am getting confusion.

Please advice

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 6

Gianluca Sartori (9/23/2011)
Also a uniqueidentifier can be a good clustering key.

Can be, maybe, if the person designing the table understands the complexities. Usually however it's about the worst clustered index possible (wideish and completely random)

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Netic Old Hand Points: 342 More actions · Answer 7

okbangas (9/23/2011)
Generally speaking, the primary key is a good clustering key. Does this table have a primary key? Use it.
Generally speaking, I'ld say this is not a good advice. I find all sorts of primary keys, including composit primary keys and uniqueidentifier (guid). Neither are good candidates for a clustering key. You cannot assume that an identity column (or other ever-increasing int) is the primary key.

Totally agree!

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 8

The best clustered index is really dependent on the data and the access paths to it. For instance, if you have a datatime column in your database that is frequently used in range queries, you may find this to be a better candidate for a clustered index than a unique or primary key.

Since we don't know your application telling you which column (or columns) would make a better clustered index is nothing more than guess work. You need to look at your data and the queries that access the table, and the frequency in which those queries are run to make a good decision on the clustered index.

mpalaparthi Say Hey Kid Points: 688 More actions · Answer 9

Hi Gila,

Thanks for the reply.

Here is the sample Insert statement of the XYZ table:

INSERT [dbo].[XYZ] ([cacheId], [TrackingNumber], [FormID], [CCIPortID], [OriginStationID], [DestinationStationID], [DestinationZipCode], [CarrierOID], [SentToSPSS], [ProcessedBySPSS], [SpecialSorted], [SepUID], [UpdatedTrackingIndex], [OkToSendToSorter], [RouteID], [SentToSorterCount], [time_created$], [time_last_modified$], [parent$_id$], [id$], [extId$], [state$])

VALUES (CAST(13 AS Numeric(10, 0)), N'868605045285', N'868605045285', N'CCII-ALL', N'CVRKK', N'MAZA', N'00604', N'2944556900', 0, 0, NULL, N'2455815000', 1, 1, NULL, CAST(0 AS Numeric(10, 0)), CAST(0x00009F660168CC26 AS DateTime), CAST(0x00009F660168CC26 AS DateTime), CAST(3888071673 AS Numeric(19, 0)), CAST(3888088106 AS Numeric(19, 0)), NULL, N'C')

INSERT [dbo].[XYZ] ([cacheId], [TrackingNumber], [FormID], [CCIPortID], [OriginStationID], [DestinationStationID], [DestinationZipCode], [CarrierOID], [SentToSPSS], [ProcessedBySPSS], [SpecialSorted], [SepUID], [UpdatedTrackingIndex], [OkToSendToSorter], [RouteID], [SentToSorterCount], [time_created$], [time_last_modified$], [parent$_id$], [id$], [extId$], [state$])

VALUES (CAST(93 AS Numeric(10, 0)), N'873270570640', N'873270570640', N'CCII-ALL', N'HRLA', N'QXZA', N'4061', N'2944555749', 0, 0, NULL, N'2455815000', 1, 1, NULL, CAST(0 AS Numeric(10, 0)), CAST(0x00009F6600B5D89E AS DateTime), CAST(0x00009F6600B5D89E AS DateTime), CAST(9691939833 AS Numeric(19, 0)), CAST(9691956266 AS Numeric(19, 0)), NULL, N'C')

INSERT [dbo].[XYZ] ([cacheId], [TrackingNumber], [FormID], [CCIPortID], [OriginStationID], [DestinationStationID], [DestinationZipCode], [CarrierOID], [SentToSPSS], [ProcessedBySPSS], [SpecialSorted], [SepUID], [UpdatedTrackingIndex], [OkToSendToSorter], [RouteID], [SentToSorterCount], [time_created$], [time_last_modified$], [parent$_id$], [id$], [extId$], [state$])

VALUES (CAST(91 AS Numeric(10, 0)), N'871711250160', N'871711250160', N'CCII-ALL', N'HRLA', N'QXZA', N'4061', N'2944555745', 0, 0, NULL, N'2455815000', 1, 1, NULL, CAST(0 AS Numeric(10, 0)), CAST(0x00009F6600B5D854 AS DateTime), CAST(0x00009F6600B5D854 AS DateTime), CAST(9855599609 AS Numeric(19, 0)), CAST(9855616042 AS Numeric(19, 0)), NULL, N'C')

So, considering the DML statements which column is better for clustered index. As I mentioned except the id$ column, rest of the columns has duplicates but this ID$ column is used in where condition.

Kindly advice. I am seeing frequest fragmentation of indexes on the XYZ tables and because of this deadlocks are occuring in the application.

Please advice.

Thanks a TON

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 10

As I said above, we really can't tell you what would be the best column (or columns) for you to use for a clustered index. You have to make that decision based on your analysis of the data, the different access paths (queries run against the data), the frequency in which each of these queries are run.

The best clustered index may or may not be a unique or primary key. It may be one of the datetime columns because of frequent range searchs on that column.

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 11

mpalaparthi (9/23/2011)
Kindly advice. I am seeing frequest fragmentation of indexes on the XYZ tables and because of this deadlocks are occuring in the application.

Fragmentation is not going to be the cause of the deadlocks. Poorly written queries or poor indexing is the likely cause.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

okbangas SSChampion Points: 11773 More actions · Answer 12

Lynn Pettis (9/23/2011)
The best clustered index is really dependent on the data and the access paths to it. For instance, if you have a datatime column in your database that is frequently used in range queries, you may find this to be a better candidate for a clustered index than a unique or primary key.

Funny you mention it. Datetime can be ever inceeasing, bit rarely unique, causing sql server to add a four byte identifier to each row. Om such a scenario, it may actually be benedicial to have a unique composite clustering key, the datetime and an identity column. The latter may be used as a primay key as well 🙂

Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

Concatenating Row Values in Transact-SQL[/url]

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 13

okbangas (9/23/2011)
Funny you mention it. Datetime can be ever inceeasing, bit rarely unique, causing sql server to add a four byte identifier to each row. Om such a scenario, it may actually be benedicial to have a unique composite clustering key, the datetime and an identity column. The latter may be used as a primay key as well 🙂

So you'd add a 4-byte integer to every row to avoid having SQL add a 4-byte integer to only the duplicate rows?

The uniquifier's only added to rows with a duplicate value of the clustering key, not to every row.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

okbangas SSChampion Points: 11773 More actions · Answer 14

Trying to Keep it short when writing on the phone. The uniqueifier cause the table to require a variable length block as well. So, in sone cases it is beneficial, yes.

Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

Concatenating Row Values in Transact-SQL[/url]