what's the best column for clustered indexes

  • Hello All,

    I have a table and the DDL is show below:

    CREATE

    TABLE [dbo].[XYZ]

    ([cacheId] [numeric](10, 0) NULL, [TrackingNumber] [varchar](255) NULL, [FormID] [varchar](255) NULL, [CCIPortID] [varchar](255) NULL,

    [OriginStationID] [varchar](255) NULL, [DestinationStationID] [varchar]

    (255) NULL, [DestinationZipCode] [varchar](255) NULL, [CarrierOID] [varchar](255) NULL, [SentToSPSS] [bit]NULL, [ProcessedBySPSS] [bit]

    NULL, [SpecialSorted] [varchar](255) NULL, [SepUID] [varchar](255) NULL, [UpdatedTrackingIndex] [bit]NULL, [OkToSendToSorter] [bit]

    NULL, [RouteID] [varchar](255) NULL, [SentToSorterCount] [numeric]

    (10, 0) NULL, [time_created$] [datetime]NULL, [time_last_modified$] [datetime]NULL, [parent$_id$] [numeric](19, 0) NULL, [id$] [numeric]

    (19, 0) NOT NULL, [extId$] [varchar](255) NULL, [state$] [char]

    (1) NULL) ON [PRIMARY]

    GO

    Here is the index info:

    Index Type Column Name

    id$ clustered index

    sendtoSPSS filtered index

    OkToSendSortor filtered index

    trackingindex NC

    Fill factor: 80% and pad index is enabled on all indexes

    here is the DMLs that are frequently executed on the XYZ table

    Select Statement:

    select T.cacheId, T.id$, T.extId$, T.state$, T.time_created$, T.time_last_modified$, T.parent$_id$, T.TrackingNumber, T.FormID, T.CCIPortID, T.OriginStationID, T.DestinationStationID, T.DestinationZipCode, T.CarrierOID, T.SentToSPSS, T.ProcessedBySPSS, T.SpecialSorted, T.SepUID, T.UpdatedTrackingIndex, T.OkToSendToSorter, T.RouteID, T.SentToSorterCount from XYZ T

    Update Statement:

    update XYZ set cacheId=?, id$=?, extId$=?, state$=?, time_created$=?, time_last_modified$=?, parent$_id$=?, TrackingNumber=?, FormID=?, CCIPortID=?, OriginStationID=?, DestinationStationID=?, DestinationZipCode=?, CarrierOID=?, SentToSPSS=?, ProcessedBySPSS=?, SpecialSorted=?, SepUID=?, datedTrackingIndex=?, OkToSendToSorter=?, RouteID=?, SentToSorterCount=?

    where id$=?

    Delete Statement: delete from XYZ where id$=?

    Insert Statement:

    insert into XYZ(cacheId, id$, extId$, state$, time_created$, time_last_modified$, parent$_id$, TrackingNumber, FormID, CCIPortID, OriginStationID, DestinationStationID, DestinationZipCode, CarrierOID, SentToSPSS, ProcessedBySPSS, SpecialSorted, SepUID, UpdatedTrackingIndex, OkToSendToSorter, RouteID,

    SentToSorterCount) values(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)

    My Question is: My Index are frequently fragmented and I can understand that there should be non-clustered index on id$ column and if so which column is best for Clustered index.

    here is one more select statement that executes on same XYZ table:

    select t4.TrackingNumber, t6.EffectiveInterceptZipCode, t6.EffectiveDiverterCode from XYZ t4 with (nolock), table-1 t5 with (nolock), table-2 t6 with (nolock),(Select distinct t1.TrackingNumber from XYZ t1 with (nolock), table-1 t2 with (nolock),table-2 t3 with (nolock) where exists (select * from (select count(1) as shipcount from XYZ tt1where tt1.TrackingNumber=t1.TrackingNumber) tt2 where tt2.shipcount>1)and t2.valPid$=0 and t1.SentToSPSS = 0 and t1.OkToSendToSorter = 1and t3.StatusExpirationTimeStamp_tm > GETDATE() and t1.id$=t2.pid$and t2.pid$=t3.parent$_id$ and t3.id$=t2.id$ ) t7

    where t5.valPid$=0 and t6.StatusExpirationTimeStamp_tm > GETDATE()

    and t4.TrackingNumber=t7.TrackingNumber and t4.id$=t5.pid$

    and t5.pid$=t6.parent$_id$ and t6.id$=t5.id$

    order by TrackingNumber, DuplicateRecordRanking

    The reason I create the Filtered indexes on sendtoSPSS and OkToSendToSorter colums is the data in those columns is either 0 or 1.

    Please advice

    Thanks

  • Have a read through this: http://www.sqlservercentral.com/articles/Indexing/68563/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • You're not giving enough information to get sensible advice.

    First of all, give yourself a chance to get an answer and format the code you post. You can make it more readable using the IFCode tags [ code]your code goes here[/code]. I intentionally put a space between "[" and "code]" to make it display, but you should not include it.

    However, your code, once formatted and surrounded in IFCode looks like this:

    CREATE TABLE [dbo].[XYZ]

    (

    [cacheId] [numeric](10, 0) NULL,

    [TrackingNumber] [varchar](255) NULL,

    [FormID] [varchar](255) NULL,

    [CCIPortID] [varchar](255) NULL,

    [OriginStationID] [varchar](255) NULL,

    [DestinationStationID] [varchar](255) NULL,

    [DestinationZipCode] [varchar](255) NULL,

    [CarrierOID] [varchar](255) NULL,

    [SentToSPSS] [bit] NULL,

    [ProcessedBySPSS] [bit] NULL,

    [SpecialSorted] [varchar](255) NULL,

    [SepUID] [varchar](255) NULL,

    [UpdatedTrackingIndex] [bit] NULL,

    [OkToSendToSorter] [bit] NULL,

    [RouteID] [varchar](255) NULL,

    [SentToSorterCount] [numeric](10, 0) NULL,

    [time_created$] [datetime] NULL,

    [time_last_modified$] [datetime] NULL,

    [parent$_id$] [numeric](19, 0) NULL,

    [id$] [numeric](19, 0) NOT NULL,

    [extId$] [varchar](255) NULL,

    [state$] [char](1) NULL

    )

    ON [PRIMARY]

    GO

    Statements:

    --Select Statement:

    SELECT T.cacheId,

    T.id$,

    T.extId$,

    T.state$,

    T.time_created$,

    T.time_last_modified$,

    T.parent$_id$,

    T.TrackingNumber,

    T.FormID,

    T.CCIPortID,

    T.OriginStationID,

    T.DestinationStationID,

    T.DestinationZipCode,

    T.CarrierOID,

    T.SentToSPSS,

    T.ProcessedBySPSS,

    T.SpecialSorted,

    T.SepUID,

    T.UpdatedTrackingIndex,

    T.OkToSendToSorter,

    T.RouteID,

    T.SentToSorterCount

    FROM XYZ T

    --Update Statement:

    UPDATE XYZ

    SET cacheId = ?,

    id$ = ?,

    extId$ = ?,

    state$ = ?,

    time_created$ = ?,

    time_last_modified$ = ?,

    parent$_id$ = ?,

    TrackingNumber = ?,

    FormID = ?,

    CCIPortID = ?,

    OriginStationID = ?,

    DestinationStationID = ?,

    DestinationZipCode = ?,

    CarrierOID = ?,

    SentToSPSS = ?,

    ProcessedBySPSS = ?,

    SpecialSorted = ?,

    SepUID = ?,

    datedTrackingIndex = ?,

    OkToSendToSorter = ?,

    RouteID = ?,

    SentToSorterCount = ?

    WHERE id$ = ?

    --Delete Statement:

    DELETE

    FROM XYZ

    WHERE id$=?

    --Insert Statement:

    INSERT INTO XYZ

    (

    cacheId,

    id$,

    extId$,

    state$,

    time_created$,

    time_last_modified$,

    parent$_id$,

    TrackingNumber,

    FormID,

    CCIPortID,

    OriginStationID,

    DestinationStationID,

    DestinationZipCode,

    CarrierOID,

    SentToSPSS,

    ProcessedBySPSS,

    SpecialSorted,

    SepUID,

    UpdatedTrackingIndex,

    OkToSendToSorter,

    RouteID,

    SentToSorterCount

    )

    VALUES (

    ?,

    ?,

    ?,

    ?,

    ?,

    ?,

    ?,

    ?,

    ?,

    ?,

    ?,

    ?,

    ?,

    ?,

    ?,

    ?,

    ?,

    ?,

    ?,

    ?,

    ?,

    ?

    )

    --here is one more select statement that executes on same XYZ table:

    SELECT t4.TrackingNumber,

    t6.EffectiveInterceptZipCode,

    t6.EffectiveDiverterCode

    FROM XYZ t4 WITH ( NOLOCK ),

    [table-1] t5 WITH ( NOLOCK ),

    [table-2] t6 WITH ( NOLOCK ),

    ( SELECT DISTINCT

    t1.TrackingNumber

    FROM XYZ t1 WITH ( NOLOCK ),

    [table-1] t2 WITH ( NOLOCK ),

    [table-2] t3 WITH ( NOLOCK )

    WHERE EXISTS ( SELECT *

    FROM ( SELECT COUNT(1) AS shipcount

    FROM XYZ tt1

    WHERE tt1.TrackingNumber = t1.TrackingNumber

    ) tt2

    WHERE tt2.shipcount > 1 )

    AND t2.valPid$ = 0

    AND t1.SentToSPSS = 0

    AND t1.OkToSendToSorter = 1

    AND t3.StatusExpirationTimeStamp_tm > GETDATE()

    AND t1.id$ = t2.pid$

    AND t2.pid$ = t3.parent$_id$

    AND t3.id$ = t2.id$

    ) t7

    WHERE t5.valPid$ = 0

    AND t6.StatusExpirationTimeStamp_tm > GETDATE()

    AND t4.TrackingNumber = t7.TrackingNumber

    AND t4.id$ = t5.pid$

    AND t5.pid$ = t6.parent$_id$

    AND t6.id$ = t5.id$

    ORDER BY TrackingNumber,

    DuplicateRecordRanking

    Some general advice:

    1) Don't use NOLOCK: you may end up with inconsistent data

    2) Don't use $ and other symbols in your column names

    3) Don't use "Table-1" as a table name: use an entity name. You already know it's a table, you don't need to state it again in its name.

    4) A good clustering key is small, unique and ever-increasing. The column id$ looks small and (maybe) unique. Is it ever-increasing? If so, use it.

    Generally speaking, the primary key is a good clustering key. Does this table have a primary key? Use it.

    Hope this helps

    Gianluca

    -- Gianluca Sartori

  • Generally speaking, the primary key is a good clustering key. Does this table have a primary key? Use it.

    Generally speaking, I'ld say this is not a good advice. I find all sorts of primary keys, including composit primary keys and uniqueidentifier (guid). Neither are good candidates for a clustering key. You cannot assume that an identity column (or other ever-increasing int) is the primary key.



    Ole Kristian VelstadbrÄten BangÄs - Virinco - Facebook - Twitter

    Concatenating Row Values in Transact-SQL[/url]

  • okbangas (9/23/2011)


    Generally speaking, the primary key is a good clustering key. Does this table have a primary key? Use it.

    Generally speaking, I'ld say this is not a good advice. I find all sorts of primary keys, including composit primary keys and uniqueidentifier (guid). Neither are good candidates for a clustering key. You cannot assume that an identity column (or other ever-increasing int) is the primary key.

    A composite key can be a good clustering key.

    Also a uniqueidentifier can be a good clustering key.

    Neither are perfect, but both can be good.

    -- Gianluca Sartori

  • Hello Gianluca Sartori,

    Thanks for the reply

    The name table-1 is just a dummy name, it's not the real name of the table

    We are using Nolocks because in my application select, Insert, delete and Updates happens contineously and some times simultaniously too and the business is okay with, not getting the latest data when we use NOLOCKS

    The symbol $ is not recommneded by me and this application is going into production soon and I guess I can't remove $ symbol at this point of time

    That id$ is not the identity column that we normaly use. That is a system generated integer and is always unique and here is the sample data in that id$ column

    (3888088106,691956266,9855616042,10318775338,11919868970,11987993642,13390480426,27798766634,31674041386)

    That id$ is the only column which has the unique data and the other columns has some duplicate data.

    Thats where I am confused, ID$ column has clustered index and it is the only column which has the unique data but that id$ column is used in WHERE clause as (where id$ = "some data") but any column which has the where condition as (where id$ = "some data") it is good if it has nonclustered index on it.

    But it has clustered index. thats where I am getting confusion.

    Please advice

  • Gianluca Sartori (9/23/2011)


    Also a uniqueidentifier can be a good clustering key.

    Can be, maybe, if the person designing the table understands the complexities. Usually however it's about the worst clustered index possible (wideish and completely random)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • okbangas (9/23/2011)


    Generally speaking, the primary key is a good clustering key. Does this table have a primary key? Use it.

    Generally speaking, I'ld say this is not a good advice. I find all sorts of primary keys, including composit primary keys and uniqueidentifier (guid). Neither are good candidates for a clustering key. You cannot assume that an identity column (or other ever-increasing int) is the primary key.

    Totally agree!

  • The best clustered index is really dependent on the data and the access paths to it. For instance, if you have a datatime column in your database that is frequently used in range queries, you may find this to be a better candidate for a clustered index than a unique or primary key.

    Since we don't know your application telling you which column (or columns) would make a better clustered index is nothing more than guess work. You need to look at your data and the queries that access the table, and the frequency in which those queries are run to make a good decision on the clustered index.

  • Hi Gila,

    Thanks for the reply.

    Here is the sample Insert statement of the XYZ table:

    INSERT [dbo].[XYZ] ([cacheId], [TrackingNumber], [FormID], [CCIPortID], [OriginStationID], [DestinationStationID], [DestinationZipCode], [CarrierOID], [SentToSPSS], [ProcessedBySPSS], [SpecialSorted], [SepUID], [UpdatedTrackingIndex], [OkToSendToSorter], [RouteID], [SentToSorterCount], [time_created$], [time_last_modified$], [parent$_id$], [id$], [extId$], [state$])

    VALUES (CAST(13 AS Numeric(10, 0)), N'868605045285', N'868605045285', N'CCII-ALL', N'CVRKK', N'MAZA', N'00604', N'2944556900', 0, 0, NULL, N'2455815000', 1, 1, NULL, CAST(0 AS Numeric(10, 0)), CAST(0x00009F660168CC26 AS DateTime), CAST(0x00009F660168CC26 AS DateTime), CAST(3888071673 AS Numeric(19, 0)), CAST(3888088106 AS Numeric(19, 0)), NULL, N'C')

    INSERT [dbo].[XYZ] ([cacheId], [TrackingNumber], [FormID], [CCIPortID], [OriginStationID], [DestinationStationID], [DestinationZipCode], [CarrierOID], [SentToSPSS], [ProcessedBySPSS], [SpecialSorted], [SepUID], [UpdatedTrackingIndex], [OkToSendToSorter], [RouteID], [SentToSorterCount], [time_created$], [time_last_modified$], [parent$_id$], [id$], [extId$], [state$])

    VALUES (CAST(93 AS Numeric(10, 0)), N'873270570640', N'873270570640', N'CCII-ALL', N'HRLA', N'QXZA', N'4061', N'2944555749', 0, 0, NULL, N'2455815000', 1, 1, NULL, CAST(0 AS Numeric(10, 0)), CAST(0x00009F6600B5D89E AS DateTime), CAST(0x00009F6600B5D89E AS DateTime), CAST(9691939833 AS Numeric(19, 0)), CAST(9691956266 AS Numeric(19, 0)), NULL, N'C')

    INSERT [dbo].[XYZ] ([cacheId], [TrackingNumber], [FormID], [CCIPortID], [OriginStationID], [DestinationStationID], [DestinationZipCode], [CarrierOID], [SentToSPSS], [ProcessedBySPSS], [SpecialSorted], [SepUID], [UpdatedTrackingIndex], [OkToSendToSorter], [RouteID], [SentToSorterCount], [time_created$], [time_last_modified$], [parent$_id$], [id$], [extId$], [state$])

    VALUES (CAST(91 AS Numeric(10, 0)), N'871711250160', N'871711250160', N'CCII-ALL', N'HRLA', N'QXZA', N'4061', N'2944555745', 0, 0, NULL, N'2455815000', 1, 1, NULL, CAST(0 AS Numeric(10, 0)), CAST(0x00009F6600B5D854 AS DateTime), CAST(0x00009F6600B5D854 AS DateTime), CAST(9855599609 AS Numeric(19, 0)), CAST(9855616042 AS Numeric(19, 0)), NULL, N'C')

    So, considering the DML statements which column is better for clustered index. As I mentioned except the id$ column, rest of the columns has duplicates but this ID$ column is used in where condition.

    Kindly advice. I am seeing frequest fragmentation of indexes on the XYZ tables and because of this deadlocks are occuring in the application.

    Please advice.

    Thanks a TON

  • As I said above, we really can't tell you what would be the best column (or columns) for you to use for a clustered index. You have to make that decision based on your analysis of the data, the different access paths (queries run against the data), the frequency in which each of these queries are run.

    The best clustered index may or may not be a unique or primary key. It may be one of the datetime columns because of frequent range searchs on that column.

  • mpalaparthi (9/23/2011)


    Kindly advice. I am seeing frequest fragmentation of indexes on the XYZ tables and because of this deadlocks are occuring in the application.

    Fragmentation is not going to be the cause of the deadlocks. Poorly written queries or poor indexing is the likely cause.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Lynn Pettis (9/23/2011)


    The best clustered index is really dependent on the data and the access paths to it. For instance, if you have a datatime column in your database that is frequently used in range queries, you may find this to be a better candidate for a clustered index than a unique or primary key.

    Funny you mention it. Datetime can be ever inceeasing, bit rarely unique, causing sql server to add a four byte identifier to each row. Om such a scenario, it may actually be benedicial to have a unique composite clustering key, the datetime and an identity column. The latter may be used as a primay key as well 🙂



    Ole Kristian VelstadbrÄten BangÄs - Virinco - Facebook - Twitter

    Concatenating Row Values in Transact-SQL[/url]

  • okbangas (9/23/2011)


    Funny you mention it. Datetime can be ever inceeasing, bit rarely unique, causing sql server to add a four byte identifier to each row. Om such a scenario, it may actually be benedicial to have a unique composite clustering key, the datetime and an identity column. The latter may be used as a primay key as well 🙂

    So you'd add a 4-byte integer to every row to avoid having SQL add a 4-byte integer to only the duplicate rows?

    The uniquifier's only added to rows with a duplicate value of the clustering key, not to every row.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Trying to Keep it short when writing on the phone. The uniqueifier cause the table to require a variable length block as well. So, in sone cases it is beneficial, yes.



    Ole Kristian VelstadbrÄten BangÄs - Virinco - Facebook - Twitter

    Concatenating Row Values in Transact-SQL[/url]

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply