Changing the clustered index

  • Hello,

    After reading some comments here I decided to look at tables to see if any had a clustered index that was a unique identifier. Yep. So if I have a table with a unique identifier as the primary key/clustered index and an identity column that is indexed, I would like to make the identity a clustered index (maybe even the primary key) and make the unique identifier a unique non-clustered index (not the primary key).

    Does this sound reasonable?

    If I do this will I need to drop and recreate the other indexes? Or maybe just rebuild the other indexes?

    Currently:

    CREATE TABLE Payments (

    IDX INT IDENTITY(1,1) NOT NULL,

    GUID UNIQUEIDENTIFIER NOT NULL DEFAULT(NEWID()),

    ..... -- many other columns

    );

    GO

    ALTER TABLE [dbo].[PAYMENTS] ADD CONSTRAINT [PK_PAYMENTS_GID] PRIMARY KEY CLUSTERED ([GUID] ASC);

    GO

    CREATE NONCLUSTERED INDEX [IX_Payments_ID] ON [dbo].[PAYMENTS] ([IDX] ASC);

    GO

    Would like:

    ALTER TABLE [dbo].[PAYMENTS] ADD CONSTRAINT [PK_PAYMENTS_IDX] PRIMARY KEY CLUSTERED (IDX ASC);

    GO

    CREATE UNIQUE NONCLUSTERED INDEX [IX_Payments_GUID] ON [dbo].[PAYMENTS] (GUID ASC);

    GO

    Thank you

  • Arghhhhhhhhhhhhh! :crazy: ...

    Did I read well? A GUID as a Clustered Index? Yes! Eliminate that if you can!

    Take a look on this article[/url] from Kimberly.

  • Before making such a critical choice, review SQL's missing index and index usage stats. Odds are, you have a better choice than an identity column for the clustering index key. That is, more than 50% of the time, overall performance is better clustering on something other than just a meaningless identity value.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • I'm not a fan of GUIDs, but they are not inherently evil. Before you replace it, I'd find out why you have it. If there's no good reason, then by all means, rip it out. But be sure there's not a good purpose there.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • The only unique things in the table are IDX and GUID. Using a combination of columns it is still not possible to get a unique combination. Since this table has been around for about ten years, it is not normalized, but is used everywhere. 🙁

  • Grant Fritchey (2/27/2015)


    I'm not a fan of GUIDs, but they are not inherently evil. Before you replace it, I'd find out why you have it. If there's no good reason, then by all means, rip it out. But be sure there's not a good purpose there.

    I will keep the GUID column but just change from primary/clustered to unique non-clustered index.

    The other option is to change the clustered index and keep the GUID the primary key.

  • djj (2/27/2015)


    The only unique things in the table are IDX and GUID. Using a combination of columns it is still not possible to get a unique combination. Since this table has been around for about ten years, it is not normalized, but is used everywhere. 🙁

    Your clustered index key(s) doesn't(don't) have to be unique by itself(themselves); SQL will take care of that if it needs to.

    And that's a secondary concern. The most critical performance factor is that you get the best clustered index for that table.

    This code will show you the unused index stats and the index usage stats for the table(s).

    --USE [<your_db_name_here>] --naturally make sure you are in the right db

    SET DEADLOCK_PRIORITY LOW --probably irrelevant, but just in case

    DECLARE @list_missing_indexes bit

    DECLARE @table_name_pattern sysname

    --NOTE: showing missing indexes can take some time; set to 0 if you don't want to wait.

    SET @list_missing_indexes = 1

    SET @table_name_pattern = '%' --<<-- !your table name/pattern goes here!

    --SET @table_name_pattern = '%'

    PRINT 'Started @ ' + CONVERT(varchar(30), GETDATE(), 120)

    IF @list_missing_indexes = 1

    BEGIN

    SELECT

    GETDATE() AS capture_date,

    DB_NAME(mid.database_id) AS Db_Name,

    dps.row_count,

    OBJECT_NAME(mid.object_id /*, mid.database_id*/) AS Table_Name,

    mid.equality_columns, mid.inequality_columns, mid.included_columns,

    user_seeks, user_scans, ca1.max_days_active, unique_compiles,

    last_user_seek, last_user_scan, avg_total_user_cost, avg_user_impact,

    system_seeks, system_scans, last_system_seek, last_system_scan, avg_total_system_cost, avg_system_impact,

    mid.statement, mid.object_id, mid.index_handle

    FROM sys.dm_db_missing_index_details mid WITH (NOLOCK)

    CROSS APPLY (

    SELECT DATEDIFF(DAY, create_date, GETDATE()) AS max_days_active FROM sys.databases WHERE name = 'tempdb'

    ) AS ca1

    LEFT OUTER JOIN sys.dm_db_missing_index_groups mig WITH (NOLOCK) ON

    mig.index_handle = mid.index_handle

    LEFT OUTER JOIN sys.dm_db_missing_index_group_stats migs WITH (NOLOCK) ON

    migs.group_handle = mig.index_group_handle

    LEFT OUTER JOIN sys.dm_db_partition_stats dps WITH (NOLOCK) ON

    dps.object_id = mid.object_id AND

    dps.index_id IN (0, 1)

    --order by

    --DB_NAME, Table_Name, equality_columns

    WHERE

    1 = 1

    AND mid.database_id = DB_ID() --only current db

    AND OBJECT_NAME(mid.object_id) LIKE @table_name_pattern

    --AND mid.object_id IN (OBJECT_ID('<table_name_1>'), OBJECT_ID('<table_name_2>'))

    ORDER BY

    --avg_total_user_cost * (user_seeks + user_scans) DESC,

    Db_Name, Table_Name, equality_columns, inequality_columns

    END --IF

    PRINT 'Midpoint @ ' + CONVERT(varchar(30), GETDATE(), 120)

    -- list index usage stats (seeks, scans, etc.)

    SELECT

    ius2.row_num, DB_NAME() AS db_name,

    CASE WHEN i.name LIKE ca2.table_name + '%'

    THEN '~' + SUBSTRING(i.name, LEN(ca2.table_name) + 1, 200)

    ELSE i.name END AS index_name,

    CASE WHEN i.is_unique = 0 THEN 'N' ELSE 'Y' END + '.' +

    CASE WHEN i.is_primary_key = 0 AND i.is_unique_constraint = 0 THEN 'N' ELSE 'Y' END AS [uniq?],

    ca2.table_name,

    i.index_id, --ius.user_seeks + ius.user_scans AS total_reads,

    dps.row_count,

    SUBSTRING(key_cols, 3, 8000) AS key_cols, SUBSTRING(nonkey_cols, 3, 8000) AS nonkey_cols,

    ius.user_seeks, ius.user_scans, ius.user_lookups, ius.user_updates,

    ius.last_user_seek, ius.last_user_scan, ius.last_user_lookup, ius.last_user_update,

    fk.Reference_Count AS fk_ref_count,

    DATEDIFF(DAY, CASE WHEN o.create_date > ca1.sql_startup_date THEN o.create_date

    ELSE ca1.sql_startup_date END, GETDATE()) AS max_days_active,

    FILEGROUP_NAME(i.data_space_id) AS filegroup_name,

    ius.system_seeks, ius.system_scans, ius.system_lookups, ius.system_updates,

    ius.last_system_seek, ius.last_system_scan, ius.last_system_lookup, ius.last_system_update

    FROM sys.indexes i WITH (NOLOCK)

    INNER JOIN sys.objects o WITH (NOLOCK) ON

    o.object_id = i.object_id

    CROSS JOIN (

    SELECT create_date AS sql_startup_date FROM sys.databases WHERE name = 'tempdb'

    ) AS ca1

    CROSS APPLY (

    SELECT OBJECT_NAME(i.object_id/*, DB_ID()*/) AS table_name

    ) AS ca2

    OUTER APPLY (

    SELECT

    ', ' + COL_NAME(object_id, ic.column_id)

    FROM sys.index_columns ic

    WHERE

    ic.key_ordinal > 0 AND

    ic.object_id = i.object_id AND

    ic.index_id = i.index_id

    ORDER BY

    ic.key_ordinal

    FOR XML PATH('')

    ) AS key_cols (key_cols)

    OUTER APPLY (

    SELECT

    ', ' + COL_NAME(object_id, ic.column_id)

    FROM sys.index_columns ic

    WHERE

    ic.key_ordinal = 0 AND

    ic.object_id = i.object_id AND

    ic.index_id = i.index_id

    ORDER BY

    COL_NAME(object_id, ic.column_id)

    FOR XML PATH('')

    ) AS nonkey_cols (nonkey_cols)

    LEFT OUTER JOIN sys.dm_db_partition_stats dps WITH (NOLOCK) ON

    dps.object_id = i.object_id AND

    dps.index_id = i.index_id

    LEFT OUTER JOIN sys.dm_db_index_usage_stats ius WITH (NOLOCK) ON

    ius.database_id = DB_ID() AND

    ius.object_id = i.object_id AND

    ius.index_id = i.index_id

    LEFT OUTER JOIN (

    SELECT

    database_id, object_id, MAX(user_scans) AS user_scans,

    ROW_NUMBER() OVER (ORDER BY MAX(user_scans) DESC) AS row_num --user_scans|user_seeks+user_scans

    FROM sys.dm_db_index_usage_stats WITH (NOLOCK)

    WHERE

    database_id = DB_ID()

    --AND index_id > 0

    GROUP BY

    database_id, object_id

    ) AS ius2 ON

    ius2.database_id = DB_ID() AND

    ius2.object_id = i.object_id

    LEFT OUTER JOIN (

    SELECT

    referenced_object_id, COUNT(*) AS Reference_Count

    FROM sys.foreign_keys

    WHERE

    is_disabled = 0

    GROUP BY

    referenced_object_id

    ) AS fk ON

    fk.referenced_object_id = i.object_id

    WHERE

    i.object_id > 100 AND

    i.is_hypothetical = 0 AND

    i.type IN (0, 1, 2) AND

    o.type NOT IN ( 'IF', 'IT', 'TF', 'TT' ) AND

    (

    o.name LIKE @table_name_pattern AND

    o.name NOT LIKE 'dtprop%' AND

    o.name NOT LIKE 'filestream[_]' AND

    o.name NOT LIKE 'MSpeer%' AND

    o.name NOT LIKE 'MSpub%' AND

    --o.name NOT LIKE 'queue[_]%' AND

    o.name NOT LIKE 'sys%'

    )

    --AND OBJECT_NAME(i.object_id /*, DB_ID()*/) IN ('tbl1', 'tbl2', 'tbl3')

    ORDER BY

    --row_count DESC,

    --ius.user_scans DESC,

    --ius2.row_num, --user_scans&|user_seeks

    db_name, table_name,

    -- list clustered index first, if any, then other index(es)

    CASE WHEN i.index_id IN (0, 1) THEN 1 ELSE 2 END,

    key_cols

    PRINT 'Ended @ ' + CONVERT(varchar(30), GETDATE(), 120)

    SET DEADLOCK_PRIORITY NORMAL

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • djj (2/27/2015)


    Grant Fritchey (2/27/2015)


    I'm not a fan of GUIDs, but they are not inherently evil. Before you replace it, I'd find out why you have it. If there's no good reason, then by all means, rip it out. But be sure there's not a good purpose there.

    I will keep the GUID column but just change from primary/clustered to unique non-clustered index.

    The other option is to change the clustered index and keep the GUID the primary key.

    My initial comment was about that GUID being a Cluster Index, not dropping the GUID, if your app really uses it. A GUID as a Cluster Index is bad! The fragmentation will kill you.

    If that GUID should be a unique NCI or PK, it's up to you. They internally act pretty much the same, except that primary key column cannot be nullable, unique index column can, I think. But for some third party apps and if you are trying to adhere to the relational model, a PK should be deployed in SQL as an actual PK.

  • Thanks everyone.

    Scott, I will try your code later as I am getting ready to leave. 😀

  • ScottPletcher (2/27/2015)


    Your clustered index key(s) doesn't(don't) have to be unique by itself(themselves); SQL will take care of that if it needs to.

    ... at possibly quite some cost if the clustered index isn't unique thanks to the "uniquefier" column that will appear for non-unique rows.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/27/2015)


    ScottPletcher (2/27/2015)


    Your clustered index key(s) doesn't(don't) have to be unique by itself(themselves); SQL will take care of that if it needs to.

    ... at possibly quite some cost if the clustered index isn't unique thanks to the "uniquefier" column that will appear for non-unique rows.

    Or at virtually no cost, depending on what % of rows have dups, how many dups there are, and how many rows fit on a page. But, again, all those are secondary concerns compared to the primary concern of getting the best clustering index key(s).

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (2/27/2015)


    Jeff Moden (2/27/2015)


    ScottPletcher (2/27/2015)


    Your clustered index key(s) doesn't(don't) have to be unique by itself(themselves); SQL will take care of that if it needs to.

    ... at possibly quite some cost if the clustered index isn't unique thanks to the "uniquefier" column that will appear for non-unique rows.

    Or at virtually no cost, depending on what % of rows have dups, how many dups there are, and how many rows fit on a page. But, again, all those are secondary concerns compared to the primary concern of getting the best clustering index key(s).

    I disagree, Scott. Those should all be primary concerns for selecting what the best clustered index is. As you say, there may be no cost to using a non-unique clustered index... or there may be a huge cost that could actually be repaired fairly easily (using methods you hate, oddly enough) but, if you don't know, then any change is a shot in the dark especially on large tables.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/27/2015)


    ScottPletcher (2/27/2015)


    Jeff Moden (2/27/2015)


    ScottPletcher (2/27/2015)


    Your clustered index key(s) doesn't(don't) have to be unique by itself(themselves); SQL will take care of that if it needs to.

    ... at possibly quite some cost if the clustered index isn't unique thanks to the "uniquefier" column that will appear for non-unique rows.

    Or at virtually no cost, depending on what % of rows have dups, how many dups there are, and how many rows fit on a page. But, again, all those are secondary concerns compared to the primary concern of getting the best clustering index key(s).

    I disagree, Scott. Those should all be primary concerns for selecting what the best clustered index is. As you say, there may be no cost to using a non-unique clustered index... or there may be a huge cost that could actually be repaired fairly easily (using methods you hate, oddly enough) but, if you don't know, then any change is a shot in the dark especially on large tables.

    Don't be so obsessed with the INSERT process. Each row is Selected 100x or more for its 1 INSERT. Thus, I believe getting the best clustered index keys is first; you can deal with duplicates, if necessary, later, except perhaps on very large tables. Often avoiding gazillions of covering indexes will gain you much more than you lose by duplicate key values.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (2/27/2015)


    Jeff Moden (2/27/2015)


    ScottPletcher (2/27/2015)


    Jeff Moden (2/27/2015)


    ScottPletcher (2/27/2015)


    Your clustered index key(s) doesn't(don't) have to be unique by itself(themselves); SQL will take care of that if it needs to.

    ... at possibly quite some cost if the clustered index isn't unique thanks to the "uniquefier" column that will appear for non-unique rows.

    Or at virtually no cost, depending on what % of rows have dups, how many dups there are, and how many rows fit on a page. But, again, all those are secondary concerns compared to the primary concern of getting the best clustering index key(s).

    I disagree, Scott. Those should all be primary concerns for selecting what the best clustered index is. As you say, there may be no cost to using a non-unique clustered index... or there may be a huge cost that could actually be repaired fairly easily (using methods you hate, oddly enough) but, if you don't know, then any change is a shot in the dark especially on large tables.

    Don't be so obsessed with the INSERT process. Each row is Selected 100x or more for its 1 INSERT. Thus, I believe getting the best clustered index keys is first; you can deal with duplicates, if necessary, later, except perhaps on very large tables. Often avoiding gazillions of covering indexes will gain you much more than you lose by duplicate key values.

    You say that but I've seen systems that have been instantly and absolutely paralyzed just by someone adding the wrong index. My personal example is that I single handedly paralyzed a part of Expedia.com for a short period just by adding a the wrong index that was more concerned with SELECT performance than INSERT performance.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/27/2015)


    ScottPletcher (2/27/2015)


    Jeff Moden (2/27/2015)


    ScottPletcher (2/27/2015)


    Jeff Moden (2/27/2015)


    ScottPletcher (2/27/2015)


    Your clustered index key(s) doesn't(don't) have to be unique by itself(themselves); SQL will take care of that if it needs to.

    ... at possibly quite some cost if the clustered index isn't unique thanks to the "uniquefier" column that will appear for non-unique rows.

    Or at virtually no cost, depending on what % of rows have dups, how many dups there are, and how many rows fit on a page. But, again, all those are secondary concerns compared to the primary concern of getting the best clustering index key(s).

    I disagree, Scott. Those should all be primary concerns for selecting what the best clustered index is. As you say, there may be no cost to using a non-unique clustered index... or there may be a huge cost that could actually be repaired fairly easily (using methods you hate, oddly enough) but, if you don't know, then any change is a shot in the dark especially on large tables.

    Don't be so obsessed with the INSERT process. Each row is Selected 100x or more for its 1 INSERT. Thus, I believe getting the best clustered index keys is first; you can deal with duplicates, if necessary, later, except perhaps on very large tables. Often avoiding gazillions of covering indexes will gain you much more than you lose by duplicate key values.

    You say that but I've seen systems that have been instantly and absolutely paralyzed just by someone adding the wrong index. My personal example is that I single handedly paralyzed a part of Expedia.com for a short period just by adding a the wrong index that was more concerned with SELECT performance than INSERT performance.

    Can't believe an index on a single key value that is explicitly specified in every user request could be "bad" or "wrong". Seems bizarre to me to consider clustering on anything else given that info going on, barring extremely unusual insert patterns, except based on a pre-determined obsession with using only "narrow, ever-increasing, ..." clustering key values.

    I've also not had an index based on careful analysis of missing index and index usage stats, and a few other stats I review, cause me such an issue.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 15 posts - 1 through 15 (of 26 total)

You must be logged in to reply to this topic. Login to reply