Why Primary Keys can be important on large tables

  • Joe Fuller (6/30/2014)


    I'd like to ask what are everyone's thought on the use of non-clustered unique indexes.

    I've found these useful for the following reason.

    Our DBA's re-index on a weekly schedule and the default option is online=on and the sort option is not set for tempdb.

    This means that the free space required in the database needs to be roughly equivalent to the size of the biggest table with a clustered index.

    We have a databse of ~500GB and the biggest table (primary key clustered index) is 250GB hence we needed ~750GB just to cater for online re-index.

    We changed the index to non-clustered unique and saved ourselves ~220GB.

    What are the performance problems we may face with this approach?

    We can't tell you, as only you know how your Index, table and data are being accessed. Plus we don't even know your table's schema.

    The main advantage of having a PK as a CI, which is the default, is helping on range scans, when your result set needs to be ordered, or when the PK or column itself is ever increasing, like an identity. If your application follows most of those, having a PK that it is also a CI, will boost your performance.

  • Joe Fuller (6/30/2014)


    We changed the index to non-clustered unique and saved ourselves ~220GB.

    What are the performance problems we may face with this approach?

    Hi Joe,

    you compare apples with bananas 🙂

    In the moment you made the clustered index a non clustered index you only have the KEY-attributes in the index but not ALL attributes. A clustered index is the table itself in a fixed ordered structure (Key attribute(s)).

    Just give you an example which may demonstrate the differences

    USE demo;

    GO

    -- Create simple table

    IF OBJECT_ID('dbo.foo', 'U') IS NOT NULL

    DROP TABLE dbo.foo;

    GO

    CREATE TABLE dbo.foo

    (

    idintNOT NULL,

    c1char(200)NOT NULLDEFAULT ('filler'),

    c2char(200)NOT NULLDEFAULT ('another filler')

    );

    GO

    SET NOCOUNT ON

    GO

    -- Insert 1.000 records

    DECLARE@i int = 1;

    WHILE @i <= 1000

    BEGIN

    INSERT INTO dbo.foo (id) VALUES (@i);

    SET @i += 1;

    END

    GO

    -- Now create a clustered index on Id

    CREATE UNIQUE CLUSTERED INDEX ix_foo_id ON dbo.foo (ID);

    GO

    -- check the used datapages 411 bytes per index row(!!)

    SELECT SUM(page_count / 128.0) AS size_mb, SUM(record_count) AS num_records

    FROM sys.dm_db_index_physical_stats

    (

    db_id(),

    OBJECT_ID('dbo.foo', 'U'),

    NULL,

    NULL,

    'DETAILED'

    );

    GO

    -- Now drop the clustered index and make it a normal index

    DROP INDEX dbo.foo.ix_foo_id;

    GO

    CREATE UNIQUE NONCLUSTERED INDEX ix_foo_Id ON dbo.foo (Id);

    GO

    -- check again - now it is only 11 bytes per index row

    SELECT SUM(page_count / 128.0) AS size_mb, SUM(record_count) AS num_records

    FROM sys.dm_db_index_physical_stats

    (

    db_id(),

    OBJECT_ID('dbo.foo', 'U'),

    NULL,

    NULL,

    'DETAILED'

    )

    WHEREindex_id > 0;

    GO

    The above code will first create a table called dbo.foo and fill it with 1,000 records. Afterwards it will create a clustered index (is the table). Now the row size for the index is 4 + 200 + 200 + 7 bytes for the record structre = 411 bytes.

    If you run the first sys.dm_db_index_physical_stats you will get app. 0.5 MB as size for 1,053 records:

    1,000 records in the leaf

    53 records for the b-tree

    After the index has been recreated as a non clustered index the size is only ~5% of the first one because the size is 11 bytes. Due to the fact that 400 bytes (c1 and c2) has been thrown out the index is smaller plus on top the RID of the heap in the leaf level!

    So what is the problem here:

    1. An index seek in a SELECT * will cause automatically a key lookup because c1 and c2 are missing IN the index

    2. data will not be stored in a sorted order (because the table itself is a heap

    Indexes are quite complex structures and you should have a look into BOL or MSDN for further information 😉

    Microsoft Certified Master: SQL Server 2008
    MVP - Data Platform (2013 - ...)
    my blog: http://www.sqlmaster.de (german only!)

  • Hi Ben Kubicek,

    It's good Article. I have small Q. if we have table with IDENTITY column and NONCLUSTERED index. the table haven't any CLUSTERED index. and table have only INSERT & UPDATE there is no DELETE for that table. do we have any fragment issue? of-cause we have space issue. but i'm worry about fragmentation?

    Thanks

    Tharindu Dhaneenja.
    MCTS,MCITP(SQL Server),OCA(Oracle)
    http://www.databaseusergroup.com

  • dhaneenja-755935 (7/1/2014)


    Hi Ben Kubicek,

    It's good Article. I have small Q. if we have table with IDENTITY column and NONCLUSTERED index. the table haven't any CLUSTERED index. and table have only INSERT & UPDATE there is no DELETE for that table. do we have any fragment issue? of-cause we have space issue. but i'm worry about fragmentation?

    Thanks

    Hi Dhaneenja,

    no you will not have any fragmentation but may run into "forwarded records"

    If you don't have a clustered index than you have a HEAP!

    You may avoid forwarded records by usage of fixed column length!

    http://blogs.msdn.com/b/mssqlisv/archive/2006/12/01/knowing-about-forwarded-records-can-help-diagnose-hard-to-find-performance-issues.aspx.

    Microsoft Certified Master: SQL Server 2008
    MVP - Data Platform (2013 - ...)
    my blog: http://www.sqlmaster.de (german only!)

  • sql-lover (6/30/2014)


    We can't tell you, as only you know how your Index, table and data are being accessed. Plus we don't even know your table's schema.

    The main advantage of having a PK as a CI, which is the default, is helping on range scans, when your result set needs to be ordered, or when the PK or column itself is ever increasing, like an identity. If your application follows most of those, having a PK that it is also a CI, will boost your performance.

    SQL-Lover,

    if you love SQL Server than don't mix PK and CI. This is pure nonsence. You are trapping in the same pitfall as the autor of the article. A PK is a constraint and not an index! The CI is an index and not a constraint.

    Not the PK will boost the performance but an index itself!

    The only constraint I know which may boost the application is a CHECK-constraint and a FK-constraint but there could be more id don't know (currently 🙂 ).

    Microsoft Certified Master: SQL Server 2008
    MVP - Data Platform (2013 - ...)
    my blog: http://www.sqlmaster.de (german only!)

  • I think some may benefit from Thomas Kejser's fantastic blog explaining when to use a Heap vs Clustered index.

    http://kejser.org/clustered-indexes-vs-heaps

    To try to distill it down to a few points ( probably unsuccessfully)

    -high rate of data change ( ins/upd/del) benefits from clustered index. This is because when the seek for the row traverses the btree in the clustered index, the leaf page is the data. In a heap, you have to either traverse the forward pointer list, or if you have a non clustered index, the leaf is only a pointer to the page in the heap so further reads are required.

    -If you have a table that just does inserts , it can be beneficial to leave it as a heap and use a GUID (gasp, shock, horror!) to avoid a unique constraint , and place a non clustered index pointing to the random column. the heap wont become fragmented and the non clustered index leaf points to the data page in the heap. If you use a clustered index in this scenario:

    1. It will become horribly fragmented when using a random key.

    2. If you use an ascending key you lose scalability.

    3. The non clustered index scan points to the clustered index, so you have two index b trees to traverse.

    I highly advise reading his article as it should explain a lot of questions raised by this article.

  • Uwe Ricken (7/1/2014)


    sql-lover (6/30/2014)


    We can't tell you, as only you know how your Index, table and data are being accessed. Plus we don't even know your table's schema.

    The main advantage of having a PK as a CI, which is the default, is helping on range scans, when your result set needs to be ordered, or when the PK or column itself is ever increasing, like an identity. If your application follows most of those, having a PK that it is also a CI, will boost your performance.

    SQL-Lover,

    if you love SQL Server than don't mix PK and CI. This is pure nonsence. You are trapping in the same pitfall as the autor of the article. A PK is a constraint and not an index! The CI is an index and not a constraint.

    Not the PK will boost the performance but an index itself!

    The only constraint I know which may boost the application is a CHECK-constraint and a FK-constraint but there could be more id don't know (currently 🙂 ).

    utterly rubbish!

    There is no difference between Unique Index and Unique Constraint, besides the obvious syntax of course.

    And yes! Depending of your table's schema and application design, having a PK that is non clustered Index will boost performance, like insertions.

    I was explaining the guidelines, of picking a PK that is not a CI. Which can be the case in some databases. I did not say that a PK may boost the performance but the type of PK itself. Read again.

    So, one more time...

    The PK should be a CI if meet these requirements:

    -Help with range scans

    -Order by requirements in the queries

    -Its values are stable and values are not random

    If none of those are satisfied, then having a different CI on the table, which is not the existing PK column, may be better, depending of the table's schema and application design. Still a CI is needed, but having a NCI/PK may be better for performance.

  • @SQL-Lover

    Surely, neither a key nor a key constraint is an index, though in fact a primary key constraint or unique constraint will usually, in an RDBMS, be implemented by using an index. SQL Server will create an index on a key to enforce efficiently the uniqueness of the key (i.e. to support fast validation of the key constraint). These indexes also support use of the key in joins, and selection of single rows or ranges. SQL Server assigns this index the same name as the key constraint. Even SQL Server's metadata is careful to distinguish the PK or UNIQUE contraint from the index that is used to implement it. I know it seems rather picky to point this out but it helps to bear this in mind when understanding some of the details, and understand some differences between RDBMSs.

    Yes, in some circumstances, choosing a non-clustered index to enforce a primary key constraint will improve performance. I suspect that this is usually more due to the fact that you only get one clustered index per table and it might be more effectively used elsewhere than enforcing a primary key constraint!

    I'd always thought that the query optimiser will take key constraints and foreign key constraints into account in creating the candidate query plans from which it selects an effective plan, but I could be wrong!

    Best wishes,
    Phil Factor

  • sql-lover (7/1/2014)


    There is no difference between Unique Index and Unique Constraint, besides the obvious syntax of course.

    I would disagree. They are mostly the same,yes, but you can use the INCLUDE clause and/or the WHERE clause on a unique index. I have not had success doing that with a unique constraint.

  • bkubicek (6/29/2014)


    Comments posted to this topic are about the item <A HREF="/articles/Primary+key/110374/">Why Primary Keys can be important on large tables</A>

    In a nutshell: wrong title and wrong content. How did this one get past editorial review? Is this the standard we expect to see in a featured article? Perhaps it is.

  • sql-lover (7/1/2014)


    utterly rubbish!

    There is no difference between Unique Index and Unique Constraint, besides the obvious syntax of course.[/QUOTE]

    I don't know your knowledge but such an ignorance of ANSI defintions is "cool" 😀

    There is a HUGE difference but I think you seems to refuse to believe the reality

    I think it is worthless to explan in detail the differences BUT... read this from Joe Celko (especially the last sentence)

    https://www.simple-talk.com/sql/t-sql-programming/unique-constraints-in-sql/

    "The good news / bad news is that the UNIQUE constraints will have to create an index"

    Phil has described it in his post absolutely correct!

    sql-lover (7/1/2014)


    And yes! Depending of your table's schema and application design, having a PK that is non clustered Index will boost performance, like insertions.[/QUOTE]

    That's a complete wrong statement: Not the PK but the underlying index (clustered or nonclustered) may boost performance. But it will NEVER boost the performance when using DML-statements!

    If you have a heap the PK won't help you in any situation because you have the table itself (HEAP) and the non clustered index. SQL Server has to maintain 2 "indexes" instead of one!

    sql-lover (7/1/2014)


    I was explaining the guidelines, of picking a PK that is not a CI. Which can be the case in some databases. I did not say that a PK may boost the performance but the type of PK itself.[/QUOTE]

    You did explain it WRONG and... - the last sentences is conflicting with your previous statement! You said it may boost and now you say it will. So - what is your final statement?

    sql-lover (7/1/2014)


    Read again.

    So, one more time...[/QUOTE]

    No please - reading it several times will not make it true. It is wrong?

    sql-lover (7/1/2014)


    -Help with range scans[/QUOTE]

    No - it is not a benefit from the PK but from the index (whether clustered or not!)

    sql-lover (7/1/2014)


    -Order by requirements in the queries[/QUOTE]

    No - it is not a benefit from the PK but from the index (whether clustered or not!)

    sql-lover (7/1/2014)


    -Its values are stable and values are not random[/QUOTE]

    what benefit will have a PK here? Can you please explain in detail, what than will happen if SQL Server is using a PK?

    sql-lover (7/1/2014)


    If none of those are satisfied, then having a different CI on the table, which is not the existing PK column, may be better, depending of the table's schema and application design. Still a CI is needed, but having a NCI/PK may be better for performance.

    This is ludicrous - and again (if you want hear or not!) - you are mixing SQL paradigms and it seems to me that you don't understand them in detail.

    Keep in mind that a PK is only a constraint but has nothing to do with indexing by definition. A pk uses an index to enforce the constraints which attend to the defintion of a PK (unique and NOT NULL).

    Microsoft Certified Master: SQL Server 2008
    MVP - Data Platform (2013 - ...)
    my blog: http://www.sqlmaster.de (german only!)

  • Uwe Ricken (7/1/2014)


    sql-lover (7/1/2014)


    utterly rubbish!

    There is no difference between Unique Index and Unique Constraint, besides the obvious syntax of course.[/QUOTE]

    I don't know your knowledge but such an ignorance of ANSI defintions is "cool" 😀

    There is a HUGE difference but I think you seems to refuse to believe the reality

    I think it is worthless to explan in detail the differences BUT... read this from Joe Celko (especially the last sentence)

    https://www.simple-talk.com/sql/t-sql-programming/unique-constraints-in-sql/

    "The good news / bad news is that the UNIQUE constraints will have to create an index"

    Phil has described it in his post absolutely correct!

    sql-lover (7/1/2014)


    And yes! Depending of your table's schema and application design, having a PK that is non clustered Index will boost performance, like insertions.[/QUOTE]

    That's a complete wrong statement: Not the PK but the underlying index (clustered or nonclustered) may boost performance. But it will NEVER boost the performance when using DML-statements!

    If you have a heap the PK won't help you in any situation because you have the table itself (HEAP) and the non clustered index. SQL Server has to maintain 2 "indexes" instead of one!

    sql-lover (7/1/2014)


    I was explaining the guidelines, of picking a PK that is not a CI. Which can be the case in some databases. I did not say that a PK may boost the performance but the type of PK itself.[/QUOTE]

    You did explain it WRONG and... - the last sentences is conflicting with your previous statement! You said it may boost and now you say it will. So - what is your final statement?

    sql-lover (7/1/2014)


    Read again.

    So, one more time...[/QUOTE]

    No please - reading it several times will not make it true. It is wrong?

    sql-lover (7/1/2014)


    -Help with range scans[/QUOTE]

    No - it is not a benefit from the PK but from the index (whether clustered or not!)

    sql-lover (7/1/2014)


    -Order by requirements in the queries[/QUOTE]

    No - it is not a benefit from the PK but from the index (whether clustered or not!)

    sql-lover (7/1/2014)


    -Its values are stable and values are not random[/QUOTE]

    what benefit will have a PK here? Can you please explain in detail, what than will happen if SQL Server is using a PK?

    sql-lover (7/1/2014)


    If none of those are satisfied, then having a different CI on the table, which is not the existing PK column, may be better, depending of the table's schema and application design. Still a CI is needed, but having a NCI/PK may be better for performance.

    This is ludicrous - and again (if you want hear or not!) - you are mixing SQL paradigms and it seems to me that you don't understand them in detail.

    Keep in mind that a PK is only a constraint but has nothing to do with indexing by definition. A pk uses an index to enforce the constraints which attend to the defintion of a PK (unique and NOT NULL).

    I won't waste my time arguing with you. Your statements clearly shows you have zero working experience.

    Moreover, based on your posts on this thread, you only posts to criticize, show others you're always right (when you are not) , and make theoretical statements all the time.

    Do us a favor. Take your MCM test again. You wasted your money and time.

  • sql-lover (7/5/2014)

    I won't waste my time arguing with you. Your statements clearly shows you have zero working experience.[/QUOTE]

    OK - you got me 🙂

    sql-lover (7/5/2014)Moreover, based on your posts on this thread, you only posts to criticize, show others you're always right (when you are not) , and make theoretical statements all the time.

    You didn't answer my question:

    what benefit will have a PK here? Can you please explain in detail, what than will happen if SQL Server is using a PK?

    Instead of affronting you should be able to answer this depending on your experiences but try to avoid "theoretical" statements.

    So, what benefit will a PK have concerning the context you have described? I assume you have not read my posting complete.

    I'm not posting because of criticize people but if someone is giving wrong statements it maybe useful for others to point to the correct side.

    sql-lover (7/5/2014)Do us a favor. Take your MCM test again. You wasted your money and time.[/QUOTE]

    It is not possible because this program has retired. Here you err again 🙂

    And this has nothing to do with theoretical statements!

    Conclusion

    I try to reenforce my statements with theory and with practical examples. Your arguments (including the offence) seems to come to an end because you hedged arround but has not given valued answers which may convice me (or maybe others).

    Try it again - but without offence and clear examples - with sounded arguments and/or examples!

    If you cannot than let this thread go away into the endless of the internet 😉

    Microsoft Certified Master: SQL Server 2008
    MVP - Data Platform (2013 - ...)
    my blog: http://www.sqlmaster.de (german only!)

  • The posts are getting a bit contentious and downgrading to name calling. But I guess I'm complaining too, about some of the posts.

Viewing 14 posts - 16 through 28 (of 28 total)

You must be logged in to reply to this topic. Login to reply