clustered index

  • sandhyarao49

    SSCrazy

    Points: 2215

    Why can we have only one clustered index per table?

  • Animal Magic

    SSChampion

    Points: 13964

    Because a clustered index physically stores the records in index order. Since you can only store the data in one order, you can only have one clustered index.

    Non-clustered indexes are stored separately from the data rows, in their own pages within the data file, and hold pointers to the rows.

  • Ian Yates

    SSCoach

    Points: 19738

    You should read up on exactly what a clustered index is, as opposed to a regular non-clustered index; the difference will make the answer obvious.

    Essentially a clustered index defines the physical ordering of the data in the table. Obviously the data in the table can only be ordered once. All other (non-clustered) indices store a subset of the columns in a certain order, together with the clustered index key values for each row. This allows the non-clustered index to do a "bookmark lookup" if need be, going to the table/clustered index to retrieve other column values.
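
    A rough way to picture the bookmark lookup described above, sketched in Python rather than T-SQL. The table, its columns and the function names are all made up for illustration, and a real engine uses B-trees and pages, not Python lists.

```python
from bisect import bisect_left

# Hypothetical table: the rows live in one list, sorted by the clustering key (id).
clustered = [
    (1, "Smith", "Leeds"),
    (2, "Jones", "York"),
    (3, "Adams", "Hull"),
    (4, "Brown", "Bath"),
]

# Non-clustered index on surname: sorted (surname, clustering key) pairs.
# It carries the clustering key, not a copy of the whole row.
by_surname = sorted((surname, row_id) for row_id, surname, _city in clustered)


def seek_clustered(row_id):
    """Binary search on the clustering key; the hit is the row itself."""
    keys = [row[0] for row in clustered]
    pos = bisect_left(keys, row_id)
    if pos < len(clustered) and clustered[pos][0] == row_id:
        return clustered[pos]
    return None


def lookup_by_surname(surname):
    """Seek the non-clustered index, then look the row up in the clustered data."""
    pos = bisect_left(by_surname, (surname,))
    if pos < len(by_surname) and by_surname[pos][0] == surname:
        return seek_clustered(by_surname[pos][1])
    return None


print(lookup_by_surname("Adams"))  # (3, 'Adams', 'Hull')
```

    The rows exist once, in id order. A second "clustered" ordering by surname would need a second sorted copy of the rows, which is the point made above.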

  • Key DBA

    SSCertifiable

    Points: 6029

    sandhyarao49,

    SQL Server 2005 Books Online (September 2007)

    CREATE INDEX (Transact-SQL)

    http://msdn2.microsoft.com/en-us/library/ms188783.aspx

    Clustered Index

    The bottom, or leaf, level of the clustered index contains the actual data rows of the table. A table or view is allowed one clustered index at a time. For more information, see Clustered Index Structures.

    For further reading and/or reference ...

    SQL Server 2005 Books Online (September 2007)

    Clustered Index Structures

    http://msdn2.microsoft.com/en-us/library/ms177443.aspx

    SQL Server 2005 Books Online (September 2007)

    Nonclustered Index Structures

    http://msdn2.microsoft.com/en-us/library/ms177484.aspx

    Happy T-SQLing,

    "Key"
    MCITP: DBA, MCSE, MCTS: SQL 2005, OCP

  • I cant let you do that Dave

    Right there with Babe

    Points: 783

    Why would you want to?

    The clustered index is designed to limit the number of index pages that need to change each time an update moves a row onto a new data page. In this scenario only the clustered index itself would change.

    Adding a second clustered index would force the system to update both of the 'clustering indexes' to point to the correct data page.

    If you are not trying to limit the number of pages touched by an update, you could investigate non-clustered indexes on a table without a clustered index (a heap). In this case the non-clustered indexes refer to the data pages and not to the clustering key values (and in this scenario both would be maintained each time an update moves a row onto a new page).

  • EdVassie

    SSC Guru

    Points: 60274

    A clustered index physically stores the data in the sequence given in the index definition. The way this is implemented in SQL Server means you can only have 1 clustered index per table.

    This is true in most other DBMSs, but not all of them. DB2 has supported multiple clustered indexes for some years (for *nix and Windows) and now also on the mainframe. You can define a large number (256?) of clustered indexes on the same table. The data is stored only once, but is physically ordered by the sequence defined for each index.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • srienstr

    SSCrazy

    Points: 2410

    EdVassie (4/15/2008)


    A clustered index physically stores the data in the sequence given in the index definition. The way this is implemented in SQL Server means you can only have 1 clustered index per table.

    This is true in most other DBMSs, but not all of them. DB2 has supported multiple clustered indexes for some years (for *nix and Windows) and now also on the mainframe. You can define a large number (256?) of clustered indexes on the same table. The data is stored only once, but is physically ordered by the sequence defined for each index.

    It is possible in SQL Server, you just need to do it by way of indexed views, which results in a second copy of the table. Given the nature of clustered indices, I really don't see any way to have multiple clustered indices without storing multiple copies of the data, as it's not clustered unless it's physically stored in that order.


    Puto me cogitare, ergo puto me esse.
    I think that I think, therefore I think that I am.

  • EdVassie

    SSC Guru

    Points: 60274

    I really don't see any way to have multiple clustered indices without storing multiple copies of the data, as it's not clustered unless it's physically stored in that order.

    In DB2, the data is stored once, and physically clustered in multiple dimensions. All it needs is a bit of lateral thinking. Consider the following...

    You have a collection of objects you want to cluster by shape, by size, and by colour.

    Store all the small round red things in one database extent "a".

    Store all the small square blue things in extent "b".

    Store all the large triangular blue things in extent "c".

    Create a new type of index that only knows about extents. Call it a Multiple Dimension Clustering type of index.

    Define an index for size. This has 3 entries: large, "c"; small, "a"; small, "b".

    Define an index for shape. This has 3 entries: round, "a"; square, "b"; triangular, "c".

    Define an index for colour. This has 3 entries: blue, "b"; blue, "c"; red, "a".

    You want all the blue things, the database gets you extents "b" and "c".

    You want all the small things, the database gets you extents "a" and "b".

    You want the blue square things, the database gets you extent "b".

    In each extent that is returned, ALL the rows match your WHERE clause.

    So storing a data item once only and physically clustering it in multiple dimensions can be done. You just dedicate a whole extent to store the intersection of all your clustering indexes.
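
    The walk-through above can be sketched in a few lines of Python. This is only a toy model of the idea, not how DB2 implements MDC: the extents, the row names and the find_extents function are invented for illustration.

```python
# Each extent holds only rows that share one (size, shape, colour) combination.
extents = {
    "a": {"cell": ("small", "round", "red"), "rows": ["marble", "cherry"]},
    "b": {"cell": ("small", "square", "blue"), "rows": ["blue dice"]},
    "c": {"cell": ("large", "triangular", "blue"), "rows": ["blue sail"]},
}
DIMENSIONS = ("size", "shape", "colour")


def build_dimension_index(position):
    """Map each value of one dimension to the set of extents that hold it."""
    index = {}
    for extent_id, extent in extents.items():
        index.setdefault(extent["cell"][position], set()).add(extent_id)
    return index


# One extent-level index per dimension; none of them points at individual rows.
indexes = {name: build_dimension_index(i) for i, name in enumerate(DIMENSIONS)}


def find_extents(**criteria):
    """AND together the extent sets returned by each dimension index."""
    matching = set(extents)
    for dimension, value in criteria.items():
        matching &= indexes[dimension].get(value, set())
    return sorted(matching)


print(find_extents(colour="blue"))                  # ['b', 'c']
print(find_extents(size="small"))                   # ['a', 'b']
print(find_extents(colour="blue", shape="square"))  # ['b']
```

    Every row in a returned extent satisfies the criteria, because an extent never mixes combinations. The price is that each combination needs at least one whole extent to itself.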

    In DB2, you can set the extent size for each filegroup (called tablespace in DB2) so you can tune this value to minimise unused space in the extent. The DB2 MDC index pointers are the same size as normal RID pointers, but only the extent level information is populated, allowing normal and MDC indexes to be used in the same query with standard index AND and OR logic. This is a cool feature in DB2 but with weaknesses as well as strengths. It would be nice if SQL Server also had this technology - I am sure IBM would licence another one of its database patents to Microsoft for a suitable fee.

  • srienstr

    SSCrazy

    Points: 2410

    This sounds like unclustered indexes on a grouped heap. Granted, grouping the heap is an interesting idea, but I wouldn't call it multiple clustered indices.


  • EdVassie

    SSC Guru

    Points: 60274

    The description I gave about MDC was very simplified. If you want more details then you have to search the DB2 documentation.

    Believe me, MDC does what it says on the tin.

  • srienstr

    SSCrazy

    Points: 2410

    This site (link) states that DB2 only guarantees that a clustered index is clustered initially; the clustering is not maintained afterwards. While this saves time on inserts (no page splits), I'm glad SQL Server allows index fragmentation instead.

    Additionally, the MDC is primarily useful for low cardinality indices, which are not considered good candidates for a normal clustered index.

    I agree that there are situations where this option would be useful. Do you know if there's a request ticket on Connect for this feature?


  • Ian Yates

    SSCoach

    Points: 19738

    The suggestion from another person about using indexed views for the purpose is probably the best. It's a separate copy of the data (with extras thrown in if needed) that's maintained automatically for you. You should ensure that the appropriate SET options (such as the ANSI defaults, ARITHABORT, etc.) are enabled on your server.

    You could also create some non-clustered covering indices on your table. The size of an index key in SQL Server is limited, but in 2005 you can "include" columns outside of the index key.

  • I cant let you do that Dave

    Right there with Babe

    Points: 783

    It would help to know what he thought multiple clustered indexes would help with...

    I never would have guessed that the primary key did not affect the row order in data pages in a non-clustered table.

    Up until this point I thought an index defrag on a non-clustered table reorganised the data pages into primary key order. Stuck in the monotonically increasing primary key paradigm, I guess. When would you want a primary key rather than a uniqueness constraint on a non-clustered table?

    I am curious as to the difference between DB2's and SQL Server's index implementations.

    As the existing index pages fill and we get page splits, are we saying DB2 and SQL Server manage this differently? I would have expected a new page to be allocated to the index, linked between the two entries that previously bounded the new insert, and the higher-level index pages to be updated recursively. Obviously I am missing something.

  • Gail Shaw

    SSC Guru

    Points: 1004484

    Primary key never affects the row order in data pages. It's the clustered index that does that. Now, by default the primary key is a clustered index, but that is not required.

    Why would you not want a primary key on a table with no clustered index?

    What SQL does when a page fills and needs to be split is to allocate a new page somewhere in the data file. Where is not important. It then adjusts the next and previous pointers of the original page and of its neighbour so that the new one is linked in at the correct place in the logical order. The physical order of the pages may well not correspond to the logical order. That's fragmentation.

    Let's say we have a table with 5 pages, with a clustered index (hence row order matters). (Under 8 pages, so mixed extents.)

    1st page - pageID 250, previous page pointer = null, next page pointer = 251

    2nd page - pageID 251, previous page pointer = 250, next page pointer = 252

    3rd page - pageID 252, previous page pointer = 251, next page pointer = 264

    4th page - pageID 264, previous page pointer = 252, next page pointer = 275

    5th page - pageID 275, previous page pointer = 264, next page pointer = null.

    Now, page 3 is full and needs, for whatever reason, to be split. SQL allocates a new page. Say everything below page 300 is used, so the new page is 301. Now the table looks like this.

    1st page - pageID 250, previous page pointer = null, next page pointer = 251

    2nd page - pageID 251, previous page pointer = 250, next page pointer = 252

    3rd page - pageID 252, previous page pointer = 251, next page pointer = 301

    4th page - pageID 301 (new), previous page pointer = 252, next page pointer = 264

    5th page - pageID 264, previous page pointer = 301, next page pointer = 275

    6th page - pageID 275, previous page pointer = 264, next page pointer = null.

    Only the pages before and after the new page have to change. It would take way too long to adjust all the pages in the table (imagine a table with a few hundred thousand pages).
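
    The same pointer shuffle, sketched in Python using the page IDs from the example. The dictionary layout and the function names are just an illustration, not SQL Server's internal structures, and the sketch leaves out moving the rows themselves.

```python
# Page chain before the split: pageID -> previous/next page pointers.
pages = {
    250: {"prev": None, "next": 251},
    251: {"prev": 250, "next": 252},
    252: {"prev": 251, "next": 264},
    264: {"prev": 252, "next": 275},
    275: {"prev": 264, "next": None},
}


def split_page(full_page_id, new_page_id):
    """Link a newly allocated page into the chain right after the full page."""
    old_next = pages[full_page_id]["next"]
    pages[new_page_id] = {"prev": full_page_id, "next": old_next}
    pages[full_page_id]["next"] = new_page_id
    if old_next is not None:
        pages[old_next]["prev"] = new_page_id


def logical_order(first_page_id):
    """Follow the next pointers from the first page to the end of the chain."""
    order = []
    page_id = first_page_id
    while page_id is not None:
        order.append(page_id)
        page_id = pages[page_id]["next"]
    return order


split_page(252, 301)
print(logical_order(250))  # [250, 251, 252, 301, 264, 275]
```

    Only pages 252, 301 and 264 are touched. The logical order no longer matches the pageID order, which is the fragmentation described above.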

    When you have a heap (no cluster) the order of rows has no meaning, and new rows are just added to the last page of the table.

    Does that help?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • EdVassie

    SSC Guru

    Points: 60274

    The approach to clustering is different between SQL Server and DB2.

    As most SQL people know, SQL Server will always physically insert a new row in the correct place in a cluster index. This is because the bottom-level leaf pages of a SQL Server cluster index are also the data pages.

    DB2 has 2 types of cluster index, both of which work differently to cluster indexes in SQL Server.

    The original DB2 cluster index (written in the 1970s) most closely compares to a SQL non-clustered index on a table with no cluster index. In other words, the bottom level of the cluster index just has RID pointers into the table. When a row is inserted DB2 will place it in the first available slot, with the index updated as normal. DB2 keeps a statistic called clustered% to show how close the actual physical order of rows is to the clustered index definition. Most DB2 DBAs would rebuild a cluster index when the clustered% drops below 95%. When a DB2 cluster index is rebuilt, the table rows are sorted into the sequence of the cluster index. Immediately after a DB2 cluster index is rebuilt, both the index and the data are in the same physical sequence. There are advantages and disadvantages of the DB2 approach compared to the SQL Server approach.
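
    To give a feel for a statistic of this kind, here is a toy measure in Python. It is an assumption made for illustration only: the thread does not give DB2's formula, and this simply counts how many neighbouring rows are already in key order.

```python
def clustered_percent(physical_keys):
    """Share of adjacent row pairs (in physical order) that are in key order."""
    if len(physical_keys) < 2:
        return 100.0
    pairs = zip(physical_keys, physical_keys[1:])
    in_order = sum(1 for a, b in pairs if a <= b)
    return 100.0 * in_order / (len(physical_keys) - 1)


# Index keys of the rows in physical order, just after a rebuild.
table = [10, 20, 30, 40, 50]

# A new row goes into the first available slot; here that is the end of the table.
table.append(25)
before = clustered_percent(table)

# A rebuild sorts the rows back into the sequence of the cluster index.
table.sort()
after = clustered_percent(table)

print(before, after)  # 80.0 100.0
```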

    DB2 Multiple Dimension Clustering (MDC) indexes work in a different way. See my previous post in this thread for an introduction to the MDC concept, which came into DB2 about 2001. When a new row is inserted in a table with a MDC index, it is stored in the correct extent for the key entry. In this way, a MDC index is self-maintaining, and a MDC index rebuild is only necessary when it is desired to reclaim space.

Viewing 15 posts - 1 through 15 (of 34 total)
