Can a Table Really Have a Clustered Index?

  • Neha05 (12/24/2012)


    Good explanation!

    Thank you.

    Hakim Ali
    www.sqlzen.com

  • roger.plowman (12/24/2012)


    I always liked the library analogy.

    The clustered index is how the books are shelved (usually by Dewey Decimal number), the non-clustered indexes are the index cards in the card catalogs.

    Of course this probably tells you I'm old... (laughing)

    Extending this, covering indexes are having the information you're looking for in the card catalog so you don't have to go looking on the shelf. 😀

    I like this analogy a lot. Thanks.

    Hakim Ali
    www.sqlzen.com

  • patrickmcginnis59 (12/24/2012)


    Let's assume there is no clustered index on this table. This makes it a heap, meaning the rows are not physically written to disk in any particular order. When you search for a record in such a table via a select statement, the database must look at each record in the table. This is called a table scan. Not a very good way of doing things, because it is bad for performance.

    If you have a nonclustered index on the table and the select statement had criteria as such that it could use the nonclustered index, does SQL still have to do a table scan? Seems to me it would do a lookup on the index then bookmark lookup for the matching rows in the heap. I do realise that bookmark lookups are more expensive than using a clustered index.

    Yes, you are right. I overlooked this fact: a covering nonclustered index will not require a table scan. I was going with the assumption that the table had neither a clustered nor a nonclustered index.

    Hakim Ali
    www.sqlzen.com

  • roger.plowman (12/24/2012)


    Mr. hoek (12/24/2012)


    It would be nice that MSSQL tables would 'be' clustered or non clustered, but in fact they 'are' heaps with an optional 'clustered index' added to then.

    Removing the clustered index, does not remove the data 😉

    Actually, non-heaps *are* clustered indexes. Creating a clustered index physically rearranges the table into order by the index, and keeps it in that physical order as records are added and deleted. Removing a clustered index merely stops SQL server from keeping the table in order, but doesn't bother to rearrange the table randomly again. 🙂

    Thanks, couldn't have said it better myself.

    Hakim Ali
    www.sqlzen.com

  • dfortier (12/24/2012)


    Finally, an explanation that does not require mental gymnastics and reading between the lines. Thanks for taking the time.

    Gracias.

    Hakim Ali
    www.sqlzen.com

  • Phil Parkin (12/24/2012)


    Nice article, but rather than fields and records, I think you really mean columns and rows - there is a difference.

    Thank you for pointing that out. Another reason to love this site/forum: no matter how well you think you understand something, there's always something more to learn and better yourself. Seriously, thanks.

    Hakim Ali
    www.sqlzen.com

  • annjunk (12/24/2012)


    patrickmcginnis59 (12/24/2012)


    Let's assume there is no clustered index on this table. This makes it a heap, meaning the rows are not physically written to disk in any particular order. When you search for a record in such a table via a select statement, the database must look at each record in the table. This is called a table scan. Not a very good way of doing things, because it is bad for performance.

    If you have a nonclustered index on the table and the select statement had criteria as such that it could use the nonclustered index, does SQL still have to do a table scan? Seems to me it would do a lookup on the index then bookmark lookup for the matching rows in the heap. I do realise that bookmark lookups are more expensive than using a clustered index.

    If a non-clustered index covers a query (meaning everything in your select and where are included somehow in the nc index) SQL will not do a table scan if it decides the index is the optimal way to look at the query.

    For example, if I have a nc index built on the area code field and I have included the first name, last name, and phone number columns in the index. I then run the following query, there is a very good chance that SQL will choose to use my nc index and never touch the table on a heap.

    SELECT Firstname, Lastname, phone FROM phonebook WHERE areacode = '412';

    Agreed. Thanks for helping explain that.

    Hakim Ali
    www.sqlzen.com

  • The best post I have read in awhile! I hope you will contribute more. I will look for you on the net!

  • Mr. hoek (12/24/2012)


    But why does the table allocation 'double' when adding a clustered index?

    The only explanation I can think of, is that SQL server has the information 'twice' in the database:

    - ones as pure 'table' data (all records/fields)

    - ones as sorted 'index' data (all records/fields)

    I'm not sure what you mean by table allocation doubling. I am sure that the database would not duplicate all data as 'pure' data and 'indexed' data. The clustered index causes re-arranging, not copying. A nonclustered index, however, will create a copy of the indexed fields along with pointers etc.

    Hakim Ali
    www.sqlzen.com

  • Mr. hoek (12/24/2012)


    But why does the table allocation 'double' when adding a clustered index?

    The only explanation I can think of, is that SQL server has the information 'twice' in the database:

    - ones as pure 'table' data (all records/fields)

    - ones as sorted 'index' data (all records/fields)

    If it is doubling, there may be something else going on, or you have an extremely large clustering key (the column or columns that you choose to index on).

    To test this, I used the SELECT INTO statement to create a new table with no primary or foreign keys and no indexes. Here is the size info

    name rows reserveddata index_sizeunused

    MyFactInternetSales60398 10568 KB10504 KB8 KB 56 KB

    I'm not sure why there is 8k of indexes on a brand new table with no indexes.

    I then added a clustered index on the productkey field (standard int field) and here are the results

    MyFactInternetSales60398 10264 KB10072 KB80 KB 112 KB

    If you simply remove your index, you don't get all of the space back, but you do get some back. I dropped and recreated the table to make it clean, and then built a clustered index using several columns.

    name rows reserveddata index_sizeunused

    MyFactInternetSales60398 10072 KB9864 KB112 KB96 KB

    The decreased data size tells me that the server is not storing the full key value in each data row, but rather it only considers it part of the index.

    I am using sp_spaceused to get this info. Once an index is fragmented and has other things going on, it will grow. Also, the server needs twice the size of a clustered index to be free to be able to rebuild it.

    I hope this helps.

  • annjunk (12/24/2012)


    I wonder if this over simplification will end up confusing people more in the end. Both clustered and non-clustered indexes have what are called non-leaf-level pages that can help the server in searches.

    Like the person's library analogy earlier, when you walk into the library and know the ID on your book, you can start walking and reading the numbers on the ends of the shelves and only look for your book on the correct shelf. These labels on the ends of the book cases are like the non-leaf level pages of a clustered index.

    This is also why a clustered index is about 105% of the size of your table. There is something more added to the table, not simply the reorganizing of the data.

    Yes, I am aware of the non-leaf nodes. I didn't want to go into those details here, because I intended this to be a basic/introductory level explanation.

    Hakim Ali
    www.sqlzen.com

  • ljwitner (12/24/2012)


    The best post I have read in awhile! I hope you will contribute more. I will look for you on the net!

    Thanks much, you just made my day!

    Hakim Ali
    www.sqlzen.com

  • I find this hard to believe. If I have a table with 50 million rows, is SQL really going to move all those down one space to insert one row at the top?

  • DaleB (12/24/2012)


    I find this hard to believe. If I have a table with 50 million rows, is SQL really going to move all those down one space to insert one row at the top?

    While this is probably not exactly how SQL server behaves, balanced tree indexes don't really have to shift all the rows around to put a new one at the logical beginning: http://en.wikipedia.org/wiki/B-tree

  • DaleB (12/24/2012)


    I find this hard to believe. If I have a table with 50 million rows, is SQL really going to move all those down one space to insert one row at the top?

    SQL Server stores data in pages. How full a page is before a new page is added depends on the fill factor. For a well designed clustered index, this should probably be 100%. New records should generally be added at the bottom of the clustered index sort order. However, if you do add a new record that goes somewhere at the top, and it cannot be accommodated in an existing page because it is full, then SQL Server will split the page where the record should go. The bottom half of the existing records on that page are moved to a new page, and then the new record is written on either the old page or the new page depending on configuration. The new page has links to the old page and the next index page, so it is in logical order but not in contiguous physical order. So yes, the database does not move all 50 million rows, only a few at the top. This does, however, cause index fragmentation, a good reason to design your clustered index to only insert at the bottom.

    Hakim Ali
    www.sqlzen.com

Viewing 15 posts - 16 through 30 (of 53 total)

You must be logged in to reply to this topic. Login to reply