Can a Table Really Have a Clustered Index?

  • hakim.ali

    SSCarpal Tunnel

    Points: 4237

    Neha05 (12/24/2012)


    Good explanation!

    Thank you.

    Hakim Ali
    www.sqlzen.com

  • hakim.ali

    SSCarpal Tunnel

    Points: 4237

    roger.plowman (12/24/2012)


    I always liked the library analogy.

    The clustered index is how the books are shelved (usually by Dewey Decimal number), the non-clustered indexes are the index cards in the card catalogs.

    Of course this probably tells you I'm old... (laughing)

    Extending this, covering indexes are having the information you're looking for in the card catalog so you don't have to go looking on the shelf. 😀

    I like this analogy a lot. Thanks.

    Hakim Ali
    www.sqlzen.com

  • hakim.ali

    SSCarpal Tunnel

    Points: 4237

    patrickmcginnis59 (12/24/2012)


    Let's assume there is no clustered index on this table. This makes it a heap, meaning the rows are not physically written to disk in any particular order. When you search for a record in such a table via a select statement, the database must look at each record in the table. This is called a table scan. Not a very good way of doing things, because it is bad for performance.

    If you have a nonclustered index on the table and the select statement had criteria as such that it could use the nonclustered index, does SQL still have to do a table scan? Seems to me it would do a lookup on the index then bookmark lookup for the matching rows in the heap. I do realise that bookmark lookups are more expensive than using a clustered index.

    Yes, you are right. I overlooked this fact: a covering nonclustered index will not require a table scan. I was going with the assumption that the table had neither a clustered nor a nonclustered index.

    Hakim Ali
    www.sqlzen.com

  • hakim.ali

    SSCarpal Tunnel

    Points: 4237

    roger.plowman (12/24/2012)


    Mr. hoek (12/24/2012)


    It would be nice that MSSQL tables would 'be' clustered or non clustered, but in fact they 'are' heaps with an optional 'clustered index' added to then.

    Removing the clustered index, does not remove the data 😉

    Actually, non-heaps *are* clustered indexes. Creating a clustered index physically rearranges the table into order by the index, and keeps it in that physical order as records are added and deleted. Removing a clustered index merely stops SQL server from keeping the table in order, but doesn't bother to rearrange the table randomly again. 🙂

    Thanks, couldn't have said it better myself.

    Hakim Ali
    www.sqlzen.com

  • hakim.ali

    SSCarpal Tunnel

    Points: 4237

    dfortier (12/24/2012)


    Finally, an explanation that does not require mental gymnastics and reading between the lines. Thanks for taking the time.

    Gracias.

    Hakim Ali
    www.sqlzen.com

  • hakim.ali

    SSCarpal Tunnel

    Points: 4237

    Phil Parkin (12/24/2012)


    Nice article, but rather than fields and records, I think you really mean columns and rows - there is a difference.

    Thank you for pointing that out. Another reason to love this site/forum: no matter how well you think you understand something, there's always something more to learn and better yourself. Seriously, thanks.

    Hakim Ali
    www.sqlzen.com

  • hakim.ali

    SSCarpal Tunnel

    Points: 4237

    annjunk (12/24/2012)


    patrickmcginnis59 (12/24/2012)


    Let's assume there is no clustered index on this table. This makes it a heap, meaning the rows are not physically written to disk in any particular order. When you search for a record in such a table via a select statement, the database must look at each record in the table. This is called a table scan. Not a very good way of doing things, because it is bad for performance.

    If you have a nonclustered index on the table and the select statement had criteria as such that it could use the nonclustered index, does SQL still have to do a table scan? Seems to me it would do a lookup on the index then bookmark lookup for the matching rows in the heap. I do realise that bookmark lookups are more expensive than using a clustered index.

    If a non-clustered index covers a query (meaning everything in your select and where are included somehow in the nc index) SQL will not do a table scan if it decides the index is the optimal way to look at the query.

    For example, if I have a nc index built on the area code field and I have included the first name, last name, and phone number columns in the index. I then run the following query, there is a very good chance that SQL will choose to use my nc index and never touch the table on a heap.

    SELECT Firstname, Lastname, phone FROM phonebook WHERE areacode = '412';

    Agreed. Thanks for helping explain that.

    Hakim Ali
    www.sqlzen.com

  • ljwitner

    Valued Member

    Points: 73

    The best post I have read in awhile! I hope you will contribute more. I will look for you on the net!

  • hakim.ali

    SSCarpal Tunnel

    Points: 4237

    Mr. hoek (12/24/2012)


    But why does the table allocation 'double' when adding a clustered index?

    The only explanation I can think of, is that SQL server has the information 'twice' in the database:

    - ones as pure 'table' data (all records/fields)

    - ones as sorted 'index' data (all records/fields)

    I'm not sure what you mean by table allocation doubling. I am sure that the database would not duplicate all data as 'pure' data and 'indexed' data. The clustered index causes re-arranging, not copying. A nonclustered index, however, will create a copy of the indexed fields along with pointers etc.

    Hakim Ali
    www.sqlzen.com

  • annjunk

    Old Hand

    Points: 335

    Mr. hoek (12/24/2012)


    But why does the table allocation 'double' when adding a clustered index?

    The only explanation I can think of, is that SQL server has the information 'twice' in the database:

    - ones as pure 'table' data (all records/fields)

    - ones as sorted 'index' data (all records/fields)

    If it is doubling, there may be something else going on, or you have an extremely large clustering key (the column or columns that you choose to index on).

    To test this, I used the SELECT INTO statement to create a new table with no primary or foreign keys and no indexes. Here is the size info

    name rows reserved data index_size unused

    MyFactInternetSales 60398 10568 KB 10504 KB 8 KB 56 KB

    I'm not sure why there is 8k of indexes on a brand new table with no indexes.

    I then added a clustered index on the productkey field (standard int field) and here are the results

    MyFactInternetSales 60398 10264 KB 10072 KB 80 KB 112 KB

    If you simply remove your index, you don't get all of the space back, but you do get some back. I dropped and recreated the table to make it clean, and then built a clustered index using several columns.

    name rows reserved data index_size unused

    MyFactInternetSales 60398 10072 KB 9864 KB 112 KB 96 KB

    The decreased data size tells me that the server is not storing the full key value in each data row, but rather it only considers it part of the index.

    I am using sp_spaceused to get this info. Once an index is fragmented and has other things going on, it will grow. Also, the server needs twice the size of a clustered index to be free to be able to rebuild it.

    I hope this helps.

  • hakim.ali

    SSCarpal Tunnel

    Points: 4237

    annjunk (12/24/2012)


    I wonder if this over simplification will end up confusing people more in the end. Both clustered and non-clustered indexes have what are called non-leaf-level pages that can help the server in searches.

    Like the person's library analogy earlier, when you walk into the library and know the ID on your book, you can start walking and reading the numbers on the ends of the shelves and only look for your book on the correct shelf. These labels on the ends of the book cases are like the non-leaf level pages of a clustered index.

    This is also why a clustered index is about 105% of the size of your table. There is something more added to the table, not simply the reorganizing of the data.

    Yes, I am aware of the non-leaf nodes. I didn't want to go into those details here, because I intended this to be a basic/introductory level explanation.

    Hakim Ali
    www.sqlzen.com

  • hakim.ali

    SSCarpal Tunnel

    Points: 4237

    ljwitner (12/24/2012)


    The best post I have read in awhile! I hope you will contribute more. I will look for you on the net!

    Thanks much, you just made my day!

    Hakim Ali
    www.sqlzen.com

  • DaleB

    SSC-Addicted

    Points: 408

    I find this hard to believe. If I have a table with 50 million rows, is SQL really going to move all those down one space to insert one row at the top?

  • patrickmcginnis59

    SSCertifiable

    Points: 6483

    DaleB (12/24/2012)


    I find this hard to believe. If I have a table with 50 million rows, is SQL really going to move all those down one space to insert one row at the top?

    While this is probably not exactly how SQL server behaves, balanced tree indexes don't really have to shift all the rows around to put a new one at the logical beginning: http://en.wikipedia.org/wiki/B-tree

  • hakim.ali

    SSCarpal Tunnel

    Points: 4237

    DaleB (12/24/2012)


    I find this hard to believe. If I have a table with 50 million rows, is SQL really going to move all those down one space to insert one row at the top?

    SQL Server stores data in pages. How full a page is before a new page is added depends on the fill factor. For a well designed clustered index, this should probably be 100%. New records should generally be added at the bottom of the clustered index sort order. However, if you do add a new record that goes somewhere at the top, and it cannot be accommodated in an existing page because it is full, then SQL Server will split the page where the record should go. The bottom half of the existing records on that page are moved to a new page, and then the new record is written on either the old page or the new page depending on configuration. The new page has links to the old page and the next index page, so it is in logical order but not in contiguous physical order. So yes, the database does not move all 50 million rows, only a few at the top. This does, however, cause index fragmentation, a good reason to design your clustered index to only insert at the bottom.

    Hakim Ali
    www.sqlzen.com

Viewing 15 posts - 16 through 30 (of 54 total)

You must be logged in to reply to this topic. Login to reply