Index Query

  • I just wanted to clarify quick query about index. I always read the clustered index sorts the data in physical order and non-clustered index sorts the data in logical order. The leaf has pointers for non-clustered index and leaf has data for clustered. I am trying to understand how SQL works when we store the data without index and with index. I am assuming we are storing A to Z letters on disk and how SQL Server can store these 26 letters without Index and with Index. I tried to draw diagram to clarify my understanding. is it right?

    Please find attached picture.

  • If you truly want to understand indexes this is an excellent read. http://www.sqlservercentral.com/stairway/72399/


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

  • There is also this: http://www.sqlservercentral.com/articles/Indexing/68439/

    Short answer: with no index(heap), the data is on pages/extents (8 pages). It's written as it comes in. Typically this means every 8 page extent of a heap is colocated on disk. If you allocate a few together, they're near each other.

    However, you don't worry about that unless they're really spread. You measure this as fragmentation.

    With a clustered index, the same thing can occur, but in this case, the order of the items on each page is in the order of the clustered index field. This means reads have things sorted, which can make queries faster.

    Non-clustered indexes are their own structure. If you only need the non clustered index, then you care about fragmentation, but that's it. If your query has to return to the clustered index or heap to get other fields, than the ordering may or may not matter. It can be faster, but if you're reading singletons, it doesn't matter if they are really located in some order.

  • dva2007 (2/26/2016)


    I just wanted to clarify quick query about index. I always read the clustered index sorts the data in physical order and non-clustered index sorts the data in logical order.

    That is already incorrect, and your post went downhill from there. Not trying to bash you, by the way - I understand that you have been given incorrect information, but with that starting point all next steps in your thought process are bound to fail.

    Both clustered and nonclustered indexes store data sorted. But not by laying out the data in order on the disk. Have you seen birthday garlands made of flags connected by a thread? In your mind, take one of those, number the flags sequentially from 1 upwards, then take the garland to the first floor and throw it out of the window on a windy day. The end result will be a pile of flags in semi-random order, but you can still follow the thread to "visit" the flags in sequential order.

    Both nonclustered and clustered indexes are like that. The data in those indexes is stored on 8K pages, and those pages can be anywhere on the disk - but every page also has pointers (the "thread" between the flags) to the two pages that hold the logically previous and next data.

    In addition, there are some additional pages that give quick access to points somewhere in the middle of the sequence. A picture illustrating this can be found at http://sqlity.net/en/2445/b-plus-tree/[/url] - but remember that the picture is the logical sequence, the physical pages can be anywhere on the disk and all arrows are implemented as pointers.

    Those extra pages are the same for clustered and nonclustered indexes. The difference between the two is in the leaf pages (the flags of the garland). For a clustered index, the flag contains all data for as many rows will fit in that 8K page. For a nonclustered index, the flag contains only the clustered index key. So finding data using a nonclustered index is comparable to finding data about a specific subject in a textbook - you first find the keyword in the keyword index (which itself is conveniently ordered by alphabet), then use the page number to find the page where you can read the actual information. Again, with the exception that the text books orders both the index entries and the pages in a physical order, whereas the SQL Server equivalents only have that pointer thread and those additional pages to order them.

    When you do not store the data with a (clustered) index, then this is equivalent to cutting all threads of the garland before throwing it out of the window. The data is still in a pile that is in semi-random order, but now without any way to quickly find data you are looking for. There is no thread to follow and there are no additional structures to quickly find a flag based on the number. A table organised in this way is called a heap.

    I could go into more detail, but I will stick to the most important lesson here - unless you really understand all details of how heaps work and what happens to them as data changes, avoid them. Having a clustered index on a randomly chosen column is not good, but in most cases it is at least better than not having a clustered index at all.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • http://sqlity.net/en/2445/b-plus-tree/[/url]

    Nice explanation Hugo and with this link that even I gone through yesterday, tells that Index structure are B+ and not Btree.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply