SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Clustered Index Internals


Clustered Index Internals

Author
Message
Trekman
Trekman
Forum Newbie
Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)

Group: General Forum Members
Points: 2 Visits: 131
Dropping and recreating a Clustered Index can definitely be used to move data to another disk. This is one of several documented/recommended ways to move data of certain tables to another disk in case the original disk got too small or too slow. So the statement itself is correct, but sort of incomplete.

Incomplete, because it requires additional actions : the creation of a new Filegroup on a different disk and the "ON <new-filegroup>" clause in the CREATE INDEX statement.

Excerpt from the SQL 2005 Books Online :

Because the leaf level of a clustered index and the data pages are the same by definition, creating a clustered index and using the ON partition_scheme_name or ON filegroup_name clause effectively moves a table from the filegroup on which the table was created to the new partition scheme or filegroup. Before creating tables or indexes on specific filegroups, verify which filegroups are available and that they have enough empty space for the index. For more information, see Determining Index Disk Space Requirements.


However - I sense some sort of inconsistency here.
I repeatedly read that "data is *copied* to the leaf pages of the Clustered Index". That would contradict the phrase that it "effectively moves a table", because the original Table would still exist with its original data pages.

So - is the data from the original Table physically *copied* or *moved* to the Clustered Index leaf pages ?
If they are copied - what happens to the original Table ? Does it still occupy the diskspace and add to the database size but is not used anymore ?

I *think* to remember that I had done this procedure once, 3-4 years ago on a SQL 2000 server, to move the Index workload to another disk.
And as far as I remember I was able to shrink the original MDF file significantly which would confirm that the data are actually moved.

So the table definition still remains in its original filegroup, but the "leaf level" with the data content is actually saved in the Clustered Index on the other filegroup/disk ?
And are table data modifications done directly on these moved data pages on the other filegroup/disk within the Clustered Index ?
Or on the original filegroup/disk and only moved physically to the Index filegroup/disk when dropping/recreating the Index ?

Very confusing.....
Cliff Jones
Cliff Jones
SSCertifiable
SSCertifiable (5.1K reputation)SSCertifiable (5.1K reputation)SSCertifiable (5.1K reputation)SSCertifiable (5.1K reputation)SSCertifiable (5.1K reputation)SSCertifiable (5.1K reputation)SSCertifiable (5.1K reputation)SSCertifiable (5.1K reputation)

Group: General Forum Members
Points: 5129 Visits: 3648
It have the book and I agree it is a great book, but the answers to the question were a little misleading I thought.
Roddy.CAMERON
Roddy.CAMERON
SSC Eights!
SSC Eights! (988 reputation)SSC Eights! (988 reputation)SSC Eights! (988 reputation)SSC Eights! (988 reputation)SSC Eights! (988 reputation)SSC Eights! (988 reputation)SSC Eights! (988 reputation)SSC Eights! (988 reputation)

Group: General Forum Members
Points: 988 Visits: 2606
Rich-403221 (12/9/2009)
Also, the doubly linked list would imply that there are pointers on every page to the previous and next pages in the table. I have NEVER read anywhere in any documentation that this exists. As far as I knew, the IAM pages contained the bit maps providing the allocation of pages to a table. Can anyone point me to documentation that describes this linked list?


It's actually been documented since SQLServer was first released by Sybase, but a lot of the Microsoft Documentation is pretty poor and badly arranged. Look in BOL under "Clustered Index Structures" http://msdn.microsoft.com/en-us/library/ms177443.aspx

Cheers
Roddy
Dude76
Dude76
Old Hand
Old Hand (332 reputation)Old Hand (332 reputation)Old Hand (332 reputation)Old Hand (332 reputation)Old Hand (332 reputation)Old Hand (332 reputation)Old Hand (332 reputation)Old Hand (332 reputation)

Group: General Forum Members
Points: 332 Visits: 94
"doubly linked lists".
Well, i learn something today Smile
I thought it was B-Tree.

I don't like questions where all answers are right, it don't learn us the limits, it makes me confusing, and there are some in these QoD. :/


My MCP Transcript (ID : 692471 Access : 109741229)
Robert Frasca
Robert Frasca
UDP Broadcaster
UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)

Group: General Forum Members
Points: 1491 Visits: 794
My page 311 paragaph 1 says something different than your page 311 paragraph 1.

Mine says: "...This order is maintained through a doubly linked list called a page chain. (Note that pages in a heap are not linked in any way to each other.) The order of pages in the page chain, and the order of rows on the data pages, is based on the definition of the clustered index. Deciding on which column(s) to cluster is an important performance consideration." (End of Paragraph)

This part isn't in my book.

"When the clustered index is created, it does in fact reorder the data physically on disk according to the cluster key, or as close to physical order as possible given the available space. After that, maintaining the clustered index is a logical operation. Rebuilding the index will again order the data physically. Reorganizing the index is an attempt to make the logical and the physical order match, without all the work of rebuilding the index...."

That being said, I don't understand why everyone is confused.

A clustered index does physically rearrange data on a disk when it is created and can continue to do so if there are page splits or when the index is rebuilt. There's no ambiguity here.

A clustered index logically tracks data on disk using doubly linked lists. This has been true for a long time.

Dropping and rebuilding a clustered index on a partition scheme may move data to other drives. The whole point of using a partition scheme is to distribute the data across multiple disks to improve performance by distributing the I/O as well as by creating logically and physically smaller partitions to search. Even if you don't use partitions, the best way to copy a table from one filegroup to another is to drop the clustered index and re-create it using a different file group. That has also been true for quite awhile.

I find it particularly amusing that some of you have the chutzpah to claim that Kalen Delaney is wrong. :-)

"Beliefs" get in the way of learning.
Mauve
Mauve
Hall of Fame
Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)

Group: General Forum Members
Points: 3168 Visits: 2065
Re: A clustered index physically rearanges data on disk.
Technically, the use of the word "disk" is wrong. It rearranges data within the (SQL Server) pages and extents. Where these pages are placed on the physical disk (track and cylinder) is not within the control of SQL Server. It is the OS (Windows) and/or the physical hardware implementation (e.g., a SAN) that determines where the data is actually placed on the physical disk.

The only item that I know of that "physically rearanges data on disk" is a disk defragmenter.


(PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.
Robert Frasca
Robert Frasca
UDP Broadcaster
UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)

Group: General Forum Members
Points: 1491 Visits: 794
Mauve (12/10/2009)
Re: A clustered index physically rearanges data on disk.
Technically, the use of the word "disk" is wrong. It rearranges data within the (SQL Server) pages and extents. Where these pages are placed on the physical disk (track and cylinder) is not within the control of SQL Server. It is the OS (Windows) and/or the physical hardware implementation (e.g., a SAN) that determines where the data is actually placed on the physical disk.

The only item that I know of that "physically rearanges data on disk" is a disk defragmenter.


That's a semantic argument that isn't even accurate. Those pages and extents exist in files on disk and the disk controller must access them. If I rebuild the clustered index then the pages (and their contents) that were logically connected via page chains will be re-arranged (on the disk), i.e. an individual row may move from one physical disk sector of a file extent to another. If I insert a million new rows and my file group needs to extend then that additional disk space must be allocated. I'll grant you that the file system created the extent but it is SQL Server that is creating the physical pages that will be stored on the extent.

I suppose you could say that "technically" the only thing that writes to a disk are the read-write heads which are managed by the firmware of the device but that kind of defeats the purpose of the discussion. We don't need to be concerned with that level of abstraction. It's a lot simpler to say that SQL Server physically rearranges data on disk.

"Beliefs" get in the way of learning.
Source-NH
Source-NH
SSCrazy
SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)

Group: General Forum Members
Points: 2021 Visits: 479
Robert Frasca (12/10/2009)
I find it particularly amusing that some of you have the chutzpah to claim that Kalen Delaney is wrong. :-)



Far be it for me to question Kalen !!!! I just had never heard of the linked lists and still am not totally comfortable with the partitioning part. I still think that the partitioning function will put data right back where it had originally been ....
Brandon Forest
Brandon Forest
Old Hand
Old Hand (399 reputation)Old Hand (399 reputation)Old Hand (399 reputation)Old Hand (399 reputation)Old Hand (399 reputation)Old Hand (399 reputation)Old Hand (399 reputation)Old Hand (399 reputation)

Group: General Forum Members
Points: 399 Visits: 446
Robert is right, the quote is not quite the same as in the book. The following email will explain the difference ;-)
--**********************************************************************************

Hi Brandon

Of course you can publish this note. I would suggest a minor edit to your last sentence:

When the clustered index is created, it does in fact reorder the data physically on disk according to the cluster key, or as close to physical order as possible given the available space. After that, maintaining the clustered index is a logical operation. Rebuilding the index will again order the data physically. Reorganizing the index is an attempt to make the logical and the physical order match, without all the work of rebuilding the index.

Best regards,

Kalen Delaney, SQL Server MVP
www.SQLServerInternals.com
RichB
RichB
SSCrazy
SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)SSCrazy (2.9K reputation)

Group: General Forum Members
Points: 2902 Visits: 1065
Robert Frasca (12/10/2009)
Dropping and rebuilding a clustered index on a partition scheme may move data to other drives. The whole point of using a partition scheme is to distribute the data across multiple disks to improve performance by distributing the I/O as well as by creating logically and physically smaller partitions to search. Even if you don't use partitions, the best way to copy a table from one filegroup to another is to drop the clustered index and re-create it using a different file group. That has also been true for quite awhile.


No its not. The best way is to create with drop existing - not dropping and creating a new one. Many tools on the market take this approach - with the typical effect that they need to drop and recreate every foreign key that references the primary key that you are dropping and creating. Try that on a table with 30 rows but referenced by one with 4bn...



Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search