Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««12345»»»

Clustered Index Internals Expand / Collapse
Author
Message
Posted Wednesday, December 9, 2009 10:49 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, April 9, 2014 7:34 AM
Points: 2, Visits: 117
Dropping and recreating a Clustered Index can definitely be used to move data to another disk. This is one of several documented/recommended ways to move data of certain tables to another disk in case the original disk got too small or too slow. So the statement itself is correct, but sort of incomplete.

Incomplete, because it requires additional actions : the creation of a new Filegroup on a different disk and the "ON <new-filegroup>" clause in the CREATE INDEX statement.

Excerpt from the SQL 2005 Books Online :

Because the leaf level of a clustered index and the data pages are the same by definition, creating a clustered index and using the ON partition_scheme_name or ON filegroup_name clause effectively moves a table from the filegroup on which the table was created to the new partition scheme or filegroup. Before creating tables or indexes on specific filegroups, verify which filegroups are available and that they have enough empty space for the index. For more information, see Determining Index Disk Space Requirements.


However - I sense some sort of inconsistency here.
I repeatedly read that "data is *copied* to the leaf pages of the Clustered Index". That would contradict the phrase that it "effectively moves a table", because the original Table would still exist with its original data pages.

So - is the data from the original Table physically *copied* or *moved* to the Clustered Index leaf pages ?
If they are copied - what happens to the original Table ? Does it still occupy the diskspace and add to the database size but is not used anymore ?

I *think* to remember that I had done this procedure once, 3-4 years ago on a SQL 2000 server, to move the Index workload to another disk.
And as far as I remember I was able to shrink the original MDF file significantly which would confirm that the data are actually moved.

So the table definition still remains in its original filegroup, but the "leaf level" with the data content is actually saved in the Clustered Index on the other filegroup/disk ?
And are table data modifications done directly on these moved data pages on the other filegroup/disk within the Clustered Index ?
Or on the original filegroup/disk and only moved physically to the Index filegroup/disk when dropping/recreating the Index ?

Very confusing.....
Post #831627
Posted Wednesday, December 9, 2009 12:05 PM
Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 7:30 AM
Points: 3,869, Visits: 3,621
It have the book and I agree it is a great book, but the answers to the question were a little misleading I thought.
Post #831718
Posted Wednesday, December 9, 2009 2:51 PM
Right there with Babe

Right there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with Babe

Group: General Forum Members
Last Login: Yesterday @ 5:34 PM
Points: 723, Visits: 2,122
Rich-403221 (12/9/2009)
Also, the doubly linked list would imply that there are pointers on every page to the previous and next pages in the table. I have NEVER read anywhere in any documentation that this exists. As far as I knew, the IAM pages contained the bit maps providing the allocation of pages to a table. Can anyone point me to documentation that describes this linked list?


It's actually been documented since SQLServer was first released by Sybase, but a lot of the Microsoft Documentation is pretty poor and badly arranged. Look in BOL under "Clustered Index Structures" http://msdn.microsoft.com/en-us/library/ms177443.aspx

Cheers
Roddy
Post #831869
Posted Thursday, December 10, 2009 1:55 AM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Monday, March 18, 2013 7:52 PM
Points: 280, Visits: 88
"doubly linked lists".
Well, i learn something today :)
I thought it was B-Tree.

I don't like questions where all answers are right, it don't learn us the limits, it makes me confusing, and there are some in these QoD. :/



My MCP Transcript (ID : 692471 Access : 109741229)
Post #832047
Posted Thursday, December 10, 2009 9:03 AM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Monday, May 5, 2014 1:57 PM
Points: 545, Visits: 643
My page 311 paragaph 1 says something different than your page 311 paragraph 1.

Mine says: "...This order is maintained through a doubly linked list called a page chain. (Note that pages in a heap are not linked in any way to each other.) The order of pages in the page chain, and the order of rows on the data pages, is based on the definition of the clustered index. Deciding on which column(s) to cluster is an important performance consideration." (End of Paragraph)

This part isn't in my book.

"When the clustered index is created, it does in fact reorder the data physically on disk according to the cluster key, or as close to physical order as possible given the available space. After that, maintaining the clustered index is a logical operation. Rebuilding the index will again order the data physically. Reorganizing the index is an attempt to make the logical and the physical order match, without all the work of rebuilding the index...."

That being said, I don't understand why everyone is confused.

A clustered index does physically rearrange data on a disk when it is created and can continue to do so if there are page splits or when the index is rebuilt. There's no ambiguity here.

A clustered index logically tracks data on disk using doubly linked lists. This has been true for a long time.

Dropping and rebuilding a clustered index on a partition scheme may move data to other drives. The whole point of using a partition scheme is to distribute the data across multiple disks to improve performance by distributing the I/O as well as by creating logically and physically smaller partitions to search. Even if you don't use partitions, the best way to copy a table from one filegroup to another is to drop the clustered index and re-create it using a different file group. That has also been true for quite awhile.

I find it particularly amusing that some of you have the chutzpah to claim that Kalen Delaney is wrong.



"Beliefs" get in the way of learning.
Post #832315
Posted Thursday, December 10, 2009 9:23 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, July 21, 2014 2:32 PM
Points: 1,182, Visits: 1,969
Re: A clustered index physically rearanges data on disk.
Technically, the use of the word "disk" is wrong. It rearranges data within the (SQL Server) pages and extents. Where these pages are placed on the physical disk (track and cylinder) is not within the control of SQL Server. It is the OS (Windows) and/or the physical hardware implementation (e.g., a SAN) that determines where the data is actually placed on the physical disk.

The only item that I know of that "physically rearanges data on disk" is a disk defragmenter.



(PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.
Post #832339
Posted Thursday, December 10, 2009 9:48 AM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Monday, May 5, 2014 1:57 PM
Points: 545, Visits: 643
Mauve (12/10/2009)
Re: A clustered index physically rearanges data on disk.
Technically, the use of the word "disk" is wrong. It rearranges data within the (SQL Server) pages and extents. Where these pages are placed on the physical disk (track and cylinder) is not within the control of SQL Server. It is the OS (Windows) and/or the physical hardware implementation (e.g., a SAN) that determines where the data is actually placed on the physical disk.

The only item that I know of that "physically rearanges data on disk" is a disk defragmenter.


That's a semantic argument that isn't even accurate. Those pages and extents exist in files on disk and the disk controller must access them. If I rebuild the clustered index then the pages (and their contents) that were logically connected via page chains will be re-arranged (on the disk), i.e. an individual row may move from one physical disk sector of a file extent to another. If I insert a million new rows and my file group needs to extend then that additional disk space must be allocated. I'll grant you that the file system created the extent but it is SQL Server that is creating the physical pages that will be stored on the extent.

I suppose you could say that "technically" the only thing that writes to a disk are the read-write heads which are managed by the firmware of the device but that kind of defeats the purpose of the discussion. We don't need to be concerned with that level of abstraction. It's a lot simpler to say that SQL Server physically rearranges data on disk.


"Beliefs" get in the way of learning.
Post #832371
Posted Thursday, December 10, 2009 11:09 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Thursday, September 19, 2013 7:41 AM
Points: 1,626, Visits: 477
Robert Frasca (12/10/2009)
I find it particularly amusing that some of you have the chutzpah to claim that Kalen Delaney is wrong.



Far be it for me to question Kalen !!!! I just had never heard of the linked lists and still am not totally comfortable with the partitioning part. I still think that the partitioning function will put data right back where it had originally been ....
Post #832438
Posted Thursday, December 10, 2009 2:52 PM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Wednesday, July 9, 2014 11:13 AM
Points: 65, Visits: 366
Robert is right, the quote is not quite the same as in the book. The following email will explain the difference
--**********************************************************************************

Hi Brandon

Of course you can publish this note. I would suggest a minor edit to your last sentence:

When the clustered index is created, it does in fact reorder the data physically on disk according to the cluster key, or as close to physical order as possible given the available space. After that, maintaining the clustered index is a logical operation. Rebuilding the index will again order the data physically. Reorganizing the index is an attempt to make the logical and the physical order match, without all the work of rebuilding the index.

Best regards,

Kalen Delaney, SQL Server MVP
www.SQLServerInternals.com
Post #832597
Posted Monday, December 14, 2009 10:53 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Thursday, July 10, 2014 7:24 AM
Points: 1,060, Visits: 874
Robert Frasca (12/10/2009)
Dropping and rebuilding a clustered index on a partition scheme may move data to other drives. The whole point of using a partition scheme is to distribute the data across multiple disks to improve performance by distributing the I/O as well as by creating logically and physically smaller partitions to search. Even if you don't use partitions, the best way to copy a table from one filegroup to another is to drop the clustered index and re-create it using a different file group. That has also been true for quite awhile.


No its not. The best way is to create with drop existing - not dropping and creating a new one. Many tools on the market take this approach - with the typical effect that they need to drop and recreate every foreign key that references the primary key that you are dropping and creating. Try that on a table with 30 rows but referenced by one with 4bn...




Post #834004
« Prev Topic | Next Topic »

Add to briefcase «««12345»»»

Permissions Expand / Collapse