Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 1234»»»

Clustered Indexes Expand / Collapse
Author
Message
Posted Monday, October 4, 2010 8:35 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Monday, October 13, 2014 4:42 PM
Points: 1,618, Visits: 1,549
Comments posted to this topic are about the item Clustered Indexes



My blog: SQL Soldier
Twitter: @SQLSoldier
My book: Pro SQL Server 2008 Mirroring
Microsoft Certified Master: SQL Server 2008
Principal DBA: Outerwall, Inc.
Also available for consulting: SQL DBA Master
Post #998040
Posted Tuesday, October 5, 2010 12:03 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Tuesday, October 14, 2014 4:43 AM
Points: 1,130, Visits: 1,391
Nice question. Thanks.

Thanks
Post #998090
Posted Tuesday, October 5, 2010 12:28 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 3:07 AM
Points: 6,040, Visits: 8,322
Whe I saw this question, my expectation was that most people would get it right - after all, there have already been several questions that basicallly tested the same concept. The word "superfluous" already started to form in my head.

After answering, I was shocked to see a score of only 39% (at that time) correct answers. Almost half the people (48%, to be precise) apparently STILL think that data in a table is always physically ordered by the clustering key. So I have to apologize for the critique I (fortunately) never uttered - this question is far from superfluous, it is still very relevant and will probably have to be repeated yet more often.

It's too bad that the author did not include a reference. For instance, the books online description of clustered indexes at http://msdn.microsoft.com/en-us/library/ms177443.aspx (found after a one-minute search), that contains a nice graphic of a clustered index - though it unfortunately never mentions that the actual locations of the depicted pages on disk can be completely random.

But apart from this missing reference, the question is good. Thanks!



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #998097
Posted Tuesday, October 5, 2010 1:27 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 1:42 PM
Points: 13,239, Visits: 11,018
Good question!

At the time of writing, only 34% have the right answer. So it is a good question indeed to refreshen some concepts about clustered index.

Another great resource is this article from Gail Shaw:
http://www.sqlservercentral.com/articles/Indexing/68439/




How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at LessThanDot.

MCSA SQL Server 2012 - MCSE Business Intelligence
Post #998113
Posted Tuesday, October 5, 2010 1:55 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Tuesday, October 7, 2014 2:56 AM
Points: 2,842, Visits: 3,876
I am not satisfied with the answer, indeed I find it very misleading.

As data is inserted or the clustering key values are updated, SQL Server only preserves logical ordering of the data. It would require far too many resources to preserve physical ordering of the data.
What you are referring to is the physical order within a page.
What you have left out here, is the fact that rows are indeed sorted in physical order accross pages.
So if you have 2 records on one page, and a new record that logically belongs between them does not fit on the same page anymore, the logically last row of the existing records will be moved to a new page, and the new record will be stored on the existing page to preserve physical order.

Here is some more background information which I think is necessary to really understand the discussed concepts:
Clustered indexes are not a good choice for:

Columns that undergo frequent changes
This results in the entire row moving (because SQL Server must keep the data values of a row in physical order). This is an important consideration in high-volume transaction processing systems where data tends to be volatile.
http://blogs.msdn.com/b/sqlserverstorageengine/archive/2006/06/26/647005.aspx

It's a very common misconception that records within a page are always stored in logical order.
http://www.sqlskills.com/blogs/paul/2007/10/03/InsideTheStorageEngineAnatomyOfAPage.aspx
http://www.sqlskills.com/BLOGS/PAUL/post/Inside-the-Storage-Engine-Proof-that-records-are-not-always-physically-stored-in-index-key-order.aspx
(Again - note that this is referring to order within a page only)


Best Regards,
Chris Büttner
Post #998128
Posted Tuesday, October 5, 2010 2:09 AM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 3:09 PM
Points: 40,172, Visits: 36,560
Christian Buettner-167247 (10/5/2010)

What you have left out here, is the fact that rows are indeed sorted in physical order accross pages.
So if you have 2 records on one page, and a new record that logically belongs between them does not fit on the same page anymore, the logically last row of the existing records will be moved to a new page, and the new record will be stored on the existing page to preserve physical order.


True, but that new page is not necessarily physically located after the existing page. There is no problem whatsoever with a page setup like this (assume ID is an int clustered index)

File 1 page 200 ID 5-20
File 1 page 201 ID 86-140
File 1 page 208 ID 51-85
File 1 page 210 ID 21-50

It's fragmented (50% logical fragmentation) but it's perfectly acceptable. The pages have next and previous pointers that point to the page that's logically next or previous to maintain the index order. The pages are not necessarily physically contiguous or in order.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #998133
Posted Tuesday, October 5, 2010 2:23 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Today @ 8:44 AM
Points: 1,777, Visits: 6,456
Hugo Kornelis (10/5/2010)

After answering, I was shocked to see a score of only 39% (at that time) correct answers. Almost half the people (48%, to be precise) apparently STILL think that data in a table is always physically ordered by the clustering key.


Currently 33% and 51% respectively
Post #998141
Posted Tuesday, October 5, 2010 2:31 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Tuesday, October 7, 2014 2:56 AM
Points: 2,842, Visits: 3,876
GilaMonster (10/5/2010)
True, but that new page is not necessarily physically located after the existing page.

Agreed. The rows are physically ordered within the "page universe"*, but not the pages themselves

*and neither the rows within one page


Best Regards,
Chris Büttner
Post #998146
Posted Tuesday, October 5, 2010 2:43 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 3:07 AM
Points: 6,040, Visits: 8,322
Christian Buettner-167247 (10/5/2010)
I am not satisfied with the answer, indeed I find it very misleading.

I have to disagree. I am completely satisfiied with the answer, as it is 100% correct. I find the explanation lacking, but definitely not misleading.

As data is inserted or the clustering key values are updated, SQL Server only preserves logical ordering of the data. It would require far too many resources to preserve physical ordering of the data.
What you are referring to is the physical order within a page.

I don't think so. In fact, the cost of changing the physical order within a page is not that much; the page is in cache anyway, it is alwayys read and written as a whole, so the only cost is that of moving at most 8,000 bytes around in a memory block. Measured in fractions of a microsecond.
The prohibitive use of resources would come when a page overflows. If physical order has to be preserved and the free space between rows runs out, all rows have to be physically moved. Just think for a while what would happen if you have a 10-billion row table, and you need to insert a row somewhere halfway in the table? Do you reallly expect SQL Server to physically relocate 5 billion rows, to make place for the new row? It does not. It allocates a new page, somewhere, and changes the pointer chain to preserve the logical order.

What you have left out here, is the fact that rows are indeed sorted in physical order accross pages.
So if you have 2 records on one page, and a new record that logically belongs between them does not fit on the same page anymore, the logically last row of the existing records will be moved to a new page, and the new record will be stored on the existing page to preserve physical order.

This would only preserve physical order if SQL Server allocated the new page "between" the two old pages. It does not. (I have some repro code down below).

Here is some more background information which I think is necessary to really understand the discussed concepts:
Clustered indexes are not a good choice for:

Columns that undergo frequent changes
This results in the entire row moving (because SQL Server must keep the data values of a row in physical order). This is an important consideration in high-volume transaction processing systems where data tends to be volatile.
http://blogs.msdn.com/b/sqlserverstorageengine/archive/2006/06/26/647005.aspx

A change in a clustered index value can indeed result in the row moving to another page. But the use of the words "physical order" in this quote is sloppy - not up to Pauls normal standard.

It's a very common misconception that records within a page are always stored in logical order.
http://www.sqlskills.com/blogs/paul/2007/10/03/InsideTheStorageEngineAnatomyOfAPage.aspx
http://www.sqlskills.com/BLOGS/PAUL/post/Inside-the-Storage-Engine-Proof-that-records-are-not-always-physically-stored-in-index-key-order.aspx
(Again - note that this is referring to order within a page only)

Though I don't think this is what Robert intended with his question, it does further support the correct answer. Within a single page, the rows can physically be stored out of order. The slot array is used to represent the logical order.

To demonstrate that the physical order of pages is not equal to the logical order, run the script below and check the output. Here is an explanation of what happens.
I first create a new database, then a table with a large data column so don't need lots of rows to fill a few pages. I then fill the row with data in ascending clustered index order. If the database files are not fragmented, all free space is continuous, the database files do not reside on SAN or RAID, and nobody else is using the database at the same time, then it is likely (though I won't vow for it) that the pages allocated will actually be in physical order.
I then insert a single extra row that logically goes in the beginning of the table. There are four rows per page, so the row to be added goes to the second page (with key values 9, 11, 13, and 15). This page can't accomodate a fifth row, so it has to be split. The first two rows remain on their page, the other two are moved to a new page. This new page is not squeezed in between the second page and the "old" third page, but just allocated where there is place - probably right after the last page allocated to this table (assuming no other activity in the database). Pointers are used to preserve logical order.
Then, I run a simple query. Watch the TABLOCK hint - either that or a NOLOCK hint is required to get the intended results. That is because only a full table lock or no locking at all allows SQL Server to use the "IAM scan" - a scan that uses the index allocation map to see what pages are allocated to the table and then process those pages in their physical allocation order. When the table exceeds a certain size, the cost of this kind of scan is lower than a scan that follows the pointers that define the logical order. (This size threshold is the reason I had to add 500 rows).
If you run the query, you will see that the results are "almost" ordered - the only "disorder" is the gap between values 12 and 17; the rows with key values 13 and 15 are not missing; they appear at the very end of the results, after the vallue 999. This is what the physical order of the tablle looks like after we forced a page split.
CREATE DATABASE Demo;
go
USE Demo;
go
CREATE TABLE Demo
(KeyCol int NOT NULL PRIMARY KEY,
DataCol char(2000));
go
DECLARE @i int;
SET @i = 1
WHILE @i < 1000
BEGIN;
INSERT INTO Demo (KeyCol, DataCol)
VALUES (@i, 'Some data');
SET @i = @i + 2;
END;
go
INSERT INTO Demo (KeyCol, DataCol)
SELECT 12, 'Added later';
go
SELECT *
FROM Demo WITH (TABLOCK);
go
USE master;
go
DROP DATABASE Demo;
go




Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #998149
Posted Tuesday, October 5, 2010 3:02 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Tuesday, October 7, 2014 2:56 AM
Points: 2,842, Visits: 3,876
Hi Hugo,

In your explanation you have switched from order of rows accross pages to order of pages.

I have not meant that a physical order is imposed on the pages. I have meant that a physical order is imposed on the rows accros pages. This physical ordering of rows accross pages does not result in physical ordering of the pages, and neither does it result in ordered data on the hard disk itself.




Best Regards,
Chris Büttner
Post #998161
« Prev Topic | Next Topic »

Add to briefcase 1234»»»

Permissions Expand / Collapse