RE: Hash Partitioning – SQLServerCentral

SSC Guru

Points: 110206

August 5, 2012 at 10:46 pm

SQL Kiwi (5/23/2012)
Sergiy (5/22/2012)
If we choose to believe so. IndId = 256 indicates that it's a data page, but I cannot see any indicator that it is a leaf page of clustered index. May be I miss something.
The thing you are missing is circled in green on that diagram. Also see m_level. It's obviously not a heap!
http://www.sqlskills.com/blogs/paul/post/Inside-the-Storage-Engine-Anatomy-of-a-page.aspx
Probable existance of other indexes on the table (real life scenario) will add to the load, as those indexes need to be rewritten as well to update pointers to PK.
Clustering keys don't change when a page splits, so no non-clustered index maintenance is required.
This is counter-intuitive stuff, in a specific scenario, so don't worry too much if you don't understand or agree with it, based on your own experiences.

OK, it's been a while.

But since the script was saved, why don't I run it and share the results?

Here is what I did.

1. Create 2 almost identical tables.

Only difference is in the definition for the PK column: Random (NEWID() ) vs. Sequential (IDENTITY(1,1) )

IF OBJECT_ID('MyBigTable_R') IS NOT NULL

DROP TABLE MyBigTable_R

IF OBJECT_ID('MyBigTable_S') IS NOT NULL

DROP TABLE MyBigTable_S

SET NOCOUNT ON

SET ANSI_WARNINGS OFF

CREATE TABLE MyBigTable_R (

c1 UNIQUEIDENTIFIER ROWGUIDCOL DEFAULT NEWID ()

,c2 DATETIME DEFAULT GETDATE ()

,c3 CHAR (111) DEFAULT 'a'

,c4 INT DEFAULT 1

,c5 INT DEFAULT 2

,c6 BIGINT DEFAULT 42);

CREATE CLUSTERED INDEX MyBigTable_R_cl ON MyBigTable_R (c1);

CREATE INDEX MyBigTable_R_c2 ON MyBigTable_R (c2);

CREATE INDEX MyBigTable_R_c3 ON MyBigTable_R (c3);

CREATE INDEX MyBigTable_R_c4 ON MyBigTable_R (c4);

CREATE TABLE MyBigTable_S (

c1 bigint IDENTITY(1,1) NOT NULL

,c2 datetime DEFAULT GETDATE ()

,c3 char (111) DEFAULT 'a'

,c4 int DEFAULT 1

,c5 INT DEFAULT 2

,c6 BIGINT DEFAULT 42);

CREATE CLUSTERED INDEX MyBigTable_S_cl ON MyBigTable_S (c1);

CREATE INDEX MyBigTable_S_c2 ON MyBigTable_S (c2);

CREATE INDEX MyBigTable_S_c3 ON MyBigTable_S (c3);

CREATE INDEX MyBigTable_S_c4 ON MyBigTable_S (c4);

TRUNCATE TABLE MyBigTable_R

TRUNCATE TABLE MyBigTable_S

2. Populate both tables with identical sets of data.

Again, PK columns were populated with different data, according to their definitions:

INSERT INTO MyBigTable_R (c3, c4)

SELECT TOP 200000

LEFT('abcdefghijk', (CONVERT(bigint, c1.id) + CONVERT(bigint, c2.id+c2.colid))%10), (CONVERT(bigint, c1.id) + CONVERT(bigint, c2.id+c2.colid))%10

FROM syscolumns c1, syscolumns c2--, syscolumns c3--, syscolumns c4, syscolumns c5

ORDER BY (CONVERT(bigint, c1.id) + CONVERT(bigint, c2.id+c2.colid))%10, c1.id, c2.id, c2.colid--, c3.id--, c4.id, c5.id

--ORDER BY c1.id, c2.id, c2.colid--, c3.id--, c4.id, c5.id

INSERT INTO MyBigTable_S (c3, c4)

SELECT TOP 200000

LEFT('abcdefghijk', (CONVERT(bigint, c1.id) + CONVERT(bigint, c2.id+c2.colid))%10), (CONVERT(bigint, c1.id) + CONVERT(bigint, c2.id+c2.colid))%10

FROM syscolumns c1, syscolumns c2--, syscolumns c3--, syscolumns c4, syscolumns c5

ORDER BY (CONVERT(bigint, c1.id) + CONVERT(bigint, c2.id+c2.colid))%10, c1.id, c2.id, c2.colid--, c3.id--, c4.id, c5.id

--ORDER BY c1.id, c2.id, c2.colid--, c3.id--, c4.id, c5.id

3. Check the fragmentation level for all the idexes on the tables:

DBCC SHOWCONTIG ('MyBigTable_R') WITH all_indexes , TABLERESULTS

DBCC SHOWCONTIG ('MyBigTable_S') WITH all_indexes , TABLERESULTS

4. Add small subsets (1/20 of the initial one) to both tables.

No new values are added to columns c3 and c4, new records contain the same values as already existing rows.

c2 on another hand is populated with the current time, so new values are added to the index on this column, and those new values are outside the existing data range.

And after that check the index fragmentation once again:

INSERT INTO MyBigTable_R (c3, c4)

SELECT TOP 10000

LEFT('abcdefghijk', (c1.id + c2.id+c2.colid)%10), (c1.id + (c2.id+c2.colid))%10

FROM syscolumns c1, syscolumns c2--, syscolumns c3--, syscolumns c4, syscolumns c5

ORDER BY (CONVERT(bigint, c1.id) + CONVERT(bigint, c2.id+c2.colid))%10, c1.id, c2.id, c2.colid--, c3.id--, c4.id, c5.id

INSERT INTO MyBigTable_S (c3, c4)

SELECT TOP 10000

LEFT('abcdefghijk', (c1.id + c2.id+c2.colid)%10), (c1.id + (c2.id+c2.colid))%10

FROM syscolumns c1, syscolumns c2--, syscolumns c3--, syscolumns c4, syscolumns c5

ORDER BY (CONVERT(bigint, c1.id) + CONVERT(bigint, c2.id+c2.colid))%10, c1.id, c2.id, c2.colid--, c3.id--, c4.id, c5.id

DBCC SHOWCONTIG ('MyBigTable_R') WITH all_indexes , TABLERESULTS

DBCC SHOWCONTIG ('MyBigTable_S') WITH all_indexes , TABLERESULTS

5. Repeat the step 4 adding another small set to the tables:

INSERT INTO MyBigTable_R (c3, c4)

SELECT TOP 1000

LEFT('abcdefghijk', (c1.id + c2.id+c2.colid)%10), (c1.id + (c2.id+c2.colid))%10

FROM syscolumns c1, syscolumns c2--, syscolumns c3--, syscolumns c4, syscolumns c5

ORDER BY (CONVERT(bigint, c1.id) + CONVERT(bigint, c2.id+c2.colid))%10, c1.id, c2.id, c2.colid--, c3.id--, c4.id, c5.id

INSERT INTO MyBigTable_S (c3, c4)

SELECT TOP 1000

LEFT('abcdefghijk', (c1.id + c2.id+c2.colid)%10), (c1.id + (c2.id+c2.colid))%10

FROM syscolumns c1, syscolumns c2--, syscolumns c3--, syscolumns c4, syscolumns c5

ORDER BY (CONVERT(bigint, c1.id) + CONVERT(bigint, c2.id+c2.colid))%10, c1.id, c2.id, c2.colid--, c3.id--, c4.id, c5.id

DBCC SHOWCONTIG ('MyBigTable_R') WITH all_indexes , TABLERESULTS

DBCC SHOWCONTIG ('MyBigTable_S') WITH all_indexes , TABLERESULTS

You may find the outcome in the files attached (both have the same data, just chose the format you prefer).

What's in there?

After initial uptake there is no much difference: both tables have Scan Density on above 95% for all indexes, logical fragmentation is near zero.

Extent Switches are less than number of extents by 1.

Perfect.

But after the following small set insert the stats become quite different.

"Sequential" table remains on "90th" level of scan density and <5 on logical fragmentation, almost no change to the initial state. Number of Extent Switches is about the same as number of Extents.

But "Random" table stats look quite different:

- PK is on 13% os Scan Density and 94% of Logical Fragmentation, number of extent switches is about 8 times of number of extents (as expected, and it's not what we are discussing here);

- c2 fragmentation remains at the same level: 97% and 1.15%, Extent Switches perfectly below of Extents by 1;

- c3 and c4 suddenly have about 38% Scan Density and about 25% of Logical Fragmentation, number of Extent Switches is about 3 times of number of Extents.

Second small upload does not change the numbers significantly.

Pretty much the same fragmentation stats.

Why would be that?

There are no values in c3 and c4. The only source of the fragmentation can be the clustering keys.

Apparently, existing pages on the leaf levels don't have space to accomodate new PK references, so SQL Server has to add new ones.

For "sequential" table new PK references are added at the end of the existing range, so they cause only minor fragmentation, most of existing leaf pages remain intact indeed.

But for "random" table new PK references force splitting of existing leaf level pages, as the added in random positions.

Index on c2 does not suffer from the splitting.

Why?

Well, I may go intuitive here :-), but apparently because the new data add new values beyond the current range it create new leaf pages and puts PK references to those new pages, so existing pages don't get any new data and therefore do not get fragmented.

I'd say it's not an ordinary case, normally non-clustered indexes follow rather patterns of indexes on columns c3 and c4 than on c2.

I believe, all of this above perfectly proves my point:

Splitting pages of a clustered index does change fragmentation of non-clustered index pages.

And additional non-clustered index maintenance is required because of PK fragmentation.

It might not be mentioned in the latest MS books, but it's still a fact of life.

That's where own brain and some common sense still become useful.

😎

_____________
Code for TallyGenerator