Cluster index fragementation

Question

Cluster index fragementation

vkundar

Hall of Fame

Points: 3431
More actions
June 2, 2014 at 6:20 am

#289327

Hi All,
We have few tables where we do truncate load or only do insert activities , why do the cluster index get fragmented very often to > 80%?
What is thecause behind it?
Cheers,
Got an idea..share it !!
DBA_Vishal

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 1

To properly answer your question regarding the tables where you have this issue we would need to see the DDL (CREATE TABLE statement) and the definition of the clustered index.

Simplest answer is page splits during the insert process to the table. If you clustered index is not an ever increasing unique value the pages in the table can easily become fragmented.

vkundar Hall of Fame Points: 3431 More actions · Answer 2

Hi Lynn,

Tables have PM / cluster index key on identity columns or the VARCHAR(1) . These VARCHAR(1) is nullable but I did search the tables there were no recs with NULL values.

All these tables are more than 98% frag . Tables are huge in size ( > 30 GB) and rebuilding the cluster indexes will be expensive.

Any pointers why so much fragmentaion. one of the table DDL with 98% frag on cluster index

CREATE TABLE XXXXXXXX(

[SRC_ID] [varchar](1) NULL,

[REC_TRNS_TYP] [varchar](1) NULL,

[I_CV] [varchar](11) NULL,

[I_AMBEST_CO_NUM] [varchar](5) NULL,

[N_POL] [varchar](25) NULL,

[N_PROP] [int] NULL,

[C_CVG_TYPE] [varchar](4) NULL,

[A_INDV_LIMIT] [decimal](8, 0) NULL,

[A_OCUR_LIMIT] [decimal](8, 0) NULL,

[A_CMB_SGL_LMT] [decimal](8, 0) NULL,

[LOAD_DT] [datetime] NULL,

[FEED_NAME] [varchar](80) NULL,

[STG_CV_PolicyCoverage_Row_ID] [bigint] NULL,

[Coverage_Checksum] [varchar](32) NULL,

[Partition_Key] [varchar](1) NULL

) ON [PRIMARY]

END

GO

SET ANSI_PADDING OFF

GO

SET ANSI_PADDING ON

GO

/****** Object: Index [CLUS_COV_Partitionkey] Script Date: 6/2/2014 6:49:09 PM ******/

IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'XXXXXX') AND name = N'CLUS_XXX')

CREATE CLUSTERED INDEX [CLUS_XXX] ON XXXX

(

[Partition_Key] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

SET ANSI_PADDING ON

GO

Cheers,

Got an idea..share it !!

DBA_Vishal

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 3

vkundar (6/2/2014)
Hi Lynn,
Tables have PM / cluster index key on identity columns or the VARCHAR(1) . These VARCHAR(1) is nullable but I did search the tables there were no recs with NULL values.
All these tables are more than 98% frag . Tables are huge in size ( > 30 GB) and rebuilding the cluster indexes will be expensive.
Any pointers why so much fragmentaion. one of the table DDL with 98% frag on cluster index
CREATE TABLE XXXXXXXX(
[SRC_ID] [varchar](1) NULL,
[REC_TRNS_TYP] [varchar](1) NULL,
[I_CV] [varchar](11) NULL,
[I_AMBEST_CO_NUM] [varchar](5) NULL,
[N_POL] [varchar](25) NULL,
[N_PROP] [int] NULL,
[C_CVG_TYPE] [varchar](4) NULL,
[A_INDV_LIMIT] [decimal](8, 0) NULL,
[A_OCUR_LIMIT] [decimal](8, 0) NULL,
[A_CMB_SGL_LMT] [decimal](8, 0) NULL,
[LOAD_DT] [datetime] NULL,
[FEED_NAME] [varchar](80) NULL,
[STG_CV_PolicyCoverage_Row_ID] [bigint] NULL,
[Coverage_Checksum] [varchar](32) NULL,
[Partition_Key] [varchar](1) NULL
) ON [PRIMARY]
END
GO
SET ANSI_PADDING OFF
GO
SET ANSI_PADDING ON
GO
/****** Object: Index [CLUS_COV_Partitionkey] Script Date: 6/2/2014 6:49:09 PM ******/
IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'XXXXXX') AND name = N'CLUS_XXX')
CREATE CLUSTERED INDEX [CLUS_XXX] ON XXXX
(
[Partition_Key] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
SET ANSI_PADDING ON
GO

Let's look at the column that you are clustering on, [Partition_Key] [varchar](1). If I limit the available characters stored in the column to just the capital letters A through Z, there are only 26 possible values for this column. This means it is not a unique ever increasing value. Realize that the clustered index is the logical order in which the data is stored in the table. Every time you enter a new row SQL Server has to add a uniqifier to that value and insert it into the index (the table) and this can easily result in a page split occurring. It is the page splits that cause the clustered index to fragment.

Sean Lange SSC Guru Points: 286573 More actions · Answer 4

Why varchar(1)? You do realize that is literally three times the storage for each row? 1 byte to hold the value plus the 2 bytes of overhead for varying length.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

vkundar Hall of Fame Points: 3431 More actions · Answer 5

Thanks Lynn that was awsome catch. Will cluster index fragmentaion lead tonon cluster index frag also?

Cheers,

Got an idea..share it !!

DBA_Vishal

vkundar Hall of Fame Points: 3431 More actions · Answer 6

Lynn, what about the other cluster indexes which are simple int coloumn with identity values, why these all are frag tyo 90% ?

Cheers,

Got an idea..share it !!

DBA_Vishal

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 7

vkundar (6/2/2014)
Lynn, what about the other cluster indexes which are simple int coloumn with identity values, why these all are frag tyo 90% ?

Would need to see the DDL and index definitions.

vkundar Hall of Fame Points: 3431 More actions · Answer 8

Please find the ddl

CREATE TABLE XXXX_XX(

[DM_CV_AGGR_FACT_POLICY_COVERAGE_ID] [int] IDENTITY(1,1) NOT NULL,

[POL_TYPE_ID] [int] NULL,

[GENDER] [varchar](1) NULL,

[DRIVER_AGE_RANGE_ID] [int] NULL,

[CARRIER_ID] [varchar](11) NULL,

[C_ST_ALPH] [nvarchar](10) NULL,

[LIMIT_ID] [int] NULL,

[BIZ_MONTHYEAR] [int] NOT NULL,

[BIZ_YEAR] [varchar](4) NULL,

[BIZ_MONTH] [varchar](2) NULL,

[TOT_POLICY_COUNT] [int] NULL,

[TOT_VEHICLE_COUNT] [int] NULL,

[TOT_DRIVER_COUNT] [int] NULL,

[START_DT] [int] NULL,

[END_DT] [int] NULL,

[LOAD_DT] [datetime] NOT NULL,

CONSTRAINT [PK_DM_CV_AGGR_Fact_Policy_Coverage] PRIMARY KEY CLUSTERED

(

[BIZ_MONTHYEAR] ASC,

[DM_CV_AGGR_FACT_POLICY_COVERAGE_ID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [SCH_DM_CV_AGGR_Fact_Policy_Coverage]([BIZ_MONTHYEAR])

) ON [SCH_DM_CV_AGGR_Fact_Policy_Coverage]([BIZ_MONTHYEAR])

END

GO

Cheers,

Got an idea..share it !!

DBA_Vishal

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 9

vkundar (6/2/2014)
Please find the ddl
CREATE TABLE XXXX_XX(
[DM_CV_AGGR_FACT_POLICY_COVERAGE_ID] [int] IDENTITY(1,1) NOT NULL,
[POL_TYPE_ID] [int] NULL,
[GENDER] [varchar](1) NULL,
[DRIVER_AGE_RANGE_ID] [int] NULL,
[CARRIER_ID] [varchar](11) NULL,
[C_ST_ALPH] [nvarchar](10) NULL,
[LIMIT_ID] [int] NULL,
[BIZ_MONTHYEAR] [int] NOT NULL,
[BIZ_YEAR] [varchar](4) NULL,
[BIZ_MONTH] [varchar](2) NULL,
[TOT_POLICY_COUNT] [int] NULL,
[TOT_VEHICLE_COUNT] [int] NULL,
[TOT_DRIVER_COUNT] [int] NULL,
[START_DT] [int] NULL,
[END_DT] [int] NULL,
[LOAD_DT] [datetime] NOT NULL,
CONSTRAINT [PK_DM_CV_AGGR_Fact_Policy_Coverage] PRIMARY KEY CLUSTERED
(
[BIZ_MONTHYEAR] ASC,
[DM_CV_AGGR_FACT_POLICY_COVERAGE_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [SCH_DM_CV_AGGR_Fact_Policy_Coverage]([BIZ_MONTHYEAR])
) ON [SCH_DM_CV_AGGR_Fact_Policy_Coverage]([BIZ_MONTHYEAR])
END
GO

Again, this index may not be an ever increasing unique index:

[BIZ_MONTHYEAR] ASC,

[DM_CV_AGGR_FACT_POLICY_COVERAGE_ID] ASC

Depending on the data in BIZ_MONTHYEAR and when and how it is inserted may easily result in a highly fragmented clustered index.

Basically, for you to have an ever increasing unique index, each row added to the clustered index (table) would be inserted at the end of the index (table).

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 10

CREATE TABLE XXXX_XX(

[DM_CV_AGGR_FACT_POLICY_COVERAGE_ID] [int] IDENTITY(1,1) NOT NULL,

[POL_TYPE_ID] [int] NULL,

[GENDER] [varchar](1) NULL,

[DRIVER_AGE_RANGE_ID] [int] NULL,

[CARRIER_ID] [varchar](11) NULL,

[C_ST_ALPH] [nvarchar](10) NULL,

[LIMIT_ID] [int] NULL,

[BIZ_MONTHYEAR] [int] NOT NULL,

[BIZ_YEAR] [varchar](4) NULL,

[BIZ_MONTH] [varchar](2) NULL,

[TOT_POLICY_COUNT] [int] NULL,

[TOT_VEHICLE_COUNT] [int] NULL,

[TOT_DRIVER_COUNT] [int] NULL,

[START_DT] [int] NULL,

[END_DT] [int] NULL,

[LOAD_DT] [datetime] NOT NULL,

CONSTRAINT [PK_DM_CV_AGGR_Fact_Policy_Coverage] PRIMARY KEY CLUSTERED

(

[BIZ_MONTHYEAR] ASC,

[DM_CV_AGGR_FACT_POLICY_COVERAGE_ID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [SCH_DM_CV_AGGR_Fact_Policy_Coverage]([BIZ_MONTHYEAR])

) ON [SCH_DM_CV_AGGR_Fact_Policy_Coverage]([BIZ_MONTHYEAR])

END

GO

Wow - hard to know where to start with all the issues I see here.

1) I will assume you have biz_monthyear as leading column for some particular reason (report query performance probably). With it first you will get splits as you insert new data and really bad splitting if you load them out of order.

2) Your PK allows you to insert the EXACT SAME RECORD multiple times. Hopefully you guarantee that cannot happen via some other control.

3) Do you need 4.2Billion values for POL_TYPE_ID? Doubtful. If not you have wasted space over small or tiny int.

4) Gender being varchar(1) was already covered as bad practice.

5) DRIVER_AGE_RANGE_ID DEFINITELY doesn't need 4.2B values. tinyint gives 255 values for 1 byte vs 4 for int.

6) BIZ_MONTHYEAR doesn't need 4.2B values. A smallint gives you sufficient years for 2 bytes.

7) BIZ_MONTH as varchar(2) is like Gender.

8) I wonder how many of your (n)varchars are actually stuffed with valid values sufficient to make their average length at or very close to the variable length allowed. If they are then variable length fields are quite inefficient from storage and processing perspective.

9) Why do you need 2 YEAR fields?

10) START_DT/END_DT again have the int/smallint problem.

Bytes REALLY do add up and are REALLY important!! You could easily have 20 bytes PER ROW in this table that are useless.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 11

TheSQLGuru (6/2/2014)
CREATE TABLE XXXX_XX( [DM_CV_AGGR_FACT_POLICY_COVERAGE_ID] [int] IDENTITY(1,1) NOT NULL, [POL_TYPE_ID] [int] NULL, [GENDER] [varchar](1) NULL, [DRIVER_AGE_RANGE_ID] [int] NULL, [CARRIER_ID] [varchar](11) NULL, [C_ST_ALPH] [nvarchar](10) NULL, [LIMIT_ID] [int] NULL, [BIZ_MONTHYEAR] [int] NOT NULL, [BIZ_YEAR] [varchar](4) NULL, [BIZ_MONTH] [varchar](2) NULL, [TOT_POLICY_COUNT] [int] NULL, [TOT_VEHICLE_COUNT] [int] NULL, [TOT_DRIVER_COUNT] [int] NULL, [START_DT] [int] NULL, [END_DT] [int] NULL, [LOAD_DT] [datetime] NOT NULL, CONSTRAINT [PK_DM_CV_AGGR_Fact_Policy_Coverage] PRIMARY KEY CLUSTERED ( [BIZ_MONTHYEAR] ASC, [DM_CV_AGGR_FACT_POLICY_COVERAGE_ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [SCH_DM_CV_AGGR_Fact_Policy_Coverage]([BIZ_MONTHYEAR]) ) ON [SCH_DM_CV_AGGR_Fact_Policy_Coverage]([BIZ_MONTHYEAR]) END GO
Wow - hard to know where to start with all the issues I see here.
1) I will assume you have biz_monthyear as leading column for some particular reason (report query performance probably). With it first you will get splits as you insert new data and really bad splitting if you load them out of order.
2) Your PK allows you to insert the EXACT SAME RECORD multiple times. Hopefully you guarantee that cannot happen via some other control.
3) Do you need 4.2Billion values for POL_TYPE_ID? Doubtful. If not you have wasted space over small or tiny int.
4) Gender being varchar(1) was already covered as bad practice.
5) DRIVER_AGE_RANGE_ID DEFINITELY doesn't need 4.2B values. tinyint gives 255 values for 1 byte vs 4 for int.
6) BIZ_MONTHYEAR doesn't need 4.2B values. A smallint gives you sufficient years for 2 bytes.
7) BIZ_MONTH as varchar(2) is like Gender.
8) I wonder how many of your (n)varchars are actually stuffed with valid values sufficient to make their average length at or very close to the variable length allowed. If they are then variable length fields are quite inefficient from storage and processing perspective.
9) Why do you need 2 YEAR fields?
10) START_DT/END_DT again have the int/smallint problem.
Bytes REALLY do add up and are REALLY important!! You could easily have 20 bytes PER ROW in this table that are useless.

Actually, if START_DT and END_DT are dates, they should be stored as dates, not integers.

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 12

I call "daughter distraction" on that one Lynn! 🙂 Actually I have seen, and used myself often, int dates stored as CCYYMMDD in data warehouses. I think a lot of the reasoning for that went away with the DATE data type though. I would love to see a good study done of the plusses and minuses of each on modern systems.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 13

TheSQLGuru (6/2/2014)
I call "daughter distraction" on that one Lynn! 🙂 Actually I have seen, and used myself often, int dates stored as CCYYMMDD in data warehouses. I think a lot of the reasoning for that went away with the DATE data type though. I would love to see a good study done of the plusses and minuses of each on modern systems.

Well, if you store a date as an integer in the format CCYYMMDD you have to use the INT data type.

🙂

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 14

Yeah - that was the main point included in the distraction plea, although not the only one!! :hehe:

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service