Fill factors, index fragmentation and indexing strategy

Question

Fill factors, index fragmentation and indexing strategy

Marios Philippopoulos

SSC Guru

Points: 57190
More actions
October 15, 2008 at 11:39 am

#214122

I need to confirm whether my understanding with regards to fill factors and index fragmentation is correct in the following scenario.
I have a table of about 300,000 rows that looks like this:
Column
---------
AID (INT - Identity column, monotonically increasing - also the PK)
FKID (INT - foreign-key column)
Dt (DATETIME)
The PK is clustered.
The table is truncated and re-populated from scratch according to a daily schedule.
I want to create non-clustered indexes on the FKID and DT columns.
My question is:
do I need to worry about specifying a fill factor for the non-clustered indexes? My feeling is that I don't need a fill factor, since the data is entered one row at a time in the table, so there should be no random inserts into the pages of the non-clustered indexes.
Can someone pls confirm?
__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Viewing 15 posts - 1 through 15 (of 30 total)

You must be logged in to reply to this topic. Login to reply

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 1

You might worry about it if the columns making up the key change a lot. in a similar way to why you worry about fill factor for clustered indexes, data movement in those columns would cause the index to fragment if there isn't enough room to accomodate the change.

Of course - it's not as "bad" as a page split in the clustered index, but still - might be worth considering some amount of slack.

Specify it exactly like you would on a clustered index.

create index ix_jr09 on job20080930(rowID) include (msg) WITH FILLFACTOR=80

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 2

Matt Miller (10/15/2008)
You might worry about it if the columns making up the key change a lot. in a similar way to why you worry about fill factor for clustered indexes, data movement in those columns would cause the index to fragment if there isn't enough room to accomodate the change.
Of course - it's not as "bad" as a page split in the clustered index, but still - might be worth considering some amount of slack.
Specify it exactly like you would on a clustered index.
create index ix_jr09 on job20080930(rowID) include (msg) WITH FILLFACTOR=80

Thanks for responding.

In this case we are not doing any updates, just a straight TRUNCATE of the table and INSERT of new data. Also, the clustered index is set on the identity column, so there should be no random inserts.

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Jeff Moden SSC Guru Points: 1003857 More actions · Answer 3

In that case, I'd use a FILL Factor of 100 for that tiny bit more speed on any SELECTS you may do on the table.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 4

Jeff Moden (10/15/2008)
In that case, I'd use a FILL Factor of 100 for that tiny bit more speed on any SELECTS you may do on the table.

True - but that assumes a "single load" scenario, where the table is essentially recreated "from scratch" and then left alone afterwards. That also entails dropping the non-clustered's before the truncate and recreating the indexes once the insert has happened. That may well fit your scenario.

If you plan on leaving the indexes in place (or if the indexes happen over a prolonged stretch of time), you'd still need a fill factor <100.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Jeff Moden SSC Guru Points: 1003857 More actions · Answer 5

Matt Miller (10/15/2008)
Jeff Moden (10/15/2008)
In that case, I'd use a FILL Factor of 100 for that tiny bit more speed on any SELECTS you may do on the table.
True - but that assumes a "single load" scenario, where the table is essentially recreated "from scratch" and then left alone afterwards. That also entails dropping the non-clustered's before the truncate and recreating the indexes once the insert has happened. That may well fit your scenario.
If you plan on leaving the indexes in place (or if the indexes happen over a prolonged stretch of time), you'd still need a fill factor <100.

No... 🙂 it assumes a clustered PK on the IDENTITY column (like the OP said) which will keep things just as tidy as a single load.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 6

Jeff Moden (10/15/2008)
Matt Miller (10/15/2008)
Jeff Moden (10/15/2008)
In that case, I'd use a FILL Factor of 100 for that tiny bit more speed on any SELECTS you may do on the table.
True - but that assumes a "single load" scenario, where the table is essentially recreated "from scratch" and then left alone afterwards. That also entails dropping the non-clustered's before the truncate and recreating the indexes once the insert has happened. That may well fit your scenario.
If you plan on leaving the indexes in place (or if the indexes happen over a prolonged stretch of time), you'd still need a fill factor <100.
No... it assumes a clustered PK on the IDENTITY column (like the OP said) which will keep things just as tidy as a single load.

I understand that keeps the CLUSTERED index tidy - but how does that help the NON-Clustered indexes (and their fill factors). OP was asking about the fill factor of the NCI's (which would fragment if present on load of the data)

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Jeff Moden SSC Guru Points: 1003857 More actions · Answer 7

Sorry... the day's coffee is wearing off. 🙂 I didn't read all of your last...

You're correct. Since the rows are going to be added on a onesy basis, there could be some pretty nasty fragmentation at FILL FACTOR 100 on the non-clustered indexes. I don't believe even an 80 FILL FACTOR would help there. Might want to go as low as, say, 60 as a guess. Best thing to do would be to find out what an average day of inserts looks like and calculate a FILL FACTOR to support the inserts.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 8

Thank you both for your replies.

I'm not sure I follow though. How would the non-clustered indexes get fragmented? The table population is done all at once, in a sequential fashion by the clustered PK. Wouldn't the non-clustered-index pages be also filled sequentially in a "from-top-to-bottom" fashion?

There should be no page splits in this scenario.

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 9

Marios Philippopoulos (10/16/2008)
Thank you both for your replies.
I'm not sure I follow though. How would the non-clustered indexes get fragmented? The table population is done all at once, in a sequential fashion by the clustered PK. Wouldn't the non-clustered-index pages be also filled sequentially in a "from-top-to-bottom" fashion?
There should be no page splits in this scenario.

The page inserts are sequential for the clustered index only. Since you're using other columns in the non-clustered indexes, it would be a "random insert" scenario on those indexes, unless the key column in the non-clustered index also happen to follow EXACTLY the same ordering as the ID (which would be rather unusual).

Since you're doing a single load, your best bet is to actually then drop all non-clustered indexes before the load, and rebuild them from scratch when the load is done. In which case - load them as Jeff was recommending (100% fill factor, since there will be NO fragmentation).

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 10

Jeff Moden (10/15/2008)
Sorry... the day's coffee is wearing off. 🙂 I didn't read all of your last...
You're correct. Since the rows are going to be added on a onesy basis, there could be some pretty nasty fragmentation at FILL FACTOR 100 on the non-clustered indexes. I don't believe even an 80 FILL FACTOR would help there. Might want to go as low as, say, 60 as a guess. Best thing to do would be to find out what an average day of inserts looks like and calculate a FILL FACTOR to support the inserts.

It's fine - I was so tired I fell asleep on my keyboard while trying to verify which of us was right 🙂

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 11

It's true that the ordering of the other 2 columns - FKID and Dt - is not sequential, as is that of the AID column.

Let's for a moment assume - for the sake of the argument - that each 8-kb page can only hold 2 rows of data for each non-clustered index in the table, and let's suppose we have the following data:

AID FKID Dt
---------------------------------------------------------------------
1 8000 2008-10-01
2 1000 2008-09-01
3 3000 2008-05-01
4 2000 2008-06-01
5 4000 2008-07-01
6 1500 2008-11-01

Again, the table is defined as follows:

Column
---------
AID (INT - Identity column, monotonically increasing - also the PK)
FKID (INT - foreign-key column)
Dt (DATETIME)

I want to create 2 non-clustered indexes, one on FKID and one on Dt. The clustered index is already defined on the AID column.

What would the pages of each non-clustered index look like?

I think they will look like this:

Index pages on FKID column

Page 1:

8000

1000

Page 2:

3000

2000

Page 3:

4000

1500

Index pages on Dt column

Page 1:

2008-10-01

2008-09-01

Page 2:

2008-05-01

2008-06-01

Page 3:

2008-07-01

2008-11-01

In other words, the data would be inserted into the non-clustered index pages in the order of the clustered key. Sure the actual data in the pages is randomly ordered, BUT, I don't see how the actual insertion of the data is random. The data insertion into the pages is not random, but sequential, so there should be no danger of page splits.

Am I reading this wrong?

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 12

The non-clustered index stores the key values in order. The leaf part of the B-tree has the clustered key. So the pages would look like (AFTER the index was defragmented):

FKID RowID (page)

1000 2 1

1500 6 1

2000 4 2

3000 3 2

4000 5 3

8000 1 3

During the load, assuming it loaded in the same order as the clustered ID, you'd actually end up with something more akin to this

FKID RowID (page)

1000 2 1

1500 6 1

2000 4 2

3000 3 3

4000 5 3

8000 1 4

Because of the page splits.

As you can then see - you have 2 pages that are half full. The other part that isn't clear is that the numbering above is the LOGICAL ordering: the physical order of the pages is even worse than that, since they would be jumbled up on the disk as well.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 13

I've always thought that ONLY the clustered index stores its key values in (logical) order, and that storage of the non-clustered index data just follows the order of the clustered index keys.

Are you sure the non-clustered index stores its keys in order? I don't think that's the case.

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 14

Marios Philippopoulos (10/16/2008)
I've always thought that ONLY the clustered index stores its key values in (logical) order, and that storage of the non-clustered index data just follows the order of the clustered index keys.
Are you sure the non-clustered index stores its keys in order? I don't think that's the case.

Per Books Online:

Nonclustered indexes have the same B-tree structure as clustered indexes, except for the following significant differences:
The data rows of the underlying table are not sorted and stored in order based on their nonclustered keys.
The leaf layer of a nonclustered index is made up of index pages instead of data pages.
Nonclustered indexes can be defined on a table or view with a clustered index or a heap. Each index row in the nonclustered index contains the nonclustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value.

So the B-tree structure of a clustered and non-clustered are the same at the higher levels, meaning the key values are stored in order (with a small caveat - they aren't necessarily ordered WITHIN each page). It's just what you find "at the bottom" that is radically different.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?