Fill factors, index fragmentation and indexing strategy

Question

Fill factors, index fragmentation and indexing strategy

Viewing 15 posts - 16 through 30 (of 30 total)

You must be logged in to reply to this topic. Login to reply

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 1

Matt Miller (10/16/2008)
Marios Philippopoulos (10/16/2008)
I've always thought that ONLY the clustered index stores its key values in (logical) order, and that storage of the non-clustered index data just follows the order of the clustered index keys.
Are you sure the non-clustered index stores its keys in order? I don't think that's the case.
Per Books Online:
Nonclustered indexes have the same B-tree structure as clustered indexes, except for the following significant differences:
The data rows of the underlying table are not sorted and stored in order based on their nonclustered keys.
The leaf layer of a nonclustered index is made up of index pages instead of data pages.
Nonclustered indexes can be defined on a table or view with a clustered index or a heap. Each index row in the nonclustered index contains the nonclustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value.
So the B-tree structure of a clustered and non-clustered are the same at the higher levels, meaning the key values are stored in order (with a small caveat - they aren't necessarily ordered WITHIN each page). It's just what you find "at the bottom" that is radically different.

Same B-structure does not mean that the data is sorted in ASC order in the non-clustered-idx pages, either from page to page or within each page.

As the reference quoted clearly states:

The data rows of the underlying table are not sorted and stored in order based on their nonclustered keys.

I think this statement corroborates my earlier interpretation of how data will be ordered in the non-cl idx pages.

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 2

The data rows of the underlying table are not sorted and stored in order based on their nonclustered keys.
I think this statement corroborates my earlier interpretation of how data will be ordered in the non-cl idx pages.

Actually - no. This is just highlighting the difference between clustered and non-clustered. Simply put, in a clustered index, the leaf structure IS the data (meaning - the physical rows of the actual table). In a non-clustered index, the leaf structure hold a pointer to the actual data (so, either the clustered key if there is a clustered index, or the Heap key if there is no clustered index). So - the data rows stay ordered but the Heap or clustered key, but the useable parts of the index (i.e. the "key") is ordered, since that's how the index works.

As to ordering, a B-tree entails ordering. If you don't order it, there is no B-tree.

That being said - I'm sure one of our mavens (like Gail) will show up and explain this better than I.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 3

Matt Miller (10/16/2008)
The data rows of the underlying table are not sorted and stored in order based on their nonclustered keys.
I think this statement corroborates my earlier interpretation of how data will be ordered in the non-cl idx pages.
Actually - no. This is just highlighting the difference between clustered and non-clustered. Simply put, in a clustered index, the leaf structure IS the data (meaning - the physical rows of the actual table). In a non-clustered index, the leaf structure hold a pointer to the actual data (so, either the clustered key if there is a clustered index, or the Heap key if there is no clustered index).
As to ordering, a B-tree entails ordering. If you don't order it, there is no B-tree.
That being said - I'm sure one of our mavens (like Gail) will show up and explain this better than I.

Actually, that "a B-tree entails ordering", that makes sense to me. You may have convinced me after all... 🙂

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 4

I think it's settled. I will ask the developer to create the non-cl. idxs with 60% fill factor.

Thanks for your help.

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Jeff Moden SSC Guru Points: 1003867 More actions · Answer 5

Marios Philippopoulos (10/16/2008)
Thank you both for your replies.
I'm not sure I follow though. How would the non-clustered indexes get fragmented? The table population is done all at once, in a sequential fashion by the clustered PK. Wouldn't the non-clustered-index pages be also filled sequentially in a "from-top-to-bottom" fashion?
There should be no page splits in this scenario.

I thought you said the table was populated "one row at a time".

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003867 More actions · Answer 6

Marios Philippopoulos (10/16/2008)
I think it's settled. I will ask the developer to create the non-cl. idxs with 60% fill factor.
Thanks for your help.

Based on the fact that you have now stated that the table will be populated all at once instead of one row at a time, my recommendation would be to simply drop the table and rebuild it before you populate it... add the indexes and PK when it's done being populated and use a Fill Factor of 100% on all the indexes.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 7

Jeff Moden (10/16/2008)
Marios Philippopoulos (10/16/2008)
Thank you both for your replies.
I'm not sure I follow though. How would the non-clustered indexes get fragmented? The table population is done all at once, in a sequential fashion by the clustered PK. Wouldn't the non-clustered-index pages be also filled sequentially in a "from-top-to-bottom" fashion?
There should be no page splits in this scenario.
I thought you said the table was populated "one row at a time".

The table is populated in a single session. All tables are populated one row at a time, are they not?

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 8

Marios Philippopoulos (10/16/2008)
I think it's settled. I will ask the developer to create the non-cl. idxs with 60% fill factor.
Thanks for your help.

Since (it seems that) the data is populated in a single shot, here is the optimal mechanism:

1) Drop NC indexes

2) Truncate table

3) Load data

4) recreate NC indexes with 100% fill factor

BTW, the PK CI should have 100% fill factor too.

There - TheSQLGuru has spoken. As it is written, so let it be done! 😀

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 9

TheSQLGuru (10/17/2008)
Marios Philippopoulos (10/16/2008)
I think it's settled. I will ask the developer to create the non-cl. idxs with 60% fill factor.
Thanks for your help.
Since (it seems that) the data is populated in a single shot, here is the optimal mechanism:
1) Drop NC indexes
2) Truncate table
3) Load data
4) recreate NC indexes with 100% fill factor
BTW, the PK CI should have 100% fill factor too.
There - TheSQLGuru has spoken. As it is written, so let it be done! 😀

Thank you, the main difference of your suggestion with the previous one is recreating the NC indexes every time. You win in terms of disk space and less fragmentation, but you lose in terms of having to recreate the indexes time after time.

I will take your suggestion into consideration.

Thanks again!

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 10

Marios Philippopoulos (10/17/2008)
TheSQLGuru (10/17/2008)
Marios Philippopoulos (10/16/2008)
I think it's settled. I will ask the developer to create the non-cl. idxs with 60% fill factor.
Thanks for your help.
Since (it seems that) the data is populated in a single shot, here is the optimal mechanism:
1) Drop NC indexes
2) Truncate table
3) Load data
4) recreate NC indexes with 100% fill factor
BTW, the PK CI should have 100% fill factor too.
There - TheSQLGuru has spoken. As it is written, so let it be done! 😀
Thank you, the main difference of your suggestion with the previous one is recreating the NC indexes every time. You win in terms of disk space and less fragmentation, but you lose in terms of having to recreate the indexes time after time.
I will take your suggestion into consideration.
Thanks again!

Actually - you'll note that Jeff (and I previously) also recommended dropping the indexes and rebuilding. Jeff just mentioned dropping the entire table (which will kill the indexes as well).

You also gain in terms of performance. Loading a table with NC indexes attached is quite a bit slower than having no indexes. Besides - since you're putting them in place to speed you up, you probably want them as pristine as you can get them. Building them from scratch after the insert is the best way to do that (and again - you will save time doing it this way).

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 11

Matt Miller (10/17/2008)
Marios Philippopoulos (10/17/2008)
TheSQLGuru (10/17/2008)
Marios Philippopoulos (10/16/2008)
I think it's settled. I will ask the developer to create the non-cl. idxs with 60% fill factor.
Thanks for your help.
Since (it seems that) the data is populated in a single shot, here is the optimal mechanism:
1) Drop NC indexes
2) Truncate table
3) Load data
4) recreate NC indexes with 100% fill factor
BTW, the PK CI should have 100% fill factor too.
There - TheSQLGuru has spoken. As it is written, so let it be done! 😀
Thank you, the main difference of your suggestion with the previous one is recreating the NC indexes every time. You win in terms of disk space and less fragmentation, but you lose in terms of having to recreate the indexes time after time.
I will take your suggestion into consideration.
Thanks again!
Actually - you'll note that Jeff (and I previously) also recommended dropping the indexes and rebuilding. Jeff just mentioned dropping the entire table (which will kill the indexes as well).
You also gain in terms of performance. Loading a table with NC indexes attached is quite a bit slower than having no indexes. Besides - since you're putting them in place to speed you up, you probably want them as pristine as you can get them. Building them from scratch after the insert is the best way to do that (and again - you will save time doing it this way).

So it looks like the only difference between what you guys are saying and what the TheSQLGuru is saying is in the fill factor for the NC indexes.

Based on our earlier discussion, I'm inclined towards having a low fill factor (say 60%) for the NC indexes.

Would the TheSQLGuru pls comment on that point, as you are advocating a 100% fill factor?

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 12

So it looks like the only difference between what you guys are saying and what the TheSQLGuru is saying is in the fill factor for the NC indexes.
Based on our earlier discussion, I'm inclined towards having a low fill factor (say 60%) for the NC indexes.
Would the TheSQLGuru pls comment on that point, as you are advocating a 100% fill factor?

From what I gathered in the somewhat lengthy discussions was that the data was inserted and then remained static until the next truncate/reload activity. There is absolutely NO reason to not have 100% fill factor in this case. You will never modify the data --> never have page splits --> 100% fill factor will lead to the tightest index and the best performance for NO potential degredation in DML performance (since that doesn't happen) nor index fragmentation either.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 13

TheSQLGuru (10/17/2008)
So it looks like the only difference between what you guys are saying and what the TheSQLGuru is saying is in the fill factor for the NC indexes.
Based on our earlier discussion, I'm inclined towards having a low fill factor (say 60%) for the NC indexes.
Would the TheSQLGuru pls comment on that point, as you are advocating a 100% fill factor?
From what I gathered in the somewhat lengthy discussions was that the data was inserted and then remained static until the next truncate/reload activity. There is absolutely NO reason to not have 100% fill factor in this case. You will never modify the data --> never have page splits --> 100% fill factor will lead to the tightest index and the best performance for NO potential degredation in DML performance (since that doesn't happen) nor index fragmentation either.

Agreed, as long as your dropping and recreating.

The fill factor was if you had continuous inserts or if you were not going to drop the indexes.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 14

Matt Miller (10/17/2008)
TheSQLGuru (10/17/2008)
So it looks like the only difference between what you guys are saying and what the TheSQLGuru is saying is in the fill factor for the NC indexes.
Based on our earlier discussion, I'm inclined towards having a low fill factor (say 60%) for the NC indexes.
Would the TheSQLGuru pls comment on that point, as you are advocating a 100% fill factor?
From what I gathered in the somewhat lengthy discussions was that the data was inserted and then remained static until the next truncate/reload activity. There is absolutely NO reason to not have 100% fill factor in this case. You will never modify the data --> never have page splits --> 100% fill factor will lead to the tightest index and the best performance for NO potential degredation in DML performance (since that doesn't happen) nor index fragmentation either.
Agreed, as long as your dropping and recreating.
The fill factor was if you had continuous inserts or if you were not going to drop the indexes.

That was my original point, ie. no page splits.

Matt I thought you had mentioned that there would be page splits and a low fill factor was needed.

Remember our discussion earlier about the B-tree etc.?

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 15

Marios Philippopoulos (10/17/2008)
Matt Miller (10/17/2008)
TheSQLGuru (10/17/2008)
So it looks like the only difference between what you guys are saying and what the TheSQLGuru is saying is in the fill factor for the NC indexes.
Based on our earlier discussion, I'm inclined towards having a low fill factor (say 60%) for the NC indexes.
Would the TheSQLGuru pls comment on that point, as you are advocating a 100% fill factor?
From what I gathered in the somewhat lengthy discussions was that the data was inserted and then remained static until the next truncate/reload activity. There is absolutely NO reason to not have 100% fill factor in this case. You will never modify the data --> never have page splits --> 100% fill factor will lead to the tightest index and the best performance for NO potential degredation in DML performance (since that doesn't happen) nor index fragmentation either.
Agreed, as long as your dropping and recreating.
The fill factor was if you had continuous inserts or if you were not going to drop the indexes.
That was my original point, ie. no page splits.
Matt I thought you had mentioned that there would be page splits and a low fill factor was needed.
Remember our discussion earlier about the B-tree etc.?

If you leave the indexes in place during the insert (single load or multiple inserts), there will be fragmentation. if this was a table that was updated in a lot of small increments - you'd want the fill factor to counter that. You end up with some fragmentation, but not as much.

On the other hand - since you are loading one single time a day, the very best way is to avoid the fragmentation altogether by deleting before the truncate and recreating the indexes after the load (like Jeff and SQLGuru also mentioned). At that point, since the table is essentially static, there will be NO movement in the data, so no *ongoing* fragmentation on the index, so you can put the fill factor to 100%.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?