SQL Server 2000 Indexing

  • Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/sjones/sqls

  • I work in a development environment where data is relatively small, even when we do data migration jobs.

    This gives me the luxury of running DBCC DBREINDEX as part of the nightly maintenance plans.

    On a client's site we scheduled in a DBCC DBREINDEX and noticed that a 600Mb database shrunk to 250Mb.

    Obviously we could have shrunk the physical file as well but as data is input at quite a high rate we decided to leave the file as is to prevent an unscheduled  resize happening. 

    This shows that on very large databases there is the potential to save on backup time as well.

  • However David I would suggest monitor that site regularly. The reason you may have gained so much space back could have been due to the number of page splits that have occurred with regards to the database. If you find the data grows drastically over a short period of time and a large amount of space is regained by running DBCC DBREINDEX then you may need to evaluate the FILL FACTOR for the Clustered Index or even change the clustered index to a non-clustered and pick a different index that suffers fewer data shifts and make the clustered, but there you will need to weigh time for queries with this difference.

  • This article is one of the best, most informative I have read on SQL 2000 table indexing. While I have always been able to interpret the output from DBCC ShowContig and act accordingly on the results, I have never fully understood what the relationships between the numbers meant.

    Well done Steve Jones and thanks you

    Steve Walker

    DBA - See Tickets 

  • As always Par Execellent!!!  

  • As usual, nice article Steve. I've read other articles and explanations of showcontig and still came away confused, but this puts it into a more clear light. I also have the luxury of running dbcc dbreindex on our production database, but it does not shrink the data like David experienced (at least not very much). We only run it once per release, or every few months.

    -Vic

  • I am still not 100% clear as to how to decide the FILL FACTOR.

    My understanding was that if you have a high input rate into a table then the indices need a lower fill factor (whatever lower may be).  If your tables are virtually read only then you can get away with a much higher FILL FACTOR.

    If I look at the value for my server using sp_configure this returns zero so I am not sure what this means.

    I could also do with an idiots definition of PAD_INDEX.

  • With SQL if you are adding data to a table and the data will always be inserted as a new row at the end of the data then 100% or 0% (which is 100% btw) is beter with regards to coservation of space.

    Many times folks will put a fill factor of 90% on fields such as a identity field. The problem is they have wasted space that will never be reclaimed and page splits will still be as prevalent.

    Ex. Say you have and indentity field indexed. Now suppose only 10 recors fit per page and at index rebuild there was 37 records.

    Index by page

    Page 1

    1

    2

    3

    4

    5

    6

    7

    8

    9

    10

    Page 2

    11

    12

    13

    14

    15

    16

    17

    28

    19

    20

    Page 3

    21

    22

    23

    24

    25

    26

    27

    28

    29

    30

    Page 4

    31

    32

    33

    34

    35

    36

    37

    Now add 3 records and thus

    Page 4

    31

    32

    33

    34

    35

    36

    37

    38

    39

    40

    Now add two more records

    Page 4

    31

    32

    33

    34

    35

    Page 5

    36

    37

    38

    39

    40

    41

    42

    Now it might be that SQL will create a new page when 41 is actually inserted I haven't looked to see if it is that smart (I think it would be)

    But the point is the data in pages 1-3 are never affected again since the data is at the end of the index. If you had used a fill factor of 90 then only 9 records would be in each page to begin with, with 1 extra slot left in the example.

    Due to this 1 - 37 records would have taken 5 pages to begin with instead of 4. If say you had had 37000000 records you would have a lot of unused space in the early pages never ging to be reclaimed. So in this case a fill factor of 100% is best.

    That is what I am talking about.

    Now for PAD_INDEX , this will by default be the same as FILL FACTOR. Basically PAD_INDEX is not the non-leaf pages. It works just like FILL FACTOR, you can however override it with a different value if you choose.

    From BOL

    "Specifies the space to leave open on each page (node) in the intermediate levels of the index. The PAD_INDEX option is useful only when FILLFACTOR is specified, because PAD_INDEX uses the percentage specified by FILLFACTOR. By default, SQL Server ensures that each index page has enough empty space to accommodate at least one row of the maximum size the index can have, given the set of keys on the intermediate pages. If the percentage specified for FILLFACTOR is not large enough to accommodate one row, SQL Server internally overrides the percentage to allow the minimum. "

    "Note  The number of rows on an intermediate index page is never less than two, regardless of how low the value of FILLFACTOR."

    However I will say this.

    Basically these will define the range between highest value and lowest on a specific page. It uses these like a Rolladex to decide which page contains the given data from the index. There will always be enough room for 1 more record after an index rebuild no matter what but I feel most folks can get away with nearer 100% no matter what there fill factor but I am still research true effect based on condition.

    Kinda Brain-Dead right now, hope that helps.

  • Great thanks, I think the FILLFACTOR and PAD_INDEX are prime candidates for an article.

    So a page split is expensive because it copies half the records from the last page into the new page?

    If you have a low fill factor, say 20%, in your example this equates to 2 records per page.  Doesn't this mean that every 2nd insert causes a page split?

  • David,

    No, though that's the impression I got at first. The key to it is that the FILLFACTOR determines how much of the index is filled ***when it's first built***. After that, the index gets filled until the point where no further entries can be made, at which point a page split occurs, and as you rightly say, half of the records get copied into the new page, and the record pointers adjusted accordingly.  Only when the index is rebuilt does the FILLFACTOR setting get re-applied to the index, and it then becomes 20% full again.  Otherwise, regardless of the its setting, you'd always end up getting page splits the moment a new record was inserted if the FILLFACTOR was constantly maintained, 'on-the-fly'.

    With regard to the page splits, it's expensive because a new page has to be allocated, which may also mean a new extent has to be allocated if the current one is full. In addition, the records have to be moved (ie: both copied AND marked as deleted), and the pointers to the records have to be re-adjusted - all lots of performance sapping disk I/O.


    Jon

  • So, if you rebuild and index with a low fill factor you get lots of pages with lots of room in them and therefore more space to insert records before the split occurs?

    The downside being that you use a lot of disk space (which is a cheap as dirt)!

  • Yes, that's correct, and hopefully you can schedule a re-build of the indexes before the page splitting starts to occur.

    The other downside is that less-full indexes also increase disk I/O. If you have an index that's 20% full, you still have to read in the entire index page, even though 80% of it is empty. So compared with an optimal index, where no page splits are occuring but every index page is 100% full (unlikely scenario unless you're not inserting any data into your database, but...), you'll end up having to perform 5 times as much disk I/O in order to read the index, as the index pointers / data will be spread over 5 times as many pages.

    I think this is one of the reasons Microsoft recommend that you leave FILLFACTOR settings to SQL Server - you can cripple a system's performance by creating lots of half empty indexes which increase disk I/O. Also, your backups are going to be bigger too, as there's lots of *almost* empty pages in the index which have to be backed up.

    For the most part, I'd look at leaving FILLFACTOR alone, with one exception - "read-only" MIS/reporting databases which are only going to be restored from an OLTP backup, and are never going to have data added to them. In that case, fill your indexes 100%, as it minimises the space they use on disk, which in turn minimises disk I/O when queries read the index, and you never need to worry about page splits because no data is added, so they never happen.


    Jon

  • All of your articles has been excellent.  This one is not the exception.

    I agree with you that indexing and all about performance is un art.  But in this art I am a beginer, and your articles are a good source of knowledge.

    Thanks.


    Leonel E. Umaña Araya
    leo_umana@hotmail.com

  • Thanks for the comments. This was an update of something I'd written a few years back, but it still applies.

    Fillfactor and your indexing is something that needs regular maintenance. If this is a very active table, lots of updatees, inserts, and deletes, you will need to rebuild often to maintain performance. As pointed out by Antares and Jon Reade above. If you are dealing with slowly or non changing data, you can probably set your fillfactor, defragment and leave things alone.

  • Hi

    Great article - thanks.

    Is there anything similar for a table?  When administering Oracle I would check both tables and indexes for fragmentation and want to do the same on SQL Server but so far have not been able to find anything that reports table fragmentation.

    Also, how do I check each index?  Unless I'm missing something here it appears that dbcc showcontig reports on a single index, certainly if I run it on a table with 5 indexes I get one set of results.

    Regards

    Karl

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply