SQLServerCentral Article

SQL Server 2000 Indexing

,

Introduction

One of the keys to SQL Server performance is ensuring that you have the proper indexes on a table so that any queries written against this table can run efficiently. There are more articles written about designing indexes, choosing columns, etc for optimizing performance, so I will refrain from repeating most of what is written elsewhere. I have included a few resources at the end of this article for this topic.

However once you have built the indexes, there is still work to be done. As your data sets grow over time, SQL Server will continue to rebuild indexes and move data around as efficiently as possible. This happens in a number of ways, but the result is that you may need to perform maintenance on your indexes over time despite all of the automatic tools built into SQL Server. This article will discuss some of the issues with data growth over time as well as a technique to find tables in need of maintenance and how to perform this maintenance.

NOTE: I am including some features of SQL 2000 in this article, but the techniques are still useful in SQL v7. and v6.5 with minor modifications.

What happens over time?

If SQL Server includes auto statistic updating, a query optimizer that can learn to be more efficient with your queries, etc., why do we need to perform maintenance? Well, let's examine what happens over time.

When you build an index on a table (let's assume a clustered index), SQL Sever parcels the data across pages and extents. With v7.x and above, extents can be shared between objects (with v6.5 extents contain a single object). As a result, let's assume you create a table with rows that are < of a page in size. If you have 20 rows, then you have 5 pages worth of data. Is your data stored on 5 pages? Only if your FILLFACTOR is 100%. The fillfactor determines how much, percentage wise, your pages are filled. let's assume a FILLFACTOR of 50%, then you would have 10 pages of data allocated to this table. This is getting complicated quickly, but let's examine it a bit more.

If you assume that we expand this example over time, we may grow to 100 pages of data. These (at a minimum) require 7 extents if this object does not share any extents. Each page within the extents links to another page with a pointer. The next page in the chain, however, may not be in the same extent. Therefore as we read the pages, we may need to "switch" to another extent.

The simplest example is assume we take 3 consecutive pages of data in the following order:

Extent 1      Extent 2
Page n        Page n + 1
Page n + 2

These are any three pages where page n links to page n+1 next, then to page

n+2 and so on. To read these three pages we read extent 1, then switch to extent

2, then switch back to extent 1. These "switches" do not necessarily

entail physical I/O, but all of these switches add up. They may not be a big deal on

your local server or even a lightly loaded server, a web application that has

hundreds or thousands of users could see a large performance impact from

repeated scans of this table. Why does the table end up looking like this? This

is how the table is designed to function over time. SQL Server will allocate

space for each row based on the space available at that time. As a result, while

a clustered index stores the data in physical order on a page, the pages may not

be in physical order. Instead each page has a linkage to the next page in the

sequence. Just as your hard disk can become fragmented over time as you delete

and insert files, the allocations of pages for a table can be fragmented over

time across extents as the data changes.

So why doesn't SQL Server just rebuild the indexes? I am

not sure if I would even want it to do so. I would hate for this to occur right

after a large web marketing campaign! Instead the engineers in Redmond have left

it up to the DBA to track this fragmentation and repair it as necessary. How do

we remove this fragmentation? Read on...

Running DBCC SHOWCONTIG

Prior to SQL Server 2000, you had to first get the object ID using the following

command

select object_id('<object name>')

For the user table, I ran

select object_id('user')

This returned me some long number (from sysobjects) that means nothing to me, but the

SQL team in Redmond must use this often and did not feel like including the join

in their code. I guess someone complained long and loud enough because in SQL

2000 you can use the name of the object in dbcc showcontig like this:

dbcc showcontig (user)

This produces the following statistics on your indexes:

DBCC SHOWCONTIG scanning 'User' table...
Table:'User' (962102468); index ID: 1, database ID: 7
TABLE level scan performed.
-Pages Scanned................................: 899
-Extents Scanned..............................: 121
-Extent Switches..............................: 897
-Avg. Pages per Extent........................: 7.4
-Scan Density [Best Count:Actual Count].......: 12.58% [113:898]
-Logical Scan Fragmentation ..................: 99.89%
-Extent Scan Fragmentation ...................: 80.99%
-Avg. Bytes Free per Page.....................: 2606.5
-Avg. Page Density (full).....................: 67.80%

Let's decode this output:

Pages Scanned - Gives the # physical pages in the database scanned in this index. Not

really relevant, but gives you the total size occupied by this index ( each

page is 8k)

Extents scanned - An extent is 8 pages. So this should be pretty close to Pages Scanned

/ 8. In this example we have 121 extents which is 968 pages. Since the index is

only 899 pages, we have a number of shared extents. Not necessarily a bad thing,

but this gives you an idea that you are slightly fragmented. Of course, you do

not know how much physical fragmentation this is which can contribute to longer

query times. The minimum number for the 899 pages above would be 113. (899/8)

Extent Switches - # times the scan forced a switch from one extent to another. As this gets

close to # pages, you have pretty high fragmentation. . If you see number close

to # pages, then you may want to rebuild the index. See a Detailed Example.

Average Pages/Extent - Gives the math of Pages Scanned / Extents Scanned. Not of any great value

other than you don't have to run Calculator to get the number. Fully populated

extents would give a value of 8 here. I guess this is good for me

Scan Density [Best Count:Actual Count].......: 12.58% [113:898]

This is the tough one. This shows a percentage and 2 numbers separated by

a colon. I explain this as I missed it the first two times around. The

percentage is the result of dividing number 1 (113) by number 2 (898). So what

are the two numbers?

The first number is the ideal number of extent changes if everything was

linked in the a contiguous chain. The second number is the number of extents

moved through which is 1 more than the number of extent switches (by

definition). This is really another view of fragmentation. 100% would be minimal

(I hate to say zero) fragmentation. As you can see, this table is fairly

fragmented. The scan is constantly switching back and forth from one extent to

another instead of finding a link from one page to another within an extent.

Logical Scan Fragmentation ..................: 99.89%

The official definition from Books Online (v7.x and 2000 Beta 2) is:

"Percentage of out-of-order pages returned from scanning the leaf

pages of an index. This number is not relevant to heaps and text indexes. An out

of order page is one for which the next page indicated in an IAM is a different

page than the page pointed to by the next page pointer in the leaf page."

I am still not sure what this means. I have not gotten a good explanation

of this anywhere, so here is my best interpretation. This shows how many pages

(as a percentage) in the index which have a pointer to the next page that is

different than the pointer to the next page that is stored in the leaf (data)

page. This is only relevant for clustered indexes as the data (leaf pages)

should be physically in the order of the clustered index.

So how do you use this? If you figure it out, let me know. Since this number is

high for me and other items lead me to think this index is fragmented, I think

this is bad. So try for a low number in OLAP systems and a medium number in OLTP

systems.

Extent

Scan Fragmentation ...................: 80.99%

Again, here is the official BOL explanation (v7.x and 2000 Beta 2).

Percentage of out-of-order extents in scanning the leaf pages of an

index. This number is not relevant to heaps. An out-of-order extent is one for

which the extent containing the current page for an index is not physically the

next extent after the extent containing the previous page for an index.

This shows the percentage of pages where the next page in the index is

not physically located next to the current page. This tells me the I/O system

must move fairly often (80% of the time) when scanning the index to find the

next page. A Detailed Explanation is given below.

Avg. Bytes Free per Page.....................: 2606.5

This tells you (on average) how many bytes are free per page. Since a

page is 8096 bytes, it appears that I have on average, filled about 68% of the

pages. This can be good or bad. If this is an OLTP system with frequent inserts

to this table, then with more free space per page, there is less likely going to

be a page split when an insert occurs. You want to monitor this on tables with

heavy activity and periodically rebuild this index to spread out the data and

create free space on pages. Of course you do this during periods of low activity

(read as 3am) so that there is free space and page splits are minimal during

periods of high activity (when everyone can yell at you for a slow database).

Since this is an OLTP system, I am in good pretty shape.

If this were an OLAP system, then I would rather have this be closer to

zero since most of the activity would be read based and I would want the reads

to grab as much data as possible from each page (to reduce the time it takes to

read the index). As your OLAP table grows, this becomes more critical and can

impact (substantially) the query time for a query to complete.

(build test data of 10,000,000 rows and test index of 99% v 1%

fillfactor).

Avg. Page Density (full).....................: 67.80%

This gives the percentage based on the previous number (I calculated the

number above as 1 - (2606.5 / 8096) and rounded up.

So what does this all mean?

Well, to me this says I need to defragment this table. There are a large

number of extent switches that occur, each of which could potentially cause a

large I/O cost to queries using this table and index.

Defragmenting Indexes

In previous versions, and I guess in this one you can still rebuild the

clustered index which causes the server to read this clustered index and then

begin moving the data to new extents and pages which should start putting

everything back in physical order and reduce fragmentation. There is another

way:

In SQL 2000, the SQL developers added another DBCC option which is INDEXDEFRAG.

This can defragment both clustered and nonclustered indexes which (according to

BOL) should improve performance as the physical order will match the logical

order and (theoretically) reduce the I/O required by the server to scan the

index.

A couple of caveats about this: If your index spans files, then it defragments

each file separately and does NOT move pages between files. Not a good thing if

you have added a new filegroup and allowed objects to grow across files. If you need to move a table to a single filegroup, you need to move the clustered index.

A good thing that is way, way, way, extremely, absolutely, without-a-doubt long

overdue is the reporting of progress by DBCC INDEXDEFRAG as it works. Every 5

minutes this will report the estimated progress back to the user. Of course many

of us who have installed software with a feedback progress bar often wonder why

the bar moves quickly to 99% and remains there for 80% of the total install

time. So time will tell whether this is of any use, but I think some feedback is

better than none.

Another addition that is way, way, way, (you get the idea) overdue is the ability to stop the DBCC. I cannot tell you how many late nights I wished I could do this in v6.5. In fact I often held off on running DBCC until the latest possible time since I could not stop it once it started. (well, there was that O-N-O-F-F switch.)

Still one further addition, that ranks above the other two is that this is an online operation. Let me repeat that. Its an ONLINE operation. It does not hold locks on the table since it operates as a series of short transactions to move pages. It also operates more quickly than a rebuild of a new index and the time required is related to the amount of fragmentation for the object. Of course this means that you must have extensive log space if this is a large index. Something to keep in mind and watch the log growth when you run this to see how much space it eats up. Everything's a trade-off though.

Conclusion

Maintaining indexes still requires DBA intervention for optimal performance

for your database server. How often do you have to maintain these indexes? That

depends (like everything else in the RBDMS world) on your system. This is still

an art and requires some practice, testing, and careful notes over time.

While Microsoft continues to improve this process over time and automate some

of the work, it will probably still require some DBA intervention for the

foreseeable future. Practice with tuning indexes is an often overlooked

maintenance item, but one which will become more important to your users as

their numbers swell.

Good luck and as always, please include yout comments, questions, or suggestions at the bottom in the "Your Opionion" section.

References

Here are some of the references for Indexing on the Internet and in publication. I have used

these in the past, though not necessarily for this article. These are the ones I recommend.

Return to Steve Jones' Home

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating