Physical File Defragmentation

  • Comments posted to this topic are about the item Physical File Defragmentation

    Brad M. McGehee
    DBA

  • I don't know if anyone is still looking at this, but I wanted to post a comment on this topic on Friday and the discussion section for this editorial didn't seem to be working then.

    I highly recommend Contig, which is one of the many free tools available from Microsoft's SysInternals website:

    http://technet.microsoft.com/en-us/sysinternals/default.aspx

    This allows you to defragment individual files if you wish, and can do so fairly quickly.
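
    For example, a minimal sketch of checking and then defragmenting a single data file. The path is hypothetical, the switches are from memory (run contig with no arguments to see the current usage), and driving it from T-SQL like this assumes xp_cmdshell is enabled - running the same commands from a command prompt works just as well:

        -- Sketch only: Contig location and file path are placeholders.
        -- Report fragmentation for one data file (-a = analyze only).
        EXEC master..xp_cmdshell 'C:\Tools\contig.exe -a "D:\Data\MyDatabase.mdf"';

        -- Defragment just that file (quiet period recommended; detach first if you prefer).
        EXEC master..xp_cmdshell 'C:\Tools\contig.exe -v "D:\Data\MyDatabase.mdf"';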

  • I use Contig as well. I created a SQL job to run it periodically (usually once a month). It takes very little time, and I've never had a conflict arise from it. Most of our disk space is on a SAN, but this still gets done on SAN-stored files. I've seen some discussion as to the utility of physical defragmentation on a SAN, but keeping the files defragmented takes so little time when it's done frequently (as opposed to a once-a-year effort) that it seemed there was no harm in more frequent defragmentation.
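
    For anyone wanting to set up something similar, a rough sketch of a monthly Agent job with a CmdExec step. The msdb procedure calls are standard, but the job name, schedule, Contig path, and file mask are placeholders, and the account running CmdExec steps needs the appropriate permissions:

        USE msdb;
        GO
        EXEC dbo.sp_add_job @job_name = N'Monthly file defrag (Contig)';

        EXEC dbo.sp_add_jobstep
            @job_name  = N'Monthly file defrag (Contig)',
            @step_name = N'Run Contig against data files',
            @subsystem = N'CmdExec',
            @command   = N'C:\Tools\contig.exe -v "D:\Data\*.mdf"';

        -- freq_type = 16 is a monthly schedule; day 1 of the month at 01:00.
        EXEC dbo.sp_add_jobschedule
            @job_name          = N'Monthly file defrag (Contig)',
            @name              = N'First of the month',
            @freq_type         = 16,
            @freq_interval     = 1,
            @active_start_time = 10000;

        EXEC dbo.sp_add_jobserver
            @job_name    = N'Monthly file defrag (Contig)',
            @server_name = N'(local)';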

    Francis

  • Sorry, a publishing issue. I will re-run this editorial on Friday.

  • Before I knew what I was doing, I had a database set up with autogrow at 10% and autoshrink was on too. After a year or so, the drive was at 87% fragmentation, and the database was slow enough to get rear-ended by a glacier. Detached the database (can't defrag it while it's in use) over a weekend, defragged the drive, and got performance WAY up.

    Later, I learned a bit more, turned off autoshrink, and set the "initial size" and growth increment to more reasonable amounts; it didn't fragment much over the next six years, and performance stayed high.

    So, yeah, it matters. At least, on Server 2003 with SQL 2000 on it, it sure did for me.
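
    For what it's worth, the fix described above boils down to something like the following sketch. The database name, logical file name, and sizes are placeholders, and MODIFY FILE changes one property per statement:

        -- Sketch only: names and sizes are placeholders.
        ALTER DATABASE MyDatabase SET AUTO_SHRINK OFF;

        -- Pre-size the data file, then switch from 10% growth to a fixed increment.
        ALTER DATABASE MyDatabase MODIFY FILE (NAME = MyDatabase_Data, SIZE = 20GB);
        ALTER DATABASE MyDatabase MODIFY FILE (NAME = MyDatabase_Data, FILEGROWTH = 500MB);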

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Does it depend on the architecture?

    Linchi's (somewhat inconclusive) article only refers to SAN, but what about RAID, or any other intermediate storage layer based architecture?

  • My experience is with good old PATA/SATA drives, so I can't comment on anything beefier.

    I've tried using the built-in Microsoft defragmenter, Contig, Raxco PerfectDisk, Diskeeper, and most recently O&O Defrag. I can tell you from my completely unscientific testing that O&O has performed the best. While it's true that most defrag tools are geared more towards large numbers of small files, O&O has also done a fine job of defragmenting my large MDFs. I've even done so with lesser databases while they were online, with no noticeable performance hit.

    I will point out that the absolute best way I've found to defragment these large database files (if you have the disk space and can go offline) is to not use a tool at all. It can be done by detaching, moving the files to another drive, formatting the original drive, moving them back, and attaching. Optionally, you could also just leave the database on the new drive (disclaimer: backups, verification, yada yada yada).
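
    Roughly, that detach-and-move route looks like this. The database name and paths are hypothetical, the copy itself happens outside SQL Server, and the format step is whatever you normally use:

        -- Sketch only: database name and paths are placeholders.
        -- 1. Detach (no connections can be using the database).
        EXEC sp_detach_db @dbname = N'MyDatabase';

        -- 2. Outside SQL Server: copy the .mdf/.ldf to another drive,
        --    format (or defrag) the original drive, then copy the files back.

        -- 3. Re-attach from the original, now contiguous, location.
        CREATE DATABASE MyDatabase
            ON (FILENAME = N'D:\Data\MyDatabase.mdf'),
               (FILENAME = N'E:\Logs\MyDatabase_log.ldf')
            FOR ATTACH;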

    Again, my experiences are with smaller (usually less than 100GB) databases in non-mission-critical applications on local storage.

  • CLRP (5/1/2009)


    Does it depend on the architecture?

    Linchi's (somewhat inconclusive) article only refers to SAN, but what about RAID, or any other intermediate storage layer based architecture?

    Or even the cloud, where this type of issue would be entirely abstracted away... one has no control over fragmentation there except to drop and rewrite the entire DB, and pray the write lands in a fairly contiguous stream on whatever "platters" are involved.

  • I've toyed around with a few programs, but I'm always concerned about which ones are safe to use, so it's nice to see a discussion like this develop.

    We currently have two SAN environments (Fibre Channel for the old one and IP-based iSCSI for the new one), and unfortunately both are set up a little differently when it comes to volumes and RAID configurations. I've seen differences in how the fragmentation builds up between the two as well.

    I try to never defrag any of the production drives during production hours.

    I've also made a habit of not defragging database files while they're still online. I'm still concerned about data integrity.

    The few questions I have are:

    1. Is Contig the safest free product to use?

    2. If you are presizing the files and turning off autogrowth, what's the best way to handle a file reaching its max? Are you running something to let you know when it gets close to the max so that you can increase the size? What's the best way to handle that?

  • Physical fragmentation hurts sequential file IO significantly. However, almost all IO operations on SQL data files are nonsequential reads or writes. That means that the head is going to have to move anyway for the next read or write. Eliminating physical file fragmentation may not have any effect at all, or may in fact HURT performance in a situation like this, depending on the physical location of the most often used data pages.

    This is a debate we have constantly. First of all, whenever SQL allocates a new extent for an object, it is likely NOT contiguous with the previous extent for that object. So if a table scan is performed, after one extent is read, it will require a head seek and rotational delay before the next extent can be read, even if the file is not fragmented at all.

    But think of this: if the most recent data is what is read from the file most often, then it is also what caused the file growth. If this happens across several databases on a single set of disks, then physical file fragmentation may actually decrease the distance that the head has to move on an average seek - the most recent data is interleaved in the "fragmented" portions of the files. If the files are not fragmented, then moving from one file to another for a physical head seek may require the head to move farther, thus increasing delay.

    What is much more important is to minimize extent fragmentation for any scans (and range scans happen as parts of seeks as well). Minimizing extent fragmentation with reindexing (not index defragmentation - that will not improve extent fragmentation) allows the read-aheads in SQL to be more effective, because the next page in the extent is more likely to be the next logical page needed by the query.
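
    In 2005-and-later terms, that check-and-rebuild looks roughly like the sketch below; the table name is a placeholder, and on SQL 2000 the equivalents would be DBCC SHOWCONTIG and DBCC DBREINDEX:

        -- Sketch only: inspect fragmentation for one table's indexes, then rebuild.
        SELECT  i.name,
                ips.avg_fragmentation_in_percent,
                ips.fragment_count
        FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.MyTable'),
                                               NULL, NULL, 'LIMITED') AS ips
        JOIN    sys.indexes AS i
              ON i.object_id = ips.object_id
             AND i.index_id  = ips.index_id;

        -- REBUILD recreates the index and its extents; REORGANIZE only shuffles
        -- pages in place and will not fix extent fragmentation.
        ALTER INDEX ALL ON dbo.MyTable REBUILD;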

    So where scans are concerned, there is no question that eliminating extent fragmentation will help performance. But for physical fragmentation, I have not been able to find any real difference in my testing. Theoretically, it sounds as if it might hurt in some situations, but not all. One thing I am certain of is that it will not always help performance, even if there was physical disk fragmentation previously. And as you mentioned, when SANs are introduced, everything changes. As a matter of fact, I would highly recommend AGAINST using Windows for defragging the files if they sit on a SAN. If you defrag at all, use the SAN utilities for that, because what Windows sees as fragmentation may not be fragmentation at all after the SAN has handled the files.

    My recommendation is that you set autogrow on your files, but proactively grow the files yourself. This prevents the blocking events that can be associated with autogrowth. Use a reasonably sized fixed increment for your autogrowth - something like 100 MB. Do not use a percent, since when the files are large it can take a long time to grow them by 10% - especially if fast file allocation is not available in your edition of SQL. Just let autogrowth be a safety valve so that your system doesn't stop processing during the day. You should always proactively allocate space in the files so that autogrow never kicks in.
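
    In T-SQL, that advice amounts to something like this (names and sizes are placeholders):

        -- Sketch only: a fixed-size increment as a safety valve, not a percent.
        ALTER DATABASE MyDatabase MODIFY FILE (NAME = MyDatabase_Data, FILEGROWTH = 100MB);

        -- Proactive growth during a maintenance window, before autogrow ever fires
        -- (the new SIZE must be larger than the current size).
        ALTER DATABASE MyDatabase MODIFY FILE (NAME = MyDatabase_Data, SIZE = 60GB);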

    But as for physical fragmentation? The log file is the only file where sequential reads/writes are done; the data files will have nonsequential reads and writes, and physical file fragmentation is most relevant with sequential reads/writes. Defragging your files may give a benefit, or may not, and it may even hurt your performance - it depends entirely on the pattern of data usage in your database files. So there is no easy answer. My experience and testing have shown that it makes a negligible difference, if any at all.

    Just my 2 cents worth

  • You bring up some good points, Steve.

    Perhaps the real answer to all this is to better partition our databases into multiple files?

    In this way we cut down on the surface area that physical fragmentation can affect.

    As you say, the next page may be at the very end of the file; if we can keep those files smaller, we may cut down on the distance the heads have to travel.
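
    Mechanically, that's just an extra data file on the database - something like this, with made-up names, path, and sizes:

        -- Sketch only: add a second data file, ideally on a separate spindle/LUN.
        ALTER DATABASE MyDatabase
        ADD FILE (
            NAME       = MyDatabase_Data2,
            FILENAME   = N'E:\Data\MyDatabase_Data2.ndf',
            SIZE       = 10GB,
            FILEGROWTH = 500MB
        );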

    Edit:

    I would think that whether or not defragmenting a database can *hurt* performance, there will always be a point where not defragmenting hurts performance even more. Perhaps we could even calculate that fragmentation-percentage threshold based on the size of the file and the number of pages?

  • Jason Crider (5/1/2009)

    2. If you are presizing the files and turning off autogrowth, what's the best way to handle a file reaching its max? Are you running something to let you know when it gets close to the max so that you can increase the size? What's the best way to handle that?

    I always leave autogrowth on to cover potential unexpected events. But as I mentioned in my editorial, I don't recommend you use autogrowth to grow your files. You should actively manage your files yourself.
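
    As a concrete example of that kind of active management, a minimal sketch of a check that could run on a schedule and flag files getting close to their cap. It assumes the SQL 2005+ catalog views; names are placeholders, and both size and FILEPROPERTY report 8 KB pages:

        -- Sketch only: run in the database being watched.
        SELECT  name,
                size / 128                                     AS size_mb,
                FILEPROPERTY(name, 'SpaceUsed') / 128          AS used_mb,
                (size - FILEPROPERTY(name, 'SpaceUsed')) / 128 AS free_mb
        FROM    sys.database_files
        WHERE   type_desc = 'ROWS';   -- data files only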

    Brad M. McGehee
    DBA

  • I always leave autogrowth on to cover potential unexpected events. But as I mentioned in my editorial, I don't recommend you use autogrowth to grow your files. You should actively manage your files yourself.

    I myself think that you should leave autogrowth on rather than disallowing it and experiencing a crash in production (as could happen back on SQL Server 6.5 ;-)) because there is no more space in your data/log file. You should have performance indicators and dashboards to manage the growth. BTW, while creating a database, consider the size of the initial MDF/LDF as it would be in 3 years. This way, there should be no (or less) fragmentation.
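
    For example, something along these lines at creation time, with the names and the 3-year projections obviously made up:

        -- Sketch only: size the files for the projected 3-year volume up front,
        -- and leave autogrowth on as a fixed-size safety valve.
        CREATE DATABASE MyDatabase
        ON PRIMARY (
            NAME       = MyDatabase_Data,
            FILENAME   = N'D:\Data\MyDatabase.mdf',
            SIZE       = 50GB,          -- projected data size in 3 years
            FILEGROWTH = 500MB )
        LOG ON (
            NAME       = MyDatabase_Log,
            FILENAME   = N'E:\Logs\MyDatabase_log.ldf',
            SIZE       = 10GB,
            FILEGROWTH = 500MB );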

    --
    Philippe RUELLO
    Database Project Manager

  • Wow, this is exactly what I was looking for.
