Index Fragmentation and SANs

  • OK, so this is a showdown. I have been looking all over the place and can only find vague references to SANs 'muddying' the issues of index fragmentation. I have always kept my index fragmentation on databases in check by rebuilding/reorganizing as necessary (code set referring to a dmv that tells me the % of fragmentation, and then it chooses which method, if either, to use). What I DON'T know is whether or not SANs change this. Is fragmentation seriously no longer a problem with SANs? I will admit, this sounds alien to me, but I don't know enough about SANs to be able to argue either way. Do I or don't I regularly defrag my indexes as a good practice in this realm?

  • SANs aren't going to change index fragmentation(file system fragmentation, yes)

    Index fragmentation is when an index's next logical page is not the next physical page within the file. How the files are stored on disk doesn't affect that.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • You're hitting two different kinds of fragmentation. One is physical file fragmentation (how many parts are scattered across the various physical disk strcutures), and the other is logical (within the disk segments you've been given, how organized/full are the data pages). These are largely independent of each other.

    The SAN tends to somewhat muddy the discussion about the physical file fragmentation (with random access DB's, you might benefit from having your physical file stripes span a LOT of disk sets). The DMV is assessing the logcal fragmentation.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • As Gail and Matt already pointed out, physical file fragmentation and index fragmentation are two different things.

    Is fragmentation seriously no longer a problem with SANs?

    Files are fragmented by definition on a SAN implementing striping. On the other hand, enteprise level SANs have built-in caching features that highly mitigate fragmentation performance issues.

    Linchi Shea has a good blog series on SAN fragmentation here.

    -- Gianluca Sartori

  • As far as I'm aware (and do please correct me if I'm wrong) SQL server doesnt know anything about the disk layout or the file layout and their respective physical fragmentation.

    If it needs pages 1, 27, 3 it will request these in that order. It is the drive controllers that work out the most efficent order to get these pages but still returns them in order that SQL requested them.

    Defragging the indexes is easy compared to the physical defrag of the files on crowded disks.

  • adrienne.lore (3/15/2012)


    Is fragmentation seriously no longer a problem with SANs?

    It depends :). Fragmentation primarily slows things down by a logically sequential operation being effectively a random operation at the physical level (my testing shows this does affect both spindles and SSD's, though spindles much, much more than SSD's).

    On spindles, you end up with your heads moving back and forth from the (closer to the) inside to (closer to the) the outside of the disk multiple times during your operation, and perhaps moving a larger distance. Advanced file systems (and SAN's) use something similar to what back in the old Netware days was termed "elevator seek" or "elevator scan"; i.e. when you have a bunch of requests, order them for minimal head movement, which reduces aggregate latency (though any one request may suffer).

    If your particular SAN LUN is part of a pool of disks shared with dozens of other databases, servers, some file servers, and the Exchange box, and those other apps are always active, then fragmentation of your particular files or the data in them is unlikely to matter as much, since the heads are going to have requests from all over the disk most of the time, anyway.

    If you have dedicated spindles for your database alone, and some of your IO is logically sequential (scans, not seeks, of multi-MB or multi-GB indexes), and so on and so forth, then fragmentation at an internal and external level may well play a part in aggregate throughput.

    Linchi's analysis was well done, but limited in a couple of ways. For one, my very cursory reading indicates it's not unlikely the DB files created used sequential free space chunks, instead of the first chunk in the middle of the disk, the second at the beginning, the third at the end, the fourth at the end, the fifth at the beginning, etc. etc. For another, Linchi only created a 10GB file and used 4.5GB of it, and I didn't see how large the SAN's cache was - an 8GB cache plus readahead on a 10GB database could skew results significantly, and I believe the Symmetrix DMX-2's supported up to a 256GB cache. To really see what happens on disk, you need to disable or overwhelm the cache.

  • OK... so I guess my concern is more with index fragmentation then, not disk. I was also told that in a thin-provisioned sql environment, this also meant that you don't necessarily need to defragment indexes within sql server. Is this the same thing? They are talking about disk, and I am talking inside of SQL? It seems incredibly backwards to me to see very fragmented indexes, and then be told not to defragment them because of a thin-provisioned virtual environment. Help?

  • adrienne.lore (3/22/2012)


    OK... so I guess my concern is more with index fragmentation then, not disk. I was also told that in a thin-provisioned sql environment, this also meant that you don't necessarily need to defragment indexes within sql server. Is this the same thing? They are talking about disk, and I am talking inside of SQL? It seems incredibly backwards to me to see very fragmented indexes, and then be told not to defragment them because of a thin-provisioned virtual environment. Help?

    1) You SHOULD be concerned with OS file fragmentation unless you proactively manage your database file sizes. If I had a nickel for every client I have come across that left default 1MB data file growths I would be living on an island I bought - say Hawaii!! 😀 I have found over a half-million OS file fragments before, resulting in unbelievably bad IO performance!

    2) Thin provisioning is ABSOLUTELY TO BE AVOIDED in a SQL SERVER ENVIRONMENT. You are guaranteed to introduce head-thrashing-causing file fragments, as well as internal SQL Server fragmentation as well. I believe some SANs can also 'stall' the IO while additional space is being added too, leading to unpredictable and very difficult to diagnose random performance problems.

    3) I STRONGLY encourage you to bring in a performance tuning professional to give your systems and databases a review. I predict there will be LOTS of low-hanging fruit to be found!! 😎

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Kevin,

    Thanks! As for your recommendations... I am all over #1, so have that under control. I didn't know #2, so now need to find out how to deal with it, because unfortunately I work in a larger company and that stuff isn't my call, and for #3, we do have someone here, but we are a bit understaffed and their time is being used on hundreds of other environments right now! Thank you VERY much for the answer-- and now I am off to try and make the best of our thin-provisioned environment!

    Thanks,

    Adrienne

  • i just want to bump this old thread for future reference and to give props to the people who have posted here. very interesting and excellent information on the subject.

    at my company, we had some issues with reports/queries on SAP and after some analysis on the heavy hit tables I found that fragmentation levels were 95-99%. these tables were heavily inserted deleted on a daily/weekly basis. and huge. 10s of GB, one being almost 90GB.

    i implemented some "smart" reindexing based on the below webpage and reports running 1+hour are now taking 2-3 minutes.

    http://benchmarkitconsulting.com/colin-stasiuk/2008/11/04/smart-reindexing-sql-server-2005-2008/

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply