Extent Scan Fragmentation

  • I am using the Ola Hallengren solution for our database maintenance, reorganizing indexes between 5 and 30 percent fragmentation and rebuilding anything above 30 percent.
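    For reference, those thresholds map onto IndexOptimize parameters roughly like this. This is only a minimal sketch; the @Databases value and page count threshold here are assumptions, not our actual job settings:

    -- Sketch of an IndexOptimize call matching the 5% / 30% thresholds.
    -- @Databases value and @PageCountLevel are assumptions for illustration.
    EXECUTE dbo.IndexOptimize
        @Databases = 'USER_DATABASES',
        @FragmentationLow = NULL,                                            -- below 5%: leave alone
        @FragmentationMedium = 'INDEX_REORGANIZE',                           -- 5-30%: reorganize
        @FragmentationHigh = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',   -- above 30%: rebuild
        @FragmentationLevel1 = 5,
        @FragmentationLevel2 = 30,
        @PageCountLevel = 1000;                                              -- skip small indexes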

    I generally use this type of query to check fragmentation:

    SELECT
        s.name AS SchemaName,
        OBJECT_NAME(ps.object_id) AS TableName,
        i.name AS IndexName,
        ps.index_type_desc,
        ps.page_count,
        ps.avg_fragmentation_in_percent,
        ps.forwarded_record_count
    FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, 'DETAILED') AS ps
    INNER JOIN sys.indexes AS i WITH (NOLOCK)
        ON ps.object_id = i.object_id
        AND ps.index_id = i.index_id
    INNER JOIN sys.objects AS o WITH (NOLOCK)
        ON o.object_id = ps.object_id
    INNER JOIN sys.schemas AS s WITH (NOLOCK)
        ON o.schema_id = s.schema_id
    ORDER BY ps.page_count DESC

    This query returns very low fragmentation for all tables < 1000 pages, as we run the index optimize from Ola once per week.

    However, I have a vendor telling me we need to use DBCC SHOWCONTIG in order to check extent scan fragmentation on indexes and heaps.
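    For completeness, what the vendor is asking us to run would look roughly like this (the table name is just a placeholder; WITH TABLERESULTS returns extent scan fragmentation as a column rather than text output):

    -- Deprecated, but still runs; 'dbo.SomeVendorTable' is a placeholder name.
    DBCC SHOWCONTIG ('dbo.SomeVendorTable') WITH TABLERESULTS, ALL_INDEXES;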

    I see that command is deprecated, and that sys.dm_db_index_physical_stats only reports extent scan fragmentation when the table is a heap.

    I have two questions I was hoping to get opinions on:

    1. Should I really be concerned with the extent scan on indexes? It doesn't appear Microsoft felt it important enough to include in the DMV.

    2. For heaps with high fragmentation, it seems we should not be very concerned unless there is a performance problem. In that case, I can rebuild the table during off hours. It also seems the vendor should have a clustered index on an identity field for such tables, but it is their table.

    Any thoughts on this would be helpful.

    Thanks!

  • From what I understand, the extent-level fragmentation calculations in DBCC SHOWCONTIG were actually flawed, and fragmentation detection in sys.dm_db_index_physical_stats is more accurate.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff has already addressed question 1. On question 2, yeah, most tables ought to have a clustered index. Much of SQL Server's storage and retrieval optimizations are built around the clustered index, so it's absolutely a best practice to have one on every table, with exceptions (and exceptions are only exceptions if they're exceptional, not common).

    Also, I wouldn't use the phrase "clustered index or identity column". It makes them synonymous, and they're not. An identity column may or may not have a unique constraint. That constraint may or may not be a primary key. That primary key may or may not be clustered.
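    To put that in DDL terms, something like this is perfectly legal (a made-up table, purely to show that identity, primary key, and clustered index are three separate decisions):

    -- Hypothetical table: the identity column has no constraint at all,
    -- the primary key is nonclustered, and the clustered index is on a third column.
    CREATE TABLE dbo.Example
    (
        ExampleID   int IDENTITY(1,1) NOT NULL,               -- identity, no constraint
        BusinessKey varchar(20) NOT NULL
            CONSTRAINT PK_Example PRIMARY KEY NONCLUSTERED,   -- PK, but not clustered
        CreatedDate datetime2 NOT NULL
    );

    CREATE CLUSTERED INDEX CIX_Example_CreatedDate
        ON dbo.Example (CreatedDate);                         -- clustered on neither of the above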

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Nick Doyle (1/17/2014)


    1. Should I really be concerned with the extent scan on indexes? It doesn't appear Microsoft felt it important enough to include in the DMV.

    It's kinda there. For heaps, avg_fragmentation_in_percent is extent fragmentation, because heaps can't have logical fragmentation; they have no logical order.
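    As a sketch against the same DMV, something like this pulls that number for heaps only (index_id = 0); note that forwarded_record_count is only populated in SAMPLED or DETAILED mode:

    -- Heaps only: avg_fragmentation_in_percent here is extent fragmentation.
    SELECT
        OBJECT_SCHEMA_NAME(ps.object_id) AS SchemaName,
        OBJECT_NAME(ps.object_id)        AS TableName,
        ps.page_count,
        ps.avg_fragmentation_in_percent,   -- extent fragmentation for a heap
        ps.forwarded_record_count
    FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, 'DETAILED') AS ps
    WHERE ps.index_id = 0
    ORDER BY ps.avg_fragmentation_in_percent DESC;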

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for the replies, I appreciate it!

    The heaps in question don't have a unique identifier. It seems best to me to put a clustered index on a new identity column, keeping inserts on the last or a new page. From what I see, I am not finding a great reason not to do so. However, I don't know a lot about their application. Either way, the DDL is their domain, and I may have to rebuild some of these heaps to reduce the extent fragmentation.
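    If it comes to that, a minimal sketch of both options would be something like the following (table and column names are hypothetical, and the vendor would need to approve any schema change):

    -- Option 1: rebuild the heap in place to remove extent fragmentation
    -- (also cleans up forwarded records).
    ALTER TABLE dbo.VendorHeap REBUILD;

    -- Option 2: add a new identity column and cluster on it,
    -- so new rows always go to the end of the table.
    ALTER TABLE dbo.VendorHeap ADD VendorHeapID int IDENTITY(1,1) NOT NULL;
    CREATE UNIQUE CLUSTERED INDEX CIX_VendorHeap_VendorHeapID
        ON dbo.VendorHeap (VendorHeapID);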

  • Personally, whenever I have enough time in my maintenance windows, I either don't touch an index or I rebuild it - I almost never reorganize, as a rebuild gives better end results and can run in parallel (Enterprise Edition). On Standard Edition, of course, a reorganize doesn't block much while a rebuild does, so that's one reason to reorganize; another is that an interrupted reorganize still leaves the index less fragmented than it started, while an interrupted rebuild is wasted work.
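    For anyone following along, the two operations are just the following (index and table names are placeholders; ONLINE = ON requires Enterprise Edition):

    -- Rebuild: better end result, can run in parallel; ONLINE = ON is Enterprise only.
    ALTER INDEX IX_SomeIndex ON dbo.SomeTable
        REBUILD WITH (ONLINE = ON, MAXDOP = 4);

    -- Reorganize: always online, and can be interrupted while still leaving the index better off.
    ALTER INDEX IX_SomeIndex ON dbo.SomeTable REORGANIZE;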

    I judge whether to defrag indexes based on both logical scan fragmentation > x% (often 10-20%) and (old-school, again because I have enough time in my maintenance windows) Scan Density < y% (often 75%), where the index has more than N pages (never less than 9, often 100). If you might be maintenance window limited, perhaps start with generous thresholds, allowing a lot of fragmentation and only hitting indexes of certain size ranges, and then over weeks work down to a level that's reasonably within your window and seems to be good enough.

    As a reference on heap fragmentation, see Paul Randal's "A SQL Server DBA myth a day: (29/30) fixing heap fragmentation": http://www.sqlskills.com/blogs/paul/a-sql-server-dba-myth-a-day-2930-fixing-heap-fragmentation/

    P.S. As far as 5% < n < 30% for reorg vs. rebuild goes, also see Paul Randal's definitive words (since he came up with that rule of thumb): http://www.sqlskills.com/blogs/paul/where-do-the-books-online-index-fragmentation-thresholds-come-from/
