another index question, large table indexing

  • I've been looking for information on how to handle indexing on a large table, and I haven't found a similar situation described yet. I have a single table that is 1.16TB and at 80% fragmentation, and I'm wondering what the best course of action is for re-indexing it. There are 4 indexes on that table.


  • You definitely want to defrag this table. However, the approach depends on several factors: how much downtime you can afford, and whether this is a DW table that gets updated only periodically or a table that gets constantly updated / inserted into. Depending on the answers to those questions, you may also want to adopt a strategy that doesn't get you to 80% fragmentation in the first place.

  • I agree. We're taking this workload over; right now it's a backup of production and it hasn't fully moved over yet, so I'm trying to figure out the best ways to work with it pre-prod. It is mostly history and gets queried against. Downtime wouldn't really be feasible long term. We can accept a slower-performance window on nights or possibly a weekend, but actual downtime is best avoided if at all possible. The table has roughly 3.8 billion rows partitioned across 20+ files.


  • If you have a clustered index on this table, maybe you want to drop all the nonclustered indexes first, rebuild the clustered index, and then recreate the nonclustered indexes. If this is a table that will accumulate more data, you may want to try a fill factor as well. Something along the lines of the sketch below.
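
    A minimal sketch of that sequence (all index, table, and column names here are hypothetical placeholders, and the fill factor is only an example):

        -- Drop the nonclustered indexes so the clustered rebuild doesn't have to maintain them.
        DROP INDEX IX_MyBigTable_Col1 ON dbo.MyBigTable;
        DROP INDEX IX_MyBigTable_Col2 ON dbo.MyBigTable;

        -- Rebuild the clustered index; FILLFACTOR = 90 leaves 10% free space per leaf page
        -- for future inserts, and SORT_IN_TEMPDB moves the sort work off the data drive.
        ALTER INDEX CIX_MyBigTable ON dbo.MyBigTable
            REBUILD WITH (FILLFACTOR = 90, SORT_IN_TEMPDB = ON);

        -- Recreate the nonclustered indexes against the freshly rebuilt clustered index.
        CREATE INDEX IX_MyBigTable_Col1 ON dbo.MyBigTable (Col1) WITH (FILLFACTOR = 90);
        CREATE INDEX IX_MyBigTable_Col2 ON dbo.MyBigTable (Col2) WITH (FILLFACTOR = 90);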

  • Make sure that defragmentation will actually buy you something before you move ahead on complex plans to get it done.

    Defragmentation of indexes really only matters if you do range selects on the leading edge of the index. If you mainly or exclusively do single-row selects, and only rarely do range selects, then fragmented indexes don't matter much, if at all.

    By "range selects", I mean ones that are like, "WHERE MyDateColumn >= @StartDateParameter AND MyDateColumn < @EndDateParameter". Disjointed or non-sequential multi-row selects aren't "range selects". E.g.: "WHERE MyIntColumn IN (1,6,23,-17,70012)" is not a "range selected", it's a "disjointed/non-sequential multi-row select". It'll probably be handled by multiple seeks instead of a range scan, in the execution plan.

    The same applies to ranges that aren't on the leading edge of the index. E.g.: "WHERE MyLeadingEdgeIntColumn = 6 AND MyDateColumn >= @StartDateParameter AND MyDateColumn < @EndDateParameter", where the index looks like "CREATE INDEX MyIndex ON MyTable (MyLeadingEdgeIntColumn, MyDateColumn)". Index fragmentation can matter there, but is less likely to.

    Single-value tests that will return multiple rows are "range selects". If, for example, in the one right above, there are multiple rows where MyLeadingEdgeIntColumn does equal 6, and those are split across a large number of non-contiguous pages, your query will be slowed down by the fragmentation.

    You can check for this in the index DMVs: look for data on range scans vs. single-row (singleton) seeks, as in the sketch below. If what you mostly have is singleton seeks, defragmentation won't gain you much, and may not be worth the work.
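
    A possible way to pull that from the DMVs (the table name is a hypothetical placeholder; note these counters reset when the instance restarts, so they only reflect activity since then):

        -- Range scans vs. singleton lookups per index, from the operational-stats DMV.
        SELECT  OBJECT_NAME(ios.object_id) AS table_name,
                i.name                     AS index_name,
                ios.range_scan_count,
                ios.singleton_lookup_count
        FROM    sys.dm_db_index_operational_stats(DB_ID(), OBJECT_ID('dbo.MyBigTable'), NULL, NULL) AS ios
        JOIN    sys.indexes AS i
                ON i.object_id = ios.object_id AND i.index_id = ios.index_id
        ORDER BY ios.range_scan_count DESC;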

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Doing a recreate is an interesting idea and will be looked into. I don't have the exact queries that are being run, but I was an application user at one point and I know it does massive ranges in multiple ways. A request has been put in for the queries that the forms are running. I am currently trying to do the defrag index by index: doing the whole table (just this one table) fills up the 400GB log file, fills up the drive this one table is stored on, and still doesn't complete. As for the fill factor, setting it to something more common like 80 fills up the drive (as roughly 25% more space would). I'm going to be requesting more storage, though for now I'm working within my confines.

    I greatly appreciate the advice and am looking at all the avenues being presented.
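
    One way to keep the per-step log and disk hit bounded while going index by index (the index name is a hypothetical placeholder; this assumes the indexes are partitioned/partition-aligned, which a table spread across 20+ files often is):

        -- REORGANIZE works in many small transactions, can be stopped and resumed,
        -- and never needs the single huge log transaction a full REBUILD can.
        ALTER INDEX IX_MyBigTable_Col1 ON dbo.MyBigTable REORGANIZE;

        -- Or rebuild one partition at a time, so each step only needs log and
        -- workspace for that partition rather than the whole 1.16TB table.
        ALTER INDEX IX_MyBigTable_Col1 ON dbo.MyBigTable
            REBUILD PARTITION = 3 WITH (SORT_IN_TEMPDB = ON);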


  • fluffydeadangel (12/5/2012)


    Doing a recreate is an interesting idea and will be looked into. I don't have the exact queries that are being run, but I was an application user at one point and I know it does massive ranges in multiple ways. A request has been put in for the queries that the forms are running. I am currently trying to do the defrag index by index: doing the whole table (just this one table) fills up the 400GB log file, fills up the drive this one table is stored on, and still doesn't complete. As for the fill factor, setting it to something more common like 80 fills up the drive (as roughly 25% more space would). I'm going to be requesting more storage, though for now I'm working within my confines.

    I greatly appreciate the advice and am looking at all the avenues being presented.

    Doesn't the fact that you have 80% fragmentation mean that you're already using a significant amount of space over and above what is actually needed? A fill factor of 80 packs pages only 80% full, so it would only increase your footprint by about 25% (100/80 = 1.25), and I am not sure how that would cause the drive to fill up. I understand the log filling up, but switching to a simple recovery model may be another option to try during the index rebuilds (see the sketch at the end of this post).

    I understand and appreciate the point of seeks vs. scans, but the massive amount of space savings (in some environments) may itself be enough to justify the defrag. I know that SAN space is pretty expensive, and once you do the math and see the cost you are incurring with the bloated table (at least about 400 GB, perhaps?), you can make that call. I know that in our shop, another 400 GB can be quite a bit of $$ in savings 🙂
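
    A minimal sketch of that recovery-model switch (the database name and backup path are hypothetical; under SIMPLE or BULK_LOGGED recovery, index rebuilds are minimally logged, but SIMPLE breaks the log backup chain, so take a full backup afterwards):

        -- Switch to minimal logging for the rebuild window.
        ALTER DATABASE MyBigDb SET RECOVERY SIMPLE;

        ALTER INDEX ALL ON dbo.MyBigTable REBUILD WITH (SORT_IN_TEMPDB = ON);

        -- Switch back and restart the backup chain with a full backup.
        ALTER DATABASE MyBigDb SET RECOVERY FULL;
        BACKUP DATABASE MyBigDb TO DISK = N'E:\Backups\MyBigDb.bak';  -- hypothetical path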

  • sqlGDBA (12/5/2012)


    Doesn't the fact that you have 80% fragmentation mean that you're already using a significant amount of space over and above what is actually needed? A fill factor of 80 packs pages only 80% full, so it would only increase your footprint by about 25% (100/80 = 1.25), and I am not sure how that would cause the drive to fill up. I understand the log filling up, but switching to a simple recovery model may be another option to try during the index rebuilds.

    I understand and appreciate the point of seeks vs. scans, but the massive amount of space savings (in some environments) may itself be enough to justify the defrag. I know that SAN space is pretty expensive, and once you do the math and see the cost you are incurring with the bloated table (at least about 400 GB, perhaps?), you can make that call. I know that in our shop, another 400 GB can be quite a bit of $$ in savings 🙂

    I never understood fragmentation as a cost in space so much as just bad organization. The trouble is that when I run my indexing script that shows which index on which table is at what % and what page count, this table lists the same index name 6 times with different percentages. That's a new strange issue I'll track down (though see the sketch below). And on a 1.2 TB drive, a table at 1.17 TB does mean that 25% more space is an issue currently. Oh, the webs we weave.

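    If the script is built on sys.dm_db_index_physical_stats, the repeated index name is probably expected rather than a bug: that DMV returns one row per index per partition (and per B-tree level in DETAILED mode), so a partitioned index shows up once per partition, each with its own fragmentation percentage. A sketch to confirm (the table name is a hypothetical placeholder):

        -- One row per index per partition; partition_number disambiguates the repeats.
        SELECT  i.name AS index_name,
                ps.partition_number,
                ps.avg_fragmentation_in_percent,
                ps.page_count
        FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.MyBigTable'), NULL, NULL, 'LIMITED') AS ps
        JOIN    sys.indexes AS i
                ON i.object_id = ps.object_id AND i.index_id = ps.index_id
        ORDER BY i.name, ps.partition_number;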
