Rapid Fragmentation on identity column

  • I've got a table with bigint IDENTITY clustered with >100,000 deletes and inserts a day. Each row is ~100 bytes. Once the table is filled with an entry per user every operation is a delete of the user's old data (10 rows at a time) followed by an insert of their new data (10 rows at a time)

    It's quickly fragmenting under a stress test reaching 30% fragmentation in an hour and a half after 300,000 rows are replaced.

    The fallback would be to do an index rebuild online for this table daily during off hours but I fear under peak load we may not make it that long.

    I've tricked knocked fill factor / padding to 60% and it didn't appear to have much effect.

    I wasn't expecting this to be such an issue for a fairly simple schema and index. I'm not using a random GUID as my clustered or anything known bad. I think the ratio of probably 0.75 deletes to 1 insert is killing it.

    Any ideas?

  • Using the identity as a clustered index isn't going to help you with your heavy deletes. in fact it seems to guarantee that the space used previously will NOT be reused since all of the inserts will be at the "end" of the space used, not into the space you just freed up by deleting.

    If your goal is to increase the reuse of the space, then I'd consider using the UserID as your clustered index' leading edge. This would have 2 consequences I can think of:

    - it will INCREASE the likelihood of splits during the creation of a new user since this is more of a "random insert" scenario. Still - with your fill factor, it should not be too awful since you can 3 new user entries' worth of available space into your data page.

    - that said, once you have most of the users inserted - it should DECREASE the splits, since it is likely to reuse the exact page you're deleting from.

    Just be sure to consider all consequences before switching CI's.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Are you sure this will really be a problem? Have you noticed any performance problems? How many rows will the table contain?

    Is the fragmentation really on the clustered index? It seems more likely that the fragmentation would be on the non-clustered indexes, like on the user id.

    Deleting rows will cause some fragmentation, but just running an occasional reorganize of the indexes should reclaim the space.

    You should consider setting the fill factor on the clustered index as high as possible, since you will not be inserting rows in the middle, just deleting them.

  • The UserID clustered index is a good thought and we actually tried that last night. Though logically it should have helped the fragmentation still grew substantially by the end of the stress test. There's two orders of magnitude less users than set of data for them, but still plenty (we have a few million users active).

    Yes, I'm sure it's an issue because at higher load rates the SP to delete/insert starts hanging. I suspect the massive fragmentation hits a tipping point causing a plan change to a table scan.

    The current idea I'm looking at is doing a FULL OUTER JOIN to merge new and old records based on just their index and re-use the row already there with the IDENTITY and just update the rest of its data (pretty much everything else on the row). Only where a set has more or less records for the user would there be any deletes/inserts (and based on the business rules that should nearly never happen). An update is still a delete/insert in theory but I'm hoping it will be in the same place and at least it shouldn't fragment...

    I'll update when we re-run tests next week.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply