archiving off records in table with clustered (identity) primary key

  • I need to archive records that are "completed" and older than 32 days

    as a rule the order of records being archived will roughly match the order of the clustered index

    but not exactly

    is it worth taking the extra steps to only archive a sequential block of records from the beginning of the index

    or convert from clustered to non-clustered so it doesn't matter

    or let the engine fix up the stragglers in the clustered index?

    0123456789

    ^^^ ^ delete initial sequence like these

    0123456789

    ^ ^ ^ ^ not like this

    0123456789

    ^ ^ instead delete only records that are sequential (and wait for other records to meet archiving filter)

    I think it is always better to delete rows from the beginning or end of a clustered index

    what do you think?

  • I seriously doubt you need to be concerned about this, certainly not to the point of changing your index.

    a clustered index only guarantees the logical order of data, not physical order (except immediately after create or rebuild).

    As you are only deleting you would only cause internal fragmentation (gaps within pages) rather than external fragmentation (pages out of order)

    You should batch up the deletes if there is a large amount of data to delete to control log growth

    ---------------------------------------------------------------------

  • I am positive that a clustered index defines the physical order of rows

    from BOL:

    Creates an index in which the logical order of the key values determines the physical order of the corresponding rows in a table

    from WIKI:

    Clustering alters the data block into a certain distinct order to match the index, resulting in the row data being stored in order ...

    ... since the physical records are in this sort order on disk, etc.

    I thought if you had clustered index with fillfactor = 0

    and inserted a record "half way" it would painfully reorganise the index to keep the physical order ...

    Just hoped you could remove rows from the beginning without penalty

    Or am I missing something?

  • The books are misleading when they say 'physical' order. this is only true when index is first built.

    You say yourself you thought all the pages would be 'painfully' reorganised if you inserted at some midpoint. You are correct, if it was done this way each insert would be horrendously costly, so it does not make sense to do it this way. What actually happens is a pointer is inserted to where the insert was actually stored. Which gives you fragmentation. The logical order is maintained but the data is no longer in physical order within the database..

    even an insert at the end might not be the next available extent if that was already in use by another object, it could be anywhere in the database.

    Don't believe me (even though I am right :-)), google it. Start with this QOTD from oct 5th

    http://www.sqlservercentral.com/Questions

    for a free point the answer is 'always logically and sometimes physically'

    ---------------------------------------------------------------------

  • Oh - and never believe WIKI

    ---------------------------------------------------------------------

  • ok

    I understand now

    any pain would be felt by a rebuild

    but as my archiving is mostly in sequence anyway - even rebuilds won't be badly affected

    The question that remains is whether archiving records in perfect sequence will still result in a heavy rebuild

    (ie: engine might physically move all records "to the left" some distance)

    I can determine that by testing some rebuilds and watching io stats

    Thanks for your help

  • DataDog (10/25/2010)


    The question that remains is whether archiving records in perfect equence will still result in a heavy rebuild

    (ie: engine might physically move all records "to the left" some distance)

    no. you will just have gaps. The space will be reclaimed when the index is rebuilt. If the pages become completely empty I believe they will be available for reuse by the database, but there will be no shuffling of data.

    ---------------------------------------------------------------------

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply