Merge/Split/Switch partition's affect on indexes?

  • Hi All,

    Currently I have a partitioned table that is partitioned by date.  On this table, I have 3 indexes that are not partitioned, each in their own filegroup.

    I'm now wanting to archive earlier portions of this partitioned data into another table of similar structure; when executing a Merge/Split or a Switch style archive method - how would this affect the indexes that are not partitioned along with the main data?

    Thanks

    Anubis

  • in order to do the switch you need to drop the non partitioned indexes and recreate them afterwards

    see https://docs.microsoft.com/en-us/sql/t-sql/statements/alter-table-transact-sql?view=sql-server-2017 and go to the switch partition description

  • Hi Frederico,

    Thanks for the prompt reply - that's unfortunate as it would create a huge overhead for this table - does the same apply if the indexes exist in the same partition scheme as opposed to being isolated?

    Thanks

     

  • as per the article if all indexes are on the same PS it will work.

    But you need to consider is that some indexes that are non partitioned may have been created because of the possible performance issues that partition tables can have. So while changing them to be partitioned will help moving data out it can cause some queries to perform worst.

  • SystemlordAnubis wrote:

    Hi Frederico, Thanks for the prompt reply - that's unfortunate as it would create a huge overhead for this table - does the same apply if the indexes exist in the same partition scheme as opposed to being isolated? Thanks  

    In most cases no (one exception being columnstore indexes in SQL Server 2012). Not only can you retain your partition-aligned indexes, you can apply index maintenance at the partition level too.

     

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Just curious...

    1. How many rows in this table?
    2. How many GB does the Clustered Index occupy?
    3. How many partitions are there in the table?
    4. Is each partition in a separate file in a separate file group (1 file per partition, 1 file group per file)?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi All,

    Thanks for the replies.

    I have considered aligning the indexes with the partition scheme but as you mentioned Frederico, this requires tuning of the queries using the indexes to make sure they're not impacting performance - currently investigating this now.

    The table in question:

    1. Currently 500,000,000+, growing by 5-7million per day minimum
    2. Data space occupied currently is around 40GB, indexes (3 off) also consuming around 40GB
    3. There is presently around 90-100 partitions (one-per-day).
    4. Yes, one partition per filegroup, one file per filegroup.

    The intention with this table is for the data to remain in the "live" table for around 180 days, the 3 indexes provide quick access to needed data - thereafter, the older data is to be archived - moved as quickly/efficiently as possible from the live table, into an archive table that exists on slower/larger storage media.

    Thanks

     

  • What version of SQL Server are you using?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • SystemlordAnubis wrote:

    Hi All, Thanks for the replies. I have considered aligning the indexes with the partition scheme but as you mentioned Frederico, this requires tuning of the queries using the indexes to make sure they're not impacting performance - currently investigating this now. The table in question:

    1. Currently 500,000,000+, growing by 5-7million per day minimum
    2. Data space occupied currently is around 40GB, indexes (3 off) also consuming around 40GB
    3. There is presently around 90-100 partitions (one-per-day).
    4. Yes, one partition per filegroup, one file per filegroup.

    The intention with this table is for the data to remain in the "live" table for around 180 days, the 3 indexes provide quick access to needed data - thereafter, the older data is to be archived - moved as quickly/efficiently as possible from the live table, into an archive table that exists on slower/larger storage media. Thanks  

    From the looks of it, the only reason to partition this table is to take advantage of the SWITCH ability to quickly release older data from the table so that it can be archived.

    Despite the number of rows involved, this isn't a big table nor is it very wide.  That being said, I think that having 1 partition per day is serious overkill.  If it were me (I've got stuff like this and my recommendation comes from use), I'd only use monthly partitions.  The reduction in partitions might help performance a fair bit for queries that span more than 1 day because of the much more effective B-TREE structures that fewer partitions will provide.

    If the 40GB for indexes includes the Clustered Index (which is where the data lives), then your non-clustered indexes (NCI) are extremely small compared to the data.  Your non-aligned NCIs should rebuild in a flash and not actually be a performance issue if you want to drop them for the SWITCH out and then rebuild them.  It should be especially non-obnoxious if you only have to do a SWITCH out once per month instead of once per day.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Chris, The version is 2017 enterprise.

     

    Jeff, Thanks again for the detailed response - and it's good to know the info comes from real-world background.

    You're correct, the only reason for daily is to reduce the time for switching; however I have two types of queries that primarily hit this data - one is seeking the last day's worth of data, the other query is the last month's worth of data.  All queries know the date bounds that they're seeking in addition to the other data columns or filtering results.

    The 40GB of index data is in addition to the 40GB of clustered index data (just coincidence they happen to be similar in size).  The table contains a bigint identity (as unique record reference) and a datetime2(7) column which forms the primary key.  Partitioning occurs on the datetime2(7) column.

    The challenges I'm encountering with this environment is that records are bulk-logged (20-50K rows total across different transactions) into the table every few minutes or so; I'm also seeing high index fragmentation which I'm managing on a nightly basis to keep it in check.

    After data hits the table, I need to perform some basic processing on the data and a comparison to the previously logged record specific to that rows' source.  I'm currently achieving this with bulk update scripts as agent jobs in batches.

    Certainly keen to understand new/better ways at handling this type of situation.

    Thanks

  • If you can batch up those inserts so that they are taking place under your control every 24 hours rather than at random every few minutes or so, your whole playing field will change.

     

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hi Chris,

    Unfortunately not; it's coming from many field devices and needs to be accessible as soon as it's generated.

    Thanks.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply