Automate Sliding Window Partition Management: Part I

  • bpportman 52825 (12/14/2010)


    Is there any performance gain when partitions are on the same set of spindles (can be same file group or different)?

    There's probably something to be gained from parallel query processing. I seem to recall that 2008 resolves some issues related to parallel query processing and partitions that were problematic in 2005.

    However, performance for us was not the driving factor for partitioning the tables. I have had some very painful experiences dealing with poorly thought out purge strategies that kill performance, explode transaction logs and otherwise make a general mess of things. My primary goal was to create a fast, extensible process to purge lots of data very quickly each month with minimal impact to other processes.

    Regards,

    Hugh Scott

  • Hugh Scott (12/14/2010)


    amenjonathan (12/14/2010)


    Thanks merlin for your reply and Hugh for this article. Looking forward to the next two installments.

    Anybody have any idea what happens when you do not add the partitioned column to an index? Does it still create an aligned index (by secretly adding it anyway)?

    Nope. The index gets created on the default file group. Not that I would know from experience!:-D

    Regards,

    Hugh

    From vague memory and recollections of BoL, the partitioned column is added (as an included column) to a non-clustered index only if there is no clustered index... although please don't quote me on this!

    --Chris Hamam

    Life's a beach, then you DIE (Do It Eternally)

  • Thanks Hugh for the article which came on the right time for me.

    I am about start working on a partitioning a "continuous loading" event table. The requiremnt is to have 12 months of data online and the data aged "gracefully".

    I will have 12 data partitions + the lower partition. Each partiton will have its own file group, to ensure the most important aspect of recoverability. In case of DR, I can restore the filegroup which contains the current partion and the system is ready to go. It is about 10GB data in each partition, so no need to wait for 120 GB data to be restored to have the system available. IMHO partition/filegroup alignment is very important from point of system availability.

    Sliding window will be implemented to switch in and switch out partitions.

    Few thoughts/questions below.

    1. Rather than sliding the window that moves forward in time by creating new partitions, can we use a "rotating window" which uses partitions in a ring structure? In this scenario we have a partition scheme with fixed names for each month. The script identities the new month and decides which partition will be used within the "ring". Then switches out data using a temp table on that filegroup to archive, truncate the table and switch back in for the new month. The advantage is that the partition number could be easily identified with current month. Any thoughts on this ?

    2. I would love to to keep current quarter data on a filegroup on faster (RAID 10) disk (which handles the loading and most 80 percent of user queries) and previous quarters on slower disks until it get archived to very very slow disk. This means I have to switch out data between quarters which involves data movement. This would have ensured that the data is "aged gracefully". Any thoughts on this ?

    Regards

    Jimmy

  • Chris Hamam (12/15/2010)


    Hugh Scott (12/14/2010)


    amenjonathan (12/14/2010)


    Thanks merlin for your reply and Hugh for this article. Looking forward to the next two installments.

    Anybody have any idea what happens when you do not add the partitioned column to an index? Does it still create an aligned index (by secretly adding it anyway)?

    Nope. The index gets created on the default file group. Not that I would know from experience!:-D

    Regards,

    Hugh

    From vague memory and recollections of BoL, the partitioned column is added (as an included column) to a non-clustered index only if there is no clustered index... although please don't quote me on this!

    I think this is impossible actually. You must have a clustered index on the partitioned column in order to create a partitioned table. But in the same thread, every non-clustered index contains the clustered index as a reference, unless I'm remembering incorrectly. A non-clustered index on a heap table...not sure what the heck is going on in that scenario. haha!

    -------------------------------------------------------------------------------------------------
    My SQL Server Blog

  • 1. Rather than sliding the window that moves forward in time by creating new partitions, can we use a "rotating window" which uses partitions in a ring structure? In this scenario we have a partition scheme with fixed names for each month. The script identities the new month and decides which partition will be used within the "ring". Then switches out data using a temp table on that filegroup to archive, truncate the table and switch back in for the new month. The advantage is that the partition number could be easily identified with current month. Any thoughts on this ?

    the problem is your oldest partition is most likely be partition no 2. you will need swap in the latest data into whole new partition because you need to insert a new boundary into the partition scheme.

    ALTER PARTITION FUNCTION ' @partitionscheme' () SPLIT RANGE

    To remove the boundary of the oldest partition, you issue the MERGE command.

    2. I would love to to keep current quarter data on a filegroup on faster (RAID 10) disk (which handles the loading and most 80 percent of user queries) and previous quarters on slower disks until it get archived to very very slow disk. This means I have to switch out data between quarters which involves data movement. This would have ensured that the data is "aged gracefully". Any thoughts on this ?

    This will not work because in order for you to switch in and out of partitions, you need to be in the same filegroup.

  • I think there may be a bug in the SQL Server views sys.data_spaces and sys.destination_data_spaces when you run the switch out/merge statements in certain circumstances.

    http://www.sqlservercentral.com/Forums/Topic1548437-2799-1.aspx

    The problem is documented in the forum thread above with a reproduction, I'm not 100% sure but it doesn't seem to be working correctly.

  • Yikes! This was a looooong time ago.

    For what it's worth, we used this process in more or less the format presented (in this article and the next two articles) for four years with no troubles. The application (and associated data mart) were retired last year when we converted over everything to a new system. Unfortunately, no one scoped out reporting or BI for the new system, so we're back to running direct queries on the operational database.

    The more things change, the more they stay the same.

    Regards,

    Hugh

  • The first paragraph seems to imply that partitioning will improve the performance of deletes (while purging data). How does partitioning help with that?



    Del Lee

  • Del Lee (3/7/2014)


    The first paragraph seems to imply that partitioning will improve the performance of deletes (while purging data). How does partitioning help with that?

    Instead of deleting the data by individual rows or by chunks of rows (dml), I can remove an entire month of data using a series of non-logged ddl operations. There's an overview of partitioning (SQL 2008 R2) here: http://technet.microsoft.com/en-us/library/ms190787(v=sql.105).aspx.

    Regards,

    Hugh Scott

  • I've read the link you gave before and am generally familiar with partitioning. It doesn't really indicate go into how you can purge records using non logged ddl operations. I don't really see a partition option in the TRUNCATE TABLE command, and I would assume the DROP PARTITION FUNCTION command doesn't affect the data itself (just removes the partitioning). So, I'm still not following you on how you are purging the records. Can you be a little more specific about the series of non-logged ddl operations?



    Del Lee

  • amenjonathan (12/16/2010)


    Chris Hamam (12/15/2010)


    Hugh Scott (12/14/2010)


    amenjonathan (12/14/2010)


    Anybody have any idea what happens when you do not add the partitioned column to an index? Does it still create an aligned index (by secretly adding it anyway)?

    Nope. The index gets created on the default file group. Not that I would know from experience!:-D

    Regards,

    Hugh

    From vague memory and recollections of BoL, the partitioned column is added (as an included column) to a non-clustered index only if there is no clustered index... although please don't quote me on this!

    I think this is impossible actually. You must have a clustered index on the partitioned column in order to create a partitioned table. But in the same thread, every non-clustered index contains the clustered index as a reference, unless I'm remembering incorrectly. A non-clustered index on a heap table...not sure what the heck is going on in that scenario. haha!

    FYI, you can partition a table that is stored as a heap (i.e. no clustered index). Partitioning a table in place which contains existing data, however, is much easier done by creating a partitioned clustered index. Rebuilding the clustered index on a new partition scheme (or no partition scheme) with DROP_EXISTING allows the partition scheme to be changed in only the time it takes to build a clustered index on the table.

    The partition column must be part of the clustered index definition. It is optional for non-clustered indexes that use a partition scheme, but is added automatically as an included column, I believe.

    Best,

    --Mark

  • -- Create Sample Partitioned Table

    CREATE TABLE OrderEvents (

    OrderCloseDate datetime not null,

    OrderNum int not null,

    [Status] char(2) null,

    StatusDate datetime not null)

    ON ps_FACT_DATA_DATE (MyDate)

    GO

    Msg 1911, Level 16, State 1, Line 2

    Column name 'MyDate' does not exist in the target table or view.

    Should the MyDate in the partitionscheme call be OrderCloseDate? or is the partitiondb.sql script including missing something else? I get an error running this portion of the script and figured it was because this column isn't part of the table so it can't partition on it.

    Thanks for looking,

    -Ryan

Viewing 12 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply