Production Database - Apply table partitioning while the database remains accessible

  • Hi 
    I have a production database on my VMM. I have to partition a huge table (billions of rows) horizontally by year onto different filegroups.
    The database also needs to remain available. How can I achieve this? Can I apply the partition function and scheme to the table while the
    database/table is still accessible? (If I apply the partitioning directly on the table, the script will take too long to finish and rebuild the indexes.)

    Please give me some direction on how to achieve this.

    thanks

    VD

  • I'm not going to ask whether that is the right choice or why you are doing it - others will most likely ask that.

    Assuming it is indeed the right choice:

    - Does the older data have to be available, or can it be unavailable for a period of time?
    - If it does have to be available all the time, do you have, or can you temporarily provision, enough storage to hold another full copy of that table?
    - Are the existing records subject to updates, or only inserts/reads?
    - Is there a field that records when the record was added (and/or modified)?

    The answers to the above will affect what you do and how you do it.

    But assuming that you need all the data to be available and that you can get the extra storage, the following is a possible approach.

    Create the required partition function and partition scheme.

    Create a table per partition (table P999 - where 999 is the partition number).
    Copy the data from the main table into its respective partition table.
    Create all required indexes on each of these tables.
    Create a check constraint on each of these tables to match the partition boundaries (see the sketch at the end of this post).

    Create a further table (Table X) without any partitions or indexes (it may need indexes, but I'm not sure at this point).

    Once the above is done there will need to be a short period of time where no activity happens on this table, to allow for the next part of the process.

    Once ready, perform the final steps as follows:

    - copy any new records onto their corresponding table P999 - if there are only inserts this is easy.
    - switch the main table to Table X
    - drop all indexes on the main table and recreate them partition aligned
    - switch each partition table P999 onto its respective partition of the main table

    If no updates have happened, the above is all you need.
    If some records may have been updated, then update your main table from Table X.

    Finally drop Table X, as it is no longer required, as well as all the P999 tables, which should now be empty.

    Doing the switch takes just a few milliseconds as it is a metadata-only change.
    Preparing the P999 tables is what will take time - but if the older data does not change, this can be done well in advance.

    Impact on users should be minimal if the table has no updates - if it does, the impact can be minimized, but it will depend on what the table is used for.
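
    As a rough illustration of the check constraint step above (just a sketch - it assumes a staging table for year 2013 and a datetime column named createdon; adapt the names to your own table):

    -- hypothetical staging table for the 2013 partition; the trusted check constraint
    -- mirrors the partition boundaries so the later SWITCH does not need to scan the rows
    create table dbo.P2013_example
    (id int not null
    ,createdon datetime not null
    );

    alter table dbo.P2013_example with check
    add constraint ck_P2013_example_createdon
    check (createdon >= '2013-01-01' and createdon < '2014-01-01');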

  • Yes, we have sufficient space to hold a full copy of the table.
    Yes, the existing records may be updated.
    I have a few queries about the possible approach:
    "Create a table per partition (table P999 - where 999 is the partition number)" - do you mean to create empty partition tables here?
    "Copy the data from the main table into its respective partition table" - can you please give an example? This might take too long to copy the data from source to destination.
    "Create all required indexes on each of these tables" - do you mean on the partitions or on the fresh tables?

    Though I now see how this lets you move data with partition switch-out and switch-in.

    Thanks
    VD

  • Before you implement this, make sure you understand all the implications of using partitioning.

    It is not easy to implement, and it can lead to performance problems.

    The sample below uses a simple implementation. Should your table have a primary key, that is an issue you will need to address yourself, based on what data you have, how the primary key is defined, and how it is used.
    Eventually you will need to end up with a primary key that is not partitioned, and that leads to other problems.

    Regarding implementing the partition function/scheme, this link http://www.dbdelta.com/table-partitioning-best-practices/ shows some examples, including using different filegroups per partition.
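
    A minimal sketch of the filegroup-per-year variant (using the PF_Date function defined in the sample code below, and assuming filegroups FG2013/FG2014/FG2015 already exist in the database - the database name MyDb is only a placeholder):

    -- filegroups (and a file in each) must exist before the scheme can reference them, e.g.:
    -- alter database MyDb add filegroup FG2013;
    -- alter database MyDb add file (name = 'FG2013_data', filename = 'D:\data\FG2013_data.ndf') to filegroup FG2013;

    -- map each yearly partition to its own filegroup; the two leftmost partitions
    -- (below null and pre-2013) stay on PRIMARY in this sketch
    create partition scheme PS_Date as
    partition PF_Date to ([primary], [primary], [FG2013], [FG2014], [FG2015]);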

    Sample code

    Prepare a sample table and populate it with 3 years' worth of data

    -- base table - this is the pre-existing table with data
    if object_id('dbo.bigtable') is not null
      drop table dbo.bigtable;

    create table dbo.bigtable
    (id int identity(1,1) not null
    ,createdon datetime
    ,field1 int
    ,field2 int
    ,modifiedon datetime
    );

    create unique clustered index pk_bigtable on dbo.bigtable
    (id
    ,createdon
    );

    create nonclustered index modifiedon on dbo.bigtable
    (modifiedon
    ) include (id, createdon);

    -- populate data
    insert into bigtable
    (createdon, field1, field2, modifiedon)
    select t.createdon
    ,1
    ,2
    ,dateadd(day, 3, t.createdon)
    from (
    select dateadd(day, row_number() over (order by num) - 1, '2013-01-01') as createdon
    from (select 1 as num union all select 1 as num union all select 1 as num union all select 1 as num union all select 1 as num) t
    group by cube (num, num, num, num,num, num, num, num)
    union all
    select dateadd(day, row_number() over (order by num) - 1, '2014-01-01') as createdon
    from (select 1 as num union all select 1 as num union all select 1 as num union all select 1 as num union all select 1 as num) t
    group by cube (num, num, num, num,num, num, num, num)
    union all
    select dateadd(day, row_number() over (order by num) - 1, '2015-01-01') as createdon
    from (select 1 as num union all select 1 as num union all select 1 as num union all select 1 as num union all select 1 as num) t
    group by cube (num, num, num, num,num, num, num, num)
    ) t


    Now do the main work related to the partitioned table
    if object_id('dbo.bigtable_partition_3') is not null
      drop table dbo.bigtable_partition_3;
    if object_id('dbo.bigtable_partition_4') is not null
      drop table dbo.bigtable_partition_4;
    if object_id('dbo.bigtable_partition_5') is not null
      drop table dbo.bigtable_partition_5;

    if exists (select 1 from sys.partition_schemes ps where ps.name = 'PS_Date')
      drop partition scheme PS_Date

    if exists (select 1 from sys.partition_functions pf where pf.name = 'PF_Date')
      drop partition function PF_Date

    -- chose range right for no particular reason - although it is recommended by some people; this needs to be decided in advance
    create partition function PF_Date (datetime) as
    range right for values (
    null
    , N'2013-01-01 00:00:00.000'
    , N'2014-01-01 00:00:00.000'
    , N'2015-01-01 00:00:00.000'
    );
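
    -- with range right and the boundaries above the partitions end up as:
    --   1 = below null (unused), 2 = pre-2013 (including null), 3 = 2013, 4 = 2014, 5 = 2015 onwards
    -- which is why the staging tables below are named _partition_3/_4/_5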

    -- note that although I placed all partitions on the same filegroup, they could be - and in many cases it is desirable to have them - on independent filegroups
    create partition scheme PS_Date as
    partition PF_Date all to ([primary])
    ;

    -- partition tables are created on the partition scheme on purpose
    -- By doing this we accomplish the following
    -- 1 - no need to create constraints on each table to match the destination partition
    -- 2 - if independent filegroups are used, then when we load the data onto the tables that data will go onto the final desired filegroup
    --  and when we do the partition switch there will be no data movement
    -- 3 - while preparing the scripts we know in advance which tables are going to contain data for a particular partition
    --  this allows us to name the tables according to the partition that will contain the data

    -- all partition tables should have exactly the indexes that are going to be on the final main table
    -- although when loading the data onto these it is advisable not to have them, and to create them only after the data is loaded

    -- partition table 3
    if object_id('dbo.bigtable_partition_3') is not null
      drop table dbo.bigtable_partition_3;

    create table dbo.bigtable_partition_3
    (id int not null
    , createdon datetime
    , field1 int
    , field2 int
    , modifiedon datetime
    )
    on ps_date (createdon);

    create unique clustered index pk_bigtable_partition_3 on dbo.bigtable_partition_3
    (id
    ,createdon
    )
    on ps_date (createdon);

    create nonclustered index modifiedon on dbo.bigtable_partition_3
    (modifiedon
    ) include (id, createdon);

    -- partition table 4
    if object_id('dbo.bigtable_partition_4') is not null
      drop table dbo.bigtable_partition_4;

    create table dbo.bigtable_partition_4
    (id int not null
    , createdon datetime
    , field1 int
    , field2 int
    , modifiedon datetime
    )
    on ps_date (createdon);

    create unique clustered index pk_bigtable_partition_4 on dbo.bigtable_partition_4
    (id
    ,createdon
    )
    on ps_date (createdon);

    create nonclustered index modifiedon on dbo.bigtable_partition_4
    (modifiedon
    ) include (id, createdon)
    on ps_date (createdon);

    -- partition table 5
    if object_id('dbo.bigtable_partition_5') is not null
      drop table dbo.bigtable_partition_5;

    create table dbo.bigtable_partition_5
    (id int not null
    , createdon datetime
    , field1 int
    , field2 int
    , modifiedon datetime
    )on ps_date (createdon);

    create unique clustered index pk_bigtable_partition_5 on dbo.bigtable_partition_5
    (id
    ,createdon
    )
    on ps_date (createdon);

    create nonclustered index modifiedon on dbo.bigtable_partition_5
    (modifiedon
    ) include (id, createdon)
    on ps_date (createdon);

    -- insert data from main table onto individual partition tables
    -- this is better done using SSIS or a small C# application to select from main table
    -- and bulkload onto each partition table
    insert into bigtable_partition_3 select * from bigtable b where year(b.createdon) = 2013
    insert into bigtable_partition_4 select * from bigtable b where year(b.createdon) = 2014
    insert into bigtable_partition_5 select * from bigtable b where year(b.createdon) = 2015
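
    -- the straight inserts above each run as one huge transaction; a rough T-SQL
    -- alternative to SSIS (just a sketch - it assumes createdon is never null on the
    -- rows being copied, and replaces the 2013 insert above, so do not run both) is to
    -- load each staging table in smaller date-range batches, e.g. month by month,
    -- to keep individual transactions small
    declare @from datetime = '2013-01-01';
    declare @to   datetime;

    while @from < '2014-01-01'
    begin
      set @to = dateadd(month, 1, @from);

      insert into bigtable_partition_3
      (id, createdon, field1, field2, modifiedon)
      select b.id, b.createdon, b.field1, b.field2, b.modifiedon
      from bigtable b
      where b.createdon >= @from
        and b.createdon <  @to;

      set @from = @to;
    end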

    -- report the number of records on each partition table
    select 'bigtable_partition_3' as table_name, $partition.PF_Date(bp.createdon) as partition_number, count(*) from bigtable_partition_3 bp group by $partition.PF_Date(bp.createdon)
    select 'bigtable_partition_4' as table_name, $partition.PF_Date(bp.createdon) as partition_number, count(*) from bigtable_partition_4 bp group by $partition.PF_Date(bp.createdon)
    select 'bigtable_partition_5' as table_name, $partition.PF_Date(bp.createdon) as partition_number, count(*) from bigtable_partition_5 bp group by $partition.PF_Date(bp.createdon)

    -- copy of base table - will be used to hold the current data when we are ready to perform the final process
    if object_id('dbo.bigtable_old') is not null
      drop table dbo.bigtable_old;

    create table dbo.bigtable_old
    (id int identity(1,1) not null
    ,createdon datetime
    ,field1 int
    ,field2 int
    ,modifiedon datetime
    );

    -- create the same indexes as the main table so that the switch will work
    create unique clustered index pk_bigtable_old on dbo.bigtable_old
    (id
    ,createdon
    );

    create nonclustered index modifiedon on dbo.bigtable_old
    (modifiedon
    ) include (id, createdon);

    -- move data out of main table onto a staging table - will be used to perform the updates after we make the main table partitioned
    alter table dbo.bigtable switch to bigtable_old;

    -- replace existing indexes with partition aligned ones
    create unique clustered index pk_bigtable on dbo.bigtable
    (id
    ,createdon
    )
    with (drop_existing = on)
    on ps_date (createdon);

    create nonclustered index modifiedon on dbo.bigtable
    (modifiedon
    ) include (id, createdon)
    with (drop_existing = on)
    on ps_date (createdon);

    -- now switch partition tables onto main table

    alter table dbo.bigtable_partition_3 switch partition 3 to bigtable partition 3
    alter table dbo.bigtable_partition_4 switch partition 4 to bigtable partition 4
    alter table dbo.bigtable_partition_5 switch partition 5 to bigtable partition 5

    -- report final count on main table
    select 'bigtable' as table_name, $partition.PF_Date(bp.createdon) as partition_number, count(*) from bigtable bp group by $partition.PF_Date(bp.createdon)

    -- do any updates required (rows modified while the staging tables were being prepared)
    /*
    update bt
    set bt.field1     = bt_old.field1
       ,bt.field2     = bt_old.field2
       ,bt.modifiedon = bt_old.modifiedon
    from bigtable bt
    inner join bigtable_old bt_old
    on bt_old.id = bt.id
    and bt_old.modifiedon <> bt.modifiedon
    and bt_old.modifiedon > '2017-01-01' -- should be the highest modifiedon date already copied onto bigtable
    */

    -- and finally drop all tables that are no longer required

    if object_id('dbo.bigtable_partition_3') is not null
      drop table dbo.bigtable_partition_3;
    if object_id('dbo.bigtable_partition_4') is not null
      drop table dbo.bigtable_partition_4;
    if object_id('dbo.bigtable_partition_5') is not null
      drop table dbo.bigtable_partition_5;
    if object_id('dbo.bigtable_old') is not null
      drop table dbo.bigtable_old;

  • Hey Frederico,

    Many thanks for guiding me on this. I am now running a trial on one of the servers, but I have an issue: the table is very big (billions of rows),
    and inserting the data into the individual partition tables for each year is taking too long. One idea I had: can I create a columnstore index on the datetime column of the existing table? Would that make the retrieval faster? Or is there any other way to achieve this?

    Regards!
    VD

  • How are you copying from the big table to the partition tables?

    As I said in my comment ("this is better done using SSIS or a small C# application to select from the main table") - if that is not how you are doing it, then that is your issue.

  • Initially I thought to perform a direct INSERT but yes, I got the point: pull the data using SSIS. That way I can run multiple jobs at a time to boost the insert.
    I am stuck at one place, Frederico :unsure: my source table contains a RowVersion (timestamp) column, and it is not letting me perform the insert into the destination table,
    as the timestamp column is a row identifier and unique to the system. I don't know how to address this. (If I put a default on it, that will change the metadata.) I'm not sure the customer will agree to this. It might be used as an FK somewhere in the system.

    VD

  • rowversion is indeed unique and cannot be copied like that - talk with the customer and see what the implications of changing it are.
    For that table to be copied you will need to omit it in the select, or in the column mapping if you are using SSIS.

    Do ensure that you are using the fast load option if using SSIS (or a bulk copy object if you later decide to use C#).
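
    For example, staying with the sample tables from earlier (which do not actually have a rowversion column - assume a hypothetical one named rv on both source and destination), the copy would simply list every column except it; the destination then generates new rowversion values, which is exactly the change to discuss with the customer:

    -- every column except the hypothetical rowversion column rv
    insert into bigtable_partition_3
    (id, createdon, field1, field2, modifiedon)
    select b.id, b.createdon, b.field1, b.field2, b.modifiedon
    from bigtable b
    where year(b.createdon) = 2013;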
