SSIS: Best way to Archive production DB

  • Hi SSIS Experts,

I am new to SSIS. I have been given a task to archive data from a production DB to an archive DB and then delete it from production, keeping 13 months of data in production. There are around 300+ tables; of these I have to archive around 50, and 6 of those 50 are around 1 TB each in size.

Listing out the 2 methods we are planning:

    1. Using 50 data flow tasks in a sequence container.

2. Using INSERT INTO ... SELECT * FROM ..., where the table names and column names can be stored in a configuration table, and looping through it to archive the data.

    Which will be the better option?

If there is any other, better method, please let me know.

What precautions (performance tips) do I have to take while doing the archive process so that it does not affect the production server?

Please give your suggestions.

    Thanks

  • Just for illustration purposes, let's assume you have a transactional table called Sales. Create another table called SalesHistory.

use StoreOps;

create table Sales
(
    SaleID int not null identity(1,1) primary key
    , SaleDate datetime default getdate()
);

create index ix_Sales_SaleDate on Sales ( SaleDate );

use StoreOpsArchive;

create table SalesHistory
(
    SaleID int not null primary key
    , SaleDate datetime
);

    You can use either of the following statements to simultaneously delete from one table while inserting the same deleted rows into another table having the same columns.

delete from Sales
output deleted.SaleID, deleted.SaleDate
into StoreOpsArchive.dbo.SalesHistory ( SaleID, SaleDate )
where SaleDate < dateadd( month, -13, getdate() );

If the archive table has exactly the same column layout, then the following is even simpler, because it doesn't reference specific column names.

delete from Sales
output deleted.*
into StoreOpsArchive.dbo.SalesHistory
where SaleDate < dateadd( month, -13, getdate() );

    The initial purge / archive operation will likely take considerable time and transaction log space, because you may be deleting millions of accumulated rows. However, if you run subsequent operations daily, then it will be more manageable.
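To keep that initial purge manageable, one common approach is to delete in batches so that each implicit transaction stays small and the log can be backed up or truncated between batches. A minimal sketch, assuming the Sales / SalesHistory tables defined above (the batch size of 50,000 is just an illustrative starting point to tune):

```sql
-- Batched archive sketch: each iteration moves at most @batchSize rows,
-- so no single transaction holds locks or log space for long.
declare @batchSize int = 50000;
declare @rows int = 1;

while @rows > 0
begin
    delete top ( @batchSize ) from Sales
    output deleted.SaleID, deleted.SaleDate
    into StoreOpsArchive.dbo.SalesHistory ( SaleID, SaleDate )
    where SaleDate < dateadd( month, -13, getdate() );

    set @rows = @@rowcount;

    -- optional: brief pause between batches to reduce blocking on production
    -- waitfor delay '00:00:01';
end
```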

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

So this would basically be an SSIS workflow consisting of Execute SQL (T-SQL) tasks, one for each table you want to archive.


Thanks Eric for the suggestion; it is better than the INSERT INTO option. To implement this, shall I go with the 2nd method, where I use a configuration table to store the table names and column names (for the archive condition, preferably a date field), and then use a while loop to process each table?

Can you give suggestions on how to implement transactions or logging, so that if anything goes wrong it will be properly logged?

The DBs are not on the same server; they are on different servers, and there is no linked server.

If I use this DELETE with the OUTPUT clause, it will not work even if I provide the server.DB.schema.table name.

So I guess I have to go with an INSERT INTO ... SELECT * statement.

Any suggestions?

SQL006 (6/4/2016)


Thanks Eric for the suggestion; it is better than the INSERT INTO option. To implement this, shall I go with the 2nd method, where I use a configuration table to store the table names and column names (for the archive condition, preferably a date field), and then use a while loop to process each table?

Can you give suggestions on how to implement transactions or logging, so that if anything goes wrong it will be properly logged?

A single statement is contained within an implicit transaction. So, in the following example, if an error occurs midway, both the delete from Sales and the insert into SalesHistory will be rolled back.

delete from Sales
output deleted.SaleID, deleted.SaleDate
into StoreOpsArchive.dbo.SalesHistory ( SaleID, SaleDate )
where SaleDate < dateadd( month, -13, getdate() );
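If you also want explicit error logging on top of that implicit transaction, one sketch is to wrap the statement in TRY/CATCH and record the outcome. Note that dbo.ArchiveLog here is a hypothetical logging table, not part of the schema discussed above:

```sql
-- Sketch only: dbo.ArchiveLog is a hypothetical table with columns
-- ( TableName, RowsArchived, ErrorMessage, RunDate ).
begin try
    declare @rows int;

    delete from Sales
    output deleted.SaleID, deleted.SaleDate
    into StoreOpsArchive.dbo.SalesHistory ( SaleID, SaleDate )
    where SaleDate < dateadd( month, -13, getdate() );

    set @rows = @@rowcount;  -- capture before the next statement resets it

    insert into dbo.ArchiveLog ( TableName, RowsArchived, ErrorMessage, RunDate )
    values ( 'Sales', @rows, null, getdate() );
end try
begin catch
    -- the single delete ... output statement is atomic, so its work has
    -- already been rolled back by the time the catch block runs
    insert into dbo.ArchiveLog ( TableName, RowsArchived, ErrorMessage, RunDate )
    values ( 'Sales', 0, error_message(), getdate() );
end catch
```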


If it is the same server, it is good to use T-SQL as Eric suggested. If it is across servers, you can use a data flow to leverage the SSIS pipeline for performance. Yes, you might end up creating a data flow task for each table.

For transactions across servers, you may need to enable DTC.

https://msdn.microsoft.com/en-us/library/cc304421.aspx

This is ideally a background job, and this would be a rare case where I would prefer row-by-row logic to minimize blocking.

It is good to design a configurable batch size and filter criteria for the archival.
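One way to make the batch size and filter configurable is a driver table, one row per table to archive. This is a sketch only; the dbo.ArchiveConfig name and its columns are hypothetical, not something from the thread so far:

```sql
-- Hypothetical configuration table: each row names a table to archive,
-- its date column, and tunable batch size / retention settings.
create table dbo.ArchiveConfig
(
    TableName sysname not null
    , DateColumn sysname not null
    , BatchSize int not null default 50000
    , RetentionMonths int not null default 13
);

insert into dbo.ArchiveConfig ( TableName, DateColumn )
values ( 'dbo.Sales', 'SaleDate' );
```

An SSIS loop (or a T-SQL cursor over this table) can then build the archive statement per table using these values.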

----------------------------------------------------------------------------------------------------------------------------------------------------
Roshan Joe

Jeff Moden - Forum Etiquette: How to post data/code on a forum to get the best help

Thanks for the reply.

I have started creating the package for this archival process, but it is getting this error:

variable "User::Tablename" does not contain a valid data object

Following are the steps I did to create the package:

    1. Config table created to store the table name

CREATE TABLE [dbo].[Config](
    [ID] [int] NULL,
    [Tablename] [varchar](50) NULL,
    [Columns] [varchar](50) NULL
) ON [PRIMARY]
GO

INSERT INTO [dbo].[Config]
SELECT 1, 'dbo.Emp', 'Date'
UNION ALL
SELECT 2, 'dbo.address', 'Date'

2. Created variable Tablename, data type = Object, scope = package

3. Created variable SingleTablename, data type = String, scope = package, value = dbo.Emp

4. Created Execute SQL Task

SQL Statement = SELECT Tablename FROM dbo.Config

ResultSet = Single row

In the Result Set tab:

Variable Name = User::Tablename, Result Name = 0

5. Added a FOREACH LOOP container

Enumerator = Foreach ADO Enumerator

ADO object source variable = User::Tablename

Enumeration mode = Rows in the first table

Variable mappings:

User::SingleTablename, Index = 0

6. Added a Data Flow Task inside the Foreach Loop container

Dragged in an OLE DB Source and provided the connection string

Data access mode = Table name or view name variable

Variable name = User::SingleTablename

Dragged in an OLE DB Destination and provided the connection string

Data access mode = Table name or view name variable - fast load

Variable name = User::SingleTablename

The mapping is correct.

    Package saved and executed

The error coming is: variable "User::Tablename" does not contain a valid data object.

While debugging, the data is coming into the variable User::Tablename.

I checked twice; the variable is declared only once.

Any suggestions will be helpful.

    Thanks
