ETL Archive Issue, advice sought, please.

  • Hello,

    I have a situation that I'd imagine a few people have been in. I need to archive a few years from a database, but the issue is that the whole database will be copied to another machine, then the original database will have a few years removed to make it smaller. I have been asked to make sure that the few years remaining in the database will need to be synched with the same years in the archived database. But this can only happen once a month.

    I am trying to come up with some way of updating only the relevant years in the archived database once a month. I first thought transaction replication might work, but there is no way the main database will retain all those logs for a month.

    I cant use snapshot replication as it will be too large and I guess it would overwrite the archived years in addition to the couple of years I only want to affect, creating an identical database.

    I cant use CDC or change tracking because not all the tables have primary keys, and I am not allowed to create any.

    So I am currently looking at creating a script to bulk copy only the data I want to go across to the archived database, but I have only seen examples that concentrate on one table, not the whole database which consists of approx. 100 tables. So I am not sure how to tackle that.

    Would I be able to use the Transfer SQL server objects task for this and copy data only?

    I'm quite new to SSIS, so I am not yet familiar with what the appropriate tool might be.

    Could I separate a portion of the database after a certain date and put it in a read only file group, and somehow backup the primary file group of the smaller live database and overwrite is once a month? I don't think that is possible.

    I'd appreciate any advice. And thank you for taking the time to read my post.

    Regards,

    D.

  • Do you have a field in each table that you can confidently rely upon to provide a monthly cut off, Or a field which provides a FK reference to the primary table

    E.g. Order would have an order_Date and and OrderID, OrderLine would have an OrderID

    If so then I have some scripts that will generate dynamic SQL to copy the master table and child tables to the archive database one month at a time. You add all the tables you want to copy to a temp table and it then CURSORS through the TT records generating an executable script. The source and target database need to have the same table structure and they need to be on the same server, or at least a linked server. As part of the TT criteria, you can specify how many months you want to keep in the application database

    The cutoff for each master table is based on a defined field name containing a date and the cutoff for each child table is based on a field name containing the FK. It currently only supports single field FKs so you would need to modify it you had any compound keys

    The copy-delete of each table is fully supported by transaction scoping so if there is an issue with the delete, both the delete and copy will roll back and fill in a log table with the results of the transfer.

  • So what's supposed to happen if someone adds a column to a table or an index?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Aaron,

    Thanks for replying...

    Do you have a field in each table that you can confidently rely upon to provide a monthly cut off, Or a field which provides a FK reference to the primary table

    ...unfortunately not.

    Jeff,

    That Q for me or Aaron?

    Regards,

    D.

  • @OP

    so if you don't know what dates records were created/last updated, how would you choose records to be archived?

    @JM

    If someone changes the table structure, you are ~£%$!"$* 🙂 but then you would be if you were using SSIS packages to archive the data.

    You would need to restructure the tables in the archive database to match the current schema.

  • Duran (4/22/2015)


    Aaron,

    Thanks for replying...

    Do you have a field in each table that you can confidently rely upon to provide a monthly cut off, Or a field which provides a FK reference to the primary table

    ...unfortunately not.

    Jeff,

    That Q for me or Aaron?

    Regards,

    D.

    You.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • aaron.reese (4/22/2015)


    @OP

    so if you don't know what dates records were created/last updated, how would you choose records to be archived?

    @JM

    If someone changes the table structure, you are ~£%$!"$* 🙂 but then you would be if you were using SSIS packages to archive the data.

    You would need to restructure the tables in the archive database to match the current schema.

    That's exactly what I was getting at. IIRC, replication now handles such thing "auto-magically" but I'm personally loath to setup replication for anything.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hello,

    Sorry Aaron, I misread the question, I have had a look and some tables to not have PK or FK, most tables have a createdate column, but some tables have no date columns at all. Is this a badly designed DB, well yes, but unfortunately we cannot change the design because it was created by the application DEV team of the third party, so we therefore are not allowed to change the schema. Which I guess answers your question Jeff, nobody will be adding or taking anything away.

    Regards,

    D.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply