Data Archival: 1 table (Database A) to 1 table (Database B)

  • I'm currently in the process of building an archival  & reporting solution and just want to run through a bunch of scenarios and determine what would be best in my situation. I can easily achieve this multiple ways but I don't know if this is the "right way" or the "best" method

    My objective is to move Previous Months data from 1 table in Database A to  1 table in Database B. This table will be identical, with the exception of maybe something like an additional column like "Load Date" and Reference to the Original table

    My current transactional table will only store "Current Months" data.

    CREATE TABLE [dbo].[tbl_audit_log](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [LoggedInUsername] [varchar](100) NOT NULL,
    [AuditAction] [varchar](255) NOT NULL,
    [StartDateTime] [datetime] NOT NULL,
    [EndDateTime] [datetime] NULL,
    CONSTRAINT [PK_tbl_audit_log] PRIMARY KEY CLUSTERED
    (
    [ID] ASC
    )WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100) ON [PRIMARY]
    ) ON [PRIMARY]
    GO

    CREATE TABLE [dbo].[tbl_audit_log_archive](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [AuditLogID] [int] NOT NULL,
    [LoggedInUsername] [varchar](100) NOT NULL,
    [AuditAction] [varchar](255) NOT NULL,
    [StartDateTime] [datetime] NOT NULL,
    [EndDateTime] [datetime] NULL,
    [LoadDateTime] [datetime] NOT NULL,
    CONSTRAINT [PK_tbl_audit_log_archive] PRIMARY KEY CLUSTERED
    (
    [ID] ASC
    )WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100) ON [PRIMARY]
    ) ON [PRIMARY]
    GO

    ALTER TABLE [dbo].[tbl_audit_log_archive] ADD CONSTRAINT [DF_tbl_audit_log_archive_CreatedDateTime] DEFAULT (getdate()) FOR [CreatedDateTime]
    GO

     

    Some of the possibilities I can think of are

    1. I can build a SQL job / SSIS Job (Monthly), that will move batches of data to the new table and then delete from the original table which is a slow process but will achieve the result.

    2. I can build a dynamic script that will "select into"  the results to a new table and then bulk insert that into the archive table, followed by cleaning up original table

    3. I can create a Monthly partition (which can be automated)

    I have the luxury of maintenance times / outages to setup/process and fix if it fails. Also, if it doesn't run as expected for 1 month its not end of the world.

    Any feedback on best method, to follow through.

     

     

     

     

  • Cluster both tables by ( StartDateTime, ID ) rather than just ID.  Then, since you'll be moving data in cluster key order, you can just use a standard copy-then-delete approach.

    Btw, if possible, you should encode the AuditAction to make it a smallint/int rather than a long text column.  That will be more normalized, more accurate if a description needs to change, and save a lot of disk space too.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Hey Scott,

    Thanks for that, I was planning to just create a NC IDX on StartDate time, and then write a CTE to return the IDs  for say TOP X records Where StartDateTime < @datetime, Once I had those IDs, I was going to copy to target and delete from Source.

    Which I guess is a similar approach but you just had to CI  on both

    So you think just Inserting the Data into target by batches then Deleting the data by that batch in the source is the way to go.

    You think over the partition option?

    Re: Audit Action, I totally agree, however this is an old notepad logging process, which is now coming to the DB to load - Unfortunately I can't win that battle to make it better ... out of my hands.

     

     

     

  • This was removed by the editor as SPAM

  • You don't really need to partition, assuming you re-cluster, although you could.  What will happen is that you will partition on date, so you will have to add date to all other indexes anyway to align them to the partitioning.

    Yes, it will be a lot of rework to partition the main table.  Instead, perhaps for now, you could just re-cluster the archive table.  The real problem is that by default people partition just by identity instead of picking the best clustered index for that table.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • The issue with partitioning is that the archive table must reside in the same filegroup as the main table - in the same database.  If the goal is to move the archive data to a separate database - then partitioning would not gain you anything.

    If the tables are in the same database or on the same instance you could use DELETE ... OUTPUT.  Instead of writing 2 separate statements it can be done in a single statement - deleting from the main outputting the deleted rows into the archive table.

    If the goal is to move the data off that instance, then SSIS would be an option.  If I were looking at doing this, I would setup an SSIS project to archive yesterday's data.  Get that running and then you don't have to worry about cleaning up - just set up a separate job that purges data from the table older than xx.  You could run the purge every day for any data older than 1 month from today and have a rolling month of data in the source - and all data up to midnight this morning in your archive.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Thanks both, Good news is the DB is on the same instance, so I can go down the route of DELETE OUTPUT etc....

    Thanks for clearing up the Partition, I didn't know it had to be on the same DB etc.

     

    I'll take both your suggestions and work on the solution - thanks for all your help.

  • It is better to not just designate an arbitrary integer as your clustering index. But instead consider how the table will join to other tables and be filtered.

    If you are moving loads of records in a busy environment be careful with causing blocking. Run in batches of less than 5000 at a time to control contention with other working processes.

    ----------------------------------------------------

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply