SQL Server 2016 Archive Strategy advice.

  • Greetings.

    I’m after some advice.

    I’m helping a colleague with a SQL Server 2016 database that is getting quite large and they are considering an archive strategy.

    The database essentially stores data from a number of sensors on machines that are building stuff. Readings are taken every 1, 5 and 10 seconds from a variety of sensors. As you can imagine the tables storing this data are growing quite quickly.

    The historical data needs to be maintained in case one of the devices demonstrates an anomaly in the field and an investigation needs to be undertaken to see if there were any issues within the manufacturing process that were not originally picked up. The consequences of a failure can be serious.

    In several years of operation no major failures have occurred, and this data has only been needed on a couple of occasions, so failures are considered a rare event.

    If and when this data is required it is not needed immediately, so it can be stored off site and offline, and loaded into a separate database if engineering staff ever need to review it.

     

    The thinking so far is:

    1. Create a new, identically structured database on a separate storage system within the data centre and copy the historical data across to this storage.

    2. Verify that the data copy is valid.

    3. Copy the newly created historical archive database to an off-site storage area.

    4. Delete the historical data from the production database.

    5. Re-run statistics if they haven’t already been triggered by the delete (a rough T-SQL sketch of these steps follows).
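
    As a very rough sketch of the steps above: the database names (Production, SensorArchive), the dbo.SensorReading table, the ReadingID key and the ReadingTime column are all placeholders for the real schema, and the 18-month cut-off is just an example.

        -- All object names below are placeholders for the real schema.
        DECLARE @Cutoff datetime2 = DATEADD(MONTH, -18, SYSDATETIME());

        -- 1. Copy historical rows into the identically structured archive database.
        --    (If the key is an IDENTITY column, SET IDENTITY_INSERT would be needed.)
        INSERT INTO SensorArchive.dbo.SensorReading WITH (TABLOCK)
        SELECT *
        FROM   Production.dbo.SensorReading
        WHERE  ReadingTime < @Cutoff;

        -- 2. Basic validation: source and archive row counts must match.
        SELECT
            (SELECT COUNT_BIG(*) FROM Production.dbo.SensorReading
             WHERE  ReadingTime < @Cutoff)                             AS SourceRows,
            (SELECT COUNT_BIG(*) FROM SensorArchive.dbo.SensorReading) AS ArchiveRows;

        -- 3. Back up the archive database for off-site storage (path is a placeholder).
        BACKUP DATABASE SensorArchive
        TO DISK = N'\\offsite-share\SensorArchive.bak'
        WITH CHECKSUM, COMPRESSION;

        -- 4. Delete the archived rows in batches to keep the transaction log
        --    and blocking under control.
        DECLARE @Rows int = 1;
        WHILE @Rows > 0
        BEGIN
            DELETE TOP (50000)
            FROM   Production.dbo.SensorReading
            WHERE  ReadingTime < @Cutoff;

            SET @Rows = @@ROWCOUNT;
        END;

        -- 5. Refresh statistics on the trimmed table after the large delete.
        UPDATE STATISTICS Production.dbo.SensorReading;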

     

    Is this a valid archive strategy?

    Is there a better option?

    Are there any gotchas with this option?

    Thanks in anticipation.

  • This sounds like a reasonable approach to me. After the archive you might want to shrink the databases a little. I'd also suggest setting up an automated process to continue archiving the data: maybe keep only a few months immediately online and archive everything older, constantly rolling it out to the off-site storage.
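
    If you do reclaim the freed space afterwards, a minimal sketch (the logical file name and the target size in MB are placeholders; shrinking fragments indexes, so follow it with index maintenance):

        -- Check current size and free space per file in the production database.
        SELECT name,
               size / 128                                     AS SizeMB,
               (size - FILEPROPERTY(name, 'SpaceUsed')) / 128 AS FreeMB
        FROM   sys.database_files;

        -- Shrink the data file to a target size in MB (name and target are placeholders).
        DBCC SHRINKFILE (N'ProductionData', 102400);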

    I'm not a huge fan, but you could consider Stretch Database. It's a mechanism that moves rows to Azure based on your definition and does it all automatically behind the scenes. Setup is silly easy. However, it's fairly sensitive and pretty costly. It's at least worth an evaluation in this situation.
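
    For what it's worth, enabling Stretch on a table looks roughly like this; the server name, credential and table below are placeholders, and the Azure-side setup (master key, credential, pricing tier) is a separate exercise:

        -- Allow the instance to use Stretch Database.
        EXEC sys.sp_configure N'remote data archive', 1;
        RECONFIGURE;

        -- Link the database to an Azure server (names are placeholders).
        ALTER DATABASE Production
            SET REMOTE_DATA_ARCHIVE = ON
                (SERVER = N'myserver.database.windows.net',
                 CREDENTIAL = [myserver_credential]);

        -- Migrate eligible rows of the sensor table to Azure in the background.
        -- A FILTER_PREDICATE can be added to control which rows are eligible.
        ALTER TABLE dbo.SensorReading
            SET (REMOTE_DATA_ARCHIVE = ON (MIGRATION_STATE = OUTBOUND));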

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks for your vote of approval Grant.

    The database has been alive for about 7 years now and they would like to keep about 18 months online.

    We'll proceed with a few tests in a test environment and see how it goes.

    Cheers

     

  • I had to do something similar where database size was growing out of control and simple maintenance tasks were taking forever. The sooner you have this implemented, the better. In my case the business needed only 36 months of data; I added a little buffer and basically moved anything older than 38 months based on certain criteria. A few things I learnt during my journey:

    i) Try to use SSIS; it makes a big difference in performance. My dream was to use SSIS with CozyRoc plugins for dynamic column mapping; unfortunately that never happened, so I wrote a few stored procedures to make this happen.

    ii) Your process should be rock solid and able to handle a failure, meaning that if you have to kill your process it should do a clean rollback; you definitely do not want duplicate data. This is one place where I spent a lot of time testing different scenarios to make absolutely sure there was no duplicate data (see the sketch after this list).

    iii) Try to come up with an approach where you maybe just keep a few key unique columns in a separate table, and join back to that table when archiving.

    iv) Have some sort of alerting to check the status and notify you in case there is duplicate or missing data.
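
    To illustrate points ii) and iv), here is a rough sketch of an archive step that can be killed and re-run safely. The Production and SensorArchive database names, the dbo.SensorReading table and the ReadingID key are hypothetical stand-ins for the real schema, not what I actually ran:

        DECLARE @Cutoff datetime2 = DATEADD(MONTH, -38, SYSDATETIME());

        BEGIN TRY
            BEGIN TRANSACTION;

            -- Copy only rows that are not already in the archive, so a re-run
            -- after a failure cannot create duplicates.
            INSERT INTO SensorArchive.dbo.SensorReading
            SELECT s.*
            FROM   Production.dbo.SensorReading AS s
            WHERE  s.ReadingTime < @Cutoff
              AND  NOT EXISTS (SELECT 1
                               FROM   SensorArchive.dbo.SensorReading AS a
                               WHERE  a.ReadingID = s.ReadingID);

            -- Delete only rows that are confirmed to exist in the archive.
            DELETE s
            FROM   Production.dbo.SensorReading AS s
            WHERE  s.ReadingTime < @Cutoff
              AND  EXISTS (SELECT 1
                           FROM   SensorArchive.dbo.SensorReading AS a
                           WHERE  a.ReadingID = s.ReadingID);

            COMMIT TRANSACTION;
        END TRY
        BEGIN CATCH
            -- Any failure rolls the whole batch back, leaving both sides consistent.
            IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
            THROW;
        END CATCH;

        -- iv) Simple alert check: flag any duplicates that made it into the archive.
        SELECT ReadingID, COUNT(*) AS Copies
        FROM   SensorArchive.dbo.SensorReading
        GROUP BY ReadingID
        HAVING COUNT(*) > 1;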

  • If duplicate rows of data are not a problem (for a stream of measurement data, duplicates probably aren't a worry) then a simple copy of everything should work. Verifying all rows are copied, with zero duplicates, for a continuous stream of time-based data, is not easy.

    We have implemented a solution for a similar problem using Partitioned Tables, which solve a lot of problems around data consistency and integrity and also allow large quantities of data to be swapped out very quickly. Older tables are set to read-only, and backups can be easily taken. Very old tables can be deleted after the backups are verified.

    The solution is relatively complex, but might be worth considering for your environment.
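
    A minimal sketch of the partition-switching part of that approach, assuming monthly partitions on a ReadingTime column and an empty, identically structured archive table on the same filegroup (all names and boundary dates are placeholders):

        -- Monthly partitions on the reading timestamp.
        CREATE PARTITION FUNCTION pfReadingMonth (datetime2)
        AS RANGE RIGHT FOR VALUES ('2024-01-01', '2024-02-01', '2024-03-01');

        CREATE PARTITION SCHEME psReadingMonth
        AS PARTITION pfReadingMonth ALL TO ([PRIMARY]);

        -- The sensor table is created on the scheme, e.g.
        -- CREATE TABLE dbo.SensorReading (...) ON psReadingMonth (ReadingTime);

        -- Switching the oldest partition out is a metadata-only operation, so the
        -- data in that partition moves to the archive table almost instantly.
        ALTER TABLE dbo.SensorReading
            SWITCH PARTITION 1 TO dbo.SensorReading_Archive;

        -- Check which partition a given date falls into.
        SELECT $PARTITION.pfReadingMonth('2024-01-15') AS PartitionNumber;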
