Keep Data Forever

  • Comments posted to this topic are about the item Keep Data Forever

  • The problem of reading (very) old data from a backup is twofold: a) is the data physically readable (the bits), and b) can those bits be translated into meaningful data? For (b) there is a chain of fallbacks: is the file format still supported by current software; if not, is the original software still around and executable on current operating systems; if not, can the OS needed to run that software be installed on current hardware; and if not, can the old hardware with the old OS be connected to your current network?

    So, for archiving data you need to be very conservative in both the physical and the "soft" format of the backup. The physical problem is the easiest to solve (copy the backups to newer media every few years). A similar technique could be applied to the "soft" problem at the same time: when copying to newer media you could convert the files into a more modern file format.

    So it is not going to be a set-and-forget solution (putting the backup media into a file cabinet in the cellar), because before you know it the data can no longer be retrieved in a meaningful form.

    The end result for me is that my portable USB drive contains Excel files originally created with Lotus 1-2-3 in DOS on my XT (remember that one :crazy:?). It raises the question of why I keep them, though: I haven't touched them since I converted them into Excel about 7 years ago... But I keep faithfully copying them over whenever I move to the next backup solution...

  • Interesting that this editorial comes up now, as my current project is to design a purge of older data. I'm not archiving any of the data, but the needed data is archived through documents that have been generated and are stored electronically in the CMS. Now, how long they'll keep the documents is not my worry. :-P

  • Keeping data forever?

    There are a few challenges:

    Physical locations fail.

    Media fails catastrophically (crashes). Try to keep it on two _different_ forms of media, at least one of which should not be vulnerable to shock.

    Media fails slowly (bit rot). Try to use something that is resistant to this, something that can detect this, and something that can recover afterward.

    Checksums and hashes (preferably SHA-512 or another current-generation 512-bit hash) can detect this, even for larger files such as databases, but they don't resist it and can't help you recover.
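    A minimal sketch of the detection step, assuming the archive is a folder of files and that a digest manifest is recorded when the archive is made. The function names (`sha512_of`, `build_manifest`, `find_damaged`) are made up for illustration. It only detects damage; it cannot repair anything.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Return the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its digest."""
    root = Path(folder)
    return {
        str(p.relative_to(root)): sha512_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_damaged(folder, manifest):
    """Return the files that are missing or whose digest has changed."""
    root = Path(folder)
    damaged = []
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file() or sha512_of(target) != expected:
            damaged.append(name)
    return damaged
```

    The manifest itself has to be stored safely (and ideally in more than one place), since a rotted manifest is as unhelpful as a rotted backup.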

    RAID1 and RAID10 are not necessarily a good choice: does your implementation correctly figure out which of two supposedly good copies that are somehow different is correct, assuming neither drive is reporting a bad block?

    RAID5 and RAID6 fit all three criteria if and only if consistency checks are performed; without this, it's entirely possible for bit rot to be hidden until one (RAID5) or two (RAID6) drives fail, after which they cannot recover (and may not be able to detect).
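    To illustrate why the consistency check matters, here is a toy model of RAID5-style single parity using XOR over three tiny data blocks. It is a sketch under simplifying assumptions (no striping, no rotation of parity, made-up function names), not how any particular controller is implemented.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Parity block for a stripe: the XOR of all its data blocks."""
    return xor_blocks(data_blocks)


def rebuild(surviving_blocks, parity):
    """Recover the single missing data block from the survivors and parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


def scrub_ok(data_blocks, parity):
    """Consistency check: does the stored parity still match the data?"""
    return xor_blocks(data_blocks) == parity


blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(blocks)

# Normal case: the drive holding block 1 dies and is rebuilt correctly.
assert rebuild([blocks[0], blocks[2]], parity) == b"BBBB"

# Silent bit rot on block 0, with no drive reporting an error.
rotted = b"AAAE"
assert not scrub_ok([rotted, blocks[1], blocks[2]], parity)  # a scrub notices

# Without a scrub, a later loss of block 1 is rebuilt from the rotted data,
# and the "recovered" block is silently wrong.
assert rebuild([rotted, blocks[2]], parity) != b"BBBB"
```

    The last assertion is the failure mode described above: the rot stays hidden until a drive is lost, and by then the rebuild has nothing trustworthy to work from.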

    Reed-Solomon and other ECC software solutions (DVDisaster for CD, DVD, and Blu-Ray, Par2+TBB for files in general, etc.) can both detect and recover from bit rot or partial physical media destruction.

    Means to read your media no longer exist. So, you've got some reel-to-reel tape, likely in EBCDIC, in the closet. You're going to read this how? Or a tape cartridge from the late 80s? 8" floppies? A stack of punch cards? Even if it is good (not terribly likely except for the punch cards, which are probably fine), do you have anything left that can read it? Do you have something to plug it into? Can that something talk to anything else anymore, and if so, how close to modern tech can you get with each jump?

    You need to keep moving the archived data onto newer media, running some kind of bit rot check every time to make sure each copy really is accurate.
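    A rough sketch of such a migrate-and-check step, assuming single files and a SHA-512 digest recorded when the archive was first made. `migrate_and_verify` and its parameters are hypothetical names for illustration, not part of any tool mentioned here.

```python
import hashlib
import shutil
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Return the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_and_verify(source, destination, expected_digest=None):
    """Copy a file to new media and confirm the copy matches the source.

    If expected_digest (recorded when the archive was made) is given, the
    source is checked against it first, so existing rot is not carried
    forward onto the new media.
    """
    source_digest = sha512_of(source)
    if expected_digest is not None and source_digest != expected_digest:
        raise ValueError(f"{source} no longer matches its recorded digest")
    Path(destination).parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)
    if sha512_of(destination) != source_digest:
        raise IOError(f"copy of {source} to {destination} failed verification")
    return source_digest
```

    One caveat: a read-back immediately after writing may be served from the operating system's cache rather than from the new media, so re-running the check after remounting the media is the safer test.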

    Catalogs are forgotten. A much harder problem; after twenty years, if there's no one left that remembers either what file X is, or how to answer question Y... what good are your forever archives, particularly if you have eight versions of almost the same thing?

  • I tend to keep old database backups on hard drives, USB keys, CDs and/or DVDs. I recently had a client ask me about something we designed eight years ago. I was able to restore the backup and answer the question.

    I had another client years ago who knew that I have a tendency to stash db backups all over the place. They lost a machine and had not backed up their database for 1.5 years. They found an old backup and were at least able to get something restored. If it were not for the old backup they would have had nothing.

    The point is to keep as much as you can as long as you can in as many places as you can. At the end of the day we are all DBAs and one of our core responsibilities is to protect the data. You never know when something is better than nothing.

  • The Sumerians have already successfully addressed all these issues. You want to store your data in cuneiform on low-fired clay tablets.

    We have, and can read, Sumerian accounting data that is over 5,000 years old. Both the storage and coding solutions they used have proven track records of durability and usability.

    🙂

  • The first question really is "should you be keeping the data?" Wikipedia has a nice overview of that issue at http://en.wikipedia.org/wiki/Records_management

    Too few companies pay attention to records retention standards even for paper documents. Digital data is just a different delivery system in many respects, so I believe the existing rules would apply.

    Once you've determined what data to keep, the technology issues of how to maintain it come into play. The VERS standard from Australia may come in handy: http://210.8.122.120/vers/vers/default.asp


    Here there be dragons...,

    Steph Brown

  • You want to store your data in cuneiform on low-fired clay tablets.

    Cuneiform - I had to snicker at this. Yet essentially correct. Who knows if we'll be using the same language in 1000 years, much less whether a glyph will still be represented by eight (or sixteen - we can't even standardize on that) 1's and 0's that far ahead? As others have said, the problem won't be the method of storing the data; it'll be reading and interpreting it back out again. Anyone want to try reverse-engineering table data out of an MDF file bit dump? :w00t:

  • This sounds like an annual project: verify long-term backups are on supported media, and in a usable format.


    Peter Maloof
    Serving Data

  • tslaxar (8/25/2011)


    You want to store your data in cuneiform on low-fired clay tablets.

    Can someone please tell me the collation setting for cuneiform? I can't seem to find it.


    Peter Maloof
    Serving Data
