Spring Cleaning

  • TravisDBA (4/23/2010)


    Absolutely yes! We just completed a production archival of over 150 million records we no longer needed on our daily production boxes. It's not really complex to do, but you must have a clearly thought-out plan going into it and have all your ducks in a row before starting. As a result, database size, index size, log shipping, replication, etc. have all benefited from doing this. We do it March-April of every year. Highly recommended if you want to keep your db servers humming! 😀

    Exactly! Disk space is cheap, but when large DBs and indexes start chewing away at performance, then there's a problem. So any new project we start either gets partitioning implemented from the start or, if we have a retention policy of, say, 6 months, each night when our DW imports run, data from the 181st day and older gets deleted.
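    The nightly retention delete described above can be sketched roughly as follows. This is only an illustration, not the poster's actual job: it uses Python's built-in sqlite3 module in place of SQL Server, and the table name `fact_sales`, the column `load_date`, and the reading of "6 months" as 180 days are all assumptions made for the example.

```python
import sqlite3
from datetime import date, timedelta

RETENTION_DAYS = 180  # assumption: "6 months" treated as 180 days


def purge_expired(conn, today, retention_days=RETENTION_DAYS):
    """Delete rows more than retention_days old; return how many were removed."""
    # ISO-8601 date strings sort chronologically, so a text comparison is enough.
    cutoff = (today - timedelta(days=retention_days)).isoformat()
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM fact_sales WHERE load_date < ?", (cutoff,)
        )
    return cur.rowcount


def build_demo(today):
    """Create a hypothetical in-memory table with rows of assorted ages."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact_sales (id INTEGER PRIMARY KEY, load_date TEXT NOT NULL)"
    )
    ages_in_days = [0, 1, 179, 180, 181, 400]
    conn.executemany(
        "INSERT INTO fact_sales (load_date) VALUES (?)",
        [((today - timedelta(days=a)).isoformat(),) for a in ages_in_days],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    today = date(2010, 4, 23)
    conn = build_demo(today)
    print("rows purged:", purge_expired(conn, today))
```

    On a large production table a single big delete can be expensive, so a real job would probably delete in smaller batches or switch out a partition instead; the sketch only shows the cutoff logic.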

    But some of our projects didn't start that way. I think business analysts can be blamed for some of this pack rat mentality. Making us keep 3- or 4-year-old raw data on the < 1% chance that someone will ever ask for it is a little ridiculous.

    Ken

  • Because we deal with health care-related information, we never delete anything. My biggest DB is now about 20GB, growing 100MB per day.

    There is no "cleaning" option for us.

    So long, and thanks for all the fish,

    Russell Shilling, MCDBA, MCSA 2K3, MCSE 2K3

  • My background is in Accounting and Business Administration - an often overlooked risk organizations need to manage is record retention...

    PLEASE understand the legal implications of having data you do not NEED to have.

    I have worked for some really great organizations over the years (and a few dogs). EVEN a great organization risks problems when it keeps information it is not required to maintain. When a subpoena is received for data that you have, it is no defense that you did not *HAVE* to have it. You must assemble and submit all the information requested. Plaintiffs have gone data-surfing to find an issue (even with only a faint hope that they will find something). Yes, you can fight a subpoena. Yes, you should keep quiet about how much data beyond what is required you have opted to keep. But it is SO much simpler to only have what you need.

    I have a client that suffered through more than a year of providing sales data related to a COMPETITORS' anti-trust defense *because* they had the information that might prove/disprove effects of price-fixing in years upon years of sales history. They got "reimbursed" for much of the cost but they were flabbergasted that so much of their data could be dragged out and the total cost/distraction/frustration of complying was hard to measure.

    My suggestion:

    1) Keep everything required by guidelines

    2) Keep what you are using (and what you reasonably will use)

    3) Get RID of the rest

    My favorite response to a subpoena (or any other request for out-dated information): "I'm sorry, we only keep X years of that data."

  • Matt Algate (4/28/2010)


    My background is in Accounting and Business Administration - an often overlooked risk organizations need to manage is record retention...

    PLEASE understand the legal implications of having data you do not NEED to have.

    I have worked for some really great organizations over the years (and a few dogs). EVEN a great organization risks problems when it keeps information it is not required to maintain. When a subpoena is received for data that you have, it is no defense that you did not *HAVE* to have it. You must assemble and submit all the information requested. Plaintiffs have gone data-surfing to find an issue (even with only a faint hope that they will find something). Yes, you can fight a subpoena. Yes, you should keep quiet about how much data beyond what is required you have opted to keep. But it is SO much simpler to only have what you need.

    I have a client that suffered through more than a year of providing sales data related to a COMPETITORS' anti-trust defense *because* they had the information that might prove/disprove effects of price-fixing in years upon years of sales history. They got "reimbursed" for much of the cost but they were flabbergasted that so much of their data could be dragged out and the total cost/distraction/frustration of complying was hard to measure.

    My suggestion:

    1) Keep everything required by guidelines

    2) Keep what you are using (and what you reasonably will use)

    3) Get RID of the rest

    My favorite response to a subpoena (or any other request for out-dated information): "I'm sorry, we only keep X years of that data."

    I understand, but please remember that "archiving" data is very different from "removing it completely". We "archive" data where I work, which means we can retrieve it from backups or tape quite easily if a future need arises, as you state, but it does not clutter up our current production system and eat into performance and precious disk space.

    In some cases, if an IT shop falls under Sarbanes-Oxley AUDITING standards (and many of them do today), you just cannot claim "We only keep x years of data", particularly if it is 3-4-5 years or less. We don't want to keep 3-4-5 years of data in our production system either, because 99% of our daily data requests/inquiries are for the current year or at most 1 year back. But, on the other hand, we can't just tell people we don't keep data beyond 1 year either; that won't float with the auditors.

    If you don't think this is the way it is nowadays, please see this link for the long-term data retention that Sarbanes-Oxley now requires by law. Depending on the industry, what were once five- or seven-year retention periods are now expanding to 20, 30, or even 70 years. Today, retention periods are determined almost exclusively by government regulations and not by business needs. "We don't keep data back that far" just does not wash anymore.

    😀

    "Technology is a weird thing. It brings you great gifts with one hand, and it stabs you in the back with the other. ...:-D"

  • I'm coming rather late to this discussion, I guess, but here goes.

    At Neos we set up our customer sites so that the main active database got a full backup daily (retained for at least two weeks) and a log backup every 15 minutes (also retained for at least two weeks), and that database was replicated (transactional replication) to a second server which was also backed up. Since the activity history grew by roughly the base size of that database every week (every two days on some sites, every three or four weeks on others, depending on how busy the site was), the performance cost of backups and the disc cost of storage space for them would have been prohibitive if we hadn't done regular archiving from the main active database to a history database (which operated on a very different backup regime, used simple recovery instead of full recovery, and was also replicated).

    We automated this as a trickle feed from the active database to the history archive, with a partial purge of the main database (keeping history records less than 3 weeks old in the active db, since these might be needed there) every time we secured a full backup of the history database (twice a month, retained for 3 months). History over five years old was scrapped completely three or four times a year, also automated. Because this was all automated, the people cost (once it was all working) was zero, which was extremely important as we were rather short of people. The disc storage requirement was also reduced greatly, and the load blips from backups were greatly reduced, which was important because our customers generally didn't want to pay for more discs.
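    A heavily simplified sketch of the archive-then-purge step is shown below. It is not the Neos implementation: it uses Python's built-in sqlite3 module, keeps the archive as a second table in one database instead of a separate history database, and does not model the "purge only after a full backup of the history database" condition. The table names `activity_history` and `history_archive`, the columns, and the dates are hypothetical.

```python
import sqlite3


def archive_and_purge(conn, cutoff):
    """Copy history rows older than cutoff to the archive, then purge them.

    Returns the number of rows removed from the active table.
    """
    with conn:  # one transaction: the copy and the delete succeed or fail together
        # Trickle feed: copy old rows, skipping any that were already archived.
        conn.execute(
            "INSERT OR IGNORE INTO history_archive (id, event_time, detail) "
            "SELECT id, event_time, detail FROM activity_history "
            "WHERE event_time < ?",
            (cutoff,),
        )
        # Partial purge: remove only rows confirmed to exist in the archive.
        cur = conn.execute(
            "DELETE FROM activity_history "
            "WHERE event_time < ? AND id IN (SELECT id FROM history_archive)",
            (cutoff,),
        )
    return cur.rowcount


def build_demo():
    """Create hypothetical active and archive tables with three sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE activity_history (
            id INTEGER PRIMARY KEY, event_time TEXT NOT NULL, detail TEXT);
        CREATE TABLE history_archive (
            id INTEGER PRIMARY KEY, event_time TEXT NOT NULL, detail TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO activity_history VALUES (?, ?, ?)",
        [
            (1, "2010-03-01", "old"),
            (2, "2010-03-20", "old"),
            (3, "2010-04-20", "recent"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo()
    # Keep roughly the last 3 weeks in the active table (cutoff is an assumption).
    print("rows moved:", archive_and_purge(conn, "2010-04-07"))
```

    Deleting only rows that are already present in the archive is one way to make the purge safe to rerun; in the setup described above, the equivalent safeguard was waiting for a secured full backup of the history database.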

    History was needed for management information, mostly analysing usage trends (of various features) and popularity/takeup (of various media: individual films, music albums, TV channels, web sites, ...), and customers wanted year-on-year comparisons covering several years, so it couldn't just be thrown away.

    Tom
