Auto-Deleting Data

  • It's amazing how often old emails, tweets, videos and pictures surface in the news. What seems like "useless data" today may not be so tomorrow. Deleting rather than archiving carries a risk of losing that value.

    Who's to say what throwaway comments made on this very forum will be newsworthy in a decade? I guess it depends on how famous any of us get!

  • Eric M Russell - Tuesday, May 1, 2018 7:45 AM

    It makes perfect sense to retain things like sales orders, financial transactions, court records, and health records in digital format indefinitely, because there may be an essential need for this historical information at a later point. But this type of normalized transactional data doesn't consume nearly as much storage as things like surveillance video, IoT telemetry, clickstream data, and binary objects.

    The fact that Susan deposited $100 to Tom's bank account back in 2004 may still be relevant today or even 10 years from now. The fact that Susan liked Tom's comment on Facebook yesterday has questionable value today and will be totally irrelevant 10 years from now.

    I'd disagree. There are legal statutes of limitations here, as well as EULA-like arbitration items. Once those pass, the data should be removed to reduce risk, liability, and cost. These aren't short limits, and respecting them makes some sense. If we need longer ones, codify those.
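
    A minimal sketch of what "codify those" could look like in practice. The categories and retention periods in RETENTION_YEARS are hypothetical placeholders, not legal guidance, and the function names are made up for illustration; real limits would come from counsel and the applicable statutes.

```python
from datetime import date

# Hypothetical retention periods in years, for illustration only.
RETENTION_YEARS = {"email": 7, "social_activity": 1}


def purge_date(created: date, category: str) -> date:
    """Return the first day on which a record of this category may be deleted."""
    years = RETENTION_YEARS[category]
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # Created on Feb 29 and the target year is not a leap year:
        # roll forward to Mar 1 so the record is never purged early.
        return date(created.year + years, 3, 1)


def is_purgeable(created: date, category: str, today: date) -> bool:
    """True once the retention period for the record has fully elapsed."""
    return today >= purge_date(created, category)
```

    Keeping the periods in one table means that a longer limit, once agreed, is a one-line change rather than a hunt through purge scripts.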

  • skeleton567 - Tuesday, May 1, 2018 7:54 AM

    OK, just for instance, a customer who bought $1000 worth of a product a decade ago probably bought a much larger volume than $1000 of the same product buys now, and that gets into the whole area of cost of storage, sales, transportation, and profitability, all adjusted for inflation.  You can tell I'm still a detail-oriented obsessive-compulsive, right?

    It would appear so.

  • lburleso - Tuesday, May 1, 2018 9:17 AM

    We are living in a time where we store information in unprecedented amounts, yet it will be a black hole to generations in the distant future.
    We have a serious misconception that data lives forever. It only lives as long as technologists maintain it.

    True, but plenty of analog data is lost over the years. Sometimes that's sad, sometimes it can be problematic, sometimes we can reconstruct it. However, life goes on.

    Some is worth saving and preserving, but not all.

  • dld - Tuesday, May 1, 2018 9:39 AM

    Small text emails are not really a problem in terms of space. The issue with email is when large attachments get replicated many times within a company, and then it does become a space issue. My company has not instituted any retention policies, but I can certainly see problems in the future. Of course, nobody has the time to actually decide what is important and what is not, so everything is saved in case it turns out to be important.

    The other issue is with data associated with an application. When the application has been retired or replaced, are we required to keep the software around that can make sense of the data? As an example, we replaced our ERP system 10 years ago. A few years after the migration we told accounting that the data would go away after a certain amount of time and that they should run reports and save them either on paper or as PDFs for audit purposes.

    I'm going to throw something more into this. Recently our CIO informed everyone that all emails mentioning a certain topic (that will remain unnamed) must be saved, going back to the early '90s. Apparently there is some sort of legal issue. I appreciate what my CIO is requesting we do and am willing to comply. However, with thousands of employees that's likely to represent a lot of data. My point is that sometimes it is necessary for a company to retain a lot of data for some regulatory or legal issue. This could represent a lot of storage.

    Kindest Regards, Rod

  • Of course, deleting data from the online operational database doesn't mean it isn't archived somewhere offline or accessible online from a data warehouse. These days, even the most ancient and obscure archived data need not be totally cold.
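
    As a rough sketch of that idea, rows can be copied to an archive table and removed from the operational table in a single transaction. This uses an in-memory SQLite database purely for illustration; the table names (orders, orders_archive), the columns, and the sample rows are all hypothetical, and a production system would more likely move data to a separate warehouse or offline store.

```python
import sqlite3


def archive_then_delete(conn: sqlite3.Connection, cutoff: str) -> int:
    """Copy rows older than cutoff into the archive, then delete them.

    Both statements run in one transaction, so a failure leaves the
    operational table untouched. Returns the number of rows deleted.
    """
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM orders WHERE order_date < ?", (cutoff,))
    return cur.rowcount


# Hypothetical schema and sample data; dates are ISO strings so they sort correctly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)")
conn.execute("CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2004-03-15", 100.0), (2, "2018-04-30", 250.0)],
)

moved = archive_then_delete(conn, "2010-01-01")
```

    After the call, the 2004 row lives only in orders_archive and the 2018 row stays in orders.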

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Rod at work - Wednesday, May 2, 2018 8:18 AM

    I'm going to throw something more into this. Recently our CIO informed everyone that all emails mentioning a certain topic (that will remain unnamed) must be saved, going back to the early '90s. Apparently there is some sort of legal issue. I appreciate what my CIO is requesting we do and am willing to comply. However, with thousands of employees that's likely to represent a lot of data. My point is that sometimes it is necessary for a company to retain a lot of data for some regulatory or legal issue. This could represent a lot of storage.

    The only thing I can think of here is that some litigation has been initiated. Once that happens, you typically can't delete anything related to it. That's a good reason to remove old emails as soon as the legally required retention period has passed. If you wait, and an issue arises in the meantime, you no longer can.
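
    A small sketch of how a purge job might respect a litigation hold. Everything here is an assumption made for illustration: the Email type, the keyword-based matching, and the function name are hypothetical, and a real hold would be scoped by legal counsel rather than by a subject-line search.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass
class Email:
    subject: str
    sent: date


def select_for_deletion(
    emails: Iterable[Email], cutoff: date, hold_keywords: Iterable[str]
) -> List[Email]:
    """Return emails sent before cutoff whose subject matches no hold keyword."""
    holds = [k.lower() for k in hold_keywords]
    return [
        e
        for e in emails
        if e.sent < cutoff and not any(k in e.subject.lower() for k in holds)
    ]
```

    With an empty hold list the function behaves like a plain age-based purge; once a hold is in place, matching emails are skipped no matter how old they are.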

  • I am late to the party, but I do have something to say on this. I simply hate the very idea of deleting information!
    I make a living off understanding the past and predicting the future, and I keep my work processes fit and lean by constantly reanalyzing what happened and what my response to it was.

    As a rule of thumb: I can never predict which piece of information I am going to need.
    And as a trained statistician, I know that what causes the most work in an analysis is dealing with holes in the data. They lead to bias and missing observations, and the complexity needed in the calculations to handle this puts so much pressure on the intellect and the machine power that the ratio of errors to good work goes through the roof and otherwise manageable tasks move way out of reach.
    So: Missing information when you need it = unemployment due to irrelevance.

    And then there is the whole "holding someone accountable for their actions" part: I do not side with the opinion that we should delete sensitive information because of the risk that legal proceedings may find it useful (against us). If managers in my company are committing a crime, I as a data professional should definitely not help them cover their tracks by applying deletion algorithms to emails, records, and files. And in some cases it would even be to their benefit that their own lawyer can identify the risks and issues, instead of first learning about them in the courtroom, because several people may have a copy of an email or a file. And everything can be secured. Even if an email is set to auto-destruct, it can be printed, there can be screenshots, there can be taps into the binary stream, there can be all sorts of reasons why something wasn't deleted after all.
