• roger.plowman - Tuesday, February 26, 2019 7:24 AM

    The first question that should be asked is, should you even have a data lake or data warehouse?

    Harking back to the whole security issue, a data lake is precisely the kind of holy grail hackers would be salivating over. Since you're dumping (mostly) raw data into it, what are the chances that it contains PII? Or even sensitive information that could embarrass or seriously threaten your company?

    Second, if you make the data immutable how do you update data that's erroneous? Or delete data in accordance with GDPR / some as yet unwritten law?

    I suspect the immutability question should come after asking whether you should even have the data lake or warehouse in the first place.

    Whether or not you have the data is a real issue in the legal arena, and it can help or hurt. It works both ways, and unfortunately the outcome will likely depend on the skill of your legal defense versus the opposition. I favor the historical method of recording a correction rather than modifying the original, though either way carries its own risks. At least preserving history and keeping a real record of corrections is an honest approach, and it would take suspicion of tampering off the table when other problems are being considered.
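
    A minimal sketch of the "correct rather than modify" idea, assuming a simple in-memory store. The names here (AppendOnlyLedger, Entry, record, correct) are hypothetical and only for illustration; a real data lake or warehouse would do this with its own storage and audit features. The point is that a correction is a new entry that points back at the one it supersedes, so the original and the reason for the change both stay on record.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Entry:
    """One immutable entry; frozen so it cannot be altered after creation."""
    entry_id: int
    record_key: str
    value: Any
    corrects: Optional[int]  # entry_id of the entry this one supersedes, if any
    reason: str


class AppendOnlyLedger:
    """Hypothetical store: entries are only ever appended, never edited."""

    def __init__(self) -> None:
        self._entries: List[Entry] = []

    def _append(self, record_key: str, value: Any,
                corrects: Optional[int], reason: str) -> Entry:
        entry = Entry(len(self._entries), record_key, value, corrects, reason)
        self._entries.append(entry)
        return entry

    def record(self, record_key: str, value: Any) -> Entry:
        """Store the original value for a key."""
        return self._append(record_key, value, None, "original")

    def correct(self, record_key: str, value: Any, reason: str) -> Entry:
        """Add a correction that references the latest entry for the key."""
        prior = self.history(record_key)
        if not prior:
            raise KeyError(record_key)
        return self._append(record_key, value, prior[-1].entry_id, reason)

    def history(self, record_key: str) -> List[Entry]:
        """Every entry for the key, oldest first, including superseded ones."""
        return [e for e in self._entries if e.record_key == record_key]

    def current(self, record_key: str) -> Any:
        """The latest value for the key (the most recent correction wins)."""
        entries = self.history(record_key)
        if not entries:
            raise KeyError(record_key)
        return entries[-1].value


ledger = AppendOnlyLedger()
ledger.record("customer-17/postcode", "9021O")
ledger.correct("customer-17/postcode", "90210", reason="letter O typed for zero")
```

    Reading current("customer-17/postcode") gives the corrected value, while history(...) still shows the erroneous original and why it was superseded.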

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )