Data de-identification and/or/versus Encryption?

  • I've seen a procedure that goes through a list of tables and columns and runs a de-identification routine on each column. The routine gets a value from newid() and substitutes it for what was there. So if it runs on first, last, dob, addr, city, state and zip and replaces them with nonsense, it's done.

    On the other hand, there is encryption. That sounds like I could leave the names alone, because they would be encoded and decoded on the way into and out of storage AND backups, which sounds much more robust. Before I go ahead with it: am I conflating things by thinking encryption and de-identification accomplish the same thing? Do I need one and not the other, or both?

    Thanks very much.

  • Deidentification and encryption have different purposes, and it usually makes no sense to use one where the other is appropriate; there are cases where you should use both.

    Deidentification is needed when you want to give a dataset to someone for statistical analysis or for testing new software but parts of the actual data (usually data that could be used to identify a person) must be hidden from them (often required by regulatory legislation).

    Encryption is needed when you want to ensure that people who can access the files containing the data or backups can't read any of the data unless they are able to log in to the database and have database user permissions to see what they want to see.

    You may want to encrypt deidentified data for testing and performance measurement purposes, so that the testers can't see the identities but can still find any problems arising from the encryption, as well as other problems. That's the case where both are needed.
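
    To make the distinction concrete, here is a minimal sketch of the kind of de-identification routine described in the question. It is written in Python rather than T-SQL, with uuid.uuid4() standing in for newid(); the column list and the deidentify_row helper are hypothetical names for illustration, not the actual procedure.

```python
import uuid

# Hypothetical column list; a real procedure would read its own table/column list.
PII_COLUMNS = ("first", "last", "dob", "addr", "city", "state", "zip")


def deidentify_row(row, pii_columns=PII_COLUMNS):
    """Return a copy of row with each listed column replaced by a random UUID string."""
    cleaned = dict(row)  # copy so the caller's row is left unchanged
    for column in pii_columns:
        if column in cleaned:
            # The replacement is random and unrelated to the old value,
            # so the original cannot be recovered from it.
            cleaned[column] = str(uuid.uuid4())
    return cleaned


row = {"id": 17, "first": "Ann", "last": "Lee", "zip": "90210", "balance": 120.5}
safe = deidentify_row(row)
```

    Unlike encryption, nothing here can be decoded later: there is no key, and the old values are simply gone from the copy. Columns that are not in the list (id and balance in this example) pass through untouched, which is what keeps the data usable for testing or analysis.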

    Tom

  • Thanks very much.
