Suggested approach to masking

  • We have a reporting system through which I present multiple databases (i.e. one per client). Our UAT and DEV source systems don't contain enough business-like data to give me realistic situations for testing, and resource constraints mean I'm not going to get test situations keyed in. As a result, users have been developing and testing their reports on the production reporting system.

    So I'm looking to present a copy of the production database (or even the production database itself) to our development servers, but in doing so I need to satisfy the compliance requirement that sensitive data is not available. Preferably I would like something that is self-maintaining, i.e. aware of the user or the environment it is on, and which either makes the data available or doesn't.

    So what approach would you take? Would it be a script to mask the data post-restore? Or a security-controlled table with an overlaid view/proc that either masks the sensitive data or decrypts the encrypted data on execute/query? Or would you use a third-party tool to generate duff data? Some other route?

    My issue with something like Data Generator (I have the SQL Toolbelt) is that it is restrictive: it's either a whole table or nothing, which isn't what I want, since that results in inconsistent data. Additionally, it isn't set up to be all that automated: you need to build a project and then run it from the command line. It would be amazing to be able to do something along the lines of "data generator"

    [column] [data type]

    Thanks for looking 😉
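    For illustration only, the kind of per-column generator the poster is wishing for could be sketched in Python. Every name here is hypothetical: the type keys (`name`, `email`, `digits`), the generator table, and `mask_value` are invented for this sketch and are not part of SQL Data Generator or any real tool.

    ```python
    import random
    import string

    # Hypothetical generators keyed by a declared data type, in the spirit of
    # "data generator [column] [data type]". The type names are illustrative.
    GENERATORS = {
        "name":   lambda: random.choice(["Alice Smith", "Bob Jones", "Carol White"]),
        "email":  lambda: "user{}@example.com".format(random.randint(1000, 9999)),
        "digits": lambda: "".join(random.choices(string.digits, k=10)),
    }

    def mask_value(column, data_type):
        """Return a replacement value for one column based on its declared type."""
        try:
            return GENERATORS[data_type]()
        except KeyError:
            raise ValueError(f"no generator registered for type {data_type!r}")

    masked = mask_value("phone_number", "digits")
    print(masked)  # prints ten random digits
    ```

    A real version would drive `UPDATE` statements from a column/type catalogue, so the same row always gets the same replacement and cross-table consistency is preserved.
    
    
    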

  • So what approach would you take? Would it be a script to mask the data post-restore? Or a security-controlled table with an overlaid view/proc that either masks the sensitive data or decrypts the encrypted data on execute/query? Or would you use a third-party tool to generate duff data? Some other route?

    My first choice would be to restore the database while keeping users locked out, cleanse the data, then let users in. Scripts that sanitize personally identifiable or sensitive data while maintaining the overall linkages between entities are tedious to write but relatively easy: updating this name to that name, scrambling a number, or rewriting a credit card number by pulling a random one from a set of check-digit-valid dummy card numbers is trivial and does not require intimate knowledge of every parent-child relationship.
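    The "check-digit-valid dummy card number" idea above can be sketched as follows; this is a minimal example assuming the Luhn algorithm that real card numbers use, with a hypothetical `4` prefix and 16-digit length.

    ```python
    import random

    def luhn_check_digit(partial):
        """Compute the Luhn check digit for a digit string (the number minus its last digit)."""
        total = 0
        # Walk right to left; double every second digit, subtracting 9 when it exceeds 9.
        for i, ch in enumerate(reversed(partial)):
            d = int(ch)
            if i % 2 == 0:
                d *= 2
                if d > 9:
                    d -= 9
            total += d
        return (10 - total % 10) % 10

    def dummy_card_number(prefix="4"):
        """Build a 16-digit dummy card number that passes a Luhn check."""
        body = prefix + "".join(str(random.randint(0, 9)) for _ in range(15 - len(prefix)))
        return body + str(luhn_check_digit(body))

    print(dummy_card_number())  # 16 digits, Luhn-valid, but not a real card
    ```

    Pre-generating a pool of such numbers and assigning one per customer keeps the masked data internally consistent while satisfying any downstream validation.
    
    
    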

    Second choice would be to use a data generator but I fear that would become even more tedious depending on how good the toolset is.

    My issue with something like Data Generator (I have the SQL Toolbelt) is that it is restrictive: it's either a whole table or nothing, which isn't what I want, since that results in inconsistent data. Additionally, it isn't set up to be all that automated: you need to build a project and then run it from the command line. It would be amazing to be able to do something along the lines of "data generator"

    [column] [data type]

    Visual Studio has a data generator that may prove more flexible for you. I have used it for unit testing but don't remember the details of that particular feature.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • We also do data masking, with the steps below:

    1) Updating credit card information (randomly generated alphanumeric strings).

    2) Emails: adding an irrelevant prefix to the profile name and changing the domain.

    3) Updating other payment-related columns (full column updates).

    4) Changing the contact details too, such as mobile numbers.

    I must say all of the above steps are time-consuming (week-long running jobs).
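    Step 2 above (prefixing the name part and swapping the domain) could look like this in Python; the `masked_` prefix and the `example.invalid` replacement domain are illustrative choices, with `.invalid` picked because it is a reserved TLD that can never deliver mail.

    ```python
    def mask_email(email, prefix="masked_", domain="example.invalid"):
        """Prefix the local part and replace the domain so the address is undeliverable."""
        local, _, _ = email.partition("@")
        return f"{prefix}{local}@{domain}"

    print(mask_email("jane.doe@client.com"))  # masked_jane.doe@example.invalid
    ```

    Keeping the original local part inside the masked value preserves per-row uniqueness, which matters if the column has a unique constraint.
    
    
    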

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
