Advanced Data Masking

  • Good afternoon.

    I was told that we need to obfuscate the pii data in our lower level environments.

    I used the Dynamic Data Masking in SQL Server 2016.  Works great.  However, the BI team doesn't like.

    Birthdate - they want the mask to show the real birthdate plus 2 months.
    For the name - they want the mask to show another name rather than FXXX SXXXX.
    For the SSN - they want the mask to show randomized SSN numbers and actually have randomized numbers instead 'XXX-XX-XXXX'

    I've gone through several blogs but they are all introduction to the Data Masking in SQL Server 2016.  Anyone have some advance examples of using the Data masking?

    Thanks.

    Things will work out.  Get back up, change some parameters and recode.

  • You might get the SSN one done, but data masking won't work for a or b.

  • Thank you.  I didn't see any examples that could add two months onto the birthday by using the Data Masking.  I also didn't see any way to display a masked name account as another name.  I've seen a RedGate product that can do these two things.  However, I have not seen how to get the desired masking by using the Data Masking functions.

    Things will work out.  Get back up, change some parameters and recode.

  • Wasn't trying to push the Redgate tools, but we do have Data Masker for SQL Server that does this. There is randomization, variation of numerics/dates, and more.

    https://www.red-gate.com/products/dba/data-masker/

    Disclosure: I work for Redgate.

  • WebTechie - Monday, April 30, 2018 12:22 PM

    Good afternoon.

    I was told that we need to obfuscate the pii data in our lower level environments.

    I used the Dynamic Data Masking in SQL Server 2016.  Works great.  However, the BI team doesn't like.

    Birthdate - they want the mask to show the real birthdate plus 2 months.
    For the name - they want the mask to show another name rather than FXXX SXXXX.
    For the SSN - they want the mask to show randomized SSN numbers and actually have randomized numbers instead 'XXX-XX-XXXX'

    I've gone through several blogs but they are all introduction to the Data Masking in SQL Server 2016.  Anyone have some advance examples of using the Data masking?

    Thanks.

    Masking is easily broken.  They need to get the idea that they need to change the actual data instead of masking it or doing some two month trick.  Masking stuff won't help you pass an audit.  You have to irreversibly obfuscate the data. 

    Oh, yeah.... if your SSNs are stored in clear text in production, you're begging to fail an audit.  SSN/TID should be encrypted.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Our Data Masker product will replace all data with other data. No algorithmic change that can be broken.

  • Steve,

    Yes, I saw a demo of that product.  That is why I was asking if the Dynamic masking did that.

    When they say they need the birthdate, but add two months to the month, I didn't think data masking would do that.
    Can we change the name to an actual name, but just not the name of the clients?  Again, I didn't think data masking would do that.

     To show random ssn's, I need to unencrypt the ssn to get a value and then try to randomize it I guess.  Or keep it encrypted and just add a randomized 9 character value as a mask.

    That was why I was asking to see if anyone knew advanced techniques that I wasn't seeing with the Dynamic data masking in SQL Server 2016.

    Thanks.

    Things will work out.  Get back up, change some parameters and recode.

  • WebTechie - Tuesday, May 1, 2018 11:07 AM

    Steve,

    Yes, I saw a demo of that product.  That is why I was asking if the Dynamic masking did that.

    When they say they need the birthdate, but add two months to the month, I didn't think data masking would do that.
    Can we change the name to an actual name, but just not the name of the clients?  Again, I didn't think data masking would do that.

     To show random ssn's, I need to unencrypt the ssn to get a value and then try to randomize it I guess.  Or keep it encrypted and just add a randomized 9 character value as a mask.

    That was why I was asking to see if anyone knew advanced techniques that I wasn't seeing with the Dynamic data masking in SQL Server 2016.

    Thanks.

    Seriously... stop trying to use masking for any of this.  You need to change the actual data.  Masking is too easily broken even by rookies.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Data Masker can help. You have some good thoughts here, so I'll try to write a few pieces that might let you see how our product works.

    In terms of the SSNs, I'd just replace with a new encrypted value. Don't bother decrypting. For names, dates, that's easy with a product. Actually could be easy with scripts, but once you start to have scale, tooling helps.

  • Jeff,

    What is your thought on changing the data?  I am not sure I follow you with changing the data.  This data masking obfuscation is just for refreshing the lower level environments with production data.

    However, I am interesting in hearing your thoughts.  If you change the data, how do you maintain the integrity of your business data?

    Thanks.

    Things will work out.  Get back up, change some parameters and recode.

  • If, by data masking, you simply mean something like using a mask for viewing purposes, then be aware that such masks are incredibly easy to overcome and you can actually get to the actual underlying data. It isn't satisfactory to pass a PII or any other type of security audit and if someone is able to grab the data from the tables or an in-house person wants to see the actual data, it's a nearly trivial hack that even a newbie to SQL Server can do to see the unmasked data.

    That means that the actual data in the tables needs to be irreversibly modified.  For example, even though we have encrypted SSNs in our production database and that encryption methodology is active in the lesser environments that we copy to, we still very deliberately overwrite all the SSNs with a simple 9 digit value based only on the ID of the row because, while we need to be able to test the encryption, we don't want anyone in the lesser environments to be able to use the decryption to view real SSNs.

    As for maintaining the integrity of the business data, we do that by database design.  We can easily change names and SSNs and other PII at any time because we use surrogate keys when addressing such data from other tables.  It's just basic normalization.  You should NEVER have key columns in any other tables that contain any PII.  For example, SSN should only live in one and only one table in the entire database.  It's a fatal flaw to use SSNs as the key for anything.

    If your database isn't setup that way (normalized to prevent the use of PII info as key columns), then you'll fail audits and you risk a MUCH greater chance of someone being able to hack your data, possibly providing you with the future entertainment of reading about you and your company on "todays list of companies that have been penetrated" and paying (could add up to millions of dollars) out the nose to protect the identity of those people that have had their information exposed.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • To be clear, Data Masker allows you to changes data in multiple columns/tables and maintain integrity. Even if you don't have a good design like Jeff has. I've looked for holes, and I don't really see any. You can really match up data across tables and ensure that children match the parent, even while you've changed PKs, altered names, etc.

  • Thanks.  I downloaded it and am taking a look at it.

    Things will work out.  Get back up, change some parameters and recode.

  • One thing to keep in mind is that this changes data inline, so copy your db (restore in dev), then mask.

    The tool is a little non-intuitive. I'm working on some getting started video/written stuff to make it clearer.

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply