Data Masker for SQL Server is a fantastic product, and one that can reduce your attack surface area and exposure under the GDPR or any other regulation. There are plenty or rules and templates to help you anonymize data, in ways you might not have expected. It’s a part of SQL Provision and the SQL Data Privacy Suite.
In this post, I want to show you a common technique that can be used to anonymize date data. In this case, we’ll change birthdays from an original data set to new values using SQL Provision.
Masking the Birthdate
Here is my original data. As you can see, I’ve got some values for person_id, full_name, and birth_date. These are the values that exist in my database right now.
My concern as I move to QA and development environments is that even if I mask the id or name, the birthday might reveal too much information about this user. As a result, I need to anonymize these. I’ll use Data Masker to accomplish this task.
This isn’t a tutorial on using Data Masker, so I’m assuming you know how to create a masking set. In this post, I’ll just use an empty masking set and alter the birthdates.
To do this, I’ll add a masking rule that’s a substitution rule. I’ll add this and choose my dbo.DM_EMPLOYEE.birth_date column. Once I do that, I have a number of datasets I can choose from.
If I choose the Data Variance (Constant) item. I see this configuration:
Here I can set a random variance that will be used and choose a range. For my test, I altered these defaults to +-5 days, with zero being invalid. If I run that on my database, I get a constant set of movement of the birth_date values. I’ve set up a vertical partition to show you the original data on the left and the new data on the right:
As you can see, the values have all moved 4 days. That might be OK, but in this case, I don’t like this.
Instead, I’ll reset my database (in seconds with SQL Clone), and use a different dataset. This time I’ll choose Date Variance (Random) to ensure each row gets a random date that is within a variance. Again, +- 5 days and no same dates.
After running this rule, I now see these results (again, left is original, right is masked):
Now I see the first person has a date that’s 2 days later while the second person has one that’s 4 days earlier. Random movement among all the rows.
This is what I want, and I can now easily save this dataset as RandomBirthdays.dmsset.
When I go to use SQL Provision to create a new image from my original data, I’ll add this dataset in the modification step (GUI shown, but this would typically be added as a parameter in the PoSh script).
This would ensure that every clone made from this process included random birthdays that are close, but not correct. This should keep my application functioning when there are date values needed for data.
Similar techniques can be used to alter order dates, shipping dates, etc.
SQL Provision includes Data Masker and SQL Clone, two amazing tools that will free up your developers from tedious tasks, ensure they can use large datasets for their work without the hassles and delays of restoring large databases.
If you want to give this a try, download an evaluation today and play around by implementing your own masking requirements.
I have other articles on Data Masker if you’re interested.