Randomizing Names with Data Masker

Steve Jones, 2018-05-18 (first published: 2018-05-07)

Data Masker for SQL Server is pretty cool. It can reduce your attack surface area and exposure under the GDPR or any other regulation. There are plenty or rules and templates to help you anonymize data, in ways you might not have expected. Data Masker is a part of SQL Provision and the SQL Data Privacy Suite.

In this post, I want to show you a common technique that can be used to anonymize names. That’s a fairly common requirement for many datasets, and Data Masker makes it easy.

Last Names

Since Redgate is a UK company, we call last names, surnames. These might also be family names in your culture, but these are the common names shared by everyone in a family.

In any case, there is a series of datasets for last names/surnames. If I look through the list, I see these:

Names, Surname Suffixes
Names, Surname Titles
Names, Surname Titles (Short List)
Names, Surnames, Random (DE)
Names, Surnames, Random (PT)
Names, Surnames, Random (ES)
Names, Surnames, Random (FR)
Names, Surnames, Random (Large List)
Names, Surnames, Random (NL)
Names, Surnames, Random (Short List)

Each of these is a file with a list of names. The encoding is binary, but if you choose one, you can click the “Sample” button to see some of the entries. For example, here are the Last Names (Large List) sample:

There are country oriented sets and then a large and small list of English names. In this case, I’ll map the large list to my customer_lastname column. I could make these all upper case or choose unique values if I choose. I won’t here.

I could also use the WHERE Clause and Sampling tab to limit the replacements. The default it to replace rows where the value is NOT NULL or empty. I won’t bother and I’ll just create this rule.

First Names

Replacing first names is a two stage process, or it can be if you care. Gender can matter with first names, so the process can be one of two methods:

replace gender = ‘F’ with female names
replace gender=’M’ with male names.

This is slower as it requires checking in both passes. A better solution might be:

replace all rows with female names
replace gender=’M’ with male names.

In this case only one set of checks is made in the second step. I’ll do this.

First, I’ll modify the existing substitution rule to add customer_firstname to the list of replacements. In this case, I’ll choose the Names, First Names, Female as the data set.

I’ll leave the WHERE clause alone and save this rule.

Once this is done, I’ll create a new substitution rule. For the dataset, I’ll choose male first names.

Once this is done, I want to limit replacements, so I’ll go to the WHERE clause tab and enter a valid T-SQL clause:

I’ll save this rule and then I see this:

Both rules are listed in order, but there is no linkage. My data masking can potentially run these two rules in parallel. I don’t want that. The second rule depends on the first one having run.

I can add a dependency by dragging the lower rule onto the type of the upper rule. The object behavior is a little funny here, so experiment with “grabbing” the lower rule by clicking on it and holding, them moving it up. You should end up with an indent, signifying dependency.

Now I’ll run the masking set, after querying the table in a window. Once the masking is run, I’ll open a second window in a vertical tab and re-run the same query. You can see the before and after below:

This is a basic look at a multi-stage process to anonymize data in a way that makes sense to your application. Using this technique, I can get production like data in lower environments, without using production data.

Data Masker does some really interesting things as far as cleaning and masking data, so I’d suggest you give this a try as a way to process your lower environments.

If you have questions or want to see other masks, let me know. I also have other articles on Data Masker if you’re interested.

Rate

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

Log in or register to rate

You rated this post out of 5. Change rating

Share

Share

Rate

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

Log in or register to rate

You rated this post out of 5. Change rating

Related content

Book Review: Big Red - Voyage of a Trident Submarine

by Andy Warren

SQLServerCentral.com

Blogs

I've grown up reading Tom Clancy and probably most of you have at least seen Red October, so this book caught my eye when browsing used books for a recent trip. It's a fairly human look at what's involved in sailing on a Trident missile submarine...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

Log in or register to rate

You rated this post out of 5. Change rating

2009-03-10

1,439 reads

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

by Robert Davis

SQLServerCentral.com

Blogs

Question: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? This question was sent to me via email. My reply follows. Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? Databases to be mirrored are currently running on 2005 SQL instances but will be upgraded to 2008 SQL in the near future.

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

Log in or register to rate

You rated this post out of 5. Change rating

2009-02-23

1,567 reads

Inserting Markup into a String with SQL

by Phil Factor

SQLServerCentral.com

T-SQL

In which Phil illustrates an old trick using STUFF to intert a number of substrings from a table into a string, and explains why the technique might speed up your code...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

Log in or register to rate

You rated this post out of 5. Change rating

2009-02-18

1,631 reads

Networking - Part 4

by Andy Warren

SQLServerCentral.com

Blogs

You may want to read Part 1 , Part 2 , and Part 3 before continuing. This time around I'd like to talk about social networking. We'll start with social networking. Facebook, MySpace, and Twitter are all good examples of using technology to let...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

Log in or register to rate

You rated this post out of 5. Change rating

2009-02-17

1,530 reads

Speaking at Community Events - More Thoughts

by Andy Warren

SQLServerCentral.com

Blogs

Last week I posted Speaking at Community Events - Time to Raise the Bar?, a first cut at talking about to what degree we should require experience for speakers at events like SQLSaturday as well as when it might be appropriate to add additional focus/limitations on the presentations that are accepted. I've got a few more thoughts on the topic this week, and I look forward to your comments.

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

Log in or register to rate

You rated this post out of 5. Change rating

2009-02-13

360 reads