SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 

SQL Data Generator–Masking Production Data

I learned a new trick with SQL Data Generator that I wasn’t aware of previously. I think this is a good idea for masking some of that production data that you might not want developers to have.

Let’s start with a production table. In my case, I’ve created a Sandbox_Prod database with a table in it for employees. I’ve added a few records that contain some sensitive information.

CREATE TABLE Employees
(
empid INT IDENTITY(1,1)
, EmployeeName VARCHAR(250)
, EmpployeeEmail VARCHAR(250)
, active TINYINT
, salary money
, pwd VARBINARY(max)    
)
;
GO
INSERT INTO Employees
VALUES  ( 'sjones', 'sjones@sqlservercentral.com', 1, 10000, ENCRYPTBYPASSPHRASE('The User Sample', 'MyS%83ongPa44#word')) 
     ,  ( 'awarren', 'awarren@sqlservercentral.com', 1, 20000, ENCRYPTBYPASSPHRASE('The User Sample', 'Ano$2therS%83ongPa44#word') )
     ,  ( 'rsmith', 'rsmith@sqlservercentral.com', 1, 5000, ENCRYPTBYPASSPHRASE('The User Sample', 'Kedj93m@@83ongPa44#word'));

GO

I’ve got a second database, called Sandbox, that simulates my development environment. I’ve got the same table, but without any data in it.

What I want to do is move some of the production data to my development area, but not all of it. Some of the production data needs to be masked.

SQL Data Generator Sources

I can use SQL Data Generator from Redgate to do this, by using a data source that actually exists. In this case, I’ll create a new project and point it at my Sandbox database. I’ve deselected all of the tables except my Employees table.

2015-09-22 16_31_57-Cortana

When I pick my Employees table, I see the familiar generation screen on the right. Most of you are like me and notice the number of rows and the option to delete data.

2015-09-22 16_32_54-SQL Data Generator  - New Project _

However there’s another option. I can select the “Use existing data source” radio button instead. When I do this, I have a few choices for data. I can use an existing table or a CSV file. Both of those can be good choices, especially if I have sets of data I want to load into the table. Either one can help me to build known, specific data sets for development (or testing).

2015-09-22 16_35_19-SQL Data Generator  - New Project _

In my case I will choose and existing table. When I do this, I click the “Browse” button and I get a connection dialog for SQL Server. I pick my instance and the production database.

2015-09-22 16_35_26-SQL Data Generator  - New Project _

I click “Next” and then get the chance to select the table to use. In this case, I’ll pick the Employees table.

2015-09-22 16_39_03-Select SQL Table or View

When I return to the main SDG screen, I see the table listed as the source, but my preview shows the actual production data. This is because I’ve mapped the production table as a source, and it will be used as it currently exists.

2015-09-22 16_40_12-New notification

That’s not what I want. I want to mask the email address and the salary. However, now I can change things like I might do for any random data generation.

Let’s first click in the EmployeeEmail column. When I do that, I see the following, the column with its source set as the existing column in the production table.

2015-09-22 16_59_52-SQL Data Generator  - New Project _

However the drop down gives me lots of choices, including an Internet email generator.

2015-09-22 17_00_07-

If I select, then my preview changes. Now the image below shows production data for all columns other than the email.

2015-09-22 17_01_23-New notification

I can repeat this for the salary (and password to be safe). When I do that, I’ll see random data for those columns and production data for others.

2015-09-22 17_03_26-New notification

I can repeat this for all tables in my project, mapping through data that isn’t sensitive, and masking data that is. It’s a tedious process, but it’s a one time process for specific data. Once this is done, every restore can have the project run and the data masked. If production DBAs do this refresh, then developers never see sensitive information


Filed under: Blog Tagged: Data Generator, Redgate, syndicated

The Voice of the DBA

Steve Jones is the editor of SQLServerCentral.com and visits a wide variety of data related topics in his daily editorial. Steve has spent years working as a DBA and general purpose Windows administrator, primarily working with SQL Server since it was ported from Sybase in 1990. You can follow Steve on Twitter at twitter.com/way0utwest

Comments

Leave a comment on the original post [voiceofthedba.wordpress.com, opens in a new window]

Loading comments...