SQLServerCentral Article

Dynamic Data Masking

,

Dynamic Data Masking is a SQL Server 2016 feature that can be deployed as part of your overall security strategy. Data masking is not intended as a primary security solution, but provides a way to obfuscate or hide characters in data such as social security numbers, credit card numbers, email addresses, and other possibly sensitive data so that end users (or non-privileged users) are not able to see this sensitive data.  Data masking hides data by placing X’s or 0’s in place of characters in a data field.

For example, a masking rule can be defined that masks all but the last four digits of any social security number, or the characters of an email string in the result set of any query. A social security number could look like xxx-xx-2345, or an email could display as Gxxxxxx@xxxxx.com. 

Let’s be clear…Data masking is not encryption!  With data masking, data on the back end is still visible. True encryption like transparent data encryption (TDE) and column level encryption is meant to protect the data in case the physical media (like drives or backup tapes) are stolen, allowing bad guys to restore or attach the database and steal data.  Whereas TDE and column level encryption actually change and scramble the backend data, data masking only hides the data in the query results for end users.  

Data is not masked on the physical media or storage device, rather it simply obscures the output or the display of the data, based on whether the user is normal or privileged. Data masking hides the sensitive data in the result set or output of a query over designated database fields, however the data in the database is not changed.  Because data masking rules are applied only to the query results, many applications can mask sensitive data without having to modify existing queries. In addition to using this feature in conjunction with best practices such as encryption, security is further enhanced by not allowing users to execute ad hoc queries, as the data can be easily unmasked by using a CAST statement.

Pros

the benefits of the data masking feature are: 

  • Developers can test environments using real (albeit, masked) data as opposed to scrubbed data, which can often skew query results.  And by using the appropriate data mask to protect personally identifiable information (PII) data, a developer can query production environments for troubleshooting purposes without violating compliance regulations.
  • End users (normal users) are prevented from seeing sensitive data, which can be compliant with policy or simply considered best practice.
  • Creating a mask on a column doesn’t prevent updates
  • SELECT INTO or INSERT INTO to move data from masked column to another table will result in masked data in the target table
  • Backup files retain masked data, and importing a database with masked data will retain masked status
  • Modifications to applications not necessary to read data

Cons

There are also some disadvantages of data masking. These are:

  • Data is completely viewable when querying tables as a privileged user (non read-only user).
  • Masked data can be completely viewable when querying from a temp table
  • Masking can be ‘unmasked’ by using a CAST statement in an ad hoc query
  • Masking cannot be used for the following types of columns:  Encrypted (Always Encrypted), FILESTREAM, or COLUMN_SET
  • It is not useful for protecting data from theft or hacking…it simply hides data when viewing through an end user application or a select statement executed by a read only user

There are four ways to mask the data:  DEFAULT(), EMAIL(), PARTIAL(), and RANDOM(). Each of these produces different masked values for different data types.  PARTIAL() allows you to customize the masking somewhat with a prefix and suffix of the actual data and padded string for character data. For full explanation of defining your data mask, go here https://msdn.microsoft.com/en-us/library/mt130841.aspx

In order to make the dynamic data masking feature work when connecting through an end user application, use security enabled connection string rather than the original one. The following is a link for connecting to a database through the security enabled connection string:  https://www.connectionstrings.com/sql-server/

Data Masking Example

You can mask data in your table columns by either creating your table with masked columns, or altering an already existing table for masking. Below is an example of masking data in an existing table.

-- alter existing table column
USE AdventureWorks2014
GO
ALTER TABLE HumanResources.Employee
  ALTER COLUMN NationalIDNumber
   ADD MASKED WITH (FUNCTION = ‘PARTIAL(0, "xxx-xx-", 4)')
--create a Read Only user
CREATE USER Maryann WITHOUT LOGIN;
GRANT SELECT ON HumanResources.Employee TO Maryann;

You can see below when the select statement is run as a privileged user, all characters are visible in the query results.

-- this will show clear data - run as privileged user
SELECT * From HumanResources.Employee   

You can see the results below:

But when the select statement is run as the read-only user as shown below, the sensitive data is masked in the query results.

--run select statement as the read only user
EXECUTE AS USER = 'Maryann'
SELECT * From HumanResources.Employee
REVERT

The results for this query are a bit different:

Below is an example of a masked email address, selected as a read-only user

Conclusion

The purpose of dynamic data masking is to limit exposure of sensitive data, preventing users who should not have access to the data from viewing it. With today’s FERPA and HIPAA laws, along with a growing concern over identity theft and security, data masking provides an extra layer of privacy and security. Dynamic data masking is complementary to other SQL Server security features (auditing, encryption, row level security…) and it is highly recommended to use this feature in conjunction with them.

“Have you started using Dynamic Data Masking (DDM) yet? If so, you’ll need a safe and reliable way of migrating any DDM changes from one database to the next. The good news is Redgate SQL Compare, the industry standard for comparing and deploying SQL Server databases, supports DDM. Data Platform MVP Steve Jones explores exactly how this works in Using SQL Compare with Dynamic Data Masking.”

Rate

4.43 (7)

You rated this post out of 5. Change rating

Share

Share

Rate

4.43 (7)

You rated this post out of 5. Change rating