In the first three levels of this Stairway, we have examined how dynamic data masking works from the perspective of the developer that implements the masks on the columns in a table. Now that we can implement masking, we need to look at the various security permissions as well the limitations of data masking to hide data from users.
The UNMASK Permission
Dynamic Data Masking follows the idea of providing security by default. In this paradigm, without taking an action to allow access to data, the default behavior is security is more restrictive, rather than open. We can see this in a table that has a mask applied to a column. If a user has SELECT permission to the table (or column), the default behavior is that the mask is applied to the data. Users cannot get the original values unless they are explicitly granted permission to do so.
In the current version of Dynamic Data Masking (SQL Azure v12 and SQL Server 2016 as of this writing), there is only one permission associated with the feature. This is the UNMASK permission. When granted to a user, the user can see the original values in a table. The user does not need to change their query in any way. Just as DDM masks data automatically in queries, the UNMASK permission automatically returns the original values of the data from any query.
The UNMASK permission is granted like any other user permission in SQL Server: with GRANT, REVOKE, and DENY. Let's see how this works in practice. Using the same users from the first Stairways, remember that kjohnson is a manager in our system. If kjohnson executes a query, this user sees masked data.
SELECT TOP 10 * FROM OrderDetail;
As we can see the data is masked. Now let's allow kjohnson to see the original data. We grant UNMASK as we could other permissions. We could also grant this to the role, which is the preferred method of dealing with permissions.
GRANT UNMASK TO kjohnson
Once we do this, we can run the same query.
SELECT TOP 10 * FROM OrderDetail;
This time, kjohnson sees unmasked data.
Nothing has changed in our code; only permissions were changed for this user.
However, we have a slight issue. There is also data masked in the OrderHeader table. The OrderTotal column is masked, as seen here:
If kjohnson queries this table, this user can see data.
In fact, all masks are "removed" when the UNMASK permission is granted. When we add this permission to a database principal, we don't specify any object (unlike for the SELECT permission).
I would have hoped for more granularity, and I expect this will be added in future versions, but for now our principals either see masked data from all tables or no masked data.
DDM is really an application programming convenience feature, not a security feature. Despite it being marketed and documented in the security section, and despite the perspective of most of users, this feature only really limits access to data; it doesn't protect the data.
The reason this is a problem has to do with the ability of a user to "guess" data and expose the original values. There are a variety of ways this can be done, and it can be complex, but let's look at a simple example: salaries in a table.
Let's assume that we have an Employee table with the salary for each worker stored in the table. For simplicity, we will stored only the "thousands" value that each employee is paid. Let's look at the table DDL and sample data. Note: we are randomly masking the data.
CREATE TABLE employee ( empid INT , empname VARCHAR(200) , title VARCHAR(50) , salary INT MASKED WITH (FUNCTION= 'random( 1, 100)' ) ); GO INSERT employee VALUES ( 1, 'Mark', 'CEO', 100 ) , ( 2, 'Sheryl', 'COO', 90 ) , ( 3, 'Beth', 'CFO', 88 ) , ( 4, 'Allen', 'VP', 80 ) , ( 5, 'Sarah', 'Director', 60 ) , ( 6, 'Joe', 'Developer', 50 ) , ( 7, 'Amy', 'Developer', 50 ) , ( 8, 'Kendall', 'QA', 45 );
None of these users has been granted the UNMASK permission. Therefore, if any of them query the table, they will get random values returned for the salary. We can see this if user, Joe, SELECTs all the data from the table.
Now Joe wants to know if Amy is making the same salary, so he decides to query the table. He knows his salary, so he uses this in a WHERE clause.
EXECUTE AS USER = 'Joe'; SELECT * FROM employee WHERE salary = 50; REVERT;
He gets these results, which show something we might not want Joe to know:
Without knowing the value of Amy's salary, he can determine what it is. Even though the mask is in place, Joe can see who's salary matches his.
Now, he modifies his query. He decides to see who makes more than he does.
EXECUTE AS USER = 'Joe'; SELECT * FROM employee WHERE salary > 50; REVERT;
These results list all the users who have larger salaries than Joe.
Once again, Joe has gained information about other users, despite the mask. Even though the value aren't correct, Joe knows there's a domain of values that apply. Let's use this information further in a new query:
EXECUTE AS USER = 'joe' ; WITH myTally ( n) AS ( SELECT n = ROW_NUMBER() OVER (ORDER BY ( SELECT NULL)) FROM ( VALUES (1 ), ( 2), (3), (4 ), ( 5), (6), (7 ), ( 8), (9), (0 )) a( i) , (VALUES ( 1), (2), (3 ), ( 4), (5), (6 ), ( 7), (8), (9), (0 )) b( i) ) SELECT e. empid , e. empname , e. title , e. salary , 'real salary' = t .n FROM dbo .employee e INNER JOIN myTally t on t .n = e.salary WHERE salary > 50 REVERT
Now Joe has determined the actual value of all employees' salaries. This is because when the domain of value is known, or can be guessed, a non-privileged user can still query the actual values. In this case, I used a tally table to build a list of all values from 1 to 100 (my domain). By joining this on the real column, I can expose the actual values.
In the results, you see the random, masked value, as well as the actual value in the far right column. These are the same values I inserted into the table at the beginning of this section. However this time I used a tally table to join back to the original table and determine the actual values.
While this is a simple example, this could be extended. The domains for credit card numbers, social security numbers, and various other Personally Identifiable Information (PII) are known. With a tally table, a user can easily circumvent dynamic data masking to determine the actual values of data. This isn't limited to numbers. Tally tables (or a list of any data), can be used to determine the value of string data.
There is also data leakage in other ways. While a user copying data to a temp table or exporting data maintains the mask, data in statistics and CDC contains the actual value. We can see this by creating specific statistics on this table and examining them.
CREATE STATISTICS mystat ON dbo .employee( salary) DBCC SHOW_STATISTICS (employee, mystat)
When we examine the data, note that we see there are rows with various values. I've only shown some of the DBCC results, but you can see that actual data is exposed. While this might not be as disturbing in a larger table, especially one that had substantially more rows than the 200 steps in the statistics histogram, there is still the potential for data leakage.
There is only one permission associated with DDM: the UNMASK permission. However this permission is globally applied at the database level, meaning that if a user has this permission, they have the ability to read the actual data in any column for which they have SELECT permission.
DDM is also not really a security feature. As we showed, a user can determine the actual value of a row with a brute force attack by querying the domain of possible values. Even with a large, but known, domain of possible values, a patient hacker can submit multiple queries and slowly map the actual values of the data.
There are also other potential areas where data leakage can occur, the full list of which is not documented at this time. However users should be aware, again, that this is really a convenience feature for applications to obfuscate data, not a comprehensive security feature that protects data from unauthorized viewing.