SQLServerCentral Article

SQL Server Data Masking: a comparison with Gallium Data

,

Introduction

In SQL Server 2016, Microsoft introduced a new feature called dynamic data masking, which allows you to mask the values of certain columns and keep that data hidden from certain users, without having to modify your applications.

Let's take a look at how SQL Server does data masking, and compare it to the way Gallium Data (a free database proxy) goes about it.

Bottom Line

  • SQL Server data masking has the main advantage of being built-in
  • Gallium Data has the main advantage of being much more flexible

Demo Database

For this article, we'll use the following trivial database:

create table demo.customers (
    id            int not null,
    first_name    nvarchar(100) not null,
    country       char(2) not null,
    primary key  (id)
)
insert into demo.customers values
(1, 'Andrea', 'AR'),
(2, 'Bruno', 'BE'),
(3, 'Charles', 'CA'),
(4, 'Daniella', 'DK'),
(5, 'Eric', 'ES')

SQL Server Dynamic Data Masking

Let's assume you want to partially hide the value of the first_name column. With SQL Server, you can define a dynamic mask with the following:

alter table demo.customers
alter column first_name 
add masked with (function = 'partial(1,"XXXXXXX",0)')

Any user (other than the owner of the table and DBAs) will then see the data as masked, e.g.:

execute as user='test_user'
select * from demo.customers
revert
id    first_name  country
1     AXXXXXXX    AR
2     BXXXXXXX    BE
3     CXXXXXXX    CA
4     DXXXXXXX    DK
5     EXXXXXXX    ES

You may wish to exempt some users from this type of masking with the following:

grant unmask to test_user

But that's a database-wide permission: that user will now have unmasked access to all masked data in the database.

Overall, this is a nice feature, and it's fairly easy to use, but it does have some limitations. What if we need a mask that depends on the value, or on the user, or on the time of day? You can only use one of 4 built-in functions to do masking, and you cannot use your own functions.

How Secure Is This?

Data masking works well if your users are limited to a fixed set of SQL queries, which is often the case. But if database users can create their own SQL queries, they can easily bypass data masking with queries such as the following:

execute as user='test_user'
select * from demo.customers where first_name like 'Br%'
revert
id    first_name  country
2     BXXXXXXX    BE

This allows the user to determine that the second letter of customer 2's name is 'r'. Obviously, it's pretty easy to extend that and make an end-run around the masking. That's OK: data masking is not meant to be a security measure at the database level, it's just meant to help you make your applications more secure.

Gallium Data Masking

Gallium Data takes a very different approach. It acts as a smart proxy between database clients and database servers and executes user-defined logic (expressed as filters and JavaScript code) that can modify that network traffic when appropriate. For data masking, Gallium Data has two options: modify the SQL command on its way to SQL Server, or modify the result set coming back from SQL Server.

Request Filter: Rewrite the SQL Command

A simple request filter in Gallium Data can catch the statement:

select id, first_name, country 
from demo.customers where first_name like '?'

before it gets to SQL Server and rewrite it to:

select id, substring(first_name, 1, 1) + 'XXXX' as first_name, country
from demo.customers where first_name like '?'

This works well if you know in advance what SQL commands to expect.

You can of course be as fancy as you want in how you rewrite the query, for instance:

select id, 
case
    when country in ('ES', 'DK', 'BE') then substring(first_name, 1, 1) + '€€€€'
    when country in ('US', 'CA') then substring(first_name, 1, 1) + '$$$$'
    else first_name
end as first_name,
country 
from demo.customers
where first_name like '?'

This will make the mask depend on the country column. Note that this rewriting can depend on who the user is, the time of day, or whatever conditions are relevant.

Response Filter: Change the Result Set

The other (sometimes complementary) option is to change the result set as it comes back from SQL Server. This is more expensive because we'll potentially execute code for every row in the result set, but modern computers are amazingly fast.

The JavaScript code for the response filter might look something like this:

let pkt = context.packet;
if (pkt.country == 'BE' || pkt.country == 'DK' || pkt.country == 'ES') {
    pkt.first_name = pkt.first_name.substring(0, 1) + "€€€€";
}
else if (pkt.country == 'US' || pkt.country == 'CA') {
    pkt.first_name = pkt.first_name.substring(0, 1) + "$$$$";
}

If you can do what you need by modifying the query, that's usually preferable, but having complete freedom over how or whether to mask the data on a row-by-row basis gives you an unparalleled level of control.

Summary

The advantages of SQL Server dynamic masking are:

  • it's built-in and does not require another system
  • it has excellent performance, whereas Gallium Data requires additional CPU cycles
The advantages of Gallium Data masking are:

  • the masking is completely up to you, whereas SQL Server can only use one of 4 pre-built functions
  • masking can depend on the value of any columns, whereas SQL Server only supports one mask for all rows
  • masking can be different depending on the user, the user's IP address, etc..., whereas SQL Server only supports one mask for all users
  • masking can be done on all columns, including computed columns and stored procedures, whereas SQL Server can only mask stored columns
  • masking/unmasking can be done with surgical precision, whereas SQL Server's masking is all-or-nothing per user

Conclusion: Who Does It Better?

These two approaches have different merits. They are not exclusive: it's possible to use both to implement different aspects of the requirements.

SQL Server has a simple mechanism that is pretty much all-or-nothing. This is appropriate for many applications that require nothing more.

Gallium Data gives you a lot more flexibility and may be preferable for the more complex cases when the masking requirements are not straightforward. Perhaps you need to mask only some rows for some users, or perhaps the masking depends on the data itself. You get to decide, without limitations.

P.S.

If you're intrigued, you may be interested to see how Gallium Data can also help with row-level security or work with SQL Server's data classification feature to restrict data access, or how Gallium Data can easily restrict queries based on prior behavior.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating