Stairway to Dynamic Data Masking

Datatypes and the Default Mask with Dynamic Data Masking - Step 2 of the Stairway to Dynamic Data Masking

,

In a previous article, we looked at Dynamic Data Masking at a very high level, learning how to set this up, use the UNMASK permission, and remove masking from a column.  In this article we will build on the knowledge from the previous piece and examine the deafult mask available and how it interact with the various data types in SQL Server.

The Masks

There are four different types of masks available in SQL Srver 2016. Each of these is designed to work with certain type s of data, and each is described in Books OnlineThere are a couple of additional masks in Azure SQL Database, and I am sure these will come to the on-premise version of SQL Server at some point. The four types of masks are:

  • Default
  • Email
  • Custom String
  • Random

I will look at other masks in different articles, but let's examine the default mask in detail.

Default

The default mask works with many datatypes. When the default mask is used, the value returned depends on the datatype. This means that a user querying the column, with knowledge of DDM, will know what datatype the underlying column contains. This is a bit of a security weakness.

The default mask values returned are:

  • 4 Xs if the type is a string type and the length is longer than 3 characters. Fewer Xs if the size is 3 or smaller.
  • 0 for the numeic data types.
  • the default date of 1900-01-01 for date datatypes.

We can see this with a little code. Let's create a table for character data, masking all columns with the default mask. Note that the format for this is with the format of (FUNCTION='xx') and the default() keyword inside the quotes instead of xx.

  --  Now create our character table
  CREATE TABLE MyCharTest (
    mychar1 CHAR(1) MASKED WITH (FUNCTION='default()') DEFAULT ('a') 
  , mychar2 CHAR(2) MASKED WITH (FUNCTION='default()') DEFAULT ('aa')
  , mychar3 CHAR(3) MASKED WITH (FUNCTION='default()') DEFAULT ('aaa')
  , mychar4 CHAR(4) MASKED WITH (FUNCTION='default()') DEFAULT ('aaaa')
  , mychar5 CHAR(5) MASKED WITH (FUNCTION='default()') DEFAULT ('aaaaa')
  , mychar10 CHAR(10) MASKED WITH (FUNCTION='default()') DEFAULT ('aaaaaaaaaa')
  , mynchar1 NCHAR(1) MASKED WITH (FUNCTION='default()') DEFAULT ('a')
  , mynchar2 NCHAR(2) MASKED WITH (FUNCTION='default()') DEFAULT ('aa')
  , mynchar3 NCHAR(3) MASKED WITH (FUNCTION='default()') DEFAULT ('aaa')
  , mynchar4 NCHAR(4) MASKED WITH (FUNCTION='default()') DEFAULT ('aaaa')
  , mynchar5 NCHAR(5) MASKED WITH (FUNCTION='default()') DEFAULT ('aaaaa')
  , mynchar10 NCHAR(10) MASKED WITH (FUNCTION='default()') DEFAULT ('aaaaaaaaaa')
  , myvarchar varchar(2000) MASKED WITH (FUNCTION='default()') DEFAULT ('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa')
  , mynvarchar nvarchar(2000) MASKED WITH (FUNCTION='default()') DEFAULT ('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa')
  , mytext TEXT MASKED WITH (FUNCTION='default()') DEFAULT ('1234567890') 
  , myntext NTEXT MASKED WITH (FUNCTION='default()') DEFAULT ('1234567890') 
  )
GO
-- insert data
INSERT dbo.MyCharTest
  DEFAULT VALUES
INSERT dbo.MyCharTest
  VALUES ('1', '12', '123', '1234', '12345', '1234567890', '0', '21', '321', '4321', '54321', '9876543210'
           , 'qwertyuiopasdfghjkl', 'qwertyuiopasdfghjkl', 'qwertyuiopasdfghjkl', 'qwertyuiopasdfghjkl')

If we query this data as a non-privileged user, we find these results:

These results are what we expect, based on the description. The number of Xs corresponds to the size of the string, with a max of 4. If we look at the variable types, we see the same thing.

I didn't include lower lengths for variable types, but they function the same. Note that the MAX type is supported, despite what BOL shows.

Now let's look at numeric types. We see something similar if we create a table with the various types, masked with default. Here's the table:

 CREATE TABLE MyNumbers(
  Mytinyint TINYINT MASKED WITH (FUNCTION='default()') DEFAULT (0)
, mysmallint SMALLINT MASKED WITH (FUNCTION='default()') DEFAULT 1
, myint INT MASKED WITH (FUNCTION='default()') DEFAULT 0
, mybit BIT MASKED WITH (FUNCTION='default()') DEFAULT 0
, myfloat FLOAT  MASKED WITH (FUNCTION='default()') DEFAULT (0.0)
, mynumeric NUMERIC(10, 4)  MASKED WITH (FUNCTION='default()') DEFAULT (0.0)
, mysmallnumeric NUMERIC(2,1)  MASKED WITH (FUNCTION='default()') DEFAULT (0.0)
, mymoney MONEY MASKED WITH (FUNCTION='default()') DEFAULT 0.0
, mysmallmoney SMALLMONEY MASKED WITH (FUNCTION='default()') DEFAULT 0.0
)

And here are the results:

This is what we expect. The numerics are masked with zeros.

I won't repeat this here for date types, but the code is in the download from the Resources section that shows a default date of 1900-010-10 for date types. Here are the results.

What about the special data types? Let's try them and see. They aren't mentioned in the BOL documentation. First, XML.

CREATE TABLE MyXMLTest(
 myxml XML MASKED WITH (FUNCTION='default()')
 )
 GO
INSERT dbo.MyXMLTest
    ( myxml )
  VALUES
    ( '11' ) 
GO

That works fine. The entire XML document is:

Now SQL_Variant? This is interesting. Let's insert a variety of data:

CREATE TABLE MyVariantTest(
 myxml SQL_VARIANT MASKED WITH (FUNCTION='default()')
 )
 GO
INSERT dbo.MyVariantTest ( myxml )
  VALUES ( '11' ) 
INSERT dbo.MyVariantTest ( myxml )
  VALUES ( 21453)
INSERT dbo.MyVariantTest ( myxml )
  VALUES ( 'Z')
INSERT dbo.MyVariantTest ( myxml )
  VALUES ( 'A longer test')
INSERT dbo.MyVariantTest ( myxml )
  VALUES ( DATETIME2FROMPARTS(2012, 02, 14, 8, 15, 22, 4, 7))
GO

If I query this, I get what I expect, based on implicit casting:

Now let's look at GUIDs.

CREATE TABLE MyGUIDTest(
 myguid UNIQUEIDENTIFIER MASKED WITH (FUNCTION='default()')
 )
 GO
INSERT dbo.MyGUIDTest ( myguid )
  VALUES ( NEWID()) 
GO

This isn't quite what I expect, and I think another security hole.

Spatial data?

-- Spatial
CREATE TABLE MySpatialTypes (
  mygeo GEOGRAPHY MASKED WITH (FUNCTION='default()')
, mygeom GEOMETRY MASKED WITH (FUNCTION='default()')
);
GO
INSERT dbo.MySpatialTypes
    ( mygeo, mygeom )
  VALUES
    ( geography::STGeomFromText('LINESTRING(-122.360 47.656, -122.343 47.656 )', 4326)
, geometry::STGeomFromText('LINESTRING (100 100, 20 180, 180 180)', 0)
)
GO

It works:

I also checked the hierarchyID. This one is unique, and working with it is challenging. However once you do, you can see that the data is masked. Here's the table:

CREATE TABLE MyH (
  myhierarchy HIERARCHYID MASKED WITH (FUNCTION='default()') 
);
GO
INSERT MyH 
    VALUES(hierarchyid::GetRoot()) ;

And the results, which are 0x in the table.

Microsoft appears to have considered the default for all types built into SQL Server. There are a few I didn't test here (real, for example), but I suspect they all behave in a similar manner.

Conclusion

The default data mask for DDM seems to be well thought out and covers all the data types most of you will use in your system. I have shown how the various results are affected by the datatype chosen by the column, with each result being what one might guess.

I am a bit disappointed that each data type doesn't return Xs consistently, which would help hide the datatype from the user. Instead, a malicious user will know roughly what type of data is stored in the columns and could tailor attacks for the data types. We will look at attacks in another article.

I hope I've shed some light on the default mask in this piece, and will look at other masks and how they are affected by data types in future articles.

This article is part of the parent stairway Stairway to Dynamic Data Masking

Resources

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating