SQLServerCentral Article

Searching an Encrypted Column

,

When encrypting a database column, equality searching can be preserved by using a deterministic encryption algorithm. A deterministic algorithm will always return the same encrypted value for a given value. For example, if a [Customer] table is defined as follows:  (Note: I’ve left out usual constraint syntax for brevity)

CREATE TABLE dbo.Customer (
ID INT,
Name NVARCHAR(100),
BirthDate DATE,
SSNEncrypted VARBINARY(8000)
)

When searching for a specific SSN, simply encrypt the parameter and issue a query against the table:

SELECT
Name
FROM
dbo.Customer
WHERE
SSNEncrypted = @EncryptedSSNValue

This method works well for both equality as well as non-equality searches but what about other comparisons?

Problems with Inequality Searches

Performing searches on a column using comparison operators (such as greater-than or less-than) implies the column can be arranged in an order. This is problematic when searching an encrypted column because the encrypted values are not order-able. For example, we could retrieve all customers born in February 1969 with the following query:

SELECT
Name
FROM
dbo.Customer
WHERE
BirthDate BETWEEN '1969-2-1' AND '1969-2-28'

But if we want to encrypt [BirthDate] our [Customer] table might look like the following:

CREATE TABLE dbo.Customer (
ID INT,
Name NVARCHAR(100),
BirthDateEncrypted VARBINARY(8000),
SSNEncrypted VARBINARY(8000)
)

Because the encrypted, binary values for [BirthDateEncrypted] are no longer order-able, encrypting the parameters and executing the below query will result in incorrect results.

SELECT
Name
FROM
dbo.Customer
WHERE
BirthDateEncrypted BETWEEN @EncryptedStartDate AND @EncryptedEndDate

One might decrypt the column for comparison. This is what happens automatically when using the database engine to obfuscate encryption: (I’m using a pseudo function DECRYPT for illustration)

SELECT
Name
FROM
dbo.Customer
WHERE
DECRYPT(BirthDateEncrypted) BETWEEN '1969-2-1' AND '1969-2-28'

Although this will return the correct result (eventually) it is extremely expensive. It requires the entire [Customer] table (or index) to be loaded into memory and every [BirthDayEncrypted] decrypted for comparison. It’s simply not practical for most use cases.

Another approach might be to develop an encryption method that preserves the order. For example, instead of storing a date, one could store the number of days since 1900-1-1 (or even something more obscure). Then, similar to equality searches, we could encrypt the parameters and issue the query in the same way:

SELECT
Name
FROM
dbo.Customer
WHERE
BirthDateEncrypted BETWEEN @EncryptedStartDate AND @EncryptedEndDate

This is called “Security through obscurity” and is generally considered a bad idea. It is not true encryption and the smaller your dataset is, the simpler it is to understand how the encryption works.

So, how can we use inequality searches on truly encrypted data if it’s not ordered?

An [Encrypted] Relational Solution

In many cases, what necessitates encryption is the relationship of data to an identifiable source. For example, the date Feb-2-1969 is meaningless unless you know it is Joe Smith’s birthday. It is the relationship that makes the data sensitive information. If we can keep the sensitive data separated from the identifying information, it is encrypted. Imagine a table structure as follows:

CREATE TABLE dbo.Customer (
ID INT,
Name NVARCHAR(100),
SSNEncrypted VARBINARY(8000)
)
CREATE TABLE dbo.Customer_Birthdate (
BirthDate DATE,
EncryptedID VARBINARY(8000)
)

Customers are contained in the [Customer] table and unencrypted birthdays are contained in the [Customer_Birthdate] table. But, the relationship that joins the birthdays to the customers is encrypted. To retrieve customers born in February 1969 we would issue a query as follows: (Again, I’m using a pseudo function DECRYPT to show the decryption of the column)

SELECT
Name
FROM
dbo.Customer
WHERE 
ID IN (
SELECT
DECRYPT(EncryptedID)
FROM
dbo.Customer_Birthdate
WHERE
BirthDate BETWEEN '1969-2-1' AND '1969-2-28'
)

With an index on the [Customer_BirthDate].[BirthDate] column, the sub-query easily identifies the [relatively small] set of customers with birthdays in February 1969 and decrypts only these relationships.

Rate

4.75 (4)

You rated this post out of 5. Change rating

Share

Share

Rate

4.75 (4)

You rated this post out of 5. Change rating