Obfuscating values (via T-SQL) in several Varchar, Char, and Datetime columns ?

  • Is there any good T-SQL solution that you can recommend for obfuscating a dozen of  Char and Varchar fields in several tables containing sensitive data?  Preferably without using Rand() functions due to restrictions as to where it can be used (cannot be called from within UDFs, for example..)
    We are also thinking about obfuscating a couple of Datetime columns too so that certain processes get safely tested in a testing environment by third-party consultants. Not looking for any free or paid packaged tool but an elegant T-SQL based solution. Cannot find one so far...
           It would also be awesome to be able to replace sensitive values with some meaningful values.  (Like 'Smith, John' to become 'Oppenheimer, Jeffrey', and an address like '13 Karl Marx avenue' to be replaced with some random '950 Blue Ribbon crossing', etc.)

    Thank you,
    and happy holidays!

    Likes to play Chess

  • Maybe column level encryption would work for this.

  • Can a Select statement be run against an encrypted column?

    Likes to play Chess

  • VoldemarG - Wednesday, September 19, 2018 7:38 PM

    Can a Select statement be run against an encrypted column?

    Yes. But the value returned by a straight select is not useful.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Take a look at the article by Phil Factor posted today - 

    Pseudonymizing a Database with Realistic Data for Development Work

  • I found this obfuscation code somewhere a while back and kept it.
    CREATE VIEW dbo.NewIDForFunction AS
    SELECT checksum(NEWID()) idNew
    GO

    CREATE FUNCTION dbo.RandomLetter() RETURNS CHAR(1) BEGIN
    DECLARE @cReturn CHAR
    SELECT @cReturn=CHAR(ABS(CHECKSUM(idNew))%26+65) FROM NewIDForFunction
    RETURN @cReturn
    END
    GO

    CREATE FUNCTION dbo.RandomVowel_NotY() RETURNS CHAR(1) BEGIN
    DECLARE @cReturn CHAR
    SELECT @cReturn=SUBSTRING('AEIOU',ABS(CHECKSUM(idNew))%5+1,1)
    FROM NewIDForFunction
    RETURN @cReturn
    END
    GO

    CREATE FUNCTION dbo.RandomConsonate() RETURNS CHAR(1) BEGIN
    DECLARE @cReturn CHAR
    SELECT @cReturn=SUBSTRING('BCDFGHJKLMNPQRSTVWXZY',ABS(CHECKSUM(idNew))%20+1,1)
    FROM NewIDForFunction
    RETURN @cReturn
    END
    GO

    CREATE FUNCTION dbo.RandomDigit() RETURNS CHAR(1) BEGIN
    DECLARE @cReturn CHAR
    SELECT @cReturn=CAST(ABS(CHECKSUM(idNew))%10 AS CHAR(1)) FROM NewIDForFunction
    RETURN @cReturn
    END
    GO

    CREATE FUNCTION dbo.RandomReplace_Constrained(@sIn VARCHAR(255))
    RETURNS VARCHAR(255) AS
    BEGIN

    DECLARE @cbIn TINYINT
    DECLARE @sResult VARCHAR(50)
    DECLARE @cbProcessed TINYINT
    DECLARE @sNextChar CHAR(1)
    DECLARE @nNextChar TINYINT

    SET @cbProcessed = 1
    SET @sResult = ''
    SELECT @cbIn=LEN(@sIn)

    WHILE @cbProcessed <= @cbIn BEGIN
    SET @sNextChar = SUBSTRING(@sIn,@cbProcessed,1)
    SET @nNextChar = ASCII(@sNextChar)
    SELECT @sResult = @sResult +
    CASE
    WHEN @nNextChar BETWEEN 48 AND 57 THEN dbo.RandomDigit()
    WHEN @nNextChar IN (65,69,73,79,85) THEN dbo.RandomVowel_NotY()
    WHEN @nNextChar IN (89,121) THEN @sNextChar
    WHEN @nNextChar BETWEEN 65 AND 90 THEN dbo.RandomConsonate()
    WHEN @nNextChar IN (97,101,105,111,117) THEN LOWER(dbo.RandomVowel_NotY())
    WHEN @nNextChar BETWEEN 97 AND 122 THEN LOWER(dbo.RandomConsonate())
    WHEN @nNextChar IN (40,41,32,46) THEN LOWER(dbo.RandomConsonate())
    ELSE @sNextChar
    END
    SET @cbProcessed = @cbProcessed + 1
    END

    RETURN @sResult

    END
    GO

    SELECT [dbo].[RandomReplace_Constrained](100012345) AS ObfuscationInt
        , [dbo].[RandomReplace_Constrained]('Strange Text Result') AS ObfuscationVarchar
    GO

    The results: (will be different for each run):
    ObfuscationInt    ObfuscationVarchar
    571458568    ClhonlerFomzlMafojj

    Unfortunately I can't credit anyone with it, but it is handy for inline SELECT obfuscation, just don't expect to be able to unobfuscate the data...

    Michael Gilchrist
    Database Specialist
    There are 10 types of people in the world, those who understand binary and those that don't. 😀

  • Michael G - Friday, September 21, 2018 5:42 AM

    I found this obfuscation code somewhere a while back and kept it.
    CREATE VIEW dbo.NewIDForFunction AS
    SELECT checksum(NEWID()) idNew
    GO

    CREATE FUNCTION dbo.RandomLetter() RETURNS CHAR(1) BEGIN
    DECLARE @cReturn CHAR
    SELECT @cReturn=CHAR(ABS(CHECKSUM(idNew))%26+65) FROM NewIDForFunction
    RETURN @cReturn
    END
    GO

    CREATE FUNCTION dbo.RandomVowel_NotY() RETURNS CHAR(1) BEGIN
    DECLARE @cReturn CHAR
    SELECT @cReturn=SUBSTRING('AEIOU',ABS(CHECKSUM(idNew))%5+1,1)
    FROM NewIDForFunction
    RETURN @cReturn
    END
    GO

    CREATE FUNCTION dbo.RandomConsonate() RETURNS CHAR(1) BEGIN
    DECLARE @cReturn CHAR
    SELECT @cReturn=SUBSTRING('BCDFGHJKLMNPQRSTVWXZY',ABS(CHECKSUM(idNew))%20+1,1)
    FROM NewIDForFunction
    RETURN @cReturn
    END
    GO

    CREATE FUNCTION dbo.RandomDigit() RETURNS CHAR(1) BEGIN
    DECLARE @cReturn CHAR
    SELECT @cReturn=CAST(ABS(CHECKSUM(idNew))%10 AS CHAR(1)) FROM NewIDForFunction
    RETURN @cReturn
    END
    GO

    CREATE FUNCTION dbo.RandomReplace_Constrained(@sIn VARCHAR(255))
    RETURNS VARCHAR(255) AS
    BEGIN

    DECLARE @cbIn TINYINT
    DECLARE @sResult VARCHAR(50)
    DECLARE @cbProcessed TINYINT
    DECLARE @sNextChar CHAR(1)
    DECLARE @nNextChar TINYINT

    SET @cbProcessed = 1
    SET @sResult = ''
    SELECT @cbIn=LEN(@sIn)

    WHILE @cbProcessed <= @cbIn BEGIN
    SET @sNextChar = SUBSTRING(@sIn,@cbProcessed,1)
    SET @nNextChar = ASCII(@sNextChar)
    SELECT @sResult = @sResult +
    CASE
    WHEN @nNextChar BETWEEN 48 AND 57 THEN dbo.RandomDigit()
    WHEN @nNextChar IN (65,69,73,79,85) THEN dbo.RandomVowel_NotY()
    WHEN @nNextChar IN (89,121) THEN @sNextChar
    WHEN @nNextChar BETWEEN 65 AND 90 THEN dbo.RandomConsonate()
    WHEN @nNextChar IN (97,101,105,111,117) THEN LOWER(dbo.RandomVowel_NotY())
    WHEN @nNextChar BETWEEN 97 AND 122 THEN LOWER(dbo.RandomConsonate())
    WHEN @nNextChar IN (40,41,32,46) THEN LOWER(dbo.RandomConsonate())
    ELSE @sNextChar
    END
    SET @cbProcessed = @cbProcessed + 1
    END

    RETURN @sResult

    END
    GO

    SELECT [dbo].[RandomReplace_Constrained](100012345) AS ObfuscationInt
        , [dbo].[RandomReplace_Constrained]('Strange Text Result') AS ObfuscationVarchar
    GO

    The results: (will be different for each run):
    ObfuscationInt    ObfuscationVarchar
    571458568    ClhonlerFomzlMafojj

    Unfortunately I can't credit anyone with it, but it is handy for inline SELECT obfuscation, just don't expect to be able to unobfuscate the data...

    Nothing wrong with the code, although I might make some different choices in the final routine..   And I'd also change dbo.Consonate to dbo.Consonant because I can't stand the idea of that kind of thing being spelled incorrectly.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • sgmunson - Friday, September 21, 2018 6:26 AM

    Michael G - Friday, September 21, 2018 5:42 AM

    I found this obfuscation code somewhere a while back and kept it.
    CREATE VIEW dbo.NewIDForFunction AS
    SELECT checksum(NEWID()) idNew
    GO

    CREATE FUNCTION dbo.RandomLetter() RETURNS CHAR(1) BEGIN
    DECLARE @cReturn CHAR
    SELECT @cReturn=CHAR(ABS(CHECKSUM(idNew))%26+65) FROM NewIDForFunction
    RETURN @cReturn
    END
    GO

    CREATE FUNCTION dbo.RandomVowel_NotY() RETURNS CHAR(1) BEGIN
    DECLARE @cReturn CHAR
    SELECT @cReturn=SUBSTRING('AEIOU',ABS(CHECKSUM(idNew))%5+1,1)
    FROM NewIDForFunction
    RETURN @cReturn
    END
    GO

    CREATE FUNCTION dbo.RandomConsonate() RETURNS CHAR(1) BEGIN
    DECLARE @cReturn CHAR
    SELECT @cReturn=SUBSTRING('BCDFGHJKLMNPQRSTVWXZY',ABS(CHECKSUM(idNew))%20+1,1)
    FROM NewIDForFunction
    RETURN @cReturn
    END
    GO

    CREATE FUNCTION dbo.RandomDigit() RETURNS CHAR(1) BEGIN
    DECLARE @cReturn CHAR
    SELECT @cReturn=CAST(ABS(CHECKSUM(idNew))%10 AS CHAR(1)) FROM NewIDForFunction
    RETURN @cReturn
    END
    GO

    CREATE FUNCTION dbo.RandomReplace_Constrained(@sIn VARCHAR(255))
    RETURNS VARCHAR(255) AS
    BEGIN

    DECLARE @cbIn TINYINT
    DECLARE @sResult VARCHAR(50)
    DECLARE @cbProcessed TINYINT
    DECLARE @sNextChar CHAR(1)
    DECLARE @nNextChar TINYINT

    SET @cbProcessed = 1
    SET @sResult = ''
    SELECT @cbIn=LEN(@sIn)

    WHILE @cbProcessed <= @cbIn BEGIN
    SET @sNextChar = SUBSTRING(@sIn,@cbProcessed,1)
    SET @nNextChar = ASCII(@sNextChar)
    SELECT @sResult = @sResult +
    CASE
    WHEN @nNextChar BETWEEN 48 AND 57 THEN dbo.RandomDigit()
    WHEN @nNextChar IN (65,69,73,79,85) THEN dbo.RandomVowel_NotY()
    WHEN @nNextChar IN (89,121) THEN @sNextChar
    WHEN @nNextChar BETWEEN 65 AND 90 THEN dbo.RandomConsonate()
    WHEN @nNextChar IN (97,101,105,111,117) THEN LOWER(dbo.RandomVowel_NotY())
    WHEN @nNextChar BETWEEN 97 AND 122 THEN LOWER(dbo.RandomConsonate())
    WHEN @nNextChar IN (40,41,32,46) THEN LOWER(dbo.RandomConsonate())
    ELSE @sNextChar
    END
    SET @cbProcessed = @cbProcessed + 1
    END

    RETURN @sResult

    END
    GO

    SELECT [dbo].[RandomReplace_Constrained](100012345) AS ObfuscationInt
        , [dbo].[RandomReplace_Constrained]('Strange Text Result') AS ObfuscationVarchar
    GO

    The results: (will be different for each run):
    ObfuscationInt    ObfuscationVarchar
    571458568    ClhonlerFomzlMafojj

    Unfortunately I can't credit anyone with it, but it is handy for inline SELECT obfuscation, just don't expect to be able to unobfuscate the data...

    Nothing wrong with the code, although I might make some different choices in the final routine..   And I'd also change dbo.Consonate to dbo.Consonant because I can't stand the idea of that kind of thing being spelled incorrectly.

    Nice catch, I didn't even notice the spelling error, which is weird because I tend to notice grammar and spelling issues.

    Michael Gilchrist
    Database Specialist
    There are 10 types of people in the world, those who understand binary and those that don't. 😀

  • VoldemarG - Wednesday, September 19, 2018 1:26 PM

    Is there any good T-SQL solution that you can recommend for obfuscating a dozen of  Char and Varchar fields in several tables containing sensitive data?  Preferably without using Rand() functions due to restrictions as to where it can be used (cannot be called from within UDFs, for example..)
    We are also thinking about obfuscating a couple of Datetime columns too so that certain processes get safely tested in a testing environment by third-party consultants. Not looking for any free or paid packaged tool but an elegant T-SQL based solution. Cannot find one so far...
           It would also be awesome to be able to replace sensitive values with some meaningful values.  (Like 'Smith, John' to become 'Oppenheimer, Jeffrey', and an address like '13 Karl Marx avenue' to be replaced with some random '950 Blue Ribbon crossing', etc.)

    Thank you,
    and happy holidays!

    Not quite right.  RAND can actually be used in a function by creating a view that uses RAND and the function selects from the view instead.  Same with NEWID().

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Keith Oliver - Thursday, September 20, 2018 7:48 AM

    Take a look at the article by Phil Factor posted today - 

    Pseudonymizing a Database with Realistic Data for Development Work

    .looks so complex...

    Likes to play Chess

  • This was removed by the editor as SPAM

  • sgmunson - Friday, September 21, 2018 6:26 AM

    Michael G - Friday, September 21, 2018 5:42 AM

    I found this obfuscation code somewhere a while back and kept it.
    CREATE VIEW dbo.NewIDForFunction AS
    SELECT checksum(NEWID()) idNew
    GO

    CREATE FUNCTION dbo.RandomLetter() RETURNS CHAR(1) BEGIN
    DECLARE @cReturn CHAR
    SELECT @cReturn=CHAR(ABS(CHECKSUM(idNew))%26+65) FROM NewIDForFunction
    RETURN @cReturn
    END
    GO

    CREATE FUNCTION dbo.RandomVowel_NotY() RETURNS CHAR(1) BEGIN
    DECLARE @cReturn CHAR
    SELECT @cReturn=SUBSTRING('AEIOU',ABS(CHECKSUM(idNew))%5+1,1)
    FROM NewIDForFunction
    RETURN @cReturn
    END
    GO

    CREATE FUNCTION dbo.RandomConsonate() RETURNS CHAR(1) BEGIN
    DECLARE @cReturn CHAR
    SELECT @cReturn=SUBSTRING('BCDFGHJKLMNPQRSTVWXZY',ABS(CHECKSUM(idNew))%20+1,1)
    FROM NewIDForFunction
    RETURN @cReturn
    END
    GO

    CREATE FUNCTION dbo.RandomDigit() RETURNS CHAR(1) BEGIN
    DECLARE @cReturn CHAR
    SELECT @cReturn=CAST(ABS(CHECKSUM(idNew))%10 AS CHAR(1)) FROM NewIDForFunction
    RETURN @cReturn
    END
    GO

    CREATE FUNCTION dbo.RandomReplace_Constrained(@sIn VARCHAR(255))
    RETURNS VARCHAR(255) AS
    BEGIN

    DECLARE @cbIn TINYINT
    DECLARE @sResult VARCHAR(50)
    DECLARE @cbProcessed TINYINT
    DECLARE @sNextChar CHAR(1)
    DECLARE @nNextChar TINYINT

    SET @cbProcessed = 1
    SET @sResult = ''
    SELECT @cbIn=LEN(@sIn)

    WHILE @cbProcessed <= @cbIn BEGIN
    SET @sNextChar = SUBSTRING(@sIn,@cbProcessed,1)
    SET @nNextChar = ASCII(@sNextChar)
    SELECT @sResult = @sResult +
    CASE
    WHEN @nNextChar BETWEEN 48 AND 57 THEN dbo.RandomDigit()
    WHEN @nNextChar IN (65,69,73,79,85) THEN dbo.RandomVowel_NotY()
    WHEN @nNextChar IN (89,121) THEN @sNextChar
    WHEN @nNextChar BETWEEN 65 AND 90 THEN dbo.RandomConsonate()
    WHEN @nNextChar IN (97,101,105,111,117) THEN LOWER(dbo.RandomVowel_NotY())
    WHEN @nNextChar BETWEEN 97 AND 122 THEN LOWER(dbo.RandomConsonate())
    WHEN @nNextChar IN (40,41,32,46) THEN LOWER(dbo.RandomConsonate())
    ELSE @sNextChar
    END
    SET @cbProcessed = @cbProcessed + 1
    END

    RETURN @sResult

    END
    GO

    SELECT [dbo].[RandomReplace_Constrained](100012345) AS ObfuscationInt
        , [dbo].[RandomReplace_Constrained]('Strange Text Result') AS ObfuscationVarchar
    GO

    The results: (will be different for each run):
    ObfuscationInt    ObfuscationVarchar
    571458568    ClhonlerFomzlMafojj

    Unfortunately I can't credit anyone with it, but it is handy for inline SELECT obfuscation, just don't expect to be able to unobfuscate the data...

    Nothing wrong with the code, although I might make some different choices in the final routine..   And I'd also change dbo.Consonate to dbo.Consonant because I can't stand the idea of that kind of thing being spelled incorrectly.

    If we do it this way, we need to loop through each character in the field within the Update Statement, right?

    Likes to play Chess

  • VoldemarG - Wednesday, September 19, 2018 1:26 PM

    Is there any good T-SQL solution that you can recommend for obfuscating a dozen of  Char and Varchar fields in several tables containing sensitive data?  Preferably without using Rand() functions due to restrictions as to where it can be used (cannot be called from within UDFs, for example..)
    We are also thinking about obfuscating a couple of Datetime columns too so that certain processes get safely tested in a testing environment by third-party consultants. Not looking for any free or paid packaged tool but an elegant T-SQL based solution. Cannot find one so far...
           It would also be awesome to be able to replace sensitive values with some meaningful values.  (Like 'Smith, John' to become 'Oppenheimer, Jeffrey', and an address like '13 Karl Marx avenue' to be replaced with some random '950 Blue Ribbon crossing', etc.)

    Thank you,
    and happy holidays!

    You could either:
    SELECT [RandomReplace_Constrained]('13 Karl Marx avenue') as NewColumn
    or
    UPDATE Table
    SET Old_Value = [RandomReplace_Constrained](Old_Value )

    I would rather store the new value in a new table - ideally you should go a step further and retain a lookup between the old value and the new value - then only supply the new value.

    Michael Gilchrist
    Database Specialist
    There are 10 types of people in the world, those who understand binary and those that don't. 😀

  • Nice! Thank you. I appreciate everybody's input on this.

    Likes to play Chess

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply