Need to convert lowercase characters to 'a' and uppercase to 'A'

  • Hi,
    I have a requirement to anonymise the data in table by replacing all the lowercase characters to 'a' and uppercase to 'A'. I am not sure if this can be achieved given the sql server case insensitivity?
    Any thoughts on this?

  • srikanthrv - Friday, January 19, 2018 6:24 AM

    Hi,
    I have a requirement to anonymise the data in table by replacing all the lowercase characters to 'a' and uppercase to 'A'. I am not sure if this can be achieved given the sql server case insensitivity?
    Any thoughts on this?

    This is easy, have a look at these scripts
    😎
    Ping back if you have any problem converting those to produce what you need.

  • This should get you started
    😎

    USE TEEST;
    GO
    SET NOCOUNT ON;

    DECLARE @pString VARCHAR(8000) = 'This Is My Camel Case Stuff With 12345678 Numbers In It, innit?';
    ;WITH
    E1(N) AS
    (
    SELECT N
    FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))X(N)
    ),
    iTally(N) AS
    (
    SELECT TOP (LEN(ISNULL(@pString,CHAR(32))))
      (CHECKSUM(ROW_NUMBER() OVER (ORDER BY (SELECT NULL))))
    FROM E1 a CROSS JOIN E1 b CROSS JOIN E1 c CROSS JOIN E1 d
    )
    SELECT AlphaAaNumericOnly =
    (
    SELECT
      CASE
       WHEN ASCII(SUBSTRING(@pString,N,1)) = 32 THEN CHAR(32)
       WHEN ((ASCII(SUBSTRING(@pString,N,1)) - 48) & 0x7FFF) < 10 THEN SUBSTRING(@pString,N,1)
       WHEN ((ASCII(SUBSTRING(@pString,N,1)) - 65) & 0x7FFF) < 26 THEN 'A'
       WHEN ((ASCII(SUBSTRING(@pString,N,1)) - 97) & 0x7FFF) < 26 THEN 'a'
       ELSE ''
      END
    FROM iTally
    WHERE
       ASCII(SUBSTRING(@pString,N,1)) = 32
    OR ((ASCII(SUBSTRING(@pString,N,1)) - 48) & 0x7FFF) < 10
    OR ((ASCII(SUBSTRING(@pString,N,1)) - 65) & 0x7FFF) < 26
    OR ((ASCII(SUBSTRING(@pString,N,1)) - 97) & 0x7FFF) < 26
    FOR XML PATH(''), TYPE
    ).value('(./text())[1]','VARCHAR(8000)');

    Output
    Aaaa Aa Aa Aaaaa Aaaa Aaaaa Aaaa 12345678 Aaaaaaa Aa Aa aaaaa

  • srikanthrv - Friday, January 19, 2018 6:24 AM

    Hi,
    I have a requirement to anonymise the data in table by replacing all the lowercase characters to 'a' and uppercase to 'A'. I am not sure if this can be achieved given the sql server case insensitivity?
    Any thoughts on this?

    I do hope that you're actually kidding or that this is just a simple example of what the larger and more complex task will be because changing lower case to upper is NOT a form of anonymising anything.   I'll also state that simple character replacement isn't a valid form of anonymising anything either because there's software out there that can figure out your magic decoder ring in seconds.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Friday, January 19, 2018 9:59 AM

    srikanthrv - Friday, January 19, 2018 6:24 AM

    Hi,
    I have a requirement to anonymise the data in table by replacing all the lowercase characters to 'a' and uppercase to 'A'. I am not sure if this can be achieved given the sql server case insensitivity?
    Any thoughts on this?

    I do hope that you're actually kidding or that this is just a simple example of what the larger and more complex task will be because changing lower case to upper is NOT a form of anonymising anything.   I'll also state that simple character replacement isn't a valid form of anonymising anything either because there's software out there that can figure out your magic decoder ring in seconds.

    +100

    ...

  • Jeff Moden - Friday, January 19, 2018 9:59 AM

    srikanthrv - Friday, January 19, 2018 6:24 AM

    Hi,
    I have a requirement to anonymise the data in table by replacing all the lowercase characters to 'a' and uppercase to 'A'. I am not sure if this can be achieved given the sql server case insensitivity?
    Any thoughts on this?

    I do hope that you're actually kidding or that this is just a simple example of what the larger and more complex task will be because changing lower case to upper is NOT a form of anonymising anything.   I'll also state that simple character replacement isn't a valid form of anonymising anything either because there's software out there that can figure out your magic decoder ring in seconds.

    He's not swapping lower and uppercase, he's mapping EVERY letter to 'a' while preserving case.  Also, this is essentially a HASH, not a simple character replacement, so I don't think it is going to be as easy to crack as you suggest.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Monday, January 22, 2018 7:38 AM

    Jeff Moden - Friday, January 19, 2018 9:59 AM

    srikanthrv - Friday, January 19, 2018 6:24 AM

    Hi,
    I have a requirement to anonymise the data in table by replacing all the lowercase characters to 'a' and uppercase to 'A'. I am not sure if this can be achieved given the sql server case insensitivity?
    Any thoughts on this?

    I do hope that you're actually kidding or that this is just a simple example of what the larger and more complex task will be because changing lower case to upper is NOT a form of anonymising anything.   I'll also state that simple character replacement isn't a valid form of anonymising anything either because there's software out there that can figure out your magic decoder ring in seconds.

    He's not swapping lower and uppercase, he's mapping EVERY letter to 'a' while preserving case.  Also, this is essentially a HASH, not a simple character replacement, so I don't think it is going to be as easy to crack as you suggest.

    Drew

    Yeah... I misread the hell out of that one.  Thanks, Drew.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Eirikur Eiriksson - Friday, January 19, 2018 8:04 AM

    This should get you started
    😎

    USE TEEST;
    GO
    SET NOCOUNT ON;

    DECLARE @pString VARCHAR(8000) = 'This Is My Camel Case Stuff With 12345678 Numbers In It, innit?';
    ;WITH
    E1(N) AS
    (
    SELECT N
    FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0))X(N)
    ),
    iTally(N) AS
    (
    SELECT TOP (LEN(ISNULL(@pString,CHAR(32))))
      (CHECKSUM(ROW_NUMBER() OVER (ORDER BY (SELECT NULL))))
    FROM E1 a CROSS JOIN E1 b CROSS JOIN E1 c CROSS JOIN E1 d
    )
    SELECT AlphaAaNumericOnly =
    (
    SELECT
      CASE
       WHEN ASCII(SUBSTRING(@pString,N,1)) = 32 THEN CHAR(32)
       WHEN ((ASCII(SUBSTRING(@pString,N,1)) - 48) & 0x7FFF) < 10 THEN SUBSTRING(@pString,N,1)
       WHEN ((ASCII(SUBSTRING(@pString,N,1)) - 65) & 0x7FFF) < 26 THEN 'A'
       WHEN ((ASCII(SUBSTRING(@pString,N,1)) - 97) & 0x7FFF) < 26 THEN 'a'
       ELSE ''
      END
    FROM iTally
    WHERE
       ASCII(SUBSTRING(@pString,N,1)) = 32
    OR ((ASCII(SUBSTRING(@pString,N,1)) - 48) & 0x7FFF) < 10
    OR ((ASCII(SUBSTRING(@pString,N,1)) - 65) & 0x7FFF) < 26
    OR ((ASCII(SUBSTRING(@pString,N,1)) - 97) & 0x7FFF) < 26
    FOR XML PATH(''), TYPE
    ).value('(./text())[1]','VARCHAR(8000)');

    Output
    Aaaa Aa Aa Aaaaa Aaaa Aaaaa Aaaa 12345678 Aaaaaaa Aa Aa aaaaa

    Worked like a charm! Thanks!

  • srikanthrv - Friday, January 19, 2018 6:24 AM

    Hi,
    I have a requirement to anonymise the data in table by replacing all the lowercase characters to 'a' and uppercase to 'A'. I am not sure if this can be achieved given the sql server case insensitivity?
    Any thoughts on this?

    One other thought - you might want to consider transforming digits as well, particularly if you're transforming free-text fields that might contain phone numbers, birthdates, SSNs etc.

    Thomas Rushton
    blog: https://thelonedba.wordpress.com

  • ThomasRushton - Monday, January 22, 2018 8:28 AM

    srikanthrv - Friday, January 19, 2018 6:24 AM

    Hi,
    I have a requirement to anonymise the data in table by replacing all the lowercase characters to 'a' and uppercase to 'A'. I am not sure if this can be achieved given the sql server case insensitivity?
    Any thoughts on this?

    One other thought - you might want to consider transforming digits as well, particularly if you're transforming free-text fields that might contain phone numbers, birthdates, SSNs etc.

    Yes..all the numbers are being replaced by 1s.

  • srikanthrv - Monday, January 22, 2018 8:38 AM

    ThomasRushton - Monday, January 22, 2018 8:28 AM

    srikanthrv - Friday, January 19, 2018 6:24 AM

    Hi,
    I have a requirement to anonymise the data in table by replacing all the lowercase characters to 'a' and uppercase to 'A'. I am not sure if this can be achieved given the sql server case insensitivity?
    Any thoughts on this?

    One other thought - you might want to consider transforming digits as well, particularly if you're transforming free-text fields that might contain phone numbers, birthdates, SSNs etc.

    Yes..all the numbers are being replaced by 1s.

    There could be a serious problem with that.... If any of those columns are a PK, it won't work.  Also (and I don't know that you have), if you have SSNs in clear text, you'll be in a world of hurt if you ever get audited or you ever suffer a breach.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Stupid question maybe, but why bother? If you're converting it to junk data, then why not just convert to junk data, and stuff "lorem ipsum" into it or similar? Why run the risk of someone backwards-engineering as Jeff mentioned?

    -------------------------------------------------------------------------------------------------------------------------------------
    Please follow Best Practices For Posting On Forums to receive quicker and higher quality responses

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply