how to one-way encode sensitive data for display?

  • I'm working on a project where the source data (out of my control) uses social security numbers to uniquely identify people. I don't even want to import SSN's into my database (in my control), let alone include them on reports (as is currently happening). But I need some way to be able to correlate an SSN in the source data with a [whatever value I use] in my own data. 

    So, I could maybe do something like hash the SSN's during the ETL step and store the encoded values in my database. But then I wind up with a huge 64-character or 128-character string. That's fine behind the scenes, but I also need some reasonable way to display that value on reports. It would be nice if I could have some kind of YouTube-like format, 10 digits or so of human-readable characters. 

    Any thoughts on how to do a one-way hash of a SSN (fixed 9 digit length) that results in a short-ish value without totally compromising security?

    Or is the only option to store the full 64/128-character hashed value and then map it to some other YouTube-like unique identifier that I'd have to create for each record using perhaps a homemade function or something? I'd prefer not to use an auto-incrementing number, because I know that common wisdom is not to display those to end-users ("why do I see #100 and #102 but not #101???"). SQL Server doesn't seem to have anything other than GUID (also too long) built in, so my function would have to create a random string of letters and numbers and make sure it has not already been used in the table. Seems annoying and kinda hoopty. 🙁


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • You could use a SEQUENCE, or if that is not available, insert the SSN into table with an identity and use the identity value as a link back to the SSN.

    Typically on reports you'd show AT MOST the last 4 digits of SSN, and even that's not a good idea.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Yes, my concern with an incrementing, numeric, identity-type column is that I know it's generally discouraged to make those client-facing. And what I specifically need here is a reasonable-size value that will be client-facing. 

    It would be great if there was a way to repeatably one-way convert a SSN into a 10-digit alphanumeric value. Any ideas along those lines?


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • autoexcrement - Friday, August 3, 2018 1:41 PM

    Yes, my concern with an incrementing, numeric, identity-type column is that I know it's generally discouraged to make those client-facing. And what I specifically need here is a reasonable-size value that will be client-facing. 

    It would be great if there was a way to repeatably one-way convert a SSN into a 10-digit alphanumeric value. Any ideas along those lines? 

    It'd need to be unique for each ssn,  which would allow it to be breakable. Don't see why the client can't see an identity in this specific case -- general rules are just that, general, known to deserve breaking at times.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • I definitely appreciate your feedback. Maybe I am just overthinking it. Thanks!


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • autoexcrement - Friday, August 3, 2018 2:08 PM

    I definitely appreciate your feedback. Maybe I am just overthinking it. Thanks!

    You need to NOT try to build your own decoder ring on this one.  Even using something like hashbytes with the SHA2_512 algorithm won't protect you if someone gets ahold of the data because there are only 10 billion possible SSNs and even just one CPU server can hack all those in just a day or two using rainbow tables, etc.  Someone with a rack of GPUs can do it in sub-second times.  You need to use properly salted encryption with a master encryption key and the whole 9 yards for this stuff and then, yes, you can create a surrogate integer value for the encrypted data.

    And make damned sure you have the key(s) safely protected in a physical safe somewhere because if you lose the key, your data is toast and so are you.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • p.s. And, as a fellow data professional, thank you VERY much for trying to protect people.  Now all you have to do is turn in the side-stepping beach creatures that are not only transmitting SSNs in unsecured clear text but you'll also need to make sure that no one has stored the files you receive or backed them up in such a fashion that the "data at rest" isn't usable by someone trying to make you read about yourself and the company you work for in the morning news.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks as always for weighing in Jeff. I've read up on salted hashed values before, so conceptually it makes perfect sense to me. But I've never actually implemented anything like that before, so I'm still kinda fuzzy on the wheres and hows of storing the "salt" or "master encryption key". Do you have a favorite methodology for that? Or a link to any articles that outline it in a way that you agree with?


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • An option would be to use an encrypted column for the SSN  and if you need 10 digits, then use a seeded bigint identity column as a key.
    😎

  • autoexcrement - Friday, August 3, 2018 8:23 PM

    Thanks as always for weighing in Jeff. I've read up on salted hashed values before, so conceptually it makes perfect sense to me. But I've never actually implemented anything like that before, so I'm still kinda fuzzy on the wheres and hows of storing the "salt" or "master encryption key". Do you have a favorite methodology for that? Or a link to any articles that outline it in a way that you agree with?

    Ah... my apologies for not including a link. :blush:  Eirikur provided good cover for the very same link I would have pointed you to.  Thanks, Eirikur.:exclamationmark:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Saturday, August 4, 2018 9:36 AM

    autoexcrement - Friday, August 3, 2018 8:23 PM

    Thanks as always for weighing in Jeff. I've read up on salted hashed values before, so conceptually it makes perfect sense to me. But I've never actually implemented anything like that before, so I'm still kinda fuzzy on the wheres and hows of storing the "salt" or "master encryption key". Do you have a favorite methodology for that? Or a link to any articles that outline it in a way that you agree with?

    Ah... my apologies for not including a link. :blush:  Eirikur provided good cover for the very same link I would have pointed you to.  Thanks, Eirikur.:exclamationmark:

    You are welcome Jeff, and you would have done the same 😉
    😎

  • For storing of SSN, you might consider adding a check digit to the value, so that it does become 10 digits (or 11 or whatever).  That way they are trying to hack into the wrong number of digits, which should slow down the process a bit at least.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher - Monday, August 6, 2018 7:43 AM

    For storing of SSN, you might consider adding a check digit to the value, so that it does become 10 digits (or 11 or whatever).  That way they are trying to hack into the wrong number of digits, which should slow down the process a bit at least.

    If it's stored that way in plain text, a hacker will eat that up in less than a second.  The only right way to protect such things is with properly salted encryption.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Monday, August 6, 2018 8:00 AM

    ScottPletcher - Monday, August 6, 2018 7:43 AM

    For storing of SSN, you might consider adding a check digit to the value, so that it does become 10 digits (or 11 or whatever).  That way they are trying to hack into the wrong number of digits, which should slow down the process a bit at least.

    If it's stored that way in plain text, a hacker will eat that up in less than a second.  The only right way to protect such things is with properly salted encryption.

    Not talking about plain text, but encrypting a value with extra digit(s) added.  Could even add random digits in fixed locations, just to make it harder for someone to know how many digits they were trying to hack.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • I still just don't fully understand how salts and the like are supposedly so helpful, because it seems like they have to be stored somewhere. Either in code or in an environment variable or somewhere. Why is there an assumption that that's safe when a database isn't?


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply