Storing encypted SSN's

  • Hi all,

    First of all, I am against the use of SSN's as UID; but, such is the way the database has been constructed. This data will be imported frequently; daily truncate and re-import of latest exported data file.

    I'm reading through < THIS > article to see if this is something I can use to store SSN's in my DB. The DB will store this data and will need to be referenced frequently by the SSN as the UID. I think I'd like to have this column encypted and my web application will need to do a lookup based on this value. Is this something where I can use a Deterministic Encryption? How will I do a look-up from the web application? Will it need the encrypted value to use for the look-up instead?

    Alternatively, I've thought of just encrypting the SSN values (MD5), storing these MD5 hashes and doing the lookup by the hash instead.

  • I also came across this article: http://dotnetslackers.com/articles/sql/IntroductionToSQLServerEncryptionAndSymmetricKeyEncryptionTutorial.aspx .

    Does this mean that I should use the encrypted key value as a lookup in the table? I will need something for my web application to use to query for data. And, as I explained, the SSN is the key used to fetch from the db.

  • ram302 (4/25/2016)


    Alternatively, I've thought of just encrypting the SSN values (MD5), storing these MD5 hashes and doing the lookup by the hash instead.

    MD5 is so weak that a google search is enough to break the hash. Using MD5 is a waste of time.

    Always Encrypted (which is a new SQL 2016 feature) may be useful, though you MUST read through it in detail and test carefully. Other forms of encryption (EncryptByKey, EncryptByCert) are non-deterministic, and hence can't be used for joins or searches.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • That first article is for SQL 2016. In that version you will have options like these to explore. Older (i.e. current and before) versions don't support encryption like this out of the box and the app must support it as well. What version of SQL are you intending to use?

  • GilaMonster, so this would call for the use of Deterministic encryption, but also assuming as Jeff.Mason points out, that I'll be using SQL Server 2016.

    Jeff.Mason, I think I'll be using SQL Server 2012 for this project. Should I ask for a 2016 install?

    Is 2016 good for production?

  • SQL 2016 has not yet been released, and at this point you probably are too late to be on any pre-release betas with Microsoft. You would need to wait until RTM, which last I had heard was around June-ish? And at that point it would be up to you whether RTM was too scary to do or not.

    That said, what you want probably isn't going to be easily or cheaply available in earlier versions, and not without either third party solutions or lots of custom code. If it were me, and that were a solid requirement, I would wait a few months for SQL 2016. You can use the CTPs to get familiar with it and start coding now so that your solution will be ready when 2016 is released in a month or two.

  • jeff.mason (4/26/2016)


    You would need to wait until RTM, which last I had heard was around June-ish?

    Probably sooner. RC3, which was released on the 15th was described (on the SQL blog) as the last public release candidate, so we're close. I'm betting sometime in May.

    The RCs are stable enough to download, install in a test environment and start evaluating.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I have heard both May and June and was being conservative. This article also indicates that RTM worries of the past may not be needed, since Azure has been running SQL 2016 code for quite some time and therefore the code should be solid:

    http://sqlmag.com/sql-server-2016/sneak-preview-sql-server-2016-and-cortana-intelligence

    But at this point, if you HAVE a choice to wait and you want encryption, SQL 2016 is going to be great. I have not been this excited about a new release of SQL since 2005 fixed all of the gaps we felt in 2000.

  • Yeah, I'm sold on 2016 now. I found the RC3 download: https://www.microsoft.com/en-us/evalcenter/evaluate-sql-server-2016. So I'll start testing this.

    Thanks guys. As always, very helpful 🙂

  • ram302 (4/25/2016)


    Hi all,

    First of all, I am against the use of SSN's as UID; but, such is the way the database has been constructed. This data will be imported frequently; daily truncate and re-import of latest exported data file.

    I'm reading through < THIS > article to see if this is something I can use to store SSN's in my DB. The DB will store this data and will need to be referenced frequently by the SSN as the UID. I think I'd like to have this column encypted and my web application will need to do a lookup based on this value. Is this something where I can use a Deterministic Encryption? How will I do a look-up from the web application? Will it need the encrypted value to use for the look-up instead?

    Alternatively, I've thought of just encrypting the SSN values (MD5), storing these MD5 hashes and doing the lookup by the hash instead.

    I strongly agree with Gail... MD5 is weak.

    The big question here is, will you ever need to decrypt the SSN?

    I'd also get after whomever is sending you SSNs in a file in clear text. Although the Social Security Administration doesn't require encryption of SSN data according to their website, it's a clear violation of all that is holy for PII and I'd hate to hear about you good folks in the news.

    As a bit of a sidebar, my original Social Security card clearly states that it shall not be used for identification. I know a whole lot has changed since that particular epoch but it shouldn't have.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Always Encrypted works well, but you need a good mechanism for distributing certificates to each client. You can use a GPO or AD scripting, but be aware, if you have a cert issue, you need to distribute new certs to all clients.

    Also, Always Encrypted is for cases where you trust the clients, but not the server. It still works if you trust the server.

    As Gail mentioned, you don't get deterministic encryption any other way. If you have control of your clients' code, you could use a temp symmetric key to protect the data, but that doesn't help you with searching.

    If you trust the server, and want to play with the server side, symmetric/asymmetric encryption, you can do some partial bucket hashing to make queries work better, but still have encryption. I should write something up, but essentially you use a second field that contains a bucket you've calculated from your hash. You use that to limit the number of rows that need to be decrypted, which can speed up searches.

    However, I'd really minimize the searches if possible. If you're doing singleton lookups for users, this probably works well.

  • I will never need to decrypt the SSN, but I will need to add/remove ones as personnel are hired/terminated. Also, the table itself may contain multiple entries for one particular SSN.

    And yes, as soon as this came to my attention, I strongly urged the client to reconsider. He added that they're revamping the system, but someone else will decide whether they want us to wait or to proceed with the current setup and pay for redevelopment later. I advised our management to consider a clause in the contract to indemnify us of any potential mishaps as we have been vocal about our reluctance to use such sensitive data.

  • Steve,

    What I'll be doing is importing data to a newly built database. The searches are absolutely necessary as the web application will be querying for data using the key. I'll be using PHP CodeIgniter, which I've tested with sql server and works ok; this to add security layer with the queries (only select) being executed.

  • ram302 (4/26/2016)


    I will never need to decrypt the SSN, but I will need to add/remove ones as personnel are hired/terminated.

    In that case, don't look at encryption. Just hash the SSN with a strong hashing algorithm and only ever work with the hashed values. Far easier than messing with Always Encrypted.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • ram302 (4/26/2016)


    I will never need to decrypt the SSN, but I will need to add/remove ones as personnel are hired/terminated. Also, the table itself may contain multiple entries for one particular SSN.

    And yes, as soon as this came to my attention, I strongly urged the client to reconsider. He added that they're revamping the system, but someone else will decide whether they want us to wait or to proceed with the current setup and pay for redevelopment later. I advised our management to consider a clause in the contract to indemnify us of any potential mishaps as we have been vocal about our reluctance to use such sensitive data.

    I really glad that someone get's it about SSNs. Thanks for your attention to detail there. They don't know it now but your company should thank you, as well. There are huge costs associated with a breach of such PII not to mention the distrust your customers would suddenly have costing the company customers on top of all the other costs.

    Because you don't actually need to decrypt the SSNs, you might think that you could probably get away with simple "one way" encryption using "hashing" using a much better hashing algorithm than MDF. BUT, because there are only 1 Billion possible SSNs (a relatively small table by today's standards) and whole ranges of those have been declared as "not currently used" by the SSA, I recommend doing a bit of independent research on adding a "salt" to make the use of a "rainbow" table impossible if there ever is a breach.

    That being said, I'm not a security expert nor an encryption Ninja. I can only recommend what you look for and not how to actually do it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 48 total)

You must be logged in to reply to this topic. Login to reply