Hashing data in existing schema

  • Greetings good folks!

    I have an unusual request. I have a table in a database for customers which has the usual personal details columns and we would like to hash those out. The original values we want to keep in another database/table which can be secured better.

    However, the trick is that I cannot change the datatype of the existing fields to varbinary as SQL does to the value whenever it does any hashing or encryption. I want the original fields to remain as varchar if not then it might break some 3rd party applications. I don't care of those 3rd party apps report junk instead of the actual data...we can live with that.

    I am planning to use MD5.....

    Can anyone here help me with this issue?

    Thanks in advance!

    Anish

  • How are you moving the data to the "more secure database"? Are you doing parallel inserts? Or are you inserting, then moving? Or some sort of "instead" trigger?

    Whatever you're doing, you'll just need to store the hash as character data, instead of binary data. But "0x1AD6" can be stored as a varchar and then converted when you want to reverse the hash, if you want to. It just means using something other than SQL Server's encryption for it. You'll have to do it with a custom piece of code. That shouldn't be too hard to achieve.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks! GSquared

    Yes, we are hoping that a trigger would do what we want to achieve and use md5 hashing for the purpose.

    Regards

    Anish

  • Can you build a .NET DLL that will do the hashing for you? If so, you can use that in a CLR trigger in SQL 2005/2008. That should accomplish what you need. Or call a CLR UDF from a T-SQL trigger, whichever is easier.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • you cannot change the datatype, but can you change the size?

    for example, a 8 letter password using AES requires 40 characters to store the hashed value...

    so if you cannot change the size of the field, you might run into trouble, unless you use a substitution cyper instead of encryption.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • I would like to avoid changing the datasize as far as possible. The issue is that it is a 3rd party app which we maintain....so, if we break the app.....we will need to fix it too 🙂

    Here are the fields that we would like to hash/hide/encrypt

    Fname varchar(30)

    Lname varchar(30)

    Address varchar(180)

    City varchar(100)

    State varchar(2)

    Zip varchar(15)

    HomePhone varchar(25)

    BusinessPhone varchar(25)

    Fax varchar(25)

    Now, I understand that MD5 is 32 characters long....so, I can only use MD5 for the Address and City field. I was thinking of using CRC-32 for the others where I have field size limitations....but, then CRC-32 is easier to crack. Somewhere else I found that I could use 64bit DES which would generate a hash with special characters 12 bytes long. Are there any more suggestions?

    Before implementing this solution we will need to do some data sanitization...i.e. trimming blanks and converting the fields to either upper or lower case so the hashes match. Also, I wont be able to use hashbytes so, I believe a UDF would be the best approach? Please advise.

    Thanks Gsquared & Lowell for your help

    Regards

    Anish

  • Of course, you could always have some fun with it...

    Change the collation to Hebrew_CI_AI, Cyrillic_General_CI_AI, or Japanese_CS_AS.

    Use a replacement cypher that only swaps to characters at CHAR(500) +...


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Hello Folks!

    So, taking suggestions from here, I am trying to update the customers table which has over 66 columns and almost a million rows. We have decided to use the following method to hash out the data:

    SUBSTRING(master.dbo.fn_varbintohexstr(HashBytes('MD5', tbc1.NameFirst)), 3, 30)

    and so on and so forth for some of the other fields. As I have mentioned earlier in another post there is going to be another database which will hold the real data, "hidden" from the main application.

    The PK on the said table i.e. CustomerID is a char(10) and I am now looking for the safest method to run the update without locking the resource(s) too much. Can anyone please suggest a method to do an update in batches of 10000 or so but, I should be able to stop and resume the job from the last row updated if required. Bearing in mind that I want to avoid adding/modifying any schema changes on the Customer table?

    Thanks in advance for your help.

    Regards

    Anish

  • anish, it looks to me like you are truncating your value by at least a couple of characters..., so you would never be able to compare a hash to the correct hash value:

    /*

    0xC08A666033900CFF914346EA6442934A

    0xc08a666033900cff914346ea6442934a

    c08a666033900cff914346ea644293

    */

    SELECT

    HashBytes('MD5', 'Lowell'),

    master.dbo.fn_varbintohexstr(HashBytes('MD5', 'Lowell')),

    SUBSTRING(master.dbo.fn_varbintohexstr(HashBytes('MD5', 'Lowell')), 3, 30)

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Hi Lowell

    The hashing is just being used to obscure the data. So, I will not be doing any string matches....

    Anything that needs to use the customers personal details will be using the other "hidden" database.

    🙂

    I know it's unusual.....but that's the way we want to do it.

    Cheers!

    Anish

  • do you also need to "decrypt" these columns ?

    If you answer is "No" or "I don't care rubbish is published", just put blancs or NULL in your columns, or "censored" :discuss:

    Another thing I've seen is create a new table where you save the to be encrypted data. Make sure to include the PK of your source table and an identity column or an uniqueidentifier with newid()

    replace the original value in the original table with the identity or uniqueidentifier. No more need to enlarge the columns:w00t:

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Thanks ALZDBA 😉

    ALZDBA (10/27/2010)


    do you also need to "decrypt" these columns ?

    If you answer is "No" or "I don't care rubbish is published", just put blancs or NULL in your columns, or "censored" :discuss:

    Another thing I've seen is create a new table where you save the to be encrypted data. Make sure to include the PK of your source table and an identity column or an uniqueidentifier with newid()

    replace the original value in the original table with the identity or uniqueidentifier. No more need to enlarge the columns:w00t:

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply