HASHBYTES

  • L' Eomot Inversé (2/9/2012)


    ...Hashbytes is a hashing function, not an encryption function...

    I would say the question as intended is a useful learning tool, but it was written poorly.

    HASHBYTES does not, and cannot, perform encryption, nor can it decrypt. I see interview candidates conflate encrypting with hashing all the time; they aren't the same. Hashing is mathematically easy to do in one direction (cleartext to hash), and mathematically difficult to do in the other direction (hash to cleartext), theoretically to the point that the only way to get the cleartext from the hash is to guess the cleartext and see if you're right*.

    The SALE vs. SALT typo was also a minor issue, but that's a normal typo.

    *Guessing MD5 or SHA1 can happen, on a single computer, at speeds measured in tens of billions of guesses per second, applied against dictionaries combined with rules (e.g. every word in the dictionary doubled, every word with different case settings, every word in several dialects of 1337 speak, every word with a suffix of every number from 0 to 1000, and so on); even with all of those rules it still doesn't take very long.
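
    The guess-and-check idea described above can be sketched in a few lines. This is only an illustration in Python (its hashlib stands in for HASHBYTES); the word list, the rule set, and the names candidates and crack_sha1 are assumptions made up for the example, and real cracking tools are far faster and use far larger rule sets.

```python
import hashlib
from typing import Iterable, Iterator, Optional


def candidates(words: Iterable[str]) -> Iterator[str]:
    """Expand each dictionary word with a few simple rules (hypothetical rule set)."""
    for word in words:
        for base in (word, word.lower(), word.upper(), word.capitalize()):
            yield base                # the word itself, in this case variant
            yield base + base         # the word doubled
            for n in range(1001):     # numeric suffix 0..1000
                yield f"{base}{n}"


def crack_sha1(target_hex: str, words: Iterable[str]) -> Optional[str]:
    """Return the first candidate whose unsalted SHA-1 matches, or None."""
    for guess in candidates(words):
        if hashlib.sha1(guess.encode("utf-8")).hexdigest() == target_hex:
            return guess
    return None


# A "stolen" unsalted hash and a tiny dictionary (both invented for the demo).
stolen = hashlib.sha1(b"Dragon42").hexdigest()
recovered = crack_sha1(stolen, ["monkey", "dragon", "letmein"])
print(recovered)
```

    The hash is never reversed; the cleartext is found only because it was among the guesses, which is why weak passwords fall quickly.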

  • thanks for the question. another good lesson for me today!

  • Good question - thanks, Steve!


  • Add the sale string

    Good Question, but "Add the sale string" seems to be tricky... 🙂

  • 🙁 never added salt before.

    Alex S
  • Rich Weissler (2/9/2012)


    GPO (2/8/2012)


    The SALE string? This confused me! 😛

    Yeah, I decided SALE had to be a typo for SALT. (If that isn't what happened, someone please yell... I'm still assuming.)

    Correct, the typo has been corrected.

  • AlexSQLForums (2/9/2012)


    🙁 never added salt before.

    Adding a salt in and of itself is merely a device to 1) prevent the trivial identification of the same cleartext (e.g. hash A5BC is present in rows 5, 15, 33, and 114; they have the same cleartext value!) and 2) make precomputed dictionaries more difficult. While these are laudable goals, the second requires significantly long salts (a 1-byte salt, i.e. two hex characters, makes a 10 million word dictionary precomputation take as long as a 2.56 billion word dictionary... which _still_ takes less than a second and almost certainly fits in RAM on a modern consumer machine, for a single MD5/SHA1 round each).

    If you're using this for passwords, see my post in the "Should we outsource identity management" thread for now; but the summary is:

    0) Require difficult to guess cleartexts, particularly for password fields

    1) Salt with a long salt that, for each row, is independently created by a cryptographic random number generator (I recommend at least 64 bits) and stored cleartext.

    2) Run a _lot_ of iterations using a standard iteration algorithm (not available in SQL Server). See my post for some details on the complexity/length vs. iteration tradeoff
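
    Points 1) and 2) above might look roughly like this outside SQL Server. It is a minimal sketch in Python, assuming PBKDF2-HMAC-SHA256 as the iteration algorithm; the 16-byte salt, the iteration count, and the function names hash_password and verify_password are illustrative assumptions, not recommendations taken from the post.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

SALT_BYTES = 16        # assumption: 128-bit salt (the post asks for at least 64 bits)
ITERATIONS = 100_000   # assumption: illustrative count, tune to your hardware


def hash_password(password: str, iterations: int = ITERATIONS) -> Tuple[bytes, int, bytes]:
    """Return (salt, iterations, digest); all three are stored with the row."""
    salt = secrets.token_bytes(SALT_BYTES)   # cryptographic RNG, new salt per row
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest


def verify_password(password: str, salt: bytes, iterations: int, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)


salt, rounds, stored = hash_password("VeryH$ardP2ssword")
print(verify_password("VeryH$ardP2ssword", salt, rounds, stored))
print(verify_password("Easy", salt, rounds, stored))
```

    The salt is stored in cleartext next to the digest; its job is to force a separate attack per row, not to stay secret.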

  • Raghavendra Mudugal (2/9/2012)


    For me this question did not make any sense.

    The sample code is just concatenating another variable to it; you can name it anything from @salt to @sugar... still the sample code will not make sense to me.

    ...

    My only concern is that the question and its answer do not really fit together. I don't think SALT is a technical term here in SQL, so it does not paint a proper picture.

    A salt is an addition to the hash function to change the outcome, not a parameter. I have altered the question to say "salt value" instead of parameter, which may have been too tricky.

    Salt is a concept, not a specific value. The variable name was chosen not to imply @salt is needed, but to mark the idea.

  • Daniel Bowlin (2/9/2012)


    Interesting question, and an interesting bit of reading. However

    That quote comes from a section in which the topic is encryption and encrypted data; SQL Server encryption always uses a random salt.

    How can SQL Server use a random salt and still have reproducible results? Am I getting some concepts confused here?

    If you stick some random bits onto the front of something that's being encrypted using a symmetric cipher whose key schedule is impacted at each point in the encryption by the story so far (e.g. by cipher block chaining) you (a) render dictionary attacks ineffective and (b) ensure that someone who can see rows with an encrypted field can't discover, without doing the decryption, whether two rows both have the same value in that field; but the only effect on decryption is that when you have decrypted the field there are some extra bits on the front that need to be thrown away.

    The salt can of course be random length, so that the same value encrypted twice may have two different lengths, which really gives attackers trouble; doing this of course requires the length of the salt to be encoded in the salt, so that decryption knows how much of the decrypted string to throw away. I have no idea whether SQL Server uses varying length salts or a fixed length (and if I had I wouldn't be telling).

    Of course while screwing up attackers one also screws up performance; there are lots of tradeoffs to be made and serious design to be done if you want really good security at the same time as really good performance.

    Tom

  • michael.kaufmann (2/9/2012)


    I'd second this; it's my understanding that concatenating a fixed string as salt (in Steve's example assigned to a variable) to another string can't be considered a salt parameter, which should be a random value (for increased security). The following query will return the exact same results as Steve's proposed solution in the 'Correct Answer' section of this QotD 😎:

    ...

    I'd say, no matter how many string parts are concatenated, the combined string qualifies as { @input | 'input' } following the HASHBYTES syntax.

    Your code does return the same values, but I'm not sure pre-pending or appending is the issue. I could be wrong, but it seems that length is more important.

    If I do:

    -- Hash the same string with no salt, then with three different salts appended.
    DECLARE @t NVARCHAR(200);
    SELECT @t = N'This is my string';

    SELECT HASHBYTES('SHA1', @t)
    UNION ALL
    SELECT HASHBYTES('SHA1', @t + N'R@nd0mS!a6lTValue')
    UNION ALL
    SELECT HASHBYTES('SHA1', @t + N'R@nd0mS!a6lTValue2342343')
    UNION ALL
    SELECT HASHBYTES('SHA1', @t + N'R@nd0mS!a6lTValuefvsddgdfgfdgdf');

    Adding different values, I get different results. On my machine:

    0xB9A02E529093456D139C69FC5E5D4D825B7EC24B
    0xCDE457DD8AB6C020E9852FE5B6953E02631A2CB2
    0x6872C2C174FD33931D702F321C427D355B28016E
    0x208DBF4BE2F339ED5861258F7854F4A6EAAFBE23

    The idea here would be to use this in your table. So if I have passwords:

    -- Sample table holding cleartext passwords (for illustration only).
    CREATE TABLE dbo.Employees
    ( firstname VARCHAR(50)
    , pwd VARCHAR(200)
    );
    GO

    INSERT dbo.Employees
    VALUES ('Steve', 'Easy')
    , ('Bob', 'H@rder')
    , ('Andy', 'VeryH$ardP2ssword');

    I could hash these as

    SELECT HASHBYTES('SHA1', pwd)
    FROM dbo.Employees;

    However, the same password always produces the same hash and, more important, I could copy the hash value from Steve's row to Andy's row and then log in as Andy with Steve's password.

    However I can salt these to make this type of attack less of an issue:

    SELECT HASHBYTES('SHA1', pwd + firstname)
    FROM dbo.Employees;

    That's simple, and potentially an attacker can still go through all columns in the table, appending and prepending values, but I can make it harder with something like:

    SELECT HASHBYTES('SHA1', 'R@nd0m' + pwd + firstname)
    FROM dbo.Employees;

    In this case, without access to the code, it becomes hard to determine what the input values for the hash function are.
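
    The copy-the-hash attack and the effect of salting with firstname can be walked through as a small simulation. This is a sketch in Python rather than T-SQL (hashlib.sha1 stands in for HASHBYTES('SHA1', ...)); the dictionaries acting as tables and the helper names sha1_hex, login_unsalted and login_salted are assumptions for the example.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """SHA-1 of the UTF-8 bytes of text, as a hex string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


employees = {"Steve": "Easy", "Bob": "H@rder", "Andy": "VeryH$ardP2ssword"}

# Two stand-in "tables": one stores SHA1(pwd), the other SHA1(pwd + firstname).
unsalted = {name: sha1_hex(pwd) for name, pwd in employees.items()}
salted = {name: sha1_hex(pwd + name) for name, pwd in employees.items()}


def login_unsalted(table: dict, name: str, attempt: str) -> bool:
    return table[name] == sha1_hex(attempt)


def login_salted(table: dict, name: str, attempt: str) -> bool:
    return table[name] == sha1_hex(attempt + name)


# The attack: overwrite Andy's stored hash with Steve's, then log in as Andy
# using Steve's password.
unsalted["Andy"] = unsalted["Steve"]
salted["Andy"] = salted["Steve"]

copy_works_unsalted = login_unsalted(unsalted, "Andy", "Easy")
copy_works_salted = login_salted(salted, "Andy", "Easy")
print(copy_works_unsalted, copy_works_salted)
```

    With the firstname mixed in, the copied value no longer matches what the login check computes for Andy, so the simple copy fails.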

  • Darn, talked myself out of the correct answer.

    http://brittcluff.blogspot.com/

  • Steve Jones - SSC Editor (2/9/2012)


    The idea here would be to use this in your table. So if I have passwords:

    -- Sample table holding cleartext passwords (for illustration only).
    CREATE TABLE dbo.Employees
    ( firstname VARCHAR(50)
    , pwd VARCHAR(200)
    );
    GO

    INSERT dbo.Employees
    VALUES ('Steve', 'Easy')
    , ('Bob', 'H@rder')
    , ('Andy', 'VeryH$ardP2ssword');

    I could hash these as

    SELECT HASHBYTES('SHA1', pwd)
    FROM dbo.Employees;

    However, the same password always produces the same hash and, more important, I could copy the hash value from Steve's row to Andy's row and then log in as Andy with Steve's password.

    However I can salt these to make this type of attack less of an issue:

    SELECT HASHBYTES('SHA1', pwd + firstname)
    FROM dbo.Employees;

    That's simple, and potentially an attacker can still go through all columns in the table, appending and prepending values, but I can make it harder with something like:

    SELECT HASHBYTES('SHA1', 'R@nd0m' + pwd + firstname)
    FROM dbo.Employees;

    In this case, without access to the code, it becomes hard to determine what the input values for the hash function are.

    Note that while it's harder, this example uses a low-cardinality salt. If I take my 10 million word password dictionary and multiply it by my 10,000 word firstname list, I end up with a 100 billion word dictionary. Were I to precalculate it, it would take up on the order of 2TB - we'll call it a single 2TB hard drive; pretty cheap. To compute them, I can use modern software on a set of GPUs in a single machine, and run through all 100 billion possibilities in less than 10 seconds (plus the amount of time to write them to the disk).

    Better is a fully random salt of large enough length (64+ random bits), so as to make these blind precalculation attacks impractical; instead, each attack will need to apply salt+hash for every word in the dictionary separately; still at over 15 billion tries per second, but at least it has to be retried on each row.
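
    The arithmetic behind the precalculation estimate can be checked with a quick back-of-envelope script. The dictionary sizes and guess rate are the figures quoted in the post; storing only the 20-byte SHA-1 digest per entry is an assumption made here to reproduce the 2TB figure.

```python
# Back-of-envelope check of the precomputation estimate.
PASSWORD_WORDS = 10_000_000           # password dictionary size from the post
FIRSTNAME_WORDS = 10_000              # firstname list size from the post
SHA1_BYTES = 20                       # a SHA-1 digest is 160 bits
GUESSES_PER_SECOND = 15_000_000_000   # guess rate quoted in the post

# Every password combined with every firstname "salt".
combined_entries = PASSWORD_WORDS * FIRSTNAME_WORDS

# Assumption: only the digest is stored for each entry.
storage_tb = combined_entries * SHA1_BYTES / 1e12
seconds_to_compute = combined_entries / GUESSES_PER_SECOND

# A blind precomputation against a 64-bit random salt has to cover every
# possible salt value for every dictionary word.
random_salt_entries = PASSWORD_WORDS * 2**64

print(f"{combined_entries:,} entries, about {storage_tb:.1f} TB, "
      f"about {seconds_to_compute:.1f} s to compute")
print(f"64-bit random salt: about {random_salt_entries:.2e} entries to precompute")
```

    The firstname salt only multiplies the work by 10,000, while a 64-bit random salt multiplies it by 2^64, which is what pushes the attacker back to attacking each row separately.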

  • Nadrek (2/9/2012)


    1) Salt with a long salt that, for each row, is independently created by a cryptographic random number generator (I recommend at least 64 bits) and stored cleartext.

    I would be unhappy with 64 bits; basically I think of the salt length as being comparable to key length for symmetric encryption, and for symmetric encryption I choose random salts of at least the key length. For a high security system I would be distinctly unhappy with a salt shorter than 128 bits, just as I would be distinctly unhappy with a symmetric key shorter than 128 bits. In fact I would go for a salt length the same as the hash output length (160 bits for SHA1, if - unlikely for high security and long duration - I thought SHA1 adequate).

    2) Run a _lot_ of iterations using a standard iteration algorithm (not available in SQL Server). See my post for some details on the complexity/length vs. iteration tradeoff

    Not too sure about standard iteration algorithms; but I'm a lot more than a decade behind the state of the art, so maybe there are some good standard iteration algorithms that I don't know about; anyway, by whatever method one has to achieve an adequately slow hash. Assuming that a required slowness is stored along with the salt, that the hash has a required slowness parameter, and that the login mechanism can have write access to the store, slowness can be increased without user visibility at next login, and that is highly desirable.

    Tom
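
    The idea of storing the required slowness next to the salt and raising it quietly at the next login could look something like the sketch below. It is written in Python and assumes PBKDF2-HMAC-SHA256 with the iteration count as the slowness parameter; the Credential record and the enroll and login functions are hypothetical names for illustration.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass


@dataclass
class Credential:
    """What the store keeps per user: salt, slowness parameter, and digest."""
    salt: bytes
    iterations: int
    digest: bytes


def derive(password: str, salt: bytes, iterations: int) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def enroll(password: str, iterations: int) -> Credential:
    salt = secrets.token_bytes(16)
    return Credential(salt, iterations, derive(password, salt, iterations))


def login(cred: Credential, password: str, target_iterations: int) -> bool:
    """Verify with the stored parameters; on success, upgrade if they are too weak."""
    attempt = derive(password, cred.salt, cred.iterations)
    if not hmac.compare_digest(attempt, cred.digest):
        return False
    if cred.iterations < target_iterations:
        # The cleartext is only available now, so this is the moment to re-hash.
        cred.salt = secrets.token_bytes(16)
        cred.iterations = target_iterations
        cred.digest = derive(password, cred.salt, target_iterations)
    return True


cred = enroll("H@rder", iterations=1_000)               # enrolled under an old policy
print(login(cred, "H@rder", target_iterations=5_000))   # succeeds and upgrades
print(cred.iterations)
```

    This needs the login path to have write access to the credential store, as the post notes, and the user never sees the change.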

  • Nice question, had no idea.

  • L' Eomot Inversé (2/9/2012)


    I would be unhappy with 64 bits; basically I think of the salt length as being comparable to key length for symmetric encryption, and for symmetric encryption I choose random salts of at least the key length. For a high security system I would be distinctly unhappy with a salt shorter than 128 bits, just as I would be distinctly unhappy with a symmetric key shorter than 128 bits. In fact I would go for a salt length the same as the hash output length (160 bits for SHA1, if - unlikely for high security and long duration - I thought SHA1 adequate).

    2) Run a _lot_ of iterations using a standard iteration algorithm (not available in SQL Server). See my post for some details on the complexity/length vs. iteration tradeoff

    Not too sure about standard iteration algorithms;...

    I almost always agree with longer is better (assuming equal randomness); 64 bits of random salt (nonce) probably puts it in the "a couple of decades" protection level category, which is good for some things, and poor for others.

    SHA-1 is not recommended anymore - it's considered to have collision complexity on the order of 2^52 (52 bits), per this Eurocrypt 2009 presentation, by Cameron McDonald, Philip Hawkes and Josef Pieprzyk, and no more than 80 bits of "security", per the U.S. NIST SP800-131A.

    Standard iteration algorithm: PBKDF2 (Password-Based Key Derivation Function 2, from PKCS #5 v2.0) is a specific algorithm for iterating a hash many times to derive a key from a password; see my post in the Should we outsource identity management thread for more detail and some links to specifications and implementations.
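
    To show what the iteration in PBKDF2 actually does, here is a sketch in Python that computes the first output block by hand with HMAC and compares it with the standard library's hashlib.pbkdf2_hmac. The function name pbkdf2_first_block, the choice of SHA-256, and the sample inputs are assumptions for illustration; real code should simply call a vetted implementation.

```python
import hashlib
import hmac


def pbkdf2_first_block(password: bytes, salt: bytes, iterations: int,
                       hash_name: str = "sha256") -> bytes:
    """First PBKDF2 block: U1 = HMAC(P, salt || 1), Uj = HMAC(P, Uj-1), T = U1 ^ ... ^ Uc."""
    u = hmac.new(password, salt + (1).to_bytes(4, "big"), hash_name).digest()
    t = int.from_bytes(u, "big")
    for _ in range(iterations - 1):
        u = hmac.new(password, u, hash_name).digest()   # each round feeds the last
        t ^= int.from_bytes(u, "big")                   # XOR all rounds together
    return t.to_bytes(len(u), "big")


password = b"VeryH$ardP2ssword"
salt = bytes.fromhex("00112233445566778899aabbccddeeff")   # fixed salt, demo only
by_hand = pbkdf2_first_block(password, salt, 1_000)
library = hashlib.pbkdf2_hmac("sha256", password, salt, 1_000)
print(by_hand == library)
```

    Because every round depends on the previous one, the work cannot be skipped, and the attacker pays the iteration count on every single guess.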

Viewing 15 posts - 31 through 45 (of 64 total)
