Hashing column using SHA256

  • I've got a table with one ID column (10 chars long). I'd like to create a new table with 2 columns:     ID | SHA256 (hashed ID)

    I know in SSIS-2012  I can use:

    HASHBYTES('SHA2_256', 'abc')

    and it will produce a new hashed value, but only for a variable or a string.

    My questions:

    • How can I  do the same, but for a whole column, in one go (which might have potentially millions of rows) ?
    • I might need to some processing in the column (e.g. deleting dashes '-', blank spaces, ...). Assuming then there is a way to run the hashing as a T-SQL query:  what is better for cleaning + hashing: a stored procedure, or a T-SQL query? The idea is not to overcomplicate the query

    Thanks in advance,       p.

  • variable, string OR [COLUMN NAME] of type varchar/nvarchar  is accepted for the parameter!
    so if you choose a column, or the converted value of a column that is NOT a string, you are good to go.
    SELECT
    HASHBYTES('SHA2_256',CONVERT(varchar(30),object_id)),
    HASHBYTES('SHA2_256',name),
    * from sys.columns

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thanks if that's the case !

    But when I checked the HASHBYTES reference it seemed the entry is only varchar, nvarchar or varbinary.  Maybe I misunderstood it.

  • sure the final result must be one of those data types....but you can convert your columns to the acceptable data types and you are all set.
    and anyplace they say variable, you can infer a column name is acceptable too.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Apologies all working fine.

    I'm still in doubt re cleaning certain characters (' ', '-', maybe others). Should I do that in a single SQL command (e.g. REPLACE) or do I need a series of them, one for each different character?

    Would then an SP be better, or T-SQL?

  • Hashbytes is a varbinary(8000) return value, so you need a larger column, and then you are all set.
    you do NOT need to filter or edit your columns for dashes or anything...the data is the data, so just hash the data without fiddling with it.
    https://docs.microsoft.com/en-us/sql/t-sql/functions/hashbytes-transact-sql

    Return Value

    varbinary (maximum 8000 bytes)

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Lowell - Monday, July 17, 2017 10:02 AM

    Hashbytes is a varbinary(8000) return value, so you need a larger column, and then you are all set.
    you do NOT need to filter or edit your columns for dashes or anything...the data is the data, so just hash the data without fiddling with it.
    https://docs.microsoft.com/en-us/sql/t-sql/functions/hashbytes-transact-sql

    Return Value

    varbinary (maximum 8000 bytes)

    That is the maximum, but the number of bytes returned is based on the hashing algorithm used:
    The output conforms to the algorithm standard: 128 bits (16 bytes) for MD2, MD4, and MD5; 160 bits (20 bytes) for SHA and SHA1; 256 bits (32 bytes) for SHA2_256, and 512 bits (64 bytes) for SHA2_512.

  • Lynn Pettis - Monday, July 17, 2017 10:48 AM

    Lowell - Monday, July 17, 2017 10:02 AM

    That is the maximum, but the number of bytes returned is based on the hashing algorithm used:
    The output conforms to the algorithm standard: 128 bits (16 bytes) for MD2, MD4, and MD5; 160 bits (20 bytes) for SHA and SHA1; 256 bits (32 bytes) for SHA2_256, and 512 bits (64 bytes) for SHA2_512.

    I'm somewhat confused now. I wanted the SHA256 to give an ASCII output (see this previous thread), because I need the output to be matched against the output of other application. 
    When I use a variable (see ditto post) I can get an ASCII output. However, if I run HASHBYTES on a query, according to Lynn and also to Lowell, I get a VARBINARY of up to 8,000 chars??   
    Do I need to convert it to VARCHAR to make it 'readable' and match it against other software (say, R) using ASCII code?

  • a_ud - Monday, July 17, 2017 11:29 AM

    Lynn Pettis - Monday, July 17, 2017 10:48 AM

    Lowell - Monday, July 17, 2017 10:02 AM

    That is the maximum, but the number of bytes returned is based on the hashing algorithm used:
    The output conforms to the algorithm standard: 128 bits (16 bytes) for MD2, MD4, and MD5; 160 bits (20 bytes) for SHA and SHA1; 256 bits (32 bytes) for SHA2_256, and 512 bits (64 bytes) for SHA2_512.

    I'm somewhat confused now. I wanted the SHA256 to give an ASCII output (see this previous thread), because I need the output to be matched against the output of other application. 
    When I use a variable (see ditto post) I can get an ASCII output. However, if I run HASHBYTES on a query, according to Lynn and also to Lowell, I get a VARBINARY of up to 8,000 chars??   
    Do I need to convert it to VARCHAR to make it 'readable' and match it against other software (say, R) using ASCII code?

    First, it is not up to 8000 characters, it is up to 8000 bytes.  Also, not knowing R I would guess that if R has a hash function it would return a value similar to that returned by SQL.  Also, run the following and tell me if the character version of the hash makes any sense.


    DECLARE @testval VARCHAR(20) = 'qwertyuiopasdfghjklz';
    SELECT
      @testval TestVal
      , HASHBYTES('SHA2_256',@testval) TestValHash
      , LEN(HASHBYTES('SHA2_256',@testval)) LenOfTestValHash
      , CAST(HASHBYTES('SHA2_256',@testval) AS VARCHAR(64)) CharVersionOfHash
      , LEN(CAST(HASHBYTES('SHA2_256',@testval) AS VARCHAR(64))) LenOfCharVersionOfHash;

  • Thanks Lynn, your code is clarifying. Still I'm confused as to why results display different. I probably don't get the different charsets involved? Any help welcome.

    If I run your code and choose the first 2 columns:
    TestVal                               TestValHash
    qwertyuiopasdfghjklz                   0x3B7D1943F03E8ACFED7AB74E895B59920D75CB78BA682247A487498E26DBC0AB

    TestValHash is, precisely with this format, what I would like to see and store in my target table  (this is also what every other software I know  -e.g. R- produces when you run SHA256('qwertyuiopasdfghjklz')   ).

    If I try to build this with my own T-SQL code:

    --source table with ids
    CREATE TABLE nonhashed (
    id varchar(20) ) ;
    INSERT INTO nonhashed(id)
    VALUES('qwertyuiopasdfghjklz') ;

    --target table with hashes
    CREATE TABLE hashed (
    id varchar(20),
    sha256 varchar(280) ) ;

    --using ASCII code for hashed table?
    INSERT INTO hashed(id, sha256)
    SELECT id, HASHBYTES('SHA2_256', CONVERT(varchar(20), id) )
    FROM [nonhashed] ;

    the output is:
    id    sha256
    qwertyuiopasdfghjklz    ;}C?>???z·N‰[Y’u?x?h"G¤‡I?&??«

    So, as simply as I can put it, I'd like to see stored the same as before, with the variable. I don't get if my problem is:

    • How SSIS displays information (internally the unreadable sha256, and the readable one, are the same)
    • The definition of the source/target tables (insufficient lenght of fields causing truncating)
    • Or that HASHBYTES produces different output when you use it on a table, or on a variable.

  • does this help at all?

    INSERT INTO hashed(id, sha256)
    SELECT id, convert(varchar(280), HASHBYTES('SHA2_256',id),1)
    FROM [nonhashed] ;

    see "Binary style" on this page:
    https://docs.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • J Livingston SQL - Tuesday, July 18, 2017 3:35 AM

    does this help at all?

    INSERT INTO hashed(id, sha256)
    SELECT id, convert(varchar(280), HASHBYTES('SHA2_256',id),1)
    FROM [nonhashed] ;

    see "Binary style" on this page:
    https://docs.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql

    Absolutely, this does the trick.   Many thanks !

  • I would leave it as a binary value, but that is just me.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply