SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Hashing column using SHA256


Hashing column using SHA256

Author
Message
a_ud
a_ud
Mr or Mrs. 500
Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)

Group: General Forum Members
Points: 529 Visits: 368
I've got a table with one ID column (10 chars long). I'd like to create a new table with 2 columns: ID | SHA256 (hashed ID)

I know in SSIS-2012 I can use:
HASHBYTES('SHA2_256', 'abc')
and it will produce a new hashed value, but only for a variable or a string.

My questions:
  • How can I do the same, but for a whole column, in one go (which might have potentially millions of rows) ?
  • I might need to some processing in the column (e.g. deleting dashes '-', blank spaces, ...). Assuming then there is a way to run the hashing as a T-SQL query: what is better for cleaning + hashing: a stored procedure, or a T-SQL query? The idea is not to overcomplicate the query

Thanks in advance, p.


Lowell
Lowell
SSC Guru
SSC Guru (73K reputation)SSC Guru (73K reputation)SSC Guru (73K reputation)SSC Guru (73K reputation)SSC Guru (73K reputation)SSC Guru (73K reputation)SSC Guru (73K reputation)SSC Guru (73K reputation)

Group: General Forum Members
Points: 73623 Visits: 40974
variable, string OR [COLUMN NAME] of type varchar/nvarchar is accepted for the parameter!
so if you choose a column, or the converted value of a column that is NOT a string, you are good to go.
SELECT 
HASHBYTES('SHA2_256',CONVERT(varchar(30),object_id)),
HASHBYTES('SHA2_256',name),
* from sys.columns


Lowell
--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!
a_ud
a_ud
Mr or Mrs. 500
Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)

Group: General Forum Members
Points: 529 Visits: 368
Thanks if that's the case !

But when I checked the HASHBYTES reference it seemed the entry is only varchar, nvarchar or varbinary. Maybe I misunderstood it.
Lowell
Lowell
SSC Guru
SSC Guru (73K reputation)SSC Guru (73K reputation)SSC Guru (73K reputation)SSC Guru (73K reputation)SSC Guru (73K reputation)SSC Guru (73K reputation)SSC Guru (73K reputation)SSC Guru (73K reputation)

Group: General Forum Members
Points: 73623 Visits: 40974
sure the final result must be one of those data types....but you can convert your columns to the acceptable data types and you are all set.
and anyplace they say variable, you can infer a column name is acceptable too.

Lowell
--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!
a_ud
a_ud
Mr or Mrs. 500
Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)

Group: General Forum Members
Points: 529 Visits: 368
Apologies all working fine.


I'm still in doubt re cleaning certain characters (' ', '-', maybe others). Should I do that in a single SQL command (e.g. REPLACE) or do I need a series of them, one for each different character?

Would then an SP be better, or T-SQL?

Lowell
Lowell
SSC Guru
SSC Guru (73K reputation)SSC Guru (73K reputation)SSC Guru (73K reputation)SSC Guru (73K reputation)SSC Guru (73K reputation)SSC Guru (73K reputation)SSC Guru (73K reputation)SSC Guru (73K reputation)

Group: General Forum Members
Points: 73623 Visits: 40974
Hashbytes is a varbinary(8000) return value, so you need a larger column, and then you are all set.
you do NOT need to filter or edit your columns for dashes or anything...the data is the data, so just hash the data without fiddling with it.
https://docs.microsoft.com/en-us/sql/t-sql/functions/hashbytes-transact-sql


Return Value

varbinary (maximum 8000 bytes)




Lowell
--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!
Lynn Pettis
Lynn Pettis
SSC Guru
SSC Guru (96K reputation)SSC Guru (96K reputation)SSC Guru (96K reputation)SSC Guru (96K reputation)SSC Guru (96K reputation)SSC Guru (96K reputation)SSC Guru (96K reputation)SSC Guru (96K reputation)

Group: General Forum Members
Points: 96763 Visits: 38988
Lowell - Monday, July 17, 2017 10:02 AM
Hashbytes is a varbinary(8000) return value, so you need a larger column, and then you are all set.
you do NOT need to filter or edit your columns for dashes or anything...the data is the data, so just hash the data without fiddling with it.
https://docs.microsoft.com/en-us/sql/t-sql/functions/hashbytes-transact-sql


Return Value

varbinary (maximum 8000 bytes)




That is the maximum, but the number of bytes returned is based on the hashing algorithm used:
The output conforms to the algorithm standard: 128 bits (16 bytes) for MD2, MD4, and MD5; 160 bits (20 bytes) for SHA and SHA1; 256 bits (32 bytes) for SHA2_256, and 512 bits (64 bytes) for SHA2_512.

Cool
Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
a_ud
a_ud
Mr or Mrs. 500
Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)

Group: General Forum Members
Points: 529 Visits: 368
Lynn Pettis - Monday, July 17, 2017 10:48 AM
[quote]
Lowell - Monday, July 17, 2017 10:02 AM

That is the maximum, but the number of bytes returned is based on the hashing algorithm used:
The output conforms to the algorithm standard: 128 bits (16 bytes) for MD2, MD4, and MD5; 160 bits (20 bytes) for SHA and SHA1; 256 bits (32 bytes) for SHA2_256, and 512 bits (64 bytes) for SHA2_512.

I'm somewhat confused now. I wanted the SHA256 to give an ASCII output (see this previous thread), because I need the output to be matched against the output of other application.
When I use a variable (see ditto post) I can get an ASCII output. However, if I run HASHBYTES on a query, according to Lynn and also to Lowell, I get a VARBINARY of up to 8,000 chars??
Do I need to convert it to VARCHAR to make it 'readable' and match it against other software (say, R) using ASCII code?

Lynn Pettis
Lynn Pettis
SSC Guru
SSC Guru (96K reputation)SSC Guru (96K reputation)SSC Guru (96K reputation)SSC Guru (96K reputation)SSC Guru (96K reputation)SSC Guru (96K reputation)SSC Guru (96K reputation)SSC Guru (96K reputation)

Group: General Forum Members
Points: 96763 Visits: 38988
a_ud - Monday, July 17, 2017 11:29 AM
Lynn Pettis - Monday, July 17, 2017 10:48 AM
[quote]
Lowell - Monday, July 17, 2017 10:02 AM

That is the maximum, but the number of bytes returned is based on the hashing algorithm used:
The output conforms to the algorithm standard: 128 bits (16 bytes) for MD2, MD4, and MD5; 160 bits (20 bytes) for SHA and SHA1; 256 bits (32 bytes) for SHA2_256, and 512 bits (64 bytes) for SHA2_512.

I'm somewhat confused now. I wanted the SHA256 to give an ASCII output (see this previous thread), because I need the output to be matched against the output of other application.
When I use a variable (see ditto post) I can get an ASCII output. However, if I run HASHBYTES on a query, according to Lynn and also to Lowell, I get a VARBINARY of up to 8,000 chars??
Do I need to convert it to VARCHAR to make it 'readable' and match it against other software (say, R) using ASCII code?


First, it is not up to 8000 characters, it is up to 8000 bytes. Also, not knowing R I would guess that if R has a hash function it would return a value similar to that returned by SQL. Also, run the following and tell me if the character version of the hash makes any sense.


DECLARE @testval VARCHAR(20) = 'qwertyuiopasdfghjklz';
SELECT
@testval TestVal
, HASHBYTES('SHA2_256',@testval) TestValHash
, LEN(HASHBYTES('SHA2_256',@testval)) LenOfTestValHash
, CAST(HASHBYTES('SHA2_256',@testval) AS VARCHAR(64)) CharVersionOfHash
, LEN(CAST(HASHBYTES('SHA2_256',@testval) AS VARCHAR(64))) LenOfCharVersionOfHash;


Cool
Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
a_ud
a_ud
Mr or Mrs. 500
Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)Mr or Mrs. 500 (529 reputation)

Group: General Forum Members
Points: 529 Visits: 368
Thanks Lynn, your code is clarifying. Still I'm confused as to why results display different. I probably don't get the different charsets involved? Any help welcome.

If I run your code and choose the first 2 columns:
TestVal                               TestValHash
qwertyuiopasdfghjklz 0x3B7D1943F03E8ACFED7AB74E895B59920D75CB78BA682247A487498E26DBC0AB


TestValHash is, precisely with this format, what I would like to see and store in my target table (t
his is also what every other software I know -e.g. R- produces when you run SHA256('qwertyuiopasdfghjklz') ).

If I try to build this with my own T-SQL code:

--source table with ids
CREATE TABLE nonhashed (
id varchar(20) ) ;
INSERT INTO nonhashed(id)
VALUES('qwertyuiopasdfghjklz') ;

--target table with hashes
CREATE TABLE hashed (
id varchar(20),
sha256 varchar(280) ) ;

--using ASCII code for hashed table?
INSERT INTO hashed(id, sha256)
SELECT id, HASHBYTES('SHA2_256', CONVERT(varchar(20), id) )
FROM [nonhashed] ;


the output is:
id    sha256
qwertyuiopasdfghjklz ;}Cً>ٹديz·N‰[Y’uثx؛h"G¤‡Iژ&غہ«


So, as simply as I can put it, I'd like to see stored the same as before, with the variable. I don't get if my problem is:

  • How SSIS displays information (internally the unreadable sha256, and the readable one, are the same)
  • The definition of the source/target tables (insufficient lenght of fields causing truncating)
  • Or that HASHBYTES produces different output when you use it on a table, or on a variable.

Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search