can i store hashbyte on field with length smaller than fields i concatenate it?

  • I work on sql server 2019 i have table have multi column string concatenation as StrSubstance,strmass ,strcasnumber

    i need to create filed hashchemical with nvarchar(700) and this will store hash for 3 columns concatenation with each other

    are this possible

    what i mean

    alter table [dbo].[fmdchemical] add hashchemical nvarchar(700) null

    update ch set ch.hashchemical =HASHBYTES('SHA2_512',concate(StrSubstance,strmass,strcasnumber)) from [dbo].[fmdchemical] ch

     

     

    so are hashing will be correct with nvarchar(700) and every field have length 3500

    are this will make issue on the feature with big counts

    also are hash every column alone then compare it or concate all these columns on one column then compare

     

    CREATE TABLE [dbo].[fmdchemical](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [chemicalid] [int] NULL,
    [StrSubstance] [nvarchar](3500) NULL,
    [strmass] [nvarchar](3500) NULL,
    [strcasnumber] [nvarchar](3500) NULL
    ) ON [PRIMARY]

    GO
    SET IDENTITY_INSERT [dbo].[fmdchemical] ON

    INSERT [dbo].[fmdchemical] ([id], [chemicalid], [StrSubstance], [strmass], [strcasnumber]) VALUES (826807, 748787, N'3P04,AL2O3,C,C,Cr203,Cu,Cu,Fe,Mn,Others,others,others,Others,Pb,Si02,SiC,Silicone Resin,SiO2,Sn,Sn,Ti02', N'0.0004,0.0096,0.336,0.3992,0.5904,0.764,0.7904,1.5408,1.6176,1.6808,1.6808,2.9344,5.2392,7.2,7.9608,19.2072,32.08,65.3584,148.3384,232.7992,269.472', N'592910,592960,592961,593043,593043,593259,593290,593306,593306,593495,593501,593501,593505,593505,621442,621442,621442,621442,622089,624020,650559')
  • Is this related to my response to https://www.sqlservercentral.com/forums/topic/slow-execution-when-update-chemical-id-on-table-chemicalhash-so-how-to-enhance-i suggesting using a hash for comparisons & indexing?

    Hashbytes returns a varbinary(64) for that algorithm --  a very nice, efficient type for sorting & comparing --, not an nvarchar, and certainly not something 700 bytes long.

    Why do you want to convert it to nvarchar? Converted, it's a big hexadecimal string not readable or usable by a human being. And it's a hash, so you can't decrypt it back into readable strings. So I see no reason you would store it as nvarchar.... At least you aren't trying to use nvarchar(3500).

    If you can guarantee that no concatenation of those three columns could be ambiguous -- i.e., no two different concatenated sets of the three different columns could result in the same concatenated string -- then it would probably be safe to hash the concatenated columns. If not, can you use a separate hash for each column?

  • ahmed_elbarbary.2010 wrote:

    I work on sql server 2019 i have table have multi column string concatenation as StrSubstance,strmass ,strcasnumber

    i need to create filed hashchemical with nvarchar(700) and this will store hash for 3 columns concatenation with each other

    are this possible

    what i mean

    alter table [dbo].[fmdchemical] add hashchemical nvarchar(700) null

    update ch set ch.hashchemical =HASHBYTES('SHA2_512',concate(StrSubstance,strmass,strcasnumber)) from [dbo].[fmdchemical] ch

    so are hashing will be correct with nvarchar(700) and every field have length 3500

    are this will make issue on the feature with big counts

    also are hash every column alone then compare it or concate all these columns on one column then compare

    CREATE TABLE [dbo].[fmdchemical](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [chemicalid] [int] NULL,
    [StrSubstance] [nvarchar](3500) NULL,
    [strmass] [nvarchar](3500) NULL,
    [strcasnumber] [nvarchar](3500) NULL
    ) ON [PRIMARY]

    GO
    SET IDENTITY_INSERT [dbo].[fmdchemical] ON

    INSERT [dbo].[fmdchemical] ([id], [chemicalid], [StrSubstance], [strmass], [strcasnumber]) VALUES (826807, 748787, N'3P04,AL2O3,C,C,Cr203,Cu,Cu,Fe,Mn,Others,others,others,Others,Pb,Si02,SiC,Silicone Resin,SiO2,Sn,Sn,Ti02', N'0.0004,0.0096,0.336,0.3992,0.5904,0.764,0.7904,1.5408,1.6176,1.6808,1.6808,2.9344,5.2392,7.2,7.9608,19.2072,32.08,65.3584,148.3384,232.7992,269.472', N'592910,592960,592961,593043,593043,593259,593290,593306,593306,593495,593501,593501,593505,593505,621442,621442,621442,621442,622089,624020,650559')

    Ok... so you changed the other thread on this subject without telling us???  Why didn't you just ask this question over on that other thread?  Hint:  This is why people sometimes don't like helping your or that get a bit angry at your questions.  😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • HASHBYTES returns a set size, as Jeff noted. The inputs can be varchar(max), so you'd need to ensure that the first input is cast to varchar(max) to ensure this returns the correct values.

    Note that HASHBYTES can have collisions, meaning that different sets of inputs can return the same HASHBYTES output. This isn't likely, but I have no idea how a large input might affect the algorithm. You'd want to test this with a lot of inputs and look for duplicates of the hash output.

    HASHBYTES - https://docs.microsoft.com/en-us/sql/t-sql/functions/hashbytes-transact-sql?view=sql-server-ver15

    CONCAT - https://docs.microsoft.com/en-us/sql/t-sql/functions/concat-transact-sql?view=sql-server-ver15

  • Getting back to the original issue, if you'd go back to the original thread that start all this and answer my simple question, you might get somewhere.

    As for the question on this thread, you really should start reading documentation on stuff you think you might want to use because the answers to such simple questions are in the documentation.

    What's not in the documentation is the right way to do the concatenation for comparison hashes... which is super important in your case because you're trying to compare complex chemical formulas.

    So take this conversation back to the original thread on this subject and answer my question so we can make all of this much faster.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply