Is it reliable to compare HASHBYTES for equality when CAST as VARCHAR?

  • SSIS does not allow comparison of the varbinary data type; the output of HASHBYTES. So if I want to compare source and destination HASHBYTES within the Conditional Split component I must convert to string, e.g. (DT_STR,8000,1252)SourceHashbytes != (DT_STR,8000,1252)DestinationHashbytes. In this case, I'm using SHA1 so SourceHashbytes and DestinationHashbytes are varbinary(20).

    I'm not sure about a few things:

    - What's the max varchar (DT_STR) length I need to use in order to compare varbinary(20) when it is CAST as a string, i.e. what does varbinary(20) translate to as a string? I've looked at the string output and I'm definitely not in danger of exceeding 8000 characters/appx bytes. The lengths do vary, however, and I'd like to understand how that works.

    - Is this a reliable comparison, i.e. will the varchar conversion output always contain reproducible, unique values for the varbinary input?

    - Am I taking it on the performance nose forcing the compare in SSIS using strings; would it be better to do it in t-sql (where I would now have to deal with crossing source/destination boundaries in the same stored proc)?

    Any insight appreciated, thanks.

  • Bruce Hendry (4/16/2013)


    SSIS does not allow comparison of the varbinary data type; the output of HASHBYTES. So if I want to compare source and destination HASHBYTES within the Conditional Split component I must convert to string, e.g. (DT_STR,8000,1252)SourceHashbytes != (DT_STR,8000,1252)DestinationHashbytes. In this case, I'm using SHA1 so SourceHashbytes and DestinationHashbytes are varbinary(20).

    I'm not sure about a few things:

    - What's the max varchar (DT_STR) length I need to use in order to compare varbinary(20) when it is CAST as a string, i.e. what does varbinary(20) translate to as a string? I've looked at the string output and I'm definitely not in danger of exceeding 8000 characters/appx bytes. The lengths do vary, however, and I'd like to understand how that works.

    - Is this a reliable comparison, i.e. will the varchar conversion output always contain reproducible, unique values for the varbinary input?

    - Am I taking it on the performance nose forcing the compare in SSIS using strings; would it be better to do it in t-sql (where I would now have to deal with crossing source/destination boundaries in the same stored proc)?

    I do the same thing in my SSIS packages; SHA1 produces varbinary(20) which would convert to a string that's 40 characters long (two characters to represent one byte (00 to FF).

    From a conditional split:

    Saved_Type1Hash != (DT_WSTR,40)Calc_Type1Hash

    HTH,

    Rob

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply