Java's String.hashCode() function in T-SQL

  • Comments posted to this topic are about the item Java's String.hashCode() function in T-SQL

  • Thanks for a very useful function. When applied to a scalar value, the routine is fine. However, if you're applying it to thousands of rows, it's very slow because of the while loop.

    Here's an alternative.

    alter function dbo.UTIL_JAVA_HASHCODE(@str varchar(max))

    returns int

    as

    -------------------------------------------------------------------------------

    -- Proc: UTIL_JAVA_HASHCODE

    -- Desc: Replicate Java's String.HashCode() function

    -- Inputs: @STR: String

    -- Outputs: Java hashcode of the string (4 byte integer)

    -------------------------------------------------------------------------------

    begin

    declare @h bigint

    set @h = 0

    select @h = (@h*31 + ascii(substring(@str,X.pos,1)))%4294967296

    from (select top(len(@str))

    row_number() over (order by getdate()) as pos

    from sys.all_objects) as X

    if @h >= 2147483648 set @h = @h - 4294967296

    return convert(int, @h)

    end;

    go

    With the following logic:

    set statistics time on

    go

    declare @table table ( hashCode int )

    insert into @table

    select global.dbo.UTIL_JAVA_HASHCODE( X.varchar32field )

    from TableWith33000Rows as X

    go

    Using the original function, the insert completed in 53s (avg of 3 runs). Applying the original function to a varchar(64) field, the elapsed time grew to 100s.

    After revising the function to eliminate the while loop, the varchar(32) processing completes in 11s and the varchar(64) processing in only 14s. (If you have a tally table feel free to use it instead of the derived table from sys.all_objects.)

    🙂

  • Thanks for this enhancement! I have always been reluctant to use the "select @var=@var + col from tbl" construct, for fear of it stopping working in some future release. But it seems like it's here to stay. And with that kind of performance improvement it's hard to resist.

  • We are also trying to simulate String hashcode of Java in SQL. Why do I get this error when I try to do hashcode in SQL using the below code instead? I'm using this sample to run it:

    Select dbo.fn_get_string_hdp_group_id('000b4c09-9b4b-42b0-88a4-7c348be6a92d');

    Here is the error we get:

    Arithmetic overflow error converting expression to data type numeric.

    And this is the function we wrote in SQL. It looks to my eye as the equivalent of the Java hatched method:

    ALTER FUNCTION fn_get_string_hdp_group_id(@text varchar(50))

    returns int

    BEGIN

    Declare @result int, @index int, @char char(1), @calcVal decimal(38,0);

    Set @calcVal = 0;

    Set @result = 0;

    Set @index = 1;

    While(@index <= Len(@text))

    BEGIN

    Set @char = Substring(@text, @index, 1);

    Set @calcVal = (@calcVal * 31) + (Ascii(@char));

    Set @index = @index + 1;

    END

    Set @result = Abs(@calcVal) % 10;

    return @result;

    END

    GO

    Thanks much in advance!

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply