An Implementation of the FNV1a Hash Algorithm for SQL Server

  • Comments posted to this topic are about the item An Implementation of the FNV1a Hash Algorithm for SQL Server

  • I use HASHBYTES('SHA1', ... to detect changes in strings.

    Can anyone from Microsoft validate your function? If they can I am hoping to save space since HASHBYTES('SHA1' currently uses varbinary(20).

  • Can someone please explain to me an example for using this?

  • Why ToLower() ? This makes it unusable for hashing passwords (which are mixed case).

  • Anyone interested in the strength of a universal hash function should read this:

    http://home.comcast.net/~bretm/hash/

    FNV does'nt come out that well, but that said, its a very simple algorithmn that can easily be implemented in SQL code if one needs a quick solution.

    Also take a look at the other hash related work of Bob Jenkins: http://burtleburtle.net/bob/hash/

    And an even faster good hash: http://www.azillionmonkeys.com/qed/hash.html

  • I use hash functions to generate a synthetic key; some people might call it a surrogate key, but I believe the proper phrase is synthetic. I currently use

    keyField = CHECKSUM( HASHBYTES(‘MD5’, @VARCHAR(8000)) ) as a deterministic function that produces a INT key for data warehousing work.

    I’m not in the position right now (due to work load) to test this out, but it will go into the investigate list for when my workload is lessened.

    Beer's Law: Absolutum obsoletum
    "if it works it's out-of-date"

  • When you say 36 byte GUID, I'm assuming you are talking about a string represented with dashes and without brasses. GUIDs are really 16 bytes, but can be represented as 32 byte strings also. Now if you take that into account, going to a 8 byte bigint (2 ^ 64 = 18,446,744,073,709,551,616 combinations) you are still statistically more likely to have collisions than with GUIDs (2 ^ 128), but for your sample, relatively safe. 🙂

    Robert

  • Hello,  I first appreciate a lot the work ! Bravo !!

    We found the functions very usefull to change from characters to numeric data without lookups nor joins, in a repeatable way, accross several databases !!!

    We tested the xf_GetHash64 on 550 millions real keys with only 47 collisions. After we look on theses collisions, we found the collisions was caused by very similar strings that had an upper case i turkish (İ) . This is the only caracter we had problem with.


    SELECT
    dbo.xf_GetHash64(N'6ALT-MICROSILIKA-16012') [xf_GetHash64_6ALT-MICROSILIKA-16012],
    dbo.xf_GetHash64(N'6ALT-MİCROSİLİKA-16012') [xf_GetHash64_6ALT-MİCROSİLİKA-16012],
    dbo.xf_GetHash16(N'I') [xf_GetHash16_I],
    dbo.xf_GetHash16(N'İ') [xf_GetHash16_İ],
    dbo.xf_GetHash32(N'I') [xf_GetHash32_I],
    dbo.xf_GetHash32(N'İ') [xf_GetHash32_İ],
    dbo.xf_GetHash64(N'I') [xf_GetHash64_I],
    dbo.xf_GetHash64(N'İ') [xf_GetHash64_İ]

    That return :

    Do you have any idea why this caracter causing an issue ?

    We use :
    SQL 2016 SP2: 13.0.5026
    The instance and database collation is SQL_Latin1_General_CP1_CI_AS
    CLR function compiled with Visual Studio 2015 and .net 4.5

  • Implemented this as a SQL CLR function but I am getting the below error on execution : Small change to the original code

    // hash = (hash ^ Convert.ToUInt64(bytes)) * prime64;

    hash = (hash ^ BitConverter.ToUInt64(bytes, 0)) * prime64;

    Msg 6522, Level 16, State 2, Line 35

    A .NET Framework error occurred during execution of user-defined routine or aggregate "GetHashValue":

    System.ArgumentException: Destination array is not long enough to copy all the items in the collection. Check array index and length.

    System.ArgumentException:

    at System.ThrowHelper.ThrowArgumentException(ExceptionResource resource)

    at System.BitConverter.ToUInt64(Byte[] value, Int32 startIndex)

    at CLRDemo.CLRFunction.HashFNV1a_64(String value)

    at CLRDemo.CLRFunction.xf_GetHash64(SqlString value)

  • We got the answer : it was du to  the to_lower function.

    Change to To_string and no collisions

     

    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.Data.SqlTypes;
    using Microsoft.SqlServer.Server;
    using System.Text;

    namespace DBCLRFNV1A
    {
    public partial class FNV1A
    {
    static readonly ulong prime64 = 1099511628211;
    static readonly ulong offset64 = 0xcbf29ce484222325;
    static readonly uint prime32 = 16777619;
    static readonly uint offset32 = 2166136261;
    [SqlFunction(IsDeterministic = true, IsPrecise = true, DataAccess = DataAccessKind.None)]
    public static SqlInt64 xf_GetHash64(SqlString value)
    {
    return (SqlInt64)HashFNV1a_64((string)value);
    }
    [SqlFunction(IsDeterministic = true, IsPrecise = true, DataAccess = DataAccessKind.None)]

    private static long HashFNV1a_64(string value)
    {
    ulong hash = offset64;
    byte[] bytes = Encoding.UTF8.GetBytes(value.ToString());
    for (int i = 0; i < bytes.Length; i++)
    {
    ulong v = (hash ^ bytes);
    hash = v * prime64;
    }
    return (long)(hash - long.MaxValue);
    }

    };
    }

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply