An Implementation of the FNV1a Hash Algorithm for SQL Server

  • bret.lowery

    Old Hand

    Points: 302

    Comments posted to this topic are about the item An Implementation of the FNV1a Hash Algorithm for SQL Server

  • martinko_michael

    Grasshopper

    Points: 23

    I use HASHBYTES('SHA1', ... to detect changes in strings.

    Can anyone from Microsoft validate your function? If they can I am hoping to save space since HASHBYTES('SHA1' currently uses varbinary(20).

  • Nathan Veysey

    Grasshopper

    Points: 19

    Can someone please explain to me an example for using this?

  • Dave Semmelink

    SSC Enthusiast

    Points: 117

    Why ToLower() ? This makes it unusable for hashing passwords (which are mixed case).

  • peter-757102

    SSCertifiable

    Points: 6877

    Anyone interested in the strength of a universal hash function should read this:

    http://home.comcast.net/~bretm/hash/

    FNV does'nt come out that well, but that said, its a very simple algorithmn that can easily be implemented in SQL code if one needs a quick solution.

    Also take a look at the other hash related work of Bob Jenkins: http://burtleburtle.net/bob/hash/

    And an even faster good hash: http://www.azillionmonkeys.com/qed/hash.html

  • DEK46656

    SSCrazy

    Points: 2001

    I use hash functions to generate a synthetic key; some people might call it a surrogate key, but I believe the proper phrase is synthetic. I currently use

    keyField = CHECKSUM( HASHBYTES(‘MD5’, @VARCHAR(8000)) ) as a deterministic function that produces a INT key for data warehousing work.

    I’m not in the position right now (due to work load) to test this out, but it will go into the investigate list for when my workload is lessened.

    Beer's Law: Absolutum obsoletum
    "if it works it's out-of-date"

  • rtaylor72

    SSC Rookie

    Points: 46

    When you say 36 byte GUID, I'm assuming you are talking about a string represented with dashes and without brasses. GUIDs are really 16 bytes, but can be represented as 32 byte strings also. Now if you take that into account, going to a 8 byte bigint (2 ^ 64 = 18,446,744,073,709,551,616 combinations) you are still statistically more likely to have collisions than with GUIDs (2 ^ 128), but for your sample, relatively safe. 🙂

    Robert

  • Flashx

    SSC Journeyman

    Points: 95

    Hello,  I first appreciate a lot the work ! Bravo !!

    We found the functions very usefull to change from characters to numeric data without lookups nor joins, in a repeatable way, accross several databases !!!

    We tested the xf_GetHash64 on 550 millions real keys with only 47 collisions. After we look on theses collisions, we found the collisions was caused by very similar strings that had an upper case i turkish (İ) . This is the only caracter we had problem with.


    SELECT
    dbo.xf_GetHash64(N'6ALT-MICROSILIKA-16012') [xf_GetHash64_6ALT-MICROSILIKA-16012],
    dbo.xf_GetHash64(N'6ALT-MİCROSİLİKA-16012') [xf_GetHash64_6ALT-MİCROSİLİKA-16012],
    dbo.xf_GetHash16(N'I') [xf_GetHash16_I],
    dbo.xf_GetHash16(N'İ') [xf_GetHash16_İ],
    dbo.xf_GetHash32(N'I') [xf_GetHash32_I],
    dbo.xf_GetHash32(N'İ') [xf_GetHash32_İ],
    dbo.xf_GetHash64(N'I') [xf_GetHash64_I],
    dbo.xf_GetHash64(N'İ') [xf_GetHash64_İ]

    That return :

    Do you have any idea why this caracter causing an issue ?

    We use :
    SQL 2016 SP2: 13.0.5026
    The instance and database collation is SQL_Latin1_General_CP1_CI_AS
    CLR function compiled with Visual Studio 2015 and .net 4.5

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply