An Implementation of the FNV1a Hash Algorithm for SQL Server

, 2012-02-20

Summary

This CLR code implements the Fowler-Noll-Vo (FNV) variant "1a" hash function. This algorithm provides exellent avalanche characteristics, low collision, and high dispersion across either string or integer inputs such as GUIDs, URLs, host names, file names, long text, IPv4 or v6 addresses, etc. My personal testing shows that the 64-bit version can successfully hash 10 billion 36-byte UNIQUEIDENTIFIER (GUID) values to 8-byte BIGINT (long) values with zero collisions. It is IMHO far, far, far better than the SQL native HASHBYTES, CHECKSUM-based hashing, etc.

You can read more about the FNV algorithm, its mathematical underpinning, and its variants here.

Three functions are provided:

xf_GetHash16

Hashes a VARCHAR or NVARCHAR string (up to 4000 characters) to a SMALLINT

OK collision avoidance

xf_GetHash32

Hashes a VARCHAR or NVARCHAR string (up to 4000 characters) to an INT

Better collision avoidance

xf_GetHash64

Hashes a VARCHAR or NVARCHAR string (up to 4000 characters) to a BIGINT

Best collision avoidance

Note that these are NOT encryption functions, they are hashing functions and have no equivalent "decryption" algorithm. One-way only!

Source Code

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text;
public partial class UserDefinedFunctions
{
    static readonly ulong prime64 = 1099511628211;
    static readonly ulong offset64 = 0xcbf29ce484222325;
    static readonly uint prime32 = 16777619;
    static readonly uint offset32 = 2166136261;
    [SqlFunction(IsDeterministic = true, IsPrecise = true, DataAccess = DataAccessKind.None)]
    public static SqlInt64 xf_GetHash64(SqlString value)
    {
        return (SqlInt64)HashFNV1a_64((string)value);
    }
    [SqlFunction(IsDeterministic = true, IsPrecise = true, DataAccess = DataAccessKind.None)]
    public static SqlInt32 xf_GetHash32(SqlString value)
    {
        return (SqlInt32)HashFNV1a_32((string)value);
    }
    [SqlFunction(IsDeterministic = true, IsPrecise = true, DataAccess = DataAccessKind.None)]
    public static SqlInt16 xf_GetHash16(SqlString value)
    {
        return (SqlInt16)HashFNV1a_16((string)value);
    }
    private static long HashFNV1a_64(string value)
    {
        ulong hash = offset64;
        byte[] bytes = Encoding.UTF8.GetBytes(value.ToLower());
        for (int i = 0; i < bytes.Length; i++)
        {
            hash = (hash ^ bytes) * prime64;
        }
        return (long)(hash - long.MaxValue);
    }
    private static int HashFNV1a_32(string value)
    {
        uint hash = offset32;
        byte[] bytes = Encoding.UTF8.GetBytes(value.ToLower());
        for (int i = 0; i < bytes.Length; i++)
        {
            hash = (hash ^ bytes) * prime32;
        }
        return (int)(hash - int.MaxValue);
    }
    private static short HashFNV1a_16(string value)
    {
        uint MASK_16 = (((uint)1 << 16) - 1);    /* i.e., (u_int32_t)0xffff */
        uint hash = offset32;
        byte[] bytes = Encoding.UTF8.GetBytes(value.ToLower());
        for (int i = 0; i < bytes.Length; i++)
        {
            hash = (hash ^ bytes) * prime32;
        }
        hash = (hash >> 16) ^ (hash & MASK_16);
        return (short)(hash - short.MaxValue);
    }
};

Usage

Once installed, you can use these CLR functions like any TSQL function. Here is a proc that will return the 16-, 32-, and 64-bit hashes for a given passed string:

CREATE PROCEDURE GetHashes(@stringval NVARCHAR(4000))
AS BEGIN
DECLARE
   @smallhash SMALLINT, 
   @inthash INT,
   @bighash BIGINT
SELECT
   @smallhash=dbo.xf_GetHash16(@stringval),
   @inthash=dbo.xf_GetHash32(@stringval),
   @bighash=dbo.xf_GetHash64(@stringval)
SELECT
   @smallhash as Hash16Bit,
   @inthash as Hash32Bit,
   @bighash as Hash64Bit
END

Reference inline in a query:

SELECT dbo.xf_GetHash32(x.somecolumn) as HashValue
FROM mytable x
WHERE....

Or use in a computed column and optionally persist it for use as an index or even the primary key:

CREATE TABLE StringTable (
StringValue VARCHAR(512),
HashKey AS (dbo.xf_GetHash64(StringValue)) PERSISTED,
CONSTRAINT PK_StringTable PRIMARY KEY CLUSTERED (HashKey),
CONSTRAINT UK_StringTable_StringValue UNIQUE (StringValue)
)

Disclaimer

The source code is provided to you as is, without warranty. There is no warranty for the program, expressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose and non infringement of third party rights. The entire risk as to the quality and performance of the program is with you. Should the program prove defective, you assume the cost of all necessary servicing, repair and correction.

Rate

4.29 (7)

Share

Share

Rate

4.29 (7)

Related content

SQL Server CLR function to concatenate values in a column

If a column is normalized, but the user really wants to see the values as a short comma separated list, how can I write a query that produces the list? Concatenating the values in a column would be pretty easy if SQL Server had a concatenate aggregate function, which it doesn't. What's more, for efficiency sake it's important to write the reporting queries without using cursors.

2009-02-09

2,240 reads