I need to use multiple columns in the hash index, so I take a CHECKSUM over the HASHBYTES values, for example:
checksum(hashbytes('MD5', column1), hashbytes('MD5', column2)) - this works pretty well, but it is still not 100% unique across all records (in fact, even HASHBYTES on a single column is not guaranteed to be unique across all values).
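Spelled out as a sketch (the table name is hypothetical), this is the pattern in question. Note that CHECKSUM collapses its arguments to a single 32-bit INT, so collisions are much more likely than with the raw 128-bit MD5 values:

```sql
-- Combined hash over two columns (dbo.SomeTable is hypothetical).
-- CHECKSUM returns a 32-bit INT, so distinct (column1, column2)
-- pairs can still collide even when their MD5 values differ.
SELECT column1,
       column2,
       CHECKSUM(HASHBYTES('MD5', column1),
                HASHBYTES('MD5', column2)) AS Meta_Hash
FROM   dbo.SomeTable;
```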
As you say: this still is not unique.
I suggest you wrap HASHBYTES in a UDF, so that a change is easy to make across all ETL loads: instead of doing hashbytes('MD5', column1), call dbo.fnHash(column1).
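As a sketch of that suggestion (the function name comes from the post, the body is assumed), the wrapper could look like this; the point is that the hash algorithm is named in exactly one place:

```sql
-- Hypothetical wrapper: every ETL load calls dbo.fnHash, so switching
-- the algorithm later means changing only this one function.
CREATE FUNCTION dbo.fnHash (@value NVARCHAR(4000))
RETURNS VARBINARY(8000)
AS
BEGIN
    RETURN HASHBYTES('MD5', @value);
END;
```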
The problem here is that such a function takes a fixed set of parameters, while the set of columns varies per load. Perhaps I have to build a kind of pseudo-array and pass that to the function; I know there are some solutions for this.
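One common workaround (a sketch, not necessarily what was used here) is to concatenate the columns into a single string before hashing, with a separator so that values like ('ab', 'c') and ('a', 'bc') do not produce the same input:

```sql
-- Concatenate with a separator, guarding NULLs (names hypothetical).
-- Without the N'|' separator, ('ab', 'c') and ('a', 'bc') would hash
-- identically; ISNULL keeps a single NULL column from making the
-- whole expression NULL.
SELECT HASHBYTES('MD5',
           ISNULL(CONVERT(NVARCHAR(4000), column1), N'') + N'|' +
           ISNULL(CONVERT(NVARCHAR(4000), column2), N'')) AS Meta_Hash
FROM   dbo.SomeTable;
```

One caveat: this maps NULL and the empty string to the same hash, which may or may not matter for the load.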
I want something like this in a view:
SELECT fnHashCalc(field1, field2) AS Meta_Hash
FROM view, table
WHERE table.Meta_Hash <> fnHashCalc(field1, field2)
It's not correct SQL, but I hope you get my point. I can't do the calculation before this query, because it's a view.
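Fleshed out into valid T-SQL (all names here are hypothetical), the idea could look like this: the view exposes the computed hash, and the change-detection query compares it with the hash stored in the target table:

```sql
-- The view computes the hash on the fly (names are hypothetical).
CREATE VIEW dbo.vSource AS
SELECT BusinessKey,
       field1,
       field2,
       dbo.fnHashCalc(field1, field2) AS Meta_Hash
FROM   dbo.SourceTable;
GO

-- Rows whose attributes changed since the previous load.
SELECT s.*
FROM   dbo.vSource AS s
JOIN   dbo.Target  AS t ON t.BusinessKey = s.BusinessKey
WHERE  t.Meta_Hash <> s.Meta_Hash;
```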
Yes, a CHECKSUM-like function for hashes would be great. It would help further with building data warehouses. One of my concerns is checking, with good performance, whether a record has already been loaded, across many millions of records.
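For that millions-of-rows comparison, one option (my assumption, not something stated in the thread) is to persist the hash as a computed column and index it, so the "have I loaded this before" check becomes an index seek instead of hashing every row at query time:

```sql
-- Persisted computed column over a deterministic expression
-- (a scalar UDF would need to be schemabound to qualify).
-- CONVERT(BINARY(16), ...) keeps the index key small: MD5 is
-- 16 bytes, but HASHBYTES is typed VARBINARY(8000).
ALTER TABLE dbo.Target
ADD Meta_Hash AS CONVERT(BINARY(16),
        HASHBYTES('MD5',
            ISNULL(CONVERT(NVARCHAR(4000), field1), N'') + N'|' +
            ISNULL(CONVERT(NVARCHAR(4000), field2), N'')))
    PERSISTED;

CREATE INDEX IX_Target_Meta_Hash ON dbo.Target (Meta_Hash);
```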