RE: Sound Matching and a Phonetic Toolkit

Duray AKAR

SSC Veteran

Points: 261

August 3, 2005 at 1:05 pm

#579326

Another quick note:

The user defined function udf_levenshtein is not symmetric. In other words,

udf_levenshtein(a,b) is not always equal to udf_levenshtein(b,a)

You might wonder "so what, why does it matter?"

But if I use it to order my results, and if I have two fields to weight, such as bl_lastname_str and bl_company_str, SQL Server will not allow me to use it as below:

order by dbo.udf_levenshtein(@myname, isnull(bl_lastname_str,''))

, bl_lastname_str

, dbo.udf_levenshtein(@myname, isnull(bl_company_str,''))

, bl_company_str

So I have to use :

order by dbo.udf_levenshtein(isnull(bl_lastname_str,''), @myname)

, bl_lastname_str

, dbo.udf_levenshtein(isnull(bl_company_str,''), @myname)

, bl_company_str

which in fact, produces different results.

I know that this is a general problem with user defined functions in SQL server. SQL server tends to optimize the function results in the query with only the value of the FIRST parameter, so you have to change the value of your FIRST parameter... Otherwise, both function calls will be considered to return the same value... Quick example, lets say you have a udfBinAnd that takes two int parameters a and b, and calculates a & b (Binary and) ... If you use it in an SQL query, it will consider

udfBinAnd(1,2) = udfBinAnd(1,3).

Pretty SCARY thing, so watch out...

That might even be considered an SQL Server "BUG", but I will take it as a "caveat of performance" And I won't go further with the storage optimization of consecutive bit fields and their negative effects on the group by queries, because it is totally out of context.

But again, I thought that might be some useful info for other people, or maybe even trigger some workarounds to make the function symmetric...

Thanks

Duray AKAR