RE: Fuzzy searching – SQLServerCentral


July 25, 2013 at 6:47 am

Could you post the output from "SELECT @@version" for your server, as well as information about the machine? How much memory does it have? Are there other instances on the machine? Is any other software competing for resources?

Here are the results of SELECT @@version:

Microsoft SQL Server 2005 - 9.00.5069.00 (Intel X86)
Aug 22 2012 16:01:52
Copyright (c) 1988-2005 Microsoft Corporation
Standard Edition on Windows NT 5.2 (Build 3790: Service Pack 2)

The server has 16 GB of memory, but there is a lot of competition for memory. This is our main server and there are a lot of jobs running in the background.

The reason I'm not able to use CLR functions is that I get the following error when I try to install the .dll file:

Msg 6513, Level 16, State 27, Line 8
Failed to initialize the Common Language Runtime (CLR) v2.0.50727 due to memory pressure. Please restart SQL server in Address Windowing Extensions (AWE) mode to use CLR integration features.

I've tried restarting the server with AWE mode enabled and have tried all the common solutions that the internet has suggested. We think that the problem could be a memory leak. When the problem first started happening, we allocated more memory to SQL Server. It used up the extra memory, but the problems didn't go away, including the CLR error. I've tried installing the CLR function on other servers, but strangely, I get the same error even on servers that are hardly being used and have a lot of free memory.

It's unclear whether you are trying to match individual words (as in name matching) or entire phrases. The performance of any routine which has to match every row against every other row in a large table will always be a problem. The trick is to find a way of "rough matching" which can be indexed. In word checking, you might use the length of the words being matched, so that only those words close to each other in length are checked. Usually, users of Levenshtein-type routines will only accept matches with values less than (say) 4. Since the edit distance between two words is never smaller than the difference in their lengths, the only words which can match in that case are those whose lengths differ by less than 4. You can filter the data to eliminate the vast bulk of potential matches which can never match. There are lots of academic papers on improving the performance of matching routines, and they might be a fruitful source of ideas for your matching problem. They often have long lists of references which can be consulted as well. Good luck.

Thanks for the suggestion. Right now I'm comparing full names with each other. I'll try editing the function so that it ignores pairs that are too different in length.
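
For anyone following along, here is a rough sketch of the length pre-filter idea, written in Python rather than T-SQL just to keep it short and testable. It is not the function from this thread: the names levenshtein and fuzzy_pairs and the max_distance cutoff of 3 (i.e. "less than 4") are assumptions made up for illustration. It relies only on the fact that the edit distance between two strings is at least the difference in their lengths, so pairs that differ too much in length can be skipped without computing the distance at all.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain two-row dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_pairs(names, max_distance=3):
    """Return (a, b, distance) for every pair within max_distance.

    max_distance=3 is an assumed cutoff, not a value from the thread's code.
    """
    by_length = sorted(names, key=len)
    matches = []
    for i, a in enumerate(by_length):
        for b in by_length[i + 1:]:
            # The list is sorted by length, so once the gap is too large
            # every remaining name is also too long to match: stop early.
            if len(b) - len(a) > max_distance:
                break
            distance = levenshtein(a, b)
            if distance <= max_distance:
                matches.append((a, b, distance))
    return matches


if __name__ == "__main__":
    sample = ["Jon Smith", "John Smith", "Alexander Hamilton"]
    for a, b, d in fuzzy_pairs(sample):
        print(a, "|", b, "|", d)
```

In SQL the same filter would presumably become a join condition on a stored (and indexed) length column, along the lines of ABS(LEN(a.name) - LEN(b.name)) < 4, so that the expensive distance function only runs on the pairs that survive.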

@ChrisM@home: I'll give it a try, thanks.

Thanks again, everyone. I'm at a roadblock atm, so I appreciate your expertise.