﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / Article Discussions / Article Discussions by Author / Discuss content posted by Ira Whiteside  / Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL / Latest Posts</title><generator>InstantForum.NET v4.1.4</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Sun, 21 Mar 2010 12:53:10 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL</title><link>http://www.sqlservercentral.com/Forums/Topic732037-1345-1.aspx</link><description>Martin,I have updated the article and code.ThanksIra</description><pubDate>Mon, 13 Jul 2009 12:54:40 GMT</pubDate><dc:creator>Ira Warren Whiteside</dc:creator></item><item><title>RE: Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL</title><link>http://www.sqlservercentral.com/Forums/Topic732037-1345-1.aspx</link><description>I agree but keep in mind many folks still rely on tsql. I have used SSIS extensively, but have found it has a bit of a learning curve. I wanted to remove the "Black Box"aura surronding Fuzzy Matching</description><pubDate>Mon, 13 Jul 2009 12:53:14 GMT</pubDate><dc:creator>Ira Warren Whiteside</dc:creator></item><item><title>RE: Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL</title><link>http://www.sqlservercentral.com/Forums/Topic732037-1345-1.aspx</link><description>Update: Based on several excellent forum contribution’s I asked Michael Capes(friend with big brain) to review them and provide feedback. The result is a revision to the original script/article. The following is the result of his review and the code for for the article has been updated. The code in the article for the JaroWinkler algothrim has has been updated and new code is in the zip file TestJaroWinkler.ZIP an will be available soon. Any question please let me know.   I believe the "expected" results that the forum poster gave are incorrect. I was looking through the pdf document from Winkler, that I believe the poster took his test set. The paper is titled: "AN APPLICATION OF THE FELLEGI-SUNTER MODEL OF RECORD LINKAGE TO THE 1990 U.S. DECENNIAL CENSUS" There's a table in that paper which matches the poster's comments. I don't beleive this table was produced with Jaro-Winkler. Instead, it looks like it was produced by another algorithm. Furthermore, I got the C# SimMetrics utility to run this morning. It's results also matched the earlier results from my Oracle and T-SQL test. I believe the revised T-SQL is correctly implementing the Jaro-Winkler algorthim. I've verified it against two other implementations, and all three results match.Ira</description><pubDate>Sun, 12 Jul 2009 18:29:16 GMT</pubDate><dc:creator>Ira Warren Whiteside</dc:creator></item><item><title>RE: Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL</title><link>http://www.sqlservercentral.com/Forums/Topic732037-1345-1.aspx</link><description>OK,Address matching isn't quite exactly related to the topic, but of course it's related to name matching somehow. Because of the scale of the address matching topic I will not discuss it on this BLOG and rather have a little bit discussion on name match.Regarding matching on initial, it's quite useful when matching on household level; when you have father/son relationship, when they share the same first name.For each name part we can have the following cases if we using fuzzy matching:1. Exact match2. Nick name match3. Fuzzy match4. Match on initial5  No matchall cases are listed according their matching weight, actually you can have it differently;for surname we don't do match on first letter(case 4) and nickname(case2); also there could be cases when surname could change, in case of marriage.Each name part itself has matching weight; for example match on surname has higher weight than match on first or second name, not sure about to compare weight of first name and second name against surname.That mean we having here a scoring engine with possible tuneup parameters.That was my approach.Also a word about cleansing function used by Ira.I think it's kind of so called matchkey generator and it take me back to the times when no fuzzy matching was used but we were using matchkey to do our match.For example keeping first 3 letters and removing wovels and doubles.Match key generation could be another topic of discussion, but I would stop here.Thank you again for the interesting discussion and useful script.</description><pubDate>Fri, 19 Jun 2009 17:42:56 GMT</pubDate><dc:creator>Anatoliy.Martirosian</dc:creator></item><item><title>RE: Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL</title><link>http://www.sqlservercentral.com/Forums/Topic732037-1345-1.aspx</link><description>In my case, middle name/initial wasn't of any real value in and of itself.First initials are often an issue.One step I have not yet taken is splitting up names into individual words, so Guthridge-Henderson matches with either Guthridge or Henderson at a fuzzyish level of matching.Likewise, matching on common nicknames (Dick vs. Richard, Bob vs. Rob vs. Robert) I haven't implemented yet.Matching on the first initial of nicknames, however, I'd judge as a lost cause (D matching R).Addresses take quite a lot of cleanup, and even then, they're hard to match.  Sending them out to a good address cleanup service is probably the best bet, but failing that, we have to try ourselves.</description><pubDate>Fri, 19 Jun 2009 11:51:48 GMT</pubDate><dc:creator>Nadrek</dc:creator></item><item><title>RE: Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL</title><link>http://www.sqlservercentral.com/Forums/Topic732037-1345-1.aspx</link><description>Hi, In my case I had also Middle Name involved in matching process.Also we could have initials instead of full names which actually makes it more complicated,and even more complicated for Asian names.Regarding addresses, I want to mention that from my poiny of view addressses are virtual.There are different address databases around but there is no absolute one.Addresses people use are another source of address virtualisation  we dealing with.And to build up to date address DB is quite complex task.</description><pubDate>Thu, 18 Jun 2009 18:21:03 GMT</pubDate><dc:creator>Anatoliy.Martirosian</dc:creator></item><item><title>RE: Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL</title><link>http://www.sqlservercentral.com/Forums/Topic732037-1345-1.aspx</link><description>I agree that parsing is critical.  On the other hand, how to parse is also very much an "it depends" question.In general, for my only slightly imperfect matching I tend to combine split fields (addr1, addr2, first name, last name), clean them up (nested REPLACE() for SQL), remove extraneous elements (professional designations, etc.), remove all remaining spaces, and then compare them.  With at least some matching on another selective field (license number, for instance), this produces high quality results.For later, fuzzier passes... well, that's where splitting pieces apart, cross-matching, Jaro-Winkler, Double Metaphone, and so on come in.As far as splitting pieces apart, for the ' ' + addr1 + ' ' + addr2 + ' ' (etc.), I have code based on U.S. addresses that tries to find all leading digits, all suite/apartment indicators (whether they be first or last), whatever's in the middle, and the first "word" of whatever's in the middle.  These are then used in various combinations with City+State and/or Zip code on the address portion of the matching.In pure T-SQL, splitting the pieces apart is fairly resource heavy.</description><pubDate>Thu, 18 Jun 2009 13:19:10 GMT</pubDate><dc:creator>Nadrek</dc:creator></item><item><title>RE: Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL</title><link>http://www.sqlservercentral.com/Forums/Topic732037-1345-1.aspx</link><description>Hi,Article is very interesting.I use to work with name and address matching for a while.Never tried Jaro Winkler algorithm and want to thank Ira for that.What striked my mind is that good matching itself is just a part of matching process.Another interesting part of matching is data parsing.Good parsing gives better matching results. It would be interesting to hear about data parsing and it's common issues.RegardsAnatoliy</description><pubDate>Wed, 17 Jun 2009 19:37:56 GMT</pubDate><dc:creator>Anatoliy.Martirosian</dc:creator></item><item><title>RE: Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL</title><link>http://www.sqlservercentral.com/Forums/Topic732037-1345-1.aspx</link><description>Any update on the potential error in the function?Martin</description><pubDate>Sun, 14 Jun 2009 22:35:09 GMT</pubDate><dc:creator>Martin N</dc:creator></item><item><title>RE: Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL</title><link>http://www.sqlservercentral.com/Forums/Topic732037-1345-1.aspx</link><description>Very interesting, I am stepping through the code, I am getting the same results. I will let you know.</description><pubDate>Wed, 10 Jun 2009 15:06:38 GMT</pubDate><dc:creator>Ira Warren Whiteside</dc:creator></item><item><title>RE: Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL</title><link>http://www.sqlservercentral.com/Forums/Topic732037-1345-1.aspx</link><description>Ira, I apologize, but it appears there is at least one error in the algorithm, most likely dealing with transpositions (or I accidentally changed something important when making what I thought were cosmetic changes).Most importantly, I would not expect ('Yvette','Yevett'), ('Martha','Marhta'), and ('Shackleford','Shackelford') to have the 0.999999 scores they're getting.Additionally, ('Massey','Massie') is also scored significantly lower by your T-SQL code than by the Winkler paper below (0.8888 vs. 0.9444).Most other scores are either almost identical, or relatively similar.Jaro-Winkler output validation against a paper by William E. Winkler and Yves Thibaudeau, U.S. Census: [url=http://www.census.gov/srd/papers/pdf/rr91-9.pdf]url=http://www.census.gov/srd/papers/pdf/rr91-9.pdf[/url]  Referenced from the U.S. National Institute of Standards and Technology: [url=http://www.itl.nist.gov/div897/sqg/dads/HTML/jaroWinkler.html]url=http://www.itl.nist.gov/div897/sqg/dads/HTML/jaroWinkler.html[/url][code]SELECT [dbo].[JaroWinkler]('shackleford','shackelford') -- expect 0.9848SELECT [dbo].[JaroWinkler]('cunningham','cunnigham') -- expect 0.9833SELECT [dbo].[JaroWinkler]('campell','campbell') -- expect 0.9792SELECT [dbo].[JaroWinkler]('nichleson','nichulson') -- expect 0.9630SELECT [dbo].[JaroWinkler]('massey','massie') -- expect 0.9444SELECT [dbo].[JaroWinkler]('abroms','abrams') -- expect 0.9333SELECT [dbo].[JaroWinkler]('galloway','calloway') -- expect 0.9167SELECT [dbo].[JaroWinkler]('lampley','campley') -- expect 0.9048SELECT [dbo].[JaroWinkler]('dixon','dickson') -- expect 0.8533SELECT [dbo].[JaroWinkler]('frederick','fredrick') -- expect 0.9815SELECT [dbo].[JaroWinkler]('michele','michelle') -- expect 0.9792SELECT [dbo].[JaroWinkler]('jesse','jessie') -- expect 0.9722SELECT [dbo].[JaroWinkler]('marhta','martha') -- expect 0.9667SELECT [dbo].[JaroWinkler]('jonathon','jonathan') -- expect 0.9583SELECT [dbo].[JaroWinkler]('julies','juluis') -- expect 0.9333SELECT [dbo].[JaroWinkler]('jeraldine','geraldine') -- expect 0.9246SELECT [dbo].[JaroWinkler]('yvette','yevett') -- expect 0.9111SELECT [dbo].[JaroWinkler]('tanya','tonya') -- expect 0.8933SELECT [dbo].[JaroWinkler]('dwayne','duane') -- expect 0.8578[/code]</description><pubDate>Wed, 10 Jun 2009 13:46:55 GMT</pubDate><dc:creator>Nadrek</dc:creator></item><item><title>RE: Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL</title><link>http://www.sqlservercentral.com/Forums/Topic732037-1345-1.aspx</link><description>And, if anyone cares, here is some U.S. centric address sanitization to add to the clean_string function or set of functions. Given that addresses vary significantly country to country, please review carefully before using in your country.At the end is also primitive SQL to turn consecutive spaces into single spaces, which can be trivially separated from the PO Box replacements.Unfortunately, this trivial but tedious method does result in very long SQL, and substantial CPU time for the cleansing function.[code]-- 238 nestings, ALLEY to LOOP plus " BY PASS " and " BL ", and " BLUFFS ", " BROOKS ", " BURGS ", " CENTERS ", " CLIFFS ", " CIRCLES ", " CORNERS ", " COURTS ", " COVES ", " DRIVES ", " ESTATES ", " EXTENSIONS ", " FALLS ", " FIELDS ", " FLATS ", " FORDS ", " FORGES ", " FORKS ", " GARDENS ", " GLENS ", " GREENS ", " GROVES ", " HARBORS ", " HILLS ", " ISLANDS ", " JUNCTIONS ", " KEYS ", " KNOLLS ", " LAKES ", " LIGHTS ", " LOCKS ", and so on to the singular abbreviation from the official plural abbreviations-- spaces remain at this point to allow for word separations, created in bulk via Excel and global search and replace, meaningless self-replacements removed.SET @String = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@String,' ALLEE ',' ALY '),' ALLEY ',' ALY '),' ALLY ',' ALY '),' ANEX ',' ANX '),' ANNEX ',' ANX '),' ANNX ',' ANX '),' ARCADE ',' ARC '),' AV ',' AVE '),' AVEN ',' AVE '),' AVENU ',' AVE '),' AVENUE ',' AVE '),' AVN ',' AVE '),' AVNUE ',' AVE '),' BAYOO ',' BYU '),' BAYOU ',' BYU '),' BEACH ',' BCH '),' BEND ',' BND '),' BLUF ',' BLF '),' BLUFF ',' BLF '),' BLUFFS ',' BLF '),' BLFS ',' BLF '),' BOT ',' BTM '),' BOTTM ',' BTM '),' BOTTOM ',' BTM '),' BL ',' BLVD '),' BOUL ',' BLVD '),' BOULEVARD ',' BLVD '),' BOULV ',' BLVD '),' BRANCH ',' BR '),' BRNCH ',' BR '),' BRDGE ',' BRG '),' BRIDGE ',' BRG '),' BROOK ',' BRK '),' BROOKS ',' BRK '),' BRKS ',' BRK '),' BURG ',' BG '),' BURGS ',' BG '),' BGS ',' BG '),' BYPA ',' BYP '),' BYPAS ',' BYP '),' BYPASS ',' BYP '),' BY PASS ',' BYP '),' BYPS ',' BYP '),' CAMP ',' CP '),' CMP ',' CP '),' CANYN ',' CYN '),' CANYON ',' CYN '),' CNYN ',' CYN '),' CAPE ',' CPE '),' CAUSEWAY ',' CSWY '),' CAUSWAY ',' CSWY ') ,' CEN ',' CTR '),' CENT ',' CTR '),' CENTER ',' CTR '),' CENTR ',' CTR '),' CENTRE ',' CTR '),' CNTER ',' CTR '),' CNTR ',' CTR '),' CENTERS ',' CTR '),' CTRS ',' CTR '),' CIRC ',' CIR '),' CIRCL ',' CIR '),' CIRCLE ',' CIR '),' CRCL ',' CIR '),' CRCLE ',' CIR '),' CIRCLES ',' CIR '),' CIRS ',' CIR '),' CLIFF ',' CLF '),' CLFS ',' CLF '),' CLIFFS ',' CLF '),' CLUB ',' CLB '),' COMMON ',' CMN '),' CORNER ',' COR '),' CORNERS ',' COR '),' CORS ',' COR '),' COURSE ',' CRSE '),' COURT ',' CT '),' CRT ',' CT '),' COURTS ',' CT '),' CTS ',' CT '),' COVE ',' CV '),' CVS ',' CV '),' COVES ',' CV '),' CK ',' CRK '),' CR ',' CRK '),' CREEK ',' CRK '),' CRECENT ',' CRES '),' CRESCENT ',' CRES '),' CRESENT ',' CRES '),' CRSCNT ',' CRES ') ,' CRSENT ',' CRES '),' CRSNT ',' CRES '),' CREST ',' CRST '),' CROSSING ',' XING '),' CRSSING ',' XING '),' CRSSNG ',' XING '),' CROSSROAD ',' XRD '),' CURVE ',' CURV '),' DALE ',' DL '),' DAM ',' DM '),' DIV ',' DV '),' DIVIDE ',' DV '),' DVD ',' DV '),' DRIV ',' DR '),' DRIVE ',' DR '),' DRV ',' DR '),' DRIVES ',' DR '),' DRS ',' DR '),' ESTATE ',' EST '),' ESTATES ',' EST '),' ESTS ',' EST '),' EXP ',' EXPY '),' EXPR ',' EXPY '),' EXPRESS ',' EXPY '),' EXPRESSWAY ',' EXPY '),' EXPW ',' EXPY '),' EXTENSION ',' EXT '),' EXTN ',' EXT '),' EXTNSN ',' EXT '),' EXTENSIONS ',' EXT '),' EXTS ',' EXT '),' FALLS ',' FALL '),' FLS ',' FALL '),' FERRY ',' FRY '),' FRRY ',' FRY '),' FIELD ',' FLD '),' FIELDS ',' FLD '),' FLDS ',' FLD '),' FLAT ',' FLT '),' FLATS ',' FLT '),' FLTS ',' FLT '),' FORD ',' FRD '),' FRDS ',' FRD '),' FORDS ',' FRD '),' FOREST ',' FRST '),' FORESTS ',' FRST '),' FORG ',' FRG '),' FORGE ',' FRG '),' FRGS ',' FRG '),' FORGES ',' FRG '),' FORK ',' FRK '),' FORKS ',' FRK '),' FRKS ',' FRK '),' FORT ',' FT '),' FRT ',' FT '),' FREEWAY ',' FWY '),' FREEWY ',' FWY '),' FRWAY ',' FWY '),' FRWY ',' FWY '),' GARDEN ',' GDN '),' GARDN ',' GDN '),' GRDEN ',' GDN '),' GRDN ',' GDN '),' GARDENS ',' GDN '),' GDNS ',' GDN '),' GRDNS ',' GDN '),' GATEWAY ',' GTWY '),' GATEWY ',' GTWY '),' GATWAY ',' GTWY '),' GTWAY ',' GTWY '),' GLEN ',' GLN '),' GLNS ',' GLN '),' GLENS ',' GLN '),' GREEN ',' GRN '),' GRNS ',' GRN '),' GREENS ',' GRN '),' GROV ',' GRV '),' GROVE ',' GRV '),' GRVS ',' GRV '),' GROVES ',' GRV '),' HARB ',' HBR '),' HARBOR ',' HBR '),' HARBR ',' HBR '),' HBRS ',' HBR ') ,' HRBOR ',' HBR ') ,' HARBORS ',' HBR '),' HAVEN ',' HVN '),' HAVN ',' HVN '),' HEIGHT ',' HTS '),' HEIGHTS ',' HTS '),' HGTS ',' HTS '),' HT ',' HTS '),' HIGHWAY ',' HWY '),' HIGHWY ',' HWY '),' HIWAY ',' HWY '),' HIWY ',' HWY '),' HWAY ',' HWY '),' HILL ',' HL '),' HILLS ',' HL '),' HLS ',' HL '),' HLLW ',' HOLW '),' HOLLOW ',' HOLW '),' HOLLOWS ',' HOLW '),' HOLWS ',' HOLW '),' INLET ',' INLT '),' ISLAND ',' IS '),' ISLND ',' IS '),' ISLANDS ',' IS '),' ISLNDS ',' IS '),' ISS ',' IS '),' ISLES ',' ISLE '),' JCTION ',' JCT '),' JCTN ',' JCT '),' JUNCTION ',' JCT '),' JUNCTN ',' JCT '),' JUNCTON ',' JCT '),' JCTNS ',' JCT '),' JCTS ',' JCT '),' JUNCTIONS ',' JCT '),' KEY ',' KY '),' KEYS ',' KY '),' KYS ',' KY '),' KNL ',' KNL '),' KNOL ',' KNL '),' KNOLL ',' KNL '),' KNLS ',' KNL '),' KNOLLS ',' KNL '),' LAKE ',' LK '),' LK ',' LK '),' LAKES ',' LK '),' LKS ',' LK '),' LAND ',' LAND '),' LANDING ',' LNDG '),' LNDNG ',' LNDG '),' LA ',' LN '),' LANE ',' LN '),' LANES ',' LN '),' LIGHT ',' LGT '),' LIGHTS ',' LGT '),' LGTS ',' LGT '),' LOAF ',' LF '),' LOCK ',' LCK '),' LCKS ',' LCK '),' LOCKS ',' LCK '),' LDGE ',' LDG '),' LODG ',' LDG '),' LODGE ',' LDG '),' LOOPS ',' LOOP ')-- 201 nestings, MALL to WELLS plus " UNIT " then all secondary designators APARTMENT to UPPER, and " MANORS ", " MEADOWS ", " MILLS ", " MOUNTAINS ", " PINES ", " PLAINS ", " POINTS ", " PORTS ", " RAPIDS ", " RIDGES ", " ROADS ", " SHOALS ", " SHORES ", " SPRINGS ", " SQUARES ", " STREETS ", " UNIONS ", " VALLEYS ", " VIEWS ", " VILLAGES ", " WAYS ", " WELLS ", and so on to the singular abbreviation from the official plural abbreviations-- spaces remain at this point to allow for word separations, created in bulk via Excel and global search and replace, meaningless self-replacements removed.SET @String = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@String,' MANOR ',' MNR '),' MNR ',' MNR '),' MANORS ',' MNR '),' MNRS ',' MNR '),' MEADOW ',' MDW '),' MDWS ',' MDW '),' MEADOWS ',' MDW ') ,' MEDOWS ',' MDW '),' MEWS ',' MEWS '),' MILL ',' ML '),' MILLS ',' ML '),' MLS ',' ML ') ,' MISSION ',' MSN '),' MISSN ',' MSN '),' MSSN ',' MSN '),' MOTORWAY ',' MTWY '),' MNT ',' MT '),' MOUNT ',' MT '),' MNTAIN ',' MTN '),' MNTN ',' MTN '),' MOUNTAIN ',' MTN '),' MOUNTIN ',' MTN '),' MTIN ',' MTN '),' MTNS ',' MTN '),' MNTNS ',' MTN ') ,' MOUNTAINS ',' MTN '),' NECK ',' NCK '),' ORCHARD ',' ORCH '),' ORCHRD ',' ORCH '),' OVL ',' OVAL '),' OVERPASS ',' OPAS '),' PK ',' PARK '),' PRK ',' PARK '),' PARKS ',' PARK '),' PARKWAY ',' PKWY '),' PARKWY ',' PKWY '),' PKWAY ',' PKWY '),' PKY ',' PKWY '),' PARKWAYS ',' PKWY ') ,' PKWYS ',' PKWY '),' PASSAGE ',' PSGE '),' PATHS ',' PATH '),' PIKES ',' PIKE '),' PINE ',' PNE '),' PINES ',' PNE '),' PNES ',' PNE '),' PLACE ',' PL '),' PLAIN ',' PLN '),' PLAINES ',' PLN '),' PLAINS ',' PLN '),' PLNS ',' PLN '),' PLAZA ',' PLZ '),' PLZA ',' PLZ '),' POINT ',' PT '),' POINTS ',' PT '),' PTS ',' PT '),' PORT ',' PRT '),' PORTS ',' PRT '),' PRTS ',' PRT '),' PRAIRIE ',' PR '),' PRARIE ',' PR '),' PRR ',' PR '),' RAD ',' RADL '),' RADIAL ',' RADL '),' RADIEL ',' RADL '),' RANCH ',' RNCH '),' RANCHES ',' RNCH '),' RNCHS ',' RNCH '),' RAPID ',' RPD '),' RAPIDS ',' RPD '),' RPDS ',' RPD '),' REST ',' RST '),' RDGE ',' RDG '),' RIDGE ',' RDG '),' RDGS ',' RDG '),' RIDGES ',' RDG '),' RIVER ',' RIV '),' RIVR ',' RIV '),' RVR ',' RIV '),' ROAD ',' RD '),' RDS ',' RD '),' ROADS ',' RD '),' ROUTE ',' RTE '),' SHOAL ',' SHL '),' SHLS ',' SHL '),' SHOALS ',' SHL '),' SHOAR ',' SHR '),' SHORE ',' SHR '),' SHR ',' SHR '),' SHOARS ',' SHR '),' SHORES ',' SHR '),' SHRS ',' SHR '),' SKYWAY ',' SKWY '),' SPNG ',' SPG '),' SPRING ',' SPG '),' SPRNG ',' SPG '),' SPGS ',' SPG '),' SPNGS ',' SPG '),' SPRINGS ',' SPG '),' SPRNGS ',' SPG '),' SPURS ',' SPUR '),' SQR ',' SQ '),' SQRE ',' SQ '),' SQU ',' SQ '),' SQUARE ',' SQ '),' SQRS ',' SQ '),' SQUARES ',' SQ ') ,' SQS ',' SQ '),' STATION ',' STA '),' STATN ',' STA '),' STN ',' STA '),' STRAV ',' STRA '),' STRAVE ',' STRA '),' STRAVEN ',' STRA '),' STRAVENUE ',' STRA '),' STRAVN ',' STRA '),' STRVN ',' STRA '),' STRVNUE ',' STRA '),' STREAM ',' STRM '),' STREME ',' STRM '),' STR ',' ST '),' STREET ',' ST '),' STRT ',' ST '),' STREETS ',' ST '),' STS ',' ST '),' SUMIT ',' SMT '),' SUMITT ',' SMT '),' SUMMIT ',' SMT '),' TERR ',' TER '),' TERRACE ',' TER '),' THROUGHWAY ',' TRWY '),' TRACE ',' TRCE '),' TRACES ',' TRCE '),' TRACK ',' TRAK '),' TRACKS ',' TRAK '),' TRK ',' TRAK '),' TRKS ',' TRAK '),' TRAFFICWAY ',' TRFY '),' TR ',' TRL '),' TRAIL ',' TRL '),' TRAILS ',' TRL '),' TRLS ',' TRL '),' TUNEL ',' TUNL '),' TUNLS ',' TUNL '),' TUNNEL ',' TUNL '),' TUNNELS ',' TUNL '),' TUNNL ',' TUNL '),' TPK ',' TPKE '),' TRNPK ',' TPKE '),' TRPK ',' TPKE '),' TURNPIKE ',' TPKE '),' TURNPK ',' TPKE '),' UNDERPASS ',' UPAS '),' UNION ',' UN '),' UNIONS ',' UN '),' UNS ',' UN '),' VALLEY ',' VLY '),' VALLY ',' VLY '),' VLLY ',' VLY '),' VALLEYS ',' VLY ') ,' VLYS ',' VLY ') ,' VDCT ',' VIA '),' VIADCT ',' VIA '),' VIADUCT ',' VIA '),' VIEW ',' VW '),' VIEWS ',' VW '),' VWS ',' VW '),' VILL ',' VLG '),' VILLAG ',' VLG '),' VILLAGE ',' VLG '),' VILLG ',' VLG '),' VILLIAGE ',' VLG '),' VILLAGES ',' VLG '),' VLGS ',' VLG '),' VILLE ',' VL '),' VIST ',' VIS '),' VISTA ',' VIS '),' VST ',' VIS '),' VSTA ',' VIS '),' WALKS ',' WALK '),' WY ',' WAY '),' WAYS ',' WAY '),' WELL ',' WL '),' WELLS ',' WL '),' WLS ',' WL '),' APARTMENT ',' APT '),' BASEMENT ',' BSMT ') ,' BUILDING ',' BLDG '),' DEPARTMENT ',' DEPT '),' FLOOR ',' FL '),' FRONT ',' FRNT '),' HANGAR ',' HNGR '),' LOBBY ',' LBBY '),' LOWER ',' LOWR '),' OFFICE ',' OFC '),' PENTHOUSE ',' PH '),' ROOM ',' RM '),' SPACE ',' SPC '),' SUITE ',' STE '),' TRAILER ',' TRLR '),' UNT ',' UNIT '),' UPPER ',' UPPR ')-- Turn any set of spaces into a single space.  Why 13 2:1 replacements?  Because 2^13 is 8192, just in excess of VARCHAR(8000), so we _know_ that any number of contiguous spaces is now one space.-- Also, standardize the most common misspelled PO BOX types _after_ the spaces are collapsed.-- single spaces remain at this point to allow for word separationsSET @String = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@String,'  ',' '),'  ',' '),'  ',' '),'  ',' '),'  ',' '),'  ',' '),'  ',' '),'  ',' '),'  ',' '),'  ',' '),'  ',' '),'  ',' '),'  ',' '), ' po bx ',' PO BOX '), ' p o box ',' PO BOX '), ' P0 BOX ',' PO BOX ')[/code]</description><pubDate>Wed, 10 Jun 2009 11:48:50 GMT</pubDate><dc:creator>Nadrek</dc:creator></item><item><title>RE: Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL</title><link>http://www.sqlservercentral.com/Forums/Topic732037-1345-1.aspx</link><description>We did this very same "string distance" matching algorithms 3-4 years ago at a place I worked. It works great for small datasets, but SSIS fuzzy matching is much faster and easier to use for large datasets.</description><pubDate>Wed, 10 Jun 2009 11:13:51 GMT</pubDate><dc:creator>Jody Claggett-376930</dc:creator></item><item><title>RE: Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL</title><link>http://www.sqlservercentral.com/Forums/Topic732037-1345-1.aspx</link><description>"karo yaa na karo, koshish jaisa kuch nahi hai""Do, or do not. There is no try."Nice, a quote from a Yogi and a quote from a Yoda.Great article. Thanks for the code samples.Now if we can just adjust western culture to shorten William to Will rather than Bill we'll all be better off.Who decided that Robert should become BOB?!-D</description><pubDate>Wed, 10 Jun 2009 09:51:43 GMT</pubDate><dc:creator>Darren Wallace</dc:creator></item><item><title>RE: Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL</title><link>http://www.sqlservercentral.com/Forums/Topic732037-1345-1.aspx</link><description>Hi Ira,The Hindi saying: "karo yaa na karo, koshish jaisa kuch nahi hai" literally translates to "Do or not do, there is nothing like try". However, the implied translation is "[Whether you] do or not do [something], there is nothing like try[ing]" with the implied portions in square brackets. That is because of the structure of the Hindi language.Cheers. :)</description><pubDate>Wed, 10 Jun 2009 09:40:57 GMT</pubDate><dc:creator>pallu</dc:creator></item><item><title>RE: Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL</title><link>http://www.sqlservercentral.com/Forums/Topic732037-1345-1.aspx</link><description>Thanks for the suggestions, I try them out.</description><pubDate>Wed, 10 Jun 2009 09:25:10 GMT</pubDate><dc:creator>Ira Warren Whiteside</dc:creator></item><item><title>RE: Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL</title><link>http://www.sqlservercentral.com/Forums/Topic732037-1345-1.aspx</link><description>Thank you, the translation came from some friends in New Delhi,  what would be your translation for "Do, or do not. There is no try".</description><pubDate>Wed, 10 Jun 2009 09:24:18 GMT</pubDate><dc:creator>Ira Warren Whiteside</dc:creator></item><item><title>RE: Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL</title><link>http://www.sqlservercentral.com/Forums/Topic732037-1345-1.aspx</link><description>[Edited after an accidental early post]First and foremost, thank you very much for writing Jaro-Winkler for T-SQL; that was an algorithm in a language I was looking for just a few weeks ago for name and address matching.Second, thank you for taking the time to post a comprehensive article, including test cases, on a very important topic.Third, I have a few suggestions for what I would consider improvements:1) Descriptive variable names.  @max_len, for instance, is much better than @m, @z, and so on.2) Typo?  2a) You call [dbo].[clean_sting] and give source code to a function [dbo].[Cleansing]  2b) In [dbo].[Cleansing], there appears to be an extraneous  ') just before the RETURN3) Efficiency improvements  3a) In the cleansing function, instead of[code]            Declare @ret_value varchar(max)            set @ret_value = @p_str1            set   @ret_value = REPLACE(@ret_value, '.', '')            set   @ret_value = REPLACE(@ret_value, ',', '')             set   @ret_value = REPLACE(@ret_value, '-', '')             set   @ret_value = REPLACE(@ret_value, ';', '')             set   @ret_value = REPLACE(@ret_value, ':', '')  ')            RETURN @ret_value  [/code]I would suggest[code]            Declare @ret_value varchar(max)            set @ret_value = @p_str1            set   @ret_value = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@ret_value, '.', ''), ',', ''), '-', '') , ';', ''), ':', '')            RETURN @ret_value[/code]On a sample size of 98,304 rows (and a pretty pathetic SQL Server), CPU on SQL Server profiler dropped from 4422 to 3906 with this change (i.e. this uses only 88% the CPU time of the original).4) Suggested higher cost, higher sanitization REPLACE() for dirtier strings in general when one wants to compare alphanumerics, presented in bite size pieces (combine the REPLACE() sets for higher efficiency:[code]-- HTML character entity references-- An authoritative list for HTML 4.0 is at http://www.w3.org/TR/REC-html40/sgml/entities.htmlSET @String = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@String, '&amp;amp;',' '), '&amp;#38;',' '),'',' '), '&amp;#160;',' '),'&amp;quot;',' '), '&amp;#34;',' ')-- (U.S. centric) Punctuation merely gets in the way of matching; replace with spaces to keep our word separation; we won't do a fully complete replacement, but we'll do what we do relatively quickly.-- EXCEPT FOR THE # SIGN (which merely gets turned to ' # ', and is used for further address splitting, and is cleaned at the end of that), the bottom three rows of punctuation on the right hand side of a U.S. keyboard, with and without shift by row, then the top row's symbols the same way.--   Further address splitting: pull the number off the front, then pull the Suite/Apartment/#/Unit/etc. off the end, leaving the middle by itselfSET @String = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@String, '.', ' '), ',', ' '), '/', ' '), '&lt;', ' '), '&gt;', ' '), '?', ' '), ';',' '), '''',' '), ':',' '), '"',' '), '[',' '), ']',' '), '\',' '), '{',' '), '}',' '), '|',' '), '`', ' '), '-', ' '), '=', ' '), '~', ' '), '!', ' '), '@', ' '), '#', ' # '), '$',' '), '%',' '), '^',' '), '&amp;',' '), '*',' '), '(',' '), ')',' '), '_',' '), '+',' ')-- (U.S. centric) Special characters merely gets in the way of matching, and some may still be in use in old systems and data, particularly mainframes and mainframe derived code.  Here's a list of codes, from more common, to quite rare-- NOTE: Be careful with UNICODE, though these are all from the first 7 bits of ASCII encodingSET @String = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@String, CHAR(9),' '), CHAR(11),' '), CHAR(13),' '), CHAR(10),' '), CHAR(0),' '), CHAR(27),' '), CHAR(12),' '), CHAR(1),' '), CHAR(2),' '), CHAR(3),' '), CHAR(4),' '), CHAR(8),' '), CHAR(23),' '), CHAR(24),' '), CHAR(25),' '), CHAR(26),' '), CHAR(28),' '), CHAR(29),' '), CHAR(30),' '), CHAR(31),' ')[/code]5) Suggested higher cost, higher sanitization address cleaning):[code]-- Get rid of any combination of leading 0's and spaces in a moderately performant way:SET @StringToStripLeadingZerosSpacesFrom = COALESCE(SUBSTRING(@StringToStripLeadingZerosSpacesFrom,PATINDEX('%[^0 ]%',@StringToStripLeadingZerosSpacesFrom),DATALENGTH(@StringToStripLeadingZerosSpacesFrom)),'')-- Directions can be changed to standard abbreviations-- This must be done before punctuations is replaced by spaces.SET @String = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@String, ' north-east ',' ne '),' northeast ',' ne '), ' north-west ',' nw '),' northwest ',' nw '), ' south-east ',' se '),' southeast ',' se '), ' south-west ',' sw '),' southwest ',' sw '), ' north ',' n '), ' east ',' e '), ' south ',' s '), ' west ',' w ')[/code]6) Suggested higher cost, higher sanitization name cleaning for highly fuzzy matches, again, consolidate for higher efficiency:[code]-- Dental/Medical centric abbreviation cleaning-- A select few sourced from http://www.tabers.com/tabersonline/ub/view/Tabers/144251/0/Professional_Designations_and_Titles_in_the_Health_Sciences)-- Precondition: Punctuation replaced by spaces, and then all consecutive spaces consolidated into single spacesSET @String = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@String, ' DDS ',' '),' D D S ',' '), ' DMD ',' '),' D M D ',' '), ' PC ',' '),' P C ',' '), ' MS ',' '),' M S ',' '),' PA ',' '),' P A ',' '),' FACD ',' '),' F A C D ',' '),' RDH ',' '),' R D H ',' '),' RDA ',' '),' R D A ',' '),' CDA ',' '),' C D A ',' '),' APC ',' '),' A P C ',' '),' MSD ',' '),' M S D ',' '),' INC ',' '),' I N C ',' '),' MD ',' '),' M D ',' '),' LTD ',' '),' L T D ',' ')-- Now we get rid of common suffixes--   This definitely is going to generate false positive matches on parents and children!--   Use at a fairly fuzzy level of matching, after more precise matches have been handledSET @String = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@String, ' I ',' '),' II ',' '), ' III ',' '),' IV ',' '), ' V ',' '),' JR ',' '), ' SR ',' ')[/code]7) As always, for bulk use, generate and save the "clean" string separately in its own columns using REPLACE() on sets.Fourth, nothing to do with this article, but I would suggest adding Double Metaphone to your list of fuzzy logic functions.  It's computationally expensive, but can be done in bulk, and is really quite reasonable when it comes to matching strings that sound alike.  I found source at http://www.sqlservercentral.com/scripts/Miscellaneous/30219/ and updated it to generate 35 character strings instead of 5 character strings for less fuzzy matching on longer names.</description><pubDate>Wed, 10 Jun 2009 08:47:52 GMT</pubDate><dc:creator>Nadrek</dc:creator></item><item><title>RE: Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL</title><link>http://www.sqlservercentral.com/Forums/Topic732037-1345-1.aspx</link><description>There are a few misspellings in the posted content, in the jaro procedure the string cleansing function called is 'Clean_sting' yet below the function created is 'Cleansing'</description><pubDate>Wed, 10 Jun 2009 08:46:07 GMT</pubDate><dc:creator>rawliquid</dc:creator></item><item><title>RE: Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL</title><link>http://www.sqlservercentral.com/Forums/Topic732037-1345-1.aspx</link><description>Good stuff Ira. I had been using the built in Fuzzy Matching Package, but there's nothing like being able to use a regular query.A small thing - a slightly more accurate translation of the Hindi saying at the bottom is - [i]Whether you do or do not do [somethin], there is nothing like trying.[/i]Cheers :-)</description><pubDate>Wed, 10 Jun 2009 08:04:22 GMT</pubDate><dc:creator>pallu</dc:creator></item><item><title>RE: Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL</title><link>http://www.sqlservercentral.com/Forums/Topic732037-1345-1.aspx</link><description>Thanks, I'll give Jaro-Winkler a try.I've already a strategy in place to filter the number of rows through to the fuzzy match query, including pre-weeding out exact matches and matches that don't start with the same first letter as the match target. (In my experience here, we tend to at least get the first letter of the name right!) Those filters alone are enough to bring 2+ minute queries down to sub-second queries, which is good enough for me.Now if I could just get my nickname-based matching working as fast, I'd be a truly happy camper. :)</description><pubDate>Wed, 10 Jun 2009 07:49:07 GMT</pubDate><dc:creator>ezahurak</dc:creator></item><item><title>RE: Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL</title><link>http://www.sqlservercentral.com/Forums/Topic732037-1345-1.aspx</link><description>Yes, Jaro-Winkler will give more weight to the length of the string match from the beginning of the string. It should be more accurate the Levi, however the real challenge is to implement a blocking/partioning strategy to limits the num of records sent to Jaro.</description><pubDate>Wed, 10 Jun 2009 07:31:45 GMT</pubDate><dc:creator>Ira Warren Whiteside</dc:creator></item><item><title>RE: Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL</title><link>http://www.sqlservercentral.com/Forums/Topic732037-1345-1.aspx</link><description>Ira, interesting article.  Timely, too, as I've been working on a project to create an index of people listed in multiple data sources, locating the same people in each database by means of matching on various fields. I'm currently using a similar approach, but using a Levenshtien distance algoritm instead of Jaro-Winkler.  Would Jaro-Winkler be more efficient?Thanks,Ed Z</description><pubDate>Wed, 10 Jun 2009 07:21:29 GMT</pubDate><dc:creator>ezahurak</dc:creator></item><item><title>Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL</title><link>http://www.sqlservercentral.com/Forums/Topic732037-1345-1.aspx</link><description>Comments posted to this topic are about the item [B]&lt;A HREF="/articles/Fuzzy+Match/65702/"&gt;Roll Your Own Fuzzy Match / Grouping (Jaro Winkler) - T-SQL&lt;/A&gt;[/B]</description><pubDate>Tue, 09 Jun 2009 23:59:31 GMT</pubDate><dc:creator>Ira Warren Whiteside</dc:creator></item></channel></rss>