Comparing strings in a non-English language

  • Hi people.

    I have an address database and I have problems with duplicated street names.

    For example:

    souza street

    souza street ,

    Because of the comma, T-SQL treats these as two different addresses. I know about the SOUNDEX function, but the data is in Portuguese.

    Is there any way to compare strings in a non-English language?

  • Can you strip out punctuation before you compare them? Replace commas, periods and a few others (I don't know what's common in Portugal/Brazil/wherever else you might be), then compare them.

    In US addresses, hyphens need to stay in, and slashes, but not a lot of other punctuation marks. That might help.
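    As a rough illustration of the punctuation-stripping idea, here is a minimal sketch in Python, assuming the cleanup runs outside the database (for example in a load script). The function name `normalize_street` is hypothetical, and the choice to keep hyphens and slashes simply follows the US-address remark above; the case folding and Unicode NFC step are extra assumptions, so adjust them for your data.

```python
import re
import unicodedata

# Drop everything except letters, digits, underscore, whitespace, "/" and "-".
# Hyphens and slashes are kept on purpose; this set is an assumption to tune.
_DROP = re.compile(r"[^\w\s/-]")
_SPACES = re.compile(r"\s+")


def normalize_street(name: str) -> str:
    """Return a comparison key: punctuation removed, spaces collapsed, case folded."""
    # NFC keeps accented letters such as "ã" as single characters,
    # so the regex above does not strip their accents.
    text = unicodedata.normalize("NFC", name)
    text = _DROP.sub(" ", text)              # punctuation becomes a space
    text = _SPACES.sub(" ", text).strip()    # collapse runs of whitespace
    return text.casefold()                   # case-insensitive comparison


if __name__ == "__main__":
    # The two rows from the question now produce the same key.
    print(normalize_street("souza street") == normalize_street("souza street ,"))
```

    Comparing the keys rather than the raw strings is what lets "souza street" and "souza street ," match; the stored values themselves do not have to change.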

    What I ended up having to do for multinational addresses was use a full-text index with a custom thesaurus, then use that to compare the strings using Contains(). That handles things like "123 Main Street" vs "123 Main St", or "Apt 31" vs "#31". It can get slow, though, so don't do that for an OLTP system. Mine stages the data in raw format, then cleans it up and compares it to existing addresses, etc., in an off-hours data load.
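    The full-text index and thesaurus live inside SQL Server and are not reproduced here. Purely to show the idea of treating variants as equal, the sketch below uses a plain Python synonym table as a stand-in; it is not how Contains() works. The `SYNONYMS` entries and the `canonical` function are hypothetical, and a real table would need entries for your own language and data.

```python
import re

# Hypothetical synonym table: each variant maps to one canonical token.
SYNONYMS = {
    "st": "street",
    "#": "apt",
}

# A token is either a lone "#" or a run of letters/digits.
_TOKEN = re.compile(r"#|\w+")


def canonical(address: str) -> str:
    """Return the address with known variants replaced by their canonical token."""
    tokens = _TOKEN.findall(address.casefold())
    return " ".join(SYNONYMS.get(token, token) for token in tokens)


if __name__ == "__main__":
    print(canonical("123 Main St") == canonical("123 Main Street"))
    print(canonical("Apt 31") == canonical("#31"))
```

    As with the thesaurus approach, this belongs in a batch cleanup step rather than in the transaction path.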

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
