SOUNDEX

  • Comments posted to this topic are about the item SOUNDEX

  • Interesting question, thanks. Keep it up!

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • This was removed by the editor as SPAM

  • Great troubles are with slavic languages, especially with transcriptions from cyrilic to latin alphabet.



    See, understand, learn, try, use efficient
    © Dr.Plch

  • That was a very interesting question on a topic I know almost nothing about. Thank you.

  • If SOUNDEX() as implemented in SQL Server was remotely useful outside of English-speaking countries I might care one way or the other. As it stands it's one of the quaint features of the language.


    Just because you're right doesn't mean everybody else is wrong.

  • Rune Bivrin (11/26/2013)


    If SOUNDEX() as implemented in SQL Server was remotely useful outside of English-speaking countries I might care one way or the other. As it stands it's one of the quaint features of the language.

    Rune, I could not agree more;-). Most people (business) ask for "fuzzy" lookups, what they are asking for is more for an "exact" fuzzy lookup. I cannot tell you how many times I have had to explain the differences between the two. Heck, I may as well use LIKE when performing my searches. This is why I use a true programming language and the steps (some of them) I outlined in my answer.

  • Good question, and nice explanation.

    Too many people still fall for the claims that Soundex is a phonetic algorithm for indexing by sound and that it's based on American English pronunciation, despite its being extremely flawed for that purpose nothing really based on AmEng pronunciation would return different codes for "sealing" and "ceiling" while delivering the same code for "ceiling" and "killing". That it was originally intened only to be used for names does mean that it's less erratic when used as originally intended, but we still have "Gene" and "Jean" getting different codes, while and "Jean" and "Jane" get the same code as each other, as do "Gene" and "Gawain". It seems quite clear that there was very little attention paid to phonetics in the design of this stuff.

    So it's nice to see a QotD that illustrates some of Soundex's failings.

    Tom

  • Rune Bivrin (11/26/2013)


    If SOUNDEX() as implemented in SQL Server was remotely useful outside of English-speaking countries I might care one way or the other. As it stands it's one of the quaint features of the language.

    Does that mean you think it's useful in English-speaking countries?

    As long as 70 years ago the US census bureau found Soundex was useless for analysing their early (first 30 years) census records and changed the algorithm; now the US government publishes its own encoding rules, which may be the same as the original Soundex rules (but I doubt it). There are rather a lot of better phonetic fuzzy encoding algorithms, and the name is widely misused: most things that are called Soundex are not actually Soundex.

    I don't know whether the Soundex in SQL Server is the original Soundex or not; if not it still shares most of the failings of the original - enough to make it not useful unless augmented by other matching and distinguishing techniques even for names in English-speaking countries.

    Tom

  • Thanks for the interesting question. I got caught out with "Phone". Don't really use this kind of thing in my work right now. But, as with many QotD topics, it keeps me looking at things I don't normally see, in case I ever need them.

    [font="Verdana"]Please don't go. The drones need you. They look up to you.[/font]
    Connect to me on LinkedIn

  • I had to spend more time researching cursors than SOUNDEX. I've never used one in my life. Not sure I would have used one here either. Seems like there should be a GROUP BY...HAVING approach that would work, though I'm too lazy to think about it much right now. 😀

    ron

    -----
    a haiku...

    NULL is not zero
    NULL is not an empty string
    NULL is the unknown

  • This was absolutely vicious, Steve. I spent on it almost half an hour. 😉

    Thanks, I should have thought about this!

  • Amazing question. Even more amazing is that I got it right. I guessed, admittedly. But I will run and study the code, because it's a really cool puzzle.

    By the way, is it possible to have a "SOUNDEX" type of function for images? Just wondering.

    Thanks,

    webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • webrunner (11/26/2013)


    Amazing question. Even more amazing is that I got it right. I guessed, admittedly. But I will run and study the code, because it's a really cool puzzle.

    By the way, is it possible to have a "SOUNDEX" type of function for images? Just wondering.

    Thanks,

    webrunner

    By definition, no. SOUNDEX acts upon characters, not BLOBS (or images). However, if you properly index your image (say, based on name or some other identity), you can use SOUNDEX and DIFFERNCE but, as explained in the answer, it is not guaranteed to return what you may think it should. Anything dealing with images, a custom CLR\Assembly would have to be added to the SQL solution if you wish to act directly upon the image.

  • Revenant (11/26/2013)


    This was absolutely vicious, Steve. I spent on it almost half an hour. 😉

    Thanks, I should have thought about this!

    :-D. SOUNDEX (and DIFFERENCE) has always intrigued me because of the amount of code I have to write to meet business' needs and the potential both could be. Additionally, until Microsoft (SQL Server Team) gets it right, SOUNDEX and DIFFERENCE will be nothing more than "enhancements" to LIKE. :w00t:.

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply