SOUNDEX

  • steve.jacobs

    SSCommitted

    Points: 1830

    Comments posted to this topic are about the item SOUNDEX

  • Koen Verbeeck

    SSC Guru

    Points: 258955

    Interesting question, thanks. Keep it up!

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • This was removed by the editor as SPAM

  • honza.mf

    SSCertifiable

    Points: 5519

    Great troubles are with slavic languages, especially with transcriptions from cyrilic to latin alphabet.



    See, understand, learn, try, use efficient
    © Dr.Plch

  • Ed Wagner

    SSC Guru

    Points: 286960

    That was a very interesting question on a topic I know almost nothing about. Thank you.

  • Rune Bivrin

    SSCertifiable

    Points: 7617

    If SOUNDEX() as implemented in SQL Server was remotely useful outside of English-speaking countries I might care one way or the other. As it stands it's one of the quaint features of the language.


    Just because you're right doesn't mean everybody else is wrong.

  • steve.jacobs

    SSCommitted

    Points: 1830

    Rune Bivrin (11/26/2013)


    If SOUNDEX() as implemented in SQL Server was remotely useful outside of English-speaking countries I might care one way or the other. As it stands it's one of the quaint features of the language.

    Rune, I could not agree more;-). Most people (business) ask for "fuzzy" lookups, what they are asking for is more for an "exact" fuzzy lookup. I cannot tell you how many times I have had to explain the differences between the two. Heck, I may as well use LIKE when performing my searches. This is why I use a true programming language and the steps (some of them) I outlined in my answer.

  • TomThomson

    SSC Guru

    Points: 104772

    Good question, and nice explanation.

    Too many people still fall for the claims that Soundex is a phonetic algorithm for indexing by sound and that it's based on American English pronunciation, despite its being extremely flawed for that purpose nothing really based on AmEng pronunciation would return different codes for "sealing" and "ceiling" while delivering the same code for "ceiling" and "killing". That it was originally intened only to be used for names does mean that it's less erratic when used as originally intended, but we still have "Gene" and "Jean" getting different codes, while and "Jean" and "Jane" get the same code as each other, as do "Gene" and "Gawain". It seems quite clear that there was very little attention paid to phonetics in the design of this stuff.

    So it's nice to see a QotD that illustrates some of Soundex's failings.

    Tom

  • TomThomson

    SSC Guru

    Points: 104772

    Rune Bivrin (11/26/2013)


    If SOUNDEX() as implemented in SQL Server was remotely useful outside of English-speaking countries I might care one way or the other. As it stands it's one of the quaint features of the language.

    Does that mean you think it's useful in English-speaking countries?

    As long as 70 years ago the US census bureau found Soundex was useless for analysing their early (first 30 years) census records and changed the algorithm; now the US government publishes its own encoding rules, which may be the same as the original Soundex rules (but I doubt it). There are rather a lot of better phonetic fuzzy encoding algorithms, and the name is widely misused: most things that are called Soundex are not actually Soundex.

    I don't know whether the Soundex in SQL Server is the original Soundex or not; if not it still shares most of the failings of the original - enough to make it not useful unless augmented by other matching and distinguishing techniques even for names in English-speaking countries.

    Tom

  • Thomas Abraham

    SSChampion

    Points: 10761

    Thanks for the interesting question. I got caught out with "Phone". Don't really use this kind of thing in my work right now. But, as with many QotD topics, it keeps me looking at things I don't normally see, in case I ever need them.

    [font="Verdana"]Please don't go. The drones need you. They look up to you.[/font]
    Connect to me on LinkedIn

  • ronmoses@gmail.com

    SSCarpal Tunnel

    Points: 4480

    I had to spend more time researching cursors than SOUNDEX. I've never used one in my life. Not sure I would have used one here either. Seems like there should be a GROUP BY...HAVING approach that would work, though I'm too lazy to think about it much right now. 😀

    ron

    -----
    a haiku...

    NULL is not zero
    NULL is not an empty string
    NULL is the unknown

  • Revenant

    SSC-Forever

    Points: 42467

    This was absolutely vicious, Steve. I spent on it almost half an hour. 😉

    Thanks, I should have thought about this!

  • webrunner

    One Orange Chip

    Points: 29991

    Amazing question. Even more amazing is that I got it right. I guessed, admittedly. But I will run and study the code, because it's a really cool puzzle.

    By the way, is it possible to have a "SOUNDEX" type of function for images? Just wondering.

    Thanks,

    webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and says Can I join you?
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • steve.jacobs

    SSCommitted

    Points: 1830

    webrunner (11/26/2013)


    Amazing question. Even more amazing is that I got it right. I guessed, admittedly. But I will run and study the code, because it's a really cool puzzle.

    By the way, is it possible to have a "SOUNDEX" type of function for images? Just wondering.

    Thanks,

    webrunner

    By definition, no. SOUNDEX acts upon characters, not BLOBS (or images). However, if you properly index your image (say, based on name or some other identity), you can use SOUNDEX and DIFFERNCE but, as explained in the answer, it is not guaranteed to return what you may think it should. Anything dealing with images, a custom CLR\Assembly would have to be added to the SQL solution if you wish to act directly upon the image.

  • steve.jacobs

    SSCommitted

    Points: 1830

    Revenant (11/26/2013)


    This was absolutely vicious, Steve. I spent on it almost half an hour. 😉

    Thanks, I should have thought about this!

    :-D. SOUNDEX (and DIFFERENCE) has always intrigued me because of the amount of code I have to write to meet business' needs and the potential both could be. Additionally, until Microsoft (SQL Server Team) gets it right, SOUNDEX and DIFFERENCE will be nothing more than "enhancements" to LIKE. :w00t:.

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply