How does SQL pronounce SQL?

  • While I wouldn't trust the results without checking them by eye (or expect all cases to be found) DIFFERENCE can be used as a quick and dirty way of identifying a short list of possible duplicate records.

    SELECT P1.*, P2.*

    FROM People AS P1

    INNER JOIN People AS P2

    ON P1.Date_of_Birth = P2.Date_of_Birth

    AND P1.Last_Name = P2.Last_Name

    WHERE P1.Person_ID <> P2.Person_ID

    AND DIFFERENCE(P1.First_Name,P2.First_Name) = 4

    Fuzzy grouping through SSIS is a lot more sophisticated but the above query (or variants swapping first and last name) will give a quick idea of what you're facing very quickly.

  • BarneyL (6/28/2013)


    While I wouldn't trust the results without checking them by eye (or expect all cases to be found) DIFFERENCE can be used as a quick and dirty way of identifying a short list of possible duplicate records.

    I actually had to script something with a similar objective quite recently, and it didn't even occur to me to use this approach. I think I'll revisit that, thanks!

    ron

    -----
    a haiku...

    NULL is not zero
    NULL is not an empty string
    NULL is the unknown

  • Thanks Ron, awesome question.

    ----------------------------------------------------------------------------
    Sacramento SQL Server users group - http://sac.sqlpass.org
    Follow me on Twitter - @SQLDCH
    ----------------------------------------------------------------------------

    Yeah, well...The Dude abides.
  • nice question..

    Thanks..

  • Thanks - really cool question.

    - webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • ronmoses (6/28/2013)


    It doesn't strike me as a particularly useful function, especially given these examples:

    SELECT SOUNDEX('cent') --C530

    SELECT SOUNDEX('scent') --S253

    SELECT DIFFERENCE('cent','scent') --2, a very weak match

    SELECT SOUNDEX('through') --T620

    SELECT SOUNDEX('threw') --T600

    SELECT DIFFERENCE('through','threw') --3, a fair match

    --an attempt at regional diplomacy?

    SELECT SOUNDEX('route') --R300

    SELECT SOUNDEX('root') --R300

    SELECT SOUNDEX('rout') --R300

    SELECT DIFFERENCE('root','rout') --4, a perfect match

    SELECT DIFFERENCE('root','route') --4, a perfect match

    SELECT DIFFERENCE('rout','route') --4, a perfect match

    --let's see if the US and Canada can put this to rest once and for all:

    SELECT SOUNDEX('about') --A130

    SELECT SOUNDEX('aboot') --A130

    SELECT SOUNDEX('abowt') --A130

    SELECT SOUNDEX('abeet') --A130

    --all of these will DIFFERENCE with a result of 4, of course

    So that seems pretty useless.

    ron

    Well, I guess it depends on what variety of English you speak. I imagine there really is somewhere where people pronounce English words the way soundex and difference seem to indicate, but it isn't anywhere I've ever been. If it has to be used outside of that place (wherever it is) your "pretty useless" seems rather accurate. But you made a good question out of it despite that, although it's maybe a bit too culture-specific for anyone who cares about the point. And I hadn't come across the four "about"s before I saw your comment.

    Tom

  • thanks for the question.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Fun question. I have never found a good use for these functions either. Your post with all the other examples clearly illustrates how useless it really is. I would like to see some cases where these functions can be used and actually relied on for accurate results. (Accurate to humans that is). Maybe Difference works better with some other languages?

  • BarneyL (6/28/2013)


    While I wouldn't trust the results without checking them by eye (or expect all cases to be found) DIFFERENCE can be used as a quick and dirty way of identifying a short list of possible duplicate records.

    SELECT P1.*, P2.*

    FROM People AS P1

    INNER JOIN People AS P2

    ON P1.Date_of_Birth = P2.Date_of_Birth

    AND P1.Last_Name = P2.Last_Name

    WHERE P1.Person_ID <> P2.Person_ID

    AND DIFFERENCE(P1.First_Name,P2.First_Name) = 4

    Fuzzy grouping through SSIS is a lot more sophisticated but the above query (or variants swapping first and last name) will give a quick idea of what you're facing very quickly.

    Although I did not know about the DIFFERENCE function, I have been using SOUNDEX for many years, for the same purpose as above. We always review possible matches, but it has been very useful.

  • ronmoses (6/28/2013)


    It doesn't strike me as a particularly useful function, especially given these examples:

    SOUNDEX and DIFFERENCE can be useful, but you have to know their specifications. Here is a link to the SOUNDEX algorithm:

    http://en.wikipedia.org/wiki/Soundex

    As you see, vowels (except when in the first position) are ignored. So the title of the post is misleading, the perfect match between SQL and sequel is caused by them being the same after removing all vowels, not because of anything pronounciation related.

    SELECT DIFFERENCE('SQL','soooooooooquiiiiiaaaaaaaoooiiuueeoioueel')


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (6/30/2013)


    ronmoses (6/28/2013)


    It doesn't strike me as a particularly useful function, especially given these examples:

    SOUNDEX and DIFFERENCE can be useful, but you have to know their specifications. Here is a link to the SOUNDEX algorithm:

    http://en.wikipedia.org/wiki/Soundex

    As you see, vowels (except when in the first position) are ignored. So the title of the post is misleading, the perfect match between SQL and sequel is caused by them being the same after removing all vowels, not because of anything pronounciation related.

    SELECT DIFFERENCE('SQL','soooooooooquiiiiiaaaaaaaoooiiuueeoioueel')

    It doesn't just ignore vowels (except the first letter of the word), the treatment of consonants isn't very helpful either.

    SELECT DIFFERENCE('sql','sjzcgjkqsxzlll')

    Anyway, this question is clearly about soundex, a function that's been around a lot longer than SQL has, so if this is a fair QOTD would questions about mathematical functions that are used in SQL be fair game? I suspect they would be allowed by the editor if someone produced something as amusing as this question; but it's hard to think of amusing questions about SQRT or POWER. It's fairly easy to produce something that looks amusing but actually isn't (because the only sensible way to answer it is run the code, it's far too difficult to do by hand: for example if you run

    select round(sqrt(6),2) as A, round(power(88.27,0.2),2) as B, round(power(88.3,0.2),2) as C,

    round(power(89.17,0.2),2) as D, round(power(87.40,0.2),2) as E, round(sqrt(6),3) as Fwhich columns in the resulting row have the highest and lowest numbers in them? C has the highest - a side-effect of the target type for the implicit conversion, I believe, although BOL seems to disagree, and F the lowest (that's because of the rounding). Of course an even worse (because it's far too hard and you can't just run the code) question could be devised for soundex, eg how many words in standard English have soundex code 520 (possible answers 0, 3 /2/1, 4/5/6, 7/8/9, 10/11/12, 13, 14, and 15, and 16 or more; it's certainly at least 10, but working out whether it's more than 15 would probably require a search through the h section of the OED; including proper nouns like Hanse (from which we get the adjective "hanseatic", which of course has a different soundex code) would lead to objections from people who don't realise that soundex was originally designed specifically for proper nouns so it would be bizarre to exclude them, while excluding them would annoy those who are familiar with soundex and its uses; and the cultural bias in favour of peope with native or bilingual fluency in English in a question like that would be pretty extreme.

    Tom

  • Hugo Kornelis (6/30/2013)


    So the title of the post is misleading

    Not intentionally so, of course 🙂

    Thank you for the added information, those results make perfect sense now.

    ron

    -----
    a haiku...

    NULL is not zero
    NULL is not an empty string
    NULL is the unknown

  • Nice one......

  • Good one...+1

  • Hugo Kornelis (6/30/2013)


    ronmoses (6/28/2013)


    It doesn't strike me as a particularly useful function, especially given these examples:

    SOUNDEX and DIFFERENCE can be useful, but you have to know their specifications. Here is a link to the SOUNDEX algorithm:

    http://en.wikipedia.org/wiki/Soundex

    Armed with this knowledge, I just updated a script I'd been using for several months and improved its effectiveness immeasurably. So I'm very glad I ignorantly threw this question out to the world. (Especially to Hugo.) 😀

    thanks!

    ron

    -----
    a haiku...

    NULL is not zero
    NULL is not an empty string
    NULL is the unknown

Viewing 15 posts - 16 through 30 (of 43 total)

You must be logged in to reply to this topic. Login to reply