Smart first name matching in T-SQL

  • Hello experts,

    I'm looking for a script (actually, for the data) for smart first name conversion.

    For example, if the user enters William or Billy, the script should return Bill; if the user enters Alexander, Aleks or Sasha, it should return Alex, and so on.

    I can write the SQL to do it myself, but I can't find a list of all possible first names.

    Does anybody have one?

    Many thanks, Alex.

  • Google "NickNames Database" and you will find a number of sources for common English names. But there is no universal list of ALL first names. In my lifetime, I've seen such unique names as Moon Unit and Will.I.Am.

    By the way, William is the proper first name. Bill, Billy, Will, and Willy are nicknames for William. This isn't to be confused with the female name "Billie"... or with people who were named simply Bill by their parents. 🙂

    Having looked at this problem, it seems that we can't just arbitrarily settle on a single name. We have to generate all combinations to search on.
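    A minimal sketch of that idea, in Python rather than T-SQL so it is easy to run: instead of converting the entered name to one "correct" name, expand it into every related name and search on all of them. The `PAIRS` list and the `search_names` function are made up for illustration; real data would come from one of the nickname lists.

```python
# Hypothetical sample pairs (legal name, nickname); a real list would come
# from one of the downloadable nickname databases.
PAIRS = [
    ("William", "Bill"),
    ("William", "Billy"),
    ("William", "Will"),
    ("William", "Willy"),
    ("Alexander", "Alex"),
    ("Alexander", "Sasha"),
]


def search_names(entered):
    """Return every name (lower-cased) worth searching for."""
    key = entered.strip().casefold()
    # Legal names the entry could stand for (the entry may itself be legal).
    legal_names = {
        legal for legal, nick in PAIRS
        if key in (legal.casefold(), nick.casefold())
    }
    variants = {key}  # always search for what the user actually typed
    for legal, nick in PAIRS:
        if legal in legal_names:
            variants.add(legal.casefold())
            variants.add(nick.casefold())
    return variants


print(sorted(search_names("Billy")))
# ['bill', 'billy', 'will', 'william', 'willy']
print(sorted(search_names("Moon Unit")))
# ['moon unit']
```

    A name that is not in the list still comes back as itself, so unusual names are searched as typed rather than dropped.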

    Good luck.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Here you go...

    p.s.

    I got this off the internet ages ago but I can't remember where! Anyway props to the people who compiled it 🙂

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • Here are an additional 2600 nicknames in my attachment to add to the attachment in the post above.

    My recommendation is to separate male from female names. Also use a flattened structure; nicknames are not hierarchical. Legal name to nickname is a many-to-many relationship.

    I store a soundex code with each name for indexed use.
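    Roughly what that layout could look like, as a sketch only. It uses Python with an in-memory SQLite table standing in for SQL Server, and the table, column and function names are invented for the example. The Soundex function here is the classic American variant written by hand; T-SQL has a built-in `SOUNDEX()` that you would use instead, and its results may differ in edge cases.

```python
import sqlite3

# Letter-to-digit table for classic American Soundex.
CODES = {ch: digit
         for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                ("l", "4"), ("mn", "5"), ("r", "6"))
         for ch in letters}


def soundex(name):
    """Four-character Soundex code; non-ASCII letters are simply dropped."""
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same digit
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated digit is kept
    return (code + "000")[:4]


# Hypothetical sample rows: one flat (legal_name, nickname) pair per row,
# so "Alex" can point at more than one legal name (many-to-many).
PAIRS = [("William", "Bill"), ("William", "Billy"), ("Alexander", "Alex"),
         ("Alexander", "Sasha"), ("Alexandra", "Alex")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name_alias "
             "(legal_name TEXT, nickname TEXT, nickname_soundex TEXT)")
conn.execute("CREATE INDEX ix_alias_soundex ON name_alias (nickname_soundex)")
conn.executemany("INSERT INTO name_alias VALUES (?, ?, ?)",
                 [(legal, nick, soundex(nick)) for legal, nick in PAIRS])


def legal_names_for(entered):
    """Legal names whose nickname sounds like the entered name."""
    rows = conn.execute(
        "SELECT DISTINCT legal_name FROM name_alias "
        "WHERE nickname_soundex = ? ORDER BY legal_name",
        (soundex(entered),))
    return [row[0] for row in rows]


print(legal_names_for("Aleks"))  # ['Alexander', 'Alexandra']
print(legal_names_for("Bil"))    # ['William']
```

    Because the code is stored and indexed, the lookup is a plain equality match on `nickname_soundex` instead of a function call per row.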

  • May I humbly ask why you want to do this in the first place? My first name is Jan, which is both Dutch, Afrikaans and Polish (and probably exists in a few more languages), but it is an abbreviation of Johan/Johannes/Johannis/Ioannis, which could be Jean in French, Juan in Spanish, Ian in Irish/Scottish, and whatever else. The possible derivations will go down endlessly the further you go down the tree.

    Edit: Forgot Afrikaans, sorry Host Country!

    --------------------------------------------------------------------------
    A little knowledge is a dangerous thing (Alexander Pope)
    In order for us to help you as efficiently as possible, please read this before posting (courtesy of Jeff Moden)

  • And your own name, Alex, lends itself to the same exercise 🙂 (BTW, my oldest son's name is also Alex).

  • Jan Van der Eecken (6/18/2013)


    May I humbly ask why you want to do this in the first place? My first name is Jan, which is both Dutch, Afrikaans and Polish (and probably exists in a few more languages), but it is an abbreviation of Johan/Johannes/Johannis/Ioannis, which could be Jean in French, Juan in Spanish, Ian in Irish/Scottish, and whatever else. The possible derivations will go down endlessly the further you go down the tree.

    We don't actually need abbreviations for all countries and all languages; USA / English will be enough.

    We have a client database with first and last names. When a user tries to add a new client, we need to check the existing clients, searching by first and last name, and offer any matches so the user can pick an already created client instead. We accept that users make typing mistakes and may enter, for example, Alezander instead of Alexander. But if the user enters Alexander and there is already a client named Alex (with the same last name), we should offer that client in the list to choose from.
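    To make the two cases concrete, here is a small Python sketch of the typing-mistake half only. The client list, the function name and the 0.8 cutoff are assumptions chosen for the example; the similarity score comes from the standard library's `difflib.SequenceMatcher`, not from anything built into SQL Server.

```python
import difflib

# Hypothetical existing clients as (first_name, last_name).
CLIENTS = [
    ("Alexander", "Smith"),
    ("Alex", "Smith"),
    ("Maria", "Smith"),
    ("Alexander", "Jones"),
]


def similar_clients(first, last, cutoff=0.8):
    """Clients with the same last name and a similarly spelled first name."""
    hits = []
    for c_first, c_last in CLIENTS:
        if c_last.casefold() != last.casefold():
            continue
        score = difflib.SequenceMatcher(
            None, first.casefold(), c_first.casefold()).ratio()
        if score >= cutoff:
            hits.append((c_first, c_last))
    return hits


print(similar_clients("Alezander", "Smith"))  # [('Alexander', 'Smith')]
print(similar_clients("Alexander", "Smith"))  # [('Alexander', 'Smith')]
```

    The second call shows the gap: spelling similarity alone does not connect Alexander to Alex at this cutoff, which is why a nickname list is still needed on top of typo tolerance.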

  • Jan Van der Eecken (6/18/2013)


    May I humbly ask why you want to do this in the first place?

    This can be useful when performing record linkage and importing data into a Single Customer View application.

  • Jan Van der Eecken (6/18/2013)


    May I humbly ask why you want to do this in the first place? My first name is Jan, which is both Dutch, Afrikaans and Polish (and probably exists in a few more languages), but it is an abbreviation of Johan/Johannes/Johannis/Ioannis, which could be Jean in French, Juan in Spanish, Ian in Irish/Scottish, and whatever else. The possible derivations will go down endlessly the further you go down the tree.

    Edit: Forgot Afrikaans, sorry Host Country!

    I'd echo the sentiment of "why"; however, Ian is the English spelling - the Scottish variant is Iain - both are legal names.

    The other problem the OP will run up against is that one name may be the diminutive form of multiple full names. For example, take a man commonly known as Al: is this a short form of Alan, Alain, Allan, Allen, Alun, Alfred, Alfredo, Albert, Alphonse, Alphonso, Alexander - or maybe his legal name actually is just Al?

    If he is planning on using this to identify when a person may already be held in the database, then fine, but don't autocorrect names. It will get things wrong, and it will upset users when they are called by the wrong name.

  • I am going to add one more wrinkle... initials. Bill Smith may actually be J. W. Smith, but the logic you are talking about would eliminate that from consideration. IMHO, if you present a pick list to choose from, you need to be careful about eliminating any possible candidates, because the end user will assume you are showing them everything. It may be safer to just select based on last name only and order by first name.
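    One way to read that suggestion, sketched in Python: filter on last name only, then use the first name purely for ordering, so likely matches float to the top but nobody is removed from the list. The ranking rule, the `GROUPS` data and the function names are assumptions made up for this example.

```python
# Hypothetical nickname groups; names in one set are treated as equivalent.
GROUPS = [{"william", "bill", "billy", "will", "willy"}]


def related(a, b):
    """True if the two first names are equal or share a nickname group."""
    a, b = a.casefold(), b.casefold()
    return a == b or any(a in g and b in g for g in GROUPS)


def is_initials(name):
    """True for first names stored as initials, such as 'J. W.' or 'J W'."""
    parts = name.replace(".", " ").split()
    return bool(parts) and all(len(p) == 1 for p in parts)


def pick_list(entered_first, last, clients):
    """All clients sharing the last name: likely matches first, none dropped."""
    def rank(client):
        first = client[0]
        if related(entered_first, first):
            return 0  # same name or a known nickname of it
        if is_initials(first):
            return 1  # initials could be anyone, so keep them near the top
        return 2      # everything else, alphabetically
    same_last = [c for c in clients if c[1].casefold() == last.casefold()]
    return sorted(same_last, key=lambda c: (rank(c), c[0].casefold()))


clients = [("Mary", "Smith"), ("J. W.", "Smith"), ("William", "Smith"),
           ("Bill", "Jones"), ("Adam", "Smith")]
print(pick_list("Bill", "Smith", clients))
# [('William', 'Smith'), ('J. W.', 'Smith'), ('Adam', 'Smith'), ('Mary', 'Smith')]
```

    Every Smith is still shown, so the end user's assumption that the list is complete stays true.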

  • Yes, I think it will be 2 options: use "smart" first name matching or by last name only

  • crmitchell (6/19/2013)

    ...the Scottish variant is Iain - both are legal names.

    My apologies 😉

  • onixsoft (6/19/2013)


    Yes, I think it will be 2 options: use "smart" first name matching or by last name only

    Even last name won't be accurate. My surname is sometimes spelled in three words, sometimes two, or even in one word. And then there may be a "C" in it or not.

  • Jan Van der Eecken (6/19/2013)


    Even last name won't be accurate. My surname is sometimes spelled in three words, sometimes two, or even in one word. And then there may be a "C" in it or not.

    To be honest, I always thought the v was lowercase 🙂 but I try to always assume the person knows how their own name should be spelt better than me.

    The same issue would also arise with the German von and the French Le or De, and in those cases they may be shortened to l' or d'.

    Then there's Mac, which may also be Mc - with the c either normal size or presented as a superscript character.

    And good luck dealing with the various accented characters.
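    For the spacing, capitalisation and accent parts of this, one common approach is to compare on a normalised key and keep the surname exactly as the person wrote it. A rough Python sketch follows; the `surname_key` name is invented for the example, and it deliberately does not try to unify Mac and Mc or fix missing letters.

```python
import unicodedata


def surname_key(surname):
    """Comparison key: no accents, no case, no spaces or punctuation."""
    # NFKD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", surname)
    no_marks = "".join(ch for ch in decomposed
                       if not unicodedata.combining(ch))
    # Keep letters and digits only, so spaces and apostrophes disappear.
    return "".join(ch for ch in no_marks.casefold() if ch.isalnum())


print(surname_key("Van der Eecken"))  # vandereecken
print(surname_key("vander Eecken"))   # vandereecken
print(surname_key("d'Artagnan"))      # dartagnan
print(surname_key("Müller"))          # muller
```

    The key is only for finding candidates; the stored and displayed surname should stay whatever the person gave.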

  • Hehe,

    I was born Flemish, in other words Belgian. There, the lower case 'v' in 'Van' means you've got some royal blood somewhere, maybe generations ago. So I'm obviously not in that league, I just come from the oak tree. In the Netherlands they don't make that distinction. So they are all kings 🙂
