Proper Storage and Handling of Personal Names

  • szeitzp (8/4/2016)


    1. The full_name column may not a practical one. It is brings more problem than profit, if you think about the possible sequences of the first_name and last_name it is obvious (and not just from that aspect).

    2. I'm Hungarian and I have a very interesting problem. If a women is married on Hungary she can chose the 'married name'. This may her original name without any change (a) or her husband surname and her given name (b) or her original surname + her husband surname + her original given name (c) or her husband surname + her original surname + her original given name (c2) or (the most crazy) if her husband has a title (dr. etc) that too (C3) and if she has one it is too (C4) and from other variants and if these are not enough the order of the names is reverse (like the Korean or Japanese) i.e. surname given name the right order. (Example if these was too difficult: An ordinary Hungarian woman name is Nagy Katalin (the Nagy is the surname!), if she has a title (dr. what mean he is a physician) her name (in Hungarian order) dr. Nagy Katalin, if her husband name is dr. Kiss Endre she can chose the next alternatives: dr. Nagy Katalin, dr. Kiss Katalin, dr. Kissné Nagy Katalin, dr. Kissné dr. Nagy Katalin (!), dr. Kiss Endréné dr. Nagy Katalin (!). These are not the all variant but the main versions. :alien:) The problem (if it not obvious for now...): which is which? Which is the surname which is the given name, which is the middle name(s)? This is not only a problem when you in an English database but in a Hungarian too. (And I didn't mention the need of the Unicode...). If you want to ordering these kind of names it is not an easy task but even the storage is a big problem. (And I can write from the old Hungarian names, which is can more complex...:angry: ). 😉

    Why must we split names into parts at all? The way I see it, it may be better to simply request a full name, and a short name that we would use to address the person.

  • robert.sterbal 56890 (8/4/2016)


    I would not want to type the column prefix column names you suggested.

    For an article thorough about a lot of things, the assumption that you would name columns that way takes away from your argument.

    I often, but not always, use the prefix to indicate the table of origin, when the column is referenced out of context. It's a matter of taste, and taste is important to me. 😉

    Robert, what naming convention do you use?

  • There are a lot of arguments about how to name columns. Most people prefer column names without prefixes, but there isn't a standard.

    If you are promoting standards, you should acknowledge the issue inline in your article.

    412-977-3526 call/text

  • Mononymics are fairly straightforward as far as I'm concerned.

    The mononymic would go in the given/first name and "N/A" would go in the family name.

    Also, nulls are abominations, since you need to know if a name was not entered yet (TBD), not applicable (N/A) or verified as unknown/unknowable (UNK). The last case, for instance, in "John Doe" type situations, where the person can't be asked and no one else knows--or can ever find out (a deceased and unidentified person, for example). Missing middle names can use "N/A" or "UNK" if not applicable/unknown/none of the database's business. 😛

    As for the poor Hungarian poster, my condolences! In that case you'd almost certainly need a "mode" field of some kind, to indicate what kind of name you're dealing with.

    Another case I didn't see mentioned was "called-by". People named Robert, for instance might be called "Bob", "Rob", "Robert", etc. In my company that's actually a problem so I have to make allowances for it. Our company president is pretty insistant that called-by names be used for email, for instance, and on internal reports.

    Now I'm thinking about how you could build a Name table to handle all cases. (laughing).

  • Ernie Schlangen (8/4/2016)


    Allow me to propose an alternative. One of the primary reasons Western databases store names in multiple fields (given_name, surname, title, etc.) is to allow for the generation of formal and informal ways to address the person. Perhaps that's what we need: a simple "name" field, plus a formal_name and informal_name fields of sufficient length. This imposes additional burden at entry but should accommodate the different cultural aspects discussed in the article with the exception of the native language issue which would require duplicating all three fields to allow for native language versions. The only thing that is still missing (for Western cultures with multiple names) is a way to sort by surname, which is the norm most of the time. Thoughts?

    Edit 1: added colon to make sentence more readable

    Edit 2: added sort by surname sentence

    I think this is probably the best way to deal with names these days. Now, as long as the name does not include titles, I think we should rethink the idea of sorting by surname, especially seeing that there are significant numbers of folks who don't have a surname (like Icelanders).

    Please see:

    https://en.wikipedia.org/wiki/Icelandic_name

    Tung

  • Tom Gillies (8/4/2016)


    That's definitely an interesting and useful suggestion. Tung has pointed out a real problem. The difficulty with dealing with it in the abstract is that different solutions work in different situations. The "formal" and "informal" name approach would work very well in many situations and across cultures. I'm not sure it would necessarily give you the "grouping" that you get with Western family names but that may not matter.

    I've been thinking about this problem, and I would start with a number of questions: 1) What is the geographic/cultural scope of the proposed system/database? - If we are confining ourselves to one culture, then our job is simpler. 2) How different are the various cultures concerned? 3) What are the numbers/proportions involved? 4) How much does it matter to the people/business involved? - Might it cause grave offence or do they not care. 5) Is there a political aspect to this? - "Colonial legacy" or minority ethnic groups 6) Are there features of names that we want to use that we haven't been thinking about - Like grouping with family names. It goes on and on...

    ...the thing is, this can either be important, or it can be a complete waste of time. We need to decide if it matters, how much it matters and how much effort we want to put into it. We can only really do that for a particular case. I'm not sure if it is possible to come up with a implementable "general" solution.

    If the names you are dealing with are not important, as in a survey in which individuals don't really need to be identified personally, that's one thing; if each name represents a customer or program member, then it becomes a high priority.

  • You certainly don't want to just store the entire name in one long, indistinguishable string. Trying to accurately parse that out later is virtually impossible.

    As to the different rules/formats, just add a "name_style_code" or "name_format_code" -- a smallint should do nicely -- that has the name formation rules: first name first, last name first, etc..

    [Btw, I too found the use of the "cst_" prefix to be so obsolete and poor a practice as to take away from the force of the original article.]

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • You raise many valid points, but you miss a few key ones related to capitalization:

    1. Lower case letters beginning a name, such as "d'Angelio"

    2. Split surnames, such a "von Doenitz" (This is Latin-ized. The correct spelling is "D o(with umlaut) n i t z")

    3. Unusual punctuation characters, such a "Zu!tic" - when pronounced, the "!" is a verbal click

    While some of these are not storage issues, per se, the way they are stored affects later processing using the names. Comparing names from disparate systems often raises various issues, such as:

    1. Is "d'Angelio" the same as "DANGELIO"?

    2. Is "de la Fayette" the same as "LAFAYETTE"?

    3. Is "von Helsing" the same as "VONHELSING" or just "HELSING"? In the latter case, the originating system sliced off the "von" as a middle name!

    The answer to all of these examples is "Probably."

    Middle names can often be an important discriminator. Consider VICTOR IGNATIUS SMITH and his son VICTOR IGOR SMITH. Their legal names are distinct; thus, there is no suffix. But abbreviating the middle name to a single letter yields VICTOR I SMITH and, if they reside at the same address, may cause infinite mischief during later processing.

    With these thoughts in mind, it is important to know how these names will be used later, before defining the structure that will hold them.

  • roger.plowman (8/4/2016)


    As for the poor Hungarian poster, my condolences! In that case you'd almost certainly need a "mode" field of some kind, to indicate what kind of name you're dealing with.

    Another case I didn't see mentioned was "called-by". People named Robert, for instance might be called "Bob", "Rob", "Robert", etc. In my company that's actually a problem so I have to make allowances for it. Our company president is pretty insistant that called-by names be used for email, for instance, and on internal reports.

    Now I'm thinking about how you could build a Name table to handle all cases. (laughing).

    Thanks! :crying: 😀

  • Robert Domitz (8/4/2016)

    3. Is "von Helsing" the same as "VONHELSING" or just "HELSING"? In the latter case, the originating system sliced off the "von" as a middle name!

    The answer to all of these examples is "Probably."

    For that specific case, almost certainly not. In German, in particular, the "von" prefix was a status marking, a "nobility" prefix, of a sort. Iow, the dude's name was just "Helsing", but "von" was added before it as a sign of nobility/influence/prestige. Thus, "Helsing" and "Vonhelsing" are absolutely not similar names.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • I deal with the variable field values by A) making each name component field a varchar(50), B) setting the field to NOT NULL, and C) providing a default value of "" (empty string). This specifically assumes that at the database level, all name components are optional, and I leave it to the application to enforce required vs. non-required fields. My main objective is to prevent NULLs in the data that have to be handled separately (which, interestingly, is the topic of the SSC email that linked to this post).

    J Pratt

  • ScottPletcher (8/4/2016)


    You certainly don't want to just store the entire name in one long, indistinguishable string. Trying to accurately parse that out later is virtually impossible.

    If there's a need to parse the name, it would be rather silly to only store it in one long string. But I would really consider, depending upon your specific needs, if having "name parts" is really necessary. Sometimes it is; sometimes not.

    As to the different rules/formats, just add a "name_style_code" or "name_format_code" -- a smallint should do nicely -- that has the name formation rules: first name first, last name first, etc..

    [Btw, I too found the use of the "cst_" prefix to be so obsolete and poor a practice as to take away from the force of the original article.]

    I would like to hear your input on why you don't like the prefix, and what you prefer. I think this also depends on the situation; in a smaller database, the prefix is not as useful, but if you have a couple hundred tables, that's a different situation.

  • Yet Another DBA (8/4/2016)


    Rick-153145 (8/4/2016)


    Just an aside to this, under data protection laws (at least here in the UK), personal data should be stored encrypted, so if that is the case, it really doesn't matter at the DB level as it would ideally be salt and hashed.

    PII data needs to be protected inappropriate use & dissemination which doesn't always mean that the data needs to be encrypted.

    A business customer doesn't need their name encrypted, as it is not personal.

    The name and delivery address for a customer doesn't need to be encrypted as how will the postman every deliver the item. What the customer buys may be deemed to be confidential and may warrant a plain packaging and the postman will not need to know the contents other than handling instructions.

    That depends, the rule for the clients we have in my current company ask for names and phone numbers to be encrypted, this we have been told is due to the changes to data protection after all the recent hacking that has taken place, I will admit that its a ball ache, but protects us as much as our clients and no, its nothing dodgy, our biggest client is a very well known name.

  • I would settle for having my last name properly spelled. There's some low-hanging fruit.

  • Stephanie Giovannini (8/4/2016)


    I would settle for having my last name properly spelled. There's some low-hanging fruit.

    :-D:-D Oh how true! No matter what way you structure it.

    Tom Gillies LinkedIn Profilewww.DuhallowGreyGeek.com[/url]

Viewing 15 posts - 16 through 30 (of 38 total)

You must be logged in to reply to this topic. Login to reply