A simple example which, after 15 years of relational design work, still breaks my brain - phone numbers

  • Every so often I find myself needing to build schema for something which has very little complexity in the "real world", but where the model of that part of the world becomes more and more difficult the more I think about it. Maybe this is a symptom of overthinking, but I like thinking.

    To wit, phone numbers. I will provide two obvious possible schemas. Then I will discuss why I find it almost impossible to logically choose between them.

    Schema 1: Each type of phone number gets its own column. I will refer to this as the "multi column" solution:


    create table contacts (
    -- some kind of contact key, 
     /* ... */,
     home_phone varchar(20) null,
     work_phone varchar(20) null,
     mobile_phone varchar(20) null,
     /* ... */


    Schema 2: Phone numbers are all kept in a single column, with a second column indicating type. I will refer to this as the "single column" solution:

    create table contact_phone_numbers (
     -- some kind of contact key
     phone_type char not null check (phone_type in ('H', 'W', 'M', /* .... */ ),
     phone_number varchar(20) not null
    )

    Practical considerations, which I will ignore:

    Clearly there are practical consequences of the choice. If, for example, I ask "Which contacts use this particular phone number?", the second schema is much easier to query.
    On the other hand, I want a report that shows contact name, home phone and mobile phone, the first schema is much easier to query.

    Based on years of experience, I do not believe it is possible to reliably predict the future requirements of the system. Maybe today most users are asking questions like the first one, but next week a report like the second example will be top priority.

    Theoretical considerations:

    Based on the premise that practical considerations cannot be reliably predicted, I generally try to base my designs on more theoretical considerations. Ie, which schema is more "relationally correct"? I think that this simple and common example demonstrates how hard it can be to answer that question.

    One might look at the multi column solution and suggest that it seems, prima facie, to violate the "spirit" of first normal form. We have a data element of a single domain which is repeated in the tuple. Commonly, the "repeating groups" antipattern.

    On the other hand, one might look at the single column solution and respond that it seems to look a lot like the dreaded EAV pattern.

    Which point of view is correct? To me, that depends on whether we say that different kinds of phone numbers are actually members of the same domain. But this is a very difficult question to answer. Let's presume that they are the same domain, because they are all phone numbers. Then we should go with the single column solution. Easy, problem solved, right?

    But wait! I can then apply similar reasoning to other parts of the model. My invoice lines, for example, might have both an "ex GST amount" and a "GST amount" column. But wait... those are both dollar amounts! So, per the reasoning of the previous paragraph, we should have a "dollar amounts" table, with each amount value accompanied by an additional value from another column indicating the type of dollar amount. Now the reasoning in the previous paragraph doesn't seem so convincing anymore, does it?

    Besides this, different phone numbers can have different validation rules. A valid pattern for a mobile phone number is different from that for a landline number. So at the very least, it seems like landline numbers and mobile numbers are different domains. But work landline numbers and home landline numbers are both landline numbers, so those are the same domain. Now we have an even more unfortunate schema, where we need *two* phone number tables... landline phone numbers and mobile phone numbers. And we are stll left with the "different types of dollar amount" objection.

    Can anyone provide a sound logical argument as to which of these schemas is relationally correct?

  • So what are the "ex GST amount" and a "GST amount" columns that you speak of?  In particular, what is the difference due to the "ex" in the first column name?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Tuesday, December 5, 2017 8:05 PM

    So what are the "ex GST amount" and a "GST amount" columns that you speak of?  In particular, what is the difference due to the "ex" in the first column name?

    It's just one possible example, but for context, in Australia many things have goods and services tax applied. More complete names might be "invoice_GST_exclusive_amount" and "invoice_GST_amount". Adding the two together would give the "invoice_GST_inclusive_amount" for the line. They're generally kept separate because the amount excluding GST is company revenue, the GST amount is paid to the government. And not everything is charged GST, so if you just kept a single "amount" value on the line, you would not be able to tell whether or not GST was in fact charged. But of course both must be applied, since the customer has to pay the sum of the two.

    I could use any number of similar examples. If I wanted to model a leg of a journey, for example, I need a start location and an end location. Does my "journey_legs" table therefore have two columns, one for start location and one for end location, or do I have a separate "journey_leg_locations" table, which is in a 2 to 1 cardinality relationship with my legs table, and have an additional type column constrained to "start" and "end"? In this case, the choice *not* to use a "single column" solution seems obvious.

  • Don Halloran - Tuesday, December 5, 2017 8:15 PM

    Jeff Moden - Tuesday, December 5, 2017 8:05 PM

    So what are the "ex GST amount" and a "GST amount" columns that you speak of?  In particular, what is the difference due to the "ex" in the first column name?

    It's just one possible example, but for context, in Australia many things have goods and services tax applied. More complete names might be "invoice_GST_exclusive_amount" and "invoice_GST_amount". Adding the two together would give the "invoice_GST_inclusive_amount" for the line. They're generally kept separate because the amount excluding GST is company revenue, the GST amount is paid to the government. And not everything is charged GST, so if you just kept a single "amount" value on the line, you would not be able to tell whether or not GST was in fact charged. But of course both must be applied, since the customer has to pay the sum of the two.

    I could use any number of similar examples. If I wanted to model a leg of a journey, for example, I need a start location and an end location. Does my "journey_legs" table therefore have two columns, one for start location and one for end location, or do I have a separate "journey_leg_locations" table, which is in a 2 to 1 cardinality relationship with my legs table, and have an additional type column constrained to "start" and "end"? In this case, the choice *not* to use a "single column" solution seems obvious.

    Thank you for the explanation.  I appreciate it.

    I don't know much about taxes for "Down Under" but in the U.S.A., taxes and the like are usually applied to the total value of the invoice rather than for each item on the invoice.  In such a case, I'd store the taxes on separate lines.  In other words, I build the invoice table in an EAV fashion where each amount listed on the invoice lives in a separate row in the table.  Any taxes or fees also live on their own rows.  So, for me, your example doesn't provide justification for listing multiple phone numbers on the same row of data. 

    That shouldn't be taken as a point of contention between you and I.  I'm just explaining the thought process that I have that justifies the use of a separate phone number table (one of the people that I work for has ~20 phone numbers).  Think of the row that identifies the individual or company as the "invoice header" table and the phone number table as the "invoice detail" table.

    I also do the same with addresses because even individuals can have more than one physical address and more than one shipping address (for example) and they can be temporally sensitive.  The same also holds true for email addresses and many other attributes of an individual, even including names.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Thursday, December 7, 2017 7:25 PM

    Thank you for the explanation.  I appreciate it.

    I don't know much about taxes for "Down Under" but in the U.S.A., taxes and the like are usually applied to the total value of the invoice rather than for each item on the invoice.  In such a case, I'd store the taxes on separate lines.  In other words, I build the invoice table in an EAV fashion where each amount listed on the invoice lives in a separate row in the table.  Any taxes or fees also live on their own rows.  So, for me, your example doesn't provide justification for listing multiple phone numbers on the same row of data. 

    That shouldn't be taken as a point of contention between you and I.  I'm just explaining the thought process that I have that justifies the use of a separate phone number table (one of the people that I work for has ~20 phone numbers).  Think of the row that identifies the individual or company as the "invoice header" table and the phone number table as the "invoice detail" table.

    I also do the same with addresses because even individuals can have more than one physical address and more than one shipping address (for example) and they can be temporally sensitive.  The same also holds true for email addresses and many other attributes of an individual, even including names.

    Yep, I tend to agree with your reasoning.

    What I usually end up with in this domain is a set of tables representing different contact protocols (phone numbers, email addresses and url's) which are subtypes of a more general contact point, and I associate contacts with sets of contact points. But that's normally a result of cardinality concerns: If a person can have more than one home phone number (for example), it becomes quite clear that at least distinct home phone numbers must be distinct rows, not distinct columns. This then renders the domain ambiguities somewhat moot. That's why I wanted to restrict the hypothetical example to one in which there is just the one home phone number, mobile, etc. It's certainly an artificial restriction in most cases, but it does sharpen the point. 

    A lot of systems I've seen do in fact model different phone numbers as different columns. Right now I'm thinking of Dynamics CRM, where a contact is composed of columns like phone1, phone2, address1_line1, address1_line2, address2_line1, etc. This, to me, is just obviously wrong. But there are certainly cases where things aren't nearly so clear!

  • I certainly agree that, if you can, store things such as phone numbers in a separate table if you're going to be storing multiple of them. You might well have an application that has options to store a client's Home, Mobile, Work, Fax, and Alternate contact number. It's unlikely you're going to be needing all of those for every client to start with, so some of those columns will be "wasted" space. Also, using a different table preserves Normal Form,

    On the subject of taxes, however, at least in the UK, I would rather store the price next to item, not at the transaction total. Not all items are taxable in the UK, so storing the value (before tax) and then the tax value makes more sense. Say, for example, you purchased a Book for £10.00 and a lamp at £20.00, the total tax would be £5.00. Without it being applied to the items you wouldn't know which item(s) it applied to. However, if I said that the Book was £10.00 with £0.00 Tax (as books are tax exempt), and the Lamp was £20.00 with £5.00 tax, this makes things easier.

    This is actually, really important in my line of work in insurance. Insurance can quite often be sold as "Co-insured". This means that several Insurers are insuring the same risk, however, each will be insuring a different percentage of it. So, for example, someone might have a High Value Household that they want insured, with a (rough) Sum Insured value of £25M. A single insurer may not be happy to insure the full risk, as a result Insurer A insures 60% of the risk (£15M), while Insurer B insures the remaining 40% (£10). The tax paid applicable needs to go to the applicable insurers, so that they can pay it. Doing to value at the full transaction wouldn't help, so the premiums would be split, with a separate value for the tax in the same line. 🙂 This becomes even more important if you start bringing in brokers and sub brokers; as you introduce Commission, which may be different for different insurers (depending on the agreement).

    I'm not saying that having Tax on a separate line is wrong, I just think that depends more on type of business and where you are in the world and how tax works there.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A - Friday, December 8, 2017 3:34 AM

    On the subject of taxes, however, at least in the UK, I would rather store the price next to item, not at the transaction total. Not all items are taxable in the UK, so storing the value (before tax) and then the tax value makes more sense. Say, for example, you purchased a Book for £10.00 and a lamp at £20.00, the total tax would be £5.00. Without it being applied to the items you wouldn't know which item(s) it applied to. However, if I said that the Book was £10.00 with £0.00 Tax (as books are tax exempt), and the Lamp was £20.00 with £5.00 tax, this makes things easier.

    And, for extra design fun, there are multiple tax rates, depending on what's being sold...  So we can't just store "yes, this line has VAT", but we need to know which level / tier of tax is applicable...

    Thomas Rushton
    blog: https://thelonedba.wordpress.com

  • ThomasRushton - Friday, December 8, 2017 3:38 AM

    Thom A - Friday, December 8, 2017 3:34 AM

    On the subject of taxes, however, at least in the UK, I would rather store the price next to item, not at the transaction total. Not all items are taxable in the UK, so storing the value (before tax) and then the tax value makes more sense. Say, for example, you purchased a Book for £10.00 and a lamp at £20.00, the total tax would be £5.00. Without it being applied to the items you wouldn't know which item(s) it applied to. However, if I said that the Book was £10.00 with £0.00 Tax (as books are tax exempt), and the Lamp was £20.00 with £5.00 tax, this makes things easier.

    And, for extra design fun, there are multiple tax rates, depending on what's being sold...  So we can't just store "yes, this line has VAT", but we need to know which level / tier of tax is applicable...

    And then, just to add further matters, some areas of the UK are tax exempt! For example, if the risk is based in the Channel Islands, or the Isle of Man, then you don't pay IPT (notice I said the risk; so if you live in the Isle of Man but you're insuring a house in Suffolk, you have to pay IPT!).

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A - Friday, December 8, 2017 3:34 AM

    I certainly agree that, if you can, store things such as phone numbers in a separate table if you're going to be storing multiple of them. You might well have an application that has options to store a client's Home, Mobile, Work, Fax, and Alternate contact number. It's unlikely you're going to be needing all of those for every client to start with, so some of those columns will be "wasted" space. Also, using a different table preserves Normal Form,

    On the subject of taxes, however, at least in the UK, I would rather store the price next to item, not at the transaction total. Not all items are taxable in the UK, so storing the value (before tax) and then the tax value makes more sense. Say, for example, you purchased a Book for £10.00 and a lamp at £20.00, the total tax would be £5.00. Without it being applied to the items you wouldn't know which item(s) it applied to. However, if I said that the Book was £10.00 with £0.00 Tax (as books are tax exempt), and the Lamp was £20.00 with £5.00 tax, this makes things easier.

    This is actually, really important in my line of work in insurance. Insurance can quite often be sold as "Co-insured". This means that several Insurers are insuring the same risk, however, each will be insuring a different percentage of it. So, for example, someone might have a High Value Household that they want insured, with a (rough) Sum Insured value of £25M. A single insurer may not be happy to insure the full risk, as a result Insurer A insures 60% of the risk (£15M), while Insurer B insures the remaining 40% (£10). The tax paid applicable needs to go to the applicable insurers, so that they can pay it. Doing to value at the full transaction wouldn't help, so the premiums would be split, with a separate value for the tax in the same line. 🙂 This becomes even more important if you start bringing in brokers and sub brokers; as you introduce Commission, which may be different for different insurers (depending on the agreement).

    I'm not saying that having Tax on a separate line is wrong, I just think that depends more on type of business and where you are in the world and how tax works there.

    Understood but if you have a more conventional invoice (or receipt), I would think that it would be better to aggregate the groups of taxable items to avoid the "penny rounding" and calculate the appropriate taxes against the total invoice rather than individual line items.  Of course, business requirements may vary and either case could be true.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I would go for the normalized form, whenever possible (it depends applies but it's for exceptions).
    Integrity is my main concern plus it's easily extendable without much maintenance overhead, take less storage space, devs don't have to deal with null values (add overhead in the application layer), don't have to deal with empty string - default empty phone number (000-0000 for instance) and null are equaling the absence of, easier to maintain, and more than often a database is a "shared unit" queried from multiple sources. So modifying schema afterward is often very risky. More gain than loss for the small additional time to write at first.

    Of course the solutions with "contacts" only is easier to query for devs. In those situations (having devs difficulty writing t-sql), an "external interface" view (or sp) will do the required joins for easy querying while preserving the underlying schema safe.

    Also another point I feel important is what's the smallest indivisible unit a data can go before being unneeded by the business. What the phone numbers represent for the business. Is there any business value to know it's a landline over a cell phone or an IP phone? Does it bring more values distinguishing what type of phone it is? A phone number for a carrier is far more different from a business perspective than for another company that need to collect a phone number whatever it is because it needs to and won't do anything with it later.

    my 2 cents

  • Having a separate table with ContactID, PhoneNumber and PhoneType will lend itself to easier maintenance as different types of communication are invented.  And will eliminate all that wasted space when no one has a home land line* any more.

    But you need to carefully define the PhoneType domain.  "Cell" may not be specific enough - is it a personal cell, a business-supplied phone, something else?  (Of course this applies to the multiple column in one table approach as well.)

    You also have to consider what the "front end" will be showing.  Someone could potentially have 10 or more phone numbers; how much real estate are you going to take up to display that?

    Someone asked if the business needed to know whether it's a land line or cell phone (or something else).  Many times that is essential to know.  Various laws and regulations apply to who can call different types of phones, at what time of day, how many times, etc.

    * - I hate online registrations that require a home phone.  I don't have one.  So I put my cell number in that field.  Now, what should I put in the cell field?

  • The contact_phone_numbers table looks to me like the best of the two options proposed but I would say the relationship between between numbers and types should be slightly different. Aren't these really "tags" and a given phone number could be tagged with multiple of them? E.g. phone number that is "work" and "mobile" or "work" and "home".

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply