SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


A simple example which, after 15 years of relational design work, still breaks my brain - phone...


A simple example which, after 15 years of relational design work, still breaks my brain - phone numbers

Author
Message
Don Halloran
Don Halloran
SSCarpal Tunnel
SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)

Group: General Forum Members
Points: 4205 Visits: 1885
Every so often I find myself needing to build schema for something which has very little complexity in the "real world", but where the model of that part of the world becomes more and more difficult the more I think about it. Maybe this is a symptom of overthinking, but I like thinking.

To wit, phone numbers. I will provide two obvious possible schemas. Then I will discuss why I find it almost impossible to logically choose between them.

Schema 1: Each type of phone number gets its own column. I will refer to this as the "multi column" solution:


create table contacts (
-- some kind of contact key,
/* ... */,
home_phone varchar(20) null,
work_phone varchar(20) null,
mobile_phone varchar(20) null,
/* ... */


Schema 2: Phone numbers are all kept in a single column, with a second column indicating type. I will refer to this as the "single column" solution:

create table contact_phone_numbers (
-- some kind of contact key
phone_type char not null check (phone_type in ('H', 'W', 'M', /* .... */ ),
phone_number varchar(20) not null
)


Practical considerations, which I will ignore:

Clearly there are practical consequences of the choice. If, for example, I ask "Which contacts use this particular phone number?", the second schema is much easier to query.
On the other hand, I want a report that shows contact name, home phone and mobile phone, the first schema is much easier to query.

Based on years of experience, I do not believe it is possible to reliably predict the future requirements of the system. Maybe today most users are asking questions like the first one, but next week a report like the second example will be top priority.

Theoretical considerations:

Based on the premise that practical considerations cannot be reliably predicted, I generally try to base my designs on more theoretical considerations. Ie, which schema is more "relationally correct"? I think that this simple and common example demonstrates how hard it can be to answer that question.

One might look at the multi column solution and suggest that it seems, prima facie, to violate the "spirit" of first normal form. We have a data element of a single domain which is repeated in the tuple. Commonly, the "repeating groups" antipattern.

On the other hand, one might look at the single column solution and respond that it seems to look a lot like the dreaded EAV pattern.

Which point of view is correct? To me, that depends on whether we say that different kinds of phone numbers are actually members of the same domain. But this is a very difficult question to answer. Let's presume that they are the same domain, because they are all phone numbers. Then we should go with the single column solution. Easy, problem solved, right?

But wait! I can then apply similar reasoning to other parts of the model. My invoice lines, for example, might have both an "ex GST amount" and a "GST amount" column. But wait... those are both dollar amounts! So, per the reasoning of the previous paragraph, we should have a "dollar amounts" table, with each amount value accompanied by an additional value from another column indicating the type of dollar amount. Now the reasoning in the previous paragraph doesn't seem so convincing anymore, does it?

Besides this, different phone numbers can have different validation rules. A valid pattern for a mobile phone number is different from that for a landline number. So at the very least, it seems like landline numbers and mobile numbers are different domains. But work landline numbers and home landline numbers are both landline numbers, so those are the same domain. Now we have an even more unfortunate schema, where we need *two* phone number tables... landline phone numbers and mobile phone numbers. And we are stll left with the "different types of dollar amount" objection.

Can anyone provide a sound logical argument as to which of these schemas is relationally correct?

Blog on sqlservercentral
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (681K reputation)SSC Guru (681K reputation)SSC Guru (681K reputation)SSC Guru (681K reputation)SSC Guru (681K reputation)SSC Guru (681K reputation)SSC Guru (681K reputation)SSC Guru (681K reputation)

Group: General Forum Members
Points: 681225 Visits: 45603
So what are the "ex GST amount" and a "GST amount" columns that you speak of? In particular, what is the difference due to the "ex" in the first column name?

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Don Halloran
Don Halloran
SSCarpal Tunnel
SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)

Group: General Forum Members
Points: 4205 Visits: 1885
Jeff Moden - Tuesday, December 5, 2017 8:05 PM
So what are the "ex GST amount" and a "GST amount" columns that you speak of? In particular, what is the difference due to the "ex" in the first column name?

It's just one possible example, but for context, in Australia many things have goods and services tax applied. More complete names might be "invoice_GST_exclusive_amount" and "invoice_GST_amount". Adding the two together would give the "invoice_GST_inclusive_amount" for the line. They're generally kept separate because the amount excluding GST is company revenue, the GST amount is paid to the government. And not everything is charged GST, so if you just kept a single "amount" value on the line, you would not be able to tell whether or not GST was in fact charged. But of course both must be applied, since the customer has to pay the sum of the two.

I could use any number of similar examples. If I wanted to model a leg of a journey, for example, I need a start location and an end location. Does my "journey_legs" table therefore have two columns, one for start location and one for end location, or do I have a separate "journey_leg_locations" table, which is in a 2 to 1 cardinality relationship with my legs table, and have an additional type column constrained to "start" and "end"? In this case, the choice *not* to use a "single column" solution seems obvious.


Blog on sqlservercentral
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (681K reputation)SSC Guru (681K reputation)SSC Guru (681K reputation)SSC Guru (681K reputation)SSC Guru (681K reputation)SSC Guru (681K reputation)SSC Guru (681K reputation)SSC Guru (681K reputation)

Group: General Forum Members
Points: 681225 Visits: 45603
Don Halloran - Tuesday, December 5, 2017 8:15 PM
Jeff Moden - Tuesday, December 5, 2017 8:05 PM
So what are the "ex GST amount" and a "GST amount" columns that you speak of? In particular, what is the difference due to the "ex" in the first column name?

It's just one possible example, but for context, in Australia many things have goods and services tax applied. More complete names might be "invoice_GST_exclusive_amount" and "invoice_GST_amount". Adding the two together would give the "invoice_GST_inclusive_amount" for the line. They're generally kept separate because the amount excluding GST is company revenue, the GST amount is paid to the government. And not everything is charged GST, so if you just kept a single "amount" value on the line, you would not be able to tell whether or not GST was in fact charged. But of course both must be applied, since the customer has to pay the sum of the two.

I could use any number of similar examples. If I wanted to model a leg of a journey, for example, I need a start location and an end location. Does my "journey_legs" table therefore have two columns, one for start location and one for end location, or do I have a separate "journey_leg_locations" table, which is in a 2 to 1 cardinality relationship with my legs table, and have an additional type column constrained to "start" and "end"? In this case, the choice *not* to use a "single column" solution seems obvious.


Thank you for the explanation. I appreciate it.

I don't know much about taxes for "Down Under" but in the U.S.A., taxes and the like are usually applied to the total value of the invoice rather than for each item on the invoice. In such a case, I'd store the taxes on separate lines. In other words, I build the invoice table in an EAV fashion where each amount listed on the invoice lives in a separate row in the table. Any taxes or fees also live on their own rows. So, for me, your example doesn't provide justification for listing multiple phone numbers on the same row of data.

That shouldn't be taken as a point of contention between you and I. I'm just explaining the thought process that I have that justifies the use of a separate phone number table (one of the people that I work for has ~20 phone numbers). Think of the row that identifies the individual or company as the "invoice header" table and the phone number table as the "invoice detail" table.

I also do the same with addresses because even individuals can have more than one physical address and more than one shipping address (for example) and they can be temporally sensitive. The same also holds true for email addresses and many other attributes of an individual, even including names.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Don Halloran
Don Halloran
SSCarpal Tunnel
SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)SSCarpal Tunnel (4.2K reputation)

Group: General Forum Members
Points: 4205 Visits: 1885
Jeff Moden - Thursday, December 7, 2017 7:25 PM
Thank you for the explanation. I appreciate it.

I don't know much about taxes for "Down Under" but in the U.S.A., taxes and the like are usually applied to the total value of the invoice rather than for each item on the invoice. In such a case, I'd store the taxes on separate lines. In other words, I build the invoice table in an EAV fashion where each amount listed on the invoice lives in a separate row in the table. Any taxes or fees also live on their own rows. So, for me, your example doesn't provide justification for listing multiple phone numbers on the same row of data.

That shouldn't be taken as a point of contention between you and I. I'm just explaining the thought process that I have that justifies the use of a separate phone number table (one of the people that I work for has ~20 phone numbers). Think of the row that identifies the individual or company as the "invoice header" table and the phone number table as the "invoice detail" table.

I also do the same with addresses because even individuals can have more than one physical address and more than one shipping address (for example) and they can be temporally sensitive. The same also holds true for email addresses and many other attributes of an individual, even including names.

Yep, I tend to agree with your reasoning.

What I usually end up with in this domain is a set of tables representing different contact protocols (phone numbers, email addresses and url's) which are subtypes of a more general contact point, and I associate contacts with sets of contact points. But that's normally a result of cardinality concerns: If a person can have more than one home phone number (for example), it becomes quite clear that at least distinct home phone numbers must be distinct rows, not distinct columns. This then renders the domain ambiguities somewhat moot. That's why I wanted to restrict the hypothetical example to one in which there is just the one home phone number, mobile, etc. It's certainly an artificial restriction in most cases, but it does sharpen the point.

A lot of systems I've seen do in fact model different phone numbers as different columns. Right now I'm thinking of Dynamics CRM, where a contact is composed of columns like phone1, phone2, address1_line1, address1_line2, address2_line1, etc. This, to me, is just obviously wrong. But there are certainly cases where things aren't nearly so clear!


Blog on sqlservercentral
Thom A
Thom A
SSC Guru
SSC Guru (59K reputation)SSC Guru (59K reputation)SSC Guru (59K reputation)SSC Guru (59K reputation)SSC Guru (59K reputation)SSC Guru (59K reputation)SSC Guru (59K reputation)SSC Guru (59K reputation)

Group: General Forum Members
Points: 59706 Visits: 17726
I certainly agree that, if you can, store things such as phone numbers in a separate table if you're going to be storing multiple of them. You might well have an application that has options to store a client's Home, Mobile, Work, Fax, and Alternate contact number. It's unlikely you're going to be needing all of those for every client to start with, so some of those columns will be "wasted" space. Also, using a different table preserves Normal Form,

On the subject of taxes, however, at least in the UK, I would rather store the price next to item, not at the transaction total. Not all items are taxable in the UK, so storing the value (before tax) and then the tax value makes more sense. Say, for example, you purchased a Book for £10.00 and a lamp at £20.00, the total tax would be £5.00. Without it being applied to the items you wouldn't know which item(s) it applied to. However, if I said that the Book was £10.00 with £0.00 Tax (as books are tax exempt), and the Lamp was £20.00 with £5.00 tax, this makes things easier.

This is actually, really important in my line of work in insurance. Insurance can quite often be sold as "Co-insured". This means that several Insurers are insuring the same risk, however, each will be insuring a different percentage of it. So, for example, someone might have a High Value Household that they want insured, with a (rough) Sum Insured value of £25M. A single insurer may not be happy to insure the full risk, as a result Insurer A insures 60% of the risk (£15M), while Insurer B insures the remaining 40% (£10). The tax paid applicable needs to go to the applicable insurers, so that they can pay it. Doing to value at the full transaction wouldn't help, so the premiums would be split, with a separate value for the tax in the same line. Smile This becomes even more important if you start bringing in brokers and sub brokers; as you introduce Commission, which may be different for different insurers (depending on the agreement).

I'm not saying that having Tax on a separate line is wrong, I just think that depends more on type of business and where you are in the world and how tax works there.


Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does :-P

Please always remember to encapsulate your code in IFCode Markup. For example [code=sql] [/code].
Click here to read Jeffs Guide on how to post SQL questions, and get swift and helpful answers from the community
Thomas Rushton
Thomas Rushton
SSCoach
SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)SSCoach (15K reputation)

Group: Moderators
Points: 15007 Visits: 7736
Thom A - Friday, December 8, 2017 3:34 AM
On the subject of taxes, however, at least in the UK, I would rather store the price next to item, not at the transaction total. Not all items are taxable in the UK, so storing the value (before tax) and then the tax value makes more sense. Say, for example, you purchased a Book for £10.00 and a lamp at £20.00, the total tax would be £5.00. Without it being applied to the items you wouldn't know which item(s) it applied to. However, if I said that the Book was £10.00 with £0.00 Tax (as books are tax exempt), and the Lamp was £20.00 with £5.00 tax, this makes things easier.

And, for extra design fun, there are multiple tax rates, depending on what's being sold... So we can't just store "yes, this line has VAT", but we need to know which level / tier of tax is applicable...

Thom A
Thom A
SSC Guru
SSC Guru (59K reputation)SSC Guru (59K reputation)SSC Guru (59K reputation)SSC Guru (59K reputation)SSC Guru (59K reputation)SSC Guru (59K reputation)SSC Guru (59K reputation)SSC Guru (59K reputation)

Group: General Forum Members
Points: 59706 Visits: 17726
ThomasRushton - Friday, December 8, 2017 3:38 AM
Thom A - Friday, December 8, 2017 3:34 AM
On the subject of taxes, however, at least in the UK, I would rather store the price next to item, not at the transaction total. Not all items are taxable in the UK, so storing the value (before tax) and then the tax value makes more sense. Say, for example, you purchased a Book for £10.00 and a lamp at £20.00, the total tax would be £5.00. Without it being applied to the items you wouldn't know which item(s) it applied to. However, if I said that the Book was £10.00 with £0.00 Tax (as books are tax exempt), and the Lamp was £20.00 with £5.00 tax, this makes things easier.

And, for extra design fun, there are multiple tax rates, depending on what's being sold... So we can't just store "yes, this line has VAT", but we need to know which level / tier of tax is applicable...

And then, just to add further matters, some areas of the UK are tax exempt! For example, if the risk is based in the Channel Islands, or the Isle of Man, then you don't pay IPT (notice I said the risk; so if you live in the Isle of Man but you're insuring a house in Suffolk, you have to pay IPT!).



Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does :-P

Please always remember to encapsulate your code in IFCode Markup. For example [code=sql] [/code].
Click here to read Jeffs Guide on how to post SQL questions, and get swift and helpful answers from the community
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (681K reputation)SSC Guru (681K reputation)SSC Guru (681K reputation)SSC Guru (681K reputation)SSC Guru (681K reputation)SSC Guru (681K reputation)SSC Guru (681K reputation)SSC Guru (681K reputation)

Group: General Forum Members
Points: 681225 Visits: 45603
Thom A - Friday, December 8, 2017 3:34 AM
I certainly agree that, if you can, store things such as phone numbers in a separate table if you're going to be storing multiple of them. You might well have an application that has options to store a client's Home, Mobile, Work, Fax, and Alternate contact number. It's unlikely you're going to be needing all of those for every client to start with, so some of those columns will be "wasted" space. Also, using a different table preserves Normal Form,

On the subject of taxes, however, at least in the UK, I would rather store the price next to item, not at the transaction total. Not all items are taxable in the UK, so storing the value (before tax) and then the tax value makes more sense. Say, for example, you purchased a Book for £10.00 and a lamp at £20.00, the total tax would be £5.00. Without it being applied to the items you wouldn't know which item(s) it applied to. However, if I said that the Book was £10.00 with £0.00 Tax (as books are tax exempt), and the Lamp was £20.00 with £5.00 tax, this makes things easier.

This is actually, really important in my line of work in insurance. Insurance can quite often be sold as "Co-insured". This means that several Insurers are insuring the same risk, however, each will be insuring a different percentage of it. So, for example, someone might have a High Value Household that they want insured, with a (rough) Sum Insured value of £25M. A single insurer may not be happy to insure the full risk, as a result Insurer A insures 60% of the risk (£15M), while Insurer B insures the remaining 40% (£10). The tax paid applicable needs to go to the applicable insurers, so that they can pay it. Doing to value at the full transaction wouldn't help, so the premiums would be split, with a separate value for the tax in the same line. Smile This becomes even more important if you start bringing in brokers and sub brokers; as you introduce Commission, which may be different for different insurers (depending on the agreement).

I'm not saying that having Tax on a separate line is wrong, I just think that depends more on type of business and where you are in the world and how tax works there.


Understood but if you have a more conventional invoice (or receipt), I would think that it would be better to aggregate the groups of taxable items to avoid the "penny rounding" and calculate the appropriate taxes against the total invoice rather than individual line items. Of course, business requirements may vary and either case could be true.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Megistal
Megistal
SSCertifiable
SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)

Group: General Forum Members
Points: 6511 Visits: 2616
I would go for the normalized form, whenever possible (it depends applies but it's for exceptions).
Integrity is my main concern plus it's easily extendable without much maintenance overhead, take less storage space, devs don't have to deal with null values (add overhead in the application layer), don't have to deal with empty string - default empty phone number (000-0000 for instance) and null are equaling the absence of, easier to maintain, and more than often a database is a "shared unit" queried from multiple sources. So modifying schema afterward is often very risky. More gain than loss for the small additional time to write at first.

Of course the solutions with "contacts" only is easier to query for devs. In those situations (having devs difficulty writing t-sql), an "external interface" view (or sp) will do the required joins for easy querying while preserving the underlying schema safe.

Also another point I feel important is what's the smallest indivisible unit a data can go before being unneeded by the business. What the phone numbers represent for the business. Is there any business value to know it's a landline over a cell phone or an IP phone? Does it bring more values distinguishing what type of phone it is? A phone number for a carrier is far more different from a business perspective than for another company that need to collect a phone number whatever it is because it needs to and won't do anything with it later.

my 2 cents
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum








































































































































































SQLServerCentral


Search