Many Intersection Tables vs. Many Entity Tables

  • All,

    I am interested in feedback on which of the following is the more efficient and better performing design.  Please disregard syntax and datatype issues.  This is only an example.

    Also, I am very much leaning toward Design 1 due to the maintenance simplicity, but if Design 2 is much more efficient I am willing to adopt it.

    Thanks in advance for any feedback.

    Keith

     

    Design 1:  Single Address Table with Entity Intersection Table

    Create Table Address

    (AddressID  int identity(1,1) Primary Key

    AddressTypeID int  (ForeignKey to AddressType)

    Address1 varchar(100)

    ...

    PostalCode varchar(10))

     

     

    Create Table Customer

    (CustomerID int identity(1,1) PrimaryKey

    LastName varchar(50),

    FirstName varchar(50))

     

     

    Create Table CustomerAddress

    (CustomerID int PrimaryKey

    AddressID int PrimaryKey)

     

    Example: Select * from Customer c join CustomerAddress ca on c.CustomerID = ca.CustomerID

    join Address a on ca.AddressID = a.AddressID

     

    Design 2:  Would Require multiple Entity/Address tables.

    Create Table Customer

    (CustomerID int identity(1,1) PrimaryKey

    LastName varchar(50),

    FirstName varchar(50))

     

    Create Table CustomerAddress

    (CustomerAddressID int identity (1,1) (PrimaryKey)

    CustomerID  int

    Address1  varchar(100)

    ...

    PostalCode varchar(10))

     

    Example:  Select * from Customer c join CustomerAddress ca on c.CustomerID = ca.CustomerID

     

  • I'd check the business rules/requirements to find out if Address is an entity by itself. For example, if you want to do mass mail to your customers and you have several people living at the same address would you send them only one letter to all of them (in which case it's Design 1) or a separate letter to each of them (Design 2).

  • I concur with the (possibly most important) comment in the above posting - check business rules.

    Once you have clarified the rules you should know whether there is an issue with or reason to input the address each time.  For example if in your business model you expect to enter an address and then change the customer at the address if necessary but never re-key the address the first model will likely work better.  If you intend to re-key the address each time the second will be easier.

    I personally tend toward the idea that a new address should only be added when a new house is built.  Or in other words once an address is captured it can be reused.  (First model).

    Karl lives at 1 Main St.

    Mary moves out of 2 Other St, break the link between Mary and 2 Other St

    Karl moves out of 1 Main st, break the link between karl and 1 Main st

    Karl moves to 2 Other st, where previously Mary lived.  Link Karl to 2 Other St

    Sally moves in to 1 main St, link Sally to the existing 1 Main St address.

    Regards

    Karl

  • Aside from the benefit of address reuse, are there any performance issues regarding the first model.  It would definitely be easier to maintain having all addresses in one table.  Also, I have this situation throughout my model.  It is not only an address issue.  The business rules do not dictate that addresses will be shared or reused.  I have some pressure to use design 2 but it is definitely not as clean as design 1.  Design 2 would require an address table for each entity defined in the db of which there are at least 10, so instead of 1 table, I would need at least 10 tables.

    Thanks for any input.

    Keith

  • For performance, it again depends on business requirements. For example, if there is a request to find all of your customers who live at the same household then design 1 will be much faster, of course.

    If you're concerned about that extra join, don't worry. Just index your tables properly and it'll be ok.

    However, design 1 might be an issue from application perspective. Since addresses are typically entered/modified in a poorly structured way you'll have to find a way to find out if an address that a user has entered already exists in your Addresses table or it's a new one. Business requirement one more time

  • Having a relationship table makes it easy to maintain a history as you can give each record a start and end date and perhaps a status if relevant.  This can be useful in tracing gone aways and in detecting some types of fraud (eg if address changed just before an insurance claim, throw up a warning to check that it was a valid change and not someone trying to intercept the cheque).

    In the case of customer details, deduplication always is a problem and sharing address records between customers could be dangerous.

    I would tend towards Design 1 in this case because of the audit trail.

    Similar considerations apply in other scenarios - how safe is it to share data between entities and do you want an audit trail?

     

  • Forget performance.  What is the requirement? 

    Design 1 is a many to many tie between customers and addresses. 

    design 2 is 1 to many

    You many even need to get more complicated than that. 

    Perhaps a contact table also and that could be a many to many to the Company table, or even a table join.

    Again, what is the requirement.

    The database design should fulful the requirements.  Don't start off changing the requirements because you MIGHT have performance concerns. 

  • Also, do you need a historical reference of address, then altering the middle table with a date field to serve as your end of use date may push you to Design 1. Or you could have a trigger with audit tables for tracking changes instead in design 2.

    It all boils down to what you are expected with regards to business operations.

    1) Can a customer have more than one active address?

    2) Can multiple customers be at the same address and you need to segregate for any business purpose?

    3) Do you have to retain historical data about changes? If so what is the long term goal you must be able to meet?

    You could even have a third design of

    Addresses (addressid, address, city, state, zip,...)

    Customers (CustomerID ,name ,... ,CurrentAddressID [,PreviousAddressID>])

    This might be for a mailing system for orders but also be used for market mailings where you use Current Resident when addressing the address.

    If all boils down to what you must accomplish both short and long term and how much historical reference must you keep.

  • At least for US addresses, you could probably whip something out that normalizes the address entered by the data entry people. USPS.com does provide an API to do this.

    If it was me, I wouldn't worry too much about the dogmatic approach. Customers, CustomerAddresses (with an AsOf date field to indicate which is the current address), or maybe only store the current address in CustomerAddresses, with a vertically-partitioned table to store the historical CustomerAddresses data, triggers to detect when the address is changed and do the manipulations, etc.).

    I say this as occasionally receiving multiple pieces of junk mail addressed to me, but with variations of the same address, with abbreviations, no abbreviations, flip-flops of address lines, typos/misspellings, etc.

    If you have an existing body of data reflective of what you will be receiving for your app, then check on it first.

    If you have data from multiple countries, then...it's probably not going to be worth the effort to try and programmatically enforce or tease out along the way. At least that's the conclusion I've come to.

    Let's not get started about corporate names...

  • It all comes down to your requirements.

    If Customer and Address are in a many-to-many relationship then use the first design else if

    Customer and Address are in one-to-many relationship use the second.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply