Large Table Design Best Practices

  • Let me preface this by saying that "Large Table" means the size of the record (number of columns), not that there will be a large quantity of records in the table.

    I’m starting design of a rather complex Loan Management application which requires data to be present at different times in the workflow.

    The attachment is a sample of what would be required as Loan Master data. Not all of the data is known when the record is first created and it becomes available at different stages of the workflow. This applies to a number of other tables in the application.

    The question becomes: as a best (or at least better) design practice, should this data be combined into a single master table, or should the different components (Loan Terms, Insurance Data, Planning Data) be put into “child” tables, with only the Basic Loan Information being the “master” record? There would only be a 1-to-1 relationship in these cases. This way I can have the database help enforce mandatory fields at different stages of the workflow (a rough sketch of both options appears below).

    Let me know if additional explanation is required.

    Thanks in advance for your comments.
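    Since the attachment isn't reproduced in this thread, the column names below are hypothetical; this is only a minimal sketch of the two options being weighed: a single wide master table versus a basic master row plus 1:1 "child" tables whose NOT NULL columns do the stage-by-stage enforcement.

        -- Option A (hypothetical columns): one wide master table; later-stage
        -- columns must allow NULL because they are unknown when the row is created.
        CREATE TABLE dbo.LoanWide (
            LoanID             int IDENTITY(1,1) PRIMARY KEY,
            BorrowerID         int          NOT NULL,  -- known at creation
            OriginationDate    date         NOT NULL,
            InterestRate       decimal(5,3)     NULL,  -- Loan Terms stage
            InsuranceCarrierID int              NULL,  -- Insurance stage
            PlannedCompletion  date             NULL   -- Planning stage
        );

        -- Option B (hypothetical columns): basic loan info as the master record,
        -- with each workflow component in a 1:1 child table whose columns
        -- can be declared NOT NULL.
        CREATE TABLE dbo.Loan (
            LoanID          int IDENTITY(1,1) PRIMARY KEY,
            BorrowerID      int  NOT NULL,
            OriginationDate date NOT NULL
        );

        CREATE TABLE dbo.LoanTerms (
            LoanID       int PRIMARY KEY
                REFERENCES dbo.Loan (LoanID),  -- PK = FK keeps the relationship 1:1
            InterestRate decimal(5,3) NOT NULL,
            TermMonths   int          NOT NULL
        );

        CREATE TABLE dbo.LoanInsurance (
            LoanID             int PRIMARY KEY
                REFERENCES dbo.Loan (LoanID),
            InsuranceCarrierID int         NOT NULL,
            PolicyNumber       varchar(50) NOT NULL
        );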

  • This can't be answered with certainty in the absence of data. However, there is nothing offhand wrong with this. I'm assuming that the IDs point to other tables. I also work with information where not all the information is known at first. They either get filled with NULLS (usually in the case of dates) or blank strings.

  • tnpich (3/3/2016)


    Let me preface this by saying that "Large Table" means the size of the record (number of columns), not that there will be a large quantity of records in the table.

    I’m starting design of a rather complex Loan Management application which requires data to be present at different times in the workflow.

    The attachment is a sample of what would be required as Loan Master data. Not all of the data is known when the record is first created and it becomes available at different stages of the workflow. This applies to a number of other tables in the application.

    The question becomes: as a best (or at least better) design practice, should this data be combined into a single master table, or should the different components (Loan Terms, Insurance Data, Planning Data) be put into “child” tables, with only the Basic Loan Information being the “master” record? There would only be a 1-to-1 relationship in these cases. This way I can have the database help enforce mandatory fields at different stages of the workflow.

    Let me know if additional explanation is required.

    Thanks in advance for your comments.

    I think you will be much happier in the long run if you break this into properly normalized tables. Consider a scenario where an individual has three loans and then changes their insurance. With a single table you have to update all the rows for that person. If you have normalized data, you update the relevant insurance information and everything is up to date. You already have this broken out into logical segments; squishing it back together seems like a bad idea to me.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I think you will be much happier in the long run if you break this into properly normalized tables

    In the absence of data, this cannot be determined with certainty. The understanding I have of the process is that these items relate to a single loan. A 1:1 relationship is indicated. If this is correct, then 1:1 relationships normally go in the same table. The exceptions are for rare situations (a person may only have one company car, but most employees do not get a car) or security considerations (each employee has only one SSN, but it's separated into another table for security reasons).
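    As a minimal sketch of that security exception (names are hypothetical, not from the post): the SSN is 1:1 with the employee, but it lives in its own table so access to it can be granted separately.

        -- Hypothetical example of the security exception: a 1:1 split so the
        -- sensitive column can be permissioned separately from the rest.
        CREATE TABLE dbo.Employee (
            EmployeeID int IDENTITY(1,1) PRIMARY KEY,
            FullName   varchar(100) NOT NULL
        );

        CREATE TABLE dbo.EmployeeSSN (
            EmployeeID int PRIMARY KEY
                REFERENCES dbo.Employee (EmployeeID),  -- PK = FK keeps it 1:1
            SSN        char(11) NOT NULL               -- e.g. SELECT granted only to HR roles
        );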

  • RonKyle (3/3/2016)


    I think you will be much happier in the long run if you break this into properly normalized tables

    In the absence of data, this cannot be determined with certainty. The understanding I have of the process is that these items relate to a single loan. A 1:1 relationship is indicated. If this is correct, then 1:1 relationships normally go in the same table. The exceptions are for rare situations (a person may only have one company car, but most employees do not get a car) or security considerations (each employee has only one SSN, but it's separated into another table for security reasons).

    True that we can't know for sure. But if you keep this in one table you lose a lot of data integrity control from the database side. You now have to allow null and move the RI conditions to the application. This is a double-edged sword for sure.


  • But if you keep this in one table you lose a lot of data integrity control from the database side. You now have to allow null and move the RI conditions to the application.

    I don't see how this can be so. Please provide a concrete example. In my experience, if the data should all be in one table and it's not, it creates other problems. I've worked with a system where the 1:1 information was divided unnecessarily among multiple tables. This led to the loss of the item tracked, because occasionally information would not make it into one of the tables. Therefore the information couldn't be retrieved properly.

    Bottom line: if it's a 1:1 relation, as a general rule it goes into the same table. There are exceptions, but understand why you are making the exception.

  • RonKyle (3/3/2016)


    But if you keep this in one table you lose a lot of data integrity control from the database side. You now have to allow null and move the RI conditions to the application.

    I don't see how this can be so. Please provide a concrete example. In my experience, if the data should all be in one table and it's not, it creates other problems. I've worked with a system where the 1:1 information was divided unnecessarily among multiple tables. This led to the loss of the item tracked, because occasionally information would not make it into one of the tables. Therefore the information couldn't be retrieved properly.

    Bottom line: if it's a 1:1 relation, as a general rule it goes into the same table. There are exceptions, but understand why you are making the exception.

    As the OP stated, this data is created by multiple steps in the process. Let's say, for example, that InsuranceCarrierID is required for the loan to be valid, but that information isn't captured until step 3. This means you have to allow NULL in the database, because when the row is created the value can't be known. However, logically it can't be NULL. This means you have to include business rules in the application to enforce that the value be supplied in that step. As I said, this is a double-edged sword, with neither side being 100% correct.
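    Reusing the hypothetical dbo.Loan / dbo.LoanInsurance names sketched earlier in the thread, here is a rough illustration of how the child-table version makes that step-3 rule declarative: the insurance row simply doesn't exist until step 3, and when it is written the NOT NULL and FK constraints fire on their own.

        -- Hypothetical step-3 procedure: the constraints on dbo.LoanInsurance
        -- enforce the rule; no application-side NULL check is needed.
        CREATE PROCEDURE dbo.CompleteInsuranceStage
            @LoanID             int,
            @InsuranceCarrierID int,
            @PolicyNumber       varchar(50)
        AS
        BEGIN
            SET NOCOUNT ON;
            -- Fails with a constraint violation if @InsuranceCarrierID or
            -- @PolicyNumber is NULL, or if @LoanID has no master row.
            INSERT INTO dbo.LoanInsurance (LoanID, InsuranceCarrierID, PolicyNumber)
            VALUES (@LoanID, @InsuranceCarrierID, @PolicyNumber);
        END;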


  • Sean is grasping the problem. I'm concerned about data integrity and having to enforce more rules in the business logic (which will be complex enough as it is).

    If I put it all into one table, I have to allow NULLs for certain values. But at some point, NULLs are not acceptable to proceed with the workflow.

    In the sample I provided, the data was normalized.

    As to Sean's comment about changing insurance, it wouldn't apply here since the insurance relates to this specific loan. A borrower could have another loan and it could be covered by a different insurer. In this app, loans are to builders for specific initiatives and the terms are relatively short.

    I think I'm leaning more to having the tables contain the data that support a business function. The client wants some checklists implemented that would support the business functions and data required in them.

    Thanks for your input. I'll keep monitoring since I haven't come to a final decision yet.

  • If you are certain there is a 1:1 relationship between these columns, meaning they are all essentially attributes of one entity and not stand-alone entities, then I'd lean in favor of containing them within a single table too. Also, having all columns within the same row allows for more robust enforcement of business rules in the form of table-level check constraints and unique indexes. Another consideration for having the columns within the same table is that it allows for the creation of covering indexes for optimal query performance.
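    A minimal sketch of that point, assuming the single wide table carries its current workflow stage (again using the hypothetical dbo.LoanWide columns from the sketch earlier in the thread): a table-level CHECK constraint can tie the stage to the columns that must be filled in by then.

        -- Hypothetical: require later-stage columns only once the row has
        -- reached that stage, all enforced inside the database.
        ALTER TABLE dbo.LoanWide
            ADD WorkflowStage tinyint NOT NULL
                CONSTRAINT DF_LoanWide_Stage DEFAULT (1);

        ALTER TABLE dbo.LoanWide WITH CHECK
            ADD CONSTRAINT CK_LoanWide_StageRequirements CHECK
            (
                (WorkflowStage < 2 OR InterestRate       IS NOT NULL) AND
                (WorkflowStage < 3 OR InsuranceCarrierID IS NOT NULL) AND
                (WorkflowStage < 4 OR PlannedCompletion  IS NOT NULL)
            );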

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I'm with Sean. I would absolutely normalize this data. There are simply tons and tons of reasons why. Let's start with the simplest: ensuring that every single spelling of the State in the address is the same. 'OK', 'OKLA', 'OKLAHOMA' just to start means multiple queries or multiple OR clauses (assuming not a single typo; one 'OKLHOMA' and all bets are off). Then let's talk about the fact that data normalization is actually a mechanism for increasing query performance. Modern database systems (and I mean anything after SQL Server 2000) are tuned to support JOIN operations. It's not the 1990s any more, when more than two or three joins caused performance problems. Now, by storing less data on fewer pages with good indexes, you get stellar performance out of normalized structures (when the data warrants it, as this stuff clearly does). Extremely wide tables like this inevitably lead to data management and performance issues.
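    A minimal sketch of the state-spelling point (hypothetical names, not from the original attachment): a reference table plus a foreign key keeps the 'OK' / 'OKLA' / 'OKLAHOMA' variants from ever getting in.

        -- Hypothetical lookup table: only the two-character codes exist here.
        CREATE TABLE dbo.StateCode (
            StateCode char(2)     NOT NULL PRIMARY KEY,  -- 'OK'
            StateName varchar(50) NOT NULL               -- 'Oklahoma'
        );

        CREATE TABLE dbo.BorrowerAddress (
            BorrowerID int          NOT NULL PRIMARY KEY,
            Street     varchar(100) NOT NULL,
            City       varchar(50)  NOT NULL,
            StateCode  char(2)      NOT NULL
                REFERENCES dbo.StateCode (StateCode)     -- rejects free-text spellings
        );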

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • RonKyle (3/3/2016)


    But if you keep this in one table you lose a lot of data integrity control from the database side. You now have to allow null and move the RI conditions to the application.

    I don't see how this can be so. Please provide a concrete example. In my experience, if the data should all be in one table and it's not, it creates other problems. I've worked with a system where the 1:1 information was divided unnecessarily among multiple tables. This led to the loss of the item tracked, because occasionally information would not make it into one of the tables. Therefore the information couldn't be retrieved properly.

    Bottom line: if it's a 1:1 relation, as a general rule it goes into the same table. There are exceptions, but understand why you are making the exception.

    That shouldn't happen if the referential integrity is in place and enforced. I've worked with normalized data structures for more than 25 years. It's absolutely a good practice. Normalization only becomes problematic when it's done incorrectly or it's not enforced.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I'm with Sean. I would absolutely normalize this data.

    This is a false choice. The question is not whether to put it in a single table in an unnormalized format vs. multiple tables in a normalized format. The question is whether it can make sense to put it in a single table. And the answer is yes, it can, if the data is truly 1:1, and no, it can't, if some of the data can have a 1:many relationship with other parts of the data.

    If the data is 1:1 and it is unnecessarily split out among different tables, it will make many things more difficult, such as needing joins that wouldn't otherwise have been necessary. Maybe not difficult, but certainly not as easy. A single table will also make it easier to query for loans in different stages.

    Anyone who knows me knows I'm death on ensuring the data is normalized. But without data to analyze, this can't be determined simply by looking at a design.

  • RonKyle (3/3/2016)


    I'm with Sean. I would absolutely normalize this data.

    This is a false choice. The question is not whether to put it in a single table in an unnormalized format vs. multiple tables in a normalized format. The question is whether it can make sense to put it in a single table. And the answer is yes, it can, if the data is truly 1:1, and no, it can't, if some of the data can have a 1:many relationship with other parts of the data.

    If the data is 1:1 and it is unnecessarily split out among different tables, it will make many things more difficult, such as needing joins that wouldn't otherwise have been necessary. Maybe not difficult, but certainly not as easy. A single table will also make it easier to query for loans in different stages.

    Anyone who knows me knows I'm death on ensuring the data is normalized. But without data to analyze, this can't be determined simply by looking at a design.

    Actually, that isn't the question as posted by the OP.

    The question becomes: as a best (or at least better) design practice, should this data be combined into a single master table, or should the different components (Loan Terms, Insurance Data, Planning Data) be put into “child” tables, with only the Basic Loan Information being the “master” record? There would only be a 1-to-1 relationship in these cases. This way I can have the database help enforce mandatory fields at different stages of the workflow.

    Just for a second let us pretend this is a brand new system. There is no existing data so you can't analyze any. You have to be able to visualize what the data will look like when the system comes online. Would your answer remain the same if there simply isn't any data?

    The biggest challenge of using a single table for this is that the integrity checks would have to be moved to the application. This means that certain required columns must allow NULL in the data model, and the data model then no longer supports the business rules.


  • The biggest challenge of using a single table for this is that the integrity checks would have to be moved to the application.

    I've used these kinds of accumulated snapshot tables for most of my DBA career. I have never found this to be an issue, and in the absence of a concrete example showing how multiple tables would solve a problem that a single table creates, I remain skeptical that this is an inherent issue. I have seen parts of entities disappear when they were unnecessarily divided, because a single commit may now involve multiple tables. In my view these losses, although only a fraction of the overall total, represent an unacceptable risk, as no data should ever be lost due to a fault in the database design.

  • RonKyle (3/3/2016)


    The biggest challenge of using a single table for this is that the integrity checks would have to be moved to the application.

    I've used these kinds of accumulated snapshot tables for most of my DBA career. I have never found this to be an issue, and in the absence of a concrete example showing how multiple tables would solve a problem that a single table creates, I remain skeptical that this is an inherent issue. I have seen parts of entities disappear when they were unnecessarily divided, because a single commit may now involve multiple tables. In my view these losses, although only a fraction of the overall total, represent an unacceptable risk, as no data should ever be lost due to a fault in the database design.

    Shall I repeat the concrete example yet again? You have a column in step 3 that is required. That column cannot be made NOT NULL in the design because you can't enter that data when the first part is created. How do you enforce the NOT NULL requirement? Remember that the OP stated that the data is entered in stages, not all at once. If it were a single point of entry that captured every piece of data, then I absolutely would not suggest that splitting this would be a good idea. But given the multi-step process involved here, that isn't the case. This means you have to allow NULL in all the columns and then check for their existence later from the application side. That sounds like a nightmare to me.

