many blank fields vs many linked tables

  • The database I am designing will be accessed by multiple users via a .NET application written in C#. I first created many related tables such that almost every field would contain data. After experiencing performance problems in Visual Studio Express, I redesigned the database with fewer tables/relationships but many empty fields, which I populated with zero-length strings or zeroes. This seemed to improve performance in Visual Studio somewhat, but I do not know how SQL Server performance has been affected.

    How does SQL Server handle many empty fields versus many relationships?

  • It sounds like you're debating normalized vs. denormalized data.

    For very simple select statements that don't involve much in the way of filtering, but simply pull data based on a key value, assuming the key value is also the clustered index, a denormalized table will usually perform faster than a normalized one. Now, start to add other filters to the WHERE clause and you quickly see the performance shift over to favoring normalized data. This is because the normalized design can eliminate the NULL values (when done correctly). NULL values don't index well at all.

    You're not bringing into the discussion, though, some of the other advantages of normalized data, such as data integrity and the reduction of space used. Let's take one simple example, a state name. I've seen New York, NY, N.Y., and then a hundred different variations, all in one database that didn't have a normalized design enforcing the integrity of the data. Not only is this data nearly impossible to work with, but you will run into serious performance problems as you put in a hundred different OR clauses to try to get the data out. You're also forced to hold a string of up to 50 characters or more on every row of the table where you could have a smallint field linked to a lookup table.
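    A minimal sketch of that kind of lookup design (the table and column names here are hypothetical, just to illustrate the point):

    CREATE TABLE StateLookup
    (
        StateID smallint NOT NULL,
        StateName varchar(50) NOT NULL,
        CONSTRAINT PK_StateLookup PRIMARY KEY (StateID),
        CONSTRAINT UQ_StateLookup_StateName UNIQUE (StateName) -- one canonical spelling per state
    )

    CREATE TABLE CustomerAddress
    (
        AddressID int NOT NULL,
        StateID smallint NOT NULL, -- 2 bytes per row instead of a string of up to 50+ characters
        CONSTRAINT PK_CustomerAddress PRIMARY KEY (AddressID),
        CONSTRAINT FK_CustomerAddress_StateLookup FOREIGN KEY (StateID) REFERENCES StateLookup(StateID)
    )

    The UNIQUE constraint means 'New York' can only be stored once, and every row that needs it points at that single canonical value.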

    If you're hitting performance issues with the normalized data, post the execution plans to the forum here and someone will help out.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant, thanks for your response.

    I gather from your reply that I haven't posed my question clearly enough. I am not suggesting denormalizing any tables, nor am I intending to allow null values in any fields. Rather, I am asking about the comparative efficiency of SQL Server 2005 in two distinct scenarios. Perhaps, as you mentioned, space is the issue.

    For example, the information system I am designing includes Customers, which are treated as businesses. Each Customer has a name and, among many other attributes, zero or one Department. There are currently a little over 5,000 Customers, only 5 of which have Departments. So which is more efficient, Plan A or Plan B?

    Plan A

    This results in thousands of fields populated with zero-length strings:

    CREATE TABLE CustMaster
    (
        CustID nchar(7) NOT NULL,
        CustName nvarchar(96) NOT NULL,
        Department nvarchar(96) NOT NULL DEFAULT '',
        CONSTRAINT PK_CustMaster PRIMARY KEY (CustID)
    )

    Plan B

    This results in an additional table and relationship but no fields containing zero-length strings:

    CREATE TABLE CustMaster
    (
        CustID nchar(7) NOT NULL,
        CustName nvarchar(96) NOT NULL,
        CONSTRAINT PK_CustMaster PRIMARY KEY (CustID)
    )

    CREATE TABLE CustDepartments
    (
        CustID nchar(7) NOT NULL,
        Department nvarchar(96) NOT NULL,
        CONSTRAINT PK_CustDepartments PRIMARY KEY (CustID),
        CONSTRAINT FK_CustDepartments_CustMaster FOREIGN KEY (CustID) REFERENCES CustMaster(CustID)
    )

    Additionally, would the comparative efficiency change if the number of Customers with Departments rose from 5 out of 5,000 to 4,000 out of 5,000?

  • OK. That was more helpful.

    Well, for a select by the key and the key only, A would probably be faster than B. However, as soon as you decide you want to filter by the department, B becomes MUCH faster than A. B is also more efficient on space. Assuming you want to occasionally filter by department, I would definitely go with B.
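    For example, a hypothetical filter against the Plan B schema above (the department value 'Shipping' is made up for illustration):

    SELECT cm.CustID, cm.CustName, cd.Department
    FROM CustMaster AS cm
    INNER JOIN CustDepartments AS cd
        ON cd.CustID = cm.CustID
    WHERE cd.Department = N'Shipping'

    With Plan B, this only has to work through the handful of rows in CustDepartments. With Plan A, the equivalent WHERE Department = N'Shipping' has to be evaluated against the full 5,000+ row table (or an index that is mostly zero-length strings).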

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • This sounds like a scenario where the new "sparse columns" feature of SQL 2008 might be of use, although I realize you probably have to use SQL 2005.
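    For reference, that feature (SQL Server 2008 and later syntax, so not applicable to 2005) would look something like this applied to the CustMaster table above:

    CREATE TABLE CustMaster
    (
        CustID nchar(7) NOT NULL,
        CustName nvarchar(96) NOT NULL,
        Department nvarchar(96) SPARSE NULL, -- SPARSE columns must be nullable; NULL rows consume no storage for this column
        CONSTRAINT PK_CustMaster PRIMARY KEY (CustID)
    )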

  • I think you may have a conceptual design flaw.

    How is your customer/business represented in the application connecting to the data?

    Is it a Customer object with a Departments collection attribute that contains zero or more Department objects? Or should it be one or more Department objects?

    How do your use cases/user interface/application deal with this?

    These questions should help you decide on the model for the database.

    It's all about the application object model design. 🙂

    Hiding under a desk from SSIS Implementation Work :crazy:
