Data Sizes

  • Hi,

    I am curious about the best practices for adding fields to tables.

    Is there a tipping point where you should stop adding fields to a table? For example, if you have 50 fields in one table, would it make sense to split it up?

    For example: I'll use Students in a school setting.

    Originally the Students table started out as basically Name, Surname, Date of Birth, Identification Number. Then over time we added Contact Details, Addresses, Class Group, Major Subject, Minor Subject...

    Then in the next upgrade: Gender, Race, a split of Contact Details into Parent 1 details and Parent 2 details, Medical Information, Residential Accommodation, Meal Requirements, etc.

    Now they would like to add a Current Student Photo and fingerprint details (which will massively increase the size of the data).

    Is there a point where you should consider splitting (does it make logical sense to split) the original table up so you have a one-to-one join to, for example, a Photo table, a Fingerprint table, a Medical Records table, or a Parent Details table, joining on an identity field from Students?

    Or will the performance be the same as long as you always reference the precise fields you need?

    Then, if you keep all >50 fields in the same table and you run a one-to-many query between that table and (for example) Grades, would the performance be negatively affected?

  • christy-720576 (11/13/2015)


    Is there a tipping point where you should stop adding fields to a table? For example, if you have 50 fields in one table, would it make sense to split it up? ...

    Your question hits on at least two things: 1) normalization and 2) record size.

    I have to work with a table that has over 200 columns, and my record size is over 8,000 bytes.

    As to your question, I would look into splitting the table so that, for example, address information is in a separate table. If there can be more than one answer (two parents) or a separate group of information (medical records), you should consider a separate table. Read about normalization. You do not need to have everything in its own table, but you do not want everything in one.
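
    As a rough, purely illustrative sketch of that kind of split (the table and column names here are made up, not from the original post):

        CREATE TABLE dbo.Students
        (
            StudentID            int IDENTITY(1,1) NOT NULL PRIMARY KEY,
            FirstName            nvarchar(50) NOT NULL,
            Surname              nvarchar(50) NOT NULL,
            DateOfBirth          date NOT NULL,
            IdentificationNumber varchar(20) NOT NULL
        );

        -- One student can have several addresses (home, postal, residence),
        -- so address information moves to its own table.
        CREATE TABLE dbo.StudentAddresses
        (
            AddressID   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
            StudentID   int NOT NULL REFERENCES dbo.Students (StudentID),
            AddressType varchar(20) NOT NULL,   -- e.g. 'Home', 'Postal'
            AddressLine nvarchar(200) NOT NULL
        );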

    I hope this helps; I know the issues but am not good at presenting them.

  • A good way to start thinking about normalization is to think about the nouns in your requirements. You have students, contacts and parents. Those are all nouns, and each should have its own table. A student might have 0, 1 or more parents; that's a one-to-many relationship from Students to Parents. The Parents table would have a foreign key to Students. The same could be said about how many contact methods a single student could have.

    Things like gender and race are characteristics (or adjectives) of a student, so they belong in the Students table. I would define them as one-to-many relationships: the Students table would have a foreign key to the Genders table and another one to the Races table. The genders and races would be stored in their own tables, and the foreign keys in the Students table would point to the appropriate rows.
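
    In concrete (again, purely illustrative) terms, the foreign keys might look something like this:

        CREATE TABLE dbo.Genders
        (
            GenderID   tinyint NOT NULL PRIMARY KEY,
            GenderName nvarchar(30) NOT NULL
        );

        CREATE TABLE dbo.Races
        (
            RaceID   tinyint NOT NULL PRIMARY KEY,
            RaceName nvarchar(50) NOT NULL
        );

        CREATE TABLE dbo.Students
        (
            StudentID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
            FirstName nvarchar(50) NOT NULL,
            Surname   nvarchar(50) NOT NULL,
            GenderID  tinyint NOT NULL REFERENCES dbo.Genders (GenderID),
            RaceID    tinyint NOT NULL REFERENCES dbo.Races (RaceID)
        );

        -- A student can have 0, 1 or more parents,
        -- so the foreign key lives on the "many" side.
        CREATE TABLE dbo.Parents
        (
            ParentID  int IDENTITY(1,1) NOT NULL PRIMARY KEY,
            StudentID int NOT NULL REFERENCES dbo.Students (StudentID),
            FullName  nvarchar(100) NOT NULL,
            Phone     varchar(30) NULL
        );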

    Did I do even a remotely good job of explaining this?

  • I liked it. :-)

  • christy-720576 (11/13/2015)


    Hi,

    Now they would like to add a Current Student Photo and fingerprint details (which will massively increase the size of the data).

    Is there a point where you should consider splitting (does it make logical sense to split) the original table up so you have a one-to-one join to, for example, a Photo table, a Fingerprint table, a Medical Records table, or a Parent Details table, joining on an identity field from Students?

    Or will the performance be the same as long as you always reference the precise fields you need?

    Then, if you keep all >50 fields in the same table and you run a one-to-many query between that table and (for example) Grades, would the performance be negatively affected?

    Assuming the application is wise enough not to SELECT *, performance is likely to be dictated more by SQL Server's ability to use indexes than by the width of the table. Maintenance of the larger database will become a concern: the backup and restore strategy will need to be revisited and adjusted to accommodate slower backups, restores, and consistency checks, depending upon the SLA that must be met.
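
    For example (the index and column names here are only illustrative), keeping the select list narrow lets SQL Server answer a query on a wide table from a small covering index instead of the wide rows:

        -- Narrow covering index for one common query
        CREATE NONCLUSTERED INDEX IX_Students_Surname
            ON dbo.Students (Surname)
            INCLUDE (FirstName, DateOfBirth);

        -- Reference only the columns you need, never SELECT *
        SELECT FirstName, Surname, DateOfBirth
        FROM   dbo.Students
        WHERE  Surname = N'Smith';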

    The minimum IO SQL Server performs is 8 KB (a page). It does not matter what size a row on a page is; 8 KB will always be read. When the size of a row exceeds 8 KB (for example, because of the (max) data types, such as a varbinary(max) column used to persist photos), pointers are kept in the row on the page. Those pointers point to other pages, which are fetched only if the client requests those columns.
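
    You can see that split for yourself (the table name here is just an example): SQL Server reports in-row pages and overflow/LOB pages as separate allocation units.

        -- Page counts per allocation unit for one table:
        -- IN_ROW_DATA, ROW_OVERFLOW_DATA and LOB_DATA are listed separately.
        SELECT index_id,
               alloc_unit_type_desc,
               page_count
        FROM   sys.dm_db_index_physical_stats
                   (DB_ID(), OBJECT_ID(N'dbo.Students'), NULL, NULL, 'DETAILED');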

    Indexes use a B-tree structure. When SQL Server performs an index seek, it starts at the root of the index (a page) and seeks down the levels of the index (one page from each level) until the leaf level (the final page) of the index's B-tree is reached. The number of levels in an index (its depth) dictates how many pages are read when SQL Server performs an index seek. The depth of most indexes is less than 10 pages, but this also depends upon the width of the index key. The maximum index key width is 900 bytes.
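
    If you want to check the depth of an existing index (the object and index names here are illustrative), INDEXPROPERTY reports it directly:

        -- Number of B-tree levels, from the root page down to the leaf level
        SELECT INDEXPROPERTY(OBJECT_ID(N'dbo.Students'),
                             N'PK_Students',
                             'IndexDepth') AS IndexDepth;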

    When a table is split into a one-to-one relationship, a query is likely to require at least one index seek per table. If the depth of those indexes were each 2 pages, at least 32 KB of additional disk IO would be needed before the two rows (one page in each 1-1 table) can be joined. Whether that additional IO becomes a notable performance concern needs to be tested, but in general a 1-1 design is slower, due to the need to seek and join the 1-1 rows. If the table's columns are sparsely populated, you may also want to consider using sparse columns.
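
    One way to put real numbers on that is to turn on STATISTICS IO and compare the logical reads of each design (the dbo.StudentMedical 1-1 table and its columns here are hypothetical):

        SET STATISTICS IO ON;

        -- With the 1-to-1 split, fetching one student's medical details
        -- costs a seek on each table plus the join.
        SELECT s.Surname, m.MedicalNotes
        FROM   dbo.Students       AS s
        JOIN   dbo.StudentMedical AS m ON m.StudentID = s.StudentID
        WHERE  s.StudentID = 12345;

        SET STATISTICS IO OFF;

    The messages tab reports the logical reads per table, so running the same query against a single-table design gives a direct comparison.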

    Database normalization is important; a mnemonic for the first three normal forms is "The columns in a table should relate to the primary key, the whole primary key, and nothing but the primary key, so help me Codd" (I couldn't resist quoting that :). It is usually best to start with a normalized design and denormalize it only if you discover a compelling performance reason to do so. It would be best for you to functionally test each design scenario, using realistic data, appropriate indexes, and realistic queries.

    One popular optimization is to avoid storing large objects in the database. Instead, pointers to the large objects (such as well-formed paths to files on a file system, a share, or a URL) are kept in the database. Doing so does decouple transactional consistency between the files and the database, which many find to be an acceptable compromise.
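
    A minimal sketch of that approach (the column name and path are hypothetical) is to store only a path and let the application keep the file and the row in sync:

        -- Store a pointer to the photo, not the photo itself
        ALTER TABLE dbo.Students
            ADD PhotoPath nvarchar(260) NULL;   -- e.g. '\\fileserver\StudentPhotos\12345.jpg'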
