• christy-720576 (11/13/2015)


    Hi,

    Now they would like to add a Current Student Photo and fingerprint details (which will massively increase the size of the data).

    Is there a point where you should consider (does it make logical sense) splitting the original table up so you have a one-to-one join to, for example, a Photo table, a Fingerprint table, a Medical Records table, or a Parent Details table, joining on an identity field from Student?

    Or will the performance be the same as long as you always reference the precise fields you need?

    Then if you keep all >50 fields in the same table and you generate a one-to-many query between that table and (for example) Grades, would the performance be negatively affected?

    Assuming the application is wise enough to not SELECT *, performance is likely to be dictated more by SQL Server's ability to use indexes than by the width of the table. Maintenance of the larger database will become a concern, though: the system's restore strategy will need to be revisited and adjusted to accommodate slower backups, restores, and consistency checks, depending upon what SLA must be met.

    The minimum IO SQL Server performs is 8 KB (a page). It does not matter what size a row is within a page - 8 KB will always be read. When the size of a row exceeds 8 KB (e.g., with max-length data types such as varbinary(max), one candidate for persisting photos), the in-row data holds pointers to other pages, and those pages are fetched only if the client requests them.
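    One way to see this at the page level is SET STATISTICS IO, which reports in-row and LOB page reads separately. A minimal sketch, assuming a hypothetical dbo.StudentDemo table with a varbinary(max) Photo column:

    SET STATISTICS IO ON;

    -- Touches only the in-row pages; the off-row photo pages are not read:
    SELECT StudentID, LastName
    FROM dbo.StudentDemo
    WHERE StudentID = 1;

    -- Follows the in-row pointers, so the LOB pages are read as well
    -- (reported as "lob logical reads" in the messages output):
    SELECT StudentID, Photo
    FROM dbo.StudentDemo
    WHERE StudentID = 1;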

    Indexes use a B-tree structure. When SQL Server performs an index seek, it starts at the root of the index (a page) and seeks down the levels of the index (one page from each level) until the leaf level (the final page) of the index's B-tree is reached. The number of levels in an index (its depth) therefore dictates how many pages are read during an index seek. The depth of most indexes is less than 10 pages, though this also depends upon the width of the index key; the maximum index key width is 900 bytes.
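    The actual depth and per-level page counts of an index can be inspected with sys.dm_db_index_physical_stats. A sketch, assuming a dbo.Student table whose clustered index has index_id 1:

    SELECT index_level, index_depth, page_count
    FROM sys.dm_db_index_physical_stats(
             DB_ID(), OBJECT_ID(N'dbo.Student'), 1, NULL, 'DETAILED');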

    When a table is split into a one-to-one relationship, a query is likely to require at least one index seek against each table. If the depth of each of those indexes were 2 pages, the two seeks would read at least 4 pages (32 KB) of disk IO - roughly double what a single-table seek of the same depth would read - before the two rows (one page in each 1-1 table) can be joined. Whether that additional IO becomes a notable performance concern needs to be tested, but in general a 1-1 design is slower, due to the need to seek and join the 1-1 rows. If the table's columns are sparsely populated, you may also want to consider using sparse columns.
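    For concreteness, the kind of split being discussed might look like the following (a sketch; table and column names are illustrative, not from the original schema). Reusing the Student key as the photo table's primary key is what enforces the 1-1 relationship:

    CREATE TABLE dbo.Student (
        StudentID int IDENTITY(1,1) PRIMARY KEY,
        FirstName nvarchar(50) NOT NULL,
        LastName  nvarchar(50) NOT NULL
        -- ...the remaining narrow columns...
    );

    CREATE TABLE dbo.StudentPhoto (
        StudentID int PRIMARY KEY
            REFERENCES dbo.Student (StudentID),  -- shared key: at most one photo row per student
        Photo     varbinary(max) NOT NULL
    );

    -- Fetching a photo now costs a seek down each table's B-tree:
    SELECT s.LastName, p.Photo
    FROM dbo.Student AS s
    JOIN dbo.StudentPhoto AS p
        ON p.StudentID = s.StudentID
    WHERE s.StudentID = 12345;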

    Database normalization is important. A mnemonic for the first three normal forms is "The columns in a table should relate to the primary key, the whole primary key, and nothing but the primary key, so help me Codd" (I couldn't resist quoting that :). It is usually best to start with a normalized design and denormalize it only if you discover a compelling performance reason to do so. It would be best for you to functionally test each design scenario, using realistic data, appropriate indexes, and realistic queries.

    One popular optimization is to avoid storing large objects in the database at all. Instead, pointers to the large objects (such as well-formed paths to files on a file system or share, or URLs) are kept in the database. Doing so decouples transactional consistency between the files and the database, which many find to be an acceptable compromise.
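    In table form, that approach is as simple as swapping the blob column for a path column (again a sketch with illustrative names):

    CREATE TABLE dbo.StudentPhotoPointer (
        StudentID int PRIMARY KEY,
        PhotoPath nvarchar(260) NOT NULL  -- e.g. N'\\fileserver\photos\12345.jpg'
    );

    -- Note the compromise: the application must now keep the file and the
    -- row in sync, because the database can no longer guarantee it.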