• Hugo Kornelis (11/16/2011)


    Quick message to those who arrive here: I am aware that this question involves a lot of calculations - much more hard work than normal questions. So my compliments on all your hard work, and I hope it paid off by earning you a point.

    I do seriously like the spirit of the question. Just suprised at the lack of detail in certain aspects. Considering how much attention to detail Hugo usually pays to others QOD's I was suprised. Others have pointed out that some of your statement also contradict ones the articles they reference as do the numbers you have in your math. I can see around that though becuase a SQL data page has been the same size for a long time and your table rows fit.

    The primary thing this question is missing from a real world size calculations perspective is if the DB was created with FULL or BULK as the option for logging. Was the log file set to auto-grow? How often is the tran log is backed up?

    The calculations on size requirements the DB recovery option, Tran log backup policy, and the size of the logged transactions are even more important that the ones Hugo brings up when determining the amount of space needed to support an OLTP database with 2 million rows in a single table, a char row of random values for a primary key (instead of a finding a candidate key in the data), and other items that would cuase this DB to be costly.

    Number one disk space abuser in my world is Databases created with Full recovery model and no Tran log backup or DB backup jobs. It is amazing to me that I have yet to find an environment that does not have at least one database like this.

    Also one last note: Not certain if Auto create Statistics was left on or off so I will believe it was the default or on. The tables Primary Key Clustered Index is not at all related to the data stored in the table or to the 2 Million new rows being inserted into it. The stats that would exist before and after the Index rebuild would require more disc space than a good Non-Clustered index. A proper Table Primary Key created from two or more candidate key columns would also create a Clustered Index that could reduce the true space required by this table and all the data structures that it has. The sys.dm_db_missing_index_details management view can be used after the initial data insert is completed to find out exactly what a good non-clustered index would need to be. If you want to find what the table candidate keys should be so you can have the best of both worlds use the sp_special_columns to find information about the column that uniquely identifies rows in the table. This would only work if you had a version of the table with data in it that did not have the current random char(10) value as a Primary Key.