How do you estimate what a database's size will be?
What do you consider, and how close does your estimate have to be?
I have had to answer that question many times, and I have had to come up with the estimates to match. People were always surprised to receive pages of explanation along with the magic number. But the number becomes much easier to understand when you are not just estimating space, you are estimating a storage strategy as well.
Let's face it: an estimate is an estimate. The difficulty comes when you do not put the number you are giving into perspective. If you come in too low, you might be blamed because the shortfall represents a non-budgeted cost; hopefully you catch it before the database runs out of space. Storage solutions can be expensive, and you have to come up with solid justification so the cost can be properly budgeted.
You need room to store the data, too.
You might have redundancy needs.
You might have fault tolerance needs.
You might have special backup needs.
You might be looking at an expensive SAN solution to help cover all of the above.
Everyone in the community knows that data files, log files, tempdb, the system databases, the OS, and program files do not mix well on the same disks. And what about the page file? And what happens when you put several database files on the same disk system? Two OLAP or OLTP databases placed side by side will certainly affect each other.
You must make sure I/O performance will be constant and predictable. You must make sure you will have enough disk space to meet the goal you were given, so you have to estimate the space the database will occupy in X months, X years, or whatever horizon applies. You might also want free space in reserve so you can use (almost) non-disruptive online re-indexing options.
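A back-of-the-envelope projection can make the "X months" figure concrete. Here is a minimal sketch assuming simple linear growth plus a reserved free-space percentage for re-indexing headroom; all of the figures and the function name are illustrative, not from the article's spreadsheet:

```python
def projected_size_gb(current_gb, monthly_growth_gb, months, free_space_pct=0.20):
    """Project a future database size with headroom for online re-indexing.

    free_space_pct reserves extra room on top of the projected data size,
    for example so the largest index can be rebuilt (a rebuild can
    temporarily need roughly the index's own size again).
    """
    grown = current_gb + monthly_growth_gb * months
    return grown * (1 + free_space_pct)

# Example: 50 GB today, growing 2 GB per month, budgeted for 24 months.
# That is 98 GB of data, plus 20% headroom.
size = projected_size_gb(50, 2, 24)
```

Real databases rarely grow in a perfectly straight line, which is exactly why the article pairs the numbers with pages of explanation: the assumptions matter as much as the result.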
Now, with all this in mind, let's take a look at what is needed purely to store the data on disk.
You can use the data dictionary of an existing database to get the information you need, put it into an Excel spreadsheet, and fine-tune the numbers from there. Below you will find an example of the query you can run to extract the basic information you need about table and index structures. You will see that two views are created before the query; this hides some of the complexity the query would otherwise expose.
The query (code in the Resources Section) is a complex derived-table join built on the new views as well as the system tables they are based on. Before each step within the query you will find an explanation of what it retrieves. The numbers and formulas used closely match what the SQL Server data structures actually use today. Some data types and details might be missing, but it is a fairly good start.
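The arithmetic the query implements can be sketched outside SQL as well. The following is a rough Python rendition of the per-table estimation formulas Microsoft publishes for SQL Server (row header, null bitmap, variable-length block, and the ~8096 usable bytes per 8 KB page); the column figures in the example are hypothetical, and this is a simplification of the actual query in the Resources Section:

```python
import math

def estimate_table_mb(num_rows, fixed_data_size, num_cols,
                      num_var_cols=0, max_var_size=0):
    """Rough table-size estimate following SQL Server's published formulas."""
    # Variable-length block: 2 bytes of overhead + 2 bytes per variable
    # column + the data itself (worst case: maximum declared size).
    if num_var_cols > 0:
        var_data_size = 2 + num_var_cols * 2 + max_var_size
    else:
        var_data_size = 0
    # Null bitmap: 2 bytes + one bit per column, rounded up to whole bytes.
    null_bitmap = 2 + math.ceil(num_cols / 8)
    # Data row header is 4 bytes.
    row_size = fixed_data_size + var_data_size + null_bitmap + 4
    # 8096 bytes are usable per 8 KB page; each row also costs a
    # 2-byte slot-array entry.
    rows_per_page = 8096 // (row_size + 2)
    pages = math.ceil(num_rows / rows_per_page)
    return pages * 8192 / (1024 * 1024)  # total size in MB

# Example: 1,000,000 rows, 100 bytes of fixed data across 10 columns,
# plus 2 varchar columns totalling at most 150 bytes.
mb = estimate_table_mb(1_000_000, 100, 10, 2, 150)
```

Indexes are estimated the same way, with the index key columns standing in for the row; that is what makes the spreadsheet approach workable, since each formula is just arithmetic on column sizes and row counts.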
You can put the result of this query into an Excel spreadsheet, start playing with the numbers, and let Excel calculate the new results for you.
You will also find a link at the bottom (Resources Section) to an example of an Excel worksheet you can use to manipulate the numbers, and then to document and justify your needs. I suggest you paste data into the spreadsheet carefully so you do not lose the formulas. You can also build an Excel form from scratch, improve the prediction mechanism, and put in better formulas of your own. This spreadsheet calculates space based on an existing database whose row counts are assumed to grow in a normally distributed fashion, which might not exactly match the behavior a database would show in real life.
The important thing is that you have something to start with, and an understandable way of presenting your numbers in a clean Excel format that everyone will easily be able to read.