The Basics of Sizing a SQL Server Database

Sizing a database can be one of the most arduous tasks a DBA, analyst, or developer must attend to. It's time consuming, and often more hours go into analyzing the database than actually sizing it. This article focuses on how to monitor a database's growth after it's deployed and offers some tips on how to size it before deployment. We will also dive a little into how to benchmark your database against a robust data load.

So how much equipment do you need to buy for your database? Your answer can mean the difference between a SAN deployment and a few RAID 1 drives. Luckily for you, there are some tools out there to efficiently size the database before deployment. For example, you can use ERwin (a product of CA) to determine how large your database will be in x number of months. You can also load a database with random data using products like Datatect 1.6. Those who are feeling brave can try to do the math themselves, but beware of the dreaded indexes. When most people size a database without the aid of a third-party product, they miss the space needed by indexes, foreign keys, and constraints.
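
To see how much of an existing table's footprint actually comes from indexes (the space hand calculations tend to miss), you can run the built-in sp_spaceused procedure. A minimal sketch, where `dbo.Orders` is a hypothetical table name:

```sql
-- Report reserved, data, index, and unused space for one table.
-- 'dbo.Orders' is a hypothetical name; substitute your own table.
EXEC sp_spaceused @objname = N'dbo.Orders';

-- Or, with no table argument, summarize the current database.
EXEC sp_spaceused;
```

Comparing the data and index_size columns in the output makes the index overhead obvious.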

The easiest way of sizing databases is through the data modeling program itself, like ERwin. ERwin (like most data modeling programs) has a volumetrics tab that allows you to specify a number of volume items when you're creating the table. In ERwin, you can specify:

  • The number of initial records
  • The number of new records monthly
  • The chance a column will be NULL

The combination of these three factors gives the product the ability to generate accurate volumetric reports any number of months out. These reports can be output to HTML and can even be viewed by object. Most modeling tools account for indexes, foreign keys, and constraints. The only thing ERwin doesn't account for is indexed views, which can take up quite a bit of space if they're built against a large table.
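
The arithmetic behind such a projection can be sketched in a few lines. The figures below are hypothetical, not ERwin's output: take the initial row count, add the monthly growth multiplied by the number of months, and multiply by an estimated row size that includes index overhead:

```sql
-- Hypothetical volumetrics: 100,000 initial rows, 25,000 new rows
-- per month, and roughly 150 bytes per row including index overhead.
DECLARE @months int = 12;

SELECT CAST((100000 + 25000 * @months) * 150 / 1048576.0
            AS decimal(10, 2)) AS projected_mb;  -- size after @months months
```

This kind of back-of-the-envelope estimate is exactly where indexed views and other overlooked structures can throw the numbers off.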

Once you have a general idea of how large your database is going to be, you can prove your theory with a product like Datatect 1.6. Datatect allows you to create meaningful random data to load into your database. It also takes foreign keys into account as it loads data. After trying all the other data generators, I found Datatect 1.6 to be the easiest way to load valuable sample data into my database. I found the other data-loading applications to be unstable, and they would easily peg my CPU. I am finishing my review of Datatect 1.6 next week and will have my full lab results complete then.

So why would you want to load a sample database? A few months ago I used ERwin to project that a new product's database would grow to about 50 GB after six months of operation. The application developers were used to developing against test databases with only a few hundred sample records in them and couldn't fathom how to develop against a database this size. After loading the database using Datatect, they were able to find deficiencies in the application that would have crashed the server within six months of the application going to production.

With Datatect, I was able to load a 10 GB table in about 12 hours with meaningful test data. The same task in a similar product by a larger vendor took over 8 days. Make sure that whenever you load the data, you remove all clustered indexes first; otherwise you can increase your load time significantly. Once you have a sample database, you can use programs like WinRunner, which can benchmark an application's performance, to see how it will perform with so many records.
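
A minimal sketch of that clustered-index tip, using hypothetical object names (`dbo.Orders`, `PK_Orders`, `OrderID`): drop the clustered index before the bulk load and rebuild it once the load completes:

```sql
-- Hypothetical names; substitute your own table, constraint, and key.
-- Drop the clustered index (here backing a primary key) before loading.
ALTER TABLE dbo.Orders DROP CONSTRAINT PK_Orders;

-- ... perform the bulk data load here ...

-- Recreate the clustered index once the load completes.
ALTER TABLE dbo.Orders
    ADD CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID);
```

Rebuilding the index once at the end sorts the data a single time, rather than reorganizing pages on every insert.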

The final approach is to watch and learn. You can use this when you want to size a server aggressively and then deploy it with your best guess on the size. Once I deploy a server, I keep a table on each server where I log statistics about my databases. Each week, I capture a snapshot of how large each database is, and I can then pull it into a spreadsheet for later use. Once in the spreadsheet, you can draw trend charts to project where your database will be physically in one year.
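
One way such a log could look (a sketch with hypothetical names, not the sample procedure from the Resources section): a log table plus a procedure that sums file sizes from the sys.master_files catalog view, which you could schedule weekly with SQL Server Agent:

```sql
-- Hypothetical names throughout; adjust to your own conventions.
CREATE TABLE dbo.DatabaseSizeLog (
    CaptureDate  datetime      NOT NULL DEFAULT GETDATE(),
    DatabaseName sysname       NOT NULL,
    SizeMB       decimal(12,2) NOT NULL
);
GO
CREATE PROCEDURE dbo.usp_LogDatabaseSizes
AS
BEGIN
    -- sys.master_files reports size in 8 KB pages; convert to MB.
    INSERT INTO dbo.DatabaseSizeLog (DatabaseName, SizeMB)
    SELECT DB_NAME(database_id),
           SUM(size) * 8 / 1024.0
    FROM sys.master_files
    GROUP BY database_id;
END
```

Querying dbo.DatabaseSizeLog by CaptureDate then gives you the weekly series to chart in a spreadsheet.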

A sample stored procedure to get you started is available in the Resources section below. In a later article, we will dive into the more advanced topic of how to size your database yourself without the assistance of a third-party tool.

Resources
