Planning for multi-terabyte databases

  • Hello,

    Does anyone know where I can learn more about how to plan multi-terabyte databases? For example:

    1. Regarding backups and restores: is there some way to restore subsets of data without having to restore the whole multi-terabyte database? (I've heard of a backup agent for Exchange that apparently allows this; is there one for SharePoint Services/Portal?) And would the .BAK file for, say, a 2 TB or 3 TB database be roughly 2 TB or 3 TB, or some other size?

    2. I also wonder what the transaction log and log backup sizes would be for a database of that size. I don't even know whether it is advisable to keep several full backups or to use some kind of differential backup scheme. Just one week of daily full backups, for example, would be 14 TB for a 2 TB database, plus all of the extra space needed for hourly transaction log backups.
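    For context, here is roughly the kind of differential scheme I am wondering about (just a sketch from my reading so far; the database name and file paths are made up, not anything we have planned):

    -- Hypothetical weekly cycle for a 2 TB database called BigDB.
    -- Sunday: one full backup (roughly the size of the data, before any compression).
    BACKUP DATABASE BigDB TO DISK = N'E:\Backup\BigDB_full_sun.bak';

    -- Monday through Saturday: differential backups, which contain only the
    -- extents changed since the last full backup, so they start small and grow.
    BACKUP DATABASE BigDB TO DISK = N'E:\Backup\BigDB_diff_mon.bak' WITH DIFFERENTIAL;

    -- Every hour: transaction log backups (this requires the full recovery model).
    BACKUP LOG BigDB TO DISK = N'E:\Backup\BigDB_log_0900.trn';

    That would mean only one 2 TB full per week plus the differentials and log backups, instead of 14 TB of fulls, but I have no idea yet how large the differentials would get for this product.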

    Pointers to sites or books that discuss large database planning and maintenance in detail would be great too.

    Thanks for any help,

    webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • SQL Server 2005's integrated backup functionality is not going to be very fast, though your storage infrastructure is going to be the biggest factor affecting the speed of backups and restores.

    Are you storing the 2-3 TB in ONE database? What does your schema/table structure look like?

    How many tables? Is there any kind of data partitioning possible? (depending on the nature of the data and the backup objectives, this could help a lot)

    If you have large indexes that are frequently used, you will need to consider the impact a backup has on users/apps. I've seen big indexes slow backups considerably; they can make a big difference to overall IO and throughput. In my case, the "simple" recovery model allowed me to run full backups a few times a day with less impact than using "full" recovery and incremental backups throughout the day. It was a system-level IO bottleneck/contention issue: by reducing logging overhead, I can take full advantage of my high throughput rates (250-750 MB/s = 2-6 Gb/s).
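    To illustrate the two approaches (only a rough sketch; the database name and paths are placeholders, not your actual config):

    -- Option A: simple recovery model, full backups a few times a day.
    -- Note: no log backups are possible in SIMPLE, so point-in-time recovery is lost.
    ALTER DATABASE BigDB SET RECOVERY SIMPLE;
    BACKUP DATABASE BigDB TO DISK = N'E:\Backup\BigDB_full_0600.bak';

    -- Option B: full recovery model, one full backup per day plus frequent log backups.
    ALTER DATABASE BigDB SET RECOVERY FULL;
    BACKUP DATABASE BigDB TO DISK = N'E:\Backup\BigDB_full.bak';
    BACKUP LOG BigDB TO DISK = N'E:\Backup\BigDB_log_0900.trn';

    Which one wins depends on your IO headroom and how much data loss you can tolerate.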

    What sort of read/write ratio are you dealing with? (and how much of it is Insert, Update, Delete)

    You may want to consider synchronous database mirroring. The overhead *may* be lower than running frequent incremental backups.

    Anyway, the best thing about mirroring is that you can eliminate or mitigate the downtime of a restore operation, since you already have a convenient, up-to-date standby.
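    The setup, very roughly (a sketch only: it assumes mirroring endpoints already exist on both servers, the mirror copy was restored WITH NORECOVERY, and the server names and port are placeholders):

    -- On the mirror server, after restoring the database WITH NORECOVERY:
    ALTER DATABASE BigDB SET PARTNER = N'TCP://principal.example.com:5022';

    -- On the principal server:
    ALTER DATABASE BigDB SET PARTNER = N'TCP://mirror.example.com:5022';
    ALTER DATABASE BigDB SET PARTNER SAFETY FULL;  -- synchronous (high-safety) mode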

    - Dan

    1.  If your data is stored on a SAN, you may want to talk with your SAN admin: the SAN may have block-copy capabilities you can take advantage of, so you can have it make a copy and run your backups from that copy. That is much safer, and the SAN is optimized for this kind of volume, so it won't have much of an impact on your DB.

    2.  MS has a few links on their site about VLDB (Very Large Databases) 

    http://www.microsoft.com/sql/prodinfo/previousversions/scalability.mspx gets you to most of them. 

    Also, I am not at the office and don't have access to the links, but I know that TechRepublic has some good links to white papers from folks like HP, IBM, and Dell about SQL Server and VLDBs.

    Good Luck!

    SJ

  • Actually, we are running quite a large database (over 1.4 TB) and it is still growing. We are expecting 2 TB by the end of this year. It is an SAP system with the IS-U extension.

    As of today we have quite nice response times, but we are having some issues with maintenance, as we are not allowed to bring the DB down. So reindexing is tricky (BTW, SAP comes with around 28,000 tables), because the table is not available during the reindex (one of our biggest tables has over 115 million rows, 110 GB), and we also have to deal with the transaction log, backup time, and other administrative tasks.
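    For what it's worth, SQL 2005 Enterprise Edition can rebuild indexes online, which is one way around the availability problem (just a sketch; the table name below is only an example, and in 2005 online rebuilds are not allowed for indexes containing LOB columns):

    -- Rebuild all indexes on a large table while it stays available to users.
    ALTER INDEX ALL ON dbo.BigSAPTable
        REBUILD WITH (ONLINE = ON, SORT_IN_TEMPDB = ON);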

    I think we can start calling it a VLDB.



    Bye
    Gabor

  • Thanks to everyone for their replies. I'm responding to Dan in particular since he asked a couple of questions. Regarding those questions, I don't have answers yet because the database has not even been planned; we are just working from something the vendor said, namely that their product would indeed be ONE database about 2 TB in size.

    We are now trying to anticipate what the impact on restore time would be if someone asked for something from a backup, and how to plan the storage needed to keep at least one day of backups on the server (I don't know enough about this yet, but I assume a backup kept on the server will allow faster restores than one kept on tape). We do have SAN storage, so I will see what we can do with the SAN to help with performance. Even so, we may end up using simple recovery if full recovery with transaction log backups turns out to be infeasible. I'll see what I find out as I read up, and maybe the vendor will give us more detailed advice, since (I assume, as I am not closely involved yet) they must have done other installations of this large database setup.

    I'll use the information you all have given so far to see if I can answer the above questions when the time comes and perhaps even come up with more refined questions that may enable more specific answers.

    Thanks again!

    webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • Can you split the database into multiple filegroups? If so, you can use a filegroup backup and restore strategy instead of a full database strategy. I haven't used it personally; maybe someone else can fill in the blanks ...
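    Roughly, I believe the pattern looks like this (an untested sketch; the database, filegroup, and file names are made up):

    -- Back up a single filegroup instead of the whole database.
    BACKUP DATABASE BigDB FILEGROUP = N'FG_Archive'
        TO DISK = N'E:\Backup\BigDB_FG_Archive.bak';

    -- Later, restore just that filegroup, then roll forward with log backups
    -- to bring it back in sync with the rest of the database (full recovery model).
    RESTORE DATABASE BigDB FILEGROUP = N'FG_Archive'
        FROM DISK = N'E:\Backup\BigDB_FG_Archive.bak' WITH NORECOVERY;
    RESTORE LOG BigDB FROM DISK = N'E:\Backup\BigDB_log.trn' WITH RECOVERY;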

  • If you have enough space you can consider the following:

    Separate the biggest (or most accessed) tables into separate filegroups. It is best if those filegroups are located on different physical disks. The advantage is that you can restore a single file without having to restore the whole DB.
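    Something like this (only a sketch; the names, sizes, and paths are just examples):

    -- Add a filegroup on its own physical disk and put a big table on it.
    ALTER DATABASE BigDB ADD FILEGROUP FG_BigTables;
    ALTER DATABASE BigDB ADD FILE
        (NAME = N'BigDB_BigTables1', FILENAME = N'F:\Data\BigDB_BigTables1.ndf', SIZE = 100GB)
        TO FILEGROUP FG_BigTables;

    -- New tables (and their clustered indexes) can be created on that filegroup.
    CREATE TABLE dbo.BigTable
    (
        Id INT NOT NULL PRIMARY KEY,
        Payload VARCHAR(4000) NULL
    ) ON FG_BigTables;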

    You can use third-party backup software (from Quest, Red Gate, Idera...), which does a pretty good job in terms of compression and backup time. There are still some concerns, AFAIK, with the restore times.

    You can use a separate disk subsystem (a SATA-based one, because of the price) for your backups. If you have backup software that can compress well, you can keep several backups (most probably the log backups) online.



    Bye
    Gabor
