TeraByte DB (15+TB): Partitioning / DataWarehouse

  • Hi

    I am primarily an Oracle DBA who has been asked to support SQL Server. I have had success maintaining one of our other production databases (not a big one) with log shipping / standby. Now I am told that I will be on a project where the database size will be over 15 TB (yes, terabytes). Partitioning and filegroups are the first things that come to mind. What other settings would need to be turned on for a terabyte-scale database in SQL Server 2005? I am reading up on partitioning, but if someone who has worked on terabyte databases can shed some light, it would be very useful.
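
    For reference, a minimal SQL Server 2005 sketch of what range partitioning over filegroups looks like. The table, column, and filegroup names here are made up, and the filegroups would first have to be added with ALTER DATABASE ... ADD FILEGROUP.

    ```sql
    -- Hypothetical sketch: range-partition a large fact table by month,
    -- mapping each partition to its own filegroup.
    CREATE PARTITION FUNCTION pf_CallsByMonth (datetime)
    AS RANGE RIGHT FOR VALUES ('2008-07-01', '2008-08-01', '2008-09-01');
    GO

    CREATE PARTITION SCHEME ps_CallsByMonth
    AS PARTITION pf_CallsByMonth
    TO (FG_2008_06, FG_2008_07, FG_2008_08, FG_2008_09);   -- one filegroup per partition
    GO

    CREATE TABLE dbo.CallDetail
    (
        CallID      bigint   NOT NULL,
        CallDate    datetime NOT NULL,
        DurationSec int      NOT NULL
    ) ON ps_CallsByMonth (CallDate);                        -- rows land in the right filegroup
    GO
    ```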

    The data will be loaded in full once, followed by 3-4 loads during the day, and the database will run on a Windows box.

    Any useful suggestions will be appreciated; I do not want to reinvent the wheel.

    TIA

  • Very interesting question!

    Here are my 5 cents:

    - Use a 64-bit Enterprise environment

    - Use a SAN with a lot of disks

    - Which backup solution are you going to use? (backup time, duration, restore strategy)

    - Do you know how many users are connected during working hours? This will help you decide on your hardware (CPU and memory); it will probably be a "big" one.

    - Partition your data, once you know how this database is used

    - Make sure TempDB is properly configured (number of data files = number of CPUs); see the sketch after this list

    - Separate static and live data into different filegroups

    - Think about maintenance actions such as reorganizing/rebuilding indexes and DBCC checks
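
    A minimal sketch of the TempDB point above; the drive letter, file names, and sizes are assumptions. One extra data file is added per additional CPU core, keeping all files the same size.

    ```sql
    -- Hypothetical sketch: add tempdb data files so the file count matches the CPU count.
    ALTER DATABASE tempdb
    ADD FILE (NAME = tempdev2, FILENAME = 'T:\TempDB\tempdev2.ndf', SIZE = 8GB, FILEGROWTH = 1GB);

    ALTER DATABASE tempdb
    ADD FILE (NAME = tempdev3, FILENAME = 'T:\TempDB\tempdev3.ndf', SIZE = 8GB, FILEGROWTH = 1GB);
    ```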

    Wilfred
    The best things in life are the simple things

  • You have left some information out, so here are a few questions.

    What does the database design look like? 15 TB is a lot different if there is one very large table vs. 150,000 small tables. What takes up all the space? Is it full of image data? What operations will be done on the database?

    You mention a data load of some sort - are you suggesting that your involvement will be to create some kind of failover solution or reporting database that will be updated regularly?

    You will obviously need to be running SQL Server Enterprise Edition in a 64-bit environment. Depending on the design, you may want to consider splitting the database into pieces and creating a federated server solution to spread the processing load across multiple servers - scaling out rather than just scaling up. What kind of budget are you going to be working with?
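
    For illustration only, a sketch of that scale-out idea as a distributed partitioned view. The server, database, and table names are made up, and the linked servers plus the CHECK constraints that bound each member table would have to exist already.

    ```sql
    -- Hypothetical sketch: each member server holds one date range of the data;
    -- the view presents them as a single table, and a CHECK constraint on each
    -- member table lets the optimizer skip servers that cannot hold matching rows.
    CREATE VIEW dbo.CallDetail_All
    AS
    SELECT * FROM ServerA.CallDW.dbo.CallDetail_2007
    UNION ALL
    SELECT * FROM ServerB.CallDW.dbo.CallDetail_2008;
    GO

    -- A date-filtered query should only touch the relevant member server.
    SELECT COUNT(*) FROM dbo.CallDetail_All WHERE CallDate >= '20080101';
    ```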

    If you are creating a reporting solution, I would recommend you invest some time in reading the Microsoft white papers on project REAL. This was an MS team assigned to create a real world data warehouse solution for a very large database. The documentation they published was not the usual marketing garbage. They even suggested not using some of the MS best practices and have indicated product weaknesses and ways to avoid them.

  • Replies embedded

    >- Use a 64-bit Enterprise environment

    >- Use a SAN with a lot of disks

    >- Which backup solution are you going to use? (backup time, duration, restore strategy)

    As of now, my plan is to have a full backup once a week and sufficient disk space for transaction log backups.

    In case of any database failure, restore from the backup and roll the logs forward.
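
    A rough sketch of that plan; the database name and file paths are assumptions.

    ```sql
    -- Hypothetical sketch: weekly full backup plus frequent log backups,
    -- and the corresponding restore / roll-forward sequence.
    BACKUP DATABASE CallDW TO DISK = 'G:\Backup\CallDW_Full.bak' WITH INIT, STATS = 5;

    BACKUP LOG CallDW TO DISK = 'G:\Backup\CallDW_20080925_0100.trn';

    -- Recovery: last full with NORECOVERY, then each log backup in sequence, then recover.
    RESTORE DATABASE CallDW FROM DISK = 'G:\Backup\CallDW_Full.bak' WITH NORECOVERY;
    RESTORE LOG CallDW FROM DISK = 'G:\Backup\CallDW_20080925_0100.trn' WITH NORECOVERY;
    RESTORE DATABASE CallDW WITH RECOVERY;
    ```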

    >- Do you know how many users are connected during working hours?

    Not really at this time.

    >This will help you decide on your hardware (CPU and memory); it will probably be a "big" one.

    >- Partition your data, once you know how this database is used

    >- Make sure TempDB is properly configured (number of data files = number of CPUs)

    >- Separate static and live data into different filegroups

    >- Think about maintenance actions such as reorganizing/rebuilding indexes and DBCC checks

    Currently I am reading about filegroups and partitioning. Can you suggest more on DBCC actions for a terabyte database?
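
    For reference, on very large databases the full-blown DBCC CHECKDB is often replaced or supplemented by lighter checks. A hedged sketch follows; the database, filegroup, and table names are made up.

    ```sql
    -- Hypothetical sketch: common ways to keep integrity checks manageable on a VLDB.
    DBCC CHECKDB ('CallDW') WITH PHYSICAL_ONLY;   -- page/allocation checks only; much faster

    USE CallDW;
    DBCC CHECKFILEGROUP ('FG_2008_08');           -- or spread the work: one filegroup per night...
    DBCC CHECKTABLE ('dbo.CallDetail');           -- ...or a few tables per night
    ```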

  • >You have left some information out, so here are a few questions.

    >What does the database design look like? 15 TB is a lot different if there is one very large table vs. 150,000 small tables. What takes up all the space? Is it full of image data? What operations will be done on the database?

    The db design, from what I have obtained, is only about 30 tables; it is mostly call-related data, not image data. There would be a data load 3-4 times daily, and otherwise mostly query-only activity.

    >You mention a data load of some sort - are you suggesting that your involvement will be to create some kind of failover solution or reporting database that will be updated regularly?

    A reporting database that will be updated regularly, serving daily, weekly, and quarterly queries.

    >You will obviously need to be running SQL Server Enterprise Edition in a 64-bit environment.

    We are in a 64-bit environment.

    >Depending on the design, you may want to consider splitting the database into pieces and

    >creating a federated server solution to spread processing load across multiple servers - scale out

    >rather than just scaling up. What kind of budget are you going to be working with?

    When you say splitting the db into pieces, are you indicating putting data across multiple servers and then fetching data via four-part names like server.database.dbo.table?

    >If you are creating a reporting solution, I would recommend you invest some time in reading the Microsoft white papers on project REAL. This was an MS team assigned to create a real world data warehouse solution for a very large database. The documentation they published was not the usual marketing garbage. They even suggested not using some of the MS best practices and have indicated product weaknesses and ways to avoid them.

    Will look into this.

    Thanks for the suggestions till now.

    One correction to my earlier posting: this 15+ terabyte db will have a standby, with log shipping taking place (with a 1-minute lag).

    Is there a way to reduce this interval?
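
    For context, that lag is really just the schedules of the log shipping backup/copy/restore SQL Agent jobs. A small sketch to inspect them, assuming the default LSBackup/LSCopy/LSRestore job naming:

    ```sql
    -- Hypothetical sketch: list the log shipping jobs and how often they run.
    SELECT  j.name,
            s.freq_subday_interval AS run_every,
            CASE s.freq_subday_type
                WHEN 2 THEN 'seconds'
                WHEN 4 THEN 'minutes'
                WHEN 8 THEN 'hours'
            END AS unit
    FROM    msdb.dbo.sysjobs j
    JOIN    msdb.dbo.sysjobschedules js ON js.job_id = j.job_id
    JOIN    msdb.dbo.sysschedules s     ON s.schedule_id = js.schedule_id
    WHERE   j.name LIKE 'LS%';   -- default log shipping job name prefix
    ```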

    Thanks

  • Hi,

    - I recommend that you run this in a cluster to help protect against hardware failures.

    - Why do you need log shipping with such a small delay? If you are after such a short delay, use database mirroring. Or you could use clustering, mirroring, t-log shipping and SAN-based replication all together! (we use 3/4).

    - Get used to the world of 'SAN' especially RAID levels and replication (remote and local).

    - Use mount points instead of drive letters.

    - Do you have SLAs for recovery of this database? If so, look up 'Timefinder' or 'Snapshot backups' (basically, they are very fast backups, but require a lot of additional storage).

    - When performing backups, split the backup into multiple files (see the sketch at the end of this post).

    - Use Windows Server Datacenter edition.

    I support a multi-terabyte database with 7000 users. It runs on a pretty beefy server: 40 CPUs and 160 GB of memory. It uses all of the memory but only about 50% of the CPU.
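
    A minimal sketch of the "split the backup into multiple files" point above; the database name and paths are assumptions, and a restore needs all the stripes together.

    ```sql
    -- Hypothetical sketch: stripe the full backup across several files so each
    -- stripe is smaller and the backup devices are written in parallel.
    BACKUP DATABASE CallDW
    TO DISK = 'G:\Backup\CallDW_1.bak',
       DISK = 'H:\Backup\CallDW_2.bak',
       DISK = 'I:\Backup\CallDW_3.bak',
       DISK = 'J:\Backup\CallDW_4.bak'
    WITH INIT, STATS = 5;
    ```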

  • Ben H (9/25/2008)


    Hi,

    - I recommend that you run this in a cluster to help protect against hardware failures.

    - Why do you need log shipping with such a small delay? If you are after such a short delay, use database mirroring. Or you could use clustering, mirroring, t-log shipping and SAN-based replication all together! (we use 3/4).

    - Get used to the world of 'SAN' especially RAID levels and replication (remote and local).

    - Use mount points instead of drive letters.

    - Do you have SLAs for recovery of this database? If so, look up 'Timefinder' or 'Snapshot backups' (basically, they are very fast backups, but require a lot of additional storage).

    - When performing backups, split the backup into multiple files.

    - Use Windows Server Datacenter edition.

    I support a multi-terabyte database with 7000 users. It runs on a pretty beefy server: 40 CPUs and 160 GB of memory. It uses all of the memory but only about 50% of the CPU.

    Thanks for the reply. I will do some fact finding. The plan is to use Direct Attached Storage (no SAN) with RAID. Can you shed more light on your statement "(we use 3/4)"? I am not sure what is being conveyed.

    Thanks

  • Hey,

    By 3/4 I meant we use 3 of the 4 availability / redundancy options I listed.

    We use:

    - SAN based replication (which you will not be able to use)

    - DB mirroring, and

    - clustering

    Also, all SQL backups are backed up to tape

    I recommend you use

    - Clustering (protects against server hardware failures)

    - Tape backups, (protects against accidental data loss and DAS failure)

    - And if you want a warm standby or a reporting solution, log shipping. (I much prefer log shipping to mirroring.)

    Will your tape backup unit be able to plug directly into your DAS?

    Don't be surprised if you can't perform a daily 'full' due to DAS, network and tape backup speed. Look up 'SQL differential backups'.
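
    A rough sketch of that full-plus-differential pattern; the database name and paths are assumptions.

    ```sql
    -- Hypothetical sketch: weekly full, daily differentials.
    -- Sunday: full backup (the base for the week's differentials).
    BACKUP DATABASE CallDW TO DISK = 'G:\Backup\CallDW_Full.bak' WITH INIT;

    -- Weeknights: a differential only contains extents changed since the last full.
    BACKUP DATABASE CallDW TO DISK = 'G:\Backup\CallDW_Diff_Wed.bak' WITH DIFFERENTIAL, INIT;

    -- Restore = last full (WITH NORECOVERY) + last differential + subsequent log backups.
    ```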

    If you are using DAS, you will probably be limited in the number of servers that can connect to it, so you may have a maximum of a 2-node cluster.

    If this 15 TB of data is important to your business, it is important to get the right advice. I recommend you get a consultant with experience of this size of SQL Server database to advise your organisation on the best way to set it up.

    There are very few people with this type of experience in the SQL Server world, but Microsoft can supply the right people if you cannot find them elsewhere.

    People in this forum can provide some good advice, but it is in their own time and without full details of your requirements. If I were in your position, I would want something more than free advice before committing the $$$ needed to support 15 TB on a platform I was not familiar with. But I would use the comments from this forum to help decide if a potential consultant really had the skills needed for the job.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017, 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • I couldn't agree more. Best to get it right the first time.

    It may cost $1000-$2000 a day for a top consultant, but that is nothing compared to the cost of the hardware and ongoing maintenance / support.

    Let me add my recommendation to the other two about getting professional help. Without it, you have almost no chance of success in this endeavour. There are only a handful of people in the world with experience with SQL Server at that scale (15 TB), and most of them work with Microsoft's CAT or Consulting Services. Solid Quality and HP probably have a few. (Disclaimer: I am a Mentor with Solid Quality.)

    Best of luck with the project! Sounds like a lot of fun. Would be amazing to get to play with that kind of hardware and data!! 😎

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

    I am hesitating to ask... why not use Oracle?

  • Vivien Xing (9/30/2008)


    I am hesitating to ask... why not use Oracle?

    1) cost

    2) complexity

    3) it may be a SQL Server-only app (although doubtful)

    4) ease of development

    5) did I mention cost?? 😀

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I can tell you this from my own experience. DAS is NOT a practical solution at all for a database that size.
