SQLServerCentral is supported by Redgate

scalable hardware solution for 10TB now to 100TB in 3 years
mlbauer
Grasshopper (22 reputation)
Group: General Forum Members
Points: 22 Visits: 132
Hello experts,

We are looking for a server solution for at least the next three years. We collected 6 TB over the last three years. We are using two direct-attached disk boxes with 25 disks each, connected to a dual-socket quad-core server, currently on RAID 5 due to space requirements. Storage and performance needs are growing, with an estimated target capacity of 50 to 100 TB in the next three years. Any query should take the same time or less than it does now, using the complete history of collected data.

We are skeptical about a SAN solution using a NetApp filer, since we don't believe a SAN can deliver the same performance as direct-attached storage. We also fear that a NetApp SAN system could be 5 to 10 times more expensive. What is your opinion and experience with SAN performance? How well do SANs scale with respect to performance and capacity? SAN backup is extremely expensive here due to strange internal political contracts (imho). Expert opinions wanted!
alen teplitsky
SSCertifiable (7K reputation)
Group: General Forum Members
Points: 6958 Visits: 4674
Where I work we're on HP, and you should be able to do this with ProLiant servers. Get a DL380 with 2-3 P812 RAID controllers. For the storage, get an MSA 60 with the 2 TB SATA drives; it holds 12 drives, so that's 24 TB raw storage and 22 TB after RAID 5 (I'm not sure how much after formatting). This assumes one RAID 5 set per MSA.
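The per-enclosure numbers above can be checked with a quick sketch. This assumes one RAID 5 set spanning all 12 bays and decimal (vendor) terabytes; formatted capacity will come in lower still.

```python
# Back-of-envelope check of raw vs. usable capacity for a 12-bay
# enclosure of 2 TB drives in a single RAID 5 set.
def raid5_usable_tb(drives: int, drive_tb: float) -> float:
    """RAID 5 spends one drive's worth of capacity on parity."""
    return (drives - 1) * drive_tb

raw_tb = 12 * 2.0                     # 24 TB raw per enclosure
usable_tb = raid5_usable_tb(12, 2.0)  # 22 TB after parity
print(raw_tb, usable_tb)              # 24.0 22.0
```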

Each P812 has 2 external connectors, and you can cascade something like 8 MSAs per connector.

Your big headache is going to be backup. For something like this you will need LTO-5 tapes or a D2D (disk-to-disk) solution with a lot of storage. I would make sure that your DB server and backup servers are on 10-gigabit networking and 6G SATA/SAS. A ProLiant DL380 G5 will be too slow due to its older I/O channel.
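To make the backup headache concrete, here is a rough estimate of the full-backup window for one 22 TB array, assuming LTO-5's published native figures of roughly 1.5 TB per cartridge and 140 MB/s streaming rate (no compression, and the backup software keeping the drive streaming):

```python
import math

def backup_hours(data_tb: float, mb_per_s: float) -> float:
    # 1 TB = 1,000,000 MB in the decimal units drive vendors use.
    return data_tb * 1_000_000 / mb_per_s / 3600

def cartridges_needed(data_tb: float, cartridge_tb: float = 1.5) -> int:
    return math.ceil(data_tb / cartridge_tb)

print(round(backup_hours(22, 140), 1))  # ~43.7 hours on one streaming drive
print(cartridges_needed(22))            # 15 cartridges
```

Multiple drives (or D2D staging) would be needed to fit this in any reasonable window.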

Check out the ProLiant DL380 G7. The G8 is expected out April to June of 2011.
mlbauer
Grasshopper (22 reputation)
Group: General Forum Members
Points: 22 Visits: 132
Hello,

Thank you for your ideas.

A few additional questions: for better IO/s performance we are thinking about using more, smaller drives of 1 TB (or less). With 1 TB drives we get 11 TB per 12-drive RAID 5 set, which gives us 8 x 11 = 88 TB for a single server machine -- near our capacity target. Great. Backup will be done with a second, identical server-plus-storage system.

Now the next question: given this machine with 88 TB, how can we further scale capacity and performance when this system is no longer enough? That will be about 3 years from now.
alen teplitsky
SSCertifiable (7K reputation)
Group: General Forum Members
Points: 6958 Visits: 4674
They will probably have higher-capacity hard drives by then that you can swap into your existing RAID 5 arrays.

Just make sure you're running Windows Server 2008 R2.
mlbauer
Grasshopper (22 reputation)
Group: General Forum Members
Points: 22 Visits: 132
OK, the disks will have more capacity, but the speed probably won't be enough: all queries will take twice the time for twice the data. Everybody says the number of drives (spindles) limits the IO/s. We need a solution that can grow in capacity AND PERFORMANCE. We would also need twice the CPU performance, so a new server will be necessary. It may be necessary to store 500 TB in less than 3 years. How could we do that? One server won't have enough performance.
Toby Harman
Ten Centuries (1.2K reputation)
Group: General Forum Members
Points: 1181 Visits: 670
1. If you want manageable, extensible storage, then this is what SANs are for. The one to avoid like the plague is NAS, as it forces all your I/O through the network cards.
2. If you want the data to be accessible and writable at the same speed, then you need to start considering a different architecture, as RAID 5 slows down for writes when you add more spindles.
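The RAID 5 write slowdown in point 2 comes from the write penalty: each random small write costs about 4 disk I/Os on RAID 5 (read data, read parity, write data, write parity) versus 2 on RAID 1+0. A minimal sketch, assuming an illustrative ~150 IOPS per 7.2k SATA spindle:

```python
# Effective random-write IOPS for an array, given the per-level
# write penalty (RAID 5: 4 back-end I/Os per write; RAID 1+0: 2).
def effective_write_iops(spindles: int, per_disk_iops: int, write_penalty: int) -> int:
    return spindles * per_disk_iops // write_penalty

print(effective_write_iops(24, 150, 4))  # RAID 5:   900 write IOPS
print(effective_write_iops(24, 150, 2))  # RAID 1+0: 1800 write IOPS
```

Same spindle count, twice the write throughput on RAID 1+0, which is why mixing the two levels by data age (below) pays off.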

Suggestion:
Since you are posting on a SQL Server forum, I'm going to assume there is a SQL Server database on top of this.

Once you have separated log files onto different physical spindles, you may want to consider using a mix of RAID 5 and RAID 1+0. If you can define a primary key that separates the data by age, you can use the sliding window partition technique (see the SQL Server Central article on it for an explanation): place historical data on filegroups that live on the RAID 5 array, and current data on a filegroup on the RAID 1+0 array. When the time comes to move a filegroup from current to historical, you can shut down SQL Server and relocate the file in a scheduled maintenance window.

HP/Compaq also make quite a nice solution called the StorageWorks 4000, which is highly extensible. I'm sure IBM has something similar, and EMC (CLARiiON) will have a solution too.
Steve Jones
SSC Guru (143K reputation)
Group: Administrators
Points: 143732 Visits: 19424
I tend to agree with Toby. RAID 5 has a write penalty, and to get to large storage you need to move to a SAN, both for IOPS performance and for capacity.

Follow me on Twitter: @way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com
mlbauer
Grasshopper (22 reputation)
Group: General Forum Members
Points: 22 Visits: 132
Thank you for your helpful suggestions. I understand that a SAN like StorageWorks could be a solution. I'm not a SAN expert, but I doubt such a system would deliver the performance we are looking for: the FC network between the SAN storage and the servers would be a bottleneck. We want to scan the *complete data in less than 1 hour*, which is possible with our direct-attached storage now; we get approximately 1 GByte per second with a dual quad-core CPU server. In the future we will have 10 to 100 times more data, so we would need a network capable of 10 to 100 GBytes per second, and 20 to 200 quad-core CPUs to match. Could you give me an example of how this can be done? Even if the SAN can deliver it, we would need a cluster of 10 to 100 servers for the needed number of CPUs.
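The scan requirement above translates directly into sustained sequential bandwidth. A minimal sketch, assuming a full scan with no partition elimination or other shortcuts (the worst case being described) and decimal units (1 TB = 1000 GB):

```python
# Sustained read bandwidth needed to scan a data set of a given
# size within a fixed time window.
def required_gb_per_s(data_tb: float, hours: float = 1.0) -> float:
    return data_tb * 1000 / (hours * 3600)

for tb in (6, 100, 500):
    print(f"{tb} TB -> {required_gb_per_s(tb):.1f} GB/s")
# 6 TB -> 1.7 GB/s, 100 TB -> 27.8 GB/s, 500 TB -> 138.9 GB/s
```

At the top end, those rates exceed any single fabric link of the era, which is why the discussion turns to partitioning the work across servers.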
Toby Harman
Ten Centuries (1.2K reputation)
Group: General Forum Members
Points: 1181 Visits: 670
> We want to scan the *complete data in less than 1 hour* - which is possible with our direct attached storage now

I question this, but will bow to your experience. Ultimately *everything* that can be achieved with direct-attached storage can be achieved with a SAN, and more. A SAN is more expensive than direct-attached storage, but significantly more flexible and capable.

> In the future we will have 10 to 100 times more data, so we would need a network for 10 GBytes per second to 100 GBytes per second and we would need 20 to 200 quadcore CPUs for that.

At this point my recommendation becomes really simple. If you are going to buy that much hardware, get the pre-sales storage architects from HP / IBM / EMC into your office and explain that you are about to spend a LOT of money with them. When they have finished drooling over you, sit them down and explain the problem. Get them to provide a solution with a written performance guarantee.

> Could you give me an example how this can be done? Even if the SAN can do this, we would need a cluster of 10 to 100 servers for the needed number of CPUs.

Not without spending a lot more time on this, and I would still recommend getting one of the SAN vendors in.
Jeff Moden
SSC Guru (210K reputation)
Group: General Forum Members
Points: 210727 Visits: 41973
mlbauer (12/23/2010):
> All queries will take twice the time for twice the data.

Not really true; "It Depends" prevails a whole lot in that area. Partitioned tables can really help there, as can properly written code, proper indexing, a solid database design, and effective, regular maintenance of the database.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think it's expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs