scalable hardware solution for 10TB now to 100TB in 3 years

Question

mlbauer

Old Hand

Points: 348
December 20, 2010 at 2:03 pm

#228986

Hello experts,
we are looking for a server solution for at least the next three years. We collected 6 TB in the last 3 years. We are using two direct attached disk boxes with 25 disks each, connected to a dual-socket quad-core server. We currently use RAID 5 due to space requirements. Storage and performance needs are growing, with an estimated target capacity of 50 TB to 100 TB in the next 3 years. Any query should take the same or less time than now, using the complete history of collected data.
We are skeptical about a SAN solution using a NetApp filer, since we don't believe that a SAN solution is able to deliver the same performance as direct attached storage. We also fear that a NetApp SAN system could be 5 to 10 times more expensive. What's your opinion and experience with SAN performance? How well do SANs scale with respect to performance and capacity? SAN backup is extremely expensive for us due to strange internal political contracts (imho). Expert opinions wanted!

alen teplitsky SSC-Dedicated Points: 30011 · Answer 1

Where I work we're on HP, and you should be able to do this with ProLiant servers. Get a DL 380 with 2-3 P812 RAID controllers. For the storage, get an MSA 60 with the 2TB SATA drives. It holds 12 drives, so it's 24TB raw storage, 22TB after RAID5, and I'm not sure how much after formatting. This is based on one RAID5 per MSA.

Each P812 has 2 external connectors and you can cascade something like 8 MSAs per connector.

Your big headache is going to be backup. For something like this you will need LTO-5 tapes or a D2D solution with a lot of storage. I would make sure that your DB server and backup servers are 10 gigabit and 6G SATA/SAS. A ProLiant DL 380 G5 will be too slow due to its older I/O channel.

Check out the Proliant DL 380 G7. G8 will be out April to June of 2011.

mlbauer Old Hand Points: 348 · Answer 2

Hello,

thank you for your ideas.

A few additional questions: For better IO/s performance we are thinking about using more drives with 1TB (or less) each. So we get 11TB in a 12-drive RAID5. This gives us 8x11=88TB for a single server machine, which is near our capacity targets -- great. Backup will be done with a second identical server+storage system.
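
The capacity arithmetic above can be checked with a short sketch. It assumes decimal terabytes, one RAID 5 group per 12-bay enclosure (one drive's worth of capacity goes to parity) and eight enclosures on one connector. The function name is only illustrative, and formatting overhead is ignored.

```python
def raid5_usable_tb(drives: int, drive_tb: float) -> float:
    """Usable capacity of a single RAID 5 group, before formatting."""
    if drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    # One drive's worth of capacity is consumed by parity.
    return (drives - 1) * drive_tb


per_enclosure_tb = raid5_usable_tb(12, 1.0)  # 12 x 1 TB drives per enclosure
enclosures = 8                               # assumed: 8 cascaded enclosures
total_tb = enclosures * per_enclosure_tb

print(f"per enclosure: {per_enclosure_tb} TB, total: {total_tb} TB")
```

With 2 TB drives the same function gives 22 TB per enclosure, matching the figure in Answer 1.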

Now the next question: Given this machine with 88TB, how can we further scale capacity and performance when this system is no longer enough? That will be about 3 years from now.

alen teplitsky SSC-Dedicated Points: 30011 · Answer 3

They will probably have higher capacity hard drives by then that you can swap into your existing RAID5.

Just make sure you're running Windows 2008 R2.

mlbauer Old Hand Points: 348 · Answer 4

OK, the disks will have more capacity, but speed probably won't be enough. All queries will take twice the time for twice the data. Everybody says the number of drives or spindles limits the IO/s. We need a solution that can grow with respect to capacity AND PERFORMANCE. We would also need twice the CPU performance, so a new server will be necessary. It may be necessary to store 500TB in less than 3 years. How could we do that? One server won't have enough performance.

Toby Harman SSCarpal Tunnel Points: 4177 · Answer 5

1. If you want manageable, extensible storage then this is what SANs are for. The one you must avoid like the plague is NAS, as this will force all your I/O through the network cards.

2. If you want the data to be accessible and writable at the same speed then you need to start considering a different architecture, as RAID 5 slows down for writes when you add more spindles.

Suggestion:

Since you are posting on an SQL based forum I'm going to assume that there is an SQL database over the top of this.

Once you have separated log files onto different physical spindles, you may want to consider using a mix of RAID 5 and RAID 1+0. If you can define a Primary Key which can separate the data based on its age, then you can use the Sliding Window Partition technique (see the SQL Server Central article for explanations) to place historical data onto filegroups that are on RAID 5, and current data onto a filegroup on the RAID 1+0 array. When the time comes to move a filegroup from Current to Historical, you can shut down SQL Server and relocate the file in a scheduled maintenance window.
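
The placement rule behind this suggestion can be sketched in a few lines. This is only an illustration of the age-based split, not SQL Server code: the filegroup names, the monthly partition granularity and the three-month "current" window are all assumptions.

```python
from datetime import date

# Hypothetical filegroup names, one per storage tier.
CURRENT_FG = "FG_CURRENT_RAID10"
HISTORY_FG = "FG_HISTORY_RAID5"


def filegroup_for(partition_month: date, today: date, hot_months: int = 3) -> str:
    """Pick a filegroup for a monthly partition based on its age in months."""
    age_months = (today.year - partition_month.year) * 12 + (
        today.month - partition_month.month
    )
    return CURRENT_FG if age_months < hot_months else HISTORY_FG


today = date(2010, 12, 20)
for month in (date(2010, 12, 1), date(2010, 10, 1), date(2010, 9, 1)):
    print(month.isoformat(), "->", filegroup_for(month, today))
```

As the window slides forward each month, the oldest "current" partition becomes a candidate for the move to the RAID 5 filegroup.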

HP / Compaq also make quite a nice solution called the StorageWorks 4000 which is highly extensible. I'm sure that IBM have something and EMC/Clariion will have a solution too.

Steve Jones - SSC Editor SSC Guru Points: 742528 · Answer 6

I tend to agree with Toby. R5 has a write penalty, and to get large storage, you need to move to a SAN, both for IOPS performance and capacity.
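
As a rough illustration of the write penalty: a common rule of thumb counts 4 back-end I/Os per random write on RAID 5 (read data, read parity, write data, write parity) and 2 on RAID 1+0. The sketch below applies that rule. The 50-drive count matches the original poster's two 25-disk boxes, but the 150 IOPS per drive and the 70/30 read/write mix are assumptions, and controller caches can change the picture considerably.

```python
# Rule-of-thumb back-end I/Os per front-end random write.
WRITE_PENALTY = {"raid5": 4, "raid10": 2}


def effective_iops(drives: int, iops_per_drive: float,
                   read_fraction: float, level: str) -> float:
    """Front-end IOPS an array can sustain for a given read/write mix."""
    raw = drives * iops_per_drive
    write_fraction = 1.0 - read_fraction
    # Each read costs 1 back-end I/O, each write costs WRITE_PENALTY[level].
    return raw / (read_fraction + write_fraction * WRITE_PENALTY[level])


for level in ("raid5", "raid10"):
    iops = effective_iops(50, 150, 0.7, level)  # assumed 150 IOPS/drive, 70% reads
    print(f"{level}: about {iops:.0f} IOPS")
```

For a pure sequential read scan the penalty does not apply, which is why the split between current (write-heavy) and historical (read-mostly) data matters.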

mlbauer Old Hand Points: 348 · Answer 7

Thank you for your helpful suggestions. I understand that a SAN like StorageWorks could be a solution. I'm not a SAN expert, but I doubt that such a system would deliver the performance we are looking for. The FC network between SAN storage and servers would be a performance bottleneck. We want to scan the *complete data in less than 1 hour* - which is possible with our direct attached storage now. We have approximately 1 GByte per second now with a server with two quad-core CPUs. In the future we will have 10 to 100 times more data, so we would need a network for 10 GBytes per second to 100 GBytes per second and we would need 20 to 200 quad-core CPUs for that. Could you give me an example how this can be done? Even if the SAN can do this, we would need a cluster of 10 to 100 servers for the needed number of CPUs.
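
The scaling argument can be put into numbers with a small sketch. It assumes decimal units (1 TB = 1000 GB) and a purely sequential full scan with no compression or caching; the function name is hypothetical.

```python
def required_gb_per_s(data_tb: float, hours: float) -> float:
    """Sustained read bandwidth needed to scan data_tb within the given time."""
    return data_tb * 1000 / (hours * 3600)


for data_tb in (10, 50, 100, 500):
    rate = required_gb_per_s(data_tb, 1)
    print(f"{data_tb:>4} TB in 1 hour needs about {rate:.1f} GB/s")
```

By the same arithmetic, 1 GByte per second sustained for one hour covers about 3.6 TB, so the bandwidth requirement grows linearly with the data volume if the one-hour window is kept.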

Toby Harman SSCarpal Tunnel Points: 4177 · Answer 8

We want to scan the *complete data in less than 1 hour* - which is possible with our direct attached storage now

I question this, but will bow to your experience. Ultimately *everything* which can be achieved using local attached storage can be achieved using a SAN, and more. SAN is more expensive than Direct Attached, but is significantly more flexible and capable.

In the future we will have 10 to 100 times more data, so we would need a network for 10 GBytes per second to 100GBytes per second and we would need 20 to 200 quadcore CPUs for that.

At this point my recommendation becomes really simple. If you are going to be getting that much hardware then get the pre-sales storage architects from HP / IBM / EMC into your office and explain that you are about to spend a LOT of money with them. When they have finished drooling over you, sit them down and explain the problem. Get them to provide a solution with a written performance guarantee.

Could you give me an example how this can be done? Even if the SAN can do this, we would need a cluster of 10 to 100 servers for the needed number of CPUs.

Not without spending a lot more time on this, and I would still recommend getting one of the SAN vendors in.

Jeff Moden SSC Guru Points: 1004752 · Answer 9

mlbauer (12/23/2010)
All queries will take twice the time for twice the data.

Not really true, and "It Depends" prevails a whole lot in that area. Partitioned tables can really help there, as can properly written code, proper indexing, a nice solid database design, and effective, regular maintenance of the database.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004752 · Answer 10

mlbauer (12/31/2010)
We want to scan the *complete data in less than 1 hour*

I have to ask... to what end? Why is this necessary and why will it be necessary when you have 50TB of data?

mlbauer Old Hand Points: 348 · Answer 11

Hi,

good question. The short answer is: We know our data will grow. We want to do our current tasks in the future, so we want 10x performance and capacity. It is not clear how fast our data will grow, so we want to be sure to have some additional capacity available.

P.S.

I have opened a small survey about SAN systems here:

http://www.sqlservercentral.com/Forums/Topic1052953-377-1.aspx

Scott Murray-240410 SSCarpal Tunnel Points: 4649 · Answer 12

Also.... will you continue to add data without an archive strategy? Some years of data are likely to no longer be used at some point.

mlbauer Old Hand Points: 348 · Answer 13

hi,

you are right. Old data will be less important at some point in time. An archive will be necessary for historic data. But it would be nice to have it available, maybe with less speed.

We are doing data mining, so a large collection of historic data will help us. We are currently developing and testing different ways of analyzing our data so it will help us to have lots of data available with a performance high enough to do many experiments without having to wait for weeks in every step of development.

At the moment, our hardware guys have a clear tendency towards NetApp hardware, with an estimated cost of about 1 million euros for a 200 TB solution. This is a huge jump in cost compared to our current hardware, and it would be nice to have at least one or two alternative suggestions for the discussion about the best hardware for our purposes. Do any of the SAN manufacturers provide a technical feature that others do not?

P.S. Our options seem to be only HP or NetApp.

Michael Valentine Jones SSC Guru Points: 64815 · Answer 14

You may want to look into SSD storage. For example, this product promises 6 GB/Sec of bandwidth on a 5 TB device.

http://www.fusionio.com/products/iodriveoctal

With 10 units, you would have 60 GB/Sec of IO bandwidth with 50 TB of storage. That would be enough bandwidth to let you read 50 TB in about 14 minutes.
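
The arithmetic behind that estimate, as a quick sketch. It takes the vendor's 6 GB/Sec per unit at face value, assumes bandwidth adds up linearly across units and uses decimal units (1 TB = 1000 GB); the function name is just for illustration.

```python
def scan_minutes(data_tb: float, units: int, gb_per_s_per_unit: float) -> float:
    """Minutes to read data_tb once, assuming bandwidth scales linearly with units."""
    total_gb_per_s = units * gb_per_s_per_unit
    return data_tb * 1000 / total_gb_per_s / 60


print(f"{scan_minutes(50, 10, 6):.1f} minutes")  # 50 TB over 10 units at 6 GB/s each
```

Linear scaling is the optimistic case; any shared bottleneck in the server lowers the aggregate figure.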

Of course you may want to verify that the vendor's product can actually do what they claim. 🙂

There are plenty of other potential bottlenecks when you get into this area: PCI bus speed, memory speed, front-side bus speed, processor speed, etc. I think you will find this a difficult challenge with current hardware.

I would recommend waiting as long as possible to buy the hardware, instead of trying to buy something now that will be good for three years. Performance of hardware per dollar will be much better later, especially for emerging technology like SSD storage.

I would also look into database compression if you are not already using it. If you can get 70% compression that will save a lot of space and IO bandwidth. Use it with partitioned tables to tailor compression for best performance, like compressing anything older than 90 days. Even though it uses more CPU, you save a lot on IO and memory footprint.
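
To put a rough number on the IO saving, here is a small sketch. It reads "70% compression" as a 70% reduction in stored size, ignores the CPU cost of decompression, and uses an assumed 100 TB data set and 10 GB/s of read bandwidth purely as example inputs.

```python
def compressed_tb(raw_tb: float, space_saved: float) -> float:
    """Size on disk after compression, where space_saved is a fraction (0.7 = 70%)."""
    if not 0.0 <= space_saved < 1.0:
        raise ValueError("space_saved must be in [0, 1)")
    return raw_tb * (1.0 - space_saved)


def scan_hours(data_tb: float, gb_per_s: float) -> float:
    """Hours to read data_tb once at a sustained gb_per_s."""
    return data_tb * 1000 / gb_per_s / 3600


raw_tb = 100.0   # assumed data set size
bandwidth = 10.0  # assumed sustained read bandwidth in GB/s
on_disk_tb = compressed_tb(raw_tb, 0.7)

print(f"uncompressed: {scan_hours(raw_tb, bandwidth):.2f} h")
print(f"compressed:   {scan_hours(on_disk_tb, bandwidth):.2f} h ({on_disk_tb:.0f} TB on disk)")
```

Whether the scan actually gets faster depends on whether the CPUs can decompress at the rate the storage delivers.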

Also, I would seriously explore the importance of this to the business. It is easy to demand fantastic performance when you don't understand the cost, but when you start talking millions of dollars people will take a harder look at the value they are getting for that money. Perhaps a solution where they could see all the recent data quickly would be enough. Or you might be able to break the most important data out to a smaller dataset that doesn't require as much time to query.