SQL Man of Mystery

Wes Brown is a PASS chapter leader and SQL Server MVP. He writes for SQLServerCentral and maintains his blog at http://www.sqlserverio.com. Wes is currently serving as a Senior Lead Consultant at Catapult Systems. Previous experience includes Product Manager for SQL LiteSpeed by Quest Software and consultant to Fortune 500 companies. He specializes in high availability, disaster recovery, and very large database performance tuning. He is a frequent speaker at local user groups and SQLSaturdays.

SSD, The Game Changer

I’ve often described SQL Server to people new to databases as a data pump.

Just like a water pump, you have limited capacity to move water in or out of a system, usually measured in gallons per hour.
If you want to upgrade your pumping system it is a twofold process: the physical pump and the size of the pipes.

Our database servers also have several pumps and pipes, and in general you are only as fast as your slowest pump or narrowest pipe: the hard drives.

To feed the other parts of the system we have resorted to adding lots and lots of hard drives to get the desired IO read/writes and MB/sec throughput that a single server can consume.
Everyone is familiar with Moore's Law (often quoted, rarely understood), which loosely applied says CPU transistor counts double roughly every 24 months. Hard disks haven't come close to keeping up with that pace, performance-wise.

Up until recently, hard drive capacity has been growing at almost the same rate, doubling in size around every 18 months (Kryder's Law). The problem isn't size, it's speed.


Let's compare the technology in what may have been some folks' first computer to the cutting edge of today.

                   Circa 1981                   Today                         Improvement
Capacity           10MB                         1470MB                        147x
HDD seek time      85ms/seek                    3.3ms/seek                    20x
IO/sec             11.4 IO/sec                  303 IO/sec                    26x
HDD throughput     5Mbit/sec                    1000Mbit/sec                  200x
CPU speed          8088 4.77MHz (.33 MIPS)      Core i7 965 (18322 MIPS)      5521x

*These are theoretical maximums; in the real world your mileage may vary.

 

I think you can see where this is going. I won't go any further down memory lane; let's just say that some things haven't advanced as fast as others. As capacity has increased, speed has been constrained by the fact that hard disks are just that: spinning disks.

So, what does this little chart have to do with SSD? I wanted you to get a feel for where the real problem lies. It isn't the capacity of hard drives, it's the ability to get to the data quickly. Seeks are the key. SSDs have finally crossed a boundary where they are cheap enough and fast enough to make it into the enterprise space at all levels.

Here is SSD compared to today's best 15K.2 HDD from above.

               HDD               SSD               Improvement
Seek time      3.3ms/seek        85µs/seek         388x
IO/sec         303 IO/sec        35000 IO/sec      115x
Throughput     1000Mbit/sec      25000Mbit/sec     2.5x

 

So, in the last few years SSD has caught up with and passed HDD on the performance front by a large margin. This is comparing a 2.5" HDD to a 2.5" SSD. The gap is even wider if you look at the new generation of SSDs that plug directly into the PCIe bus and bypass the drive cage and RAID controller altogether. HOT DOG! Now we are on track. SSD has allowed us to scale much closer to the CPU than anything storage-wise we have seen in a very long time.

Since this is a fairly new, emerging technology I often see a lot of confused faces when talking about SSD. What is in the technology, and why has it now become cost effective to deploy it instead of large RAID arrays?

Once you take out the spinning disks, the memory and IO controller march much more to the tune of Moore's Law than Kryder's, meaning cost goes down while capacity AND speed go up. Eventually there will be an intersection where some kind of solid state memory, maybe NAND maybe not, will reach parity with spinning hard drives.

But, like hard drives, not all SSDs are on the same playing field; just because it has SSD printed on it doesn't make it a slam dunk to buy.

Let's take a look at two implementations of SSD based on MLC NAND. I know some of you will be asking, why not SLC? I'm doing this to get a better apples-to-apples comparison and to put this, budget-wise, squarely in the realm of possibility.

 

The Intel X25-M, priced at $750.00 for 160GB in a 2.5" SATA form factor, and the Fusion-io IoDrive Duo 640GB model, priced at $9,849.99 on a single PCIe x8 card.

                                    IoDrive Duo      X25-M          Improvement
Capacity                            640GB            160GB          4x
Write bandwidth                     1000MB/sec       70MB/sec       14x
Read bandwidth                      1400MB/sec       250MB/sec      5x
Reads/sec                           126,601          35,000         4x
Writes/sec                          180,530          3,300          55x
Access latency (seek time)          80µs             85µs           ~
Wear leveling (write-erase/day)     5TB              100GB*         10x
Cost per unit                       $9,849.99        $750.00        13x
Cost per GB                         $15.39           $4.60          4x
Cost per IO, reads                  $0.08            $0.02          4x
Cost per IO, writes                 $0.06            $0.22          -4x

* This is an estimate based on this article: http://techreport.com/articles.x/15433. Intel has stated the drive should be good for at least 1 petabyte of write operations, or 10,000 cycles.

 

Both of these drives use similar approaches to achieve their speed and IO numbers. They break up the NAND into multiple channels, like a very small RAID array. This is an oversimplification, but it gives you an idea of how things are changing. It is almost like having a bunch of small drives crammed into a single physical drive shell with its own controller, a mini-array if you will.

So, not all drives are created equal. In Intel's defense, they don't position the X25-M as an enterprise drive; they would push you to their X25-E, which is an SLC-based NAND device and more robust in every way. But keeping things equal is what I am after today.

To get the X25-M to the same performance levels as the IoDrive Duo, it could take as few as 4 drives and as many as 55, depending on which IO number you are trying to match (126,601 ÷ 35,000 ≈ 4 drives to equal the reads/sec, but 180,530 ÷ 3,300 ≈ 55 drives to equal the writes/sec).

Wear leveling is my biggest concern with NAND-based SSDs. We are charting new waters and really won't know what the reliability numbers are until the market has aged another 24 to 36 months. You can measure how much writing you actually do to disk on your current system and get a rough estimate of the longevity of the SSD. Almost all of them are geared for 3 to 5 years of usability until they croak.
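
If you want a rough starting point for that measurement, a query like the sketch below against sys.dm_io_virtual_file_stats (available in SQL Server 2005 and later) shows cumulative write volume per database file since the last service restart; divide by uptime to get a per-day figure to weigh against the wear-leveling numbers above.

-- Rough sketch: cumulative writes per database file since the last restart.
SELECT
    DB_NAME(vfs.database_id)             AS database_name,
    mf.physical_name,
    vfs.num_of_writes,
    vfs.num_of_bytes_written / 1048576.0 AS mb_written
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON mf.database_id = vfs.database_id
   AND mf.file_id = vfs.file_id
ORDER BY vfs.num_of_bytes_written DESC;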

At a minimum it would take 10 X25-M drives to equal the stated longevity of a single IoDrive Duo.

Things also start to level out once you factor in RAID controllers and external enclosures if you are going to overflow the internal bays on the server. That can easily add another $3,000.00 to $5,000.00 to the price. All of a sudden the IoDrive Duo starts looking more appealing by the minute.

 

What does all this mean?

Not all SSD’s are created equal. Being constrained to SATA/SAS bus and drive form factors can also be a real limiting factor. If you break that mold the benefits are dramatic.

Even with Fusion-io’s cost per unit it, is still pretty cost effective in some situations like write heavy OLTP systems, over other solutions out there.

I didn’t even bother to touch on something like Texas Memory System’s RamSan devices at $275000.00 for 512GB of usable space in a 4U rack mount device the cost per GB or IO is just through the roof and hard to justify for 99% of most SQL Server users.

You need to look closely at the numbers, do in-house testing, and make sure you understand your current IO needs before you jump off and buy something like this. It may also be good to look at leveraging SSD in conjunction with your current storage, moving only the data that requires this level of performance, to keep cost down.
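
As a simple sketch of that kind of in-house measurement, the same DMV used above also tracks IO stalls, so you can see which files are actually waiting on disk and are the best candidates to move to SSD.

-- Rough sketch: average read/write latency per database file since the last restart.
SELECT
    DB_NAME(database_id) AS database_name,
    file_id,
    CASE WHEN num_of_reads = 0 THEN 0
         ELSE io_stall_read_ms / num_of_reads END   AS avg_read_latency_ms,
    CASE WHEN num_of_writes = 0 THEN 0
         ELSE io_stall_write_ms / num_of_writes END AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL)
ORDER BY avg_write_latency_ms DESC;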

If this article has shown you anything, it's that technology marches on. In the next 6 to 12 months there will be a few more choices on the market for large SSDs in the 512GB to 2TB range, from different manufacturers at a range of prices, making the choice to move to SSD even easier.

In early April, Microsoft Research published a paper examining SSDs and enterprise workloads. They don't cover SQL Server explicitly, but they do talk about Exchange. The conclusion is pretty much that SSD is too expensive to bother with right now. To agree and disagree with them: that was true several months ago; today, not so much.

The landscape has changed significantly since that paper was published and will continue to do so; I think we are on the verge of "why not use SSD?" instead of "do we really need it?"

With that said, please do your homework before settling on a vendor or SSD solution; it will pay dividends in not having to explain to your boss that the money invested was wasted.

A little light reading for you:

SSD Primer
http://en.wikipedia.org/wiki/Solid-state_drive


James Hamilton’s Blog
http://perspectives.mvdirona.com/2009/04/12/WhereSSDsDontMakeSenseInServerApplications.aspx

 

-Wes

Comments

Posted by Andy Warren on 14 April 2009

Wes, I'm using an SSD in my laptop and it's definitely faster and smoother in some cases, but it definitely came with a premium price tag - about $500 worth in Sep 2008. Given how much we already do caching to avoid disk access, how often do you think this matters on a SQL box once you get reasonable cache fill?

Posted by Wesley Brown on 14 April 2009

Well, that is the real kicker isn't it?

You are absolutely right, the bulk of SQL Server memory is simply allocated as cache to begin with.

With smaller databases, where you can keep a solid amount of data in memory well beyond the 300-second page life expectancy that is the usual recommended minimum, it may not matter as much.
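
If you want to check where a box actually sits, page life expectancy is exposed through the DMVs; a quick sketch (the value is in seconds):

-- Quick check of current page life expectancy, in seconds.
SELECT cntr_value AS page_life_expectancy_sec
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
  AND counter_name LIKE 'Page life expectancy%';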

But for large data warehouses where you can blow right by 32GB of RAM, or in busy OLTP systems where you are write heavy, I think it could be significant; most of the time write latency is a big killer on busy systems. Even if you can keep disk access latency under 10ms it may be too slow. SSD changes that in a huge way.

Plus, it can "fix" performance issues where bad code forces scans to begin with. I hate that it can mitigate some of those issues, since I've seen too many problems that were code based get hardware thrown at them as a fix instead.

I'm working on benchmarking some SSD options now so I'll be able to follow this up with something closer to real world experience in a few weeks.

Posted by Steve Jones on 15 April 2009

This might be very interesting for splitting up systems, moving high-access/high-need data (in filegroups) to SSDs, moving tempdb, staging DBs for ETL, etc., and managing the cost with lower-cost, traditional storage.

Kind of an HSM scenario.
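
Moving tempdb, for example, is just a couple of ALTER DATABASE statements and a service restart. A sketch only: the logical file names below are the defaults and the path is a placeholder.

-- Sketch: repoint tempdb at an SSD volume; takes effect after the next restart.
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'S:\SSD\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'S:\SSD\templog.ldf');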

Thanks, Wes!

Posted by ramesh on 19 April 2009

I checked this with Microsoft for one of our clients.

MS said SSD is not supported on Windows 2003 and SQL Server won't benefit from it. It is currently supported for the Windows 7 OS only. In the future they may release a patch which may work with SSD.

Cheers

Ramesh

Posted by Bruce W Cassidy on 19 April 2009

I'll wait and see on this one.  I think the issue with limited writes is concerning.  But I'm expecting great things from SSDs.

There was an article several years ago now about "bubble memory" and how it would be printable on plastic and non volatile.  So I got all excited then.  Now, I'm still waiting.

But soon, I think.

Posted by Wesley Brown on 20 April 2009

Ramesh,

If you got that kind of blanket statement from someone at Microsoft I would ask to talk to someone higher on the food chain.

Bruce,

I agree, the write issue is the one thing that has really got me wondering what the true available life span is on SSDs.

Posted by Chris on 21 April 2009

I think perhaps you meant 1470GB (not MB) for today's capacity, which is actually a 147,000x improvement over the 10MB of 1981.

Posted by David.Poole on 21 April 2009

With SSD you need to check the write speed carefully.  Read speed will be blindingly fast, but the big discriminator is how fast the disks can process writes.

For laptops I have anecdotal evidence that SSDs benefit battery life!

I've read conflicting things about the life of SSD drives.  The general quote is 1 million writes per block.  In some cases authors have said this means a very short lifespan for a drive, but others say that even with a continuous logger this means a double-digit year life.

Posted by Andrew Peterson on 21 April 2009

It is something to watch.  From an economic standpoint, in the short term, it might be better to just find ways to add more RAM rather than use an SSD.  Long term, who knows. As the price points drop, SAN vendors might use them as large cache buffers, similar to what EMC does (EMC uses volatile memory, so keep the batteries charged).

Posted by Wesley Brown on 21 April 2009

Chris,

I was comparing a top of the line 15k drive to the top of the line MFM drive. You are right, though: capacity at the lower end of the spectrum is huge, since 2TB drives are now coming to market!

David,

Yeah, if you note, write speed even on the Fusion-io card is 2/3 of read, and on the X25-M it is 1/10th! Write life is going to be an ongoing issue, and until we have solid field results it is going to be difficult to predict with 100% accuracy what the life will be.

Andrew, I agree to a point. SQL Server uses RAM as a huge buffer to begin with, but with large databases you can blow right past that. I have worked with databases with hundreds of millions of records in a single table that wouldn't fit in any reasonable amount of RAM. At some point you always end up back at the disk. If you look at the TPC-C benchmarks it isn't unusual to see the very top end basically caching the databases in memory, but we are talking HP Superdomes and the like, where you can put 512GB of RAM or more.

Posted by Russell T. Gould, Ph.D. on 21 April 2009

This fascinating topic does beg the question: what sort of performance change would we see if we only moved paging onto one of these devices?  Seems like a judicious mix could do some impressive things for Pac Man-like memory-chomping tasks: ETL, anyone?

Posted by Steve Jones on 21 April 2009

I've been looking for some links with real world tests, so far not a lot. EVE ONLINE uses some.

Also, a research paper from MS: research.microsoft.com/.../ssd.pdf

MS does support these, and if you heard this about SQL Server, you talked to the wrong person. Maybe not a thumb drive, but they are supported. There are customers running Fusion and Texas Memory disks with SQL Server.

Posted by randall.martin on 21 April 2009

My 2 cents. We have some terabyte-size databases where I'm at now, and even when we throw 32 GB of RAM at those servers and they're running 95% cache hit rates, you still see the RAID drives humming almost constantly. So I could definitely see a place for this technology, if nothing else for tempdb.

Posted by Tony Garcia on 21 April 2009

Great article, much enjoyed.  I am an SSD fan and have been following the various technologies now for about a year.  I have a test server where I implemented an Areca 1231-ML (Intel IOP 341 @ ~800MHz, 2GB cache) and 8x Intel X25-E 32GB SSDs in RAID5.  This server was meant for R&D for future implementation of SQL Server databases on this platform of drives and controllers (although we may use the 1680 series because they are clocked at ~1200MHz and can get a bit more throughput).

Some commentary: the use of X25-M drives here is not really a good comparison to FusionIO.  FusionIO is clearly enterprise-grade equipment.  The M in X25-M stands for mainstream.  I have a 160GB X25-M in my laptop, and this is the environment those drives are made for.

In any case, the other Intel SSD drive family (X25-E) is capable of 200+MB/sec for both read and write.  On the above mentioned configuration with the 8x X25-E (32GB) and Areca 1231ML, I get ~800MB/sec sustained read and write.  HDTune shows ~10K IOPS, but I really need to vet this more thoroughly with IOMeter and/or XBench.

Bottom line, the X25-E drives (also now available in 64GB versions) plus an appropriate controller end up with nearly the same throughput as a FusionIO for a similar price, but most likely due to controller shortfalls, will not see the same IOPS as a FusionIO.

Your database load will dictate whether or not you need the extra IOPS.  A word of caution, as far as I know (and I really like FusionIO, I'd like to get one in for further testing) FusionIO is not internally redundant.  This may lean some applications towards an appropriately fast controller with X25-E drives in RAID 5 or RAID 6.

IMHO, to get ~800MB/sec sustained read and write for about $4K and get a usable 224GB of space with the Intel X25-E/Areca solution is not a bad buy at all.  Another word of caution though, I did have one X25-E go bad after 3 weeks.  So just remember, solid state != 100% reliability, even before the cells begin to wear.  A database is a really, really bad place to not get that warm and fuzzy feeling of disk redundancy.

(One side note, it was quite amazing/fun to watch the SSD array initialize RAID 5 and do a complete format/write of the entire drive array adding parity in just about 2-3 minutes.  I was used to Velociraptors and spindle disks taking 6+ hours, and then some!)

Posted by tony.garcia on 21 April 2009

Sorry, one other note.  The 1231ML is capped at about 800MB/sec.  The 1680 series seems to be capped around 1.2GB/sec.  8x X25-Es even in RAID5 should be capable of at least 1.2GB/sec, as I've found they scale quite linearly with Areca hardware.

The big issue is that SSD speeds are such a quantum leap forward that most controller manufacturers were not prepared for it, so you end up buying 12 port controllers (1231ML) that can be maxed out by 4 SSD drives in RAID0 (800MB/sec).

The future for SSD and databases looks bright!  Spindles are not long from being almost entirely gone from the world of random database IO, but they certainly have a long term place in the world of mass storage.

Posted by Wesley Brown on 21 April 2009

Hey Tony,

I used the X25-M for a reason: it is an MLC-based drive and not an SLC one. The Fusion-io card I reference is also an MLC card. I also understand they are targeted at different segments. I will say the X25-M is still best of breed for MLC-based 2.5" SSDs. OCZ is coming on strong in that space, and they have a 1TB PCIe card in the works as well. This market space is opening up quite nicely.

We are gearing up to implement the Fusion-io cards, and the redundancy issue is well taken. The Duos look like two drives to the host; we are allocating two for each system and doing RAID 10 across both cards. At that point it is as fault tolerant as any other HDD solution, and you have cut out the chassis and RAID controllers, which can both be bottlenecks in the IO chain.

I think removing the controller and going directly on bus is the best way to really open up the performance on the SSD front.

Awesome comments!

Posted by tony.garcia on 21 April 2009

Wesley,

Great feedback, thank you.  I see your point now regarding MLC.

It's worth noting that I consulted with a colleague who is currently using several FusionIO products and they are not RAID redundant (other than using two separate cards to mirror each other, or using the new Duo series and setting up mirroring), but they are still redundant.  Flash inherently has the capability to easily write data, then check (read) the data to make sure it was written properly, and if not, relocate the data.  FusionIO does this, as well as maintains several extra flash chips in the event an entire chip goes bad.  You can read more about their redundancy strategy on their site, but I was mistaken to say they are not redundant.  This is all part of the spindle-to-SSD revolution.  I should also now note that redundancy != RAID :)

In any case, I totally agree -- keeping the storage near the bus is by far the best way for apps and databases that can fit the storage they need within their PCIe slot constraints.

One final note (not to make this purely an SSD article!), but Samsung's SSDNow! drives are also MLC and promise to hit 200+MB/sec for both reads and writes!  They are only out in channel to (mostly) laptop mfgrs at this point, but will be available via retail soon.  The SSD revolution is undoubtedly happening in real time!

Thanks again for the article.  Much enjoyed.

Posted by tony.garcia on 22 April 2009

Minor correction, Kingston's relabeled Intel drives are SSDNow.  Samsung's drives are FlashSSD.

Posted by MudLuck on 22 April 2009

Great article, author.  I have seen this topic surfacing on boards for several years now.  I guess all of us are too poor to make it a reality, but it is nice dreaming for the days when budgets are fat again!  Anyway, I like how you broke out the stats by comparing two SSDs side by side.

I would be interested to see the energy costs, say on a daily/monthly basis, when comparing a SAN versus a comparable amount of SSDs.

I'm also wondering if the caching and controllers (Fibre Channel) found in SANs, which were invented to increase performance, might also be leveraged to get past the SSD write issues others have mentioned.

Also, when I hear people talking about SSD I never hear that they are using them in RAID configurations and what the implications of doing so are.  I'm certain that all of us don't want a disaster recovery cleanup every three years, as was alluded to in the article.  I mean, HDDs as I have known them are hot swappable, mirrored, and up all the time.  You don't think of a "classic" HDD as wearing out.  Most HDDs I have seen in production have outlived their usefulness as we upgraded to a larger, better SAN.

Finally I leave you with a link to a good article on the implications of SSD to DB performance.

"The Magic Triangle of Performance"  

To me this idea is central to the performance gains you speak of that SSD offers, and it points to the fact that

"These three things (CPU, I/O, and memory) must be balanced in order to tune for optimal performance on the database"

As I/O has classically been the bottleneck, a huge jump in I/O performance implies that we will also need to beef up the other sides of the triangle to keep up.

dsnowondb2.blogspot.com/.../intro-to-db2-performance-tuning.html

Anyway, enough of my rants. Again, good article, poster.

Peace Out...

Posted by ramesh on 22 April 2009

Wesley thanks for the update.  

I did check with MS and it took me 4 days to get to a real technical team (high in the food chain).

My client wanted to buy and move to SQL 2008 with HDD.

But without MS confirmation we did not move forward.

If you need the email id of the MS consultant please email me at rameshm@sqlamerica.com and I will be more than happy to send you the email from MS. Due to the client agreement I cannot give the email details or contents in the forum.

Also, I did not find a single article on HDD with SQL Server till now (neither did MS when I asked them).

Cheers

Ramesh

Posted by ramesh on 22 April 2009

Sorry for the typo, it's SSD that I meant.

My take on new technology is that it is good and worth using as long as it is supported by Microsoft in a production environment.

I am sure SSD on a laptop will work with SQL Server and no one is going to stop you. The question is, will MS support SSD with SQL Server in production? My client wanted all the files on SSD, not just tempdb or read-only data files.

Also, if any SSD issues come up, will MS resolve them?

But I am for SSD if MS gives an official statement that SSD is supported for SQL Server 2005/2008 on Windows 2003.

Cheers

Ramesh

Posted by tony.garcia on 24 April 2009

@MudLuck:

You are right, most people don't talk about SSD in RAID configurations.  This is (I think) usually due to cost.

It is critical that any production work on SSD is done in RAID, IMHO.  As I mentioned in one of my above posts, I had an Intel X25-E fail in 3 weeks.  It's *supposed* to be redundant and have ~40GB of total storage, with ~8GB left over to replace bad cells.  Presumably there's a utility to let you expand the usable space by eating into the spare storage, but I haven't seen it.

In any case, after having what I consider to be the current king of SSDs die in just three weeks, I made the decision that you absolutely have to run solid state in RAID.  Regular HDDs also have extra space to account for bad sectors too.  So one could make the argument that an HDD is redundant -- but we all know how that plays out in the real world.

For what it's worth, there are some great enclosure options for SSDs out there.  I use a lot of the Supermicro CSE-M14TB (black version).  They house 4 SAS/SATA 2.5" drives in a single 5.25" drive bay.  So 8 2.5" SSDs will fit in 2x 5.25 bays, and the enclosures (w/ hot swap backplane) only cost $100 ea.  I pulled the fan off the back of the backplanes as soon as I swapped from Velociraptors to SSDs.

And, on a more SQL-related note, I tested a 20GB database restore today with the backup, MDF, and LDF files all on the same RAID5 array.  The restore only took a minute or two, and total disk throughput ranged from 600-1500 MB/sec during the process.
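
For anyone wanting to reproduce that kind of test, it was just a plain restore onto the array; a sketch only, where the database name, logical file names, and paths below are placeholders.

-- Sketch: restore a test database onto the SSD array and watch disk throughput.
RESTORE DATABASE TestDB
FROM DISK = 'E:\SSD\TestDB.bak'
WITH MOVE 'TestDB' TO 'E:\SSD\TestDB.mdf',
     MOVE 'TestDB_log' TO 'E:\SSD\TestDB_log.ldf',
     STATS = 10, REPLACE;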

Did I mention I really love SSD and databases? :)

Posted by Wesley Brown on 26 April 2009

Wow, thanks everyone for the comments!

tony,

Right, Fusion-io doesn't have "RAID" over multiple cards. You can do RAID 1 with a single card or across two cards, since a single card actually exposes two drives at the OS level. Also, you could do RAID 5.

I've done a ton of research into hardware failure, and disk failures in particular. When you look at failure, almost everyone stops at MTBF, but that is one small part of the equation. We have to look at the failure rate of the system as a whole; Mean Time To Total Data Loss (MTTDL) is the final output of crunching all the failure numbers of each component. SSDs' failure rates are better than regular HDDs', but not vastly so; they plug right into the equation just like any other piece of hardware and have the same kind of curve if you exclude wear leveling.

I guess the short answer is use RAID if you need data protection no matter the technology. Even if Fusion-io did Raid 1 on a single card I'd still buy two and mirror/stripe them at the OS level.

MudLuck, you are right on the money. The problems will shift around and you may find yourself CPU bound very quickly. Though, in my personal opinion, I would much rather have that problem than be disk bound.

