<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / Discuss Content Posted by David Poole / Article Discussions / Article Discussions by Author  / Storage - A meeting of minds / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Tue, 21 May 2013 12:56:02 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: Storage - A meeting of minds</title><link>http://www.sqlservercentral.com/Forums/Topic1329902-60-1.aspx</link><description>[quote][b]Jo Pattyn (7/16/2012)[/b][hr]Great article, will reread it twice[/quote]I've read this article more than once.
My experience is also of a lack of dialogue between SAN, DB and application guys.</description><pubDate>Tue, 18 Dec 2012 02:16:02 GMT</pubDate><dc:creator>paul s-306273</dc:creator></item><item><title>RE: Storage - A meeting of minds</title><link>http://www.sqlservercentral.com/Forums/Topic1329902-60-1.aspx</link><description>[quote]BTW, did you mean that solutions like FastTrack work in "symphony" with hardware, not "sympathy" ??
Br, Mark Kromer[/quote]Sorry, a Freudian slip.  Though having seen FastTrack perform, it's worth a fanfare.</description><pubDate>Tue, 24 Jul 2012 10:00:33 GMT</pubDate><dc:creator>David.Poole</dc:creator></item><item><title>RE: Storage - A meeting of minds</title><link>http://www.sqlservercentral.com/Forums/Topic1329902-60-1.aspx</link><description>Wonderful article, David! 
I think of AlwaysOn with read-only secondaries as an implementation of CQRS and was happy to see that pattern mentioned in your story.
BTW, did you mean that solutions like FastTrack work in "symphony" with hardware, not "sympathy" ??
Br, Mark Kromer</description><pubDate>Sun, 22 Jul 2012 18:54:15 GMT</pubDate><dc:creator>mkromer</dc:creator></item><item><title>RE: Storage - A meeting of minds</title><link>http://www.sqlservercentral.com/Forums/Topic1329902-60-1.aspx</link><description>The article is good.  The discussion has been even better.</description><pubDate>Tue, 17 Jul 2012 11:44:28 GMT</pubDate><dc:creator>Bill Kline-270970</dc:creator></item><item><title>RE: Storage - A meeting of minds</title><link>http://www.sqlservercentral.com/Forums/Topic1329902-60-1.aspx</link><description>[quote][b]ryan.offord (7/17/2012)[/b][hr]Having the right design makes the world of difference but no matter what we do if there is an underlying latency you will not remove it by schema design.[/quote]If an organic NF schema reduces data footprint by an order of magnitude, which is not unlikely, then not only are there fewer bytes on the wire, there's less complication in the client code.  Client code could be reduced to simple display.  Doing what your grandfather did, using better hardware to just do what the old software wanted, is the suboptimal choice this time.</description><pubDate>Tue, 17 Jul 2012 06:21:39 GMT</pubDate><dc:creator>RobertYoung</dc:creator></item><item><title>RE: Storage - A meeting of minds</title><link>http://www.sqlservercentral.com/Forums/Topic1329902-60-1.aspx</link><description>I was working on some calculations yesterday around latency and different types of storage. Using some 'typical' figures, SATA would be around 40 milliseconds of latency, SAS in the region of 20 milliseconds and PCIe around 50 [b]microseconds[/b]. You can tweak the speed at which data comes back but you still have that basic latency. 
This is something you have to add to every single request to get data back from the disks. Every touch point in the setup has the potential for latency and also the potential for failure: network card, cable, switch and so on until you finally get to a disk that has to revolve (and you wait your turn with other requests). The beauty of systems like FusionIO is that they remove these failure / latency points, give you a massive performance lift and allow you to have a less complex system overall while reducing costs.
If you had the same schema design, data and queries and did a direct comparison between the three you would see a clear performance winner. The complexity gets reduced significantly, as do the failure points and latency.
Having the right design makes the world of difference but no matter what we do if there is an underlying latency you will not remove it by schema design.</description><pubDate>Tue, 17 Jul 2012 01:31:24 GMT</pubDate><dc:creator>ryan.offord</dc:creator></item><item><title>RE: Storage - A meeting of minds</title><link>http://www.sqlservercentral.com/Forums/Topic1329902-60-1.aspx</link><description>[quote][b]RobertYoung (7/16/2012)[/b][hr]Missing, not for the first time in such essays, is discussion of normal forms,[/quote]Would I be correct in thinking this translates as "design your OLTP database properly and you will get better performance from your hardware"?
I think it would be beneficial to the community as a whole if the more storage savvy amongst you wrote some articles going beyond the very basics.  
I've only just touched a snowflake on the tip of a very large iceberg.</description><pubDate>Mon, 16 Jul 2012 16:30:28 GMT</pubDate><dc:creator>David.Poole</dc:creator></item><item><title>RE: Storage - A meeting of minds</title><link>http://www.sqlservercentral.com/Forums/Topic1329902-60-1.aspx</link><description>Excellent article, and I cannot overemphasize along with you the importance of the DBA working as part of a team both with the application developers and the administrators that provide the lower level infrastructure.</description><pubDate>Mon, 16 Jul 2012 14:27:42 GMT</pubDate><dc:creator>timothyawiseman</dc:creator></item><item><title>RE: Storage - A meeting of minds</title><link>http://www.sqlservercentral.com/Forums/Topic1329902-60-1.aspx</link><description>A good introduction to some aspects of storage, marred by a Fusion-IO/PCIe SSD specific perspective, by generalizations without supporting evidence, and by several serious omissions: there is no discussion of RAID levels in modern systems; none of OS presented mount points vs. logical drive vs. LUN vs. raidset/virtual drive vs. spindle; none of the very critical dedicated vs. shared spindle approach; and hot spares are ignored.  Additionally, shared SAN backbone limitations didn't make an appearance.
Note that on the storage front, there are modern 2U, 4U, and tower servers that support in excess of 20 to 30 local spindles each (2.5", of course), with a mix of 15k RPM, 10k RPM, 7.2k RPM, and SSD disks.  These provide us with new options for high IOPS/throughput capable SQL Servers, in addition to the PCIe SSD front.
Note that with SAS and SATA SSDs, either local or on the SAN, you have the option of all the normal RAID levels - 1, 10, 5, 50, 6, 60, etc.  With PCIe SSDs, the last I heard for both OCZ and Fusion-IO SSDs was that you were limited to software RAID at this time.  
It's generally held that software RAID is inferior to hardware RAID; that may or may not be true with the most modern server operating systems.  I haven't bothered to try software RAID; I stick with hardware RAID on caching controller cards, as do the storage professionals I work with.
Unsupported generalization: "... not a commodity piece of hardware... However 128 GB RAM for the SAN would cost a £six figure sum!"
Reference for EMC Clariion systems: [url=http://www.pinncomp.com/pdf/technical/compellent/emc_product_analysis_cx4.pdf]http://www.pinncomp.com/pdf/technical/compellent/emc_product_analysis_cx4.pdf[/url], which lists DDR2 DIMMs as RAM, which is commodity hardware, even in ECC variants (it's what we use in servers as well), and I've bought hundreds of gigabytes at a time for far, far less than six figures USD (and used it in SQL servers).  Unless references for a third party replacement for SAN memory (i.e. without as much price gouging as the vendors may put in their replacement part MSRP) are provided, I don't believe this is true in 2012.
RAID levels: Conventional wisdom is that RAID 1 and 10 are better for writes (i.e. one log file per RAID set), and RAID 5 is good for reads (less wasted storage).  On modern caching controller and/or SAN hardware from the last couple of years, my benchmarking has shown this to no longer quite be the case; see my results in my post at [url=http://www.sqlservercentral.com/Forums/FindPost1293225.aspx]http://www.sqlservercentral.com/Forums/FindPost1293225.aspx[/url].  On my particular setup, RAID10 appears to have an advantage over RAID5 and RAID50 only on 8KB and 64KB random (not sequential) writes, and was equivalent or worse on other operations.  
Test your own setup carefully, whether SAN or local - many setups have quirks with one or another specific aspect that you should take into consideration when planning what goes where and how it's configured (for instance, a sequential write throughput cap, or a severe performance problem with, say, 64KB random reads).  Note that on some modern SANs, RAID 50 is extremely performant.
Perhaps the most critical oversight in the article (or in my reading of it) was not discussing the path from SQL Server data files down to storage spindles or parts thereof, and the dedicated vs. shared argument.
I.e. (I'm going to skip subdirectory level mount points, but be aware they exist), on your SQL server you see:
Production O:\userdb.mdf
Production V:\userdb.ldf
Development E:\tempdb.mdf and E:\tempdb.ldf
The SAN admin tells you:
O: maps to LUN 5
V: maps to LUN 71
E: maps to LUN 6
Unless you ask further, you may not hear that:
LUN 5 maps to RAIDset 12
LUN 71 maps to RAIDset 13
LUN 6 maps to RAIDset 13
Corporate file share \\server\MainShare maps to RAIDset 12
Then, you may still have to ask to find out:
RAIDset 12 is a 14 disk RAID5 (1x13+1)
RAIDset 13 is an 8 disk RAID50 (2x3+1)
Oddly enough, when they're backing up the corporate file share, all sequential activity slows down, and reads on the production userdb are particularly slow.  Why?  Because A) the SAN has limited total fiber channel bandwidth, and the backups are using up a lot of the total throughput available, and B) the corporate file share is hitting the same spindles that userdb.mdf is on with a mix of random and sequential access.
Even worse, when the development machine is doing a lot of hard tempdb activity, writes on the production userdb are slow.  Why?  Because the development tempdb LUN is on the same spindles as production userdb.ldf.
There's also a large difference between dedicated spindles (i.e. 
8 disk RAID5 for userdb.mdf, 2 disk RAID1 for userdb.ldf, 2 disk RAID1 for tempdb.mdf, 2 disk RAID1 for tempdb.ldf, 2 disk RAID1 for the OS and programs, 3 disk RAID5 for the file share, 2 disk RAID1 for system DBs, 2 disk RAID1 for system log files, and 1 global hot spare) vs. shared spindles (i.e. 23 disk RAID5 for everything, and 1 global hot spare).  With dedicated spindles, you can have high tempdb, OS, file share, and user log file activity all at once, and each will proceed almost as quickly (random or sequential) as it would if it were the only activity.  With shared spindles, the maximum speed for any one activity will be much higher, and the "average" will seem much better on paper... but on the first day of the new fiscal year, when all kinds of activity happens at once, don't be surprised if everything slows down quite a lot.
Shared spindles are basically large sets of spindles set up in a "storage pool", and everyone shares it.  It's very simple, allows you to use fewer spindles overall (you're not really counting IOPS anymore), is easy to manage, and when only one or two things happen at a time, it performs very well indeed.  When many, many things happen at once, it thrashes itself to death (more so if too many spindles were traded away in the search for cost savings) trying to deliver too many random IOPS.  Some SAN admins really, really push it, because it does utilize the storage most efficiently.  However, it means Johnny playing with his MP3 library on the file share (Bad Johnny!) can cause the production SQL Server to slow down.  Shared spindles are all about averages, and not about concurrent peaks (contrary to storage admin whitepapers, peak usage is not random, nor is it based on a normal curve; it's based on business requirements, like reporting and commission periods).
Dedicated spindles are about being able to predict performance and guaranteeing minimum performance levels (call them... SLAs).
Here's a Brent Ozar article on dedicated vs. 
shared: [url=http://www.brentozar.com/archive/2008/08/sql-server-on-a-san-dedicated-or-shared-drives/]http://www.brentozar.com/archive/2008/08/sql-server-on-a-san-dedicated-or-shared-drives/[/url].
Shared SAN backbone limitations are also important.  If you have, say, an 8Gbps Active/Passive FC setup to your SAN, you aren't going to get more than 8Gbps of throughput.  This may sound great - it's higher than 6Gbps for modern SAS and SATA drives, so it must be better, right?  Well, remember, if the SAN itself is also 8Gbps Active/Passive, then _it_ can only provide 8Gbps total... to your production box, plus your development box, plus the data warehouse, plus the tape backup, plus the corporate file share, plus... and so on.  If you have several 6Gbps drives locally, _each_ gets 6Gbps; I've seen a local 6 disk SATA SSD setup in RAID5 deliver 1.4GB/s (i.e. ~14Gbps, or an 8Gbps Active/Active bandwidth aggregating FC's maximum)... on 64KB random reads, and 64KB and larger sequential reads (apparently that was a bandwidth limitation on the controller).  Further, each box is using its own throughput, not sharing it.
Note that SANs can be very effectively supplemented by putting, say, tempdb data and log files on local SSDs, either SATA/SAS or PCIe; this not only allows tempdb to respond faster than the SAN could at peak, but it keeps tempdb transfers off the SAN, allowing everything else on the SAN to use the throughput and IOPS that are now going to local storage... and since you don't back up tempdb, there's no need to change backup strategies.  
Most warm and cold DR capabilities are also unaffected by this.</description><pubDate>Mon, 16 Jul 2012 10:15:01 GMT</pubDate><dc:creator>Nadrek</dc:creator></item><item><title>RE: Storage - A meeting of minds</title><link>http://www.sqlservercentral.com/Forums/Topic1329902-60-1.aspx</link><description>Thanks for this useful contribution.</description><pubDate>Mon, 16 Jul 2012 10:14:13 GMT</pubDate><dc:creator>Basit Farooq</dc:creator></item><item><title>RE: Storage - A meeting of minds</title><link>http://www.sqlservercentral.com/Forums/Topic1329902-60-1.aspx</link><description>[quote][b]RobertYoung (7/16/2012)[/b][hr][quote][b]rmechaber (7/16/2012)[/b][hr]You state:[quote]In addition there are more sectors in the outside tracks than there are in the innter [i][sic][/i] tracks.  [/quote]My understanding of disk sectors has always been that the number of sectors per track is [b]constant[/b] for a given disk, and that each sector stores the same amount of data as any other sector.  Anyone confirm this?
Rich[/quote]That was true about a decade ago, or perhaps longer.  For very many years, HDD have had variable geometry, with more sectors on outer tracks, since there's more there, there.  http://en.wikipedia.org/wiki/Zone_bit_recording[/quote]Ah, thank you -- it's been that long or more since I've looked into disk storage geometry.  The 'net has a memory: I found several authoritative-"looking" pages via Google supporting my (older) knowledge in a way that sounded current.  
Hence my request for some confirmation/elaboration.
Without additional sectors on outer tracks, the concept of short-stroking makes no sense, so I knew something was off.
Thanks again,
Rich</description><pubDate>Mon, 16 Jul 2012 07:59:13 GMT</pubDate><dc:creator>rmechaber</dc:creator></item><item><title>RE: Storage - A meeting of minds</title><link>http://www.sqlservercentral.com/Forums/Topic1329902-60-1.aspx</link><description>[quote][b]rmechaber (7/16/2012)[/b][hr]You state:[quote]In addition there are more sectors in the outside tracks than there are in the innter [i][sic][/i] tracks.  [/quote]My understanding of disk sectors has always been that the number of sectors per track is [b]constant[/b] for a given disk, and that each sector stores the same amount of data as any other sector.  Anyone confirm this?
Rich[/quote]That was true about a decade ago, or perhaps longer.  For very many years, HDD have had variable geometry, with more sectors on outer tracks, since there's more there, there.  http://en.wikipedia.org/wiki/Zone_bit_recording</description><pubDate>Mon, 16 Jul 2012 07:52:52 GMT</pubDate><dc:creator>RobertYoung</dc:creator></item><item><title>RE: Storage - A meeting of minds</title><link>http://www.sqlservercentral.com/Forums/Topic1329902-60-1.aspx</link><description>Correct me if I'm wrong, but wouldn't 40 disks in a RAID 1 give you 1/2 the capacity you stated?  It's a mirror, so your array size is still only 6 TB, not 12.  This works out to about 5.5 TB of useable space.  More to the point, aren't we really talking about RAID 0+1 here?</description><pubDate>Mon, 16 Jul 2012 07:34:40 GMT</pubDate><dc:creator>Scott D. Jacobson</dc:creator></item><item><title>RE: Storage - A meeting of minds</title><link>http://www.sqlservercentral.com/Forums/Topic1329902-60-1.aspx</link><description>You state:[quote]In addition there are more sectors in the outside tracks than there are in the innter [i][sic][/i] tracks.  
[/quote]My understanding of disk sectors has always been that the number of sectors per track is [b]constant[/b] for a given disk, and that each sector stores the same amount of data as any other sector.  Because the sectors on the outer tracks cover a bigger surface area on the physical platter, the storage density for those outer tracks is correspondingly lower.  The included angle subtended by any sector is the same, which allows the head to read the same amount of data per partial rotation, no matter where on the disk it's reading from.
Anyone confirm this?
Rich</description><pubDate>Mon, 16 Jul 2012 07:29:35 GMT</pubDate><dc:creator>rmechaber</dc:creator></item><item><title>RE: Storage - A meeting of minds</title><link>http://www.sqlservercentral.com/Forums/Topic1329902-60-1.aspx</link><description>Missing, not for the first time in such essays, is discussion of normal forms, particularly for the operational data.  If one moves to SSD, the response time factor changes significantly, even compared to short-stroking.  But doing so with the typical flat-file datastore is cost prohibitive.  In order to get maximum user data back and forth with available IOPS, one needs a high NF datastore, which also happens to have the minimum footprint on storage.
Coders just love to refactor code, but they (all too often in control of database schemas) refuse to refactor data.  Since their schemas start life as byte dumps manipulated by their wondrous code (just like their granddaddies' COBOL/VSAM apps), refactoring data means re-writing code; well, mostly discarding lots of code.  The lifetime employment assurance disappears.
IOW, the problem isn't technical, but spiritual.  Much the same thing happened when the 360 appeared with DASD.  Rather than code to Direct Access, coders continued to do what was comfortable, code to Sequential Batch.  
Who said there's something new under the sun?</description><pubDate>Mon, 16 Jul 2012 05:58:30 GMT</pubDate><dc:creator>RobertYoung</dc:creator></item><item><title>RE: Storage - A meeting of minds</title><link>http://www.sqlservercentral.com/Forums/Topic1329902-60-1.aspx</link><description>Really good article Dave and something we've spoken about in depth a lot over the last few years.
Something worth pointing out on the cost front: PCIe storage is the cheaper option compared to a SAN if doing a new build (where you have to factor in the SAN cost). I've just recently looked at the costs for a new SAN setup with 1.2TB as the basic requirement of storage capacity. On the SAN side I've gone for 4x600GB disks (RAID 10) and a dozen servers. I've looked at dedicated spindles per DB server as always (I would have preferred 300GB disks but it's the better balance for the example I was working on). The SAN is a Clarion VNX5.
For PCIe I went with the FusionIO ioDrive2 Mono MLC card as an OEM bit of kit shipped from Dell. The rest of the server spec is identical to the above. It's more than fast enough but there are much faster cards.
The cost of PCIe is well under [b]half[/b] the SAN based costs and will deliver a lot more performance (I upped the RAM spec on both sides as I found the requirements you mentioned from FusionIO, and it's not that much more expensive to allow for at this stage).
On the lifespan front you can use the program / erase cycles to calculate the theoretical lifespan.
The lowest number you normally see quoted is 10,000 p/e cycles. 
Using that value we can calculate (simplified theoretical version):
A 1.2TB drive = 1,318,554,959,872 bytes
1,318,554,959,872 bytes * 10,000 p/e cycles = 13,185,549,598,720,000 bytes that can be written
500GB written per day = 536,870,912,000 bytes (for me this is pretty close as TempDB takes a hammering in our estate)
13,185,549,598,720,000 bytes / 536,870,912,000 bytes = 24,560 days of writing at 500GB per day
24,560 days = 67 years or 589,440 hours (admittedly lower than half of SATA or SAS, but when you up the capacity to 2.4TB with the same write rate it almost matches the usual MTBF rates on mechanical storage [my preferred way of describing SAN storage without being offensive]).
It's a bit of an unfair calculation if I'm honest as we are comparing the number of times we can theoretically write to something vs a potential hardware failure rate with mechanical parts. However, since the end result is something being kaput it's probably not too wide of the mark. Adding more component parts increases the probability of failure so that is something else to consider with mechanical storage. If we add spindles for speed we increase the likelihood of something breaking. Oh and once you see a 1TB database restore go from 4 hours to 5 minutes simply with non-mechanical storage it's very hard to get it out of your head.</description><pubDate>Mon, 16 Jul 2012 03:54:22 GMT</pubDate><dc:creator>ryan.offord</dc:creator></item><item><title>RE: Storage - A meeting of minds</title><link>http://www.sqlservercentral.com/Forums/Topic1329902-60-1.aspx</link><description>Hi David,
Wonderful article.
Thanks</description><pubDate>Mon, 16 Jul 2012 03:47:16 GMT</pubDate><dc:creator>Eoin_BI</dc:creator></item><item><title>RE: Storage - A meeting of minds</title><link>http://www.sqlservercentral.com/Forums/Topic1329902-60-1.aspx</link><description>Great article. Thanks for sharing your research. 
:-)</description><pubDate>Mon, 16 Jul 2012 03:07:01 GMT</pubDate><dc:creator>Jeff Stratford</dc:creator></item><item><title>RE: Storage - A meeting of minds</title><link>http://www.sqlservercentral.com/Forums/Topic1329902-60-1.aspx</link><description>Great article, will reread it twice</description><pubDate>Mon, 16 Jul 2012 03:03:21 GMT</pubDate><dc:creator>Jo Pattyn</dc:creator></item><item><title>RE: Storage - A meeting of minds</title><link>http://www.sqlservercentral.com/Forums/Topic1329902-60-1.aspx</link><description>Great article, David.</description><pubDate>Mon, 16 Jul 2012 01:00:09 GMT</pubDate><dc:creator>ALZDBA</dc:creator></item><item><title>RE: Storage - A meeting of minds</title><link>http://www.sqlservercentral.com/Forums/Topic1329902-60-1.aspx</link><description>What an excellent article - thanks very much for sharing your thoughts.  Some really well-researched material here!</description><pubDate>Mon, 16 Jul 2012 00:56:53 GMT</pubDate><dc:creator>derek.colley</dc:creator></item><item><title>Storage - A meeting of minds</title><link>http://www.sqlservercentral.com/Forums/Topic1329902-60-1.aspx</link><description>Comments posted to this topic are about the item [B]&lt;A HREF="/articles/Storage/90974/"&gt;Storage - A meeting of minds&lt;/A&gt;[/B]</description><pubDate>Sun, 15 Jul 2012 15:24:30 GMT</pubDate><dc:creator>David.Poole</dc:creator></item></channel></rss>