
Storage - A meeting of minds
Posted Monday, July 16, 2012 7:52 AM


SSC Journeyman


Group: General Forum Members
Last Login: Thursday, September 11, 2014 12:01 PM
Points: 76, Visits: 232
rmechaber (7/16/2012)
You state:
In addition there are more sectors in the outside tracks than there are in the innter [sic] tracks.


My understanding of disk sectors has always been that the number of sectors per track is constant for a given disk, and that each sector stores the same amount of data as any other sector.

Anyone confirm this?

Rich


That was true about a decade ago, or perhaps longer. For very many years, HDDs have had variable geometry, with more sectors on the outer tracks, since there's physically more room out there. http://en.wikipedia.org/wiki/Zone_bit_recording
Post #1330118
Posted Monday, July 16, 2012 7:59 AM


Right there with Babe


Group: General Forum Members
Last Login: Wednesday, November 26, 2014 9:31 AM
Points: 717, Visits: 3,037
RobertYoung (7/16/2012)
rmechaber (7/16/2012)
You state:
In addition there are more sectors in the outside tracks than there are in the innter [sic] tracks.


My understanding of disk sectors has always been that the number of sectors per track is constant for a given disk, and that each sector stores the same amount of data as any other sector.

Anyone confirm this?

Rich


That was true about a decade ago, or perhaps longer. For very many years, HDDs have had variable geometry, with more sectors on the outer tracks, since there's physically more room out there. http://en.wikipedia.org/wiki/Zone_bit_recording


Ah, thank you -- it's been that long or more since I've looked into disk storage geometry. The 'net has a long memory: I found several authoritative-looking pages via Google that presented my (older) knowledge as if it were current. Hence my request for confirmation/elaboration.

Without additional sectors on the outer tracks, the concept of short-stroking makes no sense, so I knew something was off.

Thanks again,
Rich
Post #1330124
Posted Monday, July 16, 2012 10:14 AM


SSC-Enthusiastic


Group: General Forum Members
Last Login: Saturday, December 13, 2014 4:37 AM
Points: 107, Visits: 728
Thanks for this useful contribution.

Regards,

Basit A. Farooq (MSC Computing, MCITP SQL Server 2005 & 2008, MCDBA SQL Server 2000)

http://basitaalishan.com
Post #1330250
Posted Monday, July 16, 2012 10:15 AM
SSC Eights!


Group: General Forum Members
Last Login: Today @ 8:00 AM
Points: 892, Visits: 2,473
A good introduction to some aspects of storage, but it is marred by a Fusion-IO/PCIe SSD-specific perspective and by generalizations without supporting evidence. More seriously, it lacks any discussion of RAID levels in modern systems, of the chain from OS-presented mount point vs. logical drive vs. LUN vs. RAID set/virtual drive vs. spindle, of the very critical dedicated vs. shared spindle question, and of hot spares. Shared SAN backbone limitations also don't appear to make an appearance.

Note that on the storage front, there are modern 2U, 4U, and tower servers that support 20 to 30 or more local spindles each (2.5", of course), with a mix of 15k RPM, 10k RPM, 7.2k RPM, and SSD disks. These give us new options for high-IOPS, high-throughput SQL Servers, in addition to the PCIe SSD front.

Note that with SAS and SATA SSDs, either local or on the SAN, you have the option of all the normal RAID levels - 1, 10, 5, 50, 6, 60, etc. With PCIe SSDs, the last I heard for both OCZ and Fusion-IO was that you were limited to software RAID at this time. It's generally held that software RAID is inferior to hardware RAID; that may or may not still be true with the most modern server operating systems. I haven't bothered to try software RAID; I stick with hardware RAID on caching controller cards, as do the storage professionals I work with.

Unsupported generalization: "... not a commodity piece of hardware... However 128 GB RAM for the SAN would cost a £six figure sum!"

Reference for EMC CLARiiON systems: http://www.pinncomp.com/pdf/technical/compellent/emc_product_analysis_cx4.pdf, which lists DDR2 DIMMs as the RAM. That is commodity hardware, even in ECC variants (it's what we use in servers as well), and I've bought hundreds of gigabytes at a time for far, far less than six figures USD (and used it in SQL Servers). Unless references are provided for a third-party replacement for SAN memory (i.e. without as much price gouging as the vendors may build into their replacement-part MSRP), I don't believe this is true in 2012.

RAID levels: Conventional wisdom is that RAID 1 and 10 are better for writes (i.e. one log file per RAID set), and RAID 5 is good for reads (less wasted storage). On modern caching controller and/or SAN hardware from the last couple of years, my benchmarking has shown this to no longer quite be the case; see my results in my post at http://www.sqlservercentral.com/Forums/FindPost1293225.aspx. On my particular setup, RAID10 appears to have an advantage over RAID5 and RAID50 only on 8KB and 64KB random (not sequential) writes, and was equivalent or worse on other operations. Test your own setup carefully, whether SAN or local - many setups have quirks in one specific aspect or another that you should take into consideration when planning what goes where and how it's configured (for instance, a sequential write throughput cap, or a severe performance problem with, say, 64KB random reads). Note that on some modern SANs, RAID 50 is extremely performant.
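
Not a substitute for a proper benchmark run against each RAID set, but as a quick sanity check you can also look at the latency SQL Server itself has already observed per file. A minimal T-SQL sketch using the standard DMVs (the figures are cumulative since the instance last started, so treat them as rough averages rather than peaks):

SELECT  DB_NAME(vfs.database_id) AS database_name,
        mf.physical_name,
        vfs.num_of_reads,
        vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
        vfs.num_of_writes,
        vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN    sys.master_files AS mf
        ON  mf.database_id = vfs.database_id
        AND mf.file_id     = vfs.file_id
ORDER BY avg_read_latency_ms DESC;

Files that supposedly sit on different RAID sets but show the same latency profile are a hint that the spindles aren't as dedicated as advertised.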

Perhaps the most critical oversight, in the article or in my reading of it, is that it doesn't discuss the path from SQL Server data files down to the storage spindles (or parts thereof), and the dedicated vs. shared argument.

For example (I'm going to skip subdirectory-level mount points, but be aware they exist), on your SQL Server you see:
Production O:\userdb.mdf
Production V:\userdb.ldf
Development E:\tempdb.mdf and E:\tempdb.ldf

The SAN admin tells you:
O: maps to LUN 5
V: maps to LUN 71
E: maps to LUN 6

Unless you ask further, you may not hear that:
LUN 5 maps to RAIDset 12
LUN 71 maps to RAIDset 13
LUN 6 maps to RAIDset 13
Corporate file share \\server\MainShare maps to RAIDset 12

Then, you may still have to ask to find out:
RAIDset 12 is a 14 disk RAID5 (1x13+1)
RAIDset 13 is an 8 disk RAID50 (2x3+1)

Oddly enough, when they're backing up the corporate file share, all sequential activity slows down, and reads on the production userdb are particularly slow. Why? Because A) the SAN has limited total Fibre Channel bandwidth, and the backups are using up a lot of the total throughput available, and B) the corporate file share is hitting the same spindles that userdb.mdf is on with a mix of random and sequential access.

Even worse, when the development machine is doing a lot of hard tempdb activity, writes on the production userdb are slow. Why? Because the development tempdb LUN is on the same spindles as production userdb.ldf.
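
As a starting point for that conversation, here's a minimal T-SQL sketch that lists which drive letter each database file actually sits on, from the SQL Server side; the drive-letter column is deliberately crude and, as noted above, ignores subdirectory-level mount points. Everything below the drive letter still has to come from the SAN admin.

SELECT  DB_NAME(mf.database_id)   AS database_name,
        mf.name                   AS logical_file_name,
        mf.type_desc              AS file_type,      -- ROWS (data) or LOG
        mf.physical_name          AS physical_path,  -- e.g. O:\userdb.mdf
        LEFT(mf.physical_name, 2) AS drive_letter
FROM    sys.master_files AS mf
ORDER BY drive_letter, database_name;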


There's also a large difference between dedicated spindles (e.g. 8 disk RAID5 for userdb.mdf, 2 disk RAID1 for userdb.ldf, 2 disk RAID1 for tempdb.mdf, 2 disk RAID1 for tempdb.ldf, 2 disk RAID1 for the OS and programs, 3 disk RAID5 for the file share, 2 disk RAID1 for system DBs, 2 disk RAID1 for system log files, and 1 global hot spare) and shared spindles (e.g. 23 disk RAID5 for everything, and 1 global hot spare). With dedicated spindles, you can have high tempdb, OS, file share, and user log file activity all at once, and each will proceed almost as quickly (random or sequential) as it would if it were the only activity. With shared spindles, the maximum speed for any one activity will be much higher, and the "average" will seem much better on paper... but on the first day of the new fiscal year, when all kinds of activity happen at once, don't be surprised if everything slows down quite a lot.

Shared spindles are basically large sets of spindles set up in a "storage pool" that everyone shares. It's very simple, lets you use fewer spindles overall (you're not really counting IOPS anymore), is easy to manage, and when only one or two things happen at a time, it performs very well indeed. When many, many things happen at once, it thrashes itself to death (more so if too many spindles were traded away in the search for cost savings) trying to deliver too many random IOPS. Some SAN admins really, really push it, because it does utilize the storage most efficiently. However, it means Johnny playing with his MP3 library on the file share (Bad Johnny!) can cause the production SQL Server to slow down. Shared spindles are all about averages, not concurrent peaks (contrary to storage admin whitepapers, peak usage is not random, nor does it follow a normal curve; it's driven by business requirements, like reporting and commission periods).

Dedicated spindles are about being able to predict performance and guaranteeing minimum performance levels (call them... SLAs).

Here's a Brent Ozar article on dedicated vs. shared: http://www.brentozar.com/archive/2008/08/sql-server-on-a-san-dedicated-or-shared-drives/.

Shared SAN backbone limitations are also important. If you have, say, an 8Gbps Active/Passive FC setup to your SAN, you aren't going to get more than 8Gbps of throughput. This may sound great - it's higher than 6Gbps for modern SAS and SATA drives, so it must be better, right? Well, remember, if the SAN itself is also 8Gbps Active/Passive, then _it_ can only provide 8Gbps total... to your production box, plus your development box, plus the data warehouse, plus the tape backup, plus the corporate file share, plus... and so on. If you have several 6Gbps drives locally, _each_ gets 6Gbps; I've seen a local 6 disk SATA SSD setup in RAID5 deliver 1.4GB/s (i.e. ~14Gbps, or the maximum of an 8Gbps Active/Active bandwidth-aggregating FC setup)... on 64KB random reads, and 64KB and larger sequential reads (apparently that was a bandwidth limitation on the controller). Further, each box is using its own throughput, not sharing it.

Note that SANs can be very effectively supplemented by putting, say, tempdb data and log files on local SSDs, either SATA/SAS or PCIe; this not only allows tempdb to respond faster than the SAN could at peak, it also keeps tempdb transfers off the SAN, leaving the throughput and IOPS that now go to local storage available to everything else on the SAN... and since you don't back up tempdb, there's no need to change backup strategies. Most warm and cold DR capabilities are also unaffected by this.
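
A purely hypothetical sketch of that tempdb move, assuming a local SSD volume mounted as S:\, an existing S:\tempdb folder the SQL Server service account can write to, and the default logical file names; the new paths only take effect when the instance is next restarted:

ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'S:\tempdb\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'S:\tempdb\templog.ldf');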
Post #1330252
Posted Monday, July 16, 2012 2:27 PM


Right there with Babe


Group: General Forum Members
Last Login: Monday, November 24, 2014 4:29 PM
Points: 752, Visits: 920
Excellent article, and like you I cannot overemphasize the importance of the DBA working as part of a team, both with the application developers and with the administrators who provide the lower-level infrastructure.

---
Timothy A Wiseman
SQL Blog: http://timothyawiseman.wordpress.com/
Post #1330401
Posted Monday, July 16, 2012 4:30 PM
SSCrazy


Group: General Forum Members
Last Login: Today @ 10:08 AM
Points: 2,921, Visits: 1,870
RobertYoung (7/16/2012)
Missing, not for the first time in such essays, is discussion of normal forms,


Would I be correct in thinking this translates as "design your OLTP database properly and you will get better performance from your hardware"?

I think it would be beneficial to the community as a whole if the more storage savvy amongst you wrote some articles going beyond the very basics. I've only just touched a snowflake on the tip of a very large iceberg.


LinkedIn Profile
Newbie on www.simple-talk.com
Post #1330448
Posted Tuesday, July 17, 2012 1:31 AM
SSC Rookie


Group: General Forum Members
Last Login: Tuesday, October 28, 2014 6:57 AM
Points: 27, Visits: 377
I was working on some calculations yesterday around latency and different types of storage. Using some 'typical' figures, SATA comes in at around 40 milliseconds of latency, SAS in the region of 20 milliseconds, and PCIe around 50 microseconds. You can tweak the speed at which data comes back, but you still have that basic latency. It is something you have to add to every single request to get data back from the disks.
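
To put those figures in scale (a purely illustrative calculation, assuming fully serialized I/O with no caching or queue overlap): a workload that needs 10,000 random reads waits roughly 10,000 x 20 ms = 200 seconds on SAS latency alone, versus 10,000 x 50 microseconds = 0.5 seconds on PCIe flash, before any transfer time is counted.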

Every touch point in the setup has the potential for latency and also the potential for failure: network card, cable, switch and so on, until you finally get to a disk that has to revolve (and you wait your turn with other requests). The beauty of systems like FusionIO is that they remove these failure/latency points, give you a massive performance lift and allow you to have a less complex system overall while reducing costs.

If you took the same schema design, data and queries and did a direct comparison between the three, you would see a clear performance winner. Complexity is reduced significantly, as are the failure points and latency.

Having the right design makes a world of difference, but no matter what we do, if there is underlying latency you will not remove it by schema design.
Post #1330555
Posted Tuesday, July 17, 2012 6:21 AM


SSC Journeyman


Group: General Forum Members
Last Login: Thursday, September 11, 2014 12:01 PM
Points: 76, Visits: 232
ryan.offord (7/17/2012)

Having the right design makes a world of difference, but no matter what we do, if there is underlying latency you will not remove it by schema design.


If an organic NF schema reduces the data footprint by an order of magnitude, which is not unlikely, then not only are there fewer bytes on the wire, there's also less complication in the client code; client code could be reduced to simple display. Doing what your grandfather did, using better hardware to just do what the old software wanted, is the suboptimal choice this time.
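
A toy illustration of the footprint point, with entirely made-up table and column names: if a 200-byte description is repeated on every one of 100 million order rows, moving it into its own table and referencing it by a 4-byte key saves roughly 196 bytes per row, on the order of 18 GB before indexes, with the corresponding knock-on effect on buffer pool and I/O.

CREATE TABLE dbo.Customer
(
    CustomerID  int IDENTITY(1,1) PRIMARY KEY,
    Description varchar(200) NOT NULL   -- stored once per customer
);

CREATE TABLE dbo.OrderHeader
(
    OrderID     int IDENTITY(1,1) PRIMARY KEY,
    CustomerID  int NOT NULL REFERENCES dbo.Customer (CustomerID),  -- 4 bytes instead of 200
    OrderDate   datetime2 NOT NULL
);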
Post #1330689
Posted Tuesday, July 17, 2012 11:44 AM
SSC Veteran


Group: General Forum Members
Last Login: Wednesday, November 5, 2014 1:36 PM
Points: 235, Visits: 730
The article is good. The discussion has been even better.
Post #1330935
Posted Sunday, July 22, 2012 6:54 PM
Forum Newbie


Group: General Forum Members
Last Login: Tuesday, February 5, 2013 7:28 AM
Points: 3, Visits: 36
Wonderful article, David! I think of AlwaysOn with read-only secondaries as an implementation of CQRS and was happy to see that pattern mentioned in your story.

BTW, did you mean that solutions like FastTrack work in "symphony" with the hardware, not "sympathy"?


Br, Mark Kromer
Post #1333561