Storage - A meeting of minds


RobertYoung
rmechaber (7/16/2012)
You state:
In addition there are more sectors in the outside tracks than there are in the innter [sic] tracks.


My understanding of disk sectors has always been that the number of sectors per track is constant for a given disk, and that each sector stores the same amount of data as any other sector.

Anyone confirm this?

Rich


That was true about a decade ago, or perhaps longer. For many years now, HDDs have had variable geometry, with more sectors on outer tracks, since there's more there, there. http://en.wikipedia.org/wiki/Zone_bit_recording
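
A rough sketch of why zoning gives outer tracks more sectors: at a roughly constant linear bit density, a track's capacity grows with its circumference. The radii, the density figure, the 512-byte sector size, and the function name are illustrative assumptions, not the specifications of any real drive.

```python
import math

SECTOR_BYTES = 512  # assumed sector size for this sketch


def sectors_per_track(radius_mm: float, bytes_per_mm: float) -> int:
    """Whole sectors that fit on one track at a constant linear density."""
    circumference_mm = 2 * math.pi * radius_mm
    return int(circumference_mm * bytes_per_mm // SECTOR_BYTES)


# Hypothetical inner and outer track radii on one platter.
inner = sectors_per_track(radius_mm=20.0, bytes_per_mm=4000.0)
outer = sectors_per_track(radius_mm=45.0, bytes_per_mm=4000.0)
print(f"inner: {inner} sectors, outer: {outer} sectors, ratio: {outer / inner:.2f}")
```

Real drives do not vary the count track by track; they group tracks into zones that share one sector count, so this continuous model only shows the trend.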
Rich Mechaber
RobertYoung (7/16/2012)
rmechaber (7/16/2012)
You state:
In addition there are more sectors in the outside tracks than there are in the innter [sic] tracks.


My understanding of disk sectors has always been that the number of sectors per track is constant for a given disk, and that each sector stores the same amount of data as any other sector.

Anyone confirm this?

Rich


That was true about a decade ago, or perhaps longer. For many years now, HDDs have had variable geometry, with more sectors on outer tracks, since there's more there, there. http://en.wikipedia.org/wiki/Zone_bit_recording


Ah, thank you -- it's been that long or more since I've looked into disk storage geometry. The 'net has a memory: I found several authoritative-"looking" pages via Google supporting my (older) knowledge in a way that sounded current. Hence my request for some confirmation/elaboration.

Without additional sectors on outer tracks, the concept of short-stroking makes no sense, so I knew something was off.

Thanks again,
Rich
Basit Farooq
Thanks for this useful contribution.

Regards,

Basit A. Farooq (MSC Computing, MCITP SQL Server 2005 & 2008, MCDBA SQL Server 2000)

http://basitaalishan.com
Nadrek
A good introduction to some aspects of storage, marred by a Fusion-IO/PCIe SSD specific perspective and by generalizations without supporting evidence. It has the very serious flaw of lacking a discussion of RAID levels in modern systems, and the equally serious flaw of failing to discuss OS presented mount points vs. logical drive vs. LUN vs. raidset/virtual drive vs. spindle, or even the very critical dedicated vs. shared spindle approach. It also ignores hot spares. Additionally, shared SAN backbone limitations didn't make an appearance.

Note that on the storage front, there are modern 2U, 4U, and tower servers that support in excess of 20 to 30 local spindles each (2.5", of course), with a mix of 15k RPM, 10k RPM, 7.2k RPM, and SSD disks. These provide us with new options for high IOPS/throughput capable SQL Servers, in addition to the PCIe SSD front.

Note that with SAS and SATA SSDs, either local or on the SAN, you have the option of all the normal RAID levels - 1, 10, 5, 50, 6, 60, etc. With PCIe SSDs, the last I heard for both OCZ and Fusion-IO was that you were limited to software RAID at this time. It's generally held that software RAID is inferior to hardware RAID; that may or may not be true with the most modern server operating systems. I haven't bothered to try software RAID; I stick with hardware RAID on caching controller cards, as do the storage professionals I work with.

Unsupported generalization: "... not a commodity piece of hardware... However 128 GB RAM for the SAN would cost a £six figure sum!"

Reference for EMC Clariion systems: http://www.pinncomp.com/pdf/technical/compellent/emc_product_analysis_cx4.pdf, which lists DDR2 DIMMs as RAM, which is commodity hardware, even in ECC variants (it's what we use in servers as well), and I've bought hundreds of gigabytes at a time for far, far less than six figures USD (and used it in SQL servers). Unless references for a third party replacement for SAN memory (i.e. without as much price gouging as the vendors may put in their replacement part MSRP) are provided, I don't believe this is true in 2012.

RAID levels: Conventional wisdom is that RAID 1 and 10 are better for writes (i.e. one log file per RAID set), and RAID 5 is good for reads (less wasted storage). On modern caching controller and/or SAN hardware from the last couple of years, my benchmarking has shown this to no longer quite be the case; see my results in my post at http://www.sqlservercentral.com/Forums/FindPost1293225.aspx. On my particular setup, RAID10 appears to have an advantage over RAID5 and RAID50 only on 8KB and 64KB random (not sequential) writes, and was equivalent or worse on other operations. Test your own setup carefully, whether SAN or local - many setups have quirks with one or another specific aspect that you should take into consideration when planning what goes where and how it's configured (for instance, a sequential write throughput cap, or a severe performance problem with, say, 64KB random reads). Note that on some modern SANs, RAID 50 is extremely performant.
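
The "less wasted storage" half of that conventional wisdom is plain arithmetic, and a minimal sketch of it follows. The function name and the 8-disk examples are assumptions for illustration; the sketch covers capacity only and says nothing about the performance differences discussed above, which have to be benchmarked.

```python
def usable_disks(level: str, disks: int, groups: int = 1) -> int:
    """Disks' worth of usable capacity for a few common RAID levels.

    `groups` is the number of parity groups striped together (RAID 50/60).
    """
    if level in ("1", "10"):
        return disks // 2            # every disk has a mirror
    if level in ("5", "50"):
        return disks - groups        # one disk of parity per RAID5 group
    if level in ("6", "60"):
        return disks - 2 * groups    # two disks of parity per RAID6 group
    raise ValueError(f"unsupported RAID level: {level}")


for level, disks, groups in [("10", 8, 1), ("5", 8, 1), ("50", 8, 2), ("6", 8, 1)]:
    data = usable_disks(level, disks, groups)
    print(f"RAID{level}: {data} of {disks} disks usable")
```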

Perhaps the most critical oversight in the article (or in my reading of it) was not discussing the path from SQL Server data files down to storage spindles or parts thereof, and the dedicated vs. shared argument.

I.e. (I'm going to skip subdirectory level mount points, but be aware they exist), on your SQL server you see:
Production O:\userdb.mdf
Production V:\userdb.ldf
Development E:\tempdb.mdf and E:\tempdb.ldf

The SAN admin tells you:
O: maps to LUN 5
V: maps to LUN 71
E: maps to LUN 6

Unless you ask further, you may not hear that:
LUN 5 maps to RAIDset 12
LUN 71 maps to RAIDset 13
LUN 6 maps to RAIDset 13
Corporate file share \\server\MainShare maps to RAIDset 12

Then, you may still have to ask to find out:
RAIDset 12 is a 14 disk RAID5 (1x(13+1))
RAIDset 13 is an 8 disk RAID50 (2x(3+1))

Oddly enough, when they're backing up the corporate file share, all sequential activity slows down, and reads on the production userdb are particularly slow. Why? Because A) the SAN has limited total Fibre Channel bandwidth, and the backups are using up a lot of the total throughput available, and B) the corporate file share is hitting the same spindles that userdb.mdf is on with a mix of random and sequential access.

Even worse, when the development machine is doing a lot of hard tempdb activity, writes on the production userdb are slow. Why? Because the development tempdb LUN is on the same spindles as production userdb.ldf.
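
A small sketch that follows the chain from file to drive letter to LUN to RAID set and groups workloads by the spindles they end up on. The mappings are copied from the example above; the dictionary and function names are made up for illustration, and in practice this information would come from the SAN admin rather than from code.

```python
from collections import defaultdict

file_to_drive = {
    "Production userdb.mdf": "O:",
    "Production userdb.ldf": "V:",
    "Development tempdb.mdf/.ldf": "E:",
}
drive_to_lun = {"O:": "LUN 5", "V:": "LUN 71", "E:": "LUN 6"}
lun_to_raidset = {"LUN 5": "RAIDset 12", "LUN 71": "RAIDset 13", "LUN 6": "RAIDset 13"}
# The file share is given as mapping straight to a RAID set in the example.
direct_to_raidset = {r"Corporate file share \\server\MainShare": "RAIDset 12"}


def workloads_by_raidset() -> dict[str, list[str]]:
    """Group every workload by the RAID set (spindles) it finally lands on."""
    shared = defaultdict(list)
    for name, drive in file_to_drive.items():
        shared[lun_to_raidset[drive_to_lun[drive]]].append(name)
    for name, raidset in direct_to_raidset.items():
        shared[raidset].append(name)
    return dict(shared)


for raidset, workloads in sorted(workloads_by_raidset().items()):
    print(raidset, "is shared by:", ", ".join(workloads))
```

Any RAID set that lists more than one workload is a place where one system's activity can slow another down.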


There's also a large difference between dedicated spindles (i.e. 8 disk RAID5 for userdb.mdf, 2 disk RAID1 for userdb.ldf, 2 disk RAID1 for tempdb.mdf, 2 disk RAID1 for tempdb.ldf, 2 disk RAID1 for the OS and programs, 3 disk RAID5 for the file share, 2 disk RAID1 for system DBs, 2 disk RAID1 for system log files, and 1 global hot spare) vs. shared spindles (i.e. 23 disk RAID5 for everything, and 1 global hot spare). With dedicated spindles, you can have high tempdb, OS, file share, and user log file activity all at once, and each will proceed almost as quickly (random or sequential) as it would if it were the only activity. With shared spindles, the maximum speed for any one activity will be much higher, and the "average" will seem much better on paper... but on the first day of the new fiscal year, when all kinds of activity happens at once, don't be surprised if everything slows down quite a lot.
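
To make the trade-off concrete, here is a quick tally of the two layouts just described. The disk counts come from the paragraph above; the variable names and the `data_disks` helper are illustrative assumptions. Both layouts use the same number of physical disks, but the shared pool exposes more usable capacity, which is part of its appeal.

```python
# Dedicated layout from the paragraph above: (purpose, RAID level, disk count).
dedicated = [
    ("userdb.mdf", "RAID5", 8),
    ("userdb.ldf", "RAID1", 2),
    ("tempdb.mdf", "RAID1", 2),
    ("tempdb.ldf", "RAID1", 2),
    ("OS and programs", "RAID1", 2),
    ("file share", "RAID5", 3),
    ("system DBs", "RAID1", 2),
    ("system log files", "RAID1", 2),
]
HOT_SPARES = 1
SHARED_POOL_DISKS = 23  # one RAID5 set for everything


def data_disks(level: str, disks: int) -> int:
    """Disks' worth of data capacity: RAID5 loses one disk, RAID1 loses half."""
    return disks - 1 if level == "RAID5" else disks // 2


dedicated_total = sum(disks for _, _, disks in dedicated) + HOT_SPARES
shared_total = SHARED_POOL_DISKS + HOT_SPARES
dedicated_usable = sum(data_disks(level, disks) for _, level, disks in dedicated)
shared_usable = data_disks("RAID5", SHARED_POOL_DISKS)

print(f"dedicated: {dedicated_total} disks, {dedicated_usable} usable")
print(f"shared:    {shared_total} disks, {shared_usable} usable")
```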

Shared spindles are basically large sets of spindles set up in a "storage pool", and everyone shares it. It's very simple, allows you to use fewer overall spindles (you're not really counting IOPS anymore), is easy to manage, and when only one or two things happen at a time, it performs very well indeed. When many, many things happen at once, it thrashes itself to death (more so if too many spindles were traded away in the search for cost savings) trying to deliver too many random IOPS. Some SAN admins really, really push it, because it does utilize the storage most efficiently. However, it means Johnny playing with his MP3 library on the file share (Bad Johnny!) can cause the production SQL Server to slow down. Shared spindles are all about averages, and not about concurrent peaks (contrary to storage admin whitepapers, peak usage is not random, nor is it based on a normal curve; it's based on business requirements, like reporting and commission periods).

Dedicated spindles are about being able to predict performance and guaranteeing minimum performance levels (call them... SLAs).

Here's a Brent Ozar article on dedicated vs. shared: http://www.brentozar.com/archive/2008/08/sql-server-on-a-san-dedicated-or-shared-drives/.

Shared SAN backbone limitations are also important. If you have, say, an 8Gbps Active/Passive FC setup to your SAN, you aren't going to get more than 8Gbps of throughput. This may sound great - it's higher than 6Gbps for modern SAS and SATA drives, so it must be better, right? Well, remember, if the SAN itself is also 8Gbps Active/Passive, then _it_ can only provide 8Gbps total... to your production box, plus your development box, plus the data warehouse, plus the tape backup, plus the corporate file share, plus... and so on. If you have several 6Gbps drives locally, _each_ gets 6Gbps; I've seen a local 6 disk SATA SSD setup in RAID5 deliver 1.4GB/s (i.e. ~14Gbps, or about the maximum of an 8Gbps Active/Active bandwidth-aggregating FC setup)... on 64KB random reads, and 64KB and larger sequential reads (apparently that was a bandwidth limitation on the controller). Further, each box is using its own throughput, not sharing it.
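
A back-of-the-envelope sketch of that sharing effect. The 0.8 efficiency factor assumes 8b/10b line encoding, the even split across consumers is a deliberate simplification (real contention is bursty), and the function and variable names are made up for illustration; the consumer list and the 1.4GB/s local figure are taken from the paragraph above.

```python
def payload_mb_per_s(line_rate_gbps: float, encoding_efficiency: float = 0.8) -> float:
    """Approximate payload throughput in MB/s for a given line rate.

    The default efficiency of 0.8 models 8b/10b encoding (an assumption).
    """
    return line_rate_gbps * 1000 / 8 * encoding_efficiency


SAN_CONSUMERS = [
    "production box",
    "development box",
    "data warehouse",
    "tape backup",
    "corporate file share",
]
LOCAL_ARRAY_MB_PER_S = 1400  # the local 6 disk SATA SSD RAID5 figure quoted above

san_total = payload_mb_per_s(8)              # one active 8Gbps FC path
fair_share = san_total / len(SAN_CONSUMERS)  # if everyone is busy at once

print(f"SAN path total: {san_total:.0f} MB/s")
print(f"even share across {len(SAN_CONSUMERS)} consumers: {fair_share:.0f} MB/s each")
print(f"local array, not shared: {LOCAL_ARRAY_MB_PER_S} MB/s")
```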

Note that SANs can be very effectively supplemented by putting, say, tempdb data and log files on local SSDs, either SATA/SAS or PCIe; this not only allows tempdb to respond faster than the SAN could at peak, but it keeps tempdb transfers off the SAN, allowing everything else on the SAN to use the throughput and IOPS that are now going to local storage... and since you don't back up tempdb, there's no need to change backup strategies. Most warm and cold DR capabilities are also unaffected by this.
timothyawiseman
Excellent article. Like you, I cannot overemphasize the importance of the DBA working as part of a team, both with the application developers and with the administrators who provide the lower level infrastructure.

---
Timothy A Wiseman
SQL Blog: http://timothyawiseman.wordpress.com/
Dave Poole
RobertYoung (7/16/2012)
Missing, not for the first time in such essays, is discussion of normal forms,


Would I be correct in thinking this translates as "design your OLTP database properly and you will get better performance from your hardware"?

I think it would be beneficial to the community as a whole if the more storage savvy amongst you wrote some articles going beyond the very basics. I've only just touched a snowflake on the tip of a very large iceberg.

LinkedIn Profile
www.simple-talk.com
ryan.offord
I was working on some calculations yesterday around latency and different types of storage. Using some 'typical' figures, SATA would be around 40 milliseconds of latency, SAS in the region of 20 milliseconds, and PCIe around 50 microseconds. You can tweak the speed that data comes back but you still have that basic latency. This is something you have to add to every single request to get data back from the disks.
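
A minimal sketch of what a fixed per-request latency implies, using the 'typical' figures quoted above. It assumes requests are issued strictly one after another with no queuing, caching, or parallelism, which real systems do not do, so treat the numbers as a ceiling for dependent, serial I/O only. The function and constant names are illustrative.

```python
LATENCY_SECONDS = {"SATA": 40e-3, "SAS": 20e-3, "PCIe": 50e-6}


def max_serial_requests_per_second(latency_s: float) -> float:
    """Upper bound on back-to-back requests when each waits out the full latency."""
    return 1.0 / latency_s


def total_wait_seconds(requests: int, latency_s: float) -> float:
    """Latency alone, summed over a chain of dependent requests."""
    return requests * latency_s


for name, latency in LATENCY_SECONDS.items():
    rate = max_serial_requests_per_second(latency)
    wait = total_wait_seconds(10_000, latency)
    print(f"{name}: at most {rate:,.0f} serial requests/s; 10,000 requests wait {wait:g} s")
```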

Every touch point in the setup has the potential for latency and also the potential for failure. Network card, cable, switch and so on until you finally get to a disk that has to revolve (and you wait your turn with other requests). The beauty of systems like FusionIO is that they remove these failure / latency points, give you a massive performance lift and allow you to have a less complex system overall while reducing costs.

If you had the same schema design, data and queries and did a direct comparison between the three you would see a clear performance winner. The complexity is reduced significantly, as are the failure points and latency.

Having the right design makes the world of difference but, no matter what we do, if there is an underlying latency you will not remove it by schema design.
RobertYoung
ryan.offord (7/17/2012)

Having the right design makes the world of difference but no matter what we do if there is an underlying latency you will not remove it by schema design.


If an organic NF schema reduces data footprint by an order of magnitude, not unlikely, then not only are there fewer bytes on the wire, there's less complication in the client code. Client code could be reduced to simple display. Doing what your grandfather did, using better hardware to just do what the old software wanted, is the suboptimal choice this time.
Bill Kline-270970
The article is good. The discussion has been even better.
mkromer
Wonderful article, David! I think of AlwaysOn with read-only secondaries as an implementation of CQRS and was happy to see that pattern mentioned in your story.

BTW, did you mean that solutions like FastTrack work in "symphony" with hardware, not "sympathy"?


Br, Mark Kromer