Should DBA's need to know the physical details of a SAN??

  • John Rowan - you use SQLIO, IOMeter or another I/O testing app to prove that the drives are not serving up sufficient IOPs or MB per sec.

    Let me remind everyone that there is MUCH more to I/O performance than just the SAN configuration and number of spindles of what RAID type. See my post from back on 12/6. It isn't necessarily the SAN or SAN administrator's fault if your partition isn't serving your data needs.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thanks SQLGuru. Proving it was not really the problem. I had no problem comming up with enough data to point my finger at the SAN. I was curious as to how Colin goes about presenting those findings to the powers that be in a way that they will accept your 'SAN bashing'.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Hire me to present the findings for you!! 😀

    Seriously, as a consultant I have been the "bad guy" on a number of occassions and it doesn't bother me. It has been beneficial for the worker-bees in a number of scenarios. And some management seem to 'trust' what a consultant tells them over what their subordinates tell them, rightly or wrongly. I have seen cases where it was valid and the employee didn't have sufficient knowledge or training to provide the correct guidance - but I have seen probably more where the deck-plate guy/girl was saying the same thing I did.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • The issue with any storage and especially SAN storage is having non dedicated spindles for the sql server. It's a tricky situation and the Jim Gray mentioned once that when standard fit disks are 1TB and he asks for 10 spindles for his 500gb database how will he justify his request?

    And this is this real issue, when disks were small dedicated arrays were fine, in fact with DAS this is how it works - it's when the same spindles are shared between several servers, I once had one instance where a prod oltp server shared it's spindles with exchange, two BI / DW servers and a couple of others. Although we had raid 1 for t logs these were also shared between several servers. Now the vendors said this was fine, not unsurprisingly they said the cache negated any problems and they were on fibre blah blah blah. In the meantime management blamed sql server - what a surprise! - It took well over a year and I used disk i/o completion time to prove my point - in this case the sql server had a B2B front end which would time out after 30 seconds so if transactions took too long the connection dropped provoking much complaints. io completion time could hit well over 1 second ( really! ) with almost zero load on the prod server and the timeouts were not dependant upon transactional throughput. It was raid 5 as well of course, isn't it always! The raid sets were quite large, maybe 13 or 15 spindles, but of course for write ops you have to divide by 4 for raid 5. We actually had the san vendors engineers on site monitoring when we had these problems and they presented lots of graphs and charts which "proved" it was all ok. The client now has dedicated raid 10 spindle sets and surprise surpise it flies!!

    Get a map of the physical disk mappings if you can, not always an easy task. How do I present my findings these days? I play the snia card as I rarely find those looking after a SAN at a client site have any certifications or training, having been through the snia training I have questions to ask which usually no-one can answer, or want to answer.

    I don't want to paint anyone into a corner here because there isn't a magic bullit to resolve this - managers often believe things are better than they are because either no-one knows how to monitor or measure performance ( or they don't want to ask for another £25k for the tools which don't come as standard with the SAN ) or everyone is too afraid to tell them their costly investment is rubbish. I often find that raid basics seem to have been forgotten - training is often not provided for those in charge of the SAN storage, so this puts those on the back foot.

    I could actually write a book on the subject - problem for the DBA is that he/she is on the sharp end and usually sql server gets the blame rather than the infrastucture - personally I think the ease of use of microsft products - the belief that anyone can ( and usually does ) install and setup products is part to blame - I've upset people before with this view but I bet half these problems don't exist in Unix/Oracle world.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Wow. I am so learning a lot from this thread. I thought I knew details about our SAN. Until I looked at the chart and realized the RAID configuration they gave us was only for the local server drives (OS files) and that I had no idea if we shared our SAN with anyone else.

    Now I'm going back and asking questions because you guys are correct. It is absolutely essential to understand how the SAN works. Especially considering that I'm looking into performance enhancing techniques like partitioning, separating tables into different files and putting non-clustered indexes on a different file. Non of these work very well if you can't put them on different physical disks.

    Thanks, All, for the informative discussion. I greatly appreciate it.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • ...but of course for write ops you have to divide by 4 for raid 5. ...

    personally I think the ease of use of microsft products - the belief that anyone can ( and usually does ) install and setup products is part to blame - I've upset people before with this view but I bet half these problems don't exist in Unix/Oracle world.

    1000 ms for tlog writes will indeed bring a system to it's knees right quick like!

    For the above comments: if you have a 14 drive RAID5 spindle, isn't write capacity 13/14 (i.e. N-1/N)? you mention dividing by 4.

    I could not agree more with your statement about ease of MS product use being to blame for people (everyone from the deckplate worker bee to the corner office) believing that it has to be MS's fault that their stuff isn't performant. Just had yet another talk on that issue with a corporate CIO yesterday. I see the same bad practices and poor design/development/implementation/maintenance at every single client I go to. Two weeks after I am onboard the client is invariably wowed with what I have done with their systems and the issues I have pointed out.

    On the other side of this coin however is the fact that for MOST companies/products/apps the ease of development/implementation that MS products provide is an INCREDIBLE boon to everyone.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I'm pretty sure that the DBA doesn't need to know the physical architecture behind a SAN. I do, because I'm both the DBA and the storage engineer, but that's a different story.

    The DBA needs disk space from the SAN at a defined performance level. End of story. If the performace that is required to meet the design is delivered, it doesn't really matter how.

    It would be nice if everybody got along in the server room, but it doesn't always end up that way.

  • TheSQLGuru (12/12/2007)


    ...but of course for write ops you have to divide by 4 for raid 5. ...

    personally I think the ease of use of microsft products - the belief that anyone can ( and usually does ) install and setup products is part to blame - I've upset people before with this view but I bet half these problems don't exist in Unix/Oracle world.

    1000 ms for tlog writes will indeed bring a system to it's knees right quick like!

    For the above comments: if you have a 14 drive RAID5 spindle, isn't write capacity 13/14 (i.e. N-1/N)? you mention dividing by 4.

    Correct - I'm thinking the size of the drive is N-1 (a.k.a 13) for RAID-5. Of course your write performance is substantially less, and technically gets LOWER the more spindles you throw in (assuming a fair amount of random writes occur throughout the DB to account for updates to the data), since you need all parts of the stripe to recalculate the parity each time.

    This page has a fairly decent (and from what I can tell, accurate) description of the operations and how the various levels react to them.

    http://www.storagereview.com/guide2000/ref/hdd/perf/raid/concepts/perfReadWrite.html

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I've actually used a SAN for SQL Server, and at first the admin (outsourced) didn't want me to have any say in how it was set up. He set the whole SAN up as a single RAID 5 array, and I had to have all my OLTP data files on the same array as my log files and my OLAP files, etc.

    After several months of arguing about it, management finally told him to work with me to configure it as needed in order to get best performance.

    We ended up splitting it into six sets of disks. One RAID 5 for a file server and a few other things that had nothing to do with me, one small RAID 1 array of fast disks for the most critical OLTP database's data files, another for it's index files, and a third for it's log files, and 2 larger RAID 10 arrays for all the other databases (separating data files and log files).

    All these disks were already in the SAN, it was just a matter of allocating them differently.

    It took a weekend to get it set up and running the way I wanted, and the main OLTP database sped up by over 200% as a result, with the other databases getting between 50% and 100% increases in speed.

    Management was very, very happy with that. They had, at the recommendation of the admin (all hardware administration and network administration was done by his company), been considering a $100k+ investment in server hardware, and now they didn't have to do that. (For a 25-person company, that's a lot of server.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I've not read all of the replies, so this may have already been said.

    To create an effective solution the DBA and the SAN managers should be working side by side to ensure that the optimal solution is presented.

    Specifically, at a previous company our EVA was configured incorrectly. All data/logs/temp etc was on one drive and there were multiple database servers all connected to the same EVA. So when reporting ran, although it was a different LUN, it could affect production performance. Working with the Infrastructure team, we worked out what servers were using what disks and worked out a new strategy. Needless to say, we made huge strides in performance in a lot of areas.

    It can be a thorny subject because people do like to have control over everything. The key point, however, is that a SAN engineer won't know how SQL Server works as well as you and vice versa so you need to work together to share knowledge.

  • I work for a large organisation which uses SAN storage extensively. While the DBAs here also get a 'one size fits all' response from storage, we find that they are willing to discuss if we ask how the SAN is set up (rather than telling them what we want in terms of RAID), and such a discussion may give you the opportunity to open your storage group's minds to the idea that databases deserve more special treatment than ordinary files 🙂

    One thing to watch out for; if you do get a 'one size fits all' response, make sure they track-align any LUNs which they allocate for you to put databases on. It makes a real difference to performance if any of your SQL blocks (64K) end up spanning blocks returned from the SAN.

  • It's a good interesting discussion and I'm pleased with the points raised, also shows me I'm not alone in sufferring san problems. Did I ever mention sata drives wait til you get them in your san and someone claims they work faster than scsi disks! Honest.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Just for my education - what does "track-aligning" a LUN entail (meaning - what is it that's being done?)

    As for SATA - I've only dealth with EMC (which only allows 7500RPM SATA I with no NCQ and the crippled seek caches), so yes - SATA is pretty nasty in that scenario. It's actually an unfair comparison (about the same as comparing IDE to SCSI), and would be curious how well it stacks up when you actually throw non-"crippled" drives into the mix... The newer ones with the right bells and whistles have better stats than even SCSI drives on paper so if they don't perform somewhat close, then there's some other flaw they're not letting in on (like - the controllers/negotiation for example)..

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Yeah - I have heard the "a drive is a drive is a drive" mantra before too. 15K RPM SCSI (or perhaps SAS) drives are the only way to go if optimal performance is required. SATA, even SATA II, should be relegated for archival, backups, and low-grade data throughput need situations, IMHO. You can get some big honkin' volumes with SATA though. Man those drives just keep getting bigger and bigger!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Oh - I have no doubt that SAS and SCSI is better. I just wish someone would actually come out with the "why SATA doesn't stack up against SCSI", because once NCQ started p0pping up, all of the arguments I knew of vanished... You don't have much to point to when the beancounter shows up and says - so WHY do you want these drives that are 5 times as much for 1/3 the space and same speed as these other drives?

    That being said - we did leverage their huge storage for some near-line archiving. NOT the backup drive, but where the backup gets copied to once the backup operation is completed.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 15 posts - 31 through 45 (of 55 total)

You must be logged in to reply to this topic. Login to reply