SAN best practices

  • Thanks for that.

    I think I have enough ammo now. Going to see if I can get a meeting on Monday. The SAN 'experts' won't be there, since they're in London, but hopefully I can get the PM, architect, local storage people and other 'important' people to understand SQL Server's requirements for a SAN.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I wanted to thank everyone for their input and the links. I have been in 2 companies now that have a SAN. In both companies the storage management team has acted in the same manner as described in this discussion. I am starting to believe that the SAN administration training that the storage team goes to includes a "don't let anyone use this device" chapter.

    Q

    Please take a number. Now serving emergency 1,203,894

  • Marios Philippopoulos (5/8/2008)


    I'm also experiencing difficulties convincing our SAN people that RAID 5 is unacceptable, especially for tempdb.

    Also tempdb data files need to be on their own dedicated physical device.

    I guess in some ways the onus is on the DBAs to prove - even after the fact - that performance bottlenecks are caused by the infrastructure and not the database server, although I am not sure how that could be done.

    Anyone know what metrics could point to SAN-related bottlenecks as opposed to SQL-related ones?

    I think that would be very helpful.

    PerfMon counters: Avg. Disk sec/Read and Avg. Disk sec/Write. Also take a look at sys.dm_io_virtual_file_stats.

    Oh, and the SAN vendor should provide a tool that can examine various IO path performance metrics.
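
    A query along these lines pulls per-file latencies out of that DMV. This is a sketch only: sys.dm_io_virtual_file_stats and sys.master_files are standard from SQL Server 2005 onwards, but verify the column names against your build, and remember the figures are cumulative since the instance last started.

    ```sql
    -- Average read/write latency per database file,
    -- cumulative since the instance started.
    SELECT  DB_NAME(vfs.database_id) AS database_name,
            mf.physical_name,
            vfs.num_of_reads,
            vfs.num_of_writes,
            vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
            vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
    FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN    sys.master_files AS mf
            ON  mf.database_id = vfs.database_id
            AND mf.file_id     = vfs.file_id
    ORDER BY avg_write_ms DESC;
    ```

    If these averages look bad while the SAN vendor's own tool shows the spindles idle, the contention is likely somewhere in the IO path rather than on the disks themselves.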

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (5/9/2008)


    Marios Philippopoulos (5/8/2008)


    I'm also experiencing difficulties convincing our SAN people that RAID 5 is unacceptable, especially for tempdb.

    Also tempdb data files need to be on their own dedicated physical device.

    I guess in some ways the onus is on the DBAs to prove - even after the fact - that performance bottlenecks are caused by the infrastructure and not the database server, although I am not sure how that could be done.

    Anyone know what metrics could point to SAN-related bottlenecks as opposed to SQL-related ones?

    I think that would be very helpful.

    PerfMon counters: Avg. Disk sec/Read and Avg. Disk sec/Write. Also take a look at sys.dm_io_virtual_file_stats.

    Oh, and the SAN vendor should provide a tool that can examine various IO path performance metrics.

    Any ballpark values signifying trouble?

    My problem would be not knowing how to interpret the results.

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables
    Persisting SQL Server Index-Usage Statistics with MERGE
    Turbocharge Your Database Maintenance With Service Broker: Part 2

  • GilaMonster (5/8/2008)


    Raid type is not dictated by anyone other than the storage team. In [head office] I would be surprised if the DBAs could tell you which system was on what raid type .... The application team shouldn't be requesting dedicated drives. This defeats the entire purpose of a SAN. The performance will be horrible.

    Wow, I really feel your pain - sounds like bloat on the "storage team" (however many that is) and a non-customer-service attitude. The last place I worked that dealt with SANs directly had one guy who managed the several SANs, because the things have such easy-to-use GUIs that it only takes one knowledgeable person to take care of multiple units. He had a customer-service-oriented attitude towards providing the kind of storage different servers needed. Sounds like you've got the exact opposite.

    All I can suggest is that you write up an SLA that pretty much demands dedicated-LUN performance without specifying it exactly, and then fill their trouble-ticket queue every 30 minutes when their general-pool approach doesn't cut it.

    Remember, they can carve out dedicated space and move your data around inside the SAN in seconds with the click of a mouse, so switching over to dedicated space is just a matter of convincing them to do it and being offline for a minute. Don't let them con you into thinking it's a huge, drawn-out change. Step 1 is to get on the SAN in the first place; step 2 is to get it the way you like it.

    I wouldn't waste any effort complaining about RAID level on any decent SAN - your own LUN is by far the more important thing. We had a data warehouse on its own LUN at RAID 5, and the performance monitor showed the HBA almost always maxed out its bandwidth anyway - a different RAID level wouldn't have helped. Oh, and how many storage processors the SAN has versus how many clients is another important thing to be aware of.

    I thought the entire purpose of a SAN was to *consolidate* storage not *pool* storage. There's a big difference. If that comment about performance being horrible with your own LUN is from someone on the "storage team" then you're in trouble - get attached storage instead.

  • 1-2ms for log writes. Anything > 5ms is probably trouble.

    8-10ms for various and sundry writes/reads. Anything >20ms for extended periods is probably trouble.
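
    Those ballparks can be wired straight into the file-stats DMV. The sketch below uses SQL Server 2005/2008 column names and the rough cutoffs above (5ms for log writes, 20ms for data reads/writes) - tune the thresholds to taste, they are not gospel:

    ```sql
    -- Flag files whose cumulative average latency exceeds the rough cutoffs.
    SELECT  DB_NAME(vfs.database_id) AS database_name,
            mf.type_desc,            -- LOG or ROWS
            mf.physical_name,
            vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms,
            vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms
    FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN    sys.master_files AS mf
            ON  mf.database_id = vfs.database_id
            AND mf.file_id     = vfs.file_id
    WHERE   (mf.type_desc = 'LOG'  AND vfs.io_stall_write_ms > 5  * vfs.num_of_writes)
         OR (mf.type_desc = 'ROWS' AND (vfs.io_stall_write_ms > 20 * vfs.num_of_writes
                                     OR vfs.io_stall_read_ms  > 20 * vfs.num_of_reads));
    ```

    The multiply-instead-of-divide trick in the WHERE clause is just to avoid division by zero on files that have never been touched.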

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

    The main problem with SAN performance is that the monitoring tools are very expensive and most clients don't buy them, so you're on a hiding to nothing to start with! Most PerfMon counters don't work on a SAN, or give very silly results. The ones I use are disk idle time (subtract from 100 for disk utilisation) and the IO completion counters, Avg. Disk sec/Read and Avg. Disk sec/Write. You'd expect to see under 6ms for a read and under 10ms for a write (typically dedicated LUNs would achieve better than this).

    The IO completion time will show if you have shared-LUN contention, as your disk performance will not track your server activity - you'll need to plot other counters to show how busy your server is.

    It's unusual for an HBA to be saturated, especially with multipath and redundant networks - most HBAs are 4Gb today, and in multipath that gives you an 8Gb minimum path, which is a hell of a lot of SQL data!! However, as in all things, there are well-configured SANs and there are others!

    In the main I've found that SAN vendor and manufacturer consultants are "unhelpful" towards the DBA - I have to say I've had great dialogues with Hitachi, but sadly never worked with their storage; and I don't want to name names! Disk Reads/sec and Disk Writes/sec work on a SAN too.

    Disk alignment is important, as is HBA queue/buffer depth, and you'd want to check your switches too.

    Clients I've worked for always started with shared LUNs because the vendor told them that was the best way, sigh. Since then they have moved to dedicated RAID 10 / RAID 1 LUNs, only using RAID 5 for the backup drive. Surprise surprise - so many performance issues went away.

    The GrumpyOldDBA
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • magarity kerns (5/9/2008)


    All I can suggest is that you write up an SLA that pretty much demands dedicated LUN performance without specifying it exactly and then filling their trouble ticket queue every 30 minutes when their general pool approach doesn't cut it.

    That's what my boss suggested too. I'd rather get it done right the first time, especially since I'm the one the users complain to when the system's slow.

    If we can't persuade them, then that is the route we'll go.

    Colin: What kind of SAN-level monitoring tools would you recommend (it's an EMC SAN), and what kind of results from them would indicate problems?

    If that comment about performance being horrible with your own LUN is from someone on the "storage team" then you're in trouble - get attached storage instead.

    It was. :ermm:

    Direct attached storage isn't an option, as it's a clustered server. Also corporate policy states all servers will use SAN storage.

    Another comment from storage team:

    [Large SAN Vendor]'s best practice is to spread everything out wide across the array so everyone's sharing with everyone.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gail, DO NOT GIVE IN.

    We had a 6-month argument with our hosting company about the performance of their shared SAN environment. We were trying to implement 3 new clustered database servers. While performing IO tests using SQLIO and SQLIOSim, we were able to affect the performance of our current production database server - we could actually see the queue length increase. This is despite the fact that we were told repeatedly that our new LUNs were totally separate from our existing ones. Supposedly the drives were on different shelves. I spent many nights getting up at 2 in the morning to demonstrate this for their "experts". The hosting company escalated this all the way to the SAN provider, as there was evidence of issues with their storage processor.

    I provided evidence in the form of IOPS and MB/s, comparing the LUNs provided to an internal 4-disk RAID 10 array.

    IOPS/sec (block size - Internal - SAN)

    Writes:
      Random       8KB:   2716 |   916
      Random      64KB:   1395 |   398
      Random     128KB:    739 |   223
      Random     256KB:    122 |    89
      Sequential   8KB:  14744 |  7408
      Sequential  64KB:   1283 |   869
      Sequential 128KB:    721 |   441
      Sequential 256KB:    462 |   234

    Reads:
      Random       8KB:   1854 |  1048
      Random      64KB:   1377 |   720
      Random     128KB:   1051 |   385
      Random     256KB:    569 |   294
      Sequential   8KB:  13952 |  8238
      Sequential  64KB:   3763 |  1642
      Sequential 128KB:   1894 |   795
      Sequential 256KB:    986 |   485

    MB/sec (block size - Internal - SAN)

    Writes:
      Random       8KB:  21.22 |   7.15
      Random      64KB:  87.24 |  24.9
      Random     128KB:  92.43 |  27.99
      Random     256KB:  30.63 |  22.49
      Sequential   8KB: 115.18 |  57.87
      Sequential  64KB:  80.23 |  54.31
      Sequential 128KB:  90.15 |  55.23
      Sequential 256KB: 115.68 |  58.55

    Reads:
      Random       8KB:  14.49 |   8.19
      Random      64KB:  86.07 |  45.04
      Random     128KB: 131.4  |  48.14
      Random     256KB: 142.39 |  73.74
      Sequential   8KB: 235.24 |  64.36
      Sequential  64KB: 172.9  | 102.68
      Sequential 128KB: 236.82 |  99.49
      Sequential 256KB: 246.51 | 121.25

    All of this effort was pointless, as they came back and told us it was our database design - which, while it could use some improvement, runs perfectly fine on 8 internal drives (split into two RAID 1s and one RAID 10).

    We did not implement the SAN solution with that company and immediately started looking at other SAN providers. Every provider wants to say the same thing: that they do not need to provide the separated structure we request in terms of drives and RAID-level configuration.

    I found that using these tools to benchmark the IO proved invaluable.

  • >>Direct attached storage isn't an option, as it's a clustered server. Also corporate policy states all servers will use SAN storage. <<

    The first statement is an old wives' tale. You can indeed cluster on non-SAN storage. The second statement definitely has some merit, although the reality is that companies regularly shoot themselves in the *ss with it. 🙂

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

    I have a document called EMC Storage Solutions: SQL 2000 and SQL 2005 Best Practices Planning. While it is slanted toward EMC storage solutions, I think the overall gist of the document, with its capacity planning, disk guidelines, etc., may be of use.

    ftp://ftp.emc.com/outgoing/astyers/H601_2CLARiiONStor_Arr_MS_SQLSvr_ldv.pdf

    - Amy

    Here is a paper from Dell and Microsoft.

    http://www.dell.com/downloads/global/power/ps4q07-20070555-Chen.pdf

    The appendix in this one is good:

    http://download.microsoft.com/download/4/7/a/47a548b9-249e-484c-abd7-29f31282b04d/PhysDBStor.doc

    Hope it helps.

    Mary


    Mary Myers

    Because I wish to be diplomatic - and, to be honest, slagging off a company is really bad practice and doesn't add anything constructive - I don't want to mention names. But in my personal experience one company in particular is the world's worst at configuring their hardware, telling the truth about RAID and caching, and selling impossible dreams to management. I do believe they provide excellent jollies for those with the cheque book, but they try to avoid any techies.

    The GrumpyOldDBA
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Hello everyone,

    we have one very busy database. I was able to prove that we could handle 7 times the current load, but the SAN gets very busy at checkpoints. Is there any way, other than issuing manual checkpoints, to lessen the SAN stress?
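
    On SQL Server 2005/2008, the main knob for automatic checkpoint behaviour is the instance-wide 'recovery interval' setting. The sketch below is an illustration, not a recommendation - trading checkpoint frequency against crash-recovery time needs testing against your own workload:

    ```sql
    -- 'recovery interval (min)' bounds how much redo work crash recovery
    -- may need, which in turn drives automatic checkpoint frequency:
    -- a larger value means fewer but bigger checkpoint bursts;
    -- the default of 0 lets SQL Server decide.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'recovery interval (min)', 5;  -- example value, in minutes
    RECONFIGURE;
    ```

    I believe SQL Server 2008 also supports the -k startup parameter, which throttles checkpoint IO to a specified MB/s - that addresses the burst directly, but it is instance-wide and worth testing carefully before relying on it.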

  • TheSQLGuru (5/12/2008)


    >>Direct attached storage isn't an option, as it's a clustered server. Also corporate policy states all servers will use SAN storage. <<

    The first statement is an old wives' tale. You can indeed cluster on non-SAN storage. The second statement definitely has some merit, although the reality is that companies regularly shoot themselves in the *ss with it. 🙂

    Yes, you can, and yes, we are, and yes, it's a PITA! Our server team complains about the amount of white space on the SQL servers, but refuses to consider a SAN to help fix that problem. Balancing a multi-instance two-server cluster with attached storage is doable but difficult. If you guess wrong on the storage requirements for an instance, re-balancing the storage is a major undertaking and is painful. :crazy:
