Are these SQLIO results ok? Need to ensure that new SAN IO latency is topnotch

  • Hi,

    Just run SQLIO on my new Cluster and got very interesting results that I would like to share and/or get some feedback.

    1st, I am not so sure about the SAN specific’s model and cache, but I’ve been told it’s a DELL Compellent SC8000.

    I did two rounds of tests, each node, after failing over (Cluster is not in production yet). I tested the SQL mount point s(data and logs) using a 30GB file. On tempdb mount point I used a 5GB file due space restrictions.

    I requested tempdb to be tier1; we use tempdb a lot, it’s a bottleneck on our old systems. It is my understanding that the tier1 put disks on a RAID10 layout. The databases are DW type, mostly reads, but nightly imports on few critical tables.

    Having said that, these some of the results ....

    Doing Writes:

    sqlio v1.5.SG

    using system counter for latency timings, 2474111 counts per second

    parameter file used: data_param.txt

    file E:\SQL_Data\MSSQL\testfile.dat with 2 threads (0-1) using mask 0x0 (0)

    2 threads writing for 120 secs to file E:\SQL_Data\MSSQL\testfile.dat

    using 64KB random IOs

    enabling multiple I/Os per thread with 8 outstanding

    buffering set to use hardware disk cache (but not file cache)

    using specified size: 30480 MB for file: E:\SQL_Data\MSSQL\testfile.dat

    initialization done

    CUMULATIVE DATA:

    throughput metrics:

    IOs/sec: 824.99

    MBs/sec: 51.56

    latency metrics:

    Min_Latency(ms): 1

    Avg_Latency(ms): 18

    Max_Latency(ms): 627

    histogram:

    ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+

    %: 0 1 1 2 4 7 10 12 14 12 10 7 5 3 2 3 1 1 0 0 0 0 0 0 4

    sqlio v1.5.SG

    using system counter for latency timings, 2474111 counts per second

    parameter file used: data_param.txt

    file E:\SQL_Data\MSSQL\testfile.dat with 2 threads (0-1) using mask 0x0 (0)

    2 threads writing for 120 secs to file E:\SQL_Data\MSSQL\testfile.dat

    using 8KB random IOs

    enabling multiple I/Os per thread with 8 outstanding

    buffering set to use hardware disk cache (but not file cache)

    using specified size: 30480 MB for file: E:\SQL_Data\MSSQL\testfile.dat

    initialization done

    CUMULATIVE DATA:

    throughput metrics:

    IOs/sec: 9406.30

    MBs/sec: 73.48

    latency metrics:

    Min_Latency(ms): 0

    Avg_Latency(ms): 1

    Max_Latency(ms): 33

    histogram:

    ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+

    %: 1 92 2 1 1 1 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0

    Doing Reads:

    sqlio v1.5.SG

    using system counter for latency timings, 2474111 counts per second

    parameter file used: data_param.txt

    file E:\SQL_Data\MSSQL\testfile.dat with 2 threads (0-1) using mask 0x0 (0)

    2 threads reading for 120 secs from file E:\SQL_Data\MSSQL\testfile.dat

    using 64KB random IOs

    enabling multiple I/Os per thread with 8 outstanding

    buffering set to use hardware disk cache (but not file cache)

    using specified size: 30480 MB for file: E:\SQL_Data\MSSQL\testfile.dat

    initialization done

    CUMULATIVE DATA:

    throughput metrics:

    IOs/sec: 1779.39

    MBs/sec: 111.21

    latency metrics:

    Min_Latency(ms): 0

    Avg_Latency(ms): 8

    Max_Latency(ms): 39

    histogram:

    ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+

    %: 0 0 0 0 1 1 1 1 91 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0

    sqlio v1.5.SG

    using system counter for latency timings, 2474111 counts per second

    parameter file used: data_param.txt

    file E:\SQL_Data\MSSQL\testfile.dat with 2 threads (0-1) using mask 0x0 (0)

    2 threads reading for 120 secs from file E:\SQL_Data\MSSQL\testfile.dat

    using 8KB random IOs

    enabling multiple I/Os per thread with 8 outstanding

    buffering set to use hardware disk cache (but not file cache)

    using specified size: 30480 MB for file: E:\SQL_Data\MSSQL\testfile.dat

    initialization done

    CUMULATIVE DATA:

    throughput metrics:

    IOs/sec: 13686.86

    MBs/sec: 106.92

    latency metrics:

    Min_Latency(ms): 0

    Avg_Latency(ms): 0

    Max_Latency(ms): 32

    histogram:

    ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+

    %: 16 83 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

    Any feedback and opinion is highly appreciated ... my main interest is discovered any potential problem, if one, prior going live and before installing the SQL 2012 fail-over instance.

  • The 64K random read and random write tests may be OK, but I would not call them spectacular, especially the 64K random write.

    You should probably test again, adding more threads, until you reach an average Avg_Latency in the range of 25 to 50 ms, to find the maximum throughput it can sustain.

    If possible, increase the size of the test file to make sure you are measuring the disk speed and not the SAN cache speed and size.

  • Michael Valentine Jones (12/18/2012)


    The 64K random read and random write tests may be OK, but I would not call them spectacular, especially the 64K random write.

    You should probably test again, adding more threads, until you reach an average Avg_Latency in the range of 25 to 50 ms, to find the maximum throughput it can sustain.

    If possible, increase the size of the test file to make sure you are measuring the disk speed and not the SAN cache speed and size.

    Fantastic post, thanks!

    Yes, I was thinking on increasing the test file size to 50GB, maybe bigger. I do not know the SAN's cache specs yet. I need to ask our SAN admin.

    Now, in case the numbers are slow (I'm not a SAN expert) what kind of tweaks or changes can be done at SAN level? MPIO is installed plus I patched Windows on both nodes.

  • Looks to me like the disks aren't formatted at 64Kb. I'd definitly check this as there are some gains to be made here.

    Leo

    Leo
    Nothing in life is ever so complicated that with a little work it can't be made more complicated.

  • Leo.Miller (12/18/2012)


    Looks to me like the disks aren't formatted at 64Kb. I'd definitly check this as there are some gains to be made here.

    Leo

    I am not following you. The 64 KB was specified by me. The test uses 64 and 8KB page sizes.

  • Increased the test file size to 150GB and performed another round of tests ...

    64KB random IOs

    sqlio v1.5.SG

    using system counter for latency timings, 2474091 counts per second

    parameter file used: data_param.txt

    file E:\SQL_Data\MSSQL\testfile.dat with 2 threads (0-1) using mask 0x0 (0)

    2 threads writing for 120 secs to file E:\SQL_Data\MSSQL\testfile.dat

    using 64KB random IOs

    enabling multiple I/Os per thread with 8 outstanding

    buffering set to use hardware disk cache (but not file cache)

    using specified size: 150480 MB for file: E:\SQL_Data\MSSQL\testfile.dat

    initialization done

    CUMULATIVE DATA:

    throughput metrics:

    IOs/sec: 677.04

    MBs/sec: 42.31

    latency metrics:

    Min_Latency(ms): 1

    Avg_Latency(ms): 23

    Max_Latency(ms): 660

    histogram:

    ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+

    %: 0 1 2 4 8 9 9 11 12 11 8 7 4 3 1 1 1 1 0 0 0 0 0 0 7

    64KB sequential IOs

    sqlio v1.5.SG

    using system counter for latency timings, 2474091 counts per second

    parameter file used: data_param.txt

    file E:\SQL_Data\MSSQL\testfile.dat with 2 threads (0-1) using mask 0x0 (0)

    2 threads writing for 120 secs to file E:\SQL_Data\MSSQL\testfile.dat

    using 64KB sequential IOs

    enabling multiple I/Os per thread with 8 outstanding

    buffering set to use hardware disk cache (but not file cache)

    using specified size: 150480 MB for file: E:\SQL_Data\MSSQL\testfile.dat

    initialization done

    CUMULATIVE DATA:

    throughput metrics:

    IOs/sec: 700.92

    MBs/sec: 43.80

    latency metrics:

    Min_Latency(ms): 1

    Avg_Latency(ms): 22

    Max_Latency(ms): 665

    histogram:

    ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+

    %: 0 1 2 4 7 8 10 11 12 11 9 7 4 2 1 1 1 1 0 0 0 0 0 0 6

    64KB random IOs (reads)

    sqlio v1.5.SG

    using system counter for latency timings, 2474091 counts per second

    parameter file used: data_param.txt

    file E:\SQL_Data\MSSQL\testfile.dat with 2 threads (0-1) using mask 0x0 (0)

    2 threads reading for 120 secs from file E:\SQL_Data\MSSQL\testfile.dat

    using 64KB random IOs

    enabling multiple I/Os per thread with 8 outstanding

    buffering set to use hardware disk cache (but not file cache)

    using specified size: 150480 MB for file: E:\SQL_Data\MSSQL\testfile.dat

    initialization done

    CUMULATIVE DATA:

    throughput metrics:

    IOs/sec: 1794.08

    MBs/sec: 112.13

    latency metrics:

    Min_Latency(ms): 0

    Avg_Latency(ms): 8

    Max_Latency(ms): 80

    histogram:

    ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+

    %: 0 0 0 0 7 1 1 1 86 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

    Wonder why I can't get anything lower or equal tha 20ms on random writes... or, what should I tweak to get better latency values.

    I do like the read and random IO results though, 8ms average. Giving the case of the file's size, 150GB, I do not believe is being cached by the SAN disk controller ...

    Any opinions?

  • Hi sql lover,

    I'm doing a similar exercise with a clients SAN here in NZ.

    Before I even did SQLIO, I actually spent a few 30 minute sessions with our SAN tech (and VMWare tech) going over both configurations to confirm what is setup and how. You need to have some basic understanding of SANs before doing this.

    So go hit up BrentOzar.com and check out his amazing blog entries covering SAN performance:

    http://www.brentozar.com/archive/2011/09/automated-tiered-storage-for-databases/

    http://sqlserverpedia.com/blog/sql-server-performance-tuning/sqlio-tutorial/

    The other thing to bear-in-mind is if your servers are virtualized as the multipathing configuration will affect performance (in my case VMWare multipathing was set up incorrectly for the SAN here.....):

    http://www.brentozar.com/archive/2009/05/san-multipathing-part-2-what-multipathing-does/

    Good luck!

    Cheers

    Michael

  • Hi Michael,

    Thanks for reply.

    While SAN and storage is not my main area of expertise, I do have a basic understanding of them and more important yet, how SQL interacts or should be configured on a SAN. My question about what to tweak is more towards the SAN, not MS-SQL.

    The environment is a real Cluster, not VMware.

    And those two are fantastic link / resources. I actually had the chance to look at them before. 😉

  • Ah ok 🙂 hehe suck eggs! Sorry!

    Hmmm well I thought the performance you posted is good. I'm assuming the sets you've posted are the fastest results out of the cmd file Brent Ozar posted?

    In terms of tweaking the SAN, you could look at disk sets in the RAID configurations? Maybe make it wider? One problem I have with mine is the old RAID 6 vendor answer to redundancy has been applied, thus making my writes absolutely awful...

    What type of disk is in there? SAS 10k? SAS 15k?

    Do you know what kind of throughput and response time you need?

    The only other thing I can think of is ensuring your read heavy elements and your write heavy elements sit on different enclosures within the SAN? This'll prevent thrashing on one particular set of disks if the IOP traffic is switching all over the show (between random and sequential). I think Brent did a post on this somewhere......I'll try and dig it out.

    The other question is what will sit with the Clusters allocated disk? What other services on the network will use it? Or is it all dedicated (I'm so jealous if you've managed to persuade the SAN tech to do that btw!!!).

    Cheers

    Michael

  • @michael-2,

    Yes, after last round of test and using 150GB file, I think that the 64KB test (writes and reads) looks good, specially the sequential.

    Our systems are DW type of databases, so I am not too concern about the writes. But I was expecting 20ms or lower on the writing results. I think I will ask our SAN admin to create another LUN, so I can map it, assign to the cluster, and do another round of tests. I may ask for a different SAN configuration on that LAN and maybe do a customize Windows formatting too.

    I think, not sure, our SAN does automatic tiered storage, but I can "force" or dictate which LUN goes on which tier. I'll validate that with our SAN expert.

    Thanks again for reply! 🙂

  • sql-lover (12/18/2012)


    Leo.Miller (12/18/2012)


    Looks to me like the disks aren't formatted at 64Kb. I'd definitly check this as there are some gains to be made here.

    Leo

    I am not following you. The 64 KB was specified by me. The test uses 64 and 8KB page sizes.

    I believe he meant the cluster size that the disk is formatted at, not the size of the IO for your test.

    The usual recommendation is that disks that hold SQL Server data files be formatted at 64K.

  • On our IBM SAN (V7000) we have three physical enclosures which have two RAID 6 sets in them..... so you *should* be able to get split out....

    Hope you get the traction you need 🙂

    BTW, check this for a quick hit too:

    http://i.brentozar.com/presentations/Ozar_SQLConnections_SQL315_BuildingFasterSQLServers.pdf

    Cheers

    Michael

  • sql-lover (12/19/2012)


    Increased the test file size to 150GB and performed another round of tests ...

    64KB random IOs

    sqlio v1.5.SG

    using system counter for latency timings, 2474091 counts per second

    parameter file used: data_param.txt

    file E:\SQL_Data\MSSQL\testfile.dat with 2 threads (0-1) using mask 0x0 (0)

    2 threads writing for 120 secs to file E:\SQL_Data\MSSQL\testfile.dat

    using 64KB random IOs

    enabling multiple I/Os per thread with 8 outstanding

    buffering set to use hardware disk cache (but not file cache)

    using specified size: 150480 MB for file: E:\SQL_Data\MSSQL\testfile.dat

    initialization done

    CUMULATIVE DATA:

    throughput metrics:

    IOs/sec: 677.04

    MBs/sec: 42.31

    latency metrics:

    Min_Latency(ms): 1

    Avg_Latency(ms): 23

    Max_Latency(ms): 660

    histogram:

    ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+

    %: 0 1 2 4 8 9 9 11 12 11 8 7 4 3 1 1 1 1 0 0 0 0 0 0 7

    64KB sequential IOs

    sqlio v1.5.SG

    using system counter for latency timings, 2474091 counts per second

    parameter file used: data_param.txt

    file E:\SQL_Data\MSSQL\testfile.dat with 2 threads (0-1) using mask 0x0 (0)

    2 threads writing for 120 secs to file E:\SQL_Data\MSSQL\testfile.dat

    using 64KB sequential IOs

    enabling multiple I/Os per thread with 8 outstanding

    buffering set to use hardware disk cache (but not file cache)

    using specified size: 150480 MB for file: E:\SQL_Data\MSSQL\testfile.dat

    initialization done

    CUMULATIVE DATA:

    throughput metrics:

    IOs/sec: 700.92

    MBs/sec: 43.80

    latency metrics:

    Min_Latency(ms): 1

    Avg_Latency(ms): 22

    Max_Latency(ms): 665

    histogram:

    ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+

    %: 0 1 2 4 7 8 10 11 12 11 9 7 4 2 1 1 1 1 0 0 0 0 0 0 6

    64KB random IOs (reads)

    sqlio v1.5.SG

    using system counter for latency timings, 2474091 counts per second

    parameter file used: data_param.txt

    file E:\SQL_Data\MSSQL\testfile.dat with 2 threads (0-1) using mask 0x0 (0)

    2 threads reading for 120 secs from file E:\SQL_Data\MSSQL\testfile.dat

    using 64KB random IOs

    enabling multiple I/Os per thread with 8 outstanding

    buffering set to use hardware disk cache (but not file cache)

    using specified size: 150480 MB for file: E:\SQL_Data\MSSQL\testfile.dat

    initialization done

    CUMULATIVE DATA:

    throughput metrics:

    IOs/sec: 1794.08

    MBs/sec: 112.13

    latency metrics:

    Min_Latency(ms): 0

    Avg_Latency(ms): 8

    Max_Latency(ms): 80

    histogram:

    ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+

    %: 0 0 0 0 7 1 1 1 86 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

    Wonder why I can't get anything lower or equal tha 20ms on random writes... or, what should I tweak to get better latency values.

    I do like the read and random IO results though, 8ms average. Giving the case of the file's size, 150GB, I do not believe is being cached by the SAN disk controller ...

    Any opinions?

    I would call the random IO results borderline, depending on the application.

    The sequential write IO is far below what I would expect on a new system. You may be bottlenecked by the speed of the underlying disks. I would be very uncomfortable releasing that to production.

  • sql-lover (12/18/2012)


    Leo.Miller (12/18/2012)


    Looks to me like the disks aren't formatted at 64Kb. I'd definitly check this as there are some gains to be made here.

    Leo

    I am not following you. The 64 KB was specified by me. The test uses 64 and 8KB page sizes.

    I'm talking about the actuall formatting of the drives, By default most sysadmins who build serves use the default drive format allocation unit which is 4Kb. If you run >fsutil fsinfo ntfsinfo X: from a command line (whereX is a drive letter) you will get the drive information. You are interested in the line: Bytes Per Cluster : If it says 4096 it is formated with 4Kb allocation units. You want this to say 65536 i.e. 64Kb. This can make a noticable change in disk performance.

    You may need to reformat your data dives, in which case you may need to backup your data. If you've been carefull and kept the system DBs apart from the user DBs you can backup and then restore the databases after reformatting.

    Cheers

    Leo

    Leo
    Nothing in life is ever so complicated that with a little work it can't be made more complicated.

  • Leo.Miller (12/19/2012)If it says 4096 it is formatted with 4Kb allocation units. You want this to say 65536 i.e. 64Kb. This can make a noticeable change in disk performance.

    Leo

    Can you expand a bit an explain why? ...

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply