Multiple DataFiles on SAN disks?

  • In the past, an I/O performance-enhancing option was to split your database into multiple data files and put each data file on its own dedicated disk array, the idea being that the more heads reading/writing data, the better. But now we have modern SAN systems, which effectively spread out this I/O already (if the SAN admin does his job correctly!). Is there any advantage to continuing this practice of multiple data files on a SAN? It seems that striping the data across the data files is completely unnecessary in this case, and perhaps even adds overhead to the I/O process, thus making it slower.
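
    Just to make the old layout concrete, it looked something like this sketch (database name, paths, and sizes are placeholders only) - one data file per dedicated volume, all in one filegroup so SQL Server's proportional fill spreads the writes across them:

        -- Hypothetical example: four data files, each on its own volume/array.
        CREATE DATABASE SalesDB
        ON PRIMARY
            (NAME = SalesDB_01, FILENAME = 'E:\SQLData1\SalesDB_01.mdf', SIZE = 100GB, FILEGROWTH = 10GB),
            (NAME = SalesDB_02, FILENAME = 'F:\SQLData2\SalesDB_02.ndf', SIZE = 100GB, FILEGROWTH = 10GB),
            (NAME = SalesDB_03, FILENAME = 'G:\SQLData3\SalesDB_03.ndf', SIZE = 100GB, FILEGROWTH = 10GB),
            (NAME = SalesDB_04, FILENAME = 'H:\SQLData4\SalesDB_04.ndf', SIZE = 100GB, FILEGROWTH = 10GB)
        LOG ON
            (NAME = SalesDB_log, FILENAME = 'L:\SQLLogs\SalesDB_log.ldf', SIZE = 50GB, FILEGROWTH = 5GB);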

    thoughts?

  • JarJar (3/16/2016)


    In the past, an I/O performance-enhancing option was to split your database into multiple data files and put each data file on its own dedicated disk array, the idea being that the more heads reading/writing data, the better. But now we have modern SAN systems, which effectively spread out this I/O already (if the SAN admin does his job correctly!). Is there any advantage to continuing this practice of multiple data files on a SAN? It seems that striping the data across the data files is completely unnecessary in this case, and perhaps even adds overhead to the I/O process, thus making it slower.

    thoughts?

    As always, it REALLY depends on what is on the disks your SQL Server files reside on. If there is a lot of other stuff on them, if they are thin provisioned, etc., then you won't be getting ANY sequential IO no matter what, and what you do with your SQL Server files is of only minor consequence because of the huge performance drop-off between sequential and random IO on rotating media.

    There are multi-pathing and other things (cache being one of the most important ones) that come into play here.

    I will note that of the MANY clients I have done performance reviews for, NONE had a SAN configuration that was anywhere NEAR optimal for SQL Server IO performance - and most of them were blaming SQL Server for not being a good product. Of course their hardware config, virtualization layer config, Windows config, SQL Server config, network config, application config, etc., etc. SUCKED too!! :w00t:

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thanks for the info, Kevin.

    I *think* we are in good shape on the SAN side. Disk has not been an issue so far (knock on wood). We benchmarked our disks with SQLIO and everything checked out. We're about to migrate from SQL 2012 to 2014, and I have been wondering whether splitting this 2TB database into 8 data files is even necessary. It has been configured like that since the initial setup by external consultants on SQL 2008, and a lot has evolved since then.
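
    For what it's worth, a rough per-file latency check against the standard DMVs looks something like the sketch below (the numbers are cumulative since the last SQL Server restart, and what counts as "good" latency is your call):

        -- Average read/write latency per database file since the last SQL Server restart.
        SELECT  DB_NAME(vfs.database_id) AS database_name,
                mf.physical_name,
                vfs.num_of_reads,
                vfs.num_of_writes,
                CASE WHEN vfs.num_of_reads  = 0 THEN 0
                     ELSE vfs.io_stall_read_ms  / vfs.num_of_reads  END AS avg_read_latency_ms,
                CASE WHEN vfs.num_of_writes = 0 THEN 0
                     ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS avg_write_latency_ms
        FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
        JOIN    sys.master_files AS mf
                ON  mf.database_id = vfs.database_id
                AND mf.file_id     = vfs.file_id
        ORDER BY avg_write_latency_ms DESC;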

    Do you have a link you can recommend for SQL Server SAN configuration optimization? I am not a SAN admin, so it would be great to have a reference point to discuss with my admins.

    Thanks again.

  • JarJar (3/16/2016)


    Thanks for the info, Kevin.

    Do you have a link you can recommend for SQL Server SAN configuration optimization? I am not a SAN admin, so it would be great to have a reference point to discuss with my admins.

    I don't, but I have stuff in my head. 😀

    I would search for stuff from Denny Cherry, David Klee and Brent Ozar to see if you can find useful information/guidelines.

    Or you can hire someone to give your system a performance review. I know a good consultant for that. 😎

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • My knowledge of SAN technology is limited, but I do know that when a SAN is set up, the disk array can be optimized to support different application use cases. For example, what's good for MS Exchange isn't necessarily good for SQL Server, and the same goes for OLTP databases versus OLAP.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Do you know if you're using MetaLUNs on the SAN?

  • If all LUNs are from the same SAN storage pool, they'll share the same cache too. With multiple LUNs and multiple I/O requests hitting the same storage pool, you may impact cache performance. It all depends on how this is all set up, really.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Jon - not sure about MetaLUNs. My SAN admin is out today. Is that specific to EMC? We're running Hitachi SANs.

    Perry - the 2TB database is split into 8 data files, but all the data files are on the same LUN. We have a dedicated SAN set up for our SQL storage, so there is no cross-application sharing of these disks. It's all dedicated to SQL. We have groups set up for production data, production logs, qa/dev data, and qa/dev logs, all tiered in levels of performance.
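
    For reference, this is roughly how I'd double-check that layout from the SQL side (the database name below is a placeholder) - it lists each file's logical name, physical path, and size, which makes it obvious that all eight data files sit on the same volume:

        -- File layout for one database: logical name, path, type, and size in MB.
        SELECT  name            AS logical_name,
                physical_name,
                type_desc,
                size * 8 / 1024 AS size_mb      -- size is stored in 8 KB pages
        FROM    sys.master_files
        WHERE   database_id = DB_ID('YourBigDatabase')
        ORDER BY type_desc, name;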

  • For performance in the good old days, you would use a separate spindle or set of spindles for each allocation. Of course you would group some smaller files, but that de-tunes the performance in the name of simplicity.

    Today our SAN and VMware admins want to use large (say 4TB) SAN LUNs and carve multiple VMs' storage requirements out of them.

    However, for performance you will still want a separate SAN LUN (VMware datastore) for each filesystem where you will locate one important database file. I have one application with 30 important database files, hence I am asking for 30 SAN LUNs and 30 separate mounted filesystems.

  • JarJar (3/16/2016)


    Jon - not sure about MetaLUNs. My SAN admin is out today. Is that specific to EMC? We're running Hitachi SANs.

    Perry - the 2TB database is split into 8 data files, but all the data files are on the same LUN. We have a dedicated SAN set up for our SQL storage, so there is no cross-application sharing of these disks. It's all dedicated to SQL. We have groups set up for production data, production logs, qa/dev data, and qa/dev logs, all tiered in levels of performance.

    Meta LUNs are used to expand an existing LUN: each time space is required, a new LUN is created and added to the meta LUN config.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Meta LUNs are used to expand an existing LUN: each time space is required, a new LUN is created and added to the meta LUN config.

    That, AND there's the striping across LUNs, such that "any data that gets written to a metaLUN component is striped across all the LUNs in the component."

    ref: https://community.emc.com/docs/DOC-22633

    At this point, SAN admins will start arguing that it doesn't matter if you set up separate mount points for sequential vs. random IO. I'm no storage admin, so I'm not going to wade into the deep waters. There is more to it, like caching, connections, and queue depth configs.

    This is an EMC feature, but I think Hitachi has something similar.

  • Pretty much every SAN now in the mid-to-high-end range uses disk pooling technology and fancy virtual RAID to stripe data as widely as possible. Most will want to stripe the data across all like disks (same size and speed). Some will even allow track priority, meaning you can prioritise certain LUNs to always sit on the outer tracks of the disks.

    Brent Ozar does some really good blogs about all this and about how SQL admins don't need to be as scared as they once were. Really, no matter what you do, storage will always be the bottleneck, so just size the server as needed and generally it won't be an issue (more RAM = fewer reads going to disk, which means the storage has more IOPS to focus on writes). I guess there are other reasons to break a DB up into multiple files; making restores easier might be one? But as far as performance goes, probably not so much. Unless we are talking about tempdb.
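
    For tempdb specifically, spreading allocations over several equally sized data files is the usual advice; a sketch of what that looks like (file names, paths, and sizes below are placeholders):

        -- Hypothetical example: add extra tempdb data files to spread allocation contention.
        ALTER DATABASE tempdb
            ADD FILE (NAME = tempdev2, FILENAME = 'T:\TempDB\tempdev2.ndf', SIZE = 8GB, FILEGROWTH = 1GB);
        ALTER DATABASE tempdb
            ADD FILE (NAME = tempdev3, FILENAME = 'T:\TempDB\tempdev3.ndf', SIZE = 8GB, FILEGROWTH = 1GB);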

    From my testing you definitely don't want thin-provisioned disks for SQL databases that have a lot of writes.

  • Using multiple files for a database can be useful regardless of the number of physical volumes (spinning rust or SSD) that a single logical volume gets mapped to.

    If your IO rates are very high, especially write IO, then you may get locking and latching on space map pages. There has been a lot written about this regarding tempdb, as that is normally where the issue is seen first, but the same problem can affect any database. Splitting your filegroup into multiple files is not something most people need to do, because most people do not have high enough IO rates. However, if your monitoring does show a problem, then multiple data files (even hosted on the same LUN) can mitigate this issue.
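
    If you want to check whether you are actually hitting that allocation-page contention, a point-in-time snapshot of latch waits is one way to look (the resource_description in the output is of the form dbid:fileid:pageid, and the PFS, GAM, and SGAM space map pages recur at fixed page intervals within each file):

        -- Sessions currently waiting on page latches; frequent hits on allocation pages
        -- (PFS/GAM/SGAM) suggest adding more data files to the filegroup may help.
        SELECT  session_id,
                wait_type,
                wait_duration_ms,
                resource_description
        FROM    sys.dm_os_waiting_tasks
        WHERE   wait_type LIKE 'PAGELATCH%';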

    I have also seen a post from the Windows CAT team concerning a bottleneck at the NTFS level with very, very high IO rates, due to NTFS lock management. There are very few sites worldwide that might see this problem, but if you think you are producing significantly more IO than anybody you know, then this bottleneck is something to be aware of. The mitigation is to use more Windows volumes. I would recommend contacting Microsoft before taking action on this, because if you have not diagnosed the correct cause you will do a lot of work that will not improve your situation.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017, 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara
