Filegroup and number of files per FG questions - Redesign of 2 TB DB with 2,000+ batches per second

  • Hi all,

    We're currently undergoing an exercise to redesign the storage structure of our production DB. This is a system that has been in place for 5+ years without any adjustments, and I'm looking for some feedback on what I've come up with so far.

    Thanks in advance for taking the time to read and/or respond!

    Summary

    We are wondering 2 things:

    • Recommended number of files per filegroup

    • Does our design make sense or is it completely crazy and will it cause untold amounts of headache

    Design calls for a total of 16 files across 4 filegroups (spread over 2 pools of disks – SSD/SAS) as outlined below.

    With that said, let the giant walls of text begin!

    A quick overview

    Windows Server 2008 Enterprise

    SQL Server 2008 SP2 CU5 Enterprise

    Dual socket, 8 core (X7560) (32 cores total w/ hyperthreading)

    512 GB RAM

    2 Node Active/Passive Cluster

    IBM v7000 8 x 300 GB SSD in Raid 5

    SAS disks with tempDB

    Read/Write ratio roughly 50/50

    On average 2,000 batches per second

    DB size is between 1.6-1.8 TB (depending on retention rates for log tables)

    DB layout: 4-5 filegroups with 1-2 files in each, all contained on a single volume (taking up the full RAID 5 SSD array mentioned above)

    Our current issues

    • Filegroups are a mess.

    • We are constantly battling to maintain the size of the DB on just SSD disks.

    • We cannot leverage SAN FlashCopy due to the restriction of a 64 MB max copy speed per LUN/volume (we only have 1!).

    We’re also looking to move up to half or more of the DB ‘down’ to SAS, as it’s log tables that do not need insanely fast read IO.

    Ok, so now to the Visio picture below... it shows the Pool, Volume, LUN, and Filegroup layout, and with any luck gives a good sense of what we’re curious about.

    So basically, we know we have 2 tiers/levels of storage… SSD and SAS. We also know that we want to have the base tables/clustered indexes in one file group, while we have the non-clustered indexes in a second group.

    This gives us 2 storage levels x 2 filegroups, for a total of 4 groups (excluding the Primary).

    Primary

    Filegroup_Tier1_Default

    Filegroup_Tier1_Index

    Filegroup_Tier2

    Filegroup_Tier2_Index

    We set the SSD tier as default, and name the filegroup as such.

    ** One note here. Even though both the Tier1_Default and Tier1_Index filegroups are on the same set of physical disks, we feel there is benefit in having the indexes separate. If we wanted to move them to a different set of disks later, etc… We are at least in that position to do so.
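
    For the record, the filegroup setup itself is straightforward DDL. A minimal sketch – the database name, file paths, sizes, and the table/index names are placeholders, not our actual production values:

    ```sql
    -- Create the four user-data filegroups (placeholder database name).
    ALTER DATABASE OurDb ADD FILEGROUP Filegroup_Tier1_Default;
    ALTER DATABASE OurDb ADD FILEGROUP Filegroup_Tier1_Index;
    ALTER DATABASE OurDb ADD FILEGROUP Filegroup_Tier2;
    ALTER DATABASE OurDb ADD FILEGROUP Filegroup_Tier2_Index;

    -- One file per LUN; repeat for each volume backing the filegroup.
    ALTER DATABASE OurDb
    ADD FILE (NAME = N'Tier1_Default_01',
              FILENAME = N'E:\SQLData\Tier1_Default_01.ndf',
              SIZE = 100GB, FILEGROWTH = 10GB)
    TO FILEGROUP Filegroup_Tier1_Default;

    -- Make the SSD tier the default so new objects land there.
    ALTER DATABASE OurDb MODIFY FILEGROUP Filegroup_Tier1_Default DEFAULT;

    -- Moving a nonclustered index to another filegroup later is just a
    -- rebuild onto the target (hypothetical table/index/column names):
    CREATE NONCLUSTERED INDEX IX_Example ON dbo.SomeTable (SomeColumn)
    WITH (DROP_EXISTING = ON, ONLINE = ON)
    ON Filegroup_Tier1_Index;
    ```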

    So now that we have our 4 user-data filegroups, we need to decide how many files per FG.

    Is there a recommended number of files per filegroup – some mathematical formula based on the number of cores, etc...?

    We understand that typically multiple files are used to spread out the load across different disks but in our case all these disks are the same ‘set’.

    However, we have an alternative reason here as well, per the 3rd point above. We are using IBM v7000/Storwize SAN units and are looking to take advantage of FlashCopy technology. The problem is that it replicates on a per-LUN/volume basis, and the max throughput is a mere 64 MB/second.

    So to combat this, we are looking to spread our data across a ‘high’ number of LUNs in order to take full advantage of our storage infrastructure.
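
    To put numbers to that, a rough back-of-the-envelope in T-SQL (the size and LUN count come from the figures above; it assumes the data is spread evenly and the per-LUN copies run in parallel):

    ```sql
    -- Copy time at 64 MB/sec per LUN: 1 LUN vs. 16 LUNs.
    DECLARE @db_mb decimal(18,2) = 1.7 * 1024 * 1024;  -- ~1.7 TB in MB
    DECLARE @luns  int = 16;

    SELECT CAST(@db_mb / 64 / 3600 AS decimal(10,1))         AS hours_on_1_lun,     -- ~7.7
           CAST(@db_mb / (64 * @luns) / 60 AS decimal(10,1)) AS minutes_on_16_luns; -- ~29.0
    ```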

  • Jeff Kelly-310227 (12/5/2012)


    Is there a recommended number of files per filegroup – some mathematical formula based on the number of cores, etc...?

    Nope.

    There are two main reasons for going with multiple files/multiple filegroups:

    1) IO throughput. This requires that different files/filegroups are on independent drives. You've confirmed that in your situation this is not the case.

    2) Backup/restore. You can take file or filegroup backups hence spreading out the impact of backups. You can set some filegroups read only and avoid having to back that data up. Taking full database backups can be faster if there are multiple filegroups on independent drives as that allows parallel reads and more backup buffers (again full advantage requires different IO channels)
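
    For reference, a filegroup backup and a striped full backup look like this (database name and paths are placeholders):

    ```sql
    -- Back up a single filegroup on its own schedule.
    BACKUP DATABASE OurDb
       FILEGROUP = 'Filegroup_Tier2'
    TO DISK = N'X:\Backups\OurDb_Tier2.bak'
    WITH CHECKSUM, COMPRESSION;

    -- Striping a full backup across several devices allows parallel
    -- reads/writes and more backup buffers.
    BACKUP DATABASE OurDb
    TO DISK = N'X:\Backups\OurDb_Full_1.bak',
       DISK = N'Y:\Backups\OurDb_Full_2.bak'
    WITH CHECKSUM, COMPRESSION;
    ```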

    There's also the option of tiering your storage so important tables go on fast drives and less important on slow drives

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi Gail,

    Thanks for the quick reply; I'll try to address each of your points.

    1) IO throughput

    We're more interested in IOPS, so large pools of disks offer us better performance. I'd also group this with your last point about important tables on fast drives: we have 2 tiers of storage for just that reason. High-access/important tables live on SSD disks, while others will be moved down to SAS.

    2) Backup/restore

    With the way our application is designed, partial restores are not possible, but we should look into filegroup backups and see if we gain anything. Currently our backup consists of 4 additional volumes and we write to 8 devices (4 volumes x 2 files per). A full backup of 1.7 TB takes about 35 minutes (backup disks are on a separate IO group/controller/etc...)
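
    For what it's worth, those numbers imply a healthy effective throughput:

    ```sql
    -- 1.7 TB in 35 minutes works out to roughly 850 MB/sec aggregate.
    SELECT CAST(1.7 * 1024 * 1024 / (35 * 60) AS decimal(10,0)) AS mb_per_sec;  -- ~849
    ```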

    Based on your review/understanding, do you foresee any issues/reasons to NOT go with our planned 16 volumes and corresponding FG layout across those volumes?

    thx again

    Jeff

  • Jeff Kelly-310227 (12/5/2012)


    Is there a recommended number of files per filegroup – some mathematical formula based on the number of cores, etc...?

    We understand that typically multiple files are used to spread out the load across different disks but in our case all these disks are the same ‘set’.

    This is a tempdb optimisation technique and not really relevant here. As Gail has explained, for better IO throughput you could use multiple files per filegroup, but you have to remember that SQL Server will attempt to stripe across the files, and has to switch between these files and maintain fill stats to support the proportional-fill algorithm. As you stated, you don't have the disk setup to support this. In my opinion it's totally unnecessary.
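
    If you do go multi-file, you can at least watch how evenly the proportional-fill algorithm is spreading the data. Something along these lines (run in the target database):

    ```sql
    -- Size vs. space used per file, grouped by filegroup.
    SELECT fg.name AS filegroup_name,
           df.name AS file_name,
           df.size * 8 / 1024 AS size_mb,                            -- pages -> MB
           FILEPROPERTY(df.name, 'SpaceUsed') * 8 / 1024 AS used_mb  -- pages -> MB
    FROM sys.database_files AS df
    JOIN sys.filegroups     AS fg ON df.data_space_id = fg.data_space_id
    ORDER BY fg.name, df.name;
    ```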

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Looks pretty good.

    Why multiple files in a filegroup? Just to allow for the SAN's snapshot? (check that the snapshot software supports SQL Server databases before you use it). There's no gain on the SQL side and watch the overhead. More files is not necessarily better.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Perry Whittle (12/5/2012)


    Jeff Kelly-310227 (12/5/2012)


    Is there a recommended number of files per filegroup – some mathematical formula based on the number of cores, etc...?

    We understand that typically multiple files are used to spread out the load across different disks but in our case all these disks are the same ‘set’.

    This is a tempdb optimisation technique and not really relevant here. As Gail has explained, for better IO throughput you could use multiple files per filegroup, but you have to remember that SQL Server will attempt to stripe across the files, and has to switch between these files and maintain fill stats to support the proportional-fill algorithm. As you stated, you don't have the disk setup to support this. In my opinion it's totally unnecessary.

    Thx for the time Perry.

    You bring up a good point; we should investigate the overhead involved with having 12 files per filegroup (but I suspect it's nothing to be concerned about).

    Also, you mention "totally unnecessary" – if we were not trying to overcome the limitation of 64 MB/sec copy speeds per volume, we would not be doing this. So in our case, it is an unfortunate necessity.

    So based on your input, I take away that the only downside you see to this solution is the possible overhead caused by striping across multiple files.

    good feedback, thx again!

  • GilaMonster (12/5/2012)


    Looks pretty good.

    Why multiple files in a filegroup? Just to allow for the SAN's snapshot? (check that the snapshot software supports SQL Server databases before you use it)

    Correct, Gail.

    IBM's FlashCopy technology operates on a volume/LUN basis, so it basically copies the LUN at the block level. It does fully support SQL Server, properly integrating via VSS and quiescing the DB/etc...

    The issue is that 64 MB/sec is the max throughput the SAN will support per volume, so with a 1.6+ TB DB we need many volumes to make this work for us in any reasonable amount of time.

    One note: we're looking to do copies, not just snapshots, as we want full read/write ability, plus we don't want the additional overhead of secondary reads being directed to the master/etc...

    cheers

    jeff
