Learn to Use Filegroups

  • It seems I.T. likes to set up a single channel RAID 5 which in my opinion is the worst possible setup for sql server (50 MB/sec limit) with a severe 4x write penalty. My preference would be three i/o channels to RAID 1 (50 MB/sec x 3) and use filegroups to assign clustered indexes and non-clustered into two channels and one channel for the log.

    Alternatively I just push for purchasing more RAM so the only disk channel is mostly used for log writes while the database stays in data cache.

  • GilaMonster (10/13/2015)


    don.schaeffer (5/12/2011)


    I'm wondering just what the benefit is to filegroups when your data is stored on a SAN or NetApp appliance? I believe I read something about a benefit when restoring a corrupt database, but otherwise are there benefits?

    I keep meaning to write a nice complex article on splitting across filegroups, looking at splitting for performance and splitting for recoverability. I keep not getting around to it. Maybe someday.

    Short answer to your question is 'maybe' 🙂

    If all the filegroups are on LUNs from the same disk pool, using the same storage path, then probably you won't see a performance benefit. Recoverability-wise, it's about whether you can restore sub-sets of the database in case of either corruption or a complete DB loss. That kind of filegroup split tends to need to be carefully designed to get the benefit.

    Wow, it's like SQL Server Central is reading my mind today. I had the exact same question because we just migrated to NetApp storage and one of the first things I learned about SQL Server is that I should ALWAYS split my primary data, staging data, index data, and log data into multiple filegroups across multiple disks. With specific explicitly on keeping my logs on fast drives.

    So, the question on whether or not I receive benefit from doing so with NetApp greatly depends on the disk pool. Therefore, I've kind of narrowed my request to the system admins to a simple, "Can you give me dedicate I/O that is not shared per filegroup?"

    I'm wondering if that would be the right question to really simplify down the ask for those not as familiar with filegroups with SQL Server and what's trying to be achieved with I/O optimization?

  • RonKyle (10/13/2015)


    A downside I've seen is that when I adopted this method I had to stop putting my indexes in a different file group because of the restore issues.

    I would love to hear more about this. It sounds like we are pretty similar with massive OLAP databases. I'm still keeping my indexes in their own filegroup, but have not ran into any restore issues.

  • We put everything in default filegroup. Our clients' databases don't exceed 50GB, mostly they are 5-10GB. At least 50% of that is our logging data rather than their real-world data 😉

    But I'm interested to know whether we should split filegroups, and if so in a "generic sense" how? For now they'd all be on the same drive (well ... MDF on one, LDF on another), but my thinking is that it would mean that I could split them if we needed to - rather than having to do a whole lot of client-specific scripting WHEN the need arises.

    Or would it be more hassle than it is worth supporting UNTIL we have a need?

    So perhaps the question I should ask is: can I just split Clustered / Non-clustered or something like that for now such that I could put them on different spindles in future. Or is there no simple generic that is likely to work and the only way is to carefully performance-test (i.e. using different spindles)?

    Actually, rereading that, maybe the answer would be to shift our Logging data to a different filegroup?

  • RonKyle (10/13/2015)

    A downside I've seen is that when I adopted this method I had to stop putting my indexes in a different file group because of the restore issues.

    I would love to hear more about this. It sounds like we are pretty similar with massive OLAP databases. I'm still keeping my indexes in their own filegroup, but have not ran into any restore issues.

    This was something I read when I changed to multiple filegroups, not something I experienced. Based on what you're saying, I will have to experiment with this as I would still prefer to have my non-clustered indexes in a different place. Will try to find the source and update this post as well.

  • We put everything in default filegroup. Our clients' databases don't exceed 50GB, mostly they are 5-10GB. At least 50% of that is our logging data rather than their real-world data

    While some disagree with this, including some of the most frequent posters, I routinely have a data filegroup for all the tables and an index filegroup for all the non-clustered indexes for any database regardless of size. This leaves only system databases in the primary filegroup. The data filegroup is set to be the default.

  • xsevensinzx (10/13/2015)


    RonKyle (10/13/2015)


    A downside I've seen is that when I adopted this method I had to stop putting my indexes in a different file group because of the restore issues.

    I would love to hear more about this. It sounds like we are pretty similar with massive OLAP databases. I'm still keeping my indexes in their own filegroup, but have not ran into any restore issues.

    In short, if you put clustered indexes in one filegroup and nonclustered in a second filegroup, you can't use partial database restore to just restore the tables (the clustered indexes). Well, you can, but the index metadata still exists (it's in the system tables), so the queries try to use the nonclustered indexes and fail because the filegroup they're in is offline.

    You can't drop the nonclustered indexes, because the filegroups' offline. You can't rebuild them somewhere else, because the filegroup's offline.

    Splitting the indexes into a separate filegroup might help performance (and it's a big 'might' with lots of caveats), but it will not help you to restore faster in the case of a disaster, as you will have to restore the filegroup with the nonclustered indexes in it before the DB can be considered 'operational'

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (10/14/2015)


    xsevensinzx (10/13/2015)


    RonKyle (10/13/2015)


    A downside I've seen is that when I adopted this method I had to stop putting my indexes in a different file group because of the restore issues.

    I would love to hear more about this. It sounds like we are pretty similar with massive OLAP databases. I'm still keeping my indexes in their own filegroup, but have not ran into any restore issues.

    In short, if you put clustered indexes in one filegroup and nonclustered in a second filegroup, you can't use partial database restore to just restore the tables (the clustered indexes). Well, you can, but the index metadata still exists (it's in the system tables), so the queries try to use the nonclustered indexes and fail because the filegroup they're in is offline.

    You can't drop the nonclustered indexes, because the filegroups' offline. You can't rebuild them somewhere else, because the filegroup's offline.

    Splitting the indexes into a separate filegroup might help performance (and it's a big 'might' with lots of caveats), but it will not help you to restore faster in the case of a disaster, as you will have to restore the filegroup with the nonclustered indexes in it before the DB can be considered 'operational'

    Ahh, gotcha.

    I think the reason why I went that option is because it's my understanding that with large volumes, it pays out in the end to split the indexes. I have a fact table that is pushing a billion plus records.

  • xsevensinzx (10/14/2015)


    I think the reason why I went that option is because it's my understanding that with large volumes, it pays out in the end to split the indexes.

    Maybe, maybe not. There's no automatic improvement from nonclustered indexes in a second filegroup. It depends on a lot of factors.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • cdesmarais 49673 (10/13/2015)


    Just wanted to point out that you can still use partitioned views across multiple filegroups without enterprise edition.

    Didn't realize that, thanks.

  • So there is far more to File Groups than I had imagined. If anyone is considering writing an article (I recall GilaMonster and RonKyle saying that they might if they get a chance to get round to it) I just thought I would highlight that I would read it (no one likes putting in the effort for something that is never used, in this case read).

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • BJ Hermsen (5/12/2011)


    One of the databases I support uses a large number of file groups to seperate different types of data. Base tables are seperated among many of them and several are reserved for the indexes of large tables. What this allows is for us to seperate the data files out onto several different sets of spindles or even completely seperate storage hardware. This allows for much higher performance and throughput. I have seen this return much better results than having a single raid set with 12 or more spindles. It also does allow for putting a Raid 1+0, Raid 1 or a Raid 5 behind differing performance need file groups.

    Yeah, I realize this post is almost a half decade old but it's still appropriate. I wish the admins of our SANs understood that. I've been trying to get them to do such things for years but they keep saying that I don't understand and that it's not possible. Of course I know better but that's their reasoning and they're sticking to it.

    The biggest issue I have noticed with storage is the fact that disks are to large to get adequate spindles for performance needs of many systems. a 500GB database that needs two raid sets will either be using 4 300GB SAS drives in a mirror or more as a R5 or 1+0. The wasted space often is hard to validate for a penny pinching customer so finding the best way to deploy can be a challenge.

    +1000 to that. Although it took more rack space, I wish there were a choice of using drives in the 80-100GB range for the smaller stuff (and most of it is smaller stuff) that also needs to be as fast as possible. The low cost of adding spindles to a problem with the smaller disks was very effective in multiple areas.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Steve Jones - SSC Editor (5/12/2011)


    Alan Vogan (5/12/2011)


    In SQL Server, we can't split a table across filegroups. Or can we?

    I was just asked that question! The idea that was being tossed around was to split a transactional table over 2 file groups, 1 of them being a read-only file group, and having the transactional data that was 'completed' being moved to the read-only filegroup while the transactional data that was still being processed be available on the other filegroup.

    I said I didn't know but could look into the possibility. I also suggested that there are other ways in which to protect data. Can you point me to some articles on splitting tables across file groups?

    [font="Arial Black"]You can't split a table per se across filegroups. [/font]If you have partitioning (EE), you can partition in different filegroups.

    What I might suggest is that you consider partitioned views, where you have two tables joined with a view and one is a read-only filegroup. That can reduce your backup time/windows as well if the "old data" doesn't change.

    Same here. Old post but still helpful...

    While it's true that you can't split a monolithic table over multiple file groups, you CAN split a table over multiple files in a file group. And, as Steve pointed out, you can split a table over multiple file groups if it's either a Partitioned Table (Enterprise Edition only) or Partitioned Views (any edition except CE).

    Also, splitting a table to 1 supposedly "read only" file group and 1 active file group is a complete waste of time because it will still require you to do DELETEs from the active FG and writes to the "read only" FG. It's far better to take some time to plan out (for example) monthly partitions so that the old months can actually and truly become "read only" so that you never have to back them up after the second time (you should always have to copies somewhere) nor do any maintenance on them ever again because they can truly be "read only".

    My nearly 400GB telephone system only takes a couple of minutes to backup each night. The reason why is because I've done like I said above and I'm only backing up the current month instead of 6 years of history that will never change. Like a friend of mine used to say, "Well, why would you wash the clean clothes?" 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Gary Varga (11/4/2015)


    So there is far more to File Groups than I had imagined. If anyone is considering writing an article (I recall GilaMonster and RonKyle saying that they might if they get a chance to get round to it) I just thought I would highlight that I would read it (no one likes putting in the effort for something that is never used, in this case read).

    I'm actually working on one that will mirror the SQL Saturday session I give on partitioning.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/4/2015)


    Gary Varga (11/4/2015)


    So there is far more to File Groups than I had imagined. If anyone is considering writing an article (I recall GilaMonster and RonKyle saying that they might if they get a chance to get round to it) I just thought I would highlight that I would read it (no one likes putting in the effort for something that is never used, in this case read).

    I'm actually working on one that will mirror the SQL Saturday session I give on partitioning.

    I await with baited breath!!! :w00t:

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

Viewing 15 posts - 16 through 29 (of 29 total)

You must be logged in to reply to this topic. Login to reply