Tables on seperate filegroups.

  • Hi All

    We have a large Datawarehouse and the size is 50TB.. The tables are placed in filegroups based on the schema like fact, dimensions, raw data each sit on seperate filegroups. I am thinking will it make sense to seperate the large facts which are having billions of rows so that they reside on filegroups on their own.. I would like to know if this is sensible and the pros and cons..

    Regards

    Senthil

  • It will make sense if the files belonging to these FGs are placed on separate physical drives and you have a lot of queries between fact tables. In this case you will alleviate large I/O readings by employing parallelism.

    However, if most of your queries select data within same fact tables, the better solution would be to partition them, again spreading partition files across different physical drives.

  • I am in favor of separating the large fact tables into their own filegroups - based on better data and analysis.

    I don't necessarily think that the fact tables should each have their own filegroup. But if you can split out the fact tables into some configuration of new filegroups - it would prove beneficial.

    Where is the benefit of doing that?

    With data this size, it provides a more robust opportunity for recovery. Sure you can see some performance gain (whether on same disks or not). But the big benefit is being able to take a filegroup backup and restore a filegroup backup (when/if necessary).

    Maybe somebody wants/needs a 2TB group of tables restored to a dev or qa environment. With FG backups, you can achieve that without having to concoct a process to move the tables via bcp or ssis, or worse via backup/restore of the entire 50TB database.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • SQLRNNR (7/30/2014)


    I am in favor of separating the large fact tables into their own filegroups - based on better data and analysis.

    I don't necessarily think that the fact tables should each have their own filegroup. But if you can split out the fact tables into some configuration of new filegroups - it would prove beneficial.

    Where is the benefit of doing that?

    With data this size, it provides a more robust opportunity for recovery. Sure you can see some performance gain (whether on same disks or not). But the big benefit is being able to take a filegroup backup and restore a filegroup backup (when/if necessary).

    Maybe somebody wants/needs a 2TB group of tables restored to a dev or qa environment. With FG backups, you can achieve that without having to concoct a process to move the tables via bcp or ssis, or worse via backup/restore of the entire 50TB database.

    +1000

    For anyone interested in such a process, it's called a "Piecemeal" restore in Books Online and such restores can frequently be done online with little or no interruption in service.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jason.

    Your suggestion is one of the main benefit that we want to achieve. The ability to leave tables that we do not do development in the next few months and that can be excluded in this way on the backups to the development environment. But the downside is that the DBA informed me that we are using a SIMPLE recovery database and it is not possible to have filegroup based backup when this setting is made.

  • senthil kumar d (8/8/2014)


    But the downside is that the DBA informed me that we are using a SIMPLE recovery database and it is not possible to have filegroup based backup when this setting is made.

    He's correct. Well, it's possible to take the backups afaik, but they're useless for restoring the DB.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • senthil kumar d (7/30/2014)


    Hi All

    We have a large Datawarehouse and the size is 50TB.. The tables are placed in filegroups based on the schema like fact, dimensions, raw data each sit on seperate filegroups. I am thinking will it make sense to seperate the large facts which are having billions of rows so that they reside on filegroups on their own.. I would like to know if this is sensible and the pros and cons..

    Regards

    Senthil

    Adding to what has been said and giving the case of how common SANs are now...

    if all your mdf files reside on the same LUN (regardless of how many drive letters you may see in the Os), you won't get any I/O improvement anyway, even if you use FGs. This is because in most cases, the whole LUN that contains all the logical drives is already a single tier with same RAID.

    By the way, why are you using SIMPLE recovery model? Is your company aware of possible data loss in case of a system crash/restore if you use SIMPLE?

  • Sorry for the late response. I had asked for a discussion with my DBA on why they are doing SIMPLE recovery instead of Full recovery..

    But we have split our filegroups so that they are spread across 16 different drives.. So it should help distribute the IOs..

  • senthil kumar d (8/13/2014)


    Sorry for the late response. I had asked for a discussion with my DBA on why they are doing SIMPLE recovery instead of Full recovery..

    But we have split our filegroups so that they are spread across 16 different drives.. So it should help distribute the IOs..

    If it's on a SAN, you have to ask the question... "Are they really 16 separate physical drives"?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff

    Yes. There are 16 seperate physical drives on which the files are distributed. And the reason for simple recovery model is that the transaction logs are really huge and they do not want to have a back up of all these huge logs and hence did not want to have Full recovery model..

  • senthil kumar d (8/18/2014)


    Yes. There are 16 seperate physical drives on which the files are distributed.

    I do hope they're not 16 individual drives each with one filegroup on. If they are, you have no redundancy and probably lower throughput than if they were shared on a SAN due to having only one spindle.

    And the reason for simple recovery model is that the transaction logs are really huge and they do not want to have a back up of all these huge logs and hence did not want to have Full recovery model..

    That's a rather silly reason. Does the DB need point in time recovery? If a drive fails at 15h34, is restoring to the last full backup accessible

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (8/18/2014)


    senthil kumar d (8/18/2014)


    And the reason for simple recovery model is that the transaction logs are really huge and they do not want to have a back up of all these huge logs and hence did not want to have Full recovery model..

    That's a rather silly reason. Does the DB need point in time recovery? If a drive fails at 15h34, is restoring to the last full backup accessible

    "Silly" would be a really polite way of putting it.

    I do agree that it appears to be decision that has been made. But what are the recovery rules from the business? Generally they have no idea about the size of the logs or database. They consume and use the data and have the requirements for the recovery of that data to maintain the continuity.

    That decision with that reason does not seem sane to me.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply