Optimal Database Filesize in Filegroup

  • Hi there,

    I was wondering if there is experience one where about the optimal filesize for databases (2012) lies.

    I am creating a new table that will have quite an amount of rows.

    Approximately about 1,5 billion rows per month.

    The size of a rows will be 20 byte.

    As I have a date-column I will be partitioning by date.

    As this table will be filled for many years I also decided to create a table for each year and UNION ALL them with a view (constrain on date, so the optimizer can eliminate tables within the view)

    I have 24 LUNs in the system, so I will create 24 files for each filegroup to equally arrange the data.

    For each (year-)table I will have about 335 Gbyte so I would create 24*14GB-Files for each year.

    However I could go further on and create 24 files for each month, so that would be about 1GB per file having 12 filegroups for each year (arranging it with partition function/scheme)

    Version a: The partitioning will be over days but all in one filegroup. (Files 24x14GB)

    Version b: The partitioning will be over days, each month in one filegroup. (Files 12x24x1,2GB)

    I am not worried about the "work" to create all these files and groups but I was wondering, what is better for the sql server.

    More smaller files or fewer bigger files.

    Thx

    Mitch

  • I personally think smaller files are better than bigger files. If you have a lot of data that gets queried/modified frequently then it makes sense to spread it over many small files/disk luns to improve IO. Also it could help your SLA with piece meal restores in case of a DR. Large files probably make sense for less frequently queried data that doesnt change and probably belong to a read only filegroup. The data stored in these files also impact the files. If you are going to query the latest data 8 times out of 10 then small or big file size doesnt matter, the key is to store it together since most operations will need to happen within that partition.

    Probably a hybrid approach with more files and filegroups for the initial 6 months could work better , have you explored that option ?

    Jayanth Kurup[/url]

  • mitch.fh (1/2/2013)


    More smaller files or fewer bigger files.

    Unless all those files are on different physical drives, fewer bigger files is better. There's no gain from having lots of little files on the same RAID array, and there is a downside when you have *lots* of files in terms of time to open them, the overhead of calculating proportional fill and such.

    If you have 24 LUNs, then there's no gain from more than 24 files and possibly a loss.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I got two answers with two totally different opinions 😉

    Unless all those files are on different physical drives, fewer bigger files is better. There's no gain from having lots of little files on the same RAID array, and there is a downside when you have *lots* of files in terms of time to open them, the overhead of calculating proportional fill and such.

    The 24 LUNs is all I got 😉 Every filegroup would put it files in the same LUNs

    I think that like the cost of parallelism at one point exceed the benefit there is a "break-even point" for database files.

    But maybe there is the same effect on the file size and it would be contrary (too big file getting slower)

    Probably a hybrid approach with more files and filegroups for the initial 6 months could work better , have you explored that option ?

    Under the assumption that it has a benefit, that could work, but I would have to merge continually older partitions in to a big one.

    Load and query:

    I will write the data date by date in this table. Usually "today" I will not write data that is for older dates then "yesterday".

    The queries will be more frequent for the newer dates - approx. up to 6 month.

    Older dates will be queried in about 10-20% of the queries. However for those 10-20% I need the complete data.

  • mitch.fh (1/2/2013)


    I think that like the cost of parallelism at one point exceed the benefit there is a "break-even point" for database files.

    Don't make the mistake of thinking that multiple files are necessary for multiple threads to access. It's an old myth. Any thread in SQL can issue IOs to any file at any time.

    There is no advantage to having multiple files on the same LUN (well, performance advantage, there are other reasons one might spilt, for backup/restore options). There's also no performance degradation for large files. If you're taking about files on the same LUN, 1 file of 10 TB is the same as 10 files of 1 TB.

    As for multiple contrary opinions, welcome to the internet :-D. Guess you'll have to research and see who's opinion you would trust.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • As for multiple contrary opinions, welcome to the internet :-D. Guess you'll have to research and see who's opinion you would trust.

    over the last few years, you provided many good advises, so I have no reason to doubt you 😉

    there are other reasons one might spilt

    I am aware of the other reasons

    There's also no performance degradation for large files. If you're taking about files on the same LUN, 1 file of 10 TB is the same as 10 files of 1 TB.

    That is the information I was looking for :w00t:

    I will stick to the plan and separate the years in different tables and different filegroups. Within the years I have paritions on the date but they will all go into the same filegroup (24 Files/ 1 per LUN).

    The Paritioning within the table helps on queries and if I want to delete a whole date I can switch the partitions out of the table.

    The different tables leave me the posibility to move older years to other LUNs. On Recovery/Restoring I will have some advantages to as I will not have to restore the whole table with all years if I have a recovery reason.

    Constrains and Partitioning will do the magic on eliminating everything that is not relevant for the queries.

    Thanks

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply