Redistributing Data Across Files

  • Comments posted to this topic are about the item Redistributing Data Across Files

  • It would be nice to mention in such articles that the approach used here only works for tables with clustered indexes, once you're down to heaps with indexes you would have to rebuild both indexes and table something like this:

     

    ALTER INDEX IDX_Contacts on dbo.Contacts REBUILD
    ALTER TABLE dbo.Contacts REBUILD

    you can verify both table and Index Filegroups afterwards for correct movement / redistribution.

  • Nice description of a useful but not as simple as it appears topic. Thanks.

     

    Minor thing, the image with the caption "There has been a little movement, as my new files are 90% full (roughly) and the old one is 43% full. However, that's not great" is the same as the image above and doesn't show 43% full.

  • theoretical you could have added one more file to the filegroup and used

    DBCC SHRINKFILE (N'<original file>' , EMPTYFILE)

    to move all data into the new files (will be evenly distributed in this case.

    Drawback: it can take VERY long when it is a big file / filegroup and even in I read somewhere, that I could cancel / kill the DBCC-session, it seems to rollback the whole move sometimes (at least on SQL 2016).

    Benefit: usually your original file is e.g. 400 GB (current size) but you do not want to have four 400 GB files that are filled only to 25 % (I guess everyone favors four 100 GB files). When you just rebuild Indexes (or even create copies of the tables and move data into them, e.g. because you want to add partitioning), you will end with 400 GB + 100 GB + 100 GB + 100 GB, where about 4/7 of the data remains in the first file and 1/7 in each of the three new files (because it redistributes it depending on the free space in a file).

    If you would add four 400 GB files and rebuild indexes etc., the data would be distributed equally and you can (usually) shrink the three new files to 100 GB using the TRUNCATEONLY option for DBCC SHRINKFILE, but not the original file, since there is a high chance, that even some pages in the last 100 GB of the file are or were in use.

    But when you add four complete new files, you could just delete the clumsy old one after the EMPTYFILE...

     

    God is real, unless declared integer.

  • Re: Adding a new filegroup, that's a fairly big decision for most people. If you have 400GB, that may make more sense, but it really depends on what you are aiming for. I didn't mention that because this is a focused article on rebalancing a filegroup, not managing a large database set.

    Good general recommendations are to have separate filegroups from Primary and be sure you use those to balance out your workload where appropriate. These days with large SANs and LUNs, you might not do as much of that as in the past.

  • Updated the verbiage for the initial rebalance and mentioned that this is for clustered indexes.

  • Hi Steve, appreciate this article.  I realize it's getting close to a year old, so not sure if you will see this, but was curious about something...  It seems to me that this activity (adding data files and spreading the data across the files) is a legit activity needed by DBA's at times in certain circumstances.  For me, I'm currently prepping a 2008 R2 server to migrate to 2017 and I need to take some databases that were single data file databases and split them into multiple files to best align with the new server.  I need to do this quickly and don't really have time to let the natural balance work out as I need to shrink the original file to match the new ones (for example taking one 10GB data file and splitting to four 2.5 GB files... add three 2.5 GB files, balance the data, then shrink the original file to 2.5 GB.)  As someone who's close to the SQLServer product, do you see Microsoft adding some type of procedure to do the balancing rather than DBA's having to create dummy file groups and move data & indexes around just to get it to balance?  Seems like there would be a legit need for something like that.

    Anyway, really appreciate this article and the tips to accomplish this in the current product.

    Thanks!

    John

  • John,

    I don't. MS has been very resistant to syntactic sugar, basically anything that isn't hard to program, they don't want to build and support. They think this is why you pay staff for some things. Even simple items, like a number/tally table, have been items they don't want to invest in because it's easy to build. It is, but it's a PIA. Having a known item here would be good.

    For something like balancing out data automatically, I don't think the would. Ideally, they'd let me do something like:

    RESTORE DB...

    WITH MOVE ('mybigflile' to ('newfile1' as 'c:\sql\newfile1.mdf', 'newfile2' as 'c:\sql\newfile2.ndf, 'newfile2' as 'c:\sql\newfile2.ndf')

    That way I would rebalance on restore, but I don't think the will. The direction seems to be more abstraction of hardware from the SQL OS/Platform.

     

  • Steve Jones - SSC Editor wrote:

    John,

    I don't. MS has been very resistant to syntactic sugar, basically anything that isn't hard to program, they don't want to build and support. They think this is why you pay staff for some things. Even simple items, like a number/tally table, have been items they don't want to invest in because it's easy to build. It is, but it's a PIA. Having a known item here would be good.

    For something like balancing out data automatically, I don't think the would. Ideally, they'd let me do something like:

    RESTORE DB...

    WITH MOVE ('mybigflile' to ('newfile1' as 'c:\sql\newfile1.mdf', 'newfile2' as 'c:\sql\newfile2.ndf, 'newfile2' as 'c:\sql\newfile2.ndf')

    That way I would rebalance on restore, but I don't think the will. The direction seems to be more abstraction of hardware from the SQL OS/Platform.

    The problem with this approach is that would require the restore operation to work at the index level instead of restoring each page.  I am not even sure how that could be accomplished without having to restore the full database in the first place.

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • DHL-John.Parker wrote:

    Hi Steve, appreciate this article.  I realize it's getting close to a year old, so not sure if you will see this, but was curious about something...  It seems to me that this activity (adding data files and spreading the data across the files) is a legit activity needed by DBA's at times in certain circumstances.  For me, I'm currently prepping a 2008 R2 server to migrate to 2017 and I need to take some databases that were single data file databases and split them into multiple files to best align with the new server.  I need to do this quickly and don't really have time to let the natural balance work out as I need to shrink the original file to match the new ones (for example taking one 10GB data file and splitting to four 2.5 GB files... add three 2.5 GB files, balance the data, then shrink the original file to 2.5 GB.)  As someone who's close to the SQLServer product, do you see Microsoft adding some type of procedure to do the balancing rather than DBA's having to create dummy file groups and move data & indexes around just to get it to balance?  Seems like there would be a legit need for something like that.

    Anyway, really appreciate this article and the tips to accomplish this in the current product.

    Thanks!

    John

    This is where adding the expected number of files to the existing filegroup - and then performing an EMPTYFILE comes into play.  The end result will be a database spread across the expected number of files that can then be backed up and restored to the new system.

    Instead of adding 3 2.5GB files - balancing - and then shrinking you add 4 2.5GB files and empty the first one.  Once empty - you remove that file and you are left with a balanced set of files.

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Steve, yes!  That would be pretty sweet.  Thanks again for the input!  Appreciate this site!

    @Jeffrey, I've tried this approach before, but (in 2008R2 at least) I've not been able to empty file #1 without pulling some other tricks to allow me to empty and remove.  I don't do this often, so forgive me for not remembering the exact sequence I had to do, but I distinctly remember trying this and not being able to remove the original file because it had pages that couldn't be moved to the other files.  This is why I have approached this with the current strategy.

  • Steve Jones - SSC Editor wrote:

    These days with large SANs and LUNs, you might not do as much of that as in the past.

    That's what I was wondering about. When SQL Server is running on a VM in host machine with an optical connection to a solid state SAN, is there any need for multiple files? Isn't that for "classic" computers with multiple spindles of spinning media? Are there cases with modern infrastructure where one needs to actually worry about files and filegroups?

     

    Sincerely,
    Daniel

  • Hello,

    Let me ask the same question than Daniel

    What is the real advantage to dispatch the same object table or index between multiple files because, today, are located on disks array?

    In the past, it could be interesting if you locate your files on distinct local server's disks because you were able to increase performance

    On the other hand, I see some interest of doing this and use different filegroups which allows to dispatch your different objects in these filegroups and be able to manage growing of these objects; it could also be interesting if your dispatch partitioned tables between distinct files

    I'm interested by having your explanation

    Best regards

    Michel

  • In the old days before we were using SANs there were several advantages. Performance was improved by spreading the data over many spindles and even over multiple interfaces. Also if you needed to add space to a database you pretty much had to add an additional physical drive to a file group. That all changed.

    Today's SANs have eliminated both of those advantages. They automatically spread data across multiple spindles to improve performance, improve it even further with large intelligent caches, and if you wan to add space to a volume you just just do it dynamically with no down time.

    The only place I've used this strategy in recent years is with TEMPDB if I observe contention on the PFS, GAM or SGAM pages. The only way I know of to address that apart from application changes is by increasing the number of files in TEMPDB. And as mentioned in the article - always keep all files the same size.

  • This all depends on your storage. You may or may not need this. You likely will in cloud situations.

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply