Rebalancing data from a single-file to a multi-file setup

  • Hi, I'm inheriting a series of databases that have grown beyond 100GB yet only have a single datafile each. I've been tasked with breaking them up into more manageable chunks and getting more performant access. To add to the mystery, this is SharePoint on a server with a couple dozen content DBs.

    Let's start with WSS_Content_A Database. 100GB in size, 1 datafile.

    If I simply add a second datafile and rebuild indexes, almost nothing happens. I believe 8GB moved over to the new datafile.

    If I add two new data files with appropriate sizes and use DBCC SHRINKFILE('WSS_Content_A', EMPTYFILE), I wind up with two evenly matched data files, with proportional fill working very cleanly across them. BUT... fragmentation is nearly 100% afterwards. This process takes about 3 hours, and then I spent a good chunk of additional time rebuilding indexes. Further, I still have that pesky initial datafile that isn't going anywhere. Not ideal.
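    For reference, this is roughly the T-SQL behind that attempt - the file names, paths, and sizes below are placeholders, not the real ones:

        USE WSS_Content_A;

        -- Add two new data files sized to take the existing data
        ALTER DATABASE WSS_Content_A
        ADD FILE (NAME = N'WSS_Content_A_2',
                  FILENAME = N'E:\Data\WSS_Content_A_2.ndf',
                  SIZE = 50GB);

        ALTER DATABASE WSS_Content_A
        ADD FILE (NAME = N'WSS_Content_A_3',
                  FILENAME = N'E:\Data\WSS_Content_A_3.ndf',
                  SIZE = 50GB);

        -- Push everything out of the original file; proportional fill spreads it across the new files
        DBCC SHRINKFILE (N'WSS_Content_A', EMPTYFILE);

        -- Fragmentation is heavy afterwards, so rebuild indexes (one table shown purely as an example)
        ALTER INDEX ALL ON dbo.AllDocStreams REBUILD;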

    If I add one new (temporary) datafile the size of the original, use the above command to empty all contents into it, shrink the original datafile to half its original size, and then create a new (third) datafile the same size as the first, I can run EMPTYFILE on the temporary/intermediate datafile. Proportional fill does a bang-up job of evenly distributing the writes across the two files while, interestingly, introducing no fragmentation during the job. I simply ALTER DATABASE REMOVE FILE the temp file when complete, and I have two datafiles that seemingly accomplish all of my goals.
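    A rough sketch of that sequence (file names, paths, and sizes are placeholders; adjust to suit):

        USE WSS_Content_A;

        -- 1. Add a temporary file large enough to hold everything in the original file
        ALTER DATABASE WSS_Content_A
        ADD FILE (NAME = N'WSS_Content_A_temp',
                  FILENAME = N'E:\Data\WSS_Content_A_temp.ndf',
                  SIZE = 100GB);

        -- 2. Empty the original file into the temporary one
        DBCC SHRINKFILE (N'WSS_Content_A', EMPTYFILE);

        -- 3. Shrink the now nearly empty original file to half its original size
        DBCC SHRINKFILE (N'WSS_Content_A', 51200);   -- target size in MB

        -- 4. Add the second permanent file, sized like the first
        ALTER DATABASE WSS_Content_A
        ADD FILE (NAME = N'WSS_Content_A_2',
                  FILENAME = N'E:\Data\WSS_Content_A_2.ndf',
                  SIZE = 50GB);

        -- 5. Empty the temporary file; proportional fill spreads the data across the two permanent files
        DBCC SHRINKFILE (N'WSS_Content_A_temp', EMPTYFILE);

        -- 6. Drop the now-empty temporary file
        ALTER DATABASE WSS_Content_A REMOVE FILE WSS_Content_A_temp;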

    Veterans, do you have any comments/wisecracks/suggestions? If I use two temporary datafiles instead of one, it improves performance to the point that this exercise takes about 6 hours @ 100GB. This has all been done on a test server with 'old' E5420 CPUs and 16GB RAM - I'm hoping that Nehalems and Westmeres kick this up a notch.

  • 100GB isn't really that large... but that being said, can you let us know what that space consists of? Are there many non-clustered indexes? If there are, I'd recommend creating a new filegroup, adding a new file, and dropping and re-creating all the NCIs on the new file (see the sketch below) - this would at least place the space used by the NCIs on a different data file.

    If you take further steps (as you already attempted) to split the actual data across multiple files, you will get fragmentation, but some simple index rebuilds should clear that right up for you.
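    If the NCIs were worth moving, a minimal sketch would look like this - the filegroup, file, table, column, and index names are only examples:

        -- A filegroup and file to hold the non-clustered indexes
        ALTER DATABASE WSS_Content_A ADD FILEGROUP NCI_FG;

        ALTER DATABASE WSS_Content_A
        ADD FILE (NAME = N'WSS_Content_A_NCI',
                  FILENAME = N'E:\Data\WSS_Content_A_NCI.ndf',
                  SIZE = 10GB)
        TO FILEGROUP NCI_FG;

        -- Re-create an existing non-clustered index on the new filegroup;
        -- DROP_EXISTING avoids a separate drop and re-create
        CREATE NONCLUSTERED INDEX IX_SomeTable_SomeColumn
        ON dbo.SomeTable (SomeColumn)
        WITH (DROP_EXISTING = ON)
        ON NCI_FG;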


  • It's SharePoint - so all of the schema is as provided by mother Microsoft. I have not tried using another filegroup, perhaps initially out of support concerns, but also because the process I've identified above works so well. For the sake of it, I'll give it a try. Perhaps I've overcomplicated this.

  • If there aren't many NCIs contributing to the overall size of the datafile, I wouldn't bother with it.


  • Care to expand on your suggestion? I totally recognize the need for, and value of, isolating ANY of the data types/models to improve IO performance. However, even if a file held nothing but NCIs, I'd still want multiple datafiles, particularly once some arbitrarily large size was reached.

    This process is a test/proof of concept - we have content DBs that are close to a terabyte. Given that it's SharePoint, the bulk of the data is varbinary stored in the AllDocStreams clustered index, which has only 2 NCIs associated.
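    To see how much of the file those two NCIs actually account for, a query along these lines against the standard catalog views does the trick:

        -- Space used per index on the big SharePoint table, largest first
        SELECT  i.name              AS index_name,
                i.type_desc,
                SUM(ps.used_page_count) * 8 / 1024 AS used_mb
        FROM    sys.dm_db_partition_stats AS ps
        JOIN    sys.indexes AS i
                ON  i.object_id = ps.object_id
                AND i.index_id  = ps.index_id
        WHERE   ps.object_id = OBJECT_ID(N'dbo.AllDocStreams')
        GROUP BY i.name, i.type_desc
        ORDER BY used_mb DESC;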

  • Sorry, but it seems I misunderstood your initial request. I read (mis-read) it as you dealing solely with SharePoint data and thought you just needed advice on using multiple data files... :unsure:


  • I don't necessarily think so - I think you were alluding to the NCIs as an opportunity for improvement. In my case, however, the NCIs are very small, no more than a few GB. I'm really looking for sage advice on methods to redistribute data. You did produce one I've read about - into a separate filegroup, then back out.

    Does anyone else have any advice on methods to rearrange storage? Agree/disagree with the methods I've taken?

  • Does anyone else have any advice on methods to rearrange storage?

    So you only have PRIMARY, right?

    Add a new filegroup. Add two or more files to the new filegroup.

    Start moving your data out of PRIMARY and into the new filegroup using CREATE CLUSTERED INDEX with DROP_EXISTING.

    When everything you want has been moved, shrink PRIMARY.
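    Something like this, as a rough sketch - the filegroup, file, table, and index names are placeholders, not SharePoint's real ones:

        -- New filegroup with two files for the relocated data
        ALTER DATABASE WSS_Content_A ADD FILEGROUP DATA_FG;

        ALTER DATABASE WSS_Content_A
        ADD FILE (NAME = N'WSS_Content_A_Data1',
                  FILENAME = N'E:\Data\WSS_Content_A_Data1.ndf',
                  SIZE = 50GB)
        TO FILEGROUP DATA_FG;

        ALTER DATABASE WSS_Content_A
        ADD FILE (NAME = N'WSS_Content_A_Data2',
                  FILENAME = N'E:\Data\WSS_Content_A_Data2.ndf',
                  SIZE = 50GB)
        TO FILEGROUP DATA_FG;

        -- Rebuilding a clustered index onto the new filegroup moves the whole table there.
        -- Use the existing clustered index's name; if it backs a primary key,
        -- keep UNIQUE and the same key columns.
        CREATE CLUSTERED INDEX IX_SomeTable_CI
        ON dbo.SomeTable (Id)
        WITH (DROP_EXISTING = ON)
        ON DATA_FG;

        -- Once everything that should move has moved, reclaim the space in PRIMARY
        USE WSS_Content_A;
        DBCC SHRINKFILE (N'WSS_Content_A', 1024);   -- target size in MB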

  • The easiest method I have found is this:

    I assume I will be getting rid of the original file (if possible).

    I create the new files (you do not need a new filegroup - I prefer one, but it's not necessary) and make sure to size them appropriately.

    I then shrink the existing file, taking the option to move all data to other files in the same filegroup.

    I do not recommend doing this during busy times, obviously. I'll also remind you that the initial file cannot be removed, as some things the server uses for housekeeping will remain in it. But usually that is less than 10 MB.

    The process can be safely killed if necessary and restarted. The option in question marks the file so that SQL Server will not put anything else in it. Once the shrink is done, you should end up with balanced files, and the emptied file may be removed, if it was not the initial one.
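    That "move all data to other files in the same group" option is the GUI equivalent of the DBCC SHRINKFILE ... EMPTYFILE command used earlier in the thread. Afterwards, something like this (the logical file name being removed is just an example) shows what is left in each file and drops one that is not the initial file:

        USE WSS_Content_A;

        -- How full each data file is (size and space used come back in 8 KB pages)
        SELECT  name,
                size * 8 / 1024                            AS size_mb,
                FILEPROPERTY(name, 'SpaceUsed') * 8 / 1024 AS used_mb
        FROM    sys.database_files
        WHERE   type_desc = 'ROWS';

        -- An emptied secondary file can be dropped; the initial file cannot,
        -- since the housekeeping pages stay behind
        ALTER DATABASE WSS_Content_A REMOVE FILE WSS_Content_A_temp;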

