Proper way to move data in large filegroup using DBCC SHRINKFILE emptyfile option

  • Hi,

    I have a filegroup with one large data file that is 300GB, and I want to create ten 30GB files and spread the data across them for better performance. Here are the steps I took in my test environment (a rough sketch of the commands is below the list):

    1) Added ten new 30GB files to the filegroup.

    2) Used DBCC SHRINKFILE with the EMPTYFILE option to move the data evenly into the 10 new files (took approx. 15 hrs).

    3) Stopped/restarted the SQL Server services so that I was able to shrink the original 300GB data file (now empty) down to 30GB to match the other files and reclaim the space.

    4) The process generated 360GB of transaction log. Since this is my test environment I just shrank the log back down, but in production we are using FULL recovery mode with 15-minute log backups, so I plan to let a log backup run and then shrink the log back down to its normal 15GB size.

    5) The process fragmented the indexes pretty badly, so I rebuilt all my indexes, which took about 8 hours.
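
    For reference, here is a minimal sketch of the commands behind steps 1-5, assuming a hypothetical database MyDB, filegroup DataFG, original logical file name Data01, new logical file names Data02 through Data11, a log file logical name MyDB_log, and illustrative paths, sizes, and table names; adjust all of these to your environment:

    -- 1) Add the new files to the existing filegroup (repeat for Data03 through Data11)
    ALTER DATABASE MyDB
    ADD FILE (NAME = Data02, FILENAME = 'E:\SQLData\Data02.ndf', SIZE = 30GB)
    TO FILEGROUP DataFG;

    -- 2) Move all data out of the original file into the other files in the filegroup
    DBCC SHRINKFILE (Data01, EMPTYFILE);

    -- 3) Shrink the now-empty original file down to 30GB (target size is given in MB)
    DBCC SHRINKFILE (Data01, 30720);

    -- 4) In FULL recovery, take a log backup before shrinking the log back to its normal size
    BACKUP LOG MyDB TO DISK = 'E:\SQLBackups\MyDB_log.trn';
    DBCC SHRINKFILE (MyDB_log, 15360);

    -- 5) Rebuild the fragmented indexes afterwards, per table (dbo.SomeLargeTable is illustrative)
    ALTER INDEX ALL ON dbo.SomeLargeTable REBUILD;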

    I have never done this before and have been doing lots of reading online, but I want to make sure that I'm not missing something obvious in my process. Here is what DBCC SHOWFILESTATS shows now:

    Fileid  TotalExtents  UsedExtents
    5       480000        479167
    20      491520        405854
    21      491520        406778
    22      491520        407820
    23      491520        408629
    24      491520        408280
    25      491520        408915
    26      491520        407728
    27      491520        407528
    28      491520        407909
    29      491520        392160

    This was the FILESTATS output before (Fileid, FileGroup, TotalExtents, UsedExtents):

    5       4       4705128       4253047

    Any feedback/comments would be greatly appreciated.

    Thanks,

    Isabelle

    Thanks!
    Bea Isabelle

  • Isabelle2378 (4/30/2013)


    Hi,

    I have a filegroup with one large data file that is 300GB, and I want to create ten 30GB files and spread the data across them for better performance.

    Create a new filegroup with the new files and move the objects to the new filegroup. Just out of interest, why 10 files?

    Unless you have the underlying disk structures to support this, having multiple files will provide no benefit. In fact, the file-switching and management overhead of the round-robin striping could even hurt performance.

    Once you run DBCC SHRINKFILE with the EMPTYFILE parameter against a database file, SQL Server no longer writes to that file; it assumes you're going to remove it. You should remove the file and add a new one if you need to (see the sketch below). Since you already have 10 new files, do you really need to??
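
    A minimal sketch of that, assuming the same hypothetical database MyDB, the emptied logical file Data01, and an illustrative new filegroup NewDataFG; all names and paths are placeholders:

    -- Remove the file that was emptied with EMPTYFILE
    ALTER DATABASE MyDB REMOVE FILE Data01;

    -- Alternative suggested above: create a new filegroup, add files to it,
    -- and then move objects onto it
    ALTER DATABASE MyDB ADD FILEGROUP NewDataFG;
    ALTER DATABASE MyDB
    ADD FILE (NAME = NewData01, FILENAME = 'F:\SQLData\NewData01.ndf', SIZE = 30GB)
    TO FILEGROUP NewDataFG;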

    Isabelle2378 (4/30/2013)


    3) Stopped/restarted the SQL Server services so that I was able to shrink the original 300GB data file

    Explain please??

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Isabelle2378 (4/30/2013)


    Hi,

    I have a filegroup with one large data file that is 300GB, and I want to create ten 30GB files and spread the data across them for better performance.

    Since you're apparently splitting for performance, have you checked and confirmed that the database really is bottlenecked on IO? Are all 10 of those files on separate IO paths (separate physical drives at the very least)?
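
    One rough way to sanity-check for an IO bottleneck is to look at the cumulative IO-related waits since the last restart (or since wait stats were last cleared); this is only an indicator, not proof:

    -- High PAGEIOLATCH_* waits relative to other waits suggest reads are waiting on the IO subsystem;
    -- WRITELOG waits point at the transaction log drive instead
    SELECT wait_type,
           waiting_tasks_count,
           wait_time_ms,
           wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
    FROM sys.dm_os_wait_stats
    WHERE wait_type LIKE 'PAGEIOLATCH%' OR wait_type = 'WRITELOG'
    ORDER BY wait_time_ms DESC;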

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi,

    Yes. Based on a consultant who came in to help us troubleshoot performance issues, the two filegroups that we have, each with one 300GB file, were determined to be a bottleneck, and he recommended that we spread the data across multiple files instead of one.

    When I first tried to shrink the file after it was empty, I was getting an error, and after looking through SQL forums I saw an article that stated the SQL Server service needed to be restarted before you could shrink the empty file. I did that and it worked.

    I only chose ten 30GB data files because I just wanted to spread the data evenly and it seemed like an easy number.

    Thanks,

    Isabelle

    Thanks!
    Bea Isabelle

  • Isabelle2378 (5/1/2013)


    Yes. Based on a consultant who came in to help us troubleshoot performance issues, the two filegroups that we have, each with one 300GB file, were determined to be a bottleneck, and he recommended that we spread the data across multiple files instead of one.

    How did said consultant determine that the filegroup was a bottleneck, and are those files all on separate IO paths (separate drives at least)?

    When I first tried to shrink the file after it was empty, I was getting an error, and after looking through SQL forums I saw an article that stated the SQL Server service needed to be restarted before you could shrink the empty file.

    There's no requirement to restart SQL Server to shrink a file. It could be that something was using data in that file and the shrink was blocked; can't tell without knowing the error.
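
    If it happens again, one way to see what a running shrink is doing, rather than guessing, is to watch its session from a second connection while DBCC SHRINKFILE runs; a minimal sketch:

    -- DBCC SHRINKFILE shows up with a Dbcc* command and reports percent_complete;
    -- wait_type and blocking_session_id show what, if anything, it is waiting on
    SELECT r.session_id,
           r.command,
           r.status,
           r.wait_type,
           r.blocking_session_id,
           r.percent_complete
    FROM sys.dm_exec_requests AS r
    WHERE r.command LIKE 'Dbcc%';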

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi,

    Well, this is the reason for my testing this out...to see if it does in fact improve our performance. I will see if I can find the specific error and the article that I read.

    I just wanted to make sure the steps I am taking are on point and that I'm not missing something obvious...

    Thanks!
    Bea Isabelle

  • I will ask again...

    Are the files on separate drives?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi,

    Sorry...forgot to include that answer. Yes, the files will be spread across separate drives, and I will be working with our sysadmin to monitor the I/O of the files before and after I split it up.

    Thanks!
    Bea Isabelle

  • Isabelle2378 (5/1/2013)


    Hi,

    Well, this is the reason for my testing this out...to see if it does in fact improve our performance. I will see if I can find the specific error and the article that I read.

    I just wanted to make sure the steps I am taking are on point and that I'm not missing something obvious...

    You may need this:

    SELECT
        --virtual file latency
        ReadLatency = CASE WHEN num_of_reads = 0
            THEN 0 ELSE (io_stall_read_ms / num_of_reads) END,
        WriteLatency = CASE WHEN num_of_writes = 0
            THEN 0 ELSE (io_stall_write_ms / num_of_writes) END,
        Latency = CASE WHEN (num_of_reads = 0 AND num_of_writes = 0)
            THEN 0 ELSE (io_stall / (num_of_reads + num_of_writes)) END,
        --avg bytes per IOP
        AvgBPerRead = CASE WHEN num_of_reads = 0
            THEN 0 ELSE (num_of_bytes_read / num_of_reads) END,
        AvgBPerWrite = CASE WHEN num_of_writes = 0 -- guard on num_of_writes to avoid divide by zero
            THEN 0 ELSE (num_of_bytes_written / num_of_writes) END,
        AvgBPerTransfer = CASE WHEN (num_of_reads = 0 AND num_of_writes = 0)
            THEN 0 ELSE ((num_of_bytes_read + num_of_bytes_written) /
                         (num_of_reads + num_of_writes)) END,
        LEFT(mf.physical_name, 2) AS Drive,
        DB_NAME(vfs.database_id) AS DB,
        --vfs.*,
        mf.physical_name
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN sys.master_files AS mf
        ON vfs.database_id = mf.database_id
        AND vfs.file_id = mf.file_id
    --WHERE vfs.file_id = 2 -- log files
    WHERE (num_of_reads + num_of_writes) > 0 -- skip files with no IO yet to avoid divide by zero
        AND (io_stall / (num_of_reads + num_of_writes)) > 20
    ORDER BY Latency DESC
    GO

    That query reads from a DMV that tracks cumulative per-file IO latency. It is pretty safe to run in a production environment. Run it before the proposed change; after the change, run it again and compare. You can also use PerfMon to validate whether you have an actual IO bottleneck or not.
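
    Because those DMV counters are cumulative since the instance last started, a before/after comparison is easier if you snapshot them into tables and compare the snapshots. A minimal sketch, assuming illustrative table names dbo.FileStatsBefore and dbo.FileStatsAfter:

    -- Before the change: capture a snapshot (repeat into dbo.FileStatsAfter once the change is done)
    SELECT DB_NAME(vfs.database_id) AS DB,
           mf.physical_name,
           vfs.num_of_reads, vfs.io_stall_read_ms,
           vfs.num_of_writes, vfs.io_stall_write_ms
    INTO dbo.FileStatsBefore
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN sys.master_files AS mf
        ON vfs.database_id = mf.database_id
        AND vfs.file_id = mf.file_id;

    -- After the change: compare average read latency per file across the two snapshots
    SELECT a.physical_name,
           (b.io_stall_read_ms * 1.0) / NULLIF(b.num_of_reads, 0) AS avg_read_ms_before,
           (a.io_stall_read_ms * 1.0) / NULLIF(a.num_of_reads, 0) AS avg_read_ms_after
    FROM dbo.FileStatsAfter AS a
    LEFT JOIN dbo.FileStatsBefore AS b
        ON a.physical_name = b.physical_name;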

    Now, in my case, I prefer to move a specific table (one that is being used a lot) onto its own filegroup, instead of creating a bunch of files and moving them into an FG. That FG would reside on a new or different LUN, RAID 10 if possible. This is the only way, as far as I know, that you can actually segregate IO usage for a particular table.

    If you create a bunch of files, put them in a new FG, and place each file on a different LUN or drive, I'm not sure MS-SQL will actually take advantage of that, as the tables' data will be mixed across the files anyway. A sketch of the table-move approach is below.
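
    A minimal sketch of moving one table onto its own filegroup, assuming a hypothetical database MyDB, a new filegroup HotTableFG on a separate LUN, and a table dbo.BigBusyTable with a clustered index on an Id column; all names, paths, and sizes are placeholders:

    -- Create the dedicated filegroup and give it a file on the separate LUN
    ALTER DATABASE MyDB ADD FILEGROUP HotTableFG;
    ALTER DATABASE MyDB
    ADD FILE (NAME = HotTable01, FILENAME = 'G:\SQLData\HotTable01.ndf', SIZE = 30GB)
    TO FILEGROUP HotTableFG;

    -- Rebuilding the clustered index with DROP_EXISTING moves the table's data to the new filegroup;
    -- nonclustered indexes stay where they are unless they are rebuilt the same way
    CREATE CLUSTERED INDEX CIX_BigBusyTable
    ON dbo.BigBusyTable (Id)
    WITH (DROP_EXISTING = ON)
    ON HotTableFG;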

  • Hi,

    Thank you for that information. I will look into specific tables that have a high I/O hit on the database and consider that option as well.

    Thanks again. 🙂

    Isabelle

    Thanks!
    Bea Isabelle
