Virtual Log Files (VLFs) split a physical database log file into smaller segments that SQL Server uses internally to manage the log. These are created automatically, but automatically doesn’t always mean perfect. Here’s a practical view of what you need to know and how you can set them up properly.
What VLFs Do
My initial view of how a transaction log worked was that each change was written to a log file and, once the record of that change was no longer required, it was deleted from the log file. I think this is a common way to view it, and it’s close to the truth. However, it’s not close enough for a conversation on VLFs.
What really happens is that changes are written to the first VLF, which is just a segment of the log file. When that VLF fills up, writing moves on to the next VLF in sequence, eventually wrapping back around to the start of the file.
Each change made to the database is assigned a Log Sequence Number (LSN), and SQL Server keeps track of the oldest LSN it still needs for any purpose. That purpose can be many things, but it’s typically the oldest of the following:
- Oldest LSN not yet captured by a log backup (full or bulk-logged recovery models)
- LSN at the start of the oldest active transaction
- LSN last replicated when using certain types of replication
Here’s my extremely complicated script to tell you what that reason is for your database:
SELECT name , log_reuse_wait_desc FROM sys.databases
If the last LSN in a VLF is older than what you need to keep, that VLF can be reused. This makes it very efficient because it becomes something like a set-based operation for SQL Server: the entire VLF is cleared and marked for reuse at once instead of line by line.
To help you picture how the VLFs work, someone at Microsoft drew you a picture in Transaction Log Physical Architecture. Then you can look at the results of DBCC LOGINFO, and it will make a lot more sense when you see a VLF on each line along with its status.
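If you want to line that picture up with your own server, the undocumented (but widely used) DBCC LOGINFO command returns one row per VLF. The database name below is a placeholder:

```sql
-- One row per VLF in the database's log.
-- Status 2 = active (still needed, can't be reused yet), Status 0 = reusable.
DBCC LOGINFO ('YourDatabaseName');
```

Watching the FSeqNo and Status columns change over time is a good way to see the circular reuse described above actually happening.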
Only One Log File, Dedicated Drive
Adding more than one log file to a single database isn’t going to help your performance. SQL Server writes to a single VLF at a time, and that single VLF is part of a single log file. If you have multiple log files then you’re writing to one while the other sits idle. Once you fill up the last VLF on the first file it starts writing to the second file while the first sits idle. If these are on separate drives, each drive has to be able to handle the I/O, but only one is in use at any given time, which leads to wasted resources and inconsistent performance.
As for how you write to the log, most of the work is writing to the tail end of it, so spinning disks do really well just keeping the head in one place and writing to the file. That is, unless you have other types of files on this disk, so the head writes a little to the log, jumps over to write a little to a data file, writes a bit more to the log, then jumps over to read a couple of indexes. This is a lot of where the recommendation to keep your data and log files on separate disks comes from, but there’s more to it than I’ll get into here.
However, we’re going to use RAID for redundancy, making the drives too large for just a log; then we’ll put it on a SAN with a write cache, and do so many other things to make it more complicated. If a server is extremely stressed and highly critical, look into dedicated spindles, SSDs, or other options. On general shared disk arrays, it’s nice if you can have an array dedicated to just log files, even if it’s log files for multiple servers.
VLF Size Matters
When you’re writing to VLFs there can be two issues. First, the VLFs are very small, so SQL Server jumps all over the place maintaining lots of tiny segments and figuring out where to write the next entry – this is common and a big performance hit. Second, the VLFs are too large, so SQL Server reserves a ton of space when only a small part at the tail end of the VLF is in use, then tries to clear it all at once – this is less common and less of an issue.
In addition to this, every VLF needs to be processed when restoring or recovering a database. Recovery is part of restarting the SQL services, so you should be hitting this at least once a month with your Windows Updates. Every VLF adds some overhead to this process, and a huge number of them can lengthen it considerably.
Details of this are on the Microsoft Customer Service and Support blog post How a log file structure can affect database recovery time, and it includes this eye-opening quote, “The first phase of recovering a database is called discovery where all the VLFs are scanned (in serial and single threaded fashion) before actual recovery starts.”
Are your log files pre-sized to take up the entire drive even though they only ever use 5% of that space? Is that hurting you on recovery?
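A quick way to answer that question across the whole instance is DBCC SQLPERF(LOGSPACE), which reports total log size and the percentage currently in use for every database:

```sql
-- One row per database: total log size (MB) and the percentage currently in use.
-- A huge log that never climbs above a few percent used is a resizing candidate.
DBCC SQLPERF (LOGSPACE);
```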
How Big Are My VLFs?
I have a script for that. This is set to filter only logs that I want to look at, but you can comment out the WHERE clause on the final statement to see it all. Also, it would be extremely rare for me to look at individual VLFs, so this only looks at the sums and averages for each file.
I used to manually comment out a line on this depending on the version of SQL Server until I read a post on VLFs by Andy Galbraith (b|t) Counting Your VLFs, or, Temp Tables Inside IF…ELSE Blocks. Thank you, Andy. The timing of your post was perfect to help me make this easier for everyone.
CREATE TABLE #VLF_temp (
    RecoveryUnitID int,
    FileID varchar(3),
    FileSize numeric(20,0),
    StartOffset bigint,
    FSeqNo bigint,
    Status char(1),
    Parity varchar(4),
    CreateLSN numeric(25,0)
)

CREATE TABLE #VLF_db_total_temp (
    DatabaseName sysname,
    LogiFilename sysname,
    PhysFileName sysname,
    AVG_VLF_Size_MB DECIMAL(12,2),
    vlf_count int,
    log_size_mb FLOAT,
    log_growth_mb FLOAT
)

IF (SELECT LEFT(CAST(SERVERPROPERTY('ProductVersion') AS varchar), 2)) IN ('8.', '9.', '10')
BEGIN
    ALTER TABLE #VLF_temp DROP COLUMN RecoveryUnitID
END

DECLARE db_cursor CURSOR READ_ONLY FOR
    SELECT name FROM sys.databases WHERE state = 0

DECLARE @name sysname, @stmt varchar(40)

OPEN db_cursor
FETCH NEXT FROM db_cursor INTO @name

WHILE (@@fetch_status <> -1)
BEGIN
    IF (@@fetch_status <> -2)
    BEGIN
        INSERT INTO #VLF_temp
        EXEC ('DBCC LOGINFO ([' + @name + ']) WITH NO_INFOMSGS')

        INSERT INTO #VLF_db_total_temp (DatabaseName, LogiFilename, PhysFileName, vlf_count, AVG_VLF_Size_MB, log_size_mb, log_growth_mb)
        SELECT DatabaseName = @name
            , LogiFilename = mf.name
            , PhysFileName = mf.physical_name
            , vlf_count = COUNT(*)
            , AVG_VLF_Size_MB = (AVG(FileSize)/1024)/1024
            , log_size_mb = (mf.size * 8)/1024
            , log_growth_mb = CASE mf.is_percent_growth
                WHEN 1 THEN (mf.size * 8)/1024 * mf.growth/100
                WHEN 0 THEN (mf.growth * 8)/1024
                END
        FROM #VLF_temp vt
            INNER JOIN sys.master_files mf ON mf.database_id = DB_ID(@name)
                AND mf.file_id = vt.FileID
        GROUP BY mf.name, mf.physical_name, mf.size, mf.growth, mf.is_percent_growth;

        TRUNCATE TABLE #VLF_temp
    END
    FETCH NEXT FROM db_cursor INTO @name
END

CLOSE db_cursor
DEALLOCATE db_cursor

SELECT *
    , USEDB = 'USE [' + DatabaseName + ']'
    , CP = 'CHECKPOINT'
    , ShrinkZero = 'DBCC SHRINKFILE (N''' + LogiFileName + ''' , 0)'
    , GrowBack = 'ALTER DATABASE [' + DatabaseName + '] MODIFY FILE ( NAME = N''' + LogiFileName + ''', SIZE = ' + CAST(log_size_mb AS VARCHAR(20)) + 'MB ) --I split this up when over 10,000 MB'
    , ChangeGrowth = CASE --Note, never gets up to the 8000 MB I recommend for manual growths, but still 250 MB VLFs
        WHEN DatabaseName = 'tempdb' THEN ''
        WHEN log_size_mb <= 128 THEN 'ALTER DATABASE [' + DatabaseName + '] MODIFY FILE ( NAME = N''' + LogiFileName + ''', FILEGROWTH = 32MB )'
        WHEN log_size_mb <= 512 THEN 'ALTER DATABASE [' + DatabaseName + '] MODIFY FILE ( NAME = N''' + LogiFileName + ''', FILEGROWTH = 128MB )'
        WHEN log_size_mb <= 4000 THEN 'ALTER DATABASE [' + DatabaseName + '] MODIFY FILE ( NAME = N''' + LogiFileName + ''', FILEGROWTH = 1000MB )'
        WHEN log_size_mb <= 16000 THEN 'ALTER DATABASE [' + DatabaseName + '] MODIFY FILE ( NAME = N''' + LogiFileName + ''', FILEGROWTH = 1000MB )'
        ELSE 'ALTER DATABASE [' + DatabaseName + '] MODIFY FILE ( NAME = N''' + LogiFileName + ''', FILEGROWTH = 1000MB )'
        END
FROM #VLF_db_total_temp
WHERE (vlf_count > 50 AND AVG_VLF_Size_MB < vlf_count / 3)
    OR (vlf_count > 250 AND AVG_VLF_Size_MB < vlf_count / 2)
ORDER BY vlf_count DESC

DROP TABLE #VLF_temp
DROP TABLE #VLF_db_total_temp
What’s the Right Size?
The most common thing you’ll see on the internet is people talking about having too many VLFs, with the limit being 1,000 VLFs and 50 being the ideal. I’m a little different and focus more on size than number. The concept I’m going for is that excessive context switching is what causes me grief, and a reasonably sized VLF will keep that under control.
Now there’s no magic number that will work noticeably better in all cases, so I won’t give you static numbers and say they’re magic. What I can say is that too small is basically fragmentation with the overhead of context switching, too many increases recovery time, too big means huge chunks being reserved and cleared at once, and too few could cause extra growths as unused space is still being reserved.
It’s easy to visualize how small and excessive numbers of VLFs can cause issues, but that doesn’t mean huge VLFs are the way to go. Tammy Richter Jones (b) gets into the details with her post If > 50 VLFs are bad, then 16 is good, right?. I’d love to see more on the effects of huge VLFs to have a more complete understanding of whether a 1 TB log file is better off with 50 20 GB VLFs or 2,000 500 MB VLFs. I’m not convinced that >50 is bad.
Personally, I like formulas a lot more than static numbers, especially since they allow me to avoid extremes without stressing out. The formulas I used broke it down into three categories:
- Below 50 VLFs: this is a reasonable number, so they get a free ride no matter how big they are.
- Between 50 and 250 VLFs: the average size in MB has to be at least 1/3 of the count of VLFs.
  - For a file with 100 VLFs, the average VLF has to be over 33 MB.
- Over 250 VLFs: the average size in MB has to be at least 1/2 of the count of VLFs.
  - For a file with 500 VLFs, the average VLF has to be over 250 MB.
This is good enough for almost any database. I don’t care about there being a couple small VLFs mixed in there because I’ll spend most of my time in the large ones. I don’t care about the numbers getting larger because the average VLF is large enough to avoid too much context switching. You’re not out to micromanage, so take a relaxed approach to this unless you have a reason to not be relaxed.
Remember how I said there were no magic numbers? Well, that doesn’t mean there aren’t wrong numbers.
By default, every database is based off of model, which has a 1 MB log file growing in 10% increments. That’s 1/10th of a MB for the first growth, which is just a tiny fragment. If you leave it this way on large databases you could end up with a 1 TB log trying to grow 100 GB at a time, which your users will notice.
By default, you get the worst of both worlds! The goal here is to avoid extremes, so you’ll want to change the growth rates away from the default.
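To see which log files are still carrying model’s percent-growth default, sys.master_files shows the growth setting for every log file on the instance (growth is a percentage when is_percent_growth = 1, otherwise a count of 8 KB pages):

```sql
-- Log files still set to percent growth, which they inherited from model.
SELECT DB_NAME(database_id) AS database_name,
       name AS logical_name,
       growth,
       is_percent_growth
FROM sys.master_files
WHERE type_desc = 'LOG'
  AND is_percent_growth = 1;
```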
How Do I Change the Size?
There’s only one way to change the size of your VLFs. Delete them and recreate them.
For every database EXCEPT TempDB, this means shrinking the logs (shrinking deletes VLFs to get down to the size you want), then growing the logs again. Both manual and automatic growths split the new physical space into VLFs, but how many VLFs each growth creates depends on your version of SQL Server.
Here are the growth rates I pulled from Paul Randal’s post Important change to VLF creation algorithm in SQL Server 2014.
SQL 2012 and earlier uses this formula:
- Growth of 64 MB or less: 4 VLFs
- Growth of more than 64 MB and up to 1 GB: 8 VLFs
- Growth of more than 1 GB: 16 VLFs
SQL 2014+ uses this formula. Is the growth size less than 1/8 the size of the current log?
- Yes: create 1 new VLF equal to the growth size
- No: use the formula above
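Putting both versions of the formula into a single expression makes it easy to play with. This is just a sketch of the rules above, and the values are made up; change them and rerun to see how many VLFs a growth would create:

```sql
-- Hypothetical values for illustration.
DECLARE @growth_mb int = 8000;       -- size of this growth
DECLARE @current_log_mb int = 32000; -- log size before the growth
DECLARE @sql2014_or_later bit = 1;   -- the 1/8 rule only applies on 2014+

SELECT new_vlfs =
    CASE
        WHEN @sql2014_or_later = 1
             AND @growth_mb < @current_log_mb / 8 THEN 1  -- one VLF the size of the growth
        WHEN @growth_mb <= 64   THEN 4
        WHEN @growth_mb <= 1024 THEN 8
        ELSE 16
    END;
```

With these values it returns 16, because 8,000 MB is not less than 1/8 of 32,000 MB, so the older formula applies and the growth becomes 16 VLFs of 500 MB each.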
So if you have too many VLFs you can shrink your log file. It will delete VLFs that are not in use to make it smaller. Then grow the file in increments that give you VLFs sized how you want them. If you’re going for 500 MB VLFs then you grow your file 8,000 MB at a time.
The VLFs that weren’t deleted in this process, because they were in use or you didn’t shrink the file as far as you could have, will not be affected. This means you’ll have different sized VLFs throughout your file, but that doesn’t really matter. What does matter is that you don’t have any ridiculously large VLFs and you’re spending most of your time in properly sized VLFs.
Here’s how I do it. Shrink the log as much as possible. Shrinking is never as easy as it should be, so verify it actually shrank a significant amount. If it didn’t, take log backups and try again, repeating the process a handful of times if needed. Then, if I want the log to be 32,000 MB, I grow it by 8,000 MB four times.
--Run a log backup before this if the database is not in simple recovery
--and log backups run less frequently than every 5 minutes.
USE [DatabaseName]
GO
CHECKPOINT
GO
DBCC SHRINKFILE (N'LogicalFileName_Log' , 0)
GO
USE [master]
GO
ALTER DATABASE [DatabaseName] MODIFY FILE ( NAME = N'LogicalFileName_Log', SIZE = 8000MB )
ALTER DATABASE [DatabaseName] MODIFY FILE ( NAME = N'LogicalFileName_Log', SIZE = 16000MB )
ALTER DATABASE [DatabaseName] MODIFY FILE ( NAME = N'LogicalFileName_Log', SIZE = 24000MB )
ALTER DATABASE [DatabaseName] MODIFY FILE ( NAME = N'LogicalFileName_Log', SIZE = 32000MB )
GO
Assuming you were able to shrink this down to 10 VLFs remaining and you want it to be 32,000 MB, this would give you 74 VLFs with all new ones being 500 MB. It goes over the 50 VLF limit I’ve seen elsewhere, but I’m very happy with these sizes and numbers.
With 8,000 MB growths the new 2014 calculation won’t kick in until the log was already 64,000 MB, and at that point an 8,000 MB VLF probably isn’t a bad thing.
I recommend manually growing larger log files between 8,000 and 16,000 MB at a time, but not autogrowths that size. The process that causes a file to grow, and any other process that needs that space, will sit around and wait for the autogrowth to complete. That could take a minute on decent disks, which is longer than many timeouts I’ve worked with.
Manually growing files is best, but almost every database will rely on autogrowth at some point. Know your timeouts, know how long it takes to grow a file, and size the growth accordingly. I like to use 1,000 MB here, but will go smaller if I have to. If you can’t make a decent-sized growth work within those constraints, be more paranoid about manually growing files while leaving autogrowth set to a smaller size.
The reason for 1,000 MB is that this is the largest size that will still give you 8 VLFs instead of being split into 16 VLFs, with the actual limit being at 1,024 MB. If you did 2,000 MB, you’d still end up with 125 MB VLFs. You can get that up to 250 MB each at 4,000 MB, but you’re telling your users to wait while 4,000 MB is zeroed out before they can move on. Although the calculation changes for larger logs on SQL 2014, I’d still stick with 1,000 MB for a good autogrowth size.
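The arithmetic behind those numbers can be checked with a quick query applying the pre-2014 rule to the growth sizes just discussed:

```sql
-- Pre-2014 rule: growths of 65-1,024 MB split into 8 VLFs, larger growths into 16.
SELECT g.growth_mb,
       v.vlf_count,
       mb_per_vlf = g.growth_mb * 1.0 / v.vlf_count
FROM (VALUES (1000), (2000), (4000)) AS g(growth_mb)
CROSS APPLY (SELECT CASE WHEN g.growth_mb <= 64 THEN 4
                         WHEN g.growth_mb <= 1024 THEN 8
                         ELSE 16 END) AS v(vlf_count);
```

That shows 1,000 MB giving 125 MB VLFs, 2,000 MB still giving 125 MB VLFs, and 4,000 MB giving 250 MB VLFs, which is why 1,000 MB is the sweet spot for an autogrowth size.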
Note, there is a since-resolved bug that caused issues when log files grew in 4 GB increments, so you’ll see a lot of people using 8,000 MB instead of 8,192 MB. We aren’t on an affected patch level anymore, but DBAs are made to be paranoid. We’re not just lazy at math, although this doesn’t rule out that possibility.
TempDB is different. The database is recreated every time the SQL service restarts, so the log is recreated as well, following the formulas above. If the log is over 1 GB, it starts out with 16 VLFs. Jonathan Kehayias got into the details in his post TSQL Tuesday #11 – Misconceptions – The TempDB Log File and VLF Counts.
So TempDB never has too many on startup, but is this too few? What if TempDB’s log is 40 GB, do you want a 2.5 GB VLF? Is setting up TempDB to have 5 log files that are 8,000 MB each so it starts up with 500 MB VLFs a better idea?
Seriously, someone answer this one, I’m curious myself.
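If you wanted to test that idea, the setup would look something like this. The paths are hypothetical, templog is the default logical name of TempDB’s log, and this is the open question above, not a recommendation:

```sql
-- Hypothetical test: resize the existing tempdb log and add extra 8,000 MB logs,
-- so each file starts up as 16 VLFs of 500 MB after the next service restart.
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, SIZE = 8000MB);
ALTER DATABASE tempdb ADD LOG FILE (NAME = templog2, FILENAME = 'T:\TempDB\templog2.ldf', SIZE = 8000MB);
ALTER DATABASE tempdb ADD LOG FILE (NAME = templog3, FILENAME = 'T:\TempDB\templog3.ldf', SIZE = 8000MB);
-- ...repeat for templog4 and templog5 if you're testing the 5-file idea.
```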
Changes Have Risk
The script above creates scripts to make changes. It’s not perfectly safe to make those changes. Test it, understand it, and be careful running it.
One of the most important pieces is that the scripts it generates will try to shrink the log as much as possible before growing it again. In terms of cleaning up VLFs, this is perfect. In terms of trying to run this during the day, it risks transactions failing because they’re trying to grab the next VLF as you’re running a shrink job to delete all unused VLFs.
You might be able to get away with this during the day most of the time, but it’s best to wait for a maintenance window to be safe.
What Others Did
Linchi Shea showed performance degradation from having too many small VLFs in Performance impact: a large number of virtual log files – Part I. Updates can take almost 10x as long, and I would assume it’s due to the size rather than the number of VLFs.
SQLskills.com has blog posts that read like a series on this. Kimberly Tripp wrote 8 Steps to better Transaction Log throughput, then followed that up with Transaction Log VLFs too many or too few?, which was updated to link to Paul Randal’s post Important change to VLF creation algorithm in SQL Server 2014. All are must-reads if you want to really understand this topic.
Tony Rogerson (b|t) at the time of this writing has only put one post on his new blog, Transaction Log Concepts: Part 1. If this is his typical quality of work, we have a lot to look forward to. I’m personally looking forward to the other 3 parts of this 4 part series.
Grahm Kent (b|t) ran tests on the performance differences between two log files in his post Slow recovery times and slow performance due to large numbers of Virtual Log Files. Both were 5 GB, but one was 16 VLFs (320 MB each) while the other was 20,480 VLFs (0.25 MB each). It was good to see transaction performance as well as recovery performance covered.
If your database is absolutely critical to the point that squeezing an extra couple milliseconds of performance throughout the day would be noticed or if a server coming back online 15 seconds faster would save thousands of dollars, test it. See how it performs in load tests and service restarts using your test servers and your database.
If you have a 1 TB log file and verified it needs to be about that big, what happens when you size it so you have 2,000 VLFs that are 500 MB each? What happens when you have 100 VLFs that are 10 GB each? Did it change recovery, any replication you’re doing, backup times, etc.?
Know that this can make a difference and that this is just another configuration change you can test out. Crazy numbers on either side can be bad, and perfect numbers in the middle aren’t going to come from some stranger on the internet that never saw your server.