One of the banes of my life is log files. I build, and maintain, a data warehouse for a hospital in the UK. I am both the Ops person and the developer for this warehouse and have to be careful to remember which hat I'm wearing, when. Being a hospital, it is reasonably hard-pushed for cash, and although we have a decent server farm setup, hard drive space is still at a premium.
Microsoft's recommendation for log files is that you allow them to expand to the maximum size they need to be. In essence, they contain the transactions that haven't yet been committed. However, the free space in them when the transaction is committed isn't freed up.
For the data warehouse here, we have three main processes that happen each night that make considerable use of the transaction file space. The first is that data is obtained from about a dozen source systems that is loaded into the same number of staging databases. Some of this is done using linked servers and calls to OPENQUERY, some with CSV imports, and some with backup-copy-restore processes. Most loads are one of the first two. Obviously, when you're loading in all the records for the last year's worth of appointments, it's quite a large transaction.
The second process is a cleaning process where the rows in the staged data are checked, and rows with errors marked as such. The clean rows are then copied into a 'clean' version of each database.
The last process is integrating all this data into one place, where it is all tied up cohesively and data linked to a specific patient is easily queryable. Hence there are about 25 databases that are in use over time, with large transactions applied to them.
All three of these processes use about 80GB of transaction space, the last one a little less. The whole process takes about 7-10 hours and is controlled by an SSIS package. We have about 100GB of space on the transaction log drive. Incidentally, we use SQL2008R2 still due to licencing costs (yeah, licencing costs, Microsoft!).
From a programmers point of view, I don't care about transaction log space. I just want my processes to run as fast as possible, and without error. Transaction log space just isn't something that comes within my remit. However, we don't have the space for our transaction logs to stay at their maximum size. We need to share that space over time with the other processes that need it. Until recently, my SSIS package was littered with little boxes called something like ShrinkLogFiles (n), the n ensuring a unique task name. I didn't want them there. They were an annoyance. I simply want my log files to magically shrink. Shrinking is a couple of layers below the level I want to think at.
Necessity being the mother of invention, I realised that I could write such a process. The aim would be to have a scheduled task that would run, say, every 15 minutes, that would run through all the databases on the server. If there are no connections to that database, it would then attempt to shrink the log file for that database.
I knew how to get a list of databases. For each database I knew how to get a list of log files. I knew how to find out if there there were any connections (sp_who2), and I knew how to shrink log files (easy). If I added new databases, I didn't want to have to edit this job.
Shrinking the log files for a database
I'm very much a code person, not an SSIS boxes person. For me, SSIS is primarily there to facilitate parallelisation. I like my code to look like code. My first step was to write a stored procedure which, given a database, would check to see if it was in use, and if not, shrink it. I wanted the inuse check to be as close as possible to the shrink to minimize the chances of a connection being made to that database while this was running.
To get this information, I needed to capture the results of sp_who2 (showing which connections were made to which databases) and sys.master_files (showing which log files existed for the database). If there wasn't a connection for the database, the process needed to loop through the log files and shrink them. We would need to use dynamic SQL for some of this, and temporary staging tables for the data.
After a little playing and debugging, I ended up with the following:
USE lib GO IF OBJECT_ID('dbo.ShrinkAvailableLogFilesForDatabase', 'P') IS NOT NULL DROP PROCEDURE dbo.ShrinkAvailableLogFilesForDatabase GO CREATE PROCEDURE dbo.ShrinkAvailableLogFilesForDatabase @DBName VARCHAR(100) /* Given a database, this obtains the log files for that database. It also runs sp_who2 to see if there are connections to it. If not, this then shrinks all log files for the database. If the DB is in use, it performs no shrinkage. */ AS DECLARE @SQL NVARCHAR(MAX) DECLARE @LogFile NVARCHAR(MAX) -- log files for this database CREATE TABLE #logfiles ( LogFileName NVARCHAR(100) PRIMARY KEY, IsProcessed BIT DEFAULT(0) ) -- results of sp_who2 CREATE TABLE #sp_who2 ( spid INT, Status NVARCHAR(100), Login NVARCHAR(100), HostName NVARCHAR(100), BlkBy NVARCHAR(100), DBName NVARCHAR(100), Command NVARCHAR(100), CPUTime INT, DiskIO INT, LastBatch NVARCHAR(100), ProgramName NVARCHAR(100), spid2 INT, RequestID INT ) -- unique DBs in use CREATE TABLE #dbs ( DBName NVARCHAR(100) PRIMARY KEY ) -- get the log files for the database specified SET @SQL = ' INSERT #logfiles (LogFileName) SELECT name FROM sys.master_files WHERE database_id = db_id(''' + @DBName + ''') AND type = 1 ' EXEC sp_executesql @SQL -- run sp_who2 to find out which databases are in use SET @SQL = ' INSERT #sp_who2 EXEC sp_who2 ' EXEC sp_executesql @SQL -- now derive the unique DBs in use INSERT #dbs (DBName) SELECT DISTINCT DBName FROM #sp_who2 WHERE DBName IS NOT NULL -- is this database in use? If not, shrink its files IF NOT EXISTS (SELECT * FROM #dbs WHERE DBName = @DBName) BEGIN WHILE EXISTS (SELECT * FROM #logfiles WHERE IsProcessed = 0) BEGIN SELECT TOP 1 @LogFile = LogFileName FROM #logfiles ORDER BY LogFileName SET @SQL = 'USE ' + @DBName + '; DBCC SHRINKFILE(''' + @LogFile + ''', 1);' SELECT @SQL EXEC sp_executesql @SQL UPDATE #logfiles SET IsProcessed = 1 WHERE LogFileName = @LogFile END END DROP TABLE #logfiles DROP TABLE #sp_who2 DROP TABLE #dbs GO
After a bit of testing, it clearly worked. Now all I needed to do was put it into a loop, thus:
USE lib GO IF OBJECT_ID('dbo.ShrinkAvailableLogFiles', 'P') IS NOT NULL DROP PROCEDURE dbo.ShrinkAvailableLogFiles GO CREATE PROCEDURE dbo.ShrinkAvailableLogFiles /* This runs through all databases, calling ShrinkAvailableLogFilesForDatabase to try and shrink their log files if they aren't in use. */ AS DECLARE @DBName NVARCHAR(100) CREATE TABLE #databases ( DBName NVARCHAR(100) PRIMARY KEY ) INSERT #databases SELECT name FROM sys.databases SET @DBName = NULL SELECT TOP 1 @DBName = DBName FROM #databases ORDER BY DBName WHILE @DBName IS NOT NULL BEGIN EXEC ShrinkAvailableLogFilesForDatabase @DBName --SELECT @DBName DELETE #databases WHERE DBName = @DBName SET @DBName = NULL SELECT TOP 1 @DBName = DBName FROM #databases ORDER BY DBName END DROP TABLE #databases GO
Hence, we now have a stored procedure which we call from a job every 15 minutes. It has changed the overnight pattern of freespace on our log drive from this (on a bad night):
This may not be the solution for everyone, in fact it probably only fits a subset of cases, but it fits our business model. It may well also be of use as a one-off emergency solution where the log drive is out of space and a DBA just needs to reclaim whatever he can to get a server back to operational capability. Hope it’s of use to you.