SQLServerCentral Article

Using SSIS to Maintain the File System

,

We have all run into a need or a desire to clean up old stale files from the file system, whether it be to a) remove old backup files or b) to remove flat files that have been created by one process or another.  Based on this need/desire, we have all come up with some method to help us achieve that goal.

One method might be to include a flag in a maintenance plan.  Another method may be to use a SQL script employing xp_cmdshell and delete statements.  Yet another may utilize the sp_oa stored procedures and DMO.  And still others may require the use of PowerShell to accomplish the same task.  The point is that there are many methods.

I am adding yet another method to the mix.  Why?  I didn't much like the option of using the sp_oa method or the xp_cmdshell route.  I am very much a novice with PowerShell and it would take a lot more tinkering to get the script working properly.  Also, I felt pretty comfortable with SSIS and had approval to try and get this done using that method.  And just because I am a novice with PowerShell, does not mean that I will not someday circle back around to try and accomplish this task via that means.

Requirements

The method employed needs to be able to do the following:

  1. Remove multiple file types
  2. Be configurable
  3. Clean out files from numerous directories
  4. Remove files older than a specified number of days.

Setup

The solution I chose utilizes SSIS.  This solution requires the use of a table to help control some of the options for the package.  The table is not too terribly complex and can be created using the following script.  (Note:  I use AdminDB_Test throughout this process.  You will either need to create a database by that name, or replace AdminDB_Test throughout with a database you will be using.)

USE [AdminDB_Test]
GO
/****** Object:  Table [dbo].[FilePaths]    Script Date: 12/05/2011 23:36:26 ******/IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[FilePaths]') AND type in (N'U'))
 DROP TABLE [dbo].[FilePaths]
GO
USE [AdminDB_Test]
GO
/****** Object:  Table [dbo].[FilePaths]    Script Date: 12/05/2011 23:36:26 ******/SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[FilePaths](
[PathID] [int] IDENTITY(1,1) NOT NULL,
[Process] [varchar](32) NULL,
[FilePath] [varchar](256) NULL,
 CONSTRAINT [PK_FilePaths] PRIMARY KEY CLUSTERED
(
[PathID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO

The FilePath column holds the file-system path for each directory that needs to be cleaned.  Paths that are supported are local (e.g., C:\temp ) and UNC paths (\\machine\c$\temp).  I set this attribute to a length of 256, but if you have a longer path, you will want to adjust the length.

The Process column will hold a value describing what that path relates to, such as MaintainDirectory.  In my example, I am using MaintainDirectory to control which directories hold files that potentially need to be deleted.

Here is an example of the contents of that table I am using currently.

The last piece of the setup before we start working on the SSIS package is the need for a string splitting function.  Pick the string splitter of your liking.  I have one that I like and am sure you have one that you prefer.  The SSIS package relies on the return field from the splitter being named "Item."  If it is named something else, please make the adjustments in the package as necessary.

The Package

I chose to use SSIS 2008 to create this package.  To meet the requirements already set forth, I decided I would need to loop through some directories and then loop through the file list to determine the file properties.  Based on the date returned in the file properties, I determine if the file needs to be deleted or not based on criteria passed through via variables.  To accomplish this, I utilized the following objects: ADO.Net Data Source, two Execute SQL Tasks, two ForEach Loop Containers, a Script Task, and eight variables.  Let's take a look at these starting with the variables.

Variables

  • SQLServerName - The value held here is used in an Expression for the Data Source.  This will overwrite the ServerName value in the Data Source.
  • DatabaseName - Used alongside the SQLServerName variable in an Expression for the Data Source.  This value will overwrite the InitialCatalog value in the Data Source.  This should be the name of the database where the FilePaths table and String Split function exist.
  • DaysToKeep - This value is the cutoff point for which files to keep and which files will be deleted.  This variable is used as a ReadOnly variable in the Script Task.
  • obj_FileExtension - This object variable is used to store the result set from one of the Execute SQL tasks and the results of the string split function from the FileExtensionList variable.
  • FileExtensionList - This is a delimited list of file extensions that need to be evaluated for deletion.  It is important to note that the file extensions that are to be processed are case sensitive.  The extension must appear in this list as it appears in the file system.
  • FileExtension - Used in one of the ForEach loops.  This variable will receive the FileExtension from the obj_FileExtension variable one at a time.
  • obj_ListOfDirectories - This variable will receive the result set of an Execute SQL Task to be later consumed by one of the ForEach loops.
  • DirectoryToMaintain – Receives, one at a time, the Directory to process for file deletion.  The ForEach loop stores a value from obj_ListOfDirectories in this variable for processing.

Execute SQL Tasks

The two Execute SQL Tasks are simple in function:  one retrieves the list of directories to maintain from the FilePaths table, and the other is strictly to split the string from the FileExtensionList variable.

The first is named "Get Directory List" and should receive the Full Result Set from the following query.

SELECT FP.PathID, FP.Process, FP.FilePath 
      FROM dbo.FilePaths FP
      WHERE FP.Process = 'MaintainDirectory'
      ORDER BY FP.PathID

The Result Set tab of this task also needs to be modified.  it should look like this.

From this task, we step into the next Execute SQL Task named "Split FileList."  The setup of this task is very much like the previous task:  the full result is required, a configuration needs to be made on the result set tab, and a parameter needs to be mapped.  Let's take a quick look at those settings.

Parameter Mapping

Result Set

And this is the query that we will be executing.

Select Item
      From stringsplitter(@FileExtensionList,',')

Notice that the Parameter that was named in the Parameter Mapping tab is being used in the function call.  I chose this method because I could better see and understand how it works.

ForEach Loops

The next step in the flow is the ForEach Loop - Directory object.  As the name implies, this ForEach Loop is designed to work with the obj_ListOfDirectories variable/array.

With this first Loop container, we have two tabs that need to be configured in the properties.  Both Loop containers are similar in that they need the same tabs to be configured.  First, let's talk about the Collection tab.

On the Collection tab, we first set the Enumerator option to "ForEach ADO Enumerator."  Then we need to select the obj_ListOfDirectories from the drop down labeled "ADO Source Object Variable."  Your screen should look like the following image:

With this tab configured, we need to focus our attention to the changes that are required on the Variable Mappings tab.  On this tab, we are telling the enumerator how to handle the data from the object variable.  We are mapping columns from the result set to variables for further consumption.  When configured, it should look like the following.

Inside of this ForEach loop container, we have another ForEach loop container.  This second ForEach loop container handles the file extensions that we listed (in comma delimited fashion) in the FileExtensionList variable.  I have called this container "Foreach Loop - FileExtension" (just keeping it simple).

The collection tab follows the same configuration setup.  The difference of course being that this container will use the obj_FileExtension object from the source variable dropdown menu.

The variable mapping tab is also slightly different.  We will be mapping column 0 of the object to the FileExtension variable.  The explanation for the different column number between the two loop container variable mappings is simple.  In obj_ListOfDirectories, we have multiple columns being returned and only wanted the value in Column 2, while in obj_FileExtension, we have but a single column being returned (which corresponds to Column 0).

This inner Loop container will loop through each of the extensions for each of the directories that have been returned to the outer loop container.  The inner loop container has the remainder of the workload in it via the Script Task.

Script Task

It is via the script task that we finally get to start deleting files.  This was the most difficult piece of the entire package - though the script is not very large.

For the script task, I chose to implement it via Visual Basic (instead of C#).  When setting up this task in your package, make sure the Visual Basic option is selected.  I have three ReadOnlyVariables employed by the script.  Those variables are: User::DaysToKeep,User::DirectoryToMaintain, and User::FileExtension.

Once you have set those on the script tab, the next step is to click the “Edit Script...” button where we need to place the following script.

' Microsoft SQL Server Integration Services Script Task
' Write scripts using Microsoft Visual Basic 2008.
' The ScriptMain is the entry point class of the script.
 
Imports System
Imports System.IO
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
 
<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
      Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
 
      Enum ScriptResults
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
      End Enum
 
    Public Sub Main()
        Dim SourcePath As String
        Dim PurgeDays As Integer
        Dim FileExtension As String
        Dim thisFileInUse As Boolean = False
 
        PurgeDays = CInt(Dts.Variables("User::DaysToKeep").Value)
        SourcePath = CStr(Dts.Variables("User::DirectoryToMaintain").Value)
        FileExtension = CStr(Dts.Variables("User::FileExtension").Value)
 
        For Each file As FileInfo In New DirectoryInfo(SourcePath).GetFiles()
            If ((Now - file.LastWriteTime).Days > PurgeDays) And (file.Extension = FileExtension) Then
                Try
                    file.Delete()
 
                Catch ex As Exception
 
                End Try
            End If
 
        Next
 
        Dts.TaskResult = ScriptResults.Success
    End Sub
 
End Class

An important note of interest is the need for the Try...Catch block.  Without this block as it is, you could run into an issue where the file (such as those pesky temp files) may be in use by some process and cause the package to error.  The Try...catch will move past that nasty error and delete the files that it can.

Inside this script, you will see that I am comparing the LastWriteTime to the PurgeDays and ensuring that the file extension matches one that is in the list.  Then we move into the try...catch and either delete the file that matches those criteria or throw an exception and move on to the next file.

When all is said and done, your package should look something like this:

You should also have a variable list that looks like this:

Each variable that is not an Object has a value assigned to it at this point.  These values will be overwritten where applicable.

Next Steps

Having this package is a good start.  But unless you are prepared to manually run this on a daily basis, it needs to be added to a job and scheduled.  There are two ways to go about scheduling this package.

The first option is to configure the FileExtensionList and DaysToKeep variables and save the package with those values.  Then run this package through SQL Agent with those values every time.  The drawback to this method is that if you need to add or remove a file extension (as an example) then you need to edit the package and re-save it.

The alternative option is pass the values through the job to overwrite those variables as the job runs.  Should you need to remove or add a file extension, it would just be done at the job definition level.

Let's take a look at this second option.  I will skip past how to create the job as an SSIS job in SQL Server and we will look directly how to modify those variables from the job properties.

To configure these variables directly from the SQL Agent job, open the Job properties and click on the Set Values tab (assuming you have defined this job as an SSIS Job Step type).  You should get a screen similar to this (void of the set values shown in the pic).  Just add the parameters (variables) we have discussed to this point with appropriate values to fit your needs/environment.

I have chosen to only include the four variables shown above since those that remain are either objects or get overwritten in the ForEach loops at runtime.  The only thing remaining now is to set the schedule for the job.  Once set, the job (and package) will take care of the rest.

Conclusion

I have now shown you one method on how to maintain some of the directories on your system through the use of SSIS and SQL server.  There are many methods to accomplish this goal, it is up to each of us to choose the best method for our environment and comfort level (by means of supporting the chosen solution).

Rate

4.47 (15)

You rated this post out of 5. Change rating

Share

Share

Rate

4.47 (15)

You rated this post out of 5. Change rating