SQLServerCentral Article

Best Option for Working with Files in SSIS

,

Introduction

When you administer a database, you need to work with files and sometimes upload big files or administer in SQL Server. When the files are big, sometimes it is necessary to improve the performance. In this article, we will evaluate each method and show how to copy files and folders in SSIS with the following methods:

  1. Using the File System task
  2. Using the command line (cmd) with the Execute Process Task
  3. Using Powershell with the Execute Process Task
  4. Using C# with the Script Task

We will also compare the performance of each option and run tasks in parallel to verify the performance results.

Requirements

  1. SSIS installed.
  2. SSDT installed.

Get Started

Let's get started looking at each method in turn.

File System Task

The File System Task is the best option to handle files and folders. It is simple to use and you do not require programming knowledge or command line skills. In SSDT, create an SSIS project and drag and drop the File System Task to the Control Flow Pane:

We will copy all the files and folders from the c:\sql folder to the backupsql folder. To do that, in the operation section, select the Copy directory operation and in the Source Connection create a connection to an existing folder named sql, or any other name of your preference in the c:\ drive. In the Destination Connection, create a connection to an existing folder named backupsql, or any name of your preference.

Using the Command Line with the Execute Process Task

The command line, or cmd, can be executed using the Execute Process Task. This option can be useful when you need to execute some tasks that are not allowed with the File System task, such as changing the file attributes or applying some special filters to the files that you need to copy/delete/move.

To invoke the cmd in SSIS, drag and drop the Execute Process Task to the Control flow pane:

We will run the cmd from the c:\Windows\System32 folder. The arguments are the following:

/C xcopy c:\sql c:\sqlbackup2 /E

  • /C is to specify that they are commands
  • xcopy is the command used to copy all the files and folders from the c:\sql folder to the c: sqlbackup2 folder
  • /E is used to copy directories and subdirectories

Here is the Execute Process Task in the Process page:

PowerShell and the Execute Process Task

PowerShell is an extremely flexible shell to manipulate files and work with administrative task in an easy and flexible way. However, you need programming skills to use it. PowerShell is a great option if you need to add loops, operators and more complex operations in your tasks.

To invoke PowerShell, you can use the Execute Process Task and call the powershell.exe. We will use the cmdlet Copy-Item in PowerShell and copy the folder from c:\sql to c:\mypowershell including files and folders:

The Script Task

The next option is the script task. You can use C# or Visual Basic to administer your tasks and program your solutions. The Script Task allows handling files and folders using code. This option is recommended if you already automated tasks using this method for other tasks and you want to continue using this option for future administrative tasks:

The following code shows how to copy the files from the c:\sql folder to the c:\csharp folder. If the csharp folder does not exist, it is created:

public void Main()
{
            // Define variables
            string file = "";
            string source = @"C:\sql";
            string target = @"C:\csharp";
            // Use Path class to manipulate file and directory paths.
            string sourceFile = System.IO.Path.Combine(source, file);
            string destFile = System.IO.Path.Combine(target, file);
            // Create the folder if it does not exist
            if (!System.IO.Directory.Exists(target))
            {
                System.IO.Directory.CreateDirectory(target);
            }
            // To copy a file to another location and 
            if (System.IO.Directory.Exists(source))
            {
                string[] files = System.IO.Directory.GetFiles(source);
                // Copy the files and overwrite destination files if they already exist.
                foreach (string s in files)
                {
                    // Use static Path methods to extract only the file name from the path.
                    file = System.IO.Path.GetFileName(s);
                    destFile = System.IO.Path.Combine(target, file);
                    System.IO.File.Copy(s, destFile, true);
                }
            }
            else
            {
                Console.WriteLine("Source path does not exist!");
            }
            Dts.TaskResult = (int)ScriptResults.Success;
}

Comparing the Different Methods

To compare which option is faster, we run each option five times. We are comparing the execution time in seconds for each method. There are a number of files in the folders (4GB) to ensure that we have times that can easily be compared. Here you have the results.

Run #File SystemCmdPowerShellScript Task
192124133154
2102112117114
3100110110107
495116122127
596114118125
Average97115.2120125.4

The fastest option is the File System task. It took an average of 97 seconds to complete execution.  In second place is cmd using the Execute Process task. This took an average of 115 seconds to copy the data from one folder to another. Third place belongs to PowerShell using the Execute Process task with an average of 120 seconds. Finally, the slowest option is C# with an average of 125 seconds.

Here is a percentage comparison:

  • File System Task - fastest
  • cmd/Execute Process task - 15% slower
  • PowerShell - 19% slower
  • C#/Script task - 22% slower

The Second Experiment

The second experiment is to run two tasks in parallel. We split the files of the source folder in two folders and use two tasks in parallel to copy the files faster:

In our example, we had 5 GB of data in the sql folder and we divided the files into two folders of approximately 3 and 2 GB. We tested two File System in parallel, two Execute Process tasks with cmd and so on:

After testing all the options in parallel, we got the following results:

Running the tasks in parallel (execution time in seconds)

# RunFile SystemCmdPowerShellScript Task
1166107145139
2116109112108
3109108123128
4131106125126
5129110126124
Average130.2108126.2125

The results are interesting. The File System task is the worst option if we work with parallel tasks 130 seconds. It is worse than when it was not a parallel process. The cmd improved the performance and now it is the best option. PowerShell is slightly worst than before and the Script Task did not change its performance results.

Conclusion

As you can see, the fastest solution is the File System Task when it is not in parallel. The other options can be used if the File System cannot accomplish your needs. If you are handling big files and the performance is important to consider this. If the performance is important, C# with the script task is not a good option.

When we talked about the tasks in parallel, we did not get a better performance except with the cmd. In few words, in this case, having the process in parallel does not guarantee a better performance. I strongly recommend to test process in parallel and compare results.

References

Rate

5 (6)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (6)

You rated this post out of 5. Change rating