Blog Post

Downloading SFTP Files with SSIS

,

Downloading SFTP Files with SSISIf you’ve used SSIS for any significant amount of time, you’ve almost certainly run into this roadblock: You need to download a file stored on an SFTP (secure FTP) server, but SSIS does not natively support this operation. There is a built-in FTP task in SSIS, but it is very limited in its functionality and will not connect to an SFTP server. However, it is possible to build your own solution using free tools and a bit of code.

In this post, I’ll show how to build a reusable package that can be used for downloading SFTP packages with SSIS.

Possible Solutions

Before I dive into my solution, I’ll point out that there are several ways to solve the problem of downloading SFTP files with SSIS:

  • Acquire a third-party SSIS task suited for this purpose (of which there are several free and paid versions)
  • Develop a purely code-based solution using a third-party DLL
  • Develop a code-based solution that calls a command-line SFTP application

I’ve used each of these solutions in the past, and each one has its own merits and risks. For example, using a third-party SSIS task provides a more streamlined approach (often requiring no manual coding at all), but often – especially in larger organizations – installing a third-party component requires so much extra testing and administrative approval that it makes it more trouble than its worth. For coding solutions using external DLLs, often the same issues exist because the external DLLs have to be registered on each server, which is an easy enough process but surprisingly difficult to get approved in many cases. In my work as a consultant, I’ve found that the easiest solution – when considering not just the technical components but the administrative work – is to use a command-line approach leveraging a mature and well-documented SFTP client. Therefore, my go-to solution is the last one on the list above, which is the solution I describe herein.

The Tools

To develop and test this solution, I use just a couple of tools:

WinSCP. This is a mature, well-documented, and free (GPL) SFTP client. It has both a UI and a command-line component, though I use the latter exclusively for this solution. The portable executables (near the top of the downloads page) are the easiest to use, as you simply drop the two executables into a folder and start using them without going through an installation process.  There are, of course, other free command-line SFTP tools available, but I’ve found none better than WinSCP in terms of reliability, ease of use, and documentation. For more information about WinSCP, check out the documentation pages, which are quite comprehensive and can give you a better understanding of how the command-line WinSCP tool works and its myriad configuration options. In particular, this article was very helpful for some of the syntax questions I had when developing this solution.

Core FTP mini-SFTP server. Although this is not a part of the deployed solution itself, this provides a very simple and easy-to-configure SFTP server for testing purposes.

Also, in this example I’m using SSIS 2014 to demonstrate the solution, but this design pattern is portable across all versions of SSIS.

Setting Up the Environment

For the initial setup, download WinSCP and the Core FTP mini-SFTP server executables listed above. To keep things simple, I recommend creating a utility directory to store these executables. As noted, WinSCP comes with two executables – WinSCP.exe and WinSCP.com – but this solution requires only the latter of these for command-line operations.

image

Next, start up the mini-SFTP server (msftpsrvr.exe). The UI is quite simple and self-explanatory, only requiring a user name, password, port number (which defaults to 22), and the root directory.

image

For testing, add a few files to the root directory specified above. Be sure to also include a few files that will not be downloaded as part of the SFTP operation for more comprehensive testing.

Downloading SFTP Files with SSIS

With the testing environment set up, it’s time to build a solution. In a new SSIS project, create a new package (I’ve named mine DownloadSFTP.dtsx). Navigate to the Parameters tab, where we’ll create a handful of runtime values that will make the DownloadSFTP package more reusable.

image

As shown above, all of the values that might change over time – the server name, user ID and password, download directory, etc. – are parameterized. This accomplishes two things. First, when those values do change (and they will), there is a single point of administration rather than a bunch of hard-coded values in the package. Even better, by parameterizing these values, you can build a package that is process-agnostic, and have just one package that can be used by multiple ETL processes.

For those using older versions of SSIS (prior to 2012), you won’t be able to use parameters. However, with a little extra work you can accomplish the same thing I’ve described here by using package configurations.

I’ll call out a couple of these values that may not be obvious:

  • pFilename: This is the file name to download from the server. Note that we can also use wildcards (assuming they are supported by the target server) – in the example above, we’ll be downloading all files ending in “.TXT”.
  • pServerHostKey: This is used to satisfy a security mechanism built into the WinSCP process. By default, WinSCP will prompt the user to verify and add to local cache the host key when connecting to an SFTP server for the first time. Because this will be done in an automated, non-interactive process, getting that prompt would cause an error in our script. To prevent this, the script is built to supply the server host key to avoid the error, and also has the added benefit of ensuring we’re actually connecting to the correct SFTP server. This brief article on the WinSCP documentation site describes how to retrieve the server host key for the target server.
  • pServerUserPassword: This is marked as sensitive to mask the password. As part of the script logic, this password will be decrypted before it is sent to the server.

With the parameters established, we can jump into the code portion of the solution. Create a new script task in the control flow, and add all 7 of the parameters shown above to the list of ReadOnlyVariables.

image

Next, we’ll add the code to set up the Process object that will ultimately trigger the execution of WinSCP.com to perform the download. As shown below, using the Main() function (which is created automatically in a new script task), we’re creating the Process object and configuring a few of the runtime options, including the name of the executable and the download directory.

 public void Main()
        {
            // Create a new Process object to execute WinSCP
            Process winscp = new Process();
            // Set the executable path and download directory
            winscp.StartInfo.FileName = Dts.Variables["$Package::pWinSCPLocation"].Value.ToString();
            winscp.StartInfo.WorkingDirectory = Dts.Variables["$Package::pDownloadDir"].Value.ToString();
            
            // Set static execution options (these should not need to change)
            winscp.StartInfo.UseShellExecute = false;
            winscp.StartInfo.RedirectStandardInput = true;
            winscp.StartInfo.RedirectStandardOutput = true;
            winscp.StartInfo.CreateNoWindow = true;
            // Set session options
            string sessionOptionString = "option batch abort" + System.Environment.NewLine + "option confirm off";

The next step is to create the input strings that will make the connection and download the file. At the bottom of this snippet, there are 3 variables that will capture output messages, error messages, and the return value, all of which will be used to log runtime information.

            // Build the connect string (<user>:<password>@<hostname>)
            string connectString = @"open " + Dts.Variables["$Package::pServerUserName"].Value.ToString()
                                            + ":"
                                            + Dts.Variables["$Package::pServerUserPassword"].GetSensitiveValue().ToString()
                                            + "@"
                                            + Dts.Variables["$Package::pServerName"].Value.ToString();
            // Supplying the host key adds an extra level of security, and avoids getting the prompt to trust the server.
            string hostKeyString = Dts.Variables["$Package::pServerHostKey"].Value.ToString();
            // If hostkey was specified, include it
            if (hostKeyString != null && hostKeyString.Length > 0)
                connectString += " -hostkey=\"" + hostKeyString + "\"";
            // Build the get command string
            string getString = "get " + Dts.Variables["$Package::pFilename"].Value.ToString();
            // Create output variables to capture execution info
            string outStr = "", errStr = "";
            int returnVal = 1;

With all of the options configured, it’s time to invoke WinSCP.com. The try/catch block below will attempt to connect and download the specified file from the server.

            // This try/catch block will capture catastrophic failures (such as specifying the wrong path to winscp).
            try
            {
                winscp.Start();
                winscp.StandardInput.WriteLine(sessionOptionString);
                winscp.StandardInput.WriteLine(connectString);
                winscp.StandardInput.WriteLine(getString);
                winscp.StandardInput.Close();
                winscp.WaitForExit();
// Set the outStr to the output value, obfuscating the password
                outStr = winscp.StandardOutput.ReadToEnd().Replace(":" + Dts.Variables["$Package::pServerUserPassword"].GetSensitiveValue().ToString() + "@", ":*******@");
                returnVal = winscp.ExitCode;
            }
            catch (Exception ex)
            {
                errStr = "An error occurred when attempting to execute winscp.com: " + ex.Message.Replace("'", "\"").Replace("--", " - ");
            }

Technically, that’s all we have to do. However, I always like to build in enough information for logging that I know when and why a problem occurred. The final chunk of code below checks for error conditions – either a nonzero return code or a nonblank errStr value – and will log the message and fail the script task (thus failing the package) on error. If there is no error, we’ll simply use the Dts.Events.FireInformation() method to log the output from the execution of WinSCP.com.

            // Do-nothing variable required for FireInformation below
            bool fireagain = true;
            // Log output to SSIS log
            if (returnVal != 0 || errStr.Length > 0) // Error
            {
                if (errStr.Length > 0)
                    Dts.Events.FireError(0, "WinSCP Exception", errStr.Replace("'", "\"").Replace("--", " - "), "", 0);
                if (outStr.Length > 0)
                    Dts.Events.FireError(0, "WinSCP error", "A WinSCP error has occurred. The full output stack follows: " + outStr.Replace("'", "\"").Replace("--", " - "), "", 0);
            }
            else // No error
                Dts.Events.FireInformation(0, "WinSCP output", outStr, string.Empty, 0, ref fireagain);
            Dts.TaskResult = (returnVal != 0) ? (int)ScriptResults.Failure : (int)ScriptResults.Success;

That’s it! The package is ready to be executed. Assuming everything is configured properly, running the package on my system should download exactly two text files (remember, we used the wildcard “*.txt” to get all text files). And upon review, I find that’s exactly what happens when the package runs.

image

To briefly test out the error logging facilities, I’m going to update the package parameter for User Name to an invalid value. Changing the user name to “userx” results in a package failure, exactly as we’ve designed it. When I review the log information, I can see that the log captures the fact that WinSCP.com failed, which should help determine that it is an authentication-related issue.

image

Usage in the Real World

This pattern takes a little effort to implement. Even if you copy verbatim the code I’ve used in this example, you’ll probably make some modifications, add behavior, etc. That takes time. However, this pattern has a significant advantage in that you could build this one time, and use it for all of your SFTP download operations. It’s built with reusability in mind, and with minor tweaks to suit your environment, you could use this package as a solution to all of your SFTP downloads, deploying it once (or at most, once per SSIS server) and continuing to reuse it.

Conclusion

Although SQL Server Integration Services does not include native functionality to retrieve SFTP files, you can roll your own solution by using free tools and a little bit of script. In a future post, I’ll extend this pattern to demonstrate how to go the other direction and write files back to an SFTP server.

You can download the code from this solution here.

The post Downloading SFTP Files with SSIS appeared first on Tim Mitchell.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating