SQLServerCentral Article

Batch ETL of Multiple Data Files Using an SSIS Foreach Loop Container

,

An SSIS ETL package created with the Import/Export Wizard will process one data file at a time. There may be situations, however, when multiple data files will be arriving continuously for extraction to the same table. A base SSIS package generated using the Import/Export Wizard can be modified to process these files in batch mode.

In this tutorial we will take an SSIS package generated by the Import/Export wizard and modify it to process multiple files and move those files to a backup directory.

1: Download the tutorial files

The zip file attached to this tutorial contains a default SSIS package (BaseBatchETL.dtsx) generated by the Import/Export Wizard, the same SSIS package modified to process multiple files in batch mode (BatchETL.dtsx), a SQL script (CreateCustomersTable.sql) for generating the ETL destination table, and six data files (NewCustomersXX.txt) for extraction into the destination table.

2: Create the destination table with the CreateCustomersTable.sql script

Open the CreateCustomersTable.sql script in SQL Server Management Studio.

USE [TestDB]
GO
/****** Object:  Table [dbo].[customers]    Script Date: 05/10/2013 15:46:30 ******/SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[customers](
    [FirstName] [nvarchar](100) NULL,
    [LastName] [nvarchar](100) NULL,
    [Company] [nvarchar](100) NULL,
    [Address] [nvarchar](100) NULL,
    [City] [nvarchar](100) NULL,
    [County] [nvarchar](100) NULL,
    [State] [nvarchar](100) NULL,
    [ZIP] [nvarchar](100) NULL,
    [Phone] [nvarchar](100) NULL,
    [Fax] [nvarchar](100) NULL,
    [Email] [nvarchar](100) NULL,
    [Web] [nvarchar](100) NULL
) ON [PRIMARY]
GO

Execute the script to create the destination table.

Browse to the new table to confirm that it was created.

3: Create source and backup folders and place the data files in the source folder

Create the folders C:\BatchETL and C:\BatchETL\backup and copy the six data files to the C:\BatchETL folder.

4: Open, configure and execute BaseBatchETL.dtsx in Business Intelligence Development Studio

Open the BaseBatchETL.dtsx SSIS package in BIDS to show that it consists of a Data Flow Task generated by the Import/Export Wizard.

Double-click the DestinationConnectionOLEDB connection manager to bring up the Connection Manager configuration panel, change the server and database names to the server and database where you created the customers table and click the OK button.

Double-click on the SourceConnectionFlatFile connection manager to show that the connection manager is pointing to the NewCustomers01 data file in the C:\BatchETL\ folder.

Execute the BaseBatchETL package.

Confirm that the 5000 records in the NewCustomers01.txt file have been added to the customers table.

Truncate the customers table to prepare for execution of the BatchETL.dtsx package we will create from the BaseBatchETL.dtsx package.

5. Add a Foreach Loop Container to BaseBatchETL.dtsx

Drag-and-drop a Foreach Loop Container from the Toolbox into the Control Flow panel.

Drag the Data Flow Task into the Foreach Loop Container.

6. Configure the Foreach Loop Container

Click on the Variables tab and ad the global variable FilePath.

Double-click on the Foreach Loop Container to bring up the Foreach Loop Editor, then click on the Collection node.

Change the contents of the Folder textbox to C:\BatchETL\ and the Files textbox to NewCustomers*.txt, then click on the Variable Mappings node.

Select the Variable User::FilePath and set the Index to zero, then click the OK button to close the Foreach Loop Editor.

7. Configure the SourceConnectionFlatFile connection manager

Click on the SourceConnectionFlatFile connection manager to bring it up in the properties window. Click on the button next to the Expressions parameter...

...to bring up the property expressions editor. Set the Connection string property to the FilePath global variable we created earlier, then click the OK button to return to the main editing window.

At this point the package will process multiple files. We are now going to add a script task to move each data file to backup as it is extracted.

8. Add a script task for moving the data files to backup

Drag-and-drop a script task from the toolbox into the Foreach Loop Container.

Add a precedence constraint from the data flow task to the script task.

Double-click on the script task to bring up the Script Task Editor. Add the read-only variable FilePath and click the OK button.

Click on the Edit Script button to bring up the script task code editor and enter the following Visual Basic code.

Option Strict On
Imports System
Imports System.IO
Imports System.Data
Imports Microsoft.SqlServer.Dts.Runtime
<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
    Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    Enum ScriptResults
        Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
        Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
    End Enum
    Public Sub Main()
        Dim DestinationDirectory As String = "C:\BatchETL\backup\"
        Dim file_name As String() = Split(Dts.Variables("FilePath").Value.ToString, "\")
        Dim input_file As New FileInfo(Dts.Variables("FilePath").Value.ToString)
        Try
            input_file.MoveTo(DestinationDirectory & file_name(2))
        Catch ex As Exception
        End Try
    End Sub
End Class

After entering the code click the button to save it, then close the window.

9. Execute the finished package

Click on the button to execute the package to process all the files in the c:\BatchETL\ folder.

Open the customers table to confirm that the 30,000 records in the six data files have been inserted.

Open the C:\BatchETL\ folder to confirm that the data files have been removed from the source folder.

Open the C:\BatchETL\backup\ folder to confirm that the data files were moved to the backup folder.

Summary

To configure a data flow task created by the SSIS Import/Export wizard to process mutliple data files:

  1. Add a Foreach Loop Container to the Control Flow window of the SSIS package
  2. Move the data flow task into the Foreach Loop Container
  3. Add a global string variable named FilePath to the package
  4. Configure the Foreach Loop Container Collection-Folder text box to point to the the folder where the data files will reside
  5. Configure the Foreach Loop Container Collection-Files text box to contain the appropriate root name of the data files
  6. Configure the Foreach Loop Container Variable Mappings-Variable to the FilePath variable and give it an index of zero
  7. Configure the ConnectionString property of the SourceConnectionFlatFile connection manager to the FilePath variable
  8. Add a script task to move the data files to a backup folder so that the same files will not be extracted multiple times

Resources

Rate

4.45 (20)

You rated this post out of 5. Change rating

Share

Share

Rate

4.45 (20)

You rated this post out of 5. Change rating