SQLServerCentral Article

Dynamically create and populate staging tables from CSV files

,

The following SSIS package dynamically creates staging tables from comma-separated value (CSV) files in a subdirectory using the field names contained in the first line of each file, then populates the newly created tables with the data contained in the remaining lines of each file. Each staging table is given the same name as the corresponding CSV file.

Begin by creating an Integration Services project named 'DynamicStagingTables.'

Drag and drop a Script Task component from the Toolbox to the Control Flow panel of the default SSIS package.

Execute the following SQL statements in SQL Server Management Studio to create a database named 'DynamicStagingTables.'

USE master
GO
BEGIN TRY
DROP DATABASE DynamicStagingTables
END TRY
BEGIN CATCH
END CATCH
CREATE DATABASE DynamicStagingTables

'Command(s) completed successfully' will be displayed if the database has been created.

Right-click the 'ConnectionManagers' panel and select 'New ADO.NET Connection...' from the menu list.

Click the 'New' button of the 'Configure ADO.NET Connection Manager' screen.

Configure the appropriate server name and database entries, then click the 'OK' button.

Select the newly created 'DynamicStagingTables' connection manager and click the 'OK' button.

The 'DynamicStagingTables' connection manager is now available in the "Connection Managers' panel.

Double-click on the 'Script Task' component to bring up the 'Script Task Editor.'

Select 'Microsoft Visual Basic 2008' from the 'ScriptLanguage' menu list and click the 'Edit Script' button.

The empty VB.NET script template looks like this.

Add this code to the 'Imports' section of the script.

        Imports System.IO

Replace the commented lines around the text 'Add your code here with the following VB.NET code.

        Dim source_directory As String = "C:\Temp\"
        Dim source_file As String = "*.csv"
        Dim di As New System.IO.DirectoryInfo(source_directory)
        Dim aryFi As IO.FileInfo() = di.GetFiles(source_file)
        Dim fi As IO.FileInfo
        Dim message As String = "The following files matching the pattern '" & source_file & "' were found in '" & source_directory & ".'" & Chr(13) & Chr(10) & Chr(13) & Chr(10)
        For Each fi In aryFi
            message += di.ToString & fi.ToString & Chr(13) & Chr(10)
        Next
        Dim button1 As DialogResult = MessageBox.Show(message, "Matching Files in Designated Subdirectory", MessageBoxButtons.OK)
        For Each fi In aryFi
            Dim sLine As String
            Dim arrText As New ArrayList()
            Dim objReader As New StreamReader(di.ToString & fi.ToString)
            Do
                sLine = objReader.ReadLine()
                If Not sLine Is Nothing Then
                    arrText.Add(sLine)
                End If
            Loop Until sLine Is Nothing
            objReader.Close()
            Dim FirstLine As String = ""
            For Each sLine In arrText
                If FirstLine = "" Then
                    FirstLine = sLine
                End If
            Next
            Dim FieldNames() As String = Split(FirstLine, ",")
            Dim FieldList As String = "These are the parsed fields in the first line of the '" & fi.ToString & "' file." & Chr(13) & Chr(10) & Chr(13) & Chr(10)
            Dim count As Integer = Split(FirstLine, ",").Length
            Dim mySqlStatement As String = "BEGIN TRY DROP TABLE " & fi.ToString.Substring(0, fi.ToString.Length - 4) & " END TRY BEGIN CATCH END CATCH CREATE TABLE " & fi.ToString.Substring(0, fi.ToString.Length - 4) & " ("
            Dim comma As String = ""
            For index As Integer = 0 To count - 1
                mySqlStatement += comma & FieldNames(index) & " VARCHAR(20)"
                comma = ","
                FieldList += FieldNames(index) & Chr(13) & Chr(10)
            Next
            mySqlStatement += ")"
            Dim button2 As DialogResult = MessageBox.Show(FieldList, "Field Names", MessageBoxButtons.OK)
            Dim msg As String = "This SQL statement will be used to create the '" & fi.ToString.Substring(0, fi.ToString.Length - 4) & "' table." & Chr(13) & Chr(10) & Chr(13) & Chr(10)
            Dim button3 As DialogResult = MessageBox.Show(msg & mySqlStatement, "DDL Statement", MessageBoxButtons.OK)
            Dim myADONETConnection As SqlClient.SqlConnection = DirectCast(Dts.Connections("XW4100-9.DynamicStagingTables").AcquireConnection(Dts.Transaction), SqlClient.SqlConnection)
            Dim myCommand As New SqlClient.SqlCommand(mySqlStatement, myADONETConnection)
            myCommand.ExecuteNonQuery()
            myADONETConnection.Close()
            FirstLine = ""
            Dim line_count As Integer = 1
            For Each sLine In arrText
                If FirstLine = "" Then
                    FirstLine = sLine
                Else
                    line_count += 1
                    Dim fields() As String = Split(sLine, ",")
                    Dim ValueList As String = "These are the parsed values in line " & line_count & " of the '" & fi.ToString & "' file." & Chr(13) & Chr(10) & Chr(13) & Chr(10)
                    Dim FieldCount As Integer = Split(sLine, ",").Length
                    mySqlStatement = "INSERT INTO " & fi.ToString.Substring(0, fi.ToString.Length - 4) & " VALUES ("
                    comma = ""
                    For FieldIndex As Integer = 0 To FieldCount - 1
                        mySqlStatement += comma & "'" & fields(FieldIndex) & "'"
                        ValueList += fields(FieldIndex) & Chr(13) & Chr(10)
                        comma = ","
                    Next
                    mySqlStatement += ")"
                    Dim button4 As DialogResult = MessageBox.Show(ValueList, "Field Values", MessageBoxButtons.OK)
                    msg = "This SQL statement will be used to insert record " & line_count - 1 & " into the '" & fi.ToString.Substring(0, fi.ToString.Length - 4) & "' table." & Chr(13) & Chr(10) & Chr(13) & Chr(10)
                    Dim button5 As DialogResult = MessageBox.Show(msg & mySqlStatement, "DML Statement", MessageBoxButtons.OK)
                    myADONETConnection = DirectCast(Dts.Connections("XW4100-9.DynamicStagingTables").AcquireConnection(Dts.Transaction), SqlClient.SqlConnection)
                    myCommand = New SqlClient.SqlCommand(mySqlStatement, myADONETConnection)
                    myCommand.ExecuteNonQuery()
                    myADONETConnection.Close()
                End If
            Next
            Dim button6 As DialogResult = MessageBox.Show("The file '" & di.ToString & fi.ToString & "' will be moved to '" & di.ToString & "Backup\" & fi.ToString & ".'", "'Move Data File' Command", MessageBoxButtons.OK)
            Dim file As New System.IO.FileInfo(di.ToString & fi.ToString)
            Try
                file.MoveTo(di.ToString & "Backup\" & fi.ToString)
            Catch ex As Exception
                Dim temp As New System.IO.FileInfo(di.ToString & "Backup\" & fi.ToString)
                temp.Delete()
                file.MoveTo(di.ToString & "Backup\" & fi.ToString)
            End Try
        Next

Create a subdirectory named 'C:\Temp\' and copy the three text files from the attached 'csv_files.zip' file into it, then create the folder 'Backup' inside 'C:\Temp\.'

Each CSV file contains four comma-separated value lines. The first line contains the field names, the remaining three lines contain the values to be mapped to those fields in each line.

This is the first code executed when you run the SSIS package.

        Dim source_directory As String = "C:\Temp\"
        Dim source_file As String = "*.csv"
        Dim di As New System.IO.DirectoryInfo(source_directory)
        Dim aryFi As IO.FileInfo() = di.GetFiles(source_file)
        Dim fi As IO.FileInfo
        Dim message As String = "The following files matching the pattern '" & source_file & "' were found in '" & source_directory & ".'" & Chr(13) & Chr(10) & Chr(13) & Chr(10)
        For Each fi In aryFi
            message += di.ToString & fi.ToString & Chr(13) & Chr(10)
        Next
        Dim button1 As DialogResult = MessageBox.Show(message, "Matching Files in Designated Subdirectory", MessageBoxButtons.OK)

It reads the names of the all the files with a '.csv' extension in the 'C:\Temp\' subdirectory into an array list and displays them in a popup.

When you click the 'OK' button, this code is executed.

        For Each fi In aryFi
            Dim sLine As String
            Dim arrText As New ArrayList()
            Dim objReader As New StreamReader(di.ToString & fi.ToString)
            Do
                sLine = objReader.ReadLine()
                If Not sLine Is Nothing Then
                    arrText.Add(sLine)
                End If
            Loop Until sLine Is Nothing
            objReader.Close()
            Dim FirstLine As String = ""
            For Each sLine In arrText
                If FirstLine = "" Then
                    FirstLine = sLine
                End If
            Next
            Dim FieldNames() As String = Split(FirstLine, ",")
            Dim FieldList As String = "These are the parsed fields in the first line of the '" & fi.ToString & "' file." & Chr(13) & Chr(10) & Chr(13) & Chr(10)
            Dim count As Integer = Split(FirstLine, ",").Length
            Dim mySqlStatement As String = "BEGIN TRY DROP TABLE " & fi.ToString.Substring(0, fi.ToString.Length - 4) & " END TRY BEGIN CATCH END CATCH CREATE TABLE " & fi.ToString.Substring(0, fi.ToString.Length - 4) & " ("
            Dim comma As String = ""
            For index As Integer = 0 To count - 1
                mySqlStatement += comma & FieldNames(index) & " VARCHAR(20)"
                comma = ","
                FieldList += FieldNames(index) & Chr(13) & Chr(10)
            Next
            mySqlStatement += ")"
            Dim button2 As DialogResult = MessageBox.Show(FieldList, "Field Names", MessageBoxButtons.OK)

It loops through the array list containing the CSV file names and reads all the lines of the first CSV file into another array list. It then loops through the new array list to read the first line. It parses this line into field names and displays them in a popup.

When you click the 'OK' button the following cod is executed...

            Dim msg As String = "This SQL statement will be used to create the '" & fi.ToString.Substring(0, fi.ToString.Length - 4) & "' table." & Chr(13) & Chr(10) & Chr(13) & Chr(10)
            Dim button3 As DialogResult = MessageBox.Show(msg & mySqlStatement, "DDL Statement", MessageBoxButtons.OK)

...which displays the SQL statement that will be used to create the first table from the fields in the first line of the first CSV file.

When you click the 'OK' button of the popup, the DDL statement is executed from the following code, creating the 'people' table.

    Dim myADONETConnection As SqlClient.SqlConnection = DirectCast(Dts.Connections("XW4100-9.DynamicStagingTables").AcquireConnection(Dts.Transaction), SqlClient.SqlConnection)
    Dim myCommand As New SqlClient.SqlCommand(mySqlStatement, myADONETConnection)
    myCommand.ExecuteNonQuery()
    myADONETConnection.Close()

At this point the 'people' table has been created and is ready to accept records.

The following code loops through the csv file array list once more, skipping the first line and parsing the second line for data to map to the table fields.

            FirstLine = ""
            Dim line_count As Integer = 1
            For Each sLine In arrText
                If FirstLine = "" Then
                    FirstLine = sLine
                Else
                    line_count += 1
                    Dim fields() As String = Split(sLine, ",")
                    Dim ValueList As String = "These are the parsed values in line " & line_count & " of the '" & fi.ToString & "' file." & Chr(13) & Chr(10) & Chr(13) & Chr(10)
                    Dim FieldCount As Integer = Split(sLine, ",").Length
                    mySqlStatement = "INSERT INTO " & fi.ToString.Substring(0, fi.ToString.Length - 4) & " VALUES ("
                    comma = ""
                    For FieldIndex As Integer = 0 To FieldCount - 1
                        mySqlStatement += comma & "'" & fields(FieldIndex) & "'"
                        ValueList += fields(FieldIndex) & Chr(13) & Chr(10)
                        comma = ","
                    Next
                    mySqlStatement += ")"
                    Dim button4 As DialogResult = MessageBox.Show(ValueList, "Field Values", MessageBoxButtons.OK)            

The 'button4' popup displays the field values contained by the second line of the first CSV file.

When you click the 'OK' button the following code is executed...

                    msg = "This SQL statement will be used to insert record " & line_count - 1 & " into the '" & fi.ToString.Substring(0, fi.ToString.Length - 4) & "' table." & Chr(13) & Chr(10) & Chr(13) & Chr(10)
                    Dim button5 As DialogResult = MessageBox.Show(msg & mySqlStatement, "DML Statement", MessageBoxButtons.OK)

...which displays the SQL statement that will insert the first record into the new staging table.

When you click the 'OK' button, the following code is executed to insert the record into the table.

                    myADONETConnection = DirectCast(Dts.Connections("XW4100-9.DynamicStagingTables").AcquireConnection(Dts.Transaction), SqlClient.SqlConnection)
                    myCommand = New SqlClient.SqlCommand(mySqlStatement, myADONETConnection)
                    myCommand.ExecuteNonQuery()
                    myADONETConnection.Close()
                End If

After all three records have been inserted into the first staging table, the following code is executed...

            Dim button6 As DialogResult = MessageBox.Show("The file '" & di.ToString & fi.ToString & "' will be moved to '" & di.ToString & "Backup\" & fi.ToString & ".'", "'Move Data File' Command", MessageBoxButtons.OK)
            Dim file As New System.IO.FileInfo(di.ToString & fi.ToString)
            Try
                file.MoveTo(di.ToString & "Backup\" & fi.ToString)
            Catch ex As Exception
                Dim temp As New System.IO.FileInfo(di.ToString & "Backup\" & fi.ToString)
                temp.Delete()
                file.MoveTo(di.ToString & "Backup\" & fi.ToString)
            End Try

...which displays the following popup before moving the CSV file to the 'Backup' subdirectory.

Keep clicking 'OK' buttons until all three tables contain three records, then execute the following query in SQL Server Management Studio.

USE DynamicStagingTables
GO
SELECT * FROM [DynamicStagingTables].[dbo].[people]
SELECT * FROM [DynamicStagingTables].[dbo].[products]
SELECT * FROM [DynamicStagingTables].[dbo].[states]

This confirms that staging tables have been created and populated from the original CSV files.

Conclusion

The purpose of a staging table is to get data from a file into a table with as little fuss as possible, where it can be processed further using SQL. SSIS Data Flow Tasks are orders-of-magnitude faster than any script task written in sequential code. The method described isn't practical for large data files, but could work satisfactorily for smaller (under 10,000 lines) data files.

Resources

Rate

3.45 (31)

You rated this post out of 5. Change rating

Share

Share

Rate

3.45 (31)

You rated this post out of 5. Change rating