SQLServerCentral Article

SSIS Programming

,

Overview

Most of us have been using Data Transformation Services (DTS) in the previous versions of SQL Server in order to manipulate or move data. With the introduction of SQL Server 2005, Microsoft has introduced a completely re-written version of DTS with a lot more new features which is now known as Integration Services. This new ETL platform is nothing but a set of utilities, applications, designers, components, and services all wrapped up into one powerful software application suite.

The most attracting feature of SSIS is that the data movement and transformation is separate from the package control flow and management. There are two different engines that handle these tasks. Integration Services Data Flow engine takes care of the data movement and transformation whereas the Integration Services Run-time engine manages the package control flow.

The motivation behind this article is that the SSIS API is not well documented in MSDN though you get an overall idea of what is going on. This article assumes that the reader is aware of the fundamental components of an SSIS Package. To know more about what SSIS is all about, check the MSDN Books Online http://msdn2.microsoft.com/en-us/library/ms141026.aspx

DTS vs. SSIS Programming

There are two fundamental approaches one can have for SSIS programming. One is to use the SSIS Designer to create and run packages. The other approach is to use the API to create, configure and run packages from own applications.

Those who are familiar with the DTS programming can see that there is very little code from DTS remaining in Integration Services. Some of the similarities are listed here.

DTS

SSIS

Control flow and Data flow on the same design surface

Control flow and Data flow have been separated

Datapump (supported only one source, transform, and

destination)

Data Flow Task or Pipeline (can perform extremely complex

and advanced data processing within one pipeline)

Step

Taskhost

(An expanded version of Step) which schedules the task execution

Precedence Constraints

Still

remains with added features

Connections

Connection

Managers

Getting Started

Let us get started with some actual coding in VB.NET, my favorite!

The following table lists the assemblies that are frequently used when programming Integration Services using the .NET Framework (http://msdn2.microsoft.com/en-us/library/ms403344.aspx).

Assembly

Description

Microsoft.SqlServer.ManagedDTS.dll

Contains the managed run-time engine.

Microsoft.SqlServer.RuntimeWrapper.dll

Contains the primary interop assembly (PIA), or wrapper,

for the native run-time engine.

Microsoft.SqlServer.PipelineHost.dll

Contains the managed data flow engine.

Microsoft.SqlServer.PipelineWrapper.dll

Contains the primary interop assembly (PIA), or wrapper,

for the native data flow engine.

Consider the example of transferring data from a flat file to a database table in SQL Server 2005. Create the destination table in SQL Server prior to running the package. This can be done using a simple T-SQL statement.

The programmatic approach would include the following steps:

  1. Create an SSIS Package

http://msdn2.microsoft.com/en-us/library/ms135946.aspx

  1. Create Control flow, which would be as simple as adding a

    new Data Flow task in our case.

http://msdn2.microsoft.com/en-us/library/ms135997.aspx

  1. Add two connection managers. We need a Flat file connection manager for the source and an OleDb or SQL Server connection manager for the destination.

Creating OleDb connection manager is not hard.

Dim oleDbConn As ConnectionManager = myPackage.Connections.Add("OLEDB")
oleDbConn.Name = "MyOLEDBConnection"
oleDbConn.ConnectionString = "Provider=SQLOLEDB.1; Integrated Security=SSPI;"
& "Initial Catalog=<db>; Data Source=" & SS_INSTANCE & ";"

The following code shows creating a flat file connection manager and setting up the formatting information for the same.

cnFile = myPackage.Connections.Add("FLATFILE")
cnFile.Properties("Name").SetValue(cnFile, "MyFlatFileConnection")
cnFile.Properties("ConnectionString").SetValue(cnFile, <srcFilePath>)
cnFile.Properties("Format").SetValue(cnFile, "Delimited")
cnFile.Properties("Unicode").SetValue(cnFile,False)
cnFile.Properties("ColumnNamesInFirstDataRow").SetValue(cnFile, True)
cnFile.Properties("DataRowsToSkip").SetValue(cnFile,0)
cnFile.Properties("RowDelimiter").SetValue(cnFile,vbCrLf)
cnFile.Properties("TextQualifier").SetValue(cnFile,"""")
cnFile.Properties("HeaderRowsToSkip").SetValue(cnFile, 0)
cnFile.Properties("HeaderRowDelimiter").SetValue(cnFile,vbCrLf)
cnFile.Properties("CodePage").SetValue(cnFile,1252)
Dim MyFlatFilecn As SSISRuntime.IDTSConnectionManagerFlatFile90 = Nothing
MyFlatFilecn = TryCast(cnFile.InnerObject,SSISRuntime.IDTSConnectionManagerFlatFile90)
 DtsConvert.ToConnectionManager90(cnFile)
 Dim name As SSISRuntime.IDTSName90
 Dim col As SSISRuntime.IDTSConnectionManagerFlatFileColumn90
 cnFile.AcquireConnection(Nothing)
 ' Add columns to the FlatFileConnectionManager
 Dim headers As String()
      'Get the header row from flat file. This can be stored in a database table as well.
      'Assuming that "headers" Array contains the headers
 Dim i As Integer = 0
 While i < headers.Length
 col = MyFlatFilecn.Columns.Add()
 If i = headers.Length - 1 Then
 col.ColumnDelimiter = vbCrLf
 Else
 col.ColumnDelimiter = ","
 End If
 col.ColumnType = "Delimited"
 name = TryCast(col, SSISRuntime.IDTSName90)
 name.Name = headers(i)
 'get the converSion information from the destination table
          'this should assign the appropriate datatype, precision, scale and length to col
 getConversionInfo(col, name.Name) 'implement this
 i = i + 1
 End While
 cnFile.ReleaseConnection(Nothing)
  1. Create Data flow which does the actual job of copying the data from flat file to table. This would involve creating a Flat file source and an OleDb destination. This is fairly standard. We give it for the sake of completeness.

Note: One can add data transformation components here if required.

      DFSource = DTP.ComponentMetaDataCollection.New()
 DFSource.ComponentClassID = "DTSAdapter.FlatFileSource"
 DFSource.Name = "FlatFileSource"
 ' Get the design time instance of the component.
 SourceInst = DFSource.Instantiate()
 ' Initialize the component.
 SourceInst.ProvideComponentProperties()
 ' Specify the connection manager.
 If DFSource.RuntimeConnectionCollection.Count > 0 Then
         DFSource.RuntimeConnectionCollection(0).ConnectionManagerID = 
   myPackage.Connections("MyFlatFileConnection").ID
         DFSource.RuntimeConnectionCollection(0).ConnectionManager = 
           DtsConvert.ToConnectionManager90(myPackage.Connections("MyFlatFileConnection"))
 End If
 ' Reinitialize the metadata.
 SourceInst.AcquireConnections(Nothing)
 SourceInst.ReinitializeMetaData()
      Dim exOutColumn As IDTSExternalMetadataColumn90
 For Each outColumn As IDTSOutputColumn90 In DFSource.OutputCollection(0).OutputColumnCollection
 exOutColumn = DFSource.OutputCollection(0).ExternalMetadataColumnCollection(outColumn.Name)
 SourceInst.MapOutputColumn(DFSource.OutputCollection(0).ID, outColumn.ID, exOutColumn.ID, True)
 Next
SourceInst.ReleaseConnections()

Similarly add a destination.

  1. Connect Data flow components

http://msdn2.microsoft.com/en-us/library/ms136086.aspx

 

  1. Map source and destination columns
 Dim input As IDTSInput90 = DFDestination.InputCollection(0)
 Dim destinationInputID As Integer = CInt(input.ID)
 Dim vInput As IDTSVirtualInput90 = input.GetVirtualInput()
 For Each vColumn As IDTSVirtualInputColumn90 In vInput.VirtualInputColumnCollection
 ' This will create an input column on the component.
 DestInst.SetUsageType(destinationInputID, vInput, vColumn.LineageID, DTSUsageType.UT_READONLY)
 ' Get input column.
 Dim inputColumn As IDTSInputColumn90 = input.InputColumnCollection.GetInputColumnByLineageID(vColumn.LineageID)
 ' Getting the corresponding external column.
 ' Ex : We will use the column name as the basis for matching data flow columns to external columns.
 Dim externalColumn As IDTSExternalMetadataColumn90 = input.ExternalMetadataColumnCollection(vColumn.Name)
 ' Tell the component how to map.
 DestInst.MapInputColumn(destinationInputID, inputColumn.ID, externalColumn.ID)
Next
  1. Validate and execute the package
  Dim pkgStatus As DTSExecResult = myPackage.Validate(Nothing, Nothing, Nothing, Nothing)
  If pkgStatus = DTSExecResult.Success Then
     Dim pkgResult As DTSExecResult = myPackage.Execute()
  End If

Notes

1: For debugging purposes it is a good idea to save the SSIS

package created programmatically which can then be exported to the Designer.

This would allow us to check whether the properties of the connection managers

etc are set properly. See the link for different options.

http://msdn2.microsoft.com/en-us/library/ms403347.aspx

2: To get a list of data types that SSIS uses, check

http://msdn2.microsoft.com/en-us/library/ms141036.aspx

3: In case a subset of columns from the flat file needs to be transferred to the destination table, then one can identify such columns in Step c, and then delete them from Output Collection of the source in Step d before creating the external columns

Download the code

Conclusion

SQL Server Integration Services, the new ETL platform in SQL Server 2005, is the successor to DTS, which was there in previous versions of SQL Server. SSIS with all its new features is a huge topic in itself and we tried to look at some of the programming aspects of it. We have looked at how we can programmatically create a package to transfer data from a Flat file source to a database table. Interested readers can look at the MSDN for more information on SSIS. A couple of books that I have come across are also worth reading

  1. Microsoft SQL Server 2005 Integration Services by Kirk Haselden
  2. Professional SQL Server 2005 Integration Services by Brian Knight et al
  3. Microsoft SQL Server 2005: A Developers Guide by Michael Otey et al.

Rate

4.67 (3)

You rated this post out of 5. Change rating

Share

Share

Rate

4.67 (3)

You rated this post out of 5. Change rating