When transferring data from one system to another, there are usually columns with different data types between the source and the destination. In this case of a data warehouse, the source column is an nvarchar and the target is a varchar. The developer of the data warehouse has decided to use a different data type for performance reasons. The problem in SQL Server Integration Services (SSIS) for this case becomes apparent when the mapped source column gets saved to a destination column in the OLE DB Destination component.
In SSIS, once you create an OLE DB Source and add the T-SQL for the data extraction, the data type gets extracted from the source table and saved in the Input/Output column properties. The Data Conversion component of SSIS can be used to translate an nvarchar to a varchar. But let’s look at another option to change the data type. In this process, we get to learn about the input/output properties of source components.
This example will look at the Product tables used in the transaction database AdventureWorks as well as the data warehouse version of the Product dimension table. You can download these databases by going to CodePlex.
The staging package is simple. First, the staging table for the product dimension is truncated. Next, the data is imported into that staging table. Figure 1 shows an Execute SQL Task (that truncates the staging table) and a Data Flow Task (stage the product dimension data).
Figure 1 – Stage Product Dimension SSIS Package
The Data Flow Task in Figure 2 has an OLE DB Source component piping the data to an OLE DB Destination component.
Figure 2 – Inside the Data Flow Task
The T-SQL (Figure 3) is straight forward for the source component. The OLE DB Source component is selecting 7 columns from the Production.Product table in the SQL Statement property rather than connecting directly to the table. Since we only need these 7 columns, packages will perform better if we practice and stick to best practices like using T-SQL. Connecting directly to a table will select all the columns using SELECT *.
Figure 3 – T-SQL to gather Products from Source database.
Once we place the T-SQL into the OLE DB Source component (Figure 4), the data type definitions from the source table are used to define each column in the transformation. In this case, nvarchar from the Production.Product table is used as the data type. Nvarchar data types translate to Non-Unicode type in SSIS for input and output columns.
Figure 4 – OLE DB Source Editor
The staging table in Figure 5 has a varchar data type for ProductName, ProductAlternativeKey, Color and Size. The source is nvarchar, so when we map (Figure 6) these columns in SSIS, we get an error.
CREATE TABLE dbo.stageProducts ( [ProductID] int, [Productname] varchar(50), [ProductAlternateKey] varchar(25), [Color] varchar(15), [ListPrice] money, [Size] varchar(5), [ProductSubcategoryID] int )
Figure 5 – Stage table design
Figure 6 – Mapping columns OLE DB Destination Editor
After clicking OK, the error “Column ProductName cannot convert between Unicode and non-Unicode string data types” appears, as shown in Figure 7.
Figure 7 – Error Message
Now, instead of placing a Data Conversion component between the source and destination components, the T-SQL in the OLEDB Source component will be changed to use a CAST function to convert the nvarchar source columns to varchar. See Figure 8.
SELECT [ProductID], CAST( [Name] AS VARCHAR(50)) AS Productname ,CAST( [ProductNumber] AS VARCHAR(25)) AS ProductAlternateKey ,CAST( [Color] AS VARCHAR(15)) AS Color ,[ListPrice] ,CAST( [Size] AS VARCHAR(5)) AS Size ,[ProductSubcategoryID] FROM [Production].[Product]
Figure 8 – new T-SQL statement with CAST
By placing the CAST function around the nvarchar data type columns with an ‘as varchar (xx)’, the component converts the data in the pipe. Since the data types were extracted from the original T-SQL, the OLE DB Source component’s Advanced Options have to be modified. For some reason, the Input Column property changes for these columns, but the Output property does not. We have to do this manually.
Right-clicking on the Source component (Figure 9) enables a menu to select the Show Advanced Editor…
Figure 9 – Show Advanced Editor…
Once in there, go to the Input and Output Properties and select the Output Columns. The ProductName, ProductAlternateKey, Color and Size have to be changed from Unicode string (DT_WSTR) to string (DT_STR) as seen in Figures 10 and 11.
Figure 10 – Output Columns (ProductName)
Figure 11 – DataType property change to String
After repeating this for all previously nvarchar columns, the error goes away as shown in Figure 12.
Figure 12 – Error has been solved
Even though a Data Conversion component can be used to convert the data types, this method helps keep the logic in the T-SQL. Not only does this method show an alternative to the Data Conversion component, but also gives an opportunity to learn about the Advanced Options of a Source component.