SQLServerCentral Article

Parse Data from a Field Containing Multiple Values using CROSS APPLY

,

There are two kinds of ASCII files commonly used for data transfer:

  1. Flat - a plain text file where each field value is the same width and padded with spaces.
  2. Delimited - a plain text file where each field value is separated by specific delimiter characters.

By definition, a flat file can only have one value per field per record. A delimited file, however, can have one or more fields that are further delimited by other delimiter characters. The most common type of delimited file is the CSV (comma-separated values) file, where the values in each record are separated by commas. A CSV file could also have one or more of its fields delimited by another delimiter, such as the pipe (|) character, as in the following example:

ProductId,ProductName,PartsList
1,Gizmo,463|914|771|281|418
2,Doohickey,422|453
3,Gadget,323|724|449|882|591|715

Each record above has a single ProductId value and ProductName value in those fields, but a variable number of PartNumber values in the PartsList field that are separated by the pipe character. The challenge is to map the multiple PartNumber values from a single record in the file into multiple records related by ProductId in a destination table.

In this article we will demonstrate a SQL Server user-defined function that resolves the values in a multi-value field into their individual values, then use that functon to create an SSIS package that reads a CSV file containing a multi-value field into a staging table and parses the multi-value field into a second, related table using the CROSS APPLY statement.

1. Deploy the SQL Server "ParseIndividualValue" user-defined function

Open Microsoft SQL Server Management Studio and paste the following code...
USE [TestDB]
GO
BEGIN TRY
DROP FUNCTION [dbo].[ParseIndividualValue]
END TRY
BEGIN CATCH
END CATCH
GO
 CREATE FUNCTION [dbo].[ParseIndividualValue](@String varchar(8000), @Delimiter char(1))        
 returns @temptable TABLE (items varchar(8000))        
 as        
 begin        
     declare @idx int        
     declare @slice varchar(8000)        
        
     select @idx = 1        
         if len(@String)<1 or @String is null  return        
        
     while @idx!= 0        
     begin        
         set @idx = charindex(@Delimiter,@String)        
         if @idx!=0        
             set @slice = left(@String,@idx - 1)        
         else        
             set @slice = @String        
            
         if(len(@slice)>0)   
             insert into @temptable(Items) values(@slice)        
   
         set @String = right(@String,len(@String) - @idx)        
         if len(@String) = 0 break        
     end    
 return        
 end
GO
...into a query editor window and execute it to deploy the ParseIndividualValue user-defined function.

You can browse to the Table-valued Functions node of the database to confirm successful deployment of the function.

2. Test the "ParseIndividualValue" function

Paste the following code into a SQL Server Management Studio query editor window...
USE TestDB
GO
BEGIN TRY
DROP TABLE dbo.StagingTable
END TRY
BEGIN CATCH
END CATCH
GO
CREATE TABLE [dbo].[StagingTable](
[ProductId] [varchar](50) NOT NULL,
[ProductName] [varchar](50) NOT NULL,
[PartsList] [varchar](1000) NOT NULL
) 
GO
INSERT INTO dbo.StagingTable
(ProductId,ProductName,PartsList)
VALUES
(1,'Gizmo','463|914|771|281|418'),
(2,'Doohickey','422|453'),
(3,'Gadget','323|724|449|882|591|715')
SELECT * FROM dbo.StagingTable
...and execute it to create a table named StagingTable containing the data from the prevous example file.

Now paste the following code into a query editor window and execute it...
BEGIN TRY
DROP TABLE dbo.Products
END TRY
BEGIN CATCH
END CATCH
GO
CREATE TABLE [dbo].[Products](
ProductId [int] NOT NULL,
[PartNumber] [int] NOT NULL
) 
GO
DECLARE @Table Table (ColumnA VarChar(100), ColumnB VarChar(1000))
INSERT INTO @Table
SELECT ProductId,PartsList FROM dbo.StagingTable
INSERT INTO dbo.Products (ProductId,PartNumber)
SELECT T.ColumnA, a.* FROM  @Table T
CROSS APPLY [dbo].[ParseIndividualValue](t.ColumnB,'|') a
ORDER BY CAST(T.ColumnA AS INT)
GO
SELECT * FROM dbo.Products ORDER BY ProductId,PartNumber
...to create a table named Products and parse the PartNumber values from the StagingTable.PartsList field into it using the ParseIndividualValue function and a CROSS APPLY query.

The screen shot shows that the PartNumer values have been extracted from the multiple-value field into individual records.

3. Create an SSIS package that parses a multi-value field

Paste the following code...

USE TestDB
GO
TRUNCATE TABLE dbo.StagingTable
TRUNCATE TABLE dbo.Products
SELECT * FROM dbo.StagingTable
SELECT * FROM dbo.Products

...into a query panel and execute it to truncate the StagingTable and Products tables.

Paste the following CSV text....

ProductId,ProductName,PartsList
1,Gizmo,463|914|771|261|418
2,Doohickey,422|453
3,Gadget,323|724|449|882|591|715

...into a text editor and save it as input_file.csv...

...to a local folder.

Open a project in Business Intelligence Development Studio and start the SSIS Import and Export Wizard.

Select the Flat File Source data source and browse to input_file.csv, then click the Next button.

Click the Next button again.

Enter the server and and database names, then click the Next button.

Click the Next button again.

Click the Finish button.

Click the Close button.

Run the new SSIS package by selecting Debug-Start Debugging from the main menu.

Wait for the SSIS package to complete execution.

Paste the following code...

USE TestDB
GO
SELECT * FROM dbo.StagingTable

...into a query panel and execute it to confirm that the data was successfully inserted into the staging table.

Run the previous code again to truncate the StagingTable and Product tables.

Add an Execute SQLTask component to the SSIS package.

Add a precedent constraint from the Data Flow Task component to the Execute SQL Task component.

Right-click the Execute SQL Task component and select the Edit menu item.

Copy the following code into the Windows clipboard.

DECLARE @Table Table (ColumnA VarChar(100), ColumnB VarChar(1000))
INSERT INTO @Table
SELECT ProductId,PartsList FROM dbo.StagingTable
INSERT INTO dbo.Products (ProductId,PartNumber)
SELECT T.ColumnA, a.* FROM  @Table T
CROSS APPLY [dbo].[ParseIndividualValue](t.ColumnB,'|') a
ORDER BY CAST(T.ColumnA AS INT)
GO

Click on the "More Options" elipsis menu item at the end of the "SQL Statement" line.

Paste the contents of the clipboard into the "Enter SQL Query" text box.

Click the "OK" button.

Execute the SSIS package.

Wait for the SSIS package to complete execution.

Paste the following code into a SQL Server Managment Studio quey panel...

USE TestDB
GO
SELECT * FROM dbo.StagingTable
SELECT * FROM dbo.Products ORDER BY ProductId,PartNumber

...and execute it demonstrate that the PartsList field ha been correctly parsed and inserted into the Products table.

Rate

3.27 (15)

You rated this post out of 5. Change rating

Share

Share

Rate

3.27 (15)

You rated this post out of 5. Change rating