How to remove quote marks using SSIS and a little C#

,

Introduction

If you import flat files and use double quotes as your Flat File Text Qualifiers, then it’s likely that you’ve encountered problems when importing data where speech marks have been entered manually. This is a common occurrence in free text fields, such as Name.

e.g. Smith, James “Jim”

Attempting to import a file with a record like this usually results in the following error message:

The column delimiter for column "ExampleColumn" was not found.

One option is to amend this field at source. But where this option isn’t possible, adding a simple C# script task allows SSIS to read your input file, scan for any double quotes within double quotes and rewrite to the file before it gets loaded.

Set up your tables and test data

In SQL Server, build yourself a simple table, as shown below:

CREATE TABLE [dbo].[SSIS_Quotes_Test](
[Mnemonic] [varchar](50) NULL,
[Name] [varchar](50) NULL,
[Indicator] [varchar](50) NULL,
[Date] [varchar](50) NULL
) ON [PRIMARY]

Create yourself a text file for a test import, with the below contents (or similar). This will be used for a successful import. We’ll deliberately add speech markers later on to force an error.

"1001","Glory Atkins","Y",""

"1002","Ellyn Tingle","Y",""

"1003","Jenny Anger","Y",""

"1004","Suzann Fullbright","Y","26/01/11"

"1005","Rosamond Puzo","Y","12/11/09"

"1006","Donnell Hokanson","Y",""

"1007","Damon Santoyo","Y","11/09/15"

"1008","Latesha Oshea","Y",""

"1009","Eliz Licata","Y",""

"1010","Tomika Mccaul","Y",""

Call the file "Import.txt".

Set Up Your Import

Open Data Tools/SSIS and create a New Package. In the Control Flow view, drag on an Execute SQL Task and a Data Flow Task and connect them up, as shown below

The Execute SQL Task will truncate your destination table, ready to load the data. Open it up, define a connection to your Server and Database and enter the below SQL logic into the SQLStatement field:

TRUNCATE TABLE [YOURDBNAME].[dbo].[SSIS_Quotes_Test]

Set Up Your Destination

Open your Data Flow Task and drag on a Flat File Source and an OLE DB Destination and connect them up, as shown below:

Open your Flat File Source and create a new Flat File Connection Manager. Using Browse, locate the file we created earlier (Import.txt). Ensure your Text qualifier is set to " – this will remove the quotes around your fields. As we have no headers in the file, untick the “Column names in the first data row” checkbox.

Preview the file to ensure it looks OK. If everything is OK, your screen should resemble this image:

Open your OLE DB Destination and select the table we created earlier ([dbo].[SSIS_Quotes_Test]).

Then choose Mappings and map the fields accordingly.

Run your package (F5), which should hopefully complete successfully. Review the table in SQL Server to ensure the data flowed correctly.

Causing the Package to Fail

Occasionally, quotation marks will be entered into fields – especially in Name entries where people have alias’s or preferred names. To test this, open up the Import.txt file and add in the following two preferred names to Suzann Fullbright and Rosamond Puzo (enclosed in quotes) and save. This is shown below:

Run your package (F5) again. You should encounter a failed execution with an error message similar to:

Error: 0xC0202055 at Data Flow Task, Flat File Source [2]: The column delimiter for column "Column 1" was not found.

The Fix

Drag on a Script Task and connect it between your Execute SQL Task and Data Flow Task as shown below:

Open your Script Task and choose Edit Script, which will open a new window allowing you to enter custom script/code. Pay attention to the areas “Namespaces” (at the top) and “ // TODO: Add your code here” (towards the bottom). Expand your Namespaces. By default you are likely using four Namespaces (System, System.Data, Microsoft.SqlServer.Dts.Runtime and System.Windows.Forms). Add in the following (the order shouldn’t matter much):

  • using System.Linq;
  • using System.IO;
  • using System.Text.RegularExpressions;

You should end up with something like this:

Now you’ve added additional Namespaces, you can add the code which actually checks the file for double quotes within quotes. If your file is stored locally on your machine, enter the below in the //TODO: Add your code here area, paste in the following:

var fileContent = File.ReadAllLines("@C:\Users\John Smith\Desktop\Import.txt");
var fileContentUpdated = fileContent.Select(
               x => new Regex(@"(?<!^)(?<!\,)""(?!\,)(?!$)").Replace(x, "'")).ToArray();
File.WriteAllLines("@C:\Users\John Smith\Import.txt", fileContentUpdated);

It should look like this:

If your file is stored elsewhere on a server for example, you can change the above filepaths to the following:

@"\\YOURSERVERNAME\d$\YOURDIRECTORY\YOURFOLDER\Import.txt"

Run your package (F5), which should hopefully complete successfully.

Finally, review the table in SQL Server to ensure the data flowed correctly. Also take a look and see how your Import.txt was modified. You should notice that the double quotes were converted to single quotes.

Hopefully this will help you ensure that string data can be cleanly loaded when it contains quotes.

Rate

4.83 (6)

Share

Share

Rate

4.83 (6)