Home Forums SQL Server 2005 Business Intelligence Importing Unstructured Flat File Data with Header, Detail and Trailer RE: Importing Unstructured Flat File Data with Header, Detail and Trailer

  • I hope that addresses all the requirements. A package like this takes about 30 minutes to develop for a simple flat file and I only bother the database engine when I'm ready to store data. I can also process rows wider than 8000 bytes with ease and do not have to worry about parallelism messing with the order of the rows being imported into the table as they are ordered in the file. I can also do anything else to this data prior to involving the database engine including running it through a third-party tool to standardize it (think address and phone number) or enrich it per any other external data or service (think xml file, remote DB instance or web service).

    Will the split end up with an extra column at the end because of the trailing delimiter in the data?

    I went to produce the package I outlined using the sample file and realized I missed something very important in the original post...the meaning behind the "unstructured file" comment in the OP. There are missing delimiters and SSIS does not take kindly to flat files that are not actually flat. So, we must standardize the file before the SSIS Flat File Connector will parse it. My pseudocode has changed to the following:

    Control Flow

    -- Data Flow Task 1

    -- -- Flat File Source reads input file putting entire line into a single column

    -- -- Conditional Split Transformation separates the Header, Data and Trailer lines, piping Data lines to a Flat File Destination (intermediate file #1) and capturing the relevant Header and Trailer values in variables (using a Script Component, not sure if this can be done with a native component but its a simple bit of code in SSIS 2005 and is reduced to 1 line in SSIS 2008).

    -- -- Row Count Transformation captures the number of rows sent to the Data Line Output of the Conditional Split Transformation and stores the value in a variable

    -- Data Flow Task 2 (sequential after Data Flow Task 1 succeeds AND the Row Count Variable matches the Trailer Record count-variable validating the number of rows matches what the Trailer said)

    -- -- Flat File Source reads intermediate file #1 putting entire line into a single column

    -- -- Derived Column Transformation appends delimiters as needed to standardize the file

    -- -- Flat File Destination writes intermediate file #2

    -- Execute SQL Task Clears Staging Table (optional depending on how downstream processes function, they may clear the table for us, added for ease of unit testing)

    -- Data Flow Task 3

    -- -- Flat File Source reads intermediate file #2 and parses as pipe-delimited into columns

    -- -- Derived Column Transformation adds column to each row with relevant values from Header and Trailer rows previously stored in variables

    -- -- OLE DB Destination stores parsed data to database table

    Jeff Moden (6/2/2012)


    Wow.... you made it sound so easy with your pseudo code that I finally decided to give the SSIS tutorial a try. I sure hope that SSIS is better than than bloody tutorial because, right now, I'm pretty well put out. I'm doing the 2005 version because that's what I have at work. The first step is to open a file the give you. Now, when I install SQL Server, I tell it to install EVERYTHING including AdventureWorks and all sample files. The sample file they tell me to look for isn't anywhere on my hard disks.

    Sometimes it can seem that way. We may breeze through the concepts familiar to us however when those are new to someone they can take hours to get working, and days or longer to fully internalize. At least that's how it worked for me when I first read your Tally Table article 🙂

    I also see why you only posted pseudo code. There's a whole lot of steps to doing what you did.

    Nah, I didn't post code along with it because I was on my tablet yesterday and wanted to provide something to continue our conversation.

    It is your lucky day rka 😉

    Jeff, I have attached a sample package that does what I outlined in the pseudocode above. It is done in SSIS 2005, pointing to a SQL 2005 database. The test file and a SQL file to create the staging table are included. Set the value of the FlatFileDirectory variable to point to the location on your machine where the Data Files will reside. I tucked them into the TestFiles directory under the project folder for easy bundling.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato