Copy data from flat file with multiple headers

  • John Mitchell-245523 - Wednesday, May 10, 2017 3:59 AM

    It's so simple that you don't even need to use SSIS.  You can use bcp or BULK INSERT (or SSIS if you insist, or if it's part of a larger set of tasks) to copy everything from the file into a table with the same number of columns as the file has.  This assumes that the column headers match the columns themselves one-to-one, right?  Then delete all header rows (something like WHERE Col1 = 'Header1' AND Col2 = 'Header2'...).  Finally, transfer to the destination table, performing any necessary conversions along the way (date formats, decimals, etc.).
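
    For illustration, a minimal sketch of that route (the staging table name and file path here are hypothetical, and the column and header names are the Col1/Header1 placeholders from above):

    -- Load everything from the file into a permissive staging table.
    -- FIRSTROW = 2 skips the first header line; the headers repeated
    -- further down the file still arrive as ordinary data rows.
    BULK INSERT dbo.Staging
    FROM 'C:\Import\File1.csv'
    WITH (
        FIELDTERMINATOR = ','
    ,   ROWTERMINATOR = '\n'
    ,   FIRSTROW = 2
    );

    -- Then delete the embedded header rows, which are the only rows
    -- whose values equal the column headers themselves.
    DELETE FROM dbo.Staging
    WHERE Col1 = 'Header1'
      AND Col2 = 'Header2';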

    John

    Hi, thanks for the info.
    The thing is, I need to use an SSIS package, as the CSV file is put in a folder every day to be picked up by the SSIS job. As part of the existing requirement, I already use a Data Conversion transformation to convert the data types.
    My question was which SSIS component I need to use within the Data Flow task, and where, to get the desired result of removing the additional header rows. Please find the SSIS screenshot attached. Thanks.

  • Change your OLE DB Destination from what is now the final destination table to a staging table.  Remove the Data Conversion transformation, so that you're copying direct from the file to the staging table.  Then add an Execute SQL task immediately after the data flow to remove the headers and push the data (with transformations) into the final destination table.

    John

  • John Mitchell-245523 - Wednesday, May 10, 2017 4:18 AM


    Thanks again, but it's easier said than done.

    It's not possible to use an Execute SQL Task within a Data Flow task, and the Data Conversion component cannot be used outside the Data Flow task, so I don't know how to go about this.

  • pwalter83 - Wednesday, May 10, 2017 5:03 AM

    It's not possible to use an Execute SQL Task within a Data Flow task


    Indeed it isn't.  That's why I said immediately after.

    the Data Conversion component cannot be used outside the Data Flow task


    That's right.  Just use good old-fashioned SQL.  Jeff and I have both provided examples of how you might remove the header rows.  If you post a fragment of your file, I'll show you how to do the transformations and the load into the destination table.

    John

  • John Mitchell-245523 - Wednesday, May 10, 2017 5:15 AM


    Hi, thanks.
    Do you know how I can attach a copy of the SSIS (dtsx) package to the message?
    Thanks.

  • pwalter83 - Wednesday, May 10, 2017 6:13 AM


    You'd have to zip it up first, or rename it with a permitted file extension.  But I'm not sure many people would open a dtsx file.  I don't think we need it.  It's the text file that's important.

    John

  • John Mitchell-245523 - Wednesday, May 10, 2017 7:07 AM


    Thanks, but could you please advise which text file I need to attach?

  • The one that you mentioned in your very first post:

    I am trying to copy data from a flat file (CSV) which has multiple headers between the row values

    John

  • John Mitchell-245523 - Wednesday, May 10, 2017 7:36 AM


    I had attached the files before; please find them attached again. File1 is the raw version and File2 is the refined version (how the data should look before being copied to the SQL table). Thanks.

    Oops!  No wonder you were so confused.  My apologies.

    Right then.  The first thing you need to do is create a staging table.  You only need to do this once.
    -- All columns varchar, so that every raw row (including the
    -- embedded header lines) loads without conversion errors
    CREATE TABLE Staging (
        MODEL_DSC varchar(100)
    ,   MODEL_CD varchar(100)
    ,   DERIVATIVE_CD varchar(100)
    ,   VINNUMBER varchar(100)
    ,   OUT_GATE varchar(100)
    ,   THEDATE varchar(100)
    ,   DISTRIBUTION_CD varchar(100)
    ,   POLCD varchar(100)
    ,   PODCD varchar(100)
    )

    Now use your data flow to import from the file into the staging table, with no transformations, then delete the header rows:
    DELETE FROM Staging
    WHERE MODEL_DSC = 'MODEL_DSC'   -- header rows repeat the column headers as values
    AND MODEL_CD = 'MODEL_CD'
    AND DERIVATIVE_CD = 'DERIVATIVE_CD'
    AND VINNUMBER = 'VIN NUMBER'
    AND OUT_GATE = 'OUT_GATE'
    AND THEDATE = 'DATE'
    AND DISTRIBUTION_CD = 'DISTRIBUTION_CD'
    AND POLCD = 'POL CD'
    AND PODCD = 'POD CD'

    Finally, transform and copy.  Since I don't know the data types of the columns in your destination table, you'll have to fill those in yourself.
    INSERT INTO DestinationTable (
        <Column List>
        )
    SELECT
        CONVERT(???, MODEL_DSC)
    ,   CONVERT(???, MODEL_CD)
    ,   CONVERT(???, DERIVATIVE_CD)
    ,   CONVERT(???, VINNUMBER)
    ,   CONVERT(???, OUT_GATE)
    ,   CONVERT(date, THEDATE, 103)
    ,   CONVERT(???, DISTRIBUTION_CD)
    ,   CONVERT(???, POLCD)
    ,   CONVERT(???, PODCD)
    FROM Staging

    That's the basics.  You'll probably want to make it a bit more sophisticated than that - handling any primary or foreign key violations, for example.  You may also wish to wrap it all in a stored procedure so it's easier to call from your package and easier to modify if your database changes.
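
    A minimal sketch of that stored-procedure idea, assuming (purely for illustration) that the destination columns keep the staging names and that only THEDATE needs converting:

    CREATE PROCEDURE dbo.LoadVehicleFile   -- procedure name is illustrative
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Remove the embedded header rows (keep the full AND chain
        -- above if a real MODEL_DSC value could equal the header text)
        DELETE FROM Staging
        WHERE MODEL_DSC = 'MODEL_DSC';

        -- Transform and copy into the destination table
        INSERT INTO DestinationTable
            (MODEL_DSC, MODEL_CD, DERIVATIVE_CD, VINNUMBER, OUT_GATE,
             THEDATE, DISTRIBUTION_CD, POLCD, PODCD)
        SELECT MODEL_DSC, MODEL_CD, DERIVATIVE_CD, VINNUMBER, OUT_GATE,
               CONVERT(date, THEDATE, 103), DISTRIBUTION_CD, POLCD, PODCD
        FROM Staging;

        -- Leave the staging table empty for the next day's file
        TRUNCATE TABLE Staging;
    END

    The Execute SQL task in the package then needs nothing more than EXEC dbo.LoadVehicleFile.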

    John

  • John Mitchell-245523 - Wednesday, May 10, 2017 8:18 AM


    Thanks again.

    The problem is that there was an existing Data Conversion component (within the Data Flow task) in the original SSIS package, which can't now be added into the new logic. Please find the screenshot for it attached.

  • Yes, that shows me that there's a Data Conversion transformation, but it doesn't tell me what it actually does.  What does it do that can't be added in the new logic?  As I suggested before, remove that transformation, and instead do everything that the transformation did in that final INSERT statement above.
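
    For example, if that transformation was casting a code column from DT_STR to DT_I4, the equivalent in the final INSERT is just another CONVERT.  A purely hypothetical illustration (TRY_CONVERT needs SQL Server 2012 or later, and returns NULL instead of failing on a bad value):

    -- Illustrative only: suppose the Data Conversion transformation
    -- cast DISTRIBUTION_CD from DT_STR to DT_I4.  The matching
    -- expression in the INSERT would be:
    SELECT TRY_CONVERT(int, DISTRIBUTION_CD) AS DISTRIBUTION_CD
    FROM Staging;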

  • John Mitchell-245523 - Thursday, May 11, 2017 4:30 AM


    I've given up on it now after so many attempts. Thanks anyway for your help.

    If the additional header rows cause errors, you could change the error handling on the flat file source to "Redirect Row" for all columns and have them filtered out of the data flow.  I don't know that you even have to add a second path for the error output, since you're not interested in those rows.

    Another method would be to use a script source component to read the file.  Inside that, you could read a row and test whether it is a header line before adding it to the data flow buffer.

    If it's going through a data flow task, the file is being read and processed one line at a time already.  I don't see why you'd dump the whole thing into a staging table and then transform it in T-SQL when you can simply throw the header lines away in the data flow while reading the file.

    If you are only removing repeating headers, you might consider using a conditional split that tests the string; the conditional split can redirect the headers (or simply do nothing with them), while the other output of the split, the one with the good row data, sends records on to normal processing.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. With your description of the problem, we can provide a tested, verifiable solution to your question! Asking the question the right way gets you a tested answer the fastest way possible!
