Copy data from flat file with multiple headers

  • Hi,

    I am trying to copy data from a flat file (CSV) which has multiple headers between the row values to a SQL table. The SSIS package was initially built to only copy data across from flat file with single header but now the functionality has changed.

    The data needs to be written to a table in SQL and it also includes transformation component. Could someone please give any suggestions in this regard. Please find the SSIS and other files attached.

    Thanks.

  • pwalter83 - Friday, May 5, 2017 5:04 AM

    Hi,

    I am trying to copy data from a flat file (CSV) which has multiple headers between the row values to a SQL table. The SSIS package was initially built to only copy data across from flat file with single header but now the functionality has changed.

    The data needs to be written to a table in SQL and it also includes transformation component. Could someone please give any suggestions in this regard. Please find the SSIS and other files attached.

    Thanks.

    My normal approach is to use ELT rather than ETL, load everything into a staging table as it appears in the file and then do grouping and transformations.
    😎
    BTW, cannot see any attachments in your post.

  • pwalter83 - Friday, May 5, 2017 5:04 AM

    Hi,

    I am trying to copy data from a flat file (CSV) which has multiple headers between the row values to a SQL table. The SSIS package was initially built to only copy data across from flat file with single header but now the functionality has changed.

    The data needs to be written to a table in SQL and it also includes transformation component. Could someone please give any suggestions in this regard. Please find the SSIS and other files attached.

    Thanks.

    Could just be me but I see no attachments.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Eirikur Eiriksson - Friday, May 5, 2017 5:15 AM

    pwalter83 - Friday, May 5, 2017 5:04 AM

    Hi,

    I am trying to copy data from a flat file (CSV) which has multiple headers between the row values to a SQL table. The SSIS package was initially built to only copy data across from flat file with single header but now the functionality has changed.

    The data needs to be written to a table in SQL and it also includes transformation component. Could someone please give any suggestions in this regard. Please find the SSIS and other files attached.

    Thanks.

    My normal approach is to use ELT rather than ETL, load everything into a staging table as it appears in the file and then do grouping and transformations.
    😎
    BTW, cannot see any attachments in your post.

    Heh... I prefer "LET". 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Friday, May 5, 2017 6:54 AM

    Eirikur Eiriksson - Friday, May 5, 2017 5:15 AM

    pwalter83 - Friday, May 5, 2017 5:04 AM

    Hi,

    I am trying to copy data from a flat file (CSV) which has multiple headers between the row values to a SQL table. The SSIS package was initially built to only copy data across from flat file with single header but now the functionality has changed.

    The data needs to be written to a table in SQL and it also includes transformation component. Could someone please give any suggestions in this regard. Please find the SSIS and other files attached.

    Thanks.

    My normal approach is to use ELT rather than ETL, load everything into a staging table as it appears in the file and then do grouping and transformations.
    😎
    BTW, cannot see any attachments in your post.

    Heh... I prefer "LET". 😉

    Kind of "LET IT BI"
    😎

  • If the header and files relate to each other, then it may be difficult attaching the second and more headers to their respective headers. I've used Python in the past to split the flat file, header and the header records into separate files. As the header rows become row delimiters, it's easy to read the file in chunks from header-to-header, where you can partition the records with their respective headers into separate flat files. That way they are easily reusable for reimportation later without having to stage more bytes than what you need versus just loading it into a staging table leaving the flat file intact in the same multi-header file.

    Something as simple as (psudo code here):


    import pandas as df

    # Load your CSV file
    file = df.read_csv('location\yourfile.csv')

    # Make a new list to hold your rows that you want to separate
    newRecords= [] 

    # Define your header rows for exporting to csv.
    label1 = ['Date', 'Blah', 'Blah2']

    # Iterate over each row in the data frame.
    # Once you get to the second header in the file, break, else append record to a list
    for line in file:
        if line['FieldName'] == "Date":
            break
        else:
            newRecords.append([line])

    # Now that you have a list/array of records, dump them into a dataframe 
    newFile = df.DataFrame(newRecords, header = label1)

    # Then export the data frame as a csv file to import with SSIS
    df.to_csv(YourPartitonedFile.csv)

    That being said, recommended Python 2.7 (google Windows Anaconda Python) with Pandas. Both to_csv and read_csv have options to ignore headers, set headers, custom delimiters, skip rows, read in chunks and all that good stuff. Pandas does not work that well on larger files with straight reads, but it's really easy to read, transform and export flat files with Python + Pandas. I take this approach to distribute the workload as opposed to SSIS doing all the work along with best practices on doing as much on disk before landing data into your database (i.e.: helping offload the work from SQL Server).

    This is just one option and not your only option! :hehe:

  • Hi,
    I think I did not explain my problem clearly. I need to remove the repeating header rows from the flat file (as attached File1) and then copy the records from the amended file, which only has a single header at the top (File2) to a SQL table. Please also find the SSIS package screenshot attached.
    Thanks in advance for your help.

  • pwalter83 - Monday, May 8, 2017 3:37 AM

    Hi,
    I think I did not explain my problem clearly. I need to remove the repeating header rows from the flat file (as attached File1) and then copy the records from the amended file, which only has a single header at the top (File2) to a SQL table. Please also find the SSIS package screenshot attached.
    Thanks in advance for your help.

    This is easy to handle by loading the full file content to a table and then filter out the repeated header rows.
    😎

  • Eirikur Eiriksson - Monday, May 8, 2017 4:14 AM

    pwalter83 - Monday, May 8, 2017 3:37 AM

    Hi,
    I think I did not explain my problem clearly. I need to remove the repeating header rows from the flat file (as attached File1) and then copy the records from the amended file, which only has a single header at the top (File2) to a SQL table. Please also find the SSIS package screenshot attached.
    Thanks in advance for your help.

    This is easy to handle by loading the full file content to a table and then filter out the repeated header rows.
    😎

    I have tried different ways to achieve it but get errors every time.
    Could you please suggest how to go about this ? Thanks.

  • pwalter83 - Monday, May 8, 2017 4:51 AM

    Eirikur Eiriksson - Monday, May 8, 2017 4:14 AM

    pwalter83 - Monday, May 8, 2017 3:37 AM

    Hi,
    I think I did not explain my problem clearly. I need to remove the repeating header rows from the flat file (as attached File1) and then copy the records from the amended file, which only has a single header at the top (File2) to a SQL table. Please also find the SSIS package screenshot attached.
    Thanks in advance for your help.

    This is easy to handle by loading the full file content to a table and then filter out the repeated header rows.
    😎

    I have tried different ways to achieve it but get errors every time.
    Could you please suggest how to go about this ? Thanks.

    What kind of errors are you getting?
    😎
    Import everything as (n)varchar, use a query for the type casting needed and add a row_number for the header rows filtering on the first instance.

  • This is a good example of why you should never load a file directly into it's final resting place.

    The file is correctly TAB delimited in all of the right places.  As Eirikur implies, load the file into a staging table where all of the columns are character-based data-types.  Then, simply insert from that staging table into the final table using a WHERE clause of WHERE MODEL_DSC <> 'MODEL_DSC' and call it an easy day.

    Then, hunt down the idiots that are sending you this rubbish and explain the facts of life to them.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Eirikur Eiriksson - Monday, May 8, 2017 5:13 AM

    pwalter83 - Monday, May 8, 2017 4:51 AM

    Eirikur Eiriksson - Monday, May 8, 2017 4:14 AM

    pwalter83 - Monday, May 8, 2017 3:37 AM

    Hi,
    I think I did not explain my problem clearly. I need to remove the repeating header rows from the flat file (as attached File1) and then copy the records from the amended file, which only has a single header at the top (File2) to a SQL table. Please also find the SSIS package screenshot attached.
    Thanks in advance for your help.

    This is easy to handle by loading the full file content to a table and then filter out the repeated header rows.
    😎

    I have tried different ways to achieve it but get errors every time.
    Could you please suggest how to go about this ? Thanks.

    What kind of errors are you getting?
    😎
    Import everything as (n)varchar, use a query for the type casting needed and add a row_number for the header rows filtering on the first instance.

    Hi,

    Thanks for your suggestion. Could you please explain your solution in a bit more detail ? Thanks.

  • Jeff Moden - Monday, May 8, 2017 9:21 AM

    This is a good example of why you should never load a file directly into it's final resting place.

    The file is correctly TAB delimited in all of the right places.  As Eirikur implies, load the file into a staging table where all of the columns are character-based data-types.  Then, simply insert from that staging table into the final table using a WHERE clause of WHERE MODEL_DSC <> 'MODEL_DSC' and call it an easy day.

    Then, hunt down the idiots that are sending you this rubbish and explain the facts of life to them.

    Hi,
    Thanks for the solution but I am still confused as to what component to use to insert from the staging table. In the SSIS package I have used a For each loop container and within that is the Data Flow Task and within the Data Flow task, the process flow starts from Flat File source and then there is a data conversion which finally copies the data to the destination SQL table (explain with attached screenshots).

    Do you know which component I need to use and where does it need to be inserted within the Data Flow task ? Many thanks again.

  • pwalter83 - Tuesday, May 9, 2017 5:47 AM

    Jeff Moden - Monday, May 8, 2017 9:21 AM

    This is a good example of why you should never load a file directly into it's final resting place.

    The file is correctly TAB delimited in all of the right places.  As Eirikur implies, load the file into a staging table where all of the columns are character-based data-types.  Then, simply insert from that staging table into the final table using a WHERE clause of WHERE MODEL_DSC <> 'MODEL_DSC' and call it an easy day.

    Then, hunt down the idiots that are sending you this rubbish and explain the facts of life to them.

    Hi,
    Thanks for the solution but I am still confused as to what component to use to insert from the staging table. In the SSIS package I have used a For each loop container and within that is the Data Flow Task and within the Data Flow task, the process flow starts from Flat File source and then there is a data conversion which finally copies the data to the destination SQL table (explain with attached screenshots).

    Do you know which component I need to use and where does it need to be inserted within the Data Flow task ? Many thanks again.

    Hi,
    Can anyone please help on this this, its very urgent. Thanks.

  • It's so simple that you don't even need to use SSIS.  You can use bcp or BULK INSERT (or SSIS if you insist, or if it's part of a larger set of tasks) to copy everything from the file into a table with the same number of columns as the file has.  This assumes that the column headers match the columns themselves one-to-one, right?  Then delete all header rows (something like WHERE Col1 = 'Header1' AND Col2 = 'Header2'...).  Finally, transfer to the destination table, performing any necessary conversions along the way (date formats, decimals etc).

    John

Viewing 15 posts - 1 through 15 (of 29 total)

You must be logged in to reply to this topic. Login to reply