Importing a Flat File Containing Header and Detail Rows

  • Hello All, 
    I'm not quite sure how to explain my situation without confusing the living hell out of everyone but i'll do my best here.  
    I have a flat file which contains one single header row and multiple detail rows.  The detail rows are for customer data, the header row contains the contract number and the start year and end year which covers the data.

    FILE:
    Here is what the file looks like when I open it in Notepad++:

    1H582620172018 -- this is the header which always starts with "1H".  It contains the company contract number (5826)
    2H1234 JONES MARY --this is the customer contract number and customer last/first name for Mary Jones.  Always starts with "2H"
    3H19501115 --this is the customer birthdate for Mary Jones.  Always starts with "3H"
    2H12346 SMITH BILL --new record for with contract number, last/first name for Bill Smith
    3H19600901 --Bill Smith's birthdate

    After I import the file into a landing table, each part of the customer data is in it's own record just like in the file

    RESULT I NEED:
    I need to create one record for each customer in the file:

    COMPANY_CONTRACT_ID      CUSTOMER_CONTRACT_ID        FIRST_NAME       LAST_NAME     BIRTHDATE
        5826                                           1234                                                  Mary                       Jones             1950-11-15
        5826                                           1236                                                  Bill                          Smith              1960-09-01              

    I'm having a hell of a time trying to get the data into one single record.  Any ideas as to how I might do this?

    Hopefully I didn't confuse anyone.  I can honestly say i've never worked with a file which had header/details rows like this before.

  • The following is one way of doing it.

    Requires that you load the files onto a staging table with a identity column and 2 other columns, reckey and data as per below example - load must not be done in parallel.

    Note that the output below does not match your desired output as I do believe you added a extra number on the second customer id.
    This is just one way - not necessarily the best one.
    If your fields are fixed size then the charindex below isn't required as you can just split directly based on the position of each field.
    also assumes that there will always be a record type of 2H to be able to identify which Customer the date of birth relates to. 
    similar for the 1H record.

    And obviously that the input file will always be on the correct order - although this is pretty standard on these type of files.


    if object_id('tempdb..#input') is not null
      drop table #input
    create table #input
      ( inputrecord varchar(2000)
      )
    insert into #input
      select '1H582620172018'
    insert into #input
      select '2H1234 JONES MARY'
    insert into #input
      select '3H19501115'
    insert into #input
      select '2H12346 SMITH BILL'
    insert into #input
      select '3H19600901'

    if object_id('tempdb..#staging') is not null
      drop table #staging

    create table #staging
      ( rowid int identity (1, 1)
      , reckey char(2)
      , data varchar(2000)
      )

    insert into #staging
      select substring(i.inputrecord, 1, 2) as reckey
       , substring(i.inputrecord, 3, 2000) as data
      from #input i

    create clustered index #staging_ix1 on #staging
    (rowid
    )

    create nonclustered index #staging_ix2 on #staging
    (reckey
    , rowid
    )
    include (data)

    select max(t.company_contract_id) as company_contract_id
      , t.customer_contract_id
      , max(substring(t.names, charindex(' ', t.names) + 1, 100)) as first_name
      , max(substring(t.names, 1, charindex(' ', t.names) - 1)) as last_name
      , max(t.birthdate) as birthdate
    from (select substring(sub2h.data, 1, charindex(' ', sub2h.data) - 1) as customer_contract_id
        , substring(submain.data, 1, 4) as company_contract_id
        , substring(sub2h.data, charindex(' ', sub2h.data) + 1, 200) as names
        , case
         when main.reckey = '3H'
          then convert(date, substring(main.data, 1, 8), 112)
         else null
         end as birthdate

       from #staging main
       outer apply (select top 1 *
         from #staging st2
         where st2.rowid <= main.rowid
          and st2.reckey = '1H'
        order by st2.rowid desc
       ) submain
       outer apply (select top 1 *
         from #staging st2
         where st2.rowid < main.rowid
          and st2.reckey = '2H'
        order by st2.rowid desc
       ) sub2h
       where sub2h.reckey is not null
    ) t
    group by t.customer_contract_id

  • Polymorphist - Friday, June 1, 2018 7:27 PM

    Hello All, 
    I'm not quite sure how to explain my situation without confusing the living hell out of everyone but i'll do my best here.  
    I have a flat file which contains one single header row and multiple detail rows.  The detail rows are for customer data, the header row contains the contract number and the start year and end year which covers the data.

    FILE:
    Here is what the file looks like when I open it in Notepad++:

    1H582620172018 -- this is the header which always starts with "1H".  It contains the company contract number (5826)
    2H1234 JONES MARY --this is the customer contract number and customer last/first name for Mary Jones.  Always starts with "2H"
    3H19501115 --this is the customer birthdate for Mary Jones.  Always starts with "3H"
    2H12346 SMITH BILL --new record for with contract number, last/first name for Bill Smith
    3H19600901 --Bill Smith's birthdate

    After I import the file into a landing table, each part of the customer data is in it's own record just like in the file

    RESULT I NEED:
    I need to create one record for each customer in the file:

    COMPANY_CONTRACT_ID      CUSTOMER_CONTRACT_ID        FIRST_NAME       LAST_NAME     BIRTHDATE
        5826                                           1234                                                  Mary                       Jones             1950-11-15
        5826                                           1236                                                  Bill                          Smith              1960-09-01              

    I'm having a hell of a time trying to get the data into one single record.  Any ideas as to how I might do this?

    Hopefully I didn't confuse anyone.  I can honestly say i've never worked with a file which had header/details rows like this before.

    This can all be done during the actual import.  What and where are the delimiters in each of the lines in the file and what is the end of line marker?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Sunday, June 3, 2018 7:31 AM

    Polymorphist - Friday, June 1, 2018 7:27 PM

    Hello All, 
    I'm not quite sure how to explain my situation without confusing the living hell out of everyone but i'll do my best here.  
    I have a flat file which contains one single header row and multiple detail rows.  The detail rows are for customer data, the header row contains the contract number and the start year and end year which covers the data.

    FILE:
    Here is what the file looks like when I open it in Notepad++:

    1H582620172018 -- this is the header which always starts with "1H".  It contains the company contract number (5826)
    2H1234 JONES MARY --this is the customer contract number and customer last/first name for Mary Jones.  Always starts with "2H"
    3H19501115 --this is the customer birthdate for Mary Jones.  Always starts with "3H"
    2H12346 SMITH BILL --new record for with contract number, last/first name for Bill Smith
    3H19600901 --Bill Smith's birthdate

    After I import the file into a landing table, each part of the customer data is in it's own record just like in the file

    RESULT I NEED:
    I need to create one record for each customer in the file:

    COMPANY_CONTRACT_ID      CUSTOMER_CONTRACT_ID        FIRST_NAME       LAST_NAME     BIRTHDATE
        5826                                           1234                                                  Mary                       Jones             1950-11-15
        5826                                           1236                                                  Bill                          Smith              1960-09-01              

    I'm having a hell of a time trying to get the data into one single record.  Any ideas as to how I might do this?

    Hopefully I didn't confuse anyone.  I can honestly say i've never worked with a file which had header/details rows like this before.

    This can all be done during the actual import.  What and where are the delimiters in each of the lines in the file and what is the end of line marker?

    My suspicion is that 1H and 3H are not delimited and 2H tab delimited (hopefully not space delimited :pinch: )
    Will have to wait for confirmation though 🙂

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Seems to me that it should be that each row is the column delimiter and the row delimiter is the start of the token '1H ...'. However the data presented does not quite fit this. Still some clarity needed.

    ----------------------------------------------------

  • Thanks for all of the replies.  

    The file is a fixed-width with the row delimiter being {cr}{lf}.

  • The sample provided follows standard definition for usage from mainframe style extract files - or PC Cobol files that follow old conventions.
    But not just those follow these - I remember when doing Basel II (or was it Solvency?) many years ago that the file definitions were like this - 1 major header, 1 major trailer, then multiple blocks of detail records - each block with 1 sub-header and multiple detail records.

    records would be LF or CR+LF delimited
    record identifier will be a predetermined code for each type - on the above example 1H is Company ID

    Based on original example
    1H582620172018
    2H1234 JONES MARY
    3H19501115
    2H12346 SMITH BILL
    3H19600901

    Record type
    1H - Company code
    2H - Customer detail
    3H - Customer DOB

    if file is multiple company we could have

    1H582620172018  -- this is companyid + a sequence that looks like from and to year
    2H1234 JONES MARY -- Customer code + customer names (last and first)
    3H19501115   -- customer DOB
    2H12346 SMITH BILL -- New customer block
    3H19600901
    1H988120172018  -- new company block
    2H1234 BURKE LOIS -- new customer block
    3H19701115
    2H8311 SMITH BILL -- new customer block
    3H19801001

    Now tricky sometimes is that headers that do not contain data may be excluded. On the above 3H could be excluded if no DOB exists for the client - or it could be supplied with a "null" date - 00000000 or 99999999 for example

    OP needs to contact whoever creates the file and ask for full record definition and rules for when they are ommited in case it happens.

  • Hello All - 
    Thanks again for all of your input, really helped.  I modified the sample query provided by Frederico_Fonseca and it works perfectly.  I now have a fully-functioning job which imports multiple files.
    Really saved my bacon, thanks again.

    Regarding the file format, I've never worked with this type of file before and it occurred to me that it might be an older format style.  For what it's worth, i'm currently at a health insurance company and there seem to be a lot of old processes which haven't been touched for decades.  I do know that there are some old COBOL processes running somewhere.  Not long ago I converted a bunch of old KornShell scripts, most of which had not been touched since the late 1980s 🙂

    Always an adventure.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply