MERGE to single Flat FILE with different formats

  • Hi

    I am trying to do a package to read data from a single flat file and load to single flat file after filtering on state code

    The input file has four record types:

    1 A Header Record

    2 A Trailer Record

    3 A PRIMARY Record

    4 A DETAIL Record

    the header goes as it is

    the trailer goes as it is.

    now every primary has one or more than one detail rows.

    each primary with state code for say "FL" will take all details associated with that RECID and go to destination flat file in order

    Example input data:

    H01

    PIRE001 EDWARD FL ABC DEF

    DET01E001 PO BOX 334 ABDC 0.00 9.9

    DET02E001 PO BOX 554 UUU 0.00 9.9

    PIRE002 RONALD FL ABC DEF

    DET01E002 PO BOX 334 ABDC 0.00 9.9

    DET02E002 PO BOX 554 UUU 0.00 9.9

    DET03E002 PO BOX 554 UUU 0.00 9.9

    PIREE003 CATHY NY ZZZ ZEF

    DET01E003 PO BOX 994 ZZZC 0.00 9.9

    T01

    output should be like this except Cathy all records should come as all other are from FL

    H01

    PIRE001 EDWARD FL ABC DEF

    DET01E001 PO BOX 334 ABDC 0.00 9.9

    DET02E001 PO BOX 554 UUU 0.00 9.9

    PIRE002 RONALD FL ABC DEF

    DET01E002 PO BOX 334 ABDC 0.00 9.9

    DET02E002 PO BOX 554 UUU 0.00 9.9

    DET03E002 PO BOX 554 UUU 0.00 9.9

    T01

    Also the format for input file for PRIMARY and DETAIL records is not same detail has much more data and totally different format

    Please any idea

    I have been trying to use Flat file src and conditional split on CODE FL and then read only DETAIL rows COLUMN 1 like DET%

    but still it always ends up joining primary ans secondary rows in same row

    I don't want same row at end it should do to different rows

    the join column is in position 4 to 7 in Primary

    and in position 6 to 9 in secondary

    any help on this

    thanks

    [font="Comic Sans MS"]
    ---------------------------------------------------

    Thanks [/font]

  • Cond. Split is the way to go. Look for rows starting with 'DET' and process those.

    Gerald Britton, Pluralsight courses

  • You still have the Header and Trailer rows to contend with. A flat file source will allow you to skip the header, but not the trailer and it sounds like you need both to flow through to the destination files anyways.

    This calls for a Script source. I would recommend writing a script to process the file. You can contend with header/trailer and the primary/detail rows easier here than in a standard flat file source. You could use the script to add a column to the output that you can then key off of in a conditional split or you can write the rows to different outputs depending on the data in the primary row. Do you need to create a file for each State? If so, I'd go with adding a state column to the script output so you don't have to create 50 output buffers for the script.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Thanks for the reply

    I am able to split the 4 type of records using header Footer primary and detail file connections

    but the thing is how to put them back in proper order

    I mean primary should follow the detail then another primary and it's detail ..

    I am not able to achieve that

    also the join is resulting in one row..

    I want still two rows for primary and detail separately.

    I think I am not able to explain it properly.

    [font="Comic Sans MS"]
    ---------------------------------------------------

    Thanks [/font]

  • Priya_learner (10/7/2015)


    Thanks for the reply

    I am able to split the 4 type of records using header Footer primary and detail file connections

    but the thing is how to put them back in proper order

    I mean primary should follow the detail then another primary and it's detail ..

    I am not able to achieve that

    also the join is resulting in one row..

    I want still two rows for primary and detail separately.

    I think I am not able to explain it properly.

    I've done something sort of like this before and I can say keeping things in the proper order will not be easy. If I had to do it again, I'd seriously consider Dot NET, either inside or outside of SSIS. SSIS dataflow was not made form this.



    Alvin Ramard
    Memphis PASS Chapter[/url]

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • Yea

    I don't have hands on on Dot net programming but was thinking to do it using a CURSOR instead..

    I am still trying:unsure:

    [font="Comic Sans MS"]
    ---------------------------------------------------

    Thanks [/font]

  • well was able to do using for each loop container...

    [font="Comic Sans MS"]
    ---------------------------------------------------

    Thanks [/font]

  • John Rowan (10/6/2015)


    You still have the Header and Trailer rows to contend with. A flat file source will allow you to skip the header, but not the trailer and it sounds like you need both to flow through to the destination files anyways.

    This calls for a Script source. I would recommend writing a script to process the file. You can contend with header/trailer and the primary/detail rows easier here than in a standard flat file source. You could use the script to add a column to the output that you can then key off of in a conditional split or you can write the rows to different outputs depending on the data in the primary row. Do you need to create a file for each State? If so, I'd go with adding a state column to the script output so you don't have to create 50 output buffers for the script.

    You should be able to do with without a script:

    1. Split out the header, trailer and detail records into separate flows

    2. Add a key column (1 for header, 2 for detail, 3 for trailer)

    3. do other stuff (e.g. select just the detail rows you want)

    4. merge the three streams on the derived key column, specifying that the three dataflows are already sorted

    Gerald Britton, Pluralsight courses

  • I don't use SSIS so please pardon me if it sounds like I'm trivializing the use of SSIS for such things in saying, wouldn't it be easier to do something like this? (Note the INSERT into the Temp Table would be replaced by your import). And, no, we don't need to use DOT NET for such a thing. That would be adding to the "Tower of Babel" for this task.

    --===== Create a table with several computed columns.

    CREATE TABLE #MyHead

    (

    RowType AS CASE LEFT(RowData,1) WHEN 'H' THEN 1 WHEN 'P' THEN 2 WHEN 'D' THEN 2 WHEN 'T' THEN 3 ELSE NULL END

    ,PrimarySet AS CASE LEFT(RowData,1) WHEN 'P' THEN SUBSTRING(RowData,4,4) WHEN 'D' THEN SUBSTRING(RowData,6,4) WHEN 'H' THEN ' ' WHEN 'T' THEN ' ' ELSE NULL END

    ,DetailRow AS CASE LEFT(RowData,1) WHEN 'P' THEN '00' WHEN 'D' THEN SUBSTRING(RowData,4,2) ELSE '00' END

    ,Reject AS CASE WHEN LEFT(RowData,1) = 'P' AND CHARINDEX(' NY ',RowData,9)>0 THEN 1 ELSE 0 END

    ,RowData VARCHAR(8000)

    )

    ;

    --===== This would be replaced by the file load into the table

    INSERT INTO #MyHead

    (RowData)

    SELECT 'H01' UNION ALL

    SELECT 'PIRE001 EDWARD FL ABC DEF' UNION ALL

    SELECT 'DET01E001 PO BOX 334 ABDC 0.00 9.9' UNION ALL

    SELECT 'DET02E001 PO BOX 554 UUU 0.00 9.9' UNION ALL

    SELECT 'PIRE002 RONALD FL ABC DEF' UNION ALL

    SELECT 'DET01E002 PO BOX 334 ABDC 0.00 9.9' UNION ALL

    SELECT 'DET02E002 PO BOX 554 UUU 0.00 9.9' UNION ALL

    SELECT 'DET03E002 PO BOX 554 UUU 0.00 9.9' UNION ALL

    SELECT 'PIRE003 CATHY NY ZZZ ZEF' UNION ALL

    SELECT 'DET01E003 PO BOX 994 ZZZC 0.00 9.9' UNION ALL

    SELECT 'T01'

    ;

    --===== Then this simple SELECT will do it all

    SELECT * --Change to just RowData when done with proof

    FROM #MyHead

    WHERE PrimarySet NOT IN (SELECT PrimarySet FROM #MyHead WHERE Reject = 1)

    ORDER BY RowType,PrimarySet,DetailRow

    ;

    Results: (Note, change the * in the code above to just the RowData column to get the exact output desired)

    RowType PrimarySet DetailRow Reject RowData

    ----------- ---------- --------- ----------- -----------------------------------

    1 00 0 H01

    2 E001 00 0 PIRE001 EDWARD FL ABC DEF

    2 E001 01 0 DET01E001 PO BOX 334 ABDC 0.00 9.9

    2 E001 02 0 DET02E001 PO BOX 554 UUU 0.00 9.9

    2 E002 00 0 PIRE002 RONALD FL ABC DEF

    2 E002 01 0 DET01E002 PO BOX 334 ABDC 0.00 9.9

    2 E002 02 0 DET02E002 PO BOX 554 UUU 0.00 9.9

    2 E002 03 0 DET03E002 PO BOX 554 UUU 0.00 9.9

    3 00 0 T01

    (9 row(s) affected)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Great solution Jeff. I'd say that would work great as long as the OP has the ability to create tables for their ETL. I've worked on projects where all of the ETL has to be done within SSIS so the option of staging the data and pulling it back out is not always available. This is one of those things that ETL developers go back and forth on....how much of the work can/should we do on the DB engine versus solely in SSIS. Speaking for myself, I sometimes have blinders on when it comes to things that could more easily be done in the DB engine just because I develop in SSIS so often.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • John Rowan (10/8/2015)


    Great solution Jeff. I'd say that would work great as long as the OP has the ability to create tables for their ETL. I've worked on projects where all of the ETL has to be done within SSIS so the option of staging the data and pulling it back out is not always available. This is one of those things that ETL developers go back and forth on....how much of the work can/should we do on the DB engine versus solely in SSIS. Speaking for myself, I sometimes have blinders on when it comes to things that could more easily be done in the DB engine just because I develop in SSIS so often.

    Thanks for the feedback, John.

    I think the reason I'm so DB Engine prone rather than using SSIS is that I don't have the same sometimes unreasonable fear of using xp_CmdShell to do the exports that a lot of people have. To wit, I've never personally used SSIS myself. About the closest I've come is to write scripts/procs for folks that needed it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply