Flat with multiple data sets

  • Hi Guys,

    I am sure some of you might have dealt with the situation that I am in... wanted to check the efficient way to do this.... I have fixed width flat file; however, this file will have data for several tables based on the values in the positions 9, 10 and 11... and each set has different layout.... has different fields and different starting and ending positions, but the length of each line is same.

    For example:

    Pos 9, 10, 11 distinguish where that particular record should be loaded and each line ends with "Z" in Pos 900

    20110107001John Doe M34

    20110107002231M Street Germantown DC

    In the above example say 001 goes to individual table and 002 goes to address table... Is there a way we can use SSIS or sql to parse different layouts from one file? What is the better way guys? I was thinking about staging in a table with one column and then parsing...

    Thanks!

  • Having dealt with this before..

    I only want to read the file in once from disk.

    I would review the BULK INSERT command. That gets the data into a table you can work with.. At first blush I would define the table something similar to:

    CREATE TABLE tempdb.dbo.tempBulkLoadedData ( RecId int IDENTITY(1,1) NOT NULL, RecText char(900) NOT NULL )

    I would load the data into RecText and let RecId just do what an identity does. You might also consider creating an indexed calculated column (I think that is kosher) that is like a substring of the positions that identitfy record type. This would make them easier to pick out.

    I added RecId since I wanted to be able to maintain order of insertion.

    Mainly your query for each record type queries out the records for each type and uses SUBSTRING to get the raw data and CAST/CONVERT to get it into the correct type and name. It is really just a lot of typing. You could probably build a sproc to read a table holding a definition and builds then executes a query to do it for you, but thats more complex..

    CEWII

  • Ghanta, one of the key items in dealing with a mainframe output file like this is if the records can be re-associated by anything except positionally. IE If an 002 follows an 001, it belongs to that 001. If it can also be associated because both lines carry a key field to associate them, then your options open up.

    Do these keys exist, or is it positional association only?


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Thanks Elliott and Craig for the response. Craig it is positional association only... each line will have different schema based on three characters at position 9, 10, 11 and I have layout for each schemas.

  • Ghanta (1/7/2011)


    Thanks Elliott and Craig for the response. Craig it is positional association only... each line will have different schema based on three characters at position 9, 10, 11 and I have layout for each schemas.

    Dang. Sadly, you're SOL then. Elliott's solution is really the only feasible one. You'll have to recreate the wheel to deal with this.

    One of the things I did find that helps deal with positional association is a serial update (quirky update, search on articles) that will allow you to associate all secondary records back to the primary using a similar grouping mechanic, which you would apply after getting everything into your DB staging table in a usable order.

    You might try googling for mainframe file load sql to get some more ideas. I didn't test the search but it will hopefully give you a starting place.

    There's no easy solution to this. It's going to be ugly, it's going to be complex, and you're going to do a ton of repetitive code to make it happen.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Ghanta (1/7/2011)


    Hi Guys,

    I am sure some of you might have dealt with the situation that I am in... wanted to check the efficient way to do this.... I have fixed width flat file; however, this file will have data for several tables based on the values in the positions 9, 10 and 11... and each set has different layout.... has different fields and different starting and ending positions, but the length of each line is same.

    For example:

    Pos 9, 10, 11 distinguish where that particular record should be loaded and each line ends with "Z" in Pos 900

    20110107001John Doe M34

    20110107002231M Street Germantown DC

    In the above example say 001 goes to individual table and 002 goes to address table... Is there a way we can use SSIS or sql to parse different layouts from one file? What is the better way guys? I was thinking about staging in a table with one column and then parsing...

    Thanks!

    I realize you only posted partial rows so let me ask... I'm assuming that the two rows in your example are related to each other. Is there anything in the two rows of your example that would relate the two rows to each other?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/8/2011)


    Ghanta (1/7/2011)


    Hi Guys,

    I am sure some of you might have dealt with the situation that I am in... wanted to check the efficient way to do this.... I have fixed width flat file; however, this file will have data for several tables based on the values in the positions 9, 10 and 11... and each set has different layout.... has different fields and different starting and ending positions, but the length of each line is same.

    For example:

    Pos 9, 10, 11 distinguish where that particular record should be loaded and each line ends with "Z" in Pos 900

    20110107001John Doe M34

    20110107002231M Street Germantown DC

    In the above example say 001 goes to individual table and 002 goes to address table... Is there a way we can use SSIS or sql to parse different layouts from one file? What is the better way guys? I was thinking about staging in a table with one column and then parsing...

    Thanks!

    I realize you only posted partial rows so let me ask... I'm assuming that the two rows in your example are related to each other. Is there anything in the two rows of your example that would relate the two rows to each other?

    Thanks Jeff for responding... two rows is not related to each other and need to be loaded into separate staging tables... so we have 12 schemas in the file and the length of each record is same ending with "Z". However, the schemas are different so the fixed width has different starting positions for each columns in different schemas and number of columns are different... first column (8 chars) in each file will be date and 9, 10, 11 pos will tell what schema to follow... Did this answer your query? Thanks again.

  • Thanks guys for the response... I was able to do this using Conditional Split Transformation and Derived Column in SSIS... basically all that tedious substring, I had to write in derived column.

  • Ghanta (1/8/2011)


    Jeff Moden (1/8/2011)


    Ghanta (1/7/2011)


    Hi Guys,

    I am sure some of you might have dealt with the situation that I am in... wanted to check the efficient way to do this.... I have fixed width flat file; however, this file will have data for several tables based on the values in the positions 9, 10 and 11... and each set has different layout.... has different fields and different starting and ending positions, but the length of each line is same.

    For example:

    Pos 9, 10, 11 distinguish where that particular record should be loaded and each line ends with "Z" in Pos 900

    20110107001John Doe M34

    20110107002231M Street Germantown DC

    In the above example say 001 goes to individual table and 002 goes to address table... Is there a way we can use SSIS or sql to parse different layouts from one file? What is the better way guys? I was thinking about staging in a table with one column and then parsing...

    Thanks!

    I realize you only posted partial rows so let me ask... I'm assuming that the two rows in your example are related to each other. Is there anything in the two rows of your example that would relate the two rows to each other?

    Thanks Jeff for responding... two rows is not related to each other and need to be loaded into separate staging tables... so we have 12 schemas in the file and the length of each record is same ending with "Z". However, the schemas are different so the fixed width has different starting positions for each columns in different schemas and number of columns are different... first column (8 chars) in each file will be date and 9, 10, 11 pos will tell what schema to follow... Did this answer your query? Thanks again.

    Yes. I was concerned that the Germantown address was the address for John Doe just because of the position of one row after the other.

    Moot point now... Seem like you have the problem sussed.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ghanta (1/8/2011)


    Thanks guys for the response... I was able to do this using Conditional Split Transformation and Derived Column in SSIS... basically all that tedious substring, I had to write in derived column.

    Yikes! Although I'm a big SSIS guy I don't think I would have braved this method.. But I'm sure it works..

    CEWII

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply