Parsing a summary / detail flat file

  • Brandie Tarvin (5/13/2013)


    wolfkillj (5/13/2013)


    I couldn't use your sample data because the text file formatting seemed to be inconsistent from line to line

    Exactly. If you check my first post, you'll see where I made that point. That's why we import all rows into a single column in the staging table to begin with.

    I need a solution that addresses my current setup, including the inconsistent data.

    I went back and re-read your OP. If I understand you correctly, rows of different types (vendor and detail) have different data and so can't be parsed into the same set of columns. But you CAN reliably parse out the record ID value, leaving the rest of the row in a single column, and you CAN parse each type of row into its particular column set, right? If so, you can use the code I posted as a basis for identifying the rows that belong together (e.g., a record ID = 2 row with its associated record ID = 3 row(s)) and assigning them a single identifying number. Then, you can split the rows out into the different tables based on their record ID values (vendors and detail), using that identifying number as a foreign key. Does that get you where you need to go, or am I overlooking something.

    I didn't study WayneS's solution in detail, but it looks like he took a very similar approach to keeping together the groups of rows that belong together.

    Jason Wolfkill

  • Brandie Tarvin (5/13/2013)


    Thanks everyone for the responses. We're going to start checking them now. I'm glad I asked the question as this was not the direction I had been heading in with my thoughts.

    Brandie, in SSIS are you familiar with using a Synchronous Tranformation Script Component? They are a godsend for issues like this where you're dependant on order for transformation. Also, plus side, they aren't stream interrupts.

    Here's the short form: You fire up a script component midstream and add a column with GUIDs (easiest). Set the an internal variable to the first guid and then reset it on (1) lines. Carry through the old GUID on any other line.

    Dump to database staging table.

    Let me know if you need me to walk you through the code/setup. I'm hoping you're okay as unless you're familiar with them Script Transforms require a lot of screenshots. 😉

    EDIT: Side note: If you want to get fancy, said script transform can also split the string into components based on values and send it to multiple streams. Might be overkill on the first pass, but that's how I usually set them up.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig's idea sounds great!

    Before I saw it, I came up with this , which is similar but different to Wayne's method for tying the rows together...

    Just including it as an option, but probably Craig's SSIS magic is much better as this does some horrible sorting operations :sick:

    ;with magic as

    (

    select

    st.ID

    , st.Extract_Record

    , st.UNID

    , case left(st.Extract_Record,1)

    when '2' then rank() over(partition by left(st.Extract_Record,1) order by st.id)

    else row_number() over(order by st.id) - rank() over(partition by left(st.Extract_Record,1) order by st.id)

    end AS new_ID

    from staging AS st

    where st.Extract_Record like '[23]%'

    )

    update magic

    set unid = new_ID

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Brandie Tarvin (5/13/2013)


    Duh. Forgot there is a record 4 line which is the last line of the file. Anyway, sample data attached and CREATE TABLE code as follows:

    Heh... ok. I've watched this long enough and now they're starting to use really nasty 4 letter words... like SSIS.:-D

    How many lines of data does your typical file have? And, any limits on the type of SQL Server objects that you can create?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/13/2013)


    Brandie Tarvin (5/13/2013)


    Duh. Forgot there is a record 4 line which is the last line of the file. Anyway, sample data attached and CREATE TABLE code as follows:

    Heh... ok. I've watched this long enough and now they're starting to use really nasty 4 letter words... like SSIS.:-D

    How many lines of data does your typical file have? And, any limits on the type of SQL Server objects that you can create?

    LOL, there is nothing wrong with SSIS, particularly to solve an iterative programming artifact that breaks normalization rules because it was originally built to be processed by iterative programs. 😛

    Not my magic, though, Magoo. I learned off others just like everyone else. 😉 I did however trip over a link that I think embodies the process in the most simplified method possible; it's just the facts, ma'am:

    http://agilebi.com/jwelch/2007/07/12/processing-a-flat-file-with-header-and-detail-rows/

    (As to the SSIS, what did you expect me to solve this with... SQLCLR? :hehe: )


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Evil Kraig F (5/13/2013)


    (As to the SSIS, what did you expect me to solve this with... SQLCLR? :hehe: )

    Umm, why yes, yes I did! :-D:-P:w00t:

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Evil Kraig F (5/13/2013)


    Jeff Moden (5/13/2013)


    Brandie Tarvin (5/13/2013)


    Duh. Forgot there is a record 4 line which is the last line of the file. Anyway, sample data attached and CREATE TABLE code as follows:

    Heh... ok. I've watched this long enough and now they're starting to use really nasty 4 letter words... like SSIS.:-D

    How many lines of data does your typical file have? And, any limits on the type of SQL Server objects that you can create?

    LOL, there is nothing wrong with SSIS, particularly to solve an iterative programming artifact that breaks normalization rules because it was originally built to be processed by iterative programs. 😛

    Not my magic, though, Magoo. I learned off others just like everyone else. 😉 I did however trip over a link that I think embodies the process in the most simplified method possible; it's just the facts, ma'am:

    http://agilebi.com/jwelch/2007/07/12/processing-a-flat-file-with-header-and-detail-rows/

    (As to the SSIS, what did you expect me to solve this with... SQLCLR? :hehe: )

    BWAAAA-HAAA!!! Too funny! I know people that would write SQLCLR for this task. 🙂 I do know some that might do a really great job at it (and that would NOT be me!). :hehe:

    Thanks for the link. I'll take a look.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Oh my! How ironic. The author of that article used a script to do pretty much what I thought an SQLCLR could be written to do.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • WayneS (5/13/2013)


    Brandie Tarvin (5/13/2013)


    So, any ideas?

    Brandie, Try this out:

    Wayne, YOU DA MAN!

    I've been fighting a lot of fires lately and the coworker didn't quite understand everything you posted, so it's been a while since we were able to address this. Finally we had to get back to it and we've been tag teaming it.

    What I did was add a Derived Column step to the SSIS package so that it split out RecordID to a separate column (while leaving it intact in the big long string). Then I added the individual columns to the Staging table and did a pair of updates, one for the 2 records and one for the 3 records. This enabled me to do JOINs without Substring functions. (YAY!)

    Then I added indexes to the three tables. On the Staging table, I have 5 indexes, 1 clustered on ID and RecordID, 2 non-clustered, and 2 filtered non-clustered. I did not add the Date column to the Staging table because I saw no need for it at that point.

    I rewrote the CTE slightly, removed the second CROSS APPLY (because I didn't have the date field added in), then threw in the SET TRANSACTION ISOLATION statement, and inside the Transaction itself, updated the WHERE clause to look at the RecordID instead of LIKE '2%' and LIKE '3%'.

    I also put a TRY...CATCH around the TRANSACTION block so we could rollback problems and report on errors.

    Most of the individual parts ran in 2 or 3 minutes (or less). When I ran the whole thing, it still took 5 hours, so I redid the Clustered Index on the Staging table to include UNID. Then when I ran the whole thing, it ran in less than 3 minutes. So we put the proc in Test, threw in a large file, and it took 15 minutes for the queries to run. Less than a minute for a smaller file.

    And it was a great excuse to teach myself a little bit more about execution plans while doing the indexing.

    WHOO HOO! Thank you for all your help, Wayne. I'm still a little iffy on CROSS APPLYs and your input made all the difference. And I got to impress the boss and my coworker. @=)

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Viewing 9 posts - 16 through 23 (of 23 total)

    You must be logged in to reply to this topic. Login to reply