Importing Poorly Formatted Text File

  • I am trying to import a text file to a SQL server table using SISS. I use SSIS but this is my first time in import this type. Please HELP!

    ---------------text file--------------

    REQ que stn all

    STN

    DN 326 0000

    LOC GHUT ILCE 01 2 12 01

    OPT CFW 1MR DGT RTP 0 ALTO NACT AWS NACT !NPA CFB CFD ENQC

    CFAN

    STN

    DN 326 0001

    LOC GHUT ILCE 01 2 12 02

    OPT SUS CFW 1MR CWT DGT RTP 0 ALTO 3261281 ACT AWS NACT !NPA !ECO !RIT !124 CFB

    CFD VM ENQC

    STN

    DN 326 0003

    LOC GHUT ILCE 03 2 16 14

    OPT 1MR DGT RTP 0 !TKD CFB CFD

    STN

    DN 326 0004

    LOC GHUT ILCE 03 2 19 20

    OPT 1MR DGT RTP 0 !LOC CFB CFD

    -------------------------------------------------------------

    1. The header "REQ que stn all" is to be disregarded

    2. STN disregarded meaning not be included in the destination table

    3. DN disregarded

    4. 7 digits number after DN eg, 326 0000 is the phone number, first column

    5. LOC disregarded

    6. GHUT i.e. the next after LOC is the second column and that the Station Code

    7. The set of codes after GHUT, eg 'ILCE 01 2 12 01' is one column and that is the Port.

    8. OPT is disregarded.

    9. the Codes after OPT eg, 'CFW 1MR DGT RTP 0 ALTO NACT AWS NACT !NPA CFB CFD ENQC

    CFAN' to be treated as individual separate columns.

    Thanks,

  • marg 14154 (7/14/2014)


    I am trying to import a text file to a SQL server table using SISS. I use SSIS but this is my first time in import this type. Please HELP!

    ---------------text file--------------

    REQ que stn all

    STN

    DN 326 0000

    LOC GHUT ILCE 01 2 12 01

    OPT CFW 1MR DGT RTP 0 ALTO NACT AWS NACT !NPA CFB CFD ENQC

    CFAN

    STN

    DN 326 0001

    LOC GHUT ILCE 01 2 12 02

    OPT SUS CFW 1MR CWT DGT RTP 0 ALTO 3261281 ACT AWS NACT !NPA !ECO !RIT !124 CFB

    CFD VM ENQC

    STN

    DN 326 0003

    LOC GHUT ILCE 03 2 16 14

    OPT 1MR DGT RTP 0 !TKD CFB CFD

    STN

    DN 326 0004

    LOC GHUT ILCE 03 2 19 20

    OPT 1MR DGT RTP 0 !LOC CFB CFD

    -------------------------------------------------------------

    1. The header "REQ que stn all" is to be disregarded

    2. STN disregarded meaning not be included in the destination table

    3. DN disregarded

    4. 7 digits number after DN eg, 326 0000 is the phone number, first column

    5. LOC disregarded

    6. GHUT i.e. the next after LOC is the second column and that the Station Code

    7. The set of codes after GHUT, eg 'ILCE 01 2 12 01' is one column and that is the Port.

    8. OPT is disregarded.

    9. the Codes after OPT eg, 'CFW 1MR DGT RTP 0 ALTO NACT AWS NACT !NPA CFB CFD ENQC

    CFAN' to be treated as individual separate columns.

    Thanks,

    I don't think the file is poorly formatted, it looks consistent, free of funny characters:w00t: and relatively human readable.

    First thought is to import the file, line by line, into a staging table and then use DelimitedSplit8K[/url] to parse it, straight forward and quite fast.

    😎

  • Indeed, the file is formatted quite well in the sense that it is very predictable.

    How I would do it:

    * read the file in line by line as one giant column (choose a delimiter that will never be present in the file, for example |$&|)

    * use a conditional split to get rid of the lines you don't need (empty lines and header)

    * implement a script transformation to parse the remaining lines. You can use the first three characters of a line to decide what to do.

    * output all the columns combined into one single row. You can use STN to find out when a new row starts.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (7/15/2014)


    Indeed, the file is formatted quite well in the sense that it is very predictable.

    How I would do it:

    * read the file in line by line as one giant column (choose a delimiter that will never be present in the file, for example |$&|)

    * use a conditional split to get rid of the lines you don't need (empty lines and header)

    * implement a script transformation to parse the remaining lines. You can use the first three characters of a line to decide what to do.

    * output all the columns combined into one single row. You can use STN to find out when a new row starts.

    Hi Koen, how to read line by line ?

    I did this outside SQL, I am sure it can be done on SQL too, I am just curious about a hint.

    thanks

  • Iulian -207023 (7/15/2014)


    Koen Verbeeck (7/15/2014)


    Indeed, the file is formatted quite well in the sense that it is very predictable.

    How I would do it:

    * read the file in line by line as one giant column (choose a delimiter that will never be present in the file, for example |$&|)

    * use a conditional split to get rid of the lines you don't need (empty lines and header)

    * implement a script transformation to parse the remaining lines. You can use the first three characters of a line to decide what to do.

    * output all the columns combined into one single row. You can use STN to find out when a new row starts.

    Hi Koen, how to read line by line ?

    I did this outside SQL, I am sure it can be done on SQL too, I am just curious about a hint.

    thanks

    You can find an example here:

    Creating an Asynchronous Transformation with the Script Component

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (7/15/2014)


    Iulian -207023 (7/15/2014)


    Koen Verbeeck (7/15/2014)


    Indeed, the file is formatted quite well in the sense that it is very predictable.

    How I would do it:

    * read the file in line by line as one giant column (choose a delimiter that will never be present in the file, for example |$&|)

    * use a conditional split to get rid of the lines you don't need (empty lines and header)

    * implement a script transformation to parse the remaining lines. You can use the first three characters of a line to decide what to do.

    * output all the columns combined into one single row. You can use STN to find out when a new row starts.

    Hi Koen, how to read line by line ?

    I did this outside SQL, I am sure it can be done on SQL too, I am just curious about a hint.

    thanks

    You can find an example here:

    Creating an Asynchronous Transformation with the Script Component

    I got it, many thanks

  • I am afraid, I am a newbie in script component (with scripting in C# and VB). I read the example provided in the link but I am still finding difficulties in your point last 2 points:

    * implement a script transformation to parse the remaining lines. You can use the first three characters of a line to decide what to do.

    * output all the columns combined into one single row. You can use STN to find out when a new row starts

    #FeelingRatherDumb

  • 1. You read the line and store it inside a string.

    2. You read the first 3 characters and depending on the result, you decide what to do.

    - if it's STN, you know a new row has stared

    - if it's DN, LOC or OPT, you need to parse the line.

    You can use the .NET split function for that.

    You need to use probably the space as a delimiter.

    - The result of the split function is an array of strings. Loop over this array and write each occurence to a variable.

    Throw away DN, LOC or OPT in the process.

    - Read the next line. If it is STN, you know you have processed a full row and you can write it to the output.

    3. Repeat step 2 until all lines have been processed.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (7/16/2014)


    1. You read the line and store it inside a string.

    2. You read the first 3 characters and depending on the result, you decide what to do.

    - if it's STN, you know a new row has stared

    - if it's DN, LOC or OPT, you need to parse the line.

    You can use the .NET split function for that.

    You need to use probably the space as a delimiter.

    - The result of the split function is an array of strings. Loop over this array and write each occurence to a variable.

    Throw away DN, LOC or OPT in the process.

    - Read the next line. If it is STN, you know you have processed a full row and you can write it to the output.

    3. Repeat step 2 until all lines have been processed.

    Thank you so much. You have helped a lot!

    Just one more thing, there is some cases in which there are some values that should be on the same line as OPT but are the next row like for instance 'CFAN !TKD' e.g. below. Would be able to assist or provide a hint on how I could get those values onto the same row as, merge with those values with OPT?

    Thank you.

    Eg,

    STN

    DN 326 0000

    LOC GHUT ILCE 01 2 12 01

    OPT CFW 1MR DGT RTP 0 ALTO NACT AWS NACT !NPA CFB CFD ENQC

    CFAN !TKD

    The Result would be:

    STN

    DN 326 0000

    LOC GHUT ILCE 01 2 12 01

    OPT CFW 1MR DGT RTP 0 ALTO NACT AWS NACT !NPA CFB CFD ENQC CFAN !TKD

  • @marg 14154,

    Are the 7 digit values for DN record type unique for each file? I ask because I have an idea for this but need to know that to firm up the approach.

    Also, is there a maximum number of different values (ie: columns) that can show up for the OPT record type?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/16/2014)


    @marg 14154,

    Are the 7 digit values for DN record type unique for each file? I ask because I have an idea for this but need to know that to firm up the approach.

    Also, is there a maximum number of different values (ie: columns) that can show up for the OPT record type?

    Yes the DN are unique for each file. For the OPT record, I am afraid, the maximum number of values is not known.

  • Jeff Moden (7/16/2014)


    @marg 14154,

    Are the 7 digit values for DN record type unique for each file? I ask because I have an idea for this but need to know that to firm up the approach.

    Also, is there a maximum number of different values (ie: columns) that can show up for the OPT record type?

    Yes Jeff the 7 digit values for DN are unique. For OPT, I am afraid the maximum values is not known.

  • Jeff Moden (7/16/2014)


    @marg 14154,

    Are the 7 digit values for DN record type unique for each file? I ask because I have an idea for this but need to know that to firm up the approach.

    Also, is there a maximum number of different values (ie: columns) that can show up for the OPT record type?

    Hi Jeff, yes the 7 digit values for DN is unique. For the different values that can show up for OPT, it is not known. But I was provided a list of known which is about 47 in total, that is should appear and there are:

    !124!BAR!ECO!IDD!LOC!NPA!OPR!RIT!STD!TKD!VAS1FR1MRACTALTOAWSCDF

    CFANCFBCFBUCFDCFDUCFWCLTCNDCNDB

    COBCWTDGTDNHDORDTMENQC FNTFPSVFREQFRSTMCTMTRNACTOCBOPTRTPSUSSUSO SUST VM

  • That works. It's almost 2:30AM so I've got to hit the rack so I can be up at 6. I'll give this a whirl tomorrow night after work. Thanks for the feedback.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • marg 14154 (7/16/2014)


    Thank you so much. You have helped a lot!

    Just one more thing, there is some cases in which there are some values that should be on the same line as OPT but are the next row like for instance 'CFAN !TKD' e.g. below. Would be able to assist or provide a hint on how I could get those values onto the same row as, merge with those values with OPT?

    Thank you.

    You just look out for those special 3 characters at the start of the line. (OPT,DN,LOC and STN)

    If it is neither of those, the values belong to the previous line.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

Viewing 15 posts - 1 through 15 (of 31 total)

You must be logged in to reply to this topic. Login to reply