Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Importing Poorly Formatted Text File


Importing Poorly Formatted Text File

Author
Message
marg 14154
marg 14154
Forum Newbie
Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)

Group: General Forum Members
Points: 9 Visits: 224
I am trying to import a text file to a SQL server table using SISS. I use SSIS but this is my first time in import this type. Please HELP!
---------------text file--------------
REQ que stn all

STN
DN 326 0000
LOC GHUT ILCE 01 2 12 01
OPT CFW 1MR DGT RTP 0 ALTO NACT AWS NACT !NPA CFB CFD ENQC
CFAN

STN
DN 326 0001
LOC GHUT ILCE 01 2 12 02
OPT SUS CFW 1MR CWT DGT RTP 0 ALTO 3261281 ACT AWS NACT !NPA !ECO !RIT !124 CFB
CFD VM ENQC

STN
DN 326 0003
LOC GHUT ILCE 03 2 16 14
OPT 1MR DGT RTP 0 !TKD CFB CFD

STN
DN 326 0004
LOC GHUT ILCE 03 2 19 20
OPT 1MR DGT RTP 0 !LOC CFB CFD
-------------------------------------------------------------
1. The header "REQ que stn all" is to be disregarded
2. STN disregarded meaning not be included in the destination table
3. DN disregarded
4. 7 digits number after DN eg, 326 0000 is the phone number, first column
5. LOC disregarded
6. GHUT i.e. the next after LOC is the second column and that the Station Code
7. The set of codes after GHUT, eg 'ILCE 01 2 12 01' is one column and that is the Port.
8. OPT is disregarded.
9. the Codes after OPT eg, 'CFW 1MR DGT RTP 0 ALTO NACT AWS NACT !NPA CFB CFD ENQC
CFAN' to be treated as individual separate columns.

Thanks,
Eirikur Eiriksson
Eirikur Eiriksson
SSCertifiable
SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)SSCertifiable (6.8K reputation)

Group: General Forum Members
Points: 6762 Visits: 17733
marg 14154 (7/14/2014)
I am trying to import a text file to a SQL server table using SISS. I use SSIS but this is my first time in import this type. Please HELP!
---------------text file--------------
REQ que stn all

STN
DN 326 0000
LOC GHUT ILCE 01 2 12 01
OPT CFW 1MR DGT RTP 0 ALTO NACT AWS NACT !NPA CFB CFD ENQC
CFAN

STN
DN 326 0001
LOC GHUT ILCE 01 2 12 02
OPT SUS CFW 1MR CWT DGT RTP 0 ALTO 3261281 ACT AWS NACT !NPA !ECO !RIT !124 CFB
CFD VM ENQC

STN
DN 326 0003
LOC GHUT ILCE 03 2 16 14
OPT 1MR DGT RTP 0 !TKD CFB CFD

STN
DN 326 0004
LOC GHUT ILCE 03 2 19 20
OPT 1MR DGT RTP 0 !LOC CFB CFD
-------------------------------------------------------------
1. The header "REQ que stn all" is to be disregarded
2. STN disregarded meaning not be included in the destination table
3. DN disregarded
4. 7 digits number after DN eg, 326 0000 is the phone number, first column
5. LOC disregarded
6. GHUT i.e. the next after LOC is the second column and that the Station Code
7. The set of codes after GHUT, eg 'ILCE 01 2 12 01' is one column and that is the Port.
8. OPT is disregarded.
9. the Codes after OPT eg, 'CFW 1MR DGT RTP 0 ALTO NACT AWS NACT !NPA CFB CFD ENQC
CFAN' to be treated as individual separate columns.

Thanks,


I don't think the file is poorly formatted, it looks consistent, free of funny charactersw00t and relatively human readable.

First thought is to import the file, line by line, into a staging table and then use DelimitedSplit8K to parse it, straight forward and quite fast.
Cool
Koen Verbeeck
Koen Verbeeck
SSCoach
SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)

Group: General Forum Members
Points: 16461 Visits: 13207
Indeed, the file is formatted quite well in the sense that it is very predictable.
How I would do it:

* read the file in line by line as one giant column (choose a delimiter that will never be present in the file, for example |$&|)
* use a conditional split to get rid of the lines you don't need (empty lines and header)
* implement a script transformation to parse the remaining lines. You can use the first three characters of a line to decide what to do.
* output all the columns combined into one single row. You can use STN to find out when a new row starts.



How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at SQLKover.

MCSA SQL Server 2012 - MCSE Business Intelligence
Iulian -207023
Iulian -207023
Ten Centuries
Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)

Group: General Forum Members
Points: 1198 Visits: 1226
Koen Verbeeck (7/15/2014)
Indeed, the file is formatted quite well in the sense that it is very predictable.
How I would do it:

* read the file in line by line as one giant column (choose a delimiter that will never be present in the file, for example |$&|)
* use a conditional split to get rid of the lines you don't need (empty lines and header)
* implement a script transformation to parse the remaining lines. You can use the first three characters of a line to decide what to do.
* output all the columns combined into one single row. You can use STN to find out when a new row starts.


Hi Koen, how to read line by line ?
I did this outside SQL, I am sure it can be done on SQL too, I am just curious about a hint.

thanks
Koen Verbeeck
Koen Verbeeck
SSCoach
SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)

Group: General Forum Members
Points: 16461 Visits: 13207
Iulian -207023 (7/15/2014)
Koen Verbeeck (7/15/2014)
Indeed, the file is formatted quite well in the sense that it is very predictable.
How I would do it:

* read the file in line by line as one giant column (choose a delimiter that will never be present in the file, for example |$&|)
* use a conditional split to get rid of the lines you don't need (empty lines and header)
* implement a script transformation to parse the remaining lines. You can use the first three characters of a line to decide what to do.
* output all the columns combined into one single row. You can use STN to find out when a new row starts.


Hi Koen, how to read line by line ?
I did this outside SQL, I am sure it can be done on SQL too, I am just curious about a hint.

thanks


You can find an example here:

Creating an Asynchronous Transformation with the Script Component



How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at SQLKover.

MCSA SQL Server 2012 - MCSE Business Intelligence
Iulian -207023
Iulian -207023
Ten Centuries
Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)

Group: General Forum Members
Points: 1198 Visits: 1226
Koen Verbeeck (7/15/2014)
Iulian -207023 (7/15/2014)
Koen Verbeeck (7/15/2014)
Indeed, the file is formatted quite well in the sense that it is very predictable.
How I would do it:

* read the file in line by line as one giant column (choose a delimiter that will never be present in the file, for example |$&|)
* use a conditional split to get rid of the lines you don't need (empty lines and header)
* implement a script transformation to parse the remaining lines. You can use the first three characters of a line to decide what to do.
* output all the columns combined into one single row. You can use STN to find out when a new row starts.


Hi Koen, how to read line by line ?
I did this outside SQL, I am sure it can be done on SQL too, I am just curious about a hint.

thanks


You can find an example here:

Creating an Asynchronous Transformation with the Script Component


I got it, many thanks
marg 14154
marg 14154
Forum Newbie
Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)

Group: General Forum Members
Points: 9 Visits: 224
I am afraid, I am a newbie in script component (with scripting in C# and VB). I read the example provided in the link but I am still finding difficulties in your point last 2 points:
* implement a script transformation to parse the remaining lines. You can use the first three characters of a line to decide what to do.
* output all the columns combined into one single row. You can use STN to find out when a new row starts

#FeelingRatherDumb
Koen Verbeeck
Koen Verbeeck
SSCoach
SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)

Group: General Forum Members
Points: 16461 Visits: 13207
1. You read the line and store it inside a string.
2. You read the first 3 characters and depending on the result, you decide what to do.
- if it's STN, you know a new row has stared
- if it's DN, LOC or OPT, you need to parse the line.
You can use the .NET split function for that.
You need to use probably the space as a delimiter.
- The result of the split function is an array of strings. Loop over this array and write each occurence to a variable.
Throw away DN, LOC or OPT in the process.
- Read the next line. If it is STN, you know you have processed a full row and you can write it to the output.
3. Repeat step 2 until all lines have been processed.



How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at SQLKover.

MCSA SQL Server 2012 - MCSE Business Intelligence
marg 14154
marg 14154
Forum Newbie
Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)

Group: General Forum Members
Points: 9 Visits: 224
Koen Verbeeck (7/16/2014)
1. You read the line and store it inside a string.
2. You read the first 3 characters and depending on the result, you decide what to do.
- if it's STN, you know a new row has stared
- if it's DN, LOC or OPT, you need to parse the line.
You can use the .NET split function for that.
You need to use probably the space as a delimiter.
- The result of the split function is an array of strings. Loop over this array and write each occurence to a variable.
Throw away DN, LOC or OPT in the process.
- Read the next line. If it is STN, you know you have processed a full row and you can write it to the output.
3. Repeat step 2 until all lines have been processed.


Thank you so much. You have helped a lot!
Just one more thing, there is some cases in which there are some values that should be on the same line as OPT but are the next row like for instance 'CFAN !TKD' e.g. below. Would be able to assist or provide a hint on how I could get those values onto the same row as, merge with those values with OPT?

Thank you.

Eg,
STN
DN 326 0000
LOC GHUT ILCE 01 2 12 01
OPT CFW 1MR DGT RTP 0 ALTO NACT AWS NACT !NPA CFB CFD ENQC
CFAN !TKD

The Result would be:
STN
DN 326 0000
LOC GHUT ILCE 01 2 12 01
OPT CFW 1MR DGT RTP 0 ALTO NACT AWS NACT !NPA CFB CFD ENQC CFAN !TKD
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45151 Visits: 39923
@marg 14154,

Are the 7 digit values for DN record type unique for each file? I ask because I have an idea for this but need to know that to firm up the approach.

Also, is there a maximum number of different values (ie: columns) that can show up for the OPT record type?

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search