Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Importing Poorly Formatted Text File


Importing Poorly Formatted Text File

Author
Message
marg 14154
marg 14154
Forum Newbie
Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)

Group: General Forum Members
Points: 9 Visits: 224
Jeff Moden (7/16/2014)
@marg 14154,

Are the 7 digit values for DN record type unique for each file? I ask because I have an idea for this but need to know that to firm up the approach.

Also, is there a maximum number of different values (ie: columns) that can show up for the OPT record type?


Yes the DN are unique for each file. For the OPT record, I am afraid, the maximum number of values is not known.
marg 14154
marg 14154
Forum Newbie
Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)

Group: General Forum Members
Points: 9 Visits: 224
Jeff Moden (7/16/2014)
@marg 14154,

Are the 7 digit values for DN record type unique for each file? I ask because I have an idea for this but need to know that to firm up the approach.

Also, is there a maximum number of different values (ie: columns) that can show up for the OPT record type?


Yes Jeff the 7 digit values for DN are unique. For OPT, I am afraid the maximum values is not known.
marg 14154
marg 14154
Forum Newbie
Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)

Group: General Forum Members
Points: 9 Visits: 224
Jeff Moden (7/16/2014)
@marg 14154,

Are the 7 digit values for DN record type unique for each file? I ask because I have an idea for this but need to know that to firm up the approach.

Also, is there a maximum number of different values (ie: columns) that can show up for the OPT record type?


Hi Jeff, yes the 7 digit values for DN is unique. For the different values that can show up for OPT, it is not known. But I was provided a list of known which is about 47 in total, that is should appear and there are:
!124   !BAR   !ECO   !IDD   !LOC   !NPA   !OPR   !RIT   !STD   !TKD   !VAS   1FR   1MR   ACT   ALTO   AWS   CDF   
CFAN   CFB   CFBU   CFD   CFDU   CFW   CLT   CND   CNDB   
COB   CWT   DGT   DNH   DOR   DTM   ENQC FNT   FPSV   FREQ   FRST   MCT   MTR   NACT   OCB   OPT   RTP   SUS   SUSO SUST VM
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)

Group: General Forum Members
Points: 44968 Visits: 39862
That works. It's almost 2:30AM so I've got to hit the rack so I can be up at 6. I'll give this a whirl tomorrow night after work. Thanks for the feedback.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
     Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is usually not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Koen Verbeeck
Koen Verbeeck
SSCoach
SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)

Group: General Forum Members
Points: 16380 Visits: 13199
marg 14154 (7/16/2014)

Thank you so much. You have helped a lot!
Just one more thing, there is some cases in which there are some values that should be on the same line as OPT but are the next row like for instance 'CFAN !TKD' e.g. below. Would be able to assist or provide a hint on how I could get those values onto the same row as, merge with those values with OPT?

Thank you.



You just look out for those special 3 characters at the start of the line. (OPT,DN,LOC and STN)
If it is neither of those, the values belong to the previous line.



How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at SQLKover.

MCSA SQL Server 2012 - MCSE Business Intelligence
J Livingston SQL
J Livingston SQL
Hall of Fame
Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)

Group: General Forum Members
Points: 3445 Visits: 33000
marg 14154 (7/14/2014)
I am trying to import a text file to a SQL server table using SISS. I use SSIS but this is my first time in import this type. Please HELP!
---------------text file--------------
REQ que stn all

STN
DN 326 0000
LOC GHUT ILCE 01 2 12 01
OPT CFW 1MR DGT RTP 0 ALTO NACT AWS NACT !NPA CFB CFD ENQC
CFAN

STN
DN 326 0001
LOC GHUT ILCE 01 2 12 02
OPT SUS CFW 1MR CWT DGT RTP 0 ALTO 3261281 ACT AWS NACT !NPA !ECO !RIT !124 CFB
CFD VM ENQC

STN
DN 326 0003
LOC GHUT ILCE 03 2 16 14
OPT 1MR DGT RTP 0 !TKD CFB CFD

STN
DN 326 0004
LOC GHUT ILCE 03 2 19 20
OPT 1MR DGT RTP 0 !LOC CFB CFD
-------------------------------------------------------------
1. The header "REQ que stn all" is to be disregarded
2. STN disregarded meaning not be included in the destination table
3. DN disregarded
4. 7 digits number after DN eg, 326 0000 is the phone number, first column
5. LOC disregarded
6. GHUT i.e. the next after LOC is the second column and that the Station Code
7. The set of codes after GHUT, eg 'ILCE 01 2 12 01' is one column and that is the Port.
8. OPT is disregarded.
9. the Codes after OPT eg, 'CFW 1MR DGT RTP 0 ALTO NACT AWS NACT !NPA CFB CFD ENQC
CFAN' to be treated as individual separate columns.

Thanks,


@marg...for clarification...based on your sample data above what does the destination table look like?

thanks jls

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)

Group: General Forum Members
Points: 44968 Visits: 39862
Ok. Here we go. Time for some good ol' fashioned "Black Arts". :-)

The first thing is, is to understand the true nature of the data-file example that was posted. I did that by "quoting the post" that had the example and some remarkable stuff showed up that the HTML rendering on this site was hiding. There's a ton of leading and embedded spaces. Here's what the file data really looks like...


REQ que stn all

STN
DN 326 0000
LOC GHUT ILCE 01 2 12 01
OPT CFW 1MR DGT RTP 0 ALTO NACT AWS NACT !NPA CFB CFD ENQC
CFAN

STN
DN 326 0001
LOC GHUT ILCE 01 2 12 02
OPT SUS CFW 1MR CWT DGT RTP 0 ALTO 3261281 ACT AWS NACT !NPA !ECO !RIT !124 CFB
CFD VM ENQC

STN
DN 326 0003
LOC GHUT ILCE 03 2 16 14
OPT 1MR DGT RTP 0 !TKD CFB CFD

STN
DN 326 0004
LOC GHUT ILCE 03 2 19 20
OPT 1MR DGT RTP 0 !LOC CFB CFD



The blank line actually contain 4 spaces. To help us see the spaces, I changed them all to a dash so they're easier to see.


REQ---que-stn-all
----
----STN
----DN---326-0000
----LOC--GHUT-ILCE-01-2-12-01-
----OPT--CFW-1MR-DGT-RTP-0-ALTO-NACT-AWS-NACT-!NPA-CFB-CFD-ENQC-
---------CFAN-
----
----STN
----DN---326-0001
----LOC--GHUT-ILCE-01-2-12-02-
----OPT--SUS-CFW-1MR-CWT-DGT-RTP-0-ALTO-3261281-ACT-AWS-NACT-!NPA-!ECO-!RIT-!124-CFB-
---------CFD-VM-ENQC-
----
----STN
----DN---326-0003
----LOC--GHUT-ILCE-03-2-16-14-
----OPT--1MR-DGT-RTP-0-!TKD-CFB-CFD-
----
----STN
----DN---326-0004
----LOC--GHUT-ILCE-03-2-19-20-
----OPT--1MR-DGT-RTP-0-!LOC-CFB-CFD



That's almost a fixed field format. In fact, the data to the right of what I'm calling the "RowType" is the only place where it's not fixed field. It's "ragged right" but it does start in the same position in the line even for the "next line extension" of the OPT RowType. That's EASY.

The first thing to do is build a BCP FORMAT FILE to control the import. I recommend creating a new directory somewhere on the server itself to hold such format files. Save the following as a format file. On my test box, I saved it under "C:\SQL Examples\BulkData.fmt".


9.0
4
1 SQLCHAR 0 4 "" 0 Blanks01 ""
2 SQLCHAR 0 3 "" 2 RowType ""
3 SQLCHAR 0 2 "" 0 Blanks02 ""
4 SQLCHAR 0 8000 "\r\ n" 3 BulkData "" <-----<<<< Take the space out between the \ and the n. This site "consumes" backslash n as a control character.



The BCP FORMAT FILE skips the 4 spaces at the beginning of each line, picks up the next 3 characters (spaces or not) as the RowType, skips 2 more spaces, and then picks up to the next 8000 characters (ending with a carriage return/line feed) as the "bulk data" for each RowType.

The rest is pretty easy. I saved your file data example as "C:\SQL Examples\Weird Telephone File.txt". You'll, of course, need to change that and the location of the BCP FORMAT FILE. If all of this works out according to you, then I'll show you how to parameterize those two file paths so that you could turn this into a stored procedure.

Here's the code. As normal, the details are in the comments in the code.


--===== If the work table exists, drop it to make reruns in SSMS easier.
IF OBJECT_ID('tempdb..#BulkData','U') IS NOT NULL
DROP TABLE #BulkData
;
--===== Create the work table with the quintessential CLUSTERED index on
-- the RowNum column.
CREATE TABLE #BulkData
(
RowNum INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
,RowType CHAR(3)
,BulkData VARCHAR(8000)
,DN CHAR(8)
)
;
--===== Using the previously constructed BCP FORMAT FILE,
-- load the file into the work table.
-- Because of the structure of the file and the BCP FORMAT FILE,
-- this will skip the blank rows, as well. Makes life real easy.
BULK INSERT #BulkData
FROM 'C:\SQL Examples\Weird Telephone File.txt'
WITH (
FORMATFILE = 'C:\SQL Examples\BulkData.fmt'
,FIRSTROW = 3
,TABLOCK
)
;
--===== Declare a "per row" variable that will be updated for every row.
-- Think "pseudo-cursor" here where we use the natural looping behind
-- the scenes of the UPDATE as if it were a declared cursor.
DECLARE @DN CHAR(8)
;
--===== Using the old fashioned method of a "Quirky Update",
-- do a "data smear" to copy each DN entry "down" to the next rows
-- until we hit another DN entry and then copy that "down", etc.
-- This is where we also assign a RowType to prep for the pivoting
-- that we'll do using an old fashiond CROSSTAB.
-- Notice that this is where we also create the "OFL" or "OVERFLOw"
-- RowType for the "OPT" row "extensions".
-- Heh... when I say "old fashioned", think "Black Arts".
UPDATE tgt
SET @DN = DN = CASE WHEN RowType = 'DN' THEN LEFT(BulkData,8) ELSE ISNULL(@DN,'') END
,RowType = CASE WHEN RowType = '' AND RTRIM(LTRIM(BulkData)) <> 'STN' THEN 'OFL' ELSE RowType END
,BulkData = LTRIM(RTRIM(BulkData))
FROM #BulkData tgt WITH(TABLOCKX,INDEX(1))
OPTION (MAXDOP 1)
;
--===== Let's see what we have in the work table...
SELECT * FROM #BulkData
;
--===== Now, let's simultaneously pivot and parse the data using a very high performance CROSSTAB.
SELECT PhoneNumber = DN
,StationCode = MAX(CASE WHEN RowType = 'LOC' THEN RTRIM(SUBSTRING(BulkData,1,CHARINDEX(' ',BulkData))) ELSE '' END)
,Port = MAX(CASE WHEN RowType = 'LOC' THEN RTRIM(LTRIM(SUBSTRING(BulkData,CHARINDEX(' ',BulkData),8000))) ELSE '' END)
,Options = MAX(CASE WHEN RowType = 'OPT' THEN RTRIM(BulkData) ELSE '' END)
+ MAX(CASE WHEN RowType = 'OFL' THEN ' '+LTRIM(RTRIM(BulkData)) ELSE '' END)
FROM #BulkData
GROUP BY DN
;



This is what ends up in the "work" table... (scroll-right to see the new "DN" column)


RowNum RowType BulkData DN
1 DN 326 0000 326 0000
2 LOC GHUT ILCE 01 2 12 01 326 0000
3 OPT CFW 1MR DGT RTP 0 ALTO NACT AWS NACT !NPA CFB CFD ENQC 326 0000
4 OFL CFAN 326 0000
5 STN 326 0000
6 DN 326 0001 326 0001
7 LOC GHUT ILCE 01 2 12 02 326 0001
8 OPT SUS CFW 1MR CWT DGT RTP 0 ALTO 3261281 ACT AWS NACT !NPA !ECO !RIT !124 CFB 326 0001
9 OFL CFD VM ENQC 326 0001
10 STN 326 0001
11 DN 326 0003 326 0003
12 LOC GHUT ILCE 03 2 16 14 326 0003
13 OPT 1MR DGT RTP 0 !TKD CFB CFD 326 0003
14 STN 326 0003
15 DN 326 0004 326 0004
16 LOC GHUT ILCE 03 2 19 20 326 0004
17 OPT 1MR DGT RTP 0 !LOC CFB CFD 326 0004


Notice how the DN has been "smeared down" for all rows where it first occurs until a new DN row shows up and then that is "smeared down", etc, etc. That's the key that we GROUP on to do the CROSS APPLY to pivot everything and to combine the any existing OFL rows with their respective OPT rows.

Here's what the final output looks like courtesy of the CROSS APPLY.


PhoneNumber StationCode Port Options
----------- ----------- --------------- ---------------------------------------------------------------------------------------
326 0000 GHUT ILCE 01 2 12 01 CFW 1MR DGT RTP 0 ALTO NACT AWS NACT !NPA CFB CFD ENQC CFAN
326 0001 GHUT ILCE 01 2 12 02 SUS CFW 1MR CWT DGT RTP 0 ALTO 3261281 ACT AWS NACT !NPA !ECO !RIT !124 CFB CFD VM ENQC
326 0003 GHUT ILCE 03 2 16 14 1MR DGT RTP 0 !TKD CFB CFD
326 0004 GHUT ILCE 03 2 19 20 1MR DGT RTP 0 !LOC CFB CFD


This is the "easy" part, so far. If that's ok by you, the next part would be to split the options (OPT) into separate columns but, I have to ask, do you REALLY want to do that?

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
     Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is usually not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)

Group: General Forum Members
Points: 44968 Visits: 39862
@Koen,

Now you see why I never get into SSIS. I just don't need it.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
     Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is usually not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Koen Verbeeck
Koen Verbeeck
SSCoach
SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)

Group: General Forum Members
Points: 16380 Visits: 13199
Jeff Moden (7/18/2014)
@Koen,

Now you see why I never get into SSIS. I just don't need it.


Everyone has his own tool. I can say the same thing about BCP :-D



How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at SQLKover.

MCSA SQL Server 2012 - MCSE Business Intelligence
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)

Group: General Forum Members
Points: 44968 Visits: 39862
Koen Verbeeck (7/18/2014)
Jeff Moden (7/18/2014)
@Koen,

Now you see why I never get into SSIS. I just don't need it.


Everyone has his own tool. I can say the same thing about BCP :-D


Absolutely agreed there.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
     Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is usually not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search