Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««1234»»»

Importing Poorly Formatted Text File Expand / Collapse
Author
Message
Posted Wednesday, July 16, 2014 11:53 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, October 20, 2014 9:06 PM
Points: 9, Visits: 219
Jeff Moden (7/16/2014)
@marg 14154,

Are the 7 digit values for DN record type unique for each file? I ask because I have an idea for this but need to know that to firm up the approach.

Also, is there a maximum number of different values (ie: columns) that can show up for the OPT record type?


Yes the DN are unique for each file. For the OPT record, I am afraid, the maximum number of values is not known.
Post #1593390
Posted Wednesday, July 16, 2014 11:57 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, October 20, 2014 9:06 PM
Points: 9, Visits: 219
Jeff Moden (7/16/2014)
@marg 14154,

Are the 7 digit values for DN record type unique for each file? I ask because I have an idea for this but need to know that to firm up the approach.

Also, is there a maximum number of different values (ie: columns) that can show up for the OPT record type?


Yes Jeff the 7 digit values for DN are unique. For OPT, I am afraid the maximum values is not known.
Post #1593393
Posted Thursday, July 17, 2014 12:05 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, October 20, 2014 9:06 PM
Points: 9, Visits: 219
Jeff Moden (7/16/2014)
@marg 14154,

Are the 7 digit values for DN record type unique for each file? I ask because I have an idea for this but need to know that to firm up the approach.

Also, is there a maximum number of different values (ie: columns) that can show up for the OPT record type?


Hi Jeff, yes the 7 digit values for DN is unique. For the different values that can show up for OPT, it is not known. But I was provided a list of known which is about 47 in total, that is should appear and there are:
!124 !BAR !ECO !IDD !LOC !NPA !OPR !RIT !STD !TKD !VAS 1FR 1MR ACT ALTO AWS CDF
CFAN CFB CFBU CFD CFDU CFW CLT CND CNDB
COB CWT DGT DNH DOR DTM ENQC FNT FPSV FREQ FRST MCT MTR NACT OCB OPT RTP SUS SUSO SUST VM
Post #1593395
Posted Thursday, July 17, 2014 12:29 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 4:36 PM
Points: 35,546, Visits: 32,131
That works. It's almost 2:30AM so I've got to hit the rack so I can be up at 6. I'll give this a whirl tomorrow night after work. Thanks for the feedback.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1593400
Posted Thursday, July 17, 2014 12:50 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 9:00 AM
Points: 13,520, Visits: 11,313
marg 14154 (7/16/2014)

Thank you so much. You have helped a lot!
Just one more thing, there is some cases in which there are some values that should be on the same line as OPT but are the next row like for instance 'CFAN !TKD' e.g. below. Would be able to assist or provide a hint on how I could get those values onto the same row as, merge with those values with OPT?

Thank you.



You just look out for those special 3 characters at the start of the line. (OPT,DN,LOC and STN)
If it is neither of those, the values belong to the previous line.




How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at LessThanDot.

MCSA SQL Server 2012 - MCSE Business Intelligence
Post #1593404
Posted Thursday, July 17, 2014 8:47 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Today @ 2:48 PM
Points: 1,932, Visits: 19,917
marg 14154 (7/14/2014)
I am trying to import a text file to a SQL server table using SISS. I use SSIS but this is my first time in import this type. Please HELP!
---------------text file--------------
REQ que stn all

STN
DN 326 0000
LOC GHUT ILCE 01 2 12 01
OPT CFW 1MR DGT RTP 0 ALTO NACT AWS NACT !NPA CFB CFD ENQC
CFAN

STN
DN 326 0001
LOC GHUT ILCE 01 2 12 02
OPT SUS CFW 1MR CWT DGT RTP 0 ALTO 3261281 ACT AWS NACT !NPA !ECO !RIT !124 CFB
CFD VM ENQC

STN
DN 326 0003
LOC GHUT ILCE 03 2 16 14
OPT 1MR DGT RTP 0 !TKD CFB CFD

STN
DN 326 0004
LOC GHUT ILCE 03 2 19 20
OPT 1MR DGT RTP 0 !LOC CFB CFD
-------------------------------------------------------------
1. The header "REQ que stn all" is to be disregarded
2. STN disregarded meaning not be included in the destination table
3. DN disregarded
4. 7 digits number after DN eg, 326 0000 is the phone number, first column
5. LOC disregarded
6. GHUT i.e. the next after LOC is the second column and that the Station Code
7. The set of codes after GHUT, eg 'ILCE 01 2 12 01' is one column and that is the Port.
8. OPT is disregarded.
9. the Codes after OPT eg, 'CFW 1MR DGT RTP 0 ALTO NACT AWS NACT !NPA CFB CFD ENQC
CFAN' to be treated as individual separate columns.

Thanks,


@marg...for clarification...based on your sample data above what does the destination table look like?

thanks jls


______________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
Post #1593642
Posted Thursday, July 17, 2014 11:51 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 4:36 PM
Points: 35,546, Visits: 32,131
Ok. Here we go. Time for some good ol' fashioned "Black Arts".

The first thing is, is to understand the true nature of the data-file example that was posted. I did that by "quoting the post" that had the example and some remarkable stuff showed up that the HTML rendering on this site was hiding. There's a ton of leading and embedded spaces. Here's what the file data really looks like...

REQ   que stn all

STN
DN 326 0000
LOC GHUT ILCE 01 2 12 01
OPT CFW 1MR DGT RTP 0 ALTO NACT AWS NACT !NPA CFB CFD ENQC
CFAN

STN
DN 326 0001
LOC GHUT ILCE 01 2 12 02
OPT SUS CFW 1MR CWT DGT RTP 0 ALTO 3261281 ACT AWS NACT !NPA !ECO !RIT !124 CFB
CFD VM ENQC

STN
DN 326 0003
LOC GHUT ILCE 03 2 16 14
OPT 1MR DGT RTP 0 !TKD CFB CFD

STN
DN 326 0004
LOC GHUT ILCE 03 2 19 20
OPT 1MR DGT RTP 0 !LOC CFB CFD



The blank line actually contain 4 spaces. To help us see the spaces, I changed them all to a dash so they're easier to see.

REQ---que-stn-all
----
----STN
----DN---326-0000
----LOC--GHUT-ILCE-01-2-12-01-
----OPT--CFW-1MR-DGT-RTP-0-ALTO-NACT-AWS-NACT-!NPA-CFB-CFD-ENQC-
---------CFAN-
----
----STN
----DN---326-0001
----LOC--GHUT-ILCE-01-2-12-02-
----OPT--SUS-CFW-1MR-CWT-DGT-RTP-0-ALTO-3261281-ACT-AWS-NACT-!NPA-!ECO-!RIT-!124-CFB-
---------CFD-VM-ENQC-
----
----STN
----DN---326-0003
----LOC--GHUT-ILCE-03-2-16-14-
----OPT--1MR-DGT-RTP-0-!TKD-CFB-CFD-
----
----STN
----DN---326-0004
----LOC--GHUT-ILCE-03-2-19-20-
----OPT--1MR-DGT-RTP-0-!LOC-CFB-CFD



That's almost a fixed field format. In fact, the data to the right of what I'm calling the "RowType" is the only place where it's not fixed field. It's "ragged right" but it does start in the same position in the line even for the "next line extension" of the OPT RowType. That's EASY.

The first thing to do is build a BCP FORMAT FILE to control the import. I recommend creating a new directory somewhere on the server itself to hold such format files. Save the following as a format file. On my test box, I saved it under "C:\SQL Examples\BulkData.fmt".

9.0
4
1 SQLCHAR 0 4 "" 0 Blanks01 ""
2 SQLCHAR 0 3 "" 2 RowType ""
3 SQLCHAR 0 2 "" 0 Blanks02 ""
4 SQLCHAR 0 8000 "\r\ n" 3 BulkData "" <-----<<<< Take the space out between the \ and the n. This site "consumes" backslash n as a control character.



The BCP FORMAT FILE skips the 4 spaces at the beginning of each line, picks up the next 3 characters (spaces or not) as the RowType, skips 2 more spaces, and then picks up to the next 8000 characters (ending with a carriage return/line feed) as the "bulk data" for each RowType.

The rest is pretty easy. I saved your file data example as "C:\SQL Examples\Weird Telephone File.txt". You'll, of course, need to change that and the location of the BCP FORMAT FILE. If all of this works out according to you, then I'll show you how to parameterize those two file paths so that you could turn this into a stored procedure.

Here's the code. As normal, the details are in the comments in the code.

--===== If the work table exists, drop it to make reruns in SSMS easier.
IF OBJECT_ID('tempdb..#BulkData','U') IS NOT NULL
DROP TABLE #BulkData
;
--===== Create the work table with the quintessential CLUSTERED index on
-- the RowNum column.
CREATE TABLE #BulkData
(
RowNum INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
,RowType CHAR(3)
,BulkData VARCHAR(8000)
,DN CHAR(8)
)
;
--===== Using the previously constructed BCP FORMAT FILE,
-- load the file into the work table.
-- Because of the structure of the file and the BCP FORMAT FILE,
-- this will skip the blank rows, as well. Makes life real easy.
BULK INSERT #BulkData
FROM 'C:\SQL Examples\Weird Telephone File.txt'
WITH (
FORMATFILE = 'C:\SQL Examples\BulkData.fmt'
,FIRSTROW = 3
,TABLOCK
)
;
--===== Declare a "per row" variable that will be updated for every row.
-- Think "pseudo-cursor" here where we use the natural looping behind
-- the scenes of the UPDATE as if it were a declared cursor.
DECLARE @DN CHAR(8)
;
--===== Using the old fashioned method of a "Quirky Update",
-- do a "data smear" to copy each DN entry "down" to the next rows
-- until we hit another DN entry and then copy that "down", etc.
-- This is where we also assign a RowType to prep for the pivoting
-- that we'll do using an old fashiond CROSSTAB.
-- Notice that this is where we also create the "OFL" or "OVERFLOw"
-- RowType for the "OPT" row "extensions".
-- Heh... when I say "old fashioned", think "Black Arts".
UPDATE tgt
SET @DN = DN = CASE WHEN RowType = 'DN' THEN LEFT(BulkData,8) ELSE ISNULL(@DN,'') END
,RowType = CASE WHEN RowType = '' AND RTRIM(LTRIM(BulkData)) <> 'STN' THEN 'OFL' ELSE RowType END
,BulkData = LTRIM(RTRIM(BulkData))
FROM #BulkData tgt WITH(TABLOCKX,INDEX(1))
OPTION (MAXDOP 1)
;
--===== Let's see what we have in the work table...
SELECT * FROM #BulkData
;
--===== Now, let's simultaneously pivot and parse the data using a very high performance CROSSTAB.
SELECT PhoneNumber = DN
,StationCode = MAX(CASE WHEN RowType = 'LOC' THEN RTRIM(SUBSTRING(BulkData,1,CHARINDEX(' ',BulkData))) ELSE '' END)
,Port = MAX(CASE WHEN RowType = 'LOC' THEN RTRIM(LTRIM(SUBSTRING(BulkData,CHARINDEX(' ',BulkData),8000))) ELSE '' END)
,Options = MAX(CASE WHEN RowType = 'OPT' THEN RTRIM(BulkData) ELSE '' END)
+ MAX(CASE WHEN RowType = 'OFL' THEN ' '+LTRIM(RTRIM(BulkData)) ELSE '' END)
FROM #BulkData
GROUP BY DN
;

This is what ends up in the "work" table... (scroll-right to see the new "DN" column)

RowNum  RowType BulkData                                                                    DN
1 DN 326 0000 326 0000
2 LOC GHUT ILCE 01 2 12 01 326 0000
3 OPT CFW 1MR DGT RTP 0 ALTO NACT AWS NACT !NPA CFB CFD ENQC 326 0000
4 OFL CFAN 326 0000
5 STN 326 0000
6 DN 326 0001 326 0001
7 LOC GHUT ILCE 01 2 12 02 326 0001
8 OPT SUS CFW 1MR CWT DGT RTP 0 ALTO 3261281 ACT AWS NACT !NPA !ECO !RIT !124 CFB 326 0001
9 OFL CFD VM ENQC 326 0001
10 STN 326 0001
11 DN 326 0003 326 0003
12 LOC GHUT ILCE 03 2 16 14 326 0003
13 OPT 1MR DGT RTP 0 !TKD CFB CFD 326 0003
14 STN 326 0003
15 DN 326 0004 326 0004
16 LOC GHUT ILCE 03 2 19 20 326 0004
17 OPT 1MR DGT RTP 0 !LOC CFB CFD 326 0004


Notice how the DN has been "smeared down" for all rows where it first occurs until a new DN row shows up and then that is "smeared down", etc, etc. That's the key that we GROUP on to do the CROSS APPLY to pivot everything and to combine the any existing OFL rows with their respective OPT rows.

Here's what the final output looks like courtesy of the CROSS APPLY.

PhoneNumber StationCode Port            Options
----------- ----------- --------------- ---------------------------------------------------------------------------------------
326 0000 GHUT ILCE 01 2 12 01 CFW 1MR DGT RTP 0 ALTO NACT AWS NACT !NPA CFB CFD ENQC CFAN
326 0001 GHUT ILCE 01 2 12 02 SUS CFW 1MR CWT DGT RTP 0 ALTO 3261281 ACT AWS NACT !NPA !ECO !RIT !124 CFB CFD VM ENQC
326 0003 GHUT ILCE 03 2 16 14 1MR DGT RTP 0 !TKD CFB CFD
326 0004 GHUT ILCE 03 2 19 20 1MR DGT RTP 0 !LOC CFB CFD


This is the "easy" part, so far. If that's ok by you, the next part would be to split the options (OPT) into separate columns but, I have to ask, do you REALLY want to do that?


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1593907
Posted Friday, July 18, 2014 12:00 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 4:36 PM
Points: 35,546, Visits: 32,131
@Koen,

Now you see why I never get into SSIS. I just don't need it.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1593911
Posted Friday, July 18, 2014 12:59 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 9:00 AM
Points: 13,520, Visits: 11,313
Jeff Moden (7/18/2014)
@Koen,

Now you see why I never get into SSIS. I just don't need it.


Everyone has his own tool. I can say the same thing about BCP




How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at LessThanDot.

MCSA SQL Server 2012 - MCSE Business Intelligence
Post #1593923
Posted Friday, July 18, 2014 5:22 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 4:36 PM
Points: 35,546, Visits: 32,131
Koen Verbeeck (7/18/2014)
Jeff Moden (7/18/2014)
@Koen,

Now you see why I never get into SSIS. I just don't need it.


Everyone has his own tool. I can say the same thing about BCP


Absolutely agreed there.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1593991
« Prev Topic | Next Topic »

Add to briefcase ««1234»»»

Permissions Expand / Collapse