How to import a non-delimited text file

  • Hi, all. I (will) have a non-delimited text file containing 35 lines for each of over 5,000 records. Below is an example of lines 1-5 and 30-35 of one record. I need two of the 35 values (SubnetMask and SubnetDescription) for each of the 5,000+ records in this long, continuous file. Just looking for high-level ideas on the best way to handle this. All my other processes are in SSIS, so I will ultimately integrate this import/parse solution into that flow. Any thoughts appreciated.

    SubnetAddress=10.16.224.128

    SubnetName=FOS-Salzburg Kasernenstraase #823 VPN

    SubnetMask=255.255.255.128

    NetworkAddress=10.0.0.0

    LocationID=895

    ...

    SubnetDescription=CHINA-BEIJING-C025 F2 XIANGJIANG CNVASX1706

    DHCPOptionTemplate=

    DHCPPolicyTemplate=

    PrimaryInterface=No

    AllowDHCPClientsModifyDynamicObjectResourceRecords=Same As in Global Policies

  • This is a pretty interesting problem. I don't know if this is the best way to handle it, but here's one approach. Let me admit up-front that I've not tried this, but it seems feasible. The only gotcha I see is if the values themselves contain an =, which is addressed below.

    I'd start by creating an XML format file for it and an inbox table to receive the data. Import the file using BULK INSERT into the inbox table. If the = doesn't appear in the values themselves, the inbox table can consist of 2 columns - name and value. If it does, use one column and split the string on the first = in the field. An identity field in the inbox table can help with keeping the sequence of the rows straight.

    Next, delete everything from the table where the name isn't one of the two you need.

    You now have a table of rows containing everything you need. Use the identity field to sort (so things are paired up correctly) and you can insert it from the inbox table into a permanent table for long-term storage.

    Like I said, I've not tried this. If there's a better way, I'm definitely interested in it.
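
    A minimal sketch of that inbox-table approach, under the same caveat that it's untested (the table, view, and file path names here are just assumptions):

    -- The identity column preserves file order; RawLine receives each text line.
    CREATE TABLE dbo.SubnetInbox (
        RowID   int IDENTITY(1,1) PRIMARY KEY,
        RawLine varchar(8000) NULL
    );
    GO
    -- BULK INSERT maps file fields to table columns by position, so expose only
    -- RawLine through a view (an XML format file would do the same job).
    CREATE VIEW dbo.SubnetInbox_Load AS
        SELECT RawLine FROM dbo.SubnetInbox;
    GO
    -- Load every line of the file; FIELDTERMINATOR/ROWTERMINATOR options may
    -- need adjusting to match the file's actual line endings.
    BULK INSERT dbo.SubnetInbox_Load
    FROM 'C:\Import\subnets.txt';   -- hypothetical path on the server

    -- Keep only the two name/value pairs of interest.
    DELETE FROM dbo.SubnetInbox
    WHERE RawLine NOT LIKE 'SubnetMask=%'
      AND RawLine NOT LIKE 'SubnetDescription=%';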

  • Here's an idea; you might have to work on it.

    First, insert into a dummy table with an identity column and a varchar column (the length might depend on your file).

    Then, with that dummy table, you can work with something like this. I used a CTE to represent the dummy table.

    WITH Dummy(ID, String) AS(

    SELECT 1,'SubnetAddress=10.16.224.128' UNION ALL

    SELECT 2,'SubnetName=FOS-Salzburg Kasernenstraase #823 VPN' UNION ALL

    SELECT 3,'SubnetMask=255.255.255.128' UNION ALL

    SELECT 4,'NetworkAddress=10.0.0.0' UNION ALL

    SELECT 5,'LocationID=895' UNION ALL

    SELECT 30,'SubnetDescription=CHINA-BEIJING-C025 F2 XIANGJIANG CNVASX1706' UNION ALL

    SELECT 31,'DHCPOptionTemplate=' UNION ALL

    SELECT 32,'DHCPPolicyTemplate=' UNION ALL

    SELECT 33,'PrimaryInterface=No' UNION ALL

    SELECT 34,'AllowDHCPClientsModifyDynamicObjectResourceRecords=Same As in Global Policies'

    ),

    Split AS(

    SELECT LEFT( String, CHARINDEX( '=', String) - 1) columnname,

    SUBSTRING( String, CHARINDEX( '=', String) + 1, 8000) value,

    ROW_NUMBER() OVER( PARTITION BY LEFT( String, CHARINDEX( '=', String) - 1) ORDER BY ID) rn

    FROM Dummy

    )

    SELECT MAX( CASE WHEN columnname = 'SubnetMask' THEN value END) AS SubnetMask,

    MAX( CASE WHEN columnname = 'SubnetDescription' THEN value END) AS SubnetDescription

    FROM Split

    GROUP BY rn

    Luis C.
  • Wow, I very much appreciate this input. I follow what both of you are saying and never would have thought of either. I will attempt to implement both and see which one I like best. Will be a good learning experience as I'm fairly new to SSIS. Again, thank you.

  • I really like the approach Luis took to split the data. You could use the BULK INSERT into a single field from my approach and then split the values using his CTE. I think his Dummy CTE stands in for the physical single-field inbox table from my example, just in a reproducible form.
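
    A rough, untested sketch of that combination (dbo.SubnetInbox, its columns, and the dbo.Subnet target table are assumed names):

    WITH Split AS (
        SELECT LEFT(RawLine, CHARINDEX('=', RawLine) - 1)            AS columnname,
               SUBSTRING(RawLine, CHARINDEX('=', RawLine) + 1, 8000) AS value,
               ROW_NUMBER() OVER (PARTITION BY LEFT(RawLine, CHARINDEX('=', RawLine) - 1)
                                  ORDER BY RowID)                    AS rn
        FROM dbo.SubnetInbox
        WHERE RawLine LIKE 'SubnetMask=%'
           OR RawLine LIKE 'SubnetDescription=%'
    )
    INSERT INTO dbo.Subnet (SubnetMask, SubnetDescription)   -- hypothetical permanent table
    SELECT MAX(CASE WHEN columnname = 'SubnetMask'        THEN value END),
           MAX(CASE WHEN columnname = 'SubnetDescription' THEN value END)
    FROM Split
    GROUP BY rn;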

  • I should have pointed out that you don't have to use SSIS for this. You could implement this in a SQL procedure using the BULK INSERT statement. The only caveat here is that the file you're importing from must be accessible in the file system on the server. The permission required to use BULK INSERT is membership in the bulkadmin server role or better.
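
    For example, a minimal sketch of such a procedure (the procedure name and parameter are assumptions; BULK INSERT won't accept a variable for the file name, so the path is spliced in with dynamic SQL):

    CREATE PROCEDURE dbo.LoadSubnetFile
        @FilePath nvarchar(260)
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Empty the assumed inbox table before each load.
        TRUNCATE TABLE dbo.SubnetInbox;

        -- Build the BULK INSERT with the caller-supplied path, escaping quotes.
        DECLARE @sql nvarchar(max) =
            N'BULK INSERT dbo.SubnetInbox_Load FROM '''
            + REPLACE(@FilePath, '''', '''''') + N''';';

        EXEC (@sql);
    END;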

  • You're right Ed, I called it a dummy table and you called it an inbox table, but we're referring to the same thing. There's not much difference in our methods; it's mostly in the way they're expressed.

    SSIS would be optional. It can be done with or without it depending on personal preferences or policies.

    Luis C.
  • Thanks for the clarification guys, helps a lot. I can, indeed, access the files from the SQL Server, so a bulk insert should work well. I will implement it via a stored proc and simply call it from within my SSIS flow, where this and other imports will be automated. Part of my SSIS flow will be to copy the text file from a network share to a local folder. Thanks to your input, I now know how to proceed and don't have to worry about this over the 4th-of-July weekend. Much appreciated. 🙂

  • Similar to Luis's suggestion:

    WITH Dummy(ID, String) AS(

    SELECT 1,'SubnetAddress=10.16.224.128' UNION ALL

    SELECT 2,'SubnetName=FOS-Salzburg Kasernenstraase #823 VPN' UNION ALL

    SELECT 3,'SubnetMask=255.255.255.128' UNION ALL

    SELECT 4,'NetworkAddress=10.0.0.0' UNION ALL

    SELECT 5,'LocationID=895' UNION ALL

    SELECT 30,'SubnetDescription=CHINA-BEIJING-C025 F2 XIANGJIANG CNVASX1706' UNION ALL

    SELECT 31,'DHCPOptionTemplate=' UNION ALL

    SELECT 32,'DHCPPolicyTemplate=' UNION ALL

    SELECT 33,'PrimaryInterface=No' UNION ALL

    SELECT 34,'AllowDHCPClientsModifyDynamicObjectResourceRecords=Same As in Global Policies'

    )

    SELECT SubnetMask=MAX(CASE WHEN [name] = 'SubnetMask' THEN [value] END)

    ,SubnetDescription=MAX(CASE WHEN [name] = 'SubnetDescription' THEN [value] END)

    FROM Dummy a

    CROSS APPLY (

    SELECT [name]=LEFT(String, CHARINDEX('=', String)-1)

    ,[value]=RIGHT(String, LEN(String)-CHARINDEX('=', String))) b



  • Dwain.c, your code generates a single output row, not one for each record (comprising 35 rows) in the source file.

  • reid.kell (7/3/2013)


    ...

    I need two of the 35 values (SubnetMask and SubnetDescription) for each of the 5000+ records in this long, continuous file. ...

    reid.kell (7/8/2013)


    Dwain.c, your code generates a single output row, not one for each record (comprising 35 rows) in the source file.

    Isn't that what Luis and I gave you?



  • No. In my scenario, Luis' routine correctly generates 5000 records; yours generates one. In my text file, each "record" spans 35 lines. There are 35 lines x 5000 records = 175,000 lines in the text file. I need 5000 results, which Luis' routine produces. It's possible I didn't implement your solution correctly.

  • reid.kell (7/9/2013)


    No. In my scenario, Luis' routine correctly generates 5000 records; yours generates one. In my text file, each "record" spans 35 lines. There are 35 lines x 5000 records = 175,000 lines in the text file. I need 5000 results, which Luis' routine produces. It's possible I didn't implement your solution correctly.

    I asked that question because I thought I had run Luis's query against your sample data and then wrote my query to match its result set. Perhaps I'm mistaken.

    If you've got something that works for your case, I say go with it!



  • Understood, and agreed. I very much appreciate your input, though. Going to study your method and learn from it -- very interesting.
