How to import a non-delimited text file

  • Hi, all. I (will) have a non-delimited text file containing 35 lines for each of over 5,000 records. Below is an example of lines 1-5 and 30-35 of one record. I need two of the 35 values (SubnetMask and SubnetDescription) for each of the 5,000+ records in this long, continuous file. Just looking for high-level ideas on the best way to handle this. All my other processes are in SSIS, so I will ultimately integrate this import/parse solution into that flow. Any thoughts appreciated.

    SubnetAddress=10.16.224.128

    SubnetName=FOS-Salzburg Kasernenstraase #823 VPN

    SubnetMask=255.255.255.128

    NetworkAddress=10.0.0.0

    LocationID=895

    ...

    SubnetDescription=CHINA-BEIJING-C025 F2 XIANGJIANG CNVASX1706

    DHCPOptionTemplate=

    DHCPPolicyTemplate=

    PrimaryInterface=No

    AllowDHCPClientsModifyDynamicObjectResourceRecords=Same As in Global Policies

  • This is a pretty interesting problem. I don't know if this is the best way to handle it, but here's one approach. Let me admit up-front that I've not tried this, but it seems feasible. The only gotcha I see is if the values themselves contain an =, which is addressed below.

    I'd start by creating an XML format file for it and an inbox table to receive the data. Import the file using BULK INSERT into the inbox table. If the = doesn't appear in the values themselves, the inbox table can consist of 2 columns - name and value. If it does, use one column and split the string on the first = in the field. An identity field in the inbox table can help with keeping the sequence of the rows straight.

    Next, delete everything from the table where the name isn't one of the two you need.

    You now have a table of rows containing everything you need. Use the identity field to sort (so things are paired up correctly) and you can insert it from the inbox table into a permanent table for long-term storage.

    Like I said, I've not tried this. If there's a better way, I'm definitely interested in it.
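
    A minimal sketch of that inbox-table approach, under the same caveat that it's untested (the table, view, and file path names here are just assumptions):

    -- The identity column preserves file order; RawLine receives each text line.
    CREATE TABLE dbo.SubnetInbox (
        RowID   int IDENTITY(1,1) PRIMARY KEY,
        RawLine varchar(8000) NULL
    );
    GO
    -- BULK INSERT maps file fields to table columns by position, so expose only
    -- RawLine through a view (an XML format file would do the same job).
    CREATE VIEW dbo.SubnetInbox_Load AS
        SELECT RawLine FROM dbo.SubnetInbox;
    GO
    -- Load every line of the file; FIELDTERMINATOR/ROWTERMINATOR options may
    -- need adjusting to match the file's actual line endings.
    BULK INSERT dbo.SubnetInbox_Load
    FROM 'C:\Import\subnets.txt';   -- hypothetical path on the server

    -- Keep only the two name/value pairs of interest.
    DELETE FROM dbo.SubnetInbox
    WHERE RawLine NOT LIKE 'SubnetMask=%'
      AND RawLine NOT LIKE 'SubnetDescription=%';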

  • Here's an idea; you might have to work on it.

    First, insert into a dummy table with an identity column and a varchar column (the length might depend on your file).

    Then, with that dummy table, you can work with something like this. I used a CTE to represent the dummy table.

    WITH Dummy(ID, String) AS(

    SELECT 1,'SubnetAddress=10.16.224.128' UNION ALL

    SELECT 2,'SubnetName=FOS-Salzburg Kasernenstraase #823 VPN' UNION ALL

    SELECT 3,'SubnetMask=255.255.255.128' UNION ALL

    SELECT 4,'NetworkAddress=10.0.0.0' UNION ALL

    SELECT 5,'LocationID=895' UNION ALL

    SELECT 30,'SubnetDescription=CHINA-BEIJING-C025 F2 XIANGJIANG CNVASX1706' UNION ALL

    SELECT 31,'DHCPOptionTemplate=' UNION ALL

    SELECT 32,'DHCPPolicyTemplate=' UNION ALL

    SELECT 33,'PrimaryInterface=No' UNION ALL

    SELECT 34,'AllowDHCPClientsModifyDynamicObjectResourceRecords=Same As in Global Policies'

    ),

    Split AS(

    SELECT LEFT( String, CHARINDEX( '=', String) - 1) columnname,

    SUBSTRING( String, CHARINDEX( '=', String) + 1, 8000) value,

    ROW_NUMBER() OVER( PARTITION BY LEFT( String, CHARINDEX( '=', String) - 1) ORDER BY ID) rn

    FROM Dummy

    )

    SELECT MAX( CASE WHEN columnname = 'SubnetMask' THEN value END) AS SubnetMask,

    MAX( CASE WHEN columnname = 'SubnetDescription' THEN value END) AS SubnetDescription

    FROM Split

    GROUP BY rn

    Luis C.
  • Wow, I very much appreciate this input. I follow what both of you are saying and never would have thought of either. I will attempt to implement both and see which one I like best. Will be a good learning experience as I'm fairly new to SSIS. Again, thank you.

  • I really like the approach Luis took to split the data. You could use the BULK INSERT into a single field from my approach and then split the values using his CTE. I think his Dummy CTE stands in for the physical single-field inbox table from my example, just in a reproducible form.
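
    A rough, untested sketch of that combination (dbo.SubnetInbox, its columns, and the dbo.Subnet target table are assumed names):

    WITH Split AS (
        SELECT LEFT(RawLine, CHARINDEX('=', RawLine) - 1)            AS columnname,
               SUBSTRING(RawLine, CHARINDEX('=', RawLine) + 1, 8000) AS value,
               ROW_NUMBER() OVER (PARTITION BY LEFT(RawLine, CHARINDEX('=', RawLine) - 1)
                                  ORDER BY RowID)                    AS rn
        FROM dbo.SubnetInbox
        WHERE RawLine LIKE 'SubnetMask=%'
           OR RawLine LIKE 'SubnetDescription=%'
    )
    INSERT INTO dbo.Subnet (SubnetMask, SubnetDescription)   -- hypothetical permanent table
    SELECT MAX(CASE WHEN columnname = 'SubnetMask'        THEN value END),
           MAX(CASE WHEN columnname = 'SubnetDescription' THEN value END)
    FROM Split
    GROUP BY rn;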

  • I should have pointed out that you don't have to use SSIS for this. You could implement this in a SQL procedure using the BULK INSERT statement. The only caveat here is that the file you're importing from must be accessible in the file system on the server. The permission required to use BULK INSERT is membership in the bulkadmin server role or better.
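
    For example, a minimal sketch of such a procedure (the procedure name and parameter are assumptions; BULK INSERT won't accept a variable for the file name, so the path is spliced in with dynamic SQL):

    CREATE PROCEDURE dbo.LoadSubnetFile
        @FilePath nvarchar(260)
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Empty the assumed inbox table before each load.
        TRUNCATE TABLE dbo.SubnetInbox;

        -- Build the BULK INSERT with the caller-supplied path, escaping quotes.
        DECLARE @sql nvarchar(max) =
            N'BULK INSERT dbo.SubnetInbox_Load FROM '''
            + REPLACE(@FilePath, '''', '''''') + N''';';

        EXEC (@sql);
    END;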

  • You're right Ed, I called it a dummy table and you called it an inbox table, but we're referring to the same thing. There's not much difference in our methods; it's mostly in the way they're expressed.

    SSIS would be optional. It can be done with or without it depending on personal preferences or policies.

    Luis C.
  • Thanks for the clarification guys, helps a lot. I can, indeed, access the files from the SQL Server, so a bulk insert should work well. I will implement it via a stored proc and simply call it from within my SSIS flow, where this and other imports will be automated. Part of my SSIS flow will be to copy the text file from a network share to a local folder. Thanks to your input, I now know how to proceed and don't have to worry about this over the 4th-of-July weekend. Much appreciated. 🙂

  • Similar to Luis's suggestion:

    WITH Dummy(ID, String) AS(

    SELECT 1,'SubnetAddress=10.16.224.128' UNION ALL

    SELECT 2,'SubnetName=FOS-Salzburg Kasernenstraase #823 VPN' UNION ALL

    SELECT 3,'SubnetMask=255.255.255.128' UNION ALL

    SELECT 4,'NetworkAddress=10.0.0.0' UNION ALL

    SELECT 5,'LocationID=895' UNION ALL

    SELECT 30,'SubnetDescription=CHINA-BEIJING-C025 F2 XIANGJIANG CNVASX1706' UNION ALL

    SELECT 31,'DHCPOptionTemplate=' UNION ALL

    SELECT 32,'DHCPPolicyTemplate=' UNION ALL

    SELECT 33,'PrimaryInterface=No' UNION ALL

    SELECT 34,'AllowDHCPClientsModifyDynamicObjectResourceRecords=Same As in Global Policies'

    )

    SELECT SubnetMask=MAX(CASE WHEN [name] = 'SubnetMask' THEN [value] END)

    ,SubnetDescription=MAX(CASE WHEN [name] = 'SubnetDescription' THEN [value] END)

    FROM Dummy a

    CROSS APPLY (

    SELECT [name]=LEFT(String, CHARINDEX('=', String)-1)

    ,[value]=RIGHT(String, LEN(String)-CHARINDEX('=', String))) b



  • Dwain.c, your code generates a single output row, not one for each record (comprising 35 rows) in the source file.

  • reid.kell (7/3/2013)


    ...

    I need two of the 35 values (SubnetMask and SubnetDescription) for each of the 5000+ records in this long, continuous file. ...

    reid.kell (7/8/2013)


    Dwain.c, your code generates a single output row, not one for each record (comprising 35 rows) in the source file.

    Isn't that what Luis and I gave you?



  • No. In my scenario, Luis' routine correctly generates 5000 records; yours generates one. In my text file, each "record" spans 35 lines. There are 35 lines x 5000 records = 175,000 lines in the text file. I need 5000 results, which Luis' routine produces. It's possible I didn't implement your solution correctly.

  • reid.kell (7/9/2013)


    No. In my scenario, Luis' routine correctly generates 5000 records; yours generates one. In my text file, each "record" spans 35 lines. There are 35 lines x 5000 records = 175,000 lines in the text file. I need 5000 results, which Luis' routine produces. It's possible I didn't implement your solution correctly.

    I asked that question because I thought I had run Luis's query against your sample data and then wrote my query to match its result set. Perhaps I'm mistaken.

    If you've got something that works for your case, I say go with it!



  • Understood, and agreed. I very much appreciate your input, though. Going to study your method and learn from it -- very interesting.
