Text File Import

  • I'm trying to import a file that is fixed column width, but variable record length. That is, sometimes the last field is null reducing the length of the record. DTS is now continuing to the next line to pick up the missing bytes.

    Can someone tell me how to handle this problem?

    Thanks,

    Jim

    Jim


    Jim

  • Are the records that are short by one field missing a carriage return/line feed character at the end? If so, you could loop through the file first and insert the needed character at the end of the short records. Or you could import the entire row into a varchar column in sql server that is big enough to hold the entire row and then parse it out yourself into another table...just some ideas.

    hth,

    Michael

    Michael Weiss


    Michael Weiss

  • We had (and still have) a problem with data like that here. The solution we came up with and seems to be working as expected is to pre-prepare the data by checking each line of data for a specific number of delimiters in the data. This is actually fairly tricky when dealing with data that may have string fields where commas may be present.

    We actually developed a VB script that would:

    Do while not source EOF

    --Read in a line from the source data into a string variable

    --Set a field counter to zero

    --Do until EOL

    ----Extract the data up to the first delimiter (comma)

    ----Concat the extracted field data to a string variable

    ----Delete the data and first delimiter from the source data line

    ----Add 1 to the field counter

    --Loop

    --If the field counter is less than the fields needed then add a delimiter and a zero or null value to fill in the missing field or fields.

    --Save the new string of data to a new file

    Loop

    This is all assuming that you absolutely know that you are only missing data fields at the end of some data records(lines of data) and that string data is quote delimited. You better be very sure that your not missing data fields that are somewhere in the middle or at the beginning of data records or you will have an entirely different set of problems to deal with.

    We have started to tell people to give us raw data in tab delimited or fixed width formats due to the problems that comma delimited files can cause when there is string data fields containing commas that are not quote delimited as strings.

    Tabs are not as common in data as commas may be and so using a tab delimited format has reduced the number of errors we have encountered with DTS dramatically. Fixed width is the most reliable format we get, but not all legacy data sources seem to be able to create such formatting(-sigh-).

    Lastly, we have also started to accept XML datafiles which has reduced problems as well. Again, many legacy data sources are not yet capable, or not easy to migrate data into XML format without some extracurricular development of processes to do so.

    EDI is so much fun! (cough cough)

    Mike's advice is also very good. I just thought I would put in my 3 cents.

    I hope this helps,

    🙂

    Funny thing about people and their computers...

    The computer is completing millions of processes every second, but the user still thinks it is too slow.


    "I will not be taken alive!" - S. Hussein

  • I had a problem with Unix text files similar to this, where the input was expecting a standard CR/LF at the end of each line -- in hex 13 and 10 at the end of each line. You can tell if this is the problem by looking in the file with a hex editor (I use AXE -- Advanced heX Editor which is freeware).

    Unix only puts out 0x10 as the new line character. I could have written a program to fix up the data, but that would have been a two-step process.

    I wrote a loop that reads in the file byte by byte, accumulates it into a long record. This is much slower, but I wasn't worried about speed -- I wanted it to be seamless. The code looked something like this in VB:

    lcounter = 0

    While Not EOF(infile)

    szChar = Input(1, #infile)

    If Asc(szChar) = 13 Then

    ' If just carriage return don't

    ' do anything (in case it comes from

    ' a PC based server)

    ElseIf Asc(szChar) = 10 Then

    ' If line feed, that is the end

    ' of the record

    ' Do import process here

    ' reset the variable

    temp$ = ""

    Else

    temp$ = temp$ + szChar

    End If

    lcounter = lcounter + 1

    Wend

    '

    ' Close files

    '

    Close infile

    Joe Johnson

    NETDIO,LLC.

    Edited by - johnsonj on 11/17/2003 07:45:22 AM


    Joe Johnson
    NETDIO,LLC.

  • Hehe.. Joe... We had that problem too!!

    At first I planned to write another VBS script deal with that problem as well. But then I tripped over a very lucky feature in the text editor I use to troubleshoot text issues with. I use a rather cheap little text editor called TextPad (http://www.textpad.com) because it can open a file in binary format so I can view the hex values of the file. I had a file open that had the same problem and decided to save a copy of it to work with. After I saved it I opened it up to check something else and low and behold, TextPad had replaced the Char(10) with the correct Char(13) Char(10) EOL combo!!

    So.. now when we get a file from that system, we just open it up in Textpad and save as to another text file and press on with normal business. It may sound silly, but it works and I didn't have to waste time writing a program to do it.

    I love when I make discoveries like that. Sorta like finding a solid hunk of carmel coating in a Cracker-Jack box. If you like the carmel coating of course.

    Funny thing about people and their computers...

    The computer is completing millions of processes every second, but the user still thinks it is too slow.


    "I will not be taken alive!" - S. Hussein

  • Jim,

    Define the input text file as delimited using a character that is not in the input. This will give you each line as a single column.

    Create an ActiveX transformation from the single input column to each output column and slice the input.

    Edited by - davidburrows on 11/18/2003 06:56:36 AM

    Far away is close at hand in the images of elsewhere.
    Anon.

  • quote:


    So.. now when we get a file from that system, we just open it up in Textpad and save as to another text file and press on with normal business.

    I love when I make discoveries like that. Sorta like finding a solid hunk of carmel coating in a Cracker-Jack box. If you like the carmel coating of course.


    I know what you mean. When I'm in a hurry, I use Wordpad to do the same thing. Not very elegant and definitely hands on, but in many instances the best option

    As for the Cracker Jacks, keep the candy, just give me the toy... But I do like the analogy.

    Joe Johnson

    NETDIO,LLC.


    Joe Johnson
    NETDIO,LLC.

  • For converting Unix text files to windows:

    In Word (at least 2000) use - File, New, Other Documents, Batch conversion wizard and select 'convert from word to other format'. Select 'Text' as format. Set the folder with the files without 'proper' linefeed as source, select the files you want to convert.

    Found this when i was in a real hurry to scan some log files from a Unix system.

  • Well, thanks for a lot of interesting reading. But the real problem is most likely much more simple. I was actually looking for a solution to a similar problem. It seems that when you do a BINARY transfer from Unix, you get the Unix style line feed. However, if you use an ASCII transfer, the FTP actually fixes the line feeds. When I use the windows command shell FTP, I get nice line feeds in ASCII mode and crap when I use binary.

    The real answer to your question is to set the field length to zero for the last field and have the field terminator be the carriage return/line feed. Here is what microsoft has to say:

    "If specifying a prefix length of 0 and a terminator, the field length specified is ignored. The data file storage space used is the length of the data, plus any terminators."

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/adminsql/ad_impt_bcp_1vqg.asp

    Now if someone can tell me how to do an ASCII mode FTP using DTS without shelling out to the command prompt, I would appreciate it.

    -Doug Lampe


    -Doug Lampe

  • Well, I found a solution that worked for me. I copied the little square that notepad put where the cr/lf should have been and pasted it into my format file as the delimiter for the last field. Low and behold it worked.

    Weird.

    -Doug Lampe


    -Doug Lampe

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply