Importing data from txt file (fixed length fields)

  • Hi folks, hope someone can help?

    I am trying to import data from a text file. The data has no column delimiters; it is being exported by someone from MS Access using fixed length import specs (I have no control at this point to have that changed).

    I need to import this into a SQL 7 SP2 database. I have tried DTS, but the problem I am facing is this:

    The final column needs to allow for 80 characters. Of course the full 80 characters are not always used, and DTS is taking characters from the next row to fill up the white space. This is causing problems.

    Has anyone come across this problem before? So far I have tried DTS using fixed length parameters and BULK INSERT using a format file, with no luck.

    Thanks in advance!

    Tom.

  • I've only ever imported CSV files, so I haven't had this problem, and I assume that if you could specify a row delimiter along with a format file you would have done so (correct?). So the only thing I'd suggest is to process the file: read a line, check the length, add spaces to it if need be, then write it to a new file which you then import. You should be able to do it by running an ActiveX script from DTS.

    A real hassle... there's probably a better answer out there which I now also await.

    Cheers,

    Mike

  • Mike, I did try using a format file, both with DTS and BULK INSERT, but it did not work. I have also thought about reading the file in line by line, but because the file can be 100000+ rows I was hoping for a faster method.

    Thanks anyway. Tom

  • This is a good one! I've never come across un-delimited files personally!

    I would have gone with Mike's suggestion, except I would not write back to a file. I'd create a script to go through the file line by line: read one line in, break the line into variables (which will go into your columns), add the padding if required, and run an INSERT statement to write straight into the database.

    But as you mentioned...100000+ lines could be quite slow!

    Clive Strong

    clivestrong@btinternet.com

  • Hi Tom,

    I just did some tests. With BULK INSERT and a format file the columns are correctly filled; you only have to trim the trailing spaces afterwards.

    CREATE TABLE test (col1 varchar(6), col2 varchar(7))

    You can create the format file with bcp out.

    BULK INSERT test FROM 'd:\kj\bulktest.txt'
    WITH (
    FORMATFILE = 'd:\kj\Bcp.fmt'
    )

    with the following format file

    8.0
    2
    1 SQLCHAR 0 6 "" 1 col1 SQL_Latin1_General_CP1_CI_AS
    2 SQLCHAR 0 7 "" 2 col2 SQL_Latin1_General_CP1_CI_AS

    and the following test file

     een   twee   
    
    aaa bbbb
    cc
    ii

    I tested this on SQL 2000; I don't know if SQL 7 does the same.
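
    To strip that padding after the load, a simple post-load trim should do. Here is a minimal sketch against the test table above (untested on SQL 7):

    -- Trim the trailing blanks the fixed-width load leaves behind
    UPDATE test
    SET col1 = RTRIM(col1),
        col2 = RTRIM(col2)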

    best regards,

    Klaas-jan Ruyter

  • I ran into this problem on my last assignment. IMHO, this is one of the most dangerous issues with DTS, because DTS happily loads the table with what it can and doesn't report any kind of error. The bottom line is, if you're telling DTS you're giving it a fixed length file x characters wide, EVERY row had better be at least x characters. (As an aside, I found that if the rows were longer than specified, DTS ignored the extra data, but if they were shorter, I got the behavior you described.)

    Anyway, Mike and Clive are right: however you want to do it, you have to walk that text file and make sure the rows are the correct length. It shouldn't be a big deal, though; I just experimented with a 110000 row text file and it only took 12 seconds to "fix" it.

  • We import fixed-field files every day (from an AS/400 running, I presume, COBOL) into our 7.0 database. The trick is to tell the format file that the LAST delimiter on every record is a newline. Here's one of our format files (with some white space compressed out):

    7.0
    5
    1 SQLCHAR 0 8 "" 1 MemNum
    2 SQLCHAR 0 8 "" 2 PremSentDate
    3 SQLCHAR 0 4 "" 3 PremCode
    4 SQLCHAR 0 30 "" 4 PremDescription
    5 SQLCHAR 0 30 "\n" 5 PremNote

    The "\n" is the newline. Took us a little while to get this figured out. This also should take care of the case where lines might be short (although I don't think we have that situation).

    Now for MY question to the group: as someone above pointed out, this does NOT trim blanks when inserting into the table, for any field EXCEPT the final one. (I just discovered this aspect.) We import using a big DTS job.

    So does anyone know if there's a way to import AND TRIM at the same time (i.e., both in one pass) with DTS? How about in SQL Server 2000 (we're converting to it very soon)?

  • Hopefully it's relevant.

    BOL 2000

    SET ANSI_PADDING

    Controls the way the column stores values shorter than the defined size of the column, and the way the column stores values that have trailing blanks in char, varchar, binary, and varbinary data.

    If it is OFF, it states:

    Trailing blanks in character values inserted into a varchar column are trimmed. Trailing zeros in binary values inserted into a varbinary column are trimmed.
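
    A quick sketch of what that means in practice (the table name here is made up; note the padding behavior is captured per column at CREATE TABLE time, and whether a DTS or bulk load honors it is worth testing):

    SET ANSI_PADDING OFF
    CREATE TABLE pad_test (col1 varchar(10))
    INSERT pad_test VALUES ('abc    ')
    -- With ANSI_PADDING OFF when the column was created, trailing blanks are trimmed
    SELECT col1, DATALENGTH(col1) FROM pad_test -- DATALENGTH returns 3
    SET ANSI_PADDING ON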

  • FORMAT FILE FLOPS FOR FIXED-LENGTH FILE...

    I'm having a great deal of difficulty trying to BULK INSERT from a fixed-length text file into a SQL Server 2000 (SP3) table. From scanning the newsgroups, this seems to be an issue that plagues many of us, and the answers on the groups don't seem to be very useful (as in, "it's all in BOL" 🙂). There are lots of good examples with delimited files, but fixed-length files aren't covered very well in the documentation or elsewhere that I've found.

    I'll bet here, though, on SQLServerCentral.com, someone (or many) will be able to straighten me right out!

    Here is a sample of the file I'm using as practice:

    1234567890123456789012345678 <== not part of file, just a "ruler"!
    11517*008573764 10F20010808
    11517*008575476 10F20011105
    11517*009069858 10F20020506
    11517*034565873 10F20020529
    11517*395848482 10F20020529

    This is the structure of the table I'm trying to BULK INSERT into:

    CREATE TABLE [dbo].[SampleBCP] (
    [ESSN] [varchar] (9) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [GroupCode] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [Gender] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [IntColumn] [int] NULL ,
    [SmallIntColumn] [smallint] NULL ,
    [DependentNo] [smallint] NULL ,
    [StartDate] [datetime] NULL ,
    [PaidAmount] [money] NULL
    )

    Here is my format file:

    8.0
    7
    1 SQLCHAR 0 5 "" 2 GroupCode ""
    2 SQLCHAR 0 0 "" 0 Star ""
    3 SQLCHAR 0 9 "" 1 ESSN ""
    4 SQLCHAR 0 0 "" 0 Bogus ""
    5 SQLCHAR 0 2 "" 6 DependentNo ""
    6 SQLCHAR 0 1 "" 3 Gender ""
    7 SQLCHAR 0 8 "\r\n" 7 StartDate ""

    This is the T-SQL I use to perform the BULK INSERT:

    BULK INSERT WH_PROD.dbo.SampleBCP
    FROM 'c:\Projects\ETG\SmallSampleBCP.txt'
    WITH (FORMATFILE = 'c:\Projects\ETG\SmallSampleBCP.fmt')

    And finally, here is the error I'm getting:

    Server: Msg 4864, Level 16, State 1, Line 1
    Bulk insert data conversion error (type mismatch) for row 1, column 7 (StartDate).

    I've tried an awful lot of tinkering, but so far, all I've managed to do is generate new and wonderful errors.

    TIA for shedding light on this topic! And thanks to all of you who created, help run, and contribute to this great Forum!

    Best regards,

    SteveR

    Stephen Rosenbach

    Arnold, MD

    srosenbach@yahoo.com

  • Stephen,

    The problem causing your date errors is that you specified zero-length input for the two filler fields. Change your format file to the following and try again.

    8.0
    7
    1 SQLCHAR 0 5 "" 2 GroupCode ""
    2 SQLCHAR 0 1 "" 0 Star ""
    3 SQLCHAR 0 9 "" 1 ESSN ""
    4 SQLCHAR 0 1 "" 0 Bogus ""
    5 SQLCHAR 0 2 "" 6 DependentNo ""
    6 SQLCHAR 0 1 "" 3 Gender ""
    7 SQLCHAR 0 8 "\r\n" 7 StartDate ""

    Far away is close at hand in the images of elsewhere.
    Anon.

  • I know bcp is very fussy about date formats. Try changing your date format in column 7 to 2001-08-08. I can't remember the exact format bcp requires; it may be dd/mm/yyyy or mm-dd-yyyy.

    Andy

  • Hi srosenbach,

    Here is the format file.

    8.0
    9
    1 SQLCHAR 0 5 "" 2 GroupCode ""
    2 SQLCHAR 0 1 "" 0 Star ""
    3 SQLCHAR 0 9 "" 1 ESSN ""
    4 SQLCHAR 0 1 "" 0 Bogus ""
    5 SQLCHAR 0 2 "" 6 DependentNo ""
    6 SQLCHAR 0 1 "" 3 Gender ""
    7 SQLCHAR 0 8 "\r\n" 7 StartDate ""
    8 SQLCHAR 0 0 "" 0 IntColumn ""
    9 SQLCHAR 0 0 "" 0 SmallIntColumn ""

    Regards,

    Paul Joe

    paul.joe@eursystems.com

  • While SQL Server will properly convert a char(8) date like '20020310' in a SET or INSERT, BCP won't. You must use a delimited date format for BCP and BULK INSERT. To prevent ambiguity, Microsoft recommends the yyyy-mm-dd layout.
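
    If reformatting the source file isn't an option, one workaround is to bulk load the raw yyyymmdd text into a character staging column and convert it afterwards. A rough sketch (the staging table is made up for illustration); style 112 is the unseparated ISO yyyymmdd format:

    CREATE TABLE #stage (StartDateRaw varchar(8))
    -- BULK INSERT into #stage here, mapping the 8-character date field
    -- to StartDateRaw instead of a datetime column
    SELECT CONVERT(datetime, StartDateRaw, 112) AS StartDate
    FROM #stage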

    quote:

    FORMAT FILE FLOPS FOR FIXED-LENGTH FILE...

    Server: Msg 4864, Level 16, State 1, Line 1
    Bulk insert data conversion error (type mismatch) for row 1, column 7 (StartDate).


  • ctcampbell, I agree that it is best to format dates as yyyy-mm-dd to avoid confusion. However, your statement about bcp not properly converting a date like '20020310' may be true for SQL 2000, but not for 7: I am on SQL 7 SP4 and the data loads perfectly.

    Far away is close at hand in the images of elsewhere.
    Anon.

  • I had some issues similar to those mentioned here, and some additional ones. This is the solution I came up with; I hope it may help someone out there.

    My specific issues were:

    - the text file was fixed column (no delimiters)

    - the text file contained different row formats depending on the 1st character (the type)

    - the text file contained dates in the format mmddyy or mm-dd-yy

    This was my solution:

    CREATE TABLE #TYPE_RECORD (
    Type char(1) NOT NULL,
    Data varchar(150) NULL
    )

    I bulk inserted into the temporary table with this format file:

    7.0
    2
    1 SQLCHAR 0 1 "" 1 Type
    2 SQLCHAR 0 117 "\r\n" 2 Data

    Then I performed a series of insert statements for each record type.

    INSERT HEADER_RECORD
    SELECT '1',
    SUBSTRING(data,1,3), --team number
    SUBSTRING(data,4,8), --group/suffix
    CAST(SUBSTRING(data,12,8) AS DATETIME), --paid to date
    CAST(SUBSTRING(data,20,2)+'-'+SUBSTRING(data,22,2)+'-'+SUBSTRING(data,24,2) AS DATETIME), --bill date
    CAST(SUBSTRING(data,26,2)+'-'+SUBSTRING(data,28,2)+'-'+SUBSTRING(data,30,2) AS DATETIME), --bill to date
    SUBSTRING(data,32,28), --group name
    SUBSTRING(data,60,28), --contact person
    SUBSTRING(data,88,28), --address
    SUBSTRING(data,116,14), --city
    SUBSTRING(data,130,2), --state
    SUBSTRING(data,132,10) --zip
    FROM #TYPE_RECORD
    WHERE type = '1'

    I don't know if this was the best way to do it, and I welcome any suggestions.
