How do I find the record where "column delimiter not found"

  • I have a | delimited flat file with 100 variables. I'm getting the "column delimiter not found" message when I run the DTS package. The file has over 5 million records in it, and I don't have enough virtual memory on the server to open it in TextPad. Can anyone provide suggestions on how to track down this problem? I tried this from the command line: findstr /V ".*|.*|.*,.*|.*" c:\YourCSVfile.txt (extended to 99 delimiters), but I got a message that my search string was too long.

    Also, is there a way I can tell if I have Service Packs 1-3 loaded? I think this could help solve the problem. I'm new to this job, so please forgive me for being 'green'!

  • 1.  The problem COULD be that, with the 5 million records, you are running into a buffer problem.

    Have you tried importing the file directly into a "staging" table that has one column and then parsing it from there? (A quick sketch follows point 2 below.)

    2.  Run SELECT @@VERSION. You should get an 8.00.xx build number; you can look up online which service packs you have from that number.
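
    To make both points concrete, here is a minimal sketch, assuming SQL Server 2000 or later; the staging table and column names (StagingRaw, line) are hypothetical:

    -- Point 1: a one-column staging table.  Load every raw line into it
    -- (BULK INSERT or DTS), then parse fields out with CHARINDEX/SUBSTRING
    -- once the malformed rows have been found.
    CREATE TABLE StagingRaw ( line varchar(8000) )

    -- Example: pull the first two pipe-delimited fields out of each staged line.
    SELECT LEFT(line, CHARINDEX('|', line) - 1) AS field1,
           SUBSTRING(line,
                     CHARINDEX('|', line) + 1,
                     CHARINDEX('|', line, CHARINDEX('|', line) + 1)
                       - CHARINDEX('|', line) - 1) AS field2
      FROM StagingRaw
     WHERE CHARINDEX('|', line, CHARINDEX('|', line) + 1) > 0  -- needs at least two delimiters

    -- Point 2: check the build and service pack level.  On SQL Server 2000,
    -- SERVERPROPERTY('ProductLevel') reports RTM, SP1, SP2, and so on directly.
    SELECT @@VERSION
    SELECT SERVERPROPERTY('ProductVersion') AS build,
           SERVERPROPERTY('ProductLevel')   AS service_pack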

     



    Good Hunting!

    AJ Ahrens


    webmaster@kritter.net

  • What I would try is setting the "max error count", on the Options tab of the properties of the task doing the import, to a certain number, e.g. 10.

    Then, in the properties of the DTS package, on the Logging tab, enter a name for an error file.

    When a row fails, that row will be written to the error file.

    To check whether any rows do come through, set the max error count to a number higher than the number of rows in the file. I don't know the upper limit for this parameter, though.

     

    good luck,

     

    fred

  • Following up on what AJ wrote, you could do the following:

    1. Create the following UDF ( dbo.fOccurs() )

    2. Run the script that follows the UDF, adapted for your situation.  The script will load the data file, and then list any lines that do not contain the correct number of delimiters. Of course, this method won't give you the line number that is causing the problem.

     

    Also, asking SQL Server to load five million rows will probably result in blocks of data being inserted in parallel. The physical order of the table data may not match the order of the data in the source file.

    ---

    Another option is to download a trial version of Multi-Edit at

        http://www.multieditsoftware.com/demo.php

    I've used Multi-Edit for years. It can load files of any size and, among all of its other uses, is great for examining files. If you go this route, you can use it to break the .csv file into several smaller files, which may make identifying the problem record easier.

    ------------------------------------------------------------------
    -- UDF
    ------------------------------------------------------------------
    CREATE FUNCTION dbo.fOccurs
    (
      @searchIn  varchar(8000),  -- The string to be searched.
      @searchFor varchar(8000)   -- The string to be counted.
    )
    RETURNS int
    AS
    BEGIN
      DECLARE @len_diff int
      -- Returns number of times a string occurs in another string.
      -- The first length calculation is the original size minus the length
      -- after all occurrences of the search string are removed.
      -- We divide that result by the length of the search string to get
      -- the number of occurrences.
      IF Len(@searchFor) = 0
        SET @len_diff = 0
      ELSE
      BEGIN
        SET @len_diff = Len(@searchIn) - Len(Replace(@searchIn, @searchFor, ''))
        SET @len_diff = @len_diff / Len(@searchFor)
      END
      RETURN @len_diff
    END
    GO

    ------------------------------------------------------------------
    -- Script
    ------------------------------------------------------------------
    DROP TABLE ImportData
    GO
    CREATE TABLE ImportData ( line varchar(8000) )
    -- Load each whole line of the file into the single column; the field
    -- terminator is the line break rather than '|', so the embedded pipes
    -- are kept in the loaded value and can be counted below.
    BULK INSERT ImportData
         FROM 'c:\YourCSVfile.txt'
         WITH ( CODEPAGE = 'RAW'
              , DATAFILETYPE = 'char'
              , FIELDTERMINATOR = '\n'
              , ROWTERMINATOR = '\n'
              , FIRSTROW = 1
              )
    SELECT line
      FROM ImportData
     WHERE dbo.fOccurs(line, '|') <> 99 -- or however many delimiters you expect
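
    As a quick sanity check of the UDF (illustration only), a short test string with two pipes should return 2, and each well-formed line from the 100-variable file above should return 99:

    SELECT dbo.fOccurs('a|b|c', '|') AS pipe_count   -- expect 2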

  • I have had need to check for common errors (for example, data where the user has included an EOL character in the middle of a field) in large data files. I have found that some basic preprocessing through a text editor can be very helpful in these situations.

    My text editor of choice is TextPad, rather than MultiEdit. However, any text editor that includes find and replace based on regular expressions should be able to do the trick, or something similar.

    What I did in my case was build a regular expression to mark lines with exactly the right number of fields (actually, exactly the right number of delimiters). My editor includes an option to "bookmark" lines, and to flip all bookmarks (so that what were the bookmarked lines are now not bookmarked, and what were the non-bookmarked lines are bookmarked). I toggle the bookmarks, and look at all lines that are now bookmarked (which should be all lines that had too many or too few delimiters).
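
    For example (an illustration only; the exact syntax depends on your editor's regular-expression dialect), a pattern along the lines of ^([^|]*\|){99}[^|]*$ matches lines containing exactly 99 pipe delimiters, so the lines it does not match are the suspects.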

    This is an excellent bet in situations where you are expected to get as much of the data added to the system successfully as you can, or where errors that cannot be trivially resolved by you (once you find them) are rare. In a situation where your organization is content with rejecting bad data records and shipping them back to the sender, creating an exception file in DTS might be the best way to go.


    R David Francis

  • These are all very helpful. I sincerely appreciate you all sharing your experience with me. I'm currently loading all the records I can get, and then I'll go back and investigate using the suggestions provided. I'd love to browse this file with TextPad, but it won't open - I suppose I'm blowing out the virtual memory.

    Thanks again,

    Dave

  • Oops - sorry, I missed the TextPad note in your original post.

    Unix systems have the 'head' and 'tail' commands that would let you grab chunks of the file to investigate. If you're comfortable with an appropriate tool (I'd do it in Perl, even on Windows; odds are someone has created a tool to do this anyway), you can break the file into chunks yourself.

    Heck, you could always slurp the whole thing in w/ a DTS package, and write it back out a chunk at a time.

    Just a few thoughts for future reference (or for lurkers interested in the subject), as you seem to be cranking along at this point....


    R David Francis

  • I had an 11 million row CSV file that had a small number of rows where the delimiters were wrongly set for some reason. I got the error:

    [Tx File [1]] Error: The column delimiter for column [X] was not found.

    I couldn't get SSIS to send these rows elsewhere during the load because (I'm guessing) the missing delimiter meant that it couldn't skip on to the next row.

    In the end I imported it into an old instance of SQL 2000, which is a lot more forgiving, and then transferred it and began investigating and cleaning from there.

    regards, Matt

  • I had this exact error, and the cause was an errant double quote, even though the text qualifier was a double quote. Hope this helps anyone else looking for potential causes.
