SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


How do I find the record where "column delimiter not found"


How do I find the record where "column delimiter not found"

Author
Message
Dave Otte
Dave Otte
Grasshopper
Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)

Group: General Forum Members
Points: 13 Visits: 31
I have a | delimited flat file with 100 variables. I'm getting the "column delimiter not found" message when I run the dts package. The file has over 5M records in it, and I don't have enough virtual memory on the server to open it in TextPad. Can anyone provide suggestions on how to track down this problem? I tried to use this from the command line: findstr /V ".*|.*|.*,.*|.*" c:\YourCSVfile.txt I tried this with 99 delimiters, but I got a message that my search string was too long.

Also, is there a way I can tell if I have Service Packs 1-3 loaded? I this could help solve the problem. I'm new to this job, so please forgive me for being 'green'!



AJ Ahrens
AJ Ahrens
SSCrazy
SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)

Group: General Forum Members
Points: 2212 Visits: 9

1. The problem COULD be the 5mm records and you are running into a buffer problem.

Have you tried importing the file directly into a "staging" table that has 1 column and then parse from there?

2. Run SELECT @@VERSION You should get a 8.00.xx number. You can look up online what service packs etc you have with that #





Good Hunting!

AJ Ahrens


webmaster@kritter.net
fred koene
fred koene
SSC-Addicted
SSC-Addicted (467 reputation)SSC-Addicted (467 reputation)SSC-Addicted (467 reputation)SSC-Addicted (467 reputation)SSC-Addicted (467 reputation)SSC-Addicted (467 reputation)SSC-Addicted (467 reputation)SSC-Addicted (467 reputation)

Group: General Forum Members
Points: 467 Visits: 104

What I would try is put the "max error count" in the Options tab of the properties task doing the import to a certain number e.g. 10.

Then in the properties of the DTS package on the Logging tab enter a name for an error file.
When a row fails that row will be put in the error file.
To check whether any rows do come through you should put the max error count to a number higher than the number of rows in the table. I don't know the upper limit for this parameter though.

good luck,

fred


vadba
vadba
Ten Centuries
Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)

Group: General Forum Members
Points: 1299 Visits: 453

Following up on what AJ wrote, you could do the this:

1. Create the following UDF ( dbo.fOccurs() )
2. Run the script that follows the UDF, adapted for your situation. The script will load the data file, and then list any lines that do not contain the correct number of delimiters. Of course, this method won't give you the line number that is causing the problem.

Also, asking SQL Server to load five million rows will probably result in blocks of data being inserted in parallel. The physical order of the table data may not match the order of the data in the source file.

---

Another option is to download a trial version of Multi-Edit at

http://www.multieditsoftware.com/demo.php

I've used Multi-Edit for years. It can load files of any size, and among all of its other uses, is great for examining files. If you go this route, you can use it to break of the .csv file into several smaller files that may make identifying

------------------------------------------------------------------
-- UDF
------------------------------------------------------------------

CREATE FUNCTION dbo.fOccurs
(
@searchIn varchar(8000), -- The string to be searched.
@searchFor varchar(8000) -- The string to be counted.
)
RETURNS int
BEGIN
DECLARE @len_diff int

-- Returns number of times a string occurs in another string.

-- The first length calculation is the original size minus the length
-- after all occurrences of the search string are removed.
-- We divide that result by the length of the search string to get
-- the number of occurrences.

IF Len(@searchFor) = 0
SET @len_diff = 0
ELSE
BEGIN
SET @len_diff = Len(@searchIn) - Len(Replace(@searchIn, @searchFor, ''))
SET @len_diff = @len_diff / Len(@searchFor)
END

RETURN @len_diff
END

GO

------------------------------------------------------------------
-- Script
------------------------------------------------------------------

DROP TABLE ImportData
GO

CREATE TABLE ImportData ( line varchar(8000) )

BULK INSERT ImportData
FROM 'c:\YourCSVfile.txt'
WITH ( CODEPAGE = 'RAW'
, DATAFILETYPE = 'char'
, FIELDTERMINATOR = '|'
, ROWTERMINATOR = '\n'
, FIRSTROW = 1
)

SELECT line
FROM ImportData
WHERE dbo.fOccurs(line, '|') <> 99 -- or however many you expect





RD Francis
RD Francis
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1053 Visits: 443
I have had need to check for common errors (for example, data where the user has included a EOL character in the middle of a field) in large data files. I have found that some basic preprocessing through a text editor can be very helpful in these situations.

My text editor of choice is TextPad, rather than MultiEdit. However, any text editor that includes find and replace based on regular expressions should be able to do the trick, or something similar.

What I did in my case was build a regular expression to mark lines with exactly the right number of fields (actually, exactly the right number of delimiters). My editor includes an option to "bookmark" lines, and to flip all bookmarks (so that what were the bookmarked lines are now not bookmarked, and what were the non-bookmarked lines are bookmarked). I toggle the bookmarks, and look at all lines that are now bookmarked (which should be all lines that had too many or too few delimiters).

This is an excellent bet in situations where you are expected to get as much of the data added to the system successfully as you can, or where errors that cannot be trivially resolved by you (once you find them) are rare. In a situation where your organization is content with rejecting bad data records and shipping them back to the sender, creating an exception file in DTS might be the best way to go.


R David Francis
Dave Otte
Dave Otte
Grasshopper
Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)Grasshopper (13 reputation)

Group: General Forum Members
Points: 13 Visits: 31
These are all very helpful. I sincerely appreciate you all sharing your experience with me. I'm currently loading all the records I can get, and then I'll go back and investigate using the suggesions provided. I'd live to browse this file with TextPad, but it won't open - I suppose I'm blowing out the virtual memory.

Thanks again,
Dave



RD Francis
RD Francis
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1053 Visits: 443
Oops - sorry, I missed the TextPad note in your original post.

Unix systems have the 'head' and 'tail' commands that would let you grab chunks of the file to investigate. If you're comfortable with an appropriate tool (I'd do it in Perl, even on Windows; odds are someone has created a tool to do this anyway), you can break the file into chunks yourself.

Heck, you could always slurp the whole thing in w/ a DTS package, and write it back out a chunk at a time.

Just a few thoughts for future references (or for lurkers interested in the subject), as you seem to be cranking along at this point....


R David Francis
Matt Greenslade
Matt Greenslade
Forum Newbie
Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)

Group: General Forum Members
Points: 7 Visits: 33
I had an 11 million row CSV file that had a small number of rows where the delimiters were wrongly set for some reason. I got the error:
[Tx File [1]] Error: The column delimiter for column [X] was not found.

I couldn't get SSIS to send these rows elsewehere during the load because (I'm guessing) the lack of delimiter meant that it couldn't skip onto the next row.

In the end I imported it to an old instance of SQL 2000 which is a lot more forgiving and then transferred it and began investigating and cleaning from there.

regards, Matt
Matt De Grand
Matt De Grand
Valued Member
Valued Member (71 reputation)Valued Member (71 reputation)Valued Member (71 reputation)Valued Member (71 reputation)Valued Member (71 reputation)Valued Member (71 reputation)Valued Member (71 reputation)Valued Member (71 reputation)

Group: General Forum Members
Points: 71 Visits: 53
I had this exact error and the cause was an errant double quote, eventhough the text qualifier was a double quote. Hope this helps anyone else looking for potential causes.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search