• I have had need to check for common errors (for example, data where the user has included a EOL character in the middle of a field) in large data files. I have found that some basic preprocessing through a text editor can be very helpful in these situations.

    My text editor of choice is TextPad, rather than MultiEdit. However, any text editor that includes find and replace based on regular expressions should be able to do the trick, or something similar.

    What I did in my case was build a regular expression to mark lines with exactly the right number of fields (actually, exactly the right number of delimiters). My editor includes an option to "bookmark" lines, and to flip all bookmarks (so that what were the bookmarked lines are now not bookmarked, and what were the non-bookmarked lines are bookmarked). I toggle the bookmarks, and look at all lines that are now bookmarked (which should be all lines that had too many or too few delimiters).

    This is an excellent bet in situations where you are expected to get as much of the data added to the system successfully as you can, or where errors that cannot be trivially resolved by you (once you find them) are rare. In a situation where your organization is content with rejecting bad data records and shipping them back to the sender, creating an exception file in DTS might be the best way to go.


    R David Francis