Concatenate variable length rows

  • I have a series of text files which I import into a table with a single column spanning the whole line of the file, and I use a variation of Jeff Moden's article http://www.sqlservercentral.com/articles/T-SQL/63003/ to split the single column into individual columns based on a delimiter. This works fine (thanks Jeff ;-)).

    The problem is that one of the files I have to import has embedded carriage return characters, which results in each import line being split over several lines. For example, if I should have 10 columns it might look like this:

    1,2,3,4,5,6,7,8

    9,10

    where the last columns are on the second line.

    I need to concatenate the second row to the first row to create a row with 10 values in it so that I can split it into individual columns.

    The problem is more complicated because the number of embedded carriage returns varies, so the number of rows over which each actual row is split varies, and the number of columns on each row can vary because the embedded carriage returns can appear in different columns.

    The solution I have at the moment is to take each row and, if the number of columns (based on the number of separators) is less than expected, concatenate the next row, test, and continue concatenating rows until I get the correct number of columns (separators); there is a sketch of this loop below.

    I know that this will work but is there another, set based solution?

    Jez
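
    A minimal T-SQL sketch of that count-and-concatenate loop, assuming the raw lines are staged in arrival order; the table and column names (dbo.RawImport, dbo.CompleteRows, LineID, LineText) are hypothetical and the delimiter is assumed to be a comma:

    -- Rebuild complete records by buffering physical lines until the
    -- buffer holds the expected number of separators (9 for 10 columns).
    DECLARE @Expected int = 9;
    DECLARE @Buffer nvarchar(max) = N'';
    DECLARE @Line nvarchar(max);

    DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
        SELECT LineText FROM dbo.RawImport ORDER BY LineID;

    OPEN cur;
    FETCH NEXT FROM cur INTO @Line;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Glue the next physical line straight on (re-insert a space
        -- here if the stripped line break needs to be preserved).
        SET @Buffer = @Buffer + @Line;

        -- Count separators: length before vs. after removing the commas.
        IF LEN(@Buffer) - LEN(REPLACE(@Buffer, N',', N'')) >= @Expected
        BEGIN
            INSERT INTO dbo.CompleteRows (RowText) VALUES (@Buffer);
            SET @Buffer = N'';
        END;

        FETCH NEXT FROM cur INTO @Line;
    END;
    CLOSE cur;
    DEALLOCATE cur;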

  • How is the data stored currently? Is your problem at the importing stage, i.e. is it split like that in the text file? Or is it in the database already, in a table with 10 columns, where some rows have null values in the trailing columns because their full values are spread out over multiple rows?

  • That's how it is in the import file. I am importing each line into a single nvarchar(max) column and running some T-SQL code to split it into different columns.

  • Couldn't you instead pre-process the file? For example, if you know that every line has to have a certain number of items on it, you could read the entire file content in, do a find-and-replace on the carriage returns, and then go through the entire content as a single stream: take the first 10 items, process them, move on to the next 10, and so on until you reach the end of the file. Something like the sketch below.
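
    A rough T-SQL version of that pre-processing idea (the file path is hypothetical, the SINGLE_CLOB bulk read needs SQL Server 2005 or later, and the approach only works if every line break falls on an item boundary):

    -- Read the whole file as one value, turn the line breaks into
    -- delimiters, and the stream becomes one long list of items in
    -- which every 10 consecutive items form one record.
    DECLARE @Blob nvarchar(max);

    SELECT @Blob = BulkColumn
    FROM OPENROWSET(BULK 'C:\import\data.txt', SINGLE_CLOB) AS f;

    SET @Blob = REPLACE(@Blob, CHAR(13) + CHAR(10), N',');

    -- From here, split @Blob on commas with a tally-based splitter such
    -- as the one in Jeff Moden's article, then regroup the items with
    -- GROUP BY (ItemNumber - 1) / 10 so that items 1-10 become row 1,
    -- items 11-20 become row 2, and so on.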

  • Jez (7/16/2009)



    Jez... just trying to be clear here... do you mean to say that a file will randomly have a different number of items on rows? That it really is unpredictable except that each "record" will have the same number of items?

    Also, how many rows do you have in the file? I ask because I need to set up a test and I might as well check it for performance while I'm at it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    A file will always have the same number of columns. The import file (a text file) is character-delimited and there will always be the same number of delimiter characters in a 'row'. The difficulty is that some of the fields are multi-line in the input system (e.g. address) and the users add carriage returns in the field (for formatting reasons).

    In this particular file, I have 53 columns and 81k rows in the import file, netting down to 18k rows when I loop round and concatenate subsequent import rows to generate single rows of 53 columns.

    I have written a script task in SSIS that reads the file, appends the next rows until I get to the right number of columns, and then inserts the result into a table. The performance of this is acceptable - it runs in a couple of minutes. As this is a batch job that runs once a week, I am happy with the performance, so there's no need to bust a gut on this.

    Jez
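
    For completeness, one possible set-based shape for this, sketched under assumptions: the raw lines sit in a hypothetical dbo.RawImport (LineID IDENTITY, LineText nvarchar(max)), a complete record has 53 columns and therefore 52 separators, and STRING_AGG needs SQL Server 2017 or later (well after this thread):

    -- A running total of separators per physical line tells you which
    -- logical record each line belongs to: the separators on all earlier
    -- lines, integer-divided by 52, give a 0-based record number.
    WITH Counted AS
    (
        SELECT LineID,
               LineText,
               Seps = LEN(LineText) - LEN(REPLACE(LineText, N',', N''))
        FROM dbo.RawImport
    ),
    Grouped AS
    (
        SELECT LineID,
               LineText,
               RecordNo = (SUM(Seps) OVER (ORDER BY LineID
                                           ROWS UNBOUNDED PRECEDING) - Seps) / 52
        FROM Counted
    )
    SELECT RecordNo,
           -- Concatenating with an empty separator drops the embedded
           -- line breaks; use N' ' instead to keep a visible gap.
           RowText = STRING_AGG(LineText, N'') WITHIN GROUP (ORDER BY LineID)
    FROM Grouped
    GROUP BY RecordNo;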

