SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Concatenate variable length rows


Concatenate variable length rows

Author
Message
Jez-448386
Jez-448386
SSC Eights!
SSC Eights! (996 reputation)SSC Eights! (996 reputation)SSC Eights! (996 reputation)SSC Eights! (996 reputation)SSC Eights! (996 reputation)SSC Eights! (996 reputation)SSC Eights! (996 reputation)SSC Eights! (996 reputation)

Group: General Forum Members
Points: 996 Visits: 1134
I have a series of text file which I have import into a table with a single column spanning the whole line in the file and using a variation of Jeff Moden's article http://www.sqlservercentral.com/articles/T-SQL/63003/ to split the sinlge column into individual columns based on a delimiter. This works fine (thanks Jeff ;-)).

The problem with one of the files that I have to import has embebed carriage return characters which results in the import line being split over several lines. For example, if I should have 10 columns it might look like this:

1,2,3,4,5,6,7,8
9,10

where the last columns are on the second line.

I need to concatentate the second row to the first row to create a row with 10 values in it so that I can split it into individual columns.

The problems is more complicated because the number of embebed carriage returns varies so that the number of rows over which each actual row is split varies and the number of columns on each row can vary as the embeded carriage returns can appear in different columns.

The solution I have at the moment is to take each row and if the number of columns (based on the number of separators) is less than expected then concatenate the next row, test and continue concatenating the next row until I get the correct number of columns (separators).

I know that this will work but is there another, set based solution?


Jez
kramaswamy
kramaswamy
Hall of Fame
Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)

Group: General Forum Members
Points: 3653 Visits: 1825
How is the data stored currently? Is your problem at the importing stage, IE in the text file it's split like that? Or is it in the database already, in a table such that you have 10 columns, and some rows have null values at the end of the table, because their full values are spread out over multiple rows?
Jez-448386
Jez-448386
SSC Eights!
SSC Eights! (996 reputation)SSC Eights! (996 reputation)SSC Eights! (996 reputation)SSC Eights! (996 reputation)SSC Eights! (996 reputation)SSC Eights! (996 reputation)SSC Eights! (996 reputation)SSC Eights! (996 reputation)

Group: General Forum Members
Points: 996 Visits: 1134
That's how it is in the import file. I am importing each line into a single nvarchar(max) column and running some t-sql code to split it into different columns.
kramaswamy
kramaswamy
Hall of Fame
Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)

Group: General Forum Members
Points: 3653 Visits: 1825
Couldn't you instead pre-process the file? For example, if you know that every line has to have a certain number of items on it, you could read the entire file content in, do a find-replace on carriage returns, and then go through the entire file content as a single item, and take every 10 items for example, and process those 10 items, then move on to the next 10, etc... until you reach the end of the file.
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (203K reputation)SSC Guru (203K reputation)SSC Guru (203K reputation)SSC Guru (203K reputation)SSC Guru (203K reputation)SSC Guru (203K reputation)SSC Guru (203K reputation)SSC Guru (203K reputation)

Group: General Forum Members
Points: 203190 Visits: 41947
Jez (7/16/2009)
I have a series of text file which I have import into a table with a single column spanning the whole line in the file and using a variation of Jeff Moden's article http://www.sqlservercentral.com/articles/T-SQL/63003/ to split the sinlge column into individual columns based on a delimiter. This works fine (thanks Jeff ;-)).

The problem with one of the files that I have to import has embebed carriage return characters which results in the import line being split over several lines. For example, if I should have 10 columns it might look like this:

1,2,3,4,5,6,7,8
9,10

where the last columns are on the second line.

I need to concatentate the second row to the first row to create a row with 10 values in it so that I can split it into individual columns.

The problems is more complicated because the number of embebed carriage returns varies so that the number of rows over which each actual row is split varies and the number of columns on each row can vary as the embeded carriage returns can appear in different columns.

The solution I have at the moment is to take each row and if the number of columns (based on the number of separators) is less than expected then concatenate the next row, test and continue concatenating the next row until I get the correct number of columns (separators).

I know that this will work but is there another, set based solution?
Jez


Jez... just trying to be clear here... do you mean to say that a file will randomly have a different number of items on rows? That it really is unpredictable except that each "record" will have the same number of items?

Also, how many rows do you have in the file? I ask because I need to setup a test and I might as well check it for performance while I'm at it.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Jez-448386
Jez-448386
SSC Eights!
SSC Eights! (996 reputation)SSC Eights! (996 reputation)SSC Eights! (996 reputation)SSC Eights! (996 reputation)SSC Eights! (996 reputation)SSC Eights! (996 reputation)SSC Eights! (996 reputation)SSC Eights! (996 reputation)

Group: General Forum Members
Points: 996 Visits: 1134
Hi Jeff,

A file will always have the same number of columns. The import file (a text file) will be delimited by characters and there will always be the same number of characters in a 'row'. The difficulty is that some of the fields are mutli line in the input system (eg address) and the users add carriage returns in the field (for formatting reasons).

In this particular file, I have 53 columns and 81k rows on the import file netting down to 18k rows when I loop round and concatenate subsequent import rows to generate a single row of 53 columns.

I have written a script task in SSIS that reads the file and then appends then next rows until I get to the right number of columns and then insert this into a table. The performance of this is acceptable - it runs in a couple of minutes. As this is a batch job that runs once a week, I am happy with the performance of it so there's no need to bust a gut on this.

Jez
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search