Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Concatenate variable length rows Expand / Collapse
Author
Message
Posted Thursday, July 16, 2009 3:49 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Friday, December 12, 2014 8:03 AM
Points: 231, Visits: 662
I have a series of text file which I have import into a table with a single column spanning the whole line in the file and using a variation of Jeff Moden's article http://www.sqlservercentral.com/articles/T-SQL/63003/ to split the sinlge column into individual columns based on a delimiter. This works fine (thanks Jeff ).

The problem with one of the files that I have to import has embebed carriage return characters which results in the import line being split over several lines. For example, if I should have 10 columns it might look like this:

1,2,3,4,5,6,7,8
9,10

where the last columns are on the second line.

I need to concatentate the second row to the first row to create a row with 10 values in it so that I can split it into individual columns.

The problems is more complicated because the number of embebed carriage returns varies so that the number of rows over which each actual row is split varies and the number of columns on each row can vary as the embeded carriage returns can appear in different columns.

The solution I have at the moment is to take each row and if the number of columns (based on the number of separators) is less than expected then concatenate the next row, test and continue concatenating the next row until I get the correct number of columns (separators).

I know that this will work but is there another, set based solution?


Jez
Post #754022
Posted Thursday, July 16, 2009 6:36 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Wednesday, December 17, 2014 10:07 AM
Points: 828, Visits: 1,702
How is the data stored currently? Is your problem at the importing stage, IE in the text file it's split like that? Or is it in the database already, in a table such that you have 10 columns, and some rows have null values at the end of the table, because their full values are spread out over multiple rows?
Post #754094
Posted Thursday, July 16, 2009 7:15 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Friday, December 12, 2014 8:03 AM
Points: 231, Visits: 662
That's how it is in the import file. I am importing each line into a single nvarchar(max) column and running some t-sql code to split it into different columns.
Post #754138
Posted Thursday, July 16, 2009 7:22 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Wednesday, December 17, 2014 10:07 AM
Points: 828, Visits: 1,702
Couldn't you instead pre-process the file? For example, if you know that every line has to have a certain number of items on it, you could read the entire file content in, do a find-replace on carriage returns, and then go through the entire file content as a single item, and take every 10 items for example, and process those 10 items, then move on to the next 10, etc... until you reach the end of the file.
Post #754149
Posted Sunday, July 19, 2009 7:53 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: 2 days ago @ 7:27 AM
Points: 35,769, Visits: 32,437
Jez (7/16/2009)
I have a series of text file which I have import into a table with a single column spanning the whole line in the file and using a variation of Jeff Moden's article http://www.sqlservercentral.com/articles/T-SQL/63003/ to split the sinlge column into individual columns based on a delimiter. This works fine (thanks Jeff ).

The problem with one of the files that I have to import has embebed carriage return characters which results in the import line being split over several lines. For example, if I should have 10 columns it might look like this:

1,2,3,4,5,6,7,8
9,10

where the last columns are on the second line.

I need to concatentate the second row to the first row to create a row with 10 values in it so that I can split it into individual columns.

The problems is more complicated because the number of embebed carriage returns varies so that the number of rows over which each actual row is split varies and the number of columns on each row can vary as the embeded carriage returns can appear in different columns.

The solution I have at the moment is to take each row and if the number of columns (based on the number of separators) is less than expected then concatenate the next row, test and continue concatenating the next row until I get the correct number of columns (separators).

I know that this will work but is there another, set based solution?
Jez


Jez... just trying to be clear here... do you mean to say that a file will randomly have a different number of items on rows? That it really is unpredictable except that each "record" will have the same number of items?

Also, how many rows do you have in the file? I ask because I need to setup a test and I might as well check it for performance while I'm at it.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #755440
Posted Monday, July 20, 2009 2:18 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Friday, December 12, 2014 8:03 AM
Points: 231, Visits: 662
Hi Jeff,

A file will always have the same number of columns. The import file (a text file) will be delimited by characters and there will always be the same number of characters in a 'row'. The difficulty is that some of the fields are mutli line in the input system (eg address) and the users add carriage returns in the field (for formatting reasons).

In this particular file, I have 53 columns and 81k rows on the import file netting down to 18k rows when I loop round and concatenate subsequent import rows to generate a single row of 53 columns.

I have written a script task in SSIS that reads the file and then appends then next rows until I get to the right number of columns and then insert this into a table. The performance of this is acceptable - it runs in a couple of minutes. As this is a batch job that runs once a week, I am happy with the performance of it so there's no need to bust a gut on this.

Jez
Post #755583
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse