Combine different rows based on sequence

  • Hello All,

    I have some really ugly data coming in from several files. I am using SSIS to import into a single column staging table and I am incrementing a counter field so I believe I can select the data from my staging table matching the sequence of the data in the file.

    Alas the sequence of the data matters.

    See all my DDL below and proposed solution below.

    In one scenario the 'first' two records need to be combined into one row. The self join technique works. The are other other scenarios where the first three records need to be combined. I guess I could do three joins but I would like to do something more dynamic if possible, not dynamic like dynamic sql, but rather define a parameter that defines how many rows to combine.

    DECLARE @ComboFactor int --The number of rows to combine

    I think I can use my parameter and maybe row_number and partition, but that is where I am stuck since my skills with these newer functions are really minimal.

    Thanks if you can help.

    CREATE TABLE #Staging

    (RowID int,

    RowData varchar(4)

    )

    INSERT INTO #Staging (RowID,RowData) VALUES (1,'ab')

    INSERT INTO #Staging (RowID,RowData) VALUES (2,'cd')

    INSERT INTO #Staging (RowID,RowData) VALUES (3,'ef')

    INSERT INTO #Staging (RowID,RowData) VALUES (4,'gh')

    INSERT INTO #Staging (RowID,RowData) VALUES (5,'hi')

    INSERT INTO #Staging (RowID,RowData) VALUES (6,'jk')

    INSERT INTO #Staging (RowID,RowData) VALUES (7,'lm')

    INSERT INTO #Staging (RowID,RowData) VALUES (8,'no')

    INSERT INTO #Staging (RowID,RowData) VALUES (9,'pq')

    SELECT

    *

    FROM #Staging

    SELECT

    T1.RowData + T2.RowData

    FROM #Staging T1

    INNER JOIN #Staging T2

    ON T1.RowID +1 = T2.RowID

    WHERE

    (t1.RowID % 2) <> 0

    DROP TABLE #Staging

  • How about something like this?

    DECLARE @ComboFactor INT = 3 --The number of rows to combine

    ;WITH Groups AS (

    SELECT RowID, RowData

    ,rn=(ROW_NUMBER() OVER (ORDER BY RowID)-1) / @ComboFactor

    FROM #Staging)

    SELECT (

    SELECT RowData + ''

    FROM Groups b

    WHERE a.rn = b.rn FOR XML PATH(''))

    FROM Groups a

    GROUP BY rn


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Interesting. Clearly I need to study up on FOR XML PATH. Thanks, I'll continue to review.

  • You're welcome. I forgot something though.

    WHERE a.rn = b.rn ORDER BY RowID FOR XML PATH(''))

    Need to make sure the rows are concatenated in the right order.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • The example code above showed this statement:

    SELECT RowData + ''

    FROM Groups b

    WHERE a.rn = b.rn FOR XML PATH('')

    Why is the + '' required? When I remove this then I get an result that looks like XML instead of a concatenated string.

    If I Cast Rowdata to a varchar then the + '' is not required.

    Thanks for the explanation.

  • russellolson (10/16/2012)


    The example code above showed this statement:

    SELECT RowData + ''

    FROM Groups b

    WHERE a.rn = b.rn FOR XML PATH('')

    Why is the + '' required? When I remove this then I get an result that looks like XML instead of a concatenated string.

    If I Cast Rowdata to a varchar then the + '' is not required.

    Thanks for the explanation.

    The XML tag is generated because if you remove the + '' you are now formatting a named column into the string, hence the XML tag. With the + '', it is an unnamed column so hence no XML tag.

    CASTing rowdata has the same effect - producing an unnamed column.

    I tend to use the former because it is fewer keystrokes. I suppose I should be more scientific and base my choice on performance, but I confess I've never tried it.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply