Display all column information in a single cell? Got the result, but not able to understand the concept

  • Hi

    My objective is to display all the values of a particular column in a single cell.

    tablename:

    Iden   Code
      2    BB
      4    BB
      4    CH

    The result should be something like: BB,BB,CH.

    I used the following to achieve it:

    DECLARE @FINAL VARCHAR(40)
    SELECT @FINAL = ''
    SELECT @FINAL = @FINAL + Code + ',' FROM tablename

    But I can't understand the concept, or how this works.

    Can anyone explain it?

  • It's called a custom aggregate (I think). It executes the expression once for each row in the result set. You can do it with FOR XML PATH() too.

    Take a look at this thread:

    http://www.sqlservercentral.com/Forums/Topic462961-145-1.aspx
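    For reference, here is a minimal sketch of the FOR XML PATH() approach applied to the original question (table and column names are taken from the original post; row order is not guaranteed without an ORDER BY):

    ```sql
    -- Build one comma-separated string from the Code column.
    -- The subquery emits ',BB,BB,CH'; STUFF() removes the leading comma.
    SELECT STUFF(
               (SELECT ',' + Code
                  FROM dbo.tablename
                   FOR XML PATH('')),
               1, 1, '') AS CsvString;
    ```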

  • metoseeu (3/6/2008)


    My objective is to display all the values of a particular column in a single cell. [...] I used the following to achieve it:

    DECLARE @FINAL VARCHAR(40)
    SELECT @FINAL = ''
    SELECT @FINAL = @FINAL + Code + ',' FROM tablename

    But I can't understand the concept, or how this works. Can anyone explain it?

    This is a bit of an ugly trick. You declare a variable, and then use that variable inside the SELECT statement. It exploits the fact that, in this case, SQL Server will "process" every row that is to be returned, one row at a time. For every row, the statement reuses the previous value of the variable and appends the current row's value to it.
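    Annotating the original snippet may make the row-by-row accumulation clearer. This is only a sketch; the ISNULL pattern used here to avoid a stray separator also appears later in this thread:

    ```sql
    -- @FINAL starts as NULL; ISNULL() suppresses the separator on the
    -- first row. On every subsequent row, @FINAL still holds the result
    -- built from the previous rows, so values accumulate left to right.
    DECLARE @FINAL VARCHAR(8000);

    SELECT @FINAL = ISNULL(@FINAL + ',', '') + Code
      FROM dbo.tablename;   -- table/column names from the original post

    SELECT @FINAL;          -- e.g. BB,BB,CH (row order is not guaranteed)
    ```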

    A nicer and faster solution, which works on SQL Server 2005 and later, is the one Robert linked to.

    Regards,

    Andras


    Andras Belokosztolszki, MCPD, PhD
    GoldenGate Software

  • Ugly trick? why?

    Custom aggregates are an extremely useful feature. While lists can be built with "FOR XML PATH", some things can't, such as a binary OR (not a sum) over a list of integers. I love it, and I wish I had something like this on other servers.

    Also, such loops run extremely fast compared to cursors.
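    Robert's bitwise-OR case is a good illustration of something FOR XML PATH cannot express directly. A sketch, using a hypothetical table dbo.Flags with an INT column FlagValue:

    ```sql
    -- Fold a bitwise OR across all rows using the same variable trick.
    DECLARE @AllFlags INT;
    SELECT @AllFlags = 0;

    SELECT @AllFlags = @AllFlags | FlagValue
      FROM dbo.Flags;    -- hypothetical table and column names

    SELECT @AllFlags;    -- the OR of every FlagValue in the table
    ```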

  • A nicer and faster solution that works on SQL Server 2005 and later is one that Robert has linked to

    Which of the many solutions are you referring to? There are several on that thread.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Robert (3/8/2008)


    Ugly trick? why?

    Custom aggregates are an extremely useful feature. While lists can be built with "FOR XML PATH", some things can't, such as a binary OR (not a sum) over a list of integers. I love it, and I wish I had something like this on other servers.

    Also, such loops run extremely fast compared to cursors.

    I agree... the FOR XML PATH solution isn't much faster than the custom aggregate in the presence of an index that benefits both methods. And things like a million-row running total certainly couldn't be done in 7 seconds without the custom-aggregate method.

    --Jeff Moden



  • I agree... the FOR XML PATH solution isn't much faster than the custom aggregate in the presence of an index beneficial to both methods. And things like a million row running total certainly couldn't be done in 7 seconds without using the custom aggregate method.

    I have to disagree with this. While the SELECT method is adequate for smaller tables, it does not come close to FOR XML PATH at generating delimited strings. I did a few speed tests with statistics on and found that XML was far superior at creating a row string.

    The code for the test table:

    /****** Object: Table [dbo].[Tally] Script Date: 02/06/2008 12:14:21 ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE TABLE [dbo].[Numbers](
        [Nbr] [int] NOT NULL
        CONSTRAINT [PK_Tally_N12] PRIMARY KEY CLUSTERED
        (
            [Nbr] ASC
        ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
                IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
                ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100) ON [PRIMARY]
    ) ON [PRIMARY]

    INSERT INTO Numbers (Nbr) VALUES (0);

    WHILE (SELECT MAX(Nbr) FROM Numbers) < 1000000
        INSERT INTO Numbers
        SELECT Nbr + (SELECT MAX(Nbr) + 1 FROM Numbers)
          FROM Numbers;

    The code used to run the tests. Note: I just changed the WHERE clause to specify the number of rows. Additionally, I ran each statement separately so as not to skew the results.

    --select method
    DECLARE @var VARCHAR(MAX)
    SELECT @var = ''
    SET STATISTICS IO ON
    SET STATISTICS TIME ON
    SELECT @var = @var + CAST(LTRIM(RTRIM(Nbr)) AS VARCHAR(10)) + ','
      FROM Numbers
     WHERE Nbr <= 10000
    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF

    --for xml path method
    DECLARE @var VARCHAR(MAX)
    SELECT @var = ''
    SET STATISTICS IO ON
    SET STATISTICS TIME ON
    SELECT CAST(LTRIM(RTRIM(Nbr)) AS VARCHAR(10)) + ','
      FROM Numbers
     WHERE Nbr <= 10000
       FOR XML PATH('')
    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF

    Test 1 - 1,000,000 rows

    Select method: I killed the process after 5 minutes.

    FOR XML PATH:

    (1000001 row(s) affected)
    Table 'Numbers'. Scan count 1, logical reads 1615, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times: CPU time = 844 ms, elapsed time = 1060 ms.

    Test 2 - 65,000 rows

    Select method:

    SQL Server Execution Times: CPU time = 0 ms, elapsed time = 1 ms.
    Table 'Numbers'. Scan count 1, logical reads 108, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times: CPU time = 9079 ms, elapsed time = 9070 ms.

    FOR XML PATH:

    (65536 row(s) affected)
    (65001 row(s) affected)
    Table 'Numbers'. Scan count 1, logical reads 108, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times: CPU time = 31 ms, elapsed time = 126 ms.

    Test 3 - 10,000 rows

    Select method:

    Table 'Numbers'. Scan count 1, logical reads 20, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times: CPU time = 250 ms, elapsed time = 258 ms.

    FOR XML PATH:

    (10001 row(s) affected)
    Table 'Numbers'. Scan count 1, logical reads 20, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times: CPU time = 16 ms, elapsed time = 87 ms.

    As you can see, the XML method outperformed the SELECT method in every instance while a clustered index was present. I believe FOR XML PATH is the new and improved method for creating column strings.

  • I understand the SELECT method is still a good solution and completely backward compatible. I also understand that you probably will not need a string of this magnitude, but why use a more expensive method when a faster one is available?

  • Adam Haines (3/8/2008)


    I understand the SELECT method is still a good solution and completely backward compatible. I also understand that you probably will not need a string of this magnitude, but why use a more expensive method when a faster one is available?

    Heh... because you forgot... 😀 Code is not the only thing to consider where performance is concerned.

    And, let's do an example where the SELECT method actually beats the XML method... on a million-row example similar to yours... by the way... compare the way you built your Tally table against the way I built my test table... on my box, your method produces 1 column in about 15 seconds... mine produces 3 columns in about 9 seconds... "why use a more expensive method when a faster one is available?" :hehe: (Please don't get mad... I'm just having a little fun...)

    Here's the code to set up the test table...

    --drop table TestData
    --===== Create and populate test table.
    --      Column "RowNum" contains unique row numbers
    --      Column "SomeID" contains non-unique numbers
    --      (should be 1/400th of the row count to allow about 400 concatenated items per SomeID).
    --      Column "SomeCode" has a range of "AA" to "ZZ" non-unique 2 character strings
    SELECT TOP 1000000 --<<<LOOK! CHANGE THIS NUMBER TO CHANGE THE NUMBER OF ROWS!
           RowNum   = IDENTITY(INT,1,1),
           SomeID   = ABS(CHECKSUM(NEWID()))%2500+1, --<<<LOOK! CHANGE THIS NUMBER TO 1/400th THE ROW COUNT
           SomeCode = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
                    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
      INTO dbo.TestData
      FROM Master.dbo.SysColumns t1,
           Master.dbo.SysColumns t2

    --===== A table is not properly formed unless a Primary Key has been assigned
    ALTER TABLE dbo.TestData
      ADD PRIMARY KEY CLUSTERED (RowNum)

    --===== Add the index to support both types of queries
    CREATE INDEX IXC_TestData_Cover1 ON dbo.TestData (SomeID, SomeCode)

    Here's the concatenation function that Robert is talking about using with the SELECT...

    CREATE FUNCTION dbo.fnConcatTest (@SomeID INT)
    RETURNS VARCHAR(8000) AS
    BEGIN
        DECLARE @Return VARCHAR(8000)
        SELECT @Return = ISNULL(@Return+',','') + SomeCode
          FROM dbo.TestData
         WHERE SomeID = @SomeID
        RETURN @Return
    END

    ...and, here's the test code... I put the concatenation function first so that the XML can benefit from any stats the SELECT may generate... just to be fair 😉

    --===== Suppress the auto-display of rowcounts for appearance
    SET NOCOUNT ON

    --===== Test and time the "Ugly trick" method
    PRINT REPLICATE('=',80)
    PRINT '"Ugly trick" code...'
    SET STATISTICS TIME ON
    SELECT SomeID, dbo.fnConcatTest(SomeID) AS CSVString
      FROM dbo.TestData
     GROUP BY SomeID
    SET STATISTICS TIME OFF

    --===== Test and time the XML method
    PRINT REPLICATE('=',80)
    PRINT 'XML code...'
    SET STATISTICS TIME ON
    SELECT t1.SomeID,
           STUFF(
               (SELECT ',' + t2.SomeCode
                  FROM TestData t2
                 WHERE t2.SomeID = t1.SomeID
                   FOR XML PATH('')),
               1, 1, '') AS CsvString
      FROM TestData t1
     GROUP BY t1.SomeID
    SET STATISTICS TIME OFF
    PRINT REPLICATE('=',80)

    And here are the results...

    ================================================================================
    "Ugly trick" code...

    SQL Server Execution Times:
    CPU time = 3125 ms, elapsed time = 4289 ms.
    ================================================================================
    XML code...

    SQL Server Execution Times:
    CPU time = 4219 ms, elapsed time = 5273 ms.
    ================================================================================

    And, check it out... which code is actually easier to read?

    Still think the "ugly trick" method is just for small tables? :Whistling:

    --Jeff Moden



  • Almost forgot... here's the output when STATISTICS IO is ON... "Ugly trick" method beats the hell out of the XML method... 😀

    ================================================================================
    "Ugly trick" code...

    Table 'TestData'. Scan count 1, logical reads 1985, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    ================================================================================
    XML code...

    Table 'TestData'. Scan count 2501, logical reads 11463, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    ================================================================================

    Now... what were people saying about the "ugly trick" method? :w00t:

    --Jeff Moden



  • One more thing... you left an extra comma at the end of your two outputs 😉

    --Jeff Moden



  • (Please don't get mad... I'm just having a little fun...)

    Not at all. :) I have actually been looking for this piece of code for some time.

    One more thing... you left an extra comma at the end of your two outputs

    Yeah, I got lazy; these can be removed by adding a simple CASE expression. Additionally, the SELECT method originally posted adds the extra comma too. 😛

    DECLARE @FINAL VARCHAR(40)
    SELECT @FINAL = ''
    SELECT @FINAL = @FINAL + Code + ',' FROM tablename

    😉
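    For what it's worth, one CASE-free way to drop that trailing comma is to trim it after the loop finishes. A sketch (it assumes the table contains at least one row, since LEFT() with a negative length raises an error):

    ```sql
    DECLARE @FINAL VARCHAR(40);
    SELECT @FINAL = '';
    SELECT @FINAL = @FINAL + Code + ',' FROM tablename;

    -- Remove the last character (the trailing comma):
    SELECT LEFT(@FINAL, LEN(@FINAL) - 1);
    ```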

    Also, I think the performance hit is directly correlated to the manner in which you are using FOR XML. I have only used this technique to create a string from a set of data, not to process each row. FOR XML will perform substantially worse used this way because it is supposed to convert a set of data into XML. I believe, in the scenario you provided, it would be better to create a derived table and inner join the IDs together rather than process the XML for each row.

    It seems the example I demonstrated was a traditional use, while your example showed how the method can perform worse under CERTAIN conditions. I still believe FOR XML will outperform the SELECT method 9 times out of 10 under traditional circumstances. I do know one thing for sure, and that is how not to use FOR XML PATH.

    -Adam

  • That's exactly what I was talking about... you asked why I would use something that I knew was slower... I knew that for the application I wanted, it would be faster. And, that's the key to writing a lot of performance code... "A Developer must not guess... a Developer must KNOW."

    That's why folks like you and me test things and why we hardly ever deal in absolutes... we know that something that causes huge problems when used in one manner, is a "performance god-send" when used in an ever so slightly different manner. Another example is finding ranges of missing ID's in a column. Everyone knows that correlated sub-queries usually suck the life blood out of a query... but when used properly for certain situations, there's (almost) nothing faster.

    (Heh... had to avoid the absolute of "nothing". 😛)

    FOR XML will perform substantially worse used this way because it is supposed to convert a set of data into XML.

    By the way, that's exactly what it does, and the "set" of data isn't for each row... it's for the "distinct" data formed by the GROUP BY.

    --Jeff Moden



    Touché, friend. :)

    That's exactly what I was talking about... you asked why I would use something that I knew was slower... I knew that for the application I wanted, it would be faster. And, that's the key to writing a lot of performance code... "A Developer must not guess... a Developer must KNOW."

    That's why folks like you and me test things and why we hardly ever deal in absolutes... we know that something that causes huge problems when used in one manner, is a "performance god-send" when used in an ever so slightly different manner. Another example is finding ranges of missing ID's in a column. Everyone knows that correlated sub-queries usually suck the life blood out of a query... but when used properly for certain situations, there's (almost) nothing faster.

    (Heh... had to avoid the absolute of "nothing". )

    Well said.

    By the way, that's exactly what it does, and the "set" of data isn't for each row... it's for the "distinct" data formed by the GROUP BY.

    Yep. This is exactly what I meant. It runs once for each row returned by the main query. In my tests I was running FOR XML one time.

    Yep. This is exactly what I meant. It runs once for each row returned by the main query. In my tests I was running FOR XML one time.

    Perfect... just didn't want you to think that I instantiated a triangular join or anything like that.

    Both methods I used create a bit of a correlated subquery. This is one of those times when a correlated subquery isn't so bad. Process a million rows in about 4 seconds? Not bad for either method. I certainly wouldn't send either method back to a developer for correction after a code review.

    --Jeff Moden



Viewing 15 posts - 1 through 15 (of 23 total)
