Performance Tuning: Concatenation Functions and Some Tuning Myths

  • Comments posted to this topic are about the item Performance Tuning: Concatenation Functions and Some Tuning Myths

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Not only is the substance of this article excellent, but I also find the style most attractive. It's well-written, extremely useful and also quite amusing. What more could one ask for? If this, the first article I've read in 2008, is representative of what's to come, then already 2008 is looking very promising.

  • What a great way to start the year! Thank you for the awesome compliments, Paul!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • nice one jeff...

    "Keep Trying"

  • Thanks, Chirag. Appreciate the compliment!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Nice, as always, Jeff. I notice you said you didn't do any testing on 2005. If you do more testing on 2005, be sure to try out the XML data type with FOR XML for concatenating strings. Here's some timings for the sample below (run on a 1.9 GHz box with 2 GB RAM, SQL 2005):

    * Test with PK/CI on RowNum and NCI on SomeID ran in 9 seconds for 1,000,000 rows

    * An alternate version using the data() node test as a column name and the REPLACE function instead of SUBSTRING ran in 11 seconds on 1,000,000 rows

    * Changing the NCI to a covering index on (SomeID, SomeCode) the code executed in 4 seconds on 1,000,000 rows. I left RowNum out of the covering index create index statement since it's the CI already, so automatically added to the NCI.

    Who knows Jeff, maybe XML will find a place in your heart after all? 🙂

    --Statement to create covering index:

    --CREATE INDEX IX_TestData_SomeInt ON dbo.TestData(SomeID, SomeCode)

    -- Jeff's table creation code

    SELECT TOP 1000000

    RowNum = IDENTITY(INT,1,1),

    SomeID = ABS(CHECKSUM(NEWID()))%2500+1,

    SomeCode = CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    INTO dbo.TestData

    FROM Master.dbo.SysColumns t1,

    Master.dbo.SysColumns t2 ;

    ALTER TABLE dbo.TestData

    ADD PRIMARY KEY CLUSTERED (RowNum);

    CREATE INDEX IX_TestData_SomeInt ON dbo.TestData(SomeID);

    -- End table creation code

    GO

    -- Begin function creation

    CREATE FUNCTION dbo.fnConcatTest (@SomeID INT)

    RETURNS VARCHAR(8000)

    AS

    BEGIN

    DECLARE @x XML;

    DECLARE @Return VARCHAR(8000);

    SET @x = (SELECT ',' AS '*',

    SomeCode AS '*'

    FROM dbo.TestData

    WHERE SomeID = @SomeID

    FOR XML PATH (''), TYPE);

    SET @RETURN = SUBSTRING(CAST(@x AS VARCHAR(8000)), 2, 8000);

    RETURN @Return;

    END

    GO

    --Speed test

    SELECT SomeID, dbo.fnConcatTest(SomeID) AS CSVString

    FROM dbo.TestData

    GROUP BY SomeID;

  • Heh... thanks Mike, but you didn't read the code block just before the conclusion 😉 Like Ragu... "It's in there". 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I like the fact you show it's a code issue not a sql issue. I'm constantly asked to index queries to improve them when the reality is that often the underlying code is at fault - I do find less programmers/developers seem to get sent on T-SQL training these days.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Oops you're right, I did miss that code block. I think the speedup is not really due to UDF overhead though, but rather because of FOR XML and xml data type optimizations. In the example I posted it still uses a UDF and it's still really fast.

    You can also do something like this with the data() node test, replacing the SUBSTRING and the inline string concatenation with a single REPLACE function call:

    SELECT t1.SomeID, REPLACE ((SELECT t2.SomeCode AS 'data()'

    FROM dbo.TestData t2

    WHERE t1.SomeID = t2.SomeID

    FOR XML PATH('')), ' ', ',')

    FROM dbo.TestData t1

    GROUP BY t1.SomeID;

    Glad to bring you over to the XML dark side, even if you're just dipping your little toe in 🙂

    Thanks again Jeff 🙂

  • colin Leversuch-Roberts (1/1/2008)


    I like the fact you show it's a code issue not a sql issue. I'm constantly asked to index queries to improve them when the reality is that often the underlying code is at fault - I do find less programmers/developers seem to get sent on T-SQL training these days.

    Absolutely spot-on, Colin... that's one of the things I'm hoping that folks take away with them after they study the article a bit... doesn't matter what the hardware is and doesn't matter how many indexes you add to a table (well, for Inserts, Updates, and Deletes it might 😉 ), poorly written code cannot be saved by table/index "Tuning Tricks"... the code has to be right. It's like painting a rusty car... it may (or may not) look good for a day or two, but unless you take care of the underlying metal, the paint's got no place to stick. 😛

    On the training thing... Not only do I agree with you on the lack of opportunity to be trained, but a good lot of the training doesn't teach folks how to write code with performance in mind. For example, when's the last time you saw some SQL Server training that included something like a Tally table or what a "Triangular Join" actually is? They just don't teach the good stuff in the schools that are supposed to... maybe a seminar here and there but not in most of the "schools".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Mike C (1/1/2008)


    Oops you're right, I did miss that code block. I think the speedup is not really due to UDF overhead though, but rather because of FOR XML and xml data type optimizations. In the example I posted it still uses a UDF and it's still really fast.

    You can also do something like this with the data() node test, replacing the SUBSTRING and the inline string concatenation with a single REPLACE function call:

    SELECT t1.SomeID, REPLACE ((SELECT t2.SomeCode AS 'data()'

    FROM dbo.TestData t2

    WHERE t1.SomeID = t2.SomeID

    FOR XML PATH('')), ' ', ',')

    FROM dbo.TestData t1

    GROUP BY t1.SomeID;

    Glad to bring you over to the XML dark side, even if you're just dipping your little toe in 🙂

    Thanks again Jeff 🙂

    Thanks for the great feedback and the code examples, Mike. Heh... "dark side" indeed... I hate XML 😛 Still, it seems that's the way a lot of folks have gone and I'm going to do some more heavy testing on some XML things... you know what they say... "Know thy enemy" :w00t:

    Part of the testing isn't to avoid XML... I can see grand use for some of the "new" XML functions like the concatenation example, for example (even though one should severely limit the use of concatenation in SQL).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/1/2008)


    what a "Triangular Join" actually is? They just don't teach the good stuff in the schools that are supposed to...

    I haven't seen a course that even mentioned that anything other than an equi-join is possible. I showed a join on > recently to a developer I work with and his reaction was "You can do that????"

    And then there's the problem that a lot of people don't want to learn. That's a whole nother discussion.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Jeff Moden (1/1/2008)


    Thanks for the great feedback and the code examples, Mike. Heh... "dark side" indeed... I hate XML 😛 Still, it seems that's the way a lot of folks have gone and I'm going to do some more heavy testing on some XML things... you know what they say... "Know thy enemy" :w00t:

    Part of the testing isn't to avoid XML... I can see grand use for some of the "new" XML functions like the concatenation example, for example (even though one should severely limit the use of concatenation in SQL).

    Hey Jeff, send me your address and I'll fire off a copy of my SQL Server XML book to you in a few months when it goes to press 🙂 You might find some other useful features in there as well 🙂

  • GilaMonster (1/1/2008)


    Jeff Moden (1/1/2008)


    what a "Triangular Join" actually is? They just don't teach the good stuff in the schools that are supposed to...

    I haven't seen a course that even mentioned that anything other than an equi-join is possible. I showed a join on > recently to a developer I work with and his reaction was "You can do that????"

    And then there's the problem that a lot of people don't want to learn. That's a whole nother discussion.

    You must have slept through that part of the 431 prep...:)

    I've been having some extra time on my hands at home (funny - sometimes you get EXACTLY what you wish for for christmas...), so I've taken some time to get back into some of my resolutions and see if I can finish some of these certs I've been dancing around with for a while, now that I am back into consulting.

    Not real in depth, but since the collection is targeted at being a primer, I wouldn't expect it to. But still - Inner, left/right/full outer, cross join, equijoins, self-joins, Non-equijoins (come on - who can name the five types of non-equijoins?), cross/outer apply. And they finish off with EXCEPT and INTERSECT, and a small discussion about UNION, UNION ALL, TOP and TABLESAMPLE (that was a new one for me - never had occasion to use it).

    ...and for those dying for the answer to today's quiz... the five types of non-equijoins are: comparison, inequality, range, conversion, and expression...

    In case you're looking for it - it's in the "collection 2778" in MS learning (although that has gotten VERY pricey since I ordered it).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • You must have slept through that part of the 431 prep...

    Please tell me they don't actually teach how to use Triangular Joins to derive running totals 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 82 total)

You must be logged in to reply to this topic. Login to reply