Convert String to a Table using CTE

  • As mentioned its for parsing Parameters passed in to a stored proc. In a real world scenerio an app would let a user select multiple items in a list, most probably via some check boxes, accumulate the selected items in a delimited string and pass them in to the proc. So lets say a user selects 30 odd items, or make it 50. Whats the difference in time.....I suggest you look at it from a practical stand point, in theory you are quite right, but in practical terms for the usage described the routine is perfect.

  • Pratap Prabhu (12/14/2009)


    As mentioned its for parsing Parameters passed in to a stored proc. In a real world scenerio an app would let a user select multiple items in a list, most probably via some check boxes, accumulate the selected items in a delimited string and pass them in to the proc. So lets say a user selects 30 odd items, or make it 50. Whats the difference in time.....I suggest you look at it from a practical stand point, in theory you are quite right, but in practical terms for the usage described the routine is perfect.

    Okay, so you have a split function that works great for ten to thirty items. Along comes a requirement to split strings that are even longer and over several hundred thousand records. Oh, great, I have this nifty function I use to parse strings from an application, I'll just use it here. Problem is, the function doesn't scale well, so you end up with a routine that runs for an extended period of time, and you don't really know why. You don't bother to look at your string parsing rountine but, hey, it works great over here, it can't be the problem.

    See a problem??

    One thing I have learned here, code for scalability and performance. You never know how your nifty little functions may get used by others.

  • write it so it doesn't use either recursion or explicit loops. Both are quite slow compared to a set based solution.

    Following along with Jeff's comment, doing a set-based solution is good for developing the mind-set for thinking in sets in addition to the performance benefits.

    Here's a great resource by Robyn Page and Phil Factor over at simple-talk called The Helper Table Workbench[/url] which includes a set-based technique for Splitting Strings into table-rows, based on a specified delimiter.

    With that said, I think a helpful discussion we can have here is the question of when is it a good idea (generalizing about problems) to use a CTE approach which uses recursion and is an implicit looping technique given that we can usually use a set-based technique to get the same job done? Any opinions on that? Perhaps traversing hierarchies is one.

    Bill Nicolich: www.SQLFave.com.
    Daily tweet of what's new and interesting: AppendNow

  • Traversing hierarchies is about the only one I can think of although there are methods that will blow the doors off that, as well.

    Also, since we're dropping names and articles, here's one on how a Tally table actually works. I know the author personally 😉 and except for a minor error in one of the example scripts, he did a pretty good job of explaining how a Tally (or Numbers) table actually works to replace RBAR.

    http://www.sqlservercentral.com/articles/T-SQL/62867/

    He also wrote an article on passing parameters as 1, 2, and 3 dimensional "arrays".... without clogging the pipe and without the errors of not being able to handle single, blank, or null parameters.

    http://www.sqlservercentral.com/articles/T-SQL/63003/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Pratap Prabhu (12/14/2009)


    As mentioned its for parsing Parameters passed in to a stored proc. In a real world scenerio an app would let a user select multiple items in a list, most probably via some check boxes, accumulate the selected items in a delimited string and pass them in to the proc. So lets say a user selects 30 odd items, or make it 50. Whats the difference in time.....I suggest you look at it from a practical stand point, in theory you are quite right, but in practical terms for the usage described the routine is perfect.

    It's not a matter of time for a single usage.... it's a matter of time if you have 10,000 users all passing parameters within very short time periods of each other. Also, if some poor slob faced with an impossible schedule finds the performance challenged RBAR of the solution in this article and many of the solutions offered in this discussion and (s)he needs it to split a substantial number of rows in a table, (s)he is dead meat.

    NEVER justify poorly performing code based on its lack of frequency of usage because that frequency may not stay the same if someone uses the method for something else.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Aaron Gonzalez-394690 (12/14/2009)


    I second Roland's comment.

    I was involved in a similar situation before and since we were working with sql server 2005 I had the developer pass a xml value to the SProc.

    Aaron, do some "load" testing using the XML method... compared to the more straight forward set based methods, XML shredding of parameters is relatively slow and the extra overhead in the form of tags and the like serves for nothing but a pipe clogger. Seriously, don't use XML for this type of thing.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Roland Howard Boorman (12/14/2009)


    I simply use XML.

    Using XML whole complex structures can be passed to SQL and treated as Tables.

    You can use Functions with the XML to render these into virtual tables and treat these as regular tables!

    See above.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/14/2009)


    Aaron Gonzalez-394690 (12/14/2009)


    I second Roland's comment.

    I was involved in a similar situation before and since we were working with sql server 2005 I had the developer pass a xml value to the SProc.

    Aaron, do some "load" testing using the XML method... compared to the more straight forward set based methods, XML shredding of parameters is relatively slow and the extra overhead in the form of tags and the like serves for nothing but a pipe clogger. Seriously, don't use XML for this type of thing.

    Speaking of performance, you can take a look at some of the results I got when testing each method.

    http://jahaines.blogspot.com/2009/12/splitting-delimited-string-part-2.html

  • I'd suggest taking a look at Erland Sommarskog's page on passing arrays (list).

    http://www.sommarskog.se/arrays-in-sql-2005.html or the older article for sql 200

    http://www.sommarskog.se/arrays-in-sql-2000.html

    Good coverage of the various possibilities (with performance results)

    John

  • Adam Haines (12/14/2009)


    Jeff Moden (12/14/2009)


    Aaron Gonzalez-394690 (12/14/2009)


    I second Roland's comment.

    I was involved in a similar situation before and since we were working with sql server 2005 I had the developer pass a xml value to the SProc.

    Aaron, do some "load" testing using the XML method... compared to the more straight forward set based methods, XML shredding of parameters is relatively slow and the extra overhead in the form of tags and the like serves for nothing but a pipe clogger. Seriously, don't use XML for this type of thing.

    Speaking of performance, you can take a look at some of the results I got when testing each method.

    http://jahaines.blogspot.com/2009/12/splitting-delimited-string-part-2.html

    Nice artical/blog and I'll have to revisit XML based on what you said. However, you posted...

    the XML methods were just as performant as the numbers table methods

    ... not to the "pipe". No matter what you do, XML will pass more bytes than a simple tab delimited parameter. As you know, one of the biggest performance problems between the GUI and the database is the usage and clogging of the pipe mostly due to unnessary bytes being passed in one form or another.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Based on my earlier question

    Fatal Exception Error (12/14/2009)


    Why should I use this method versus using a good old fashioned Numbers table?

    It is safe to say that there is no reason except when the requirements are so different that it does not even represent the same problem?

  • Fatal Exception Error (12/14/2009)


    Based on my earlier question

    Fatal Exception Error (12/14/2009)


    Why should I use this method versus using a good old fashioned Numbers table?

    It is safe to say that there is no reason except when the requirements are so different that it does not even represent the same problem?

    I'm thinking that's a pretty safe thing to say.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This is a useful function, even with the limitations mentioned in the thread. Here is a minor revision for consideration, which addresses the single item point, and a couple of other minor issues. Also, for clarity, comments have been added and some minor naming changes are offered.

    /* ========================================================================

    Script Name: CREATE_udf_StringToTable.sql

    Description: Convert a string to a table

    Parameters:(1) A delimited string of values

    (2) The character used to delimit values

    Test Calls:

    DECLARE @listTable TABLE (RowID int ,Item varchar(100));

    INSERT INTO @listTable

    SELECT * FROM [_udf_StringToTable]('Heuy|Dewey|Louie','|');

    =========================================================================*/

    CREATE FUNCTION [dbo].[_udf_StringToTable]

    (@valueStr varchar(max),

    @delimeter char(1)

    )

    RETURNS

    @listTable TABLE (RowID int Identity ,Item varchar(100))

    AS

    BEGIN

    IF (LTRIM(RTRIM(@valueStr)) <> SPACE(0)) --Ensure both parameters have values

    BEGIN

    --If the delimeter cannot be found in @valueStr, append it.

    IF (CHARINDEX(@delimeter,@valueStr,1) < 1)

    SET @valueStr = @valueStr + @delimeter;

    IF CHARINDEX(@delimeter,@valueStr,1) > 0

    BEGIN

    WITH rep (tableItem,listRemainder) AS

    (

    -- Get the portion of the valueStr that preceeds the delimiter, along with the remainder.

    -- Since the logic in the 2nd statement needs a delimiter for the CHARINDEX function,

    -- the @delimeter is always appended to the @valueStr remainder.

    SELECT

    SUBSTRING(@valueStr, 1, CHARINDEX(@delimeter,@valueStr,1) - 1) -- extracted @valueStr item

    , SUBSTRING(@valueStr, CHARINDEX(@delimeter,@valueStr,1) + 1, LEN(@valueStr)) + @delimeter --remainder

    UNION ALL

    -- Recursion, using the @valueStr remainder from the previous statement.

    -- When the remainder (listRemainder) is empty, we're done.

    SELECT

    SUBSTRING(listRemainder,1,CHARINDEX(@delimeter,listRemainder,1) - 1) -- extracted @valueStr item

    , SUBSTRING(listRemainder,CHARINDEX(@delimeter,listRemainder,1) + 1, LEN(listRemainder)) -- remainder

    FROM rep

    WHERE LEN(rep.listRemainder) > 0

    )

    INSERT INTO @listTable

    SELECT tableItem FROM rep WHERE tableItem <> SPACE(0)

    END

    END

    RETURN

    END

    GO

  • sam.walker (12/14/2009)


    Paul White, I am certain this is not correct.

    One of the longest threads ever on SSC was devoted to this: http://www.sqlservercentral.com/Forums/Topic695508-338-17.aspx

    There is a handy summary of the results on Florian Reischl's blog. That complements the SQLBlog article from Adam Machanic I mentioned before.

  • Paul White (12/14/2009)


    sam.walker (12/14/2009)


    Paul White, I am certain this is not correct.

    One of the longest threads ever on SSC was devoted to this: http://www.sqlservercentral.com/Forums/Topic695508-338-17.aspx

    There is a handy summary of the results on Florian Reischl's blog. That complements the SQLBlog article from Adam Machanic I mentioned before.

    Flo has a kid on his team that wrote a splitter CLR that blows everything away and, if memory serves, the code for that CLR is included someone in that thread.

    For things under 8k, the Tally table was definitely worthwhile performance wise as were a couple of other methods.... Recursive CTE's certainly weren't one of those.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 31 through 45 (of 73 total)

You must be logged in to reply to this topic. Login to reply