Convert String to a Table using CTE

  • Why should I use this method versus using a good old fashioned Numbers table?

  • msaleem-583379 (12/14/2009)


    I've used a function similar to MiddleDatestoTable in my apps, however I followed a different approach.

    ...

    You do realize since your routine is a multi-statement TVF that the Quesry Optimizer will always treat the table returned y this function as if it has only one row regardless of the nuumber of rows actually returned?

    This means if your function is returning more than a few hundred rows you won't be getting an optimal execution plan.

    I discovered this with some AsOF multi-statement TVF's I developed. I had a query linking seven of this TVF together, and after 30 minutes, cancelled the query and modified (actually drop and create) the TVF from multi-statement to in-line TVF. Amazingly, the query then returned in four minutes.

    I have to agree with Paul White, here we go with the discussions regarding what is the best string splitting functions. The use of a recursive CTE, however, is not very scalable. If you don't have a CLR TVF, which Paul indicates is the best, the next best is one based on a tally table. I have a function that actually contains its own tally table should you not have one.

  • Rob Fisk (12/14/2009)


    Pretty certain I found this based on another article here somewhere but cannot find it to give credit.

    I believe that must be a Jeff Moden article, unless I'm mistaken?

  • I'd like to do a poll and find out what most people call it - a "tally" table, a "numbers" table, a "helper" table or what. It would be interesting to know.

    Maybe there's a geographic trend. a "tally" table sounds like something out of Great Britain - where they ride around on the trolley - while "helper" comes from Texas/Colorado/Tennessee with some cities rated as the friendliest in the U.S. and then maybe straight up "numbers," an ambiguous and somewhat unhelpful name for L.A. and New York City where the people are rated by some as the most unhelpful in the U.S. (Of course we all know these are generalizations that don't hold true for everybody - and where do they come up with these stats anyway? ).

    Are there any more names you've heard for the "tally" table?

    Bill Nicolich: www.SQLFave.com.
    Daily tweet of what's new and interesting: AppendNow

  • I called mine the 'Integers' table. Because it's a table of INTs. Also, I used to write code for the lumber industry, where 'Tally' has a specific meaning.

  • I have some great bit of difficulty in believing that people are still using While Loops and the "Hidden RBAR" method of recursive CTE's for such a thing.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jonathan Melo (12/14/2009)


    Rob Fisk (12/14/2009)


    Pretty certain I found this based on another article here somewhere but cannot find it to give credit.

    I believe that must be a Jeff Moden article, unless I'm mistaken?

    First saw the Tally table mentioned by Jeff but it was shortly after and by someone else that I saw an article that carried the string splitting example.

    I call it a Tally because that what it was referred to primarily when I was first introduced to it. Never heard it called a helper table which, I must say, is the most ambiguous of the names since it does not say what it helps with. Numbers table, fairly straightforward but lacking in descriptiveness since it does indeed contain numbers and only numbers.

    If you look at the dictionary definition of Tally then it comes closest to the mark as a description though none quite fit the bill. Besides it's nicely alliterative.

    [font="Arial"]_______________________________________________________
    Change is inevitable... Except from a vending machine.[/font]

  • Paul White, I am certain this is not correct.

    "THE" fastest way to achieve string splitting is a 2 stage approach

    1. use a CLR scalar function to convert a CSV or delimitted string in to a fixed width column string eg 10 chars per id.

    122_______12345324__1234567___12________

    2. with the result string, use a tally/numbers/etc table to split the fixed width string using substring. substring(@fixedlist, n * 20-19, n * 20)

    The problem with solely using CLR is that the marshalling between SQL and .NET is very slow per row.

    So it actually turns out more efficient this way.

    And in the specific case of this article, it is even easier, you can skip the CLR part by initially constructing the argument list as fixed width separation.

  • Sweet bit of TSQL code. As stated by Amit, he used this to parse out parameters passed into a stored proc, so practically the number of params normally would not hit the Max Recursion boundaries. I however do feel a few minor tweaks are needed in order to make it really general purpose.

    1. return a varchar instead of an int, if the values passed in the array are integers, then simply convert the resultant table in the caller

    2. Add a check for at least on delimiter or else the function may fail

    3. And lastly check for empty strings as these are generated when the array list has only 1 item.

    The changed code and examples are below

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    -- =============================================

    -- Author:Amit Gaur

    -- Create date: July 25th 2008

    -- Description:Convert a string to a table

    -- Modified by: Pratap Prabhu, 12/14/2009

    -- 1. Changed returned table to return varchar instead of int

    -- 2. Added trailing Delimiter if no delimiter exists

    -- 3. Added check for Blank item

    -- =============================================

    CREATE FUNCTION [dbo].[strToTable]

    (

    @array varchar(max),

    @del char(1)

    )

    RETURNS

    @listTable TABLE

    (

    Item varchar(max) --changed from int to varchar(max)

    )

    AS

    BEGIN

    --Added check for at least 1 delimiter

    if(CHARINDEX(@del,@array,1)=0)

    set @array=@array+@del

    WITH rep (item,list) AS

    (

    SELECT SUBSTRING(@array,1,CHARINDEX(@del,@array,1) - 1) as item,

    SUBSTRING(@array,CHARINDEX(@del,@array,1) + 1, LEN(@array)) + @del list

    UNION ALL

    SELECT SUBSTRING(list,1,CHARINDEX(@del,list,1) - 1) as item,

    SUBSTRING(list,CHARINDEX(@del,list,1) + 1, LEN(list)) list

    FROM rep

    WHERE LEN(rep.list) > 0

    )

    INSERT INTO @listTable

    SELECT item FROM rep where item<>'' --added check that item is not blank

    RETURN

    END

    GO

    select * from strToTable('String1',',')

    go

    select * from strToTable('String1,String2',',')

    go

    select Item=convert(int,Item) from strToTable('1,2,3,4',',')

    go

    select Item=convert(datetime,Item) from strToTable('10/12/2001,10/13/2001,10/14/2001,10/15/2001',',')

    go

  • We use the following function and it works quite well. It will return 'empty' items in the string array as well as handle single item strings without the delimiter. However, this won't handle delimiters with lengths > 1.

    CREATE FUNCTION udf_StrList2Table (

    @List NVARCHAR(MAX) = '',

    @Delimiter NCHAR(1) = N','

    )

    RETURNS TABLE

    AS

    RETURN

    SELECT SUBSTRING( @Delimiter + @List + @Delimiter, n + 1,

    CHARINDEX( @Delimiter, @Delimiter + @List + @Delimiter, n + 1 ) - n - 1 ) AS Value,

    ROW_NUMBER() OVER ( ORDER BY n ) AS ListPos

    FROM MSLC.dbo.Tally

    WHERE SUBSTRING( @Delimiter + @List + @Delimiter, n, 1 ) = @Delimiter

    AND n < LEN( @Delimiter + @List + @Delimiter )

    /* Anything is possible but is it worth it? */

  • Please don't flame me for suggesting a poorly performing technique. The code below works great with a minimum of programming... but only when the two initial assumptions are correct!

    If the list contains only integer primary keys, and the primary keys are defined by a reference table that doesn't take too long to scan, then I like to delimit every entry in the string list with a non-integer character such as '|' and then build the list table in this very simple way for a reference table named MyTable having a primary key named MyID:

    INSERT INTO @listTable (item)

    SELECT MyID

    FROM MyTable

    WHERE CharIndex('|' + Cast(MyID AS varchar(10)) + '|',@array) > 0

    Here, @listTable and @array are the same as in Amit's example, but the @array string must begin and end with the delimiter ('|' in this case).

  • I second Roland's comment.

    I was involved in a similar situation before and since we were working with sql server 2005 I had the developer pass a xml value to the SProc.

  • I originally stipulated

    Paul White, I am certain this is not correct.

    "THE" fastest way to achieve string splitting is a 2 stage approach

    1. use a CLR scalar function to convert a CSV or delimitted string in to a fixed width column string eg 10 chars per id.

    122_______12345324__1234567___12________

    2. with the result string, use a tally/numbers/etc table to split the fixed width string using substring. substring(@fixedlist, n * 20-19, n * 20)

    The problem with solely using CLR is that the marshalling between SQL and .NET is very slow per row.

    So it actually turns out more efficient this way.

    And in the specific case of this article, it is even easier, you can skip the CLR part by initially constructing the argument list as fixed width separation.

    i apologise I should really put some code in place to demonstrate

    -- first build the fixed padded width string with 10000 items!! @fixed

    -- this simluates what the application would pass through

    declare @fixed varchar(max)

    declare @CT int

    select @CT=0,@fixed = ''

    select @fixed = @fixed + cast(n as char(10))

    from dbo.Tally where n < 10000

    -- here is the REAL code, now split it

    select starttime = getdate()

    select substring(@fixed, (n*10)+1, 10)

    from dbo.tally

    where n <=(len(@fixed)-1)/10

    select endtime = getdate()

    runs in only a few hundred ms!! on my local pc.

    I suspect it would be a lot less without having to send the list back to the client mgmt studio.

    Infact, I just checked and it is more like 20ms.

  • This helps the original query handle the case of a single value passed in? Not sure if it's the most elegant solution since I'm checking charindex many times, but...

    DECLARE@array varchar(max)

    DECLARE @del char(1)

    SET @array = '1' -- Works for '1','1,2,3','1,,2,3' etc.

    SET @del = ',';-- Semicolon needed for cte expression below

    WITH rep (item,list) AS

    (

    SELECTCASE WHEN CHARINDEX(@del,@array,1) > 0 THEN

    SUBSTRING(@array,1,CHARINDEX(@del,@array,1) - 1)

    ELSE @array END AS item,

    CASE WHEN CHARINDEX(@del,@array,1) > 0 THEN

    SUBSTRING(@array,CHARINDEX(@del,@array,1) + 1, LEN(@array)) + @del

    ELSE '' END AS list

    UNION ALL

    SELECT SUBSTRING(list,1,CHARINDEX(@del,list,1) - 1) as item,

    SUBSTRING(list,CHARINDEX(@del,list,1) + 1, LEN(list)) list

    FROM rep

    WHERE LEN(rep.list) > 0

    )

    -- Use this to retrieve all items including empty values like 1,,,3 or 1,2,3, which has a trailing empty item

    SELECT * FROM rep

    -- Use this to remove empty items from string

    -- SELECT item, list FROM rep WHERE LEN(item) > 0

  • Pratap Prabhu (12/14/2009)


    I however do feel a few minor tweaks are needed ...

    The best minor tweek you could make to the code is to write it so it doesn't use either recursion or explicit loops. Both are quite slow compared to a set based solution.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 16 through 30 (of 73 total)

You must be logged in to reply to this topic. Login to reply