Split string using Tally Table

  • Like a few others, I got it right a bit by chance by looking at the number of dots.

    Frankly, do we need such a complicated statement to learn something?

    I don't think so...

    By the way, I did not learn anything with that question!

  • Jeff Moden (7/2/2010)


    BWAA-HAA!!!! First, this is NOT a good example of Tally Table code. It doesn't use a Tally Table and it doesn't use anything that could be called efficient. Anyone who uses a recursive CTE to generate Tally numbers just doesn't know what hidden RBAR is. 😛

    I agree with what someone else has stated... this code should be used as an example of how NOT to write code and how NOT to accomplish a split. Same goes for any example in this thread that uses a recursive CTE to do the split. 😉

    I hope no one actually copies the code to use as a split function but I will say it was very interesting in how many "extra" things were added to the code in an attempt at obfuscation. Heh... maybe it should be titled "job security methods". :hehe:

    The urge to submit a post was burning me for a couple of days, but I decided to wait, because I was afraid that I will not be able to find the way to type what I think about the script in QotD without sounding rude, so I am glad that Jeff, Hugo and vk-kirov stood up to it. I agree that this is one of the finest examples how NOT TO DO something. There are some valid reasons to use recursive CTE, for example to write BOM queries in 2005 (HierarchyID is not yet available), but these reasons do not extend to creating a tally table, specifically with @text declared as varchar(500) but the recursion level restricted to 32767 (???).

    with tally (number) as

    (

    select

    top (len(@text))

    row_number() over (order by [object_id]) number

    from sys.objects

    )

    should do the trick right?

    Using xml to split delimited values is relatively cheap and not difficult to understand (Does not include the job security though :hehe:). Here is one example based on simply replacing the delimiter with closing and opening the node of arbitrary name and appending opening node to the head and closing to the tail:

    use AdventureWorks;

    go

    declare @delimiter char(1);

    declare @text nvarchar(500);

    declare @xml xml;

    -- set variable values

    select

    @delimiter = '.',

    @text = 'This t-sql will split these sentences into rows.' +

    'How many rows will be returned?.' + 'M.a.y.b.e..n.o.n.e.?',

    @xml = '<r>' + replace(@text, @delimiter, '</r><r>') + '</r>';

    -- here is the "split" in all its glory

    select

    item.value('text()[1]', 'varchar(100)') single_item

    from @xml.nodes('//r') R(item);

    The above happily returns desired output:

    single_item

    -----------------------------------------------

    This t-sql will split these sentences into rows

    How many rows will be returned?

    M

    a

    y

    b

    e

    NULL

    n

    o

    n

    e

    ?

    Oleg

  • Oleg Netchaev (7/6/2010)


    Jeff Moden (7/2/2010)


    BWAA-HAA!!!! First, this is NOT a good example of Tally Table code. It doesn't use a Tally Table and it doesn't use anything that could be called efficient. Anyone who uses a recursive CTE to generate Tally numbers just doesn't know what hidden RBAR is. 😛

    I agree with what someone else has stated... this code should be used as an example of how NOT to write code and how NOT to accomplish a split. Same goes for any example in this thread that uses a recursive CTE to do the split. 😉

    I hope no one actually copies the code to use as a split function but I will say it was very interesting in how many "extra" things were added to the code in an attempt at obfuscation. Heh... maybe it should be titled "job security methods". :hehe:

    The urge to submit a post was burning me for a couple of days, but I decided to wait, because I was afraid that I will not be able to find the way to type what I think about the script in QotD without sounding rude, so I am glad that Jeff, Hugo and vk-kirov stood up to it. I agree that this is one of the finest examples how NOT TO DO something. There are some valid reasons to use recursive CTE, for example to write BOM queries in 2005 (HierarchyID is not yet available), but these reasons do not extend to creating a tally table, specifically with @text declared as varchar(500) but the recursion level restricted to 32767 (???).

    with tally (number) as

    (

    select

    top (len(@text))

    row_number() over (order by [object_id]) number

    from sys.objects

    )

    should do the trick right?

    Using xml to split delimited values is relatively cheap and not difficult to understand (Does not include the job security though :hehe:). Here is one example based on simply replacing the delimiter with closing and opening the node of arbitrary name and appending opening node to the head and closing to the tail:

    use AdventureWorks;

    go

    declare @delimiter char(1);

    declare @text nvarchar(500);

    declare @xml xml;

    -- set variable values

    select

    @delimiter = '.',

    @text = 'This t-sql will split these sentences into rows.' +

    'How many rows will be returned?.' + 'M.a.y.b.e..n.o.n.e.?',

    @xml = '<r>' + replace(@text, @delimiter, '</r><r>') + '</r>';

    -- here is the "split" in all its glory

    select

    item.value('text()[1]', 'varchar(100)') single_item

    from @xml.nodes('//r') R(item);

    The above happily returns desired output:

    single_item

    -----------------------------------------------

    This t-sql will split these sentences into rows

    How many rows will be returned?

    M

    a

    y

    b

    e

    NULL

    n

    o

    n

    e

    ?

    Oleg

    Hi Oleg,

    There are, indeed, a number of ways to create a Tally table including the one you used here and a similar one on another post of yours. But you do have to be careful. sys.Objects can contain very few objects. On a brand new database in 2005, it will only contain about 47 objects and even squaring that number will only return 2209 rows. Instead, my recommendation is to refer to Master.sys.All_Columns which will have at least 4000 rows in it on a full installation.

    So far as splits go, using one form of Tally Table or another to split smaller items (like INTs) is usually faster than trying to split such items with XML (I'm working on the tests to show you that on the other thread). Splitting larger items such as sentences is generally done faster (depending on the size) with the XML split. In either case, it's generally preferable to use an iTVF (inline Table Valued Function) over a stored procedure to add the utility of being able to pass a whole column to be split and not just a single variable.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    Thank you very much for your suggestions. You make a very good point about the iTVF, it is definitely more useful than the proc. In my defense about the tally, I can say that my decision about a good-enough-for-qotd-scope tally script is based on the following assumption:

    Someone playing with the script either:

    "Got a job" 🙂 and thus has the development copy of the prod database with the sufficient number of records in sys.objects

    "Don't got a job" :w00t: and thus has AdventureWorks database with the sufficient number of records in sys.objects (about 1800 records, 3.24 mln if cross joined).

    iTVF works very well with cross apply, with which I cannot play at work as we are still at compat 80 2005, so I have to constantly restate my cross apply statements as joins (when I can) when moving the test scripts between the databases at work and AdventureWorks at home.

    Oleg

  • thanks for the question

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Well, I got it wrong - after looking briefly at the code and deciding there was now way I was going to try to make sense of such an ill-formated mess, feeling flabbergasted at the misuse of a recursive CTE to generate a tally table, and concluding that the where clause indicated that all I had to do was count the dots and add 1 unless it was a trck question and the code was going to return an error instead of any rows (and there was no way I was going to syntax-check anything with that layout, so assume no error) I counted the dots wrong (by leaving out the firstline - I had scrolled it off th top of my window and forgot about it).

    I hate code with unneeded complexity, especially when it's layed out so as to obscure its structure. I hate code that burns CPU cycles as if they were going out of fashion. It's a good question in that it does demonstrate that the select list is irrelevant when counting rows. It's a bad question in that it demonstrates a way that code should never be written.

    Tom

  • jcrawf02 (7/2/2010)


    Not sure why this one is tripping folks up, if you just count the delimiters, you can see it will return 13 pieces?

    ...anyway...good question, I'm sure it's introduced some folks to this idea that haven't seen it before.

    That's the point I don't get--when counting the delimiters of the @text variable, I'll only come up with 12.

    What am I missing here?

    Thanks

  • michael.kaufmann (7/13/2010)


    jcrawf02 (7/2/2010)


    Not sure why this one is tripping folks up, if you just count the delimiters, you can see it will return 13 pieces?

    ...anyway...good question, I'm sure it's introduced some folks to this idea that haven't seen it before.

    That's the point I don't get--when counting the delimiters of the @text variable, I'll only come up with 12.

    What am I missing here?

    Just found out--was only looking at the left of the delimiter... 😉

  • michael.kaufmann (7/13/2010)


    jcrawf02 (7/2/2010)


    Not sure why this one is tripping folks up, if you just count the delimiters, you can see it will return 13 pieces?

    ...anyway...good question, I'm sure it's introduced some folks to this idea that haven't seen it before.

    That's the point I don't get--when counting the delimiters of the @text variable, I'll only come up with 12.

    What am I missing here?

    Thanks

    You have to add 1: there's a piece before each delimiter and another piece after the last delimiter.

    Tom

  • michael.kaufmann (7/13/2010)


    jcrawf02 (7/2/2010)


    Not sure why this one is tripping folks up, if you just count the delimiters, you can see it will return 13 pieces?

    ...anyway...good question, I'm sure it's introduced some folks to this idea that haven't seen it before.

    That's the point I don't get--when counting the delimiters of the @text variable, I'll only come up with 12.

    What am I missing here?

    Thanks

    Both the text before the first delimiter and the text after the last delimiter will be a row, as each bit of text between two delimiters. That results in #rows = #delimiters + 1.

    You can see this in the WHERE clause:

    WHERE (...) = @StringDelimiter -- Twelve rows, for the twelve delimiters.

    OR Number - 1 = LEN(@Text) -- One row, for the end of the text.

    If the last character is a delimiter (that is, if the text had ended in a dot instead of a question mark), the number of rows would be equal to the number of delimiters, with the "last" row matching both of the OR'ed conditions.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (7/13/2010)


    Both the text before the first delimiter and the text after the last delimiter will be a row, as each bit of text between two delimiters. That results in #rows = #delimiters + 1.

    You can see this in the WHERE clause:

    WHERE (...) = @StringDelimiter -- Twelve rows, for the twelve delimiters.

    OR Number - 1 = LEN(@Text) -- One row, for the end of the text.

    If the last character is a delimiter (that is, if the text had ended in a dot instead of a question mark), the number of rows would be equal to the number of delimiters, with the "last" row matching both of the OR'ed conditions.

    Hugo,

    thank you very much for your detailed explanation.

    In my mind, a sentence always ends with a dot; hence I was only looking at the left side of the delimiter. Just realized after my initial post that the last piece of text ended with a question mark; so the right hand side needs to be considered as well.

    Thanks again,

    Michael

  • I selected None as i got lost in the logic. also i thought 0 & none would be same isn't it?

Viewing 12 posts - 31 through 41 (of 41 total)

You must be logged in to reply to this topic. Login to reply