I need some help with a test, please.

  • The problem with the Tally Table related splits is that I forgot :blush: to add 1 to the length....

    WHERE N BETWEEN 1 AND LEN(@Parameter)+1

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (9/22/2010)


    Heh... Of course it doesn't handle Unicode correctly... the variables are all VARCHAR.

    That's a bit circular! It doesn't handle Unicode because it doesn't? 😛

    The parameters are indeed VARCHAR, my point is that there is no need to exclude Unicode data from your split function. My worry is that people may forget that it is not Unicode-compatible, and pass such a string to it. The function would appear to work correctly (no errors or warnings) but any un-mappable characters would be silently replaced by question marks. Scope for improvement?

  • Paul White NZ (9/22/2010)


    Jeff Moden (9/22/2010)


    Heh... Of course it doesn't handle Unicode correctly... the variables are all VARCHAR.

    That's a bit circular! It doesn't handle Unicode because it doesn't? 😛

    The parameters are indeed VARCHAR, my point is that there is no need to exclude Unicode data from your split function. My worry is that people may forget that it is not Unicode-compatible, and pass such a string to it. The function would appear to work correctly (no errors or warnings) but any un-mappable characters would be silently replaced by question marks. Scope for improvement?

    Heh... I thought you were just being obvious. 😛 Rough day yesterday.

    I agree. I've repaired the +1 problem (it didn't make a noticeable difference in performance). I'll flip the code to NVARCHAR tonight and test it. Without getting into the world of MAX variables, I need to change the test a bit so it doesn't exceed 4k. Of course, the next thing I need to do after that is some MAX testing and we already know that MAX and Tally splits aren't famous for getting along.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (9/22/2010)


    The problem with the Tally Table related splits is that I forgot :blush: to add 1 to the length....

    WHERE N BETWEEN 1 AND LEN(@Parameter)+1

    Indeed. Even with that correction, however, the LEN prevents correct results when using a space as the delimiter:

    ALTER FUNCTION dbo.Split8KTally

    (@Parameter VARCHAR(8000), @Delimiter VARCHAR(1))

    RETURNS TABLE

    WITH SCHEMABINDING AS

    RETURN

    SELECT ROW_NUMBER() OVER (ORDER BY N) AS ItemNumber,

    SUBSTRING(@Parameter,N,CHARINDEX(@Delimiter,@Parameter+@Delimiter,N)-N) AS ItemValue

    FROM dbo.Tally

    WHERE N BETWEEN 1 AND LEN(@Parameter)+1

    AND SUBSTRING(@Delimiter+@Parameter,N,1) = @Delimiter; --Notice how we find the comma

    GO

    SELECT ItemNumber, ItemValue FROM dbo.Split8KTally('A,B,', ','); -- 3 rows (correct)

    SELECT ItemNumber, ItemValue FROM dbo.Split8KTally('A B ', ' '); -- 2 rows (oops!)

    Sorry. 🙁

  • Jeff Moden (9/22/2010)


    The problem with the Tally Table related splits is that I forgot :blush: to add 1 to the length....

    WHERE N BETWEEN 1 AND LEN(@Parameter)+1

    Shouldn't that be

    WHERE N < DATALENGTH(@Parameter) + 2

  • steve-893342 (9/22/2010)


    Jeff Moden (9/22/2010)


    The problem with the Tally Table related splits is that I forgot :blush: to add 1 to the length....

    WHERE N BETWEEN 1 AND LEN(@Parameter)+1

    Shouldn't that be

    WHERE N < DATALENGTH(@Parameter) + 2

    Unless @Parameter is NVARCHAR.

    Could do LEN(@Parameter+'X')-1

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Paul White NZ (9/22/2010)


    This seems a good time to highlight some other issues too:

    . . .

    The XML solution needs to be modified (as shown on Brad's blog) to handle entitization correctly:

    DECLARE @CSV NVARCHAR(MAX) = N'A,B&C,,C,?,',

    @Delim NVARCHAR(255) = N',';

    SELECT S.ItemNumber,

    S.ItemValue

    FROM dbo.Split8KXMLBrad(@CSV, @Delim) S;

    Other character values are invalid in XML and would cause the method to fail completely. The XML solution also emit NULLs instead of empty strings.

    You're right about the NULL part. That's easily taken care of with an ISNULL call within the function.

    As for the "problem" characters like ampersand, less-than, greater-than, etc, the function doesn't handle that because it was originallly designed to receive a list of INTEGER values. Still, to avoid a runtime error, it should handle those special characters.

    I wrote about how to do this in a blog post last year:

    http://bradsruminations.blogspot.com/2009/10/un-making-list-or-shredding-of-evidence.html

    Here is the revised function... unfortunately adding the logic to handle the encoding/decoding adds more overhead and therefore more CPU, but that's the penalty you pay for doing "tricks" like this by using XML:

    create function dbo.ufn_SplitNCharArrayXML

    (

    @List nvarchar(max)

    ,@Delimiter nchar(1)

    )

    returns table

    as

    return

    select Item=isnull(Item,N'') --Handle NULL

    from (select XMLEncoded=(select [*]=@List for xml path(''))) F0 --Handle Encodings

    cross apply (select XMLString='<x>'+replace(XMLEncoded,@Delimiter,'</x><x>')+'</x>') F1

    cross apply (select XMLList=cast(XMLString as xml).query('.')) F2

    cross apply XMLList.nodes('/x') F3(XMLNode)

    cross apply (select Item=XMLNode.value('(./text())[1]','nvarchar(4000)')) F4

    select * from dbo.ufn_SplitNCharArrayXML(N'A,B&C,,C,?,',',')

    /*

    Item

    ----

    A

    B&C

    C

    ?

    (6 rows affected)

    */

    --Brad

  • Paul White NZ (9/22/2010)


    I must admit I'm a little confused about the conclusions being drawn overall.

    At the risk of stating the obvious, using a multi-statement TVF is exactly equivalent to splitting the input value into a table variable, and then running a separate join query.

    In any case, the net effect is to produce a better plan by counteracting certain costing model limitations by introducing another - namely that table variables are estimated as producing one row (unless OPTION(RECOMPILE) is used).

    There's nothing wrong with that of course - but it does mean that one has to choose between Tally implementations depending on how large the input set is.

    EXACTLY... There doesn't seem to be a one-stop-shopping function that does it all.

  • Brad Schulz (9/22/2010)


    You're right about the NULL part. That's easily taken care of with an ISNULL call within the function.

    Certainly.

    As for the "problem" characters like ampersand, less-than, greater-than, etc, the function doesn't handle that because it was originallly designed to receive a list of INTEGER values. Still, to avoid a runtime error, it should handle those special characters. I wrote about how to do this in a blog post last year: http://bradsruminations.blogspot.com/2009/10/un-making-list-or-shredding-of-evidence.html

    I know, I referred to your blog in my previous post. Maybe I should have been more explicit about that.

    Here is the revised function... unfortunately adding the logic to handle the encoding/decoding adds more overhead and therefore more CPU, but that's the penalty you pay for doing "tricks" like this by using XML:

    Yep, and that was my main reason for mentioning it. To make your method generally applicable, we would need to use the slower method - quite how much slower it is I don't know. I don't think too many people split strings containing invalid XML characters (those that cannot be entitized-around) but I guess the possibility exists.

    Paul

  • Brad Schulz (9/22/2010)


    EXACTLY... There doesn't seem to be a one-stop-shopping function that does it all.

    For my money, a production-quality SQLCLR solution is closest, since it produces the best/very good performance for all the cases we have considered, and handles very large strings and Unicode as well.

    Unusual cases might require a query hint or other minimal manual tuning, but then that's true of most queries. Is that fair?

  • Paul White NZ (9/22/2010)


    Brad Schulz (9/22/2010)


    EXACTLY... There doesn't seem to be a one-stop-shopping function that does it all.

    For my money, a production-quality SQLCLR solution is closest, since it produces the best/very good performance for all the cases we have considered, and handles very large strings and Unicode as well.

    Unusual cases might require a query hint or other minimal manual tuning, but then that's true of most queries. Is that fair?

    Yes, that's fair. I concede that the CLR approach seems to be the best one-stop-shopping approach... it does perform... the 100MB memory grant thing bugs me though.

    If CLR is not available, and if any lists you split are going to be less than 100 items (which I think is the case 99.9% of the time), then I would think that the RBAR approach (or any kind of multi-line function approach) is really the best (in SQL2008 that is). 😀

  • Brad Schulz (9/22/2010)


    Yes, that's fair. I concede that the CLR approach seems to be the best one-stop-shopping approach... it does perform... the 100MB memory grant thing bugs me though.

    Me too, but there are robust ways to avoid that. I hope that future releases will have better support for costing TVFs - both CLR and T-SQL.

    If CLR is not available, and if any lists you split are going to be less than 100 items (which I think is the case 99.9% of the time), then I would think that the RBAR approach (or any kind of multi-line function approach) is really the best (in SQL2008 that is). 😀

    You are probably right. The lack of scalability is a concern, though.

    I think I will stop violently agreeing with you now. 😎

  • Paul White NZ (9/22/2010)


    You are probably right. The lack of scalability is a concern, though.

    I think I will stop violently agreeing with you now. 😎

    :laugh: :laugh: :laugh: :laugh: :laugh:

  • Mark-101232 (9/22/2010)


    steve-893342 (9/22/2010)


    Jeff Moden (9/22/2010)


    The problem with the Tally Table related splits is that I forgot :blush: to add 1 to the length....

    WHERE N BETWEEN 1 AND LEN(@Parameter)+1

    Shouldn't that be

    WHERE N < DATALENGTH(@Parameter) + 2

    Unless @Parameter is NVARCHAR.

    Could do LEN(@Parameter+'X')-1

    Could you write it like this to accommodate the NVARCHAR?

    WHERE N < LEN(@Parameter + 'XX')

    or perhaps

    WHERE N < DATALENGTH(@Parameter)/2 + 2

  • steve-893342 (9/22/2010)


    Jeff Moden (9/22/2010)


    The problem with the Tally Table related splits is that I forgot :blush: to add 1 to the length....

    WHERE N BETWEEN 1 AND LEN(@Parameter)+1

    Shouldn't that be

    WHERE N < DATALENGTH(@Parameter) + 2

    You can if you want but the parts in the query that use t.N don't ever include both the leading comma and the trailing comma so it's not necessary.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 166 through 180 (of 214 total)

You must be logged in to reply to this topic. Login to reply