Splitting Strings Based on Patterns

  • Irozenberg 1347 (11/29/2012)


    Why we have to be inventive and create such acrobatic solution in a first place? Just to compensate for bad design and implementation decisions.

    If original DB developers did not use it as a data dump, there is no need for such acrobatics. Good article to refresh my memory.

    GRUMPY OLD MEN

    See my previous response to Mabry. The need is not always present as a result of bad design.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Jeff Moden (11/29/2012)


    Peter H (11/29/2012)


    My version of SQL (2005) does not seem to handle the "VALUES" in

    SELECT TOP(DATALENGTH(@List))

    n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n),

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n),

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n),

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n)

    Could anyone help me expand on this please?

    That form of the VALUES clause came out in 2008. The following can be used as a replacement that will work at approximately the same speed...

    --===== "Inline" CTE Driven "Tally Table" produces values from 0 up to

    -- 10,000... enough to cover VARCHAR(8000)

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b) --10E+4 or 10,000 rows max

    --===== Generate the numbers from 1 to the length of @List

    SELECT TOP (DATALENGTH(@List))

    n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM E4

    ;

    Thanks for the cover Jeff.

    The QU splitter also contains an Itzik Ben-Gan style tally table that Peter could use (after fixing the formatting issues):

    WITH Nbrs_3(n) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1)

    ,Nbrs_2 (n) AS (SELECT 1 FROM Nbrs_3 n1 CROSS JOIN Nbrs_3 n2)

    ,Nbrs_1 (n) AS (SELECT 1 FROM Nbrs_2 n1 CROSS JOIN Nbrs_2 n2)

    ,Nbrs_0 (n) AS (SELECT 1 FROM Nbrs_1 n1 CROSS JOIN Nbrs_1 n2)

    ,Tally (n) AS (SELECT ROW_NUMBER() OVER (ORDER BY n) As n FROM Nbrs_0)

    SELECT *

    FROM Tally


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Jeff Moden (11/29/2012)


    peter-757102 (11/29/2012)


    I haven't read the article properly and just scanned it to see what it is about and if it is interesting.

    But one thing I noticed quickly was a quote at end saying:

    SQL Server Execution Times: CPU time = 249 ms, elapsed time = 247 ms.

    Which fired all alarm bells in my head at once as one shoud not use set statistics time on for any serious benchmarking, ever! Measuring performance this way can have a really big impact on the actual execution times themselfs. This has to do with how SQL Server internally processes various constructs and its points of measurement.

    The most reliable way of measuring is to have long sufficiently long runs and start with a GetDate() and end with a GetDate() and compute the difference after the test has run. You will require a variable to store the start time for this, but it won't affect the results as much.

    Other then that, the article looks as quite a bit of decdication and work has gone in, and I will sure read it in full when I have the time. In the meantime, please recheck the conclusions based on this different method of measuring, you might be surprised!

    Although strongly I agree with what you said (proof is here... http://www.sqlservercentral.com/articles/T-SQL/91724/), don't forget that CPU time can and frequently does exceed DURATION if parallelism is involved.

    Someone else came along recently and said the same thing because he read the article that Jeff quoted (a fine one by him as always of course). From that article I did not draw the conclusion that the findings with respect to elapsed time presented themselves always, only with respect to timings of UDFs.

    Now, maybe I'm mistaken about that, but I'd sure like to see the evidence that Peter's claim is based on.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Jeff Moden (11/29/2012)


    Nicely done, Dwain! I've not done a deep dive on it all but it's a great read that you really did put your heart into. Due to a NEC problem (Not Enough Coffee :-D), I couldn't figure out where you were going with all of this until I saw the first result set (3rd string) and the lights suddenly went on. Well done!

    I was really looking forward to this article coming out because I think it's a fairly undiscovered approach to similar problems that have come up time and time again on this forum.

    I was a bit concerned that people might not initially see the utility. But if the lights came on for you, hopefully they will for others. Glad you stopped by and I'm hoping to hear more from you once you do have a chance to do a deep dive. There's probably room for improvement, as I indicated towards the end!


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • And finally a reply to all that have replied on this discussion thread.

    My thanks to all of you for your comments and taking the time to read the article. And I apologize for the formatting issues in the SQL text boxes. I blame the article's editing software, which continues to elude my best efforts to post "pretty" copy. 😀

    Maybe by the time I have 29 articles like Jeff does I'll get the hang of it.

    And let's not forget the credit to ChrisM@Work for his Ferrari! I named the function after him (PatternSplitCM) so that folks will never forget his contribution.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Thanks for the fix, guys.

    Used it like this (in case anyone else wants it)

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b) --10E+4 or 10,000 rows max

    --===== Generate the numbers from 1 to the length of @List

    , Numbers as (

    SELECT TOP (DATALENGTH(@List))

    n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM E4

    )

  • Great, great article Dwain. Amazing work and perhaps your best article yet on SSC IMHO (it's certainly your longest).

    I would have been delighted to propose a looping solution as the first choice solution to this problem because it clearly performs better than the other set-based alternatives that I could come up with. There I said it, but it still feels weird.

    ...

    If only this article gets people thinking about better approaches to this sort of problem, I feel that I have performed my task adequately.

    Though I have been writing queries for quite some time, I was absolutely clueless about what set-based SQL programming was about until this year when I began participating in SSC forumns, reading the articles, etc (there, I said it, but it still feels true.) Your articles and forumn posts have been intstumental in helping me completely re-learn SQL programming.

    I had to right a loop today for solving a string problem; I'm not proud of it. The difference is the loop was the absolute last choice... the weapon of last resort when all attempts to develop a set-based solution had failed.

    Again, keep up the great work!

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (11/29/2012)


    Great, great article Dwain. Amazing work and perhaps your best article yet on SSC IMHO (it's certainly your longest).

    I would have been delighted to propose a looping solution as the first choice solution to this problem because it clearly performs better than the other set-based alternatives that I could come up with. There I said it, but it still feels weird.

    ...

    If only this article gets people thinking about better approaches to this sort of problem, I feel that I have performed my task adequately.

    Though I have been writing queries for quite some time, I was absolutely clueless about what set-based SQL programming was about until this year when I began participating in SSC forumns, reading the articles, etc (there, I said it, but it still feels true.) Your articles and forumn posts have been intstumental in helping me completely re-learn SQL programming.

    I had to right a loop today for solving a string problem; I'm not proud of it. The difference is the loop was the absolute last choice... the weapon of last resort when all attempts to develop a set-based solution had failed.

    Again, keep up the great work!

    Wow Alan, if I didn't know better you'd make me almost think I've developed a following. :blush:

    While I can't possibly take credit for pushing the concept of set-based programming (others like Jeff, Chris, Joe Celko, etc. deserve much more), I do like to emphasize the importance of thinking that way first and then when all else fails consider other approaches. And as I said, while I am loathe to admit it at times, loops sometimes do perform better. But that seems to be a serious minority of the cases. So I'll stay on my soap box at least until something better comes along.

    Thanks for the high praise! While I feel humbled and undeserving, I'll graciously accept it. 😀


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • gloing38 (11/29/2012)


    very good website:

    (http://www.globalmarket.name)

    The website wholesale for many kinds of fashion shoes, like the nike, jordan, prada, also including the shirts, bags, hat and the decorations.

    WE ACCEPT CREDIT CARD /WESTERN UNION PAYMENT

    YOU MUST NOT MISS IT!!!:-)

    SPAM reported.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • IMHO this task would be better implemented in CLR UDF rather than in pure T-SQL.

    CLR UDF easily allows to apply REGULAR EXPRESSIONS.

    Just a thought.

  • fregatepallada (11/29/2012)


    IMHO this task would be better implemented in CLR UDF rather than in pure T-SQL.

    CLR UDF easily allows to apply REGULAR EXPRESSIONS.

    Just a thought.

    So how about an example?

    I'd love to put it through its paces and compare the performance against the pure SQL version.

    Thanks for stopping by, reading the article and contributing to the discussion!


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (11/29/2012)


    fregatepallada (11/29/2012)


    IMHO this task would be better implemented in CLR UDF rather than in pure T-SQL. CLR UDF easily allows to apply REGULAR EXPRESSIONS.

    So how about an example?

    I'd love to put it through its paces and compare the performance against the pure SQL version.

    Hi Dwain,

    SSC author Solomon Rutzky[/url] has the free SQL# library available at http://www.sqlsharp.com. Among the many features available in the free version running under the SAFE permission set is the RegEx_Replace function. Taking the example from the original thread and the start of this article:

    Given an input string such as: 1234ABC123456XYZ1234567890ADS, I would like to replace any string of digits that is longer than 5 characters with some other character (e.g., ‘x’), while leaving the remaining characters in the string unchanged.

    The whole solution is:

    SELECT

    SQL#.RegEx_Replace(N'1234ABC123456XYZ1234567890ADS', N'\d{5,}', N'x', -1, 1, NULL);

    Producing the result:

    1234ABCxXYZxADS

    There are all sorts of powerful and useful functions in this library, for example, if you want to list the matches for a regular expression (which is a much more powerful superset of the CHARINDEX and PATINDEX syntax), you can use:

    SELECT * FROM SQL#.RegEx_Matches(N'1234ABC123456XYZ1234567890ADS', N'\d{5,}', 1, NULL);

    +----------------------------------------------------+

    ¦ MatchNum ¦ Value ¦ StartPos ¦ EndPos ¦ Length ¦

    ¦----------+------------+----------+--------+--------¦

    ¦ 1 ¦ 123456 ¦ 8 ¦ 13 ¦ 6 ¦

    ¦ 2 ¦ 1234567890 ¦ 17 ¦ 26 ¦ 10 ¦

    +----------------------------------------------------+

  • Paul - Thanks for stopping by and making the CLR suggestion. Haven't seen you posting much of late so seeing your name pop up surprised me.

    I'm a bit CLR-challenged (among other things :-P) but I'm going to try to take a look at this after I return next week from Papua New Guinea.

    For those that need a pure SQL solution because their shop doesn't allow CLRs, I'm hoping what I provided (again many thanks to ChrisM@Work) will be suitable at least as a stop gap.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (11/29/2012)


    Paul - Thanks for stopping by and making the CLR suggestion. Haven't seen you posting much of late so seeing your name pop up surprised me.

    I don't answer SSC forum questions very often (you can find me on http://dba.stackexchange.com/) but I do monitor question of the day and the articles feed from time to time.

    I'm a bit CLR-challenged (among other things :-P) but I'm going to try to take a look at this after I return next week from Papua New Guinea.

    There's no CLR understanding needed. Solomon's library is a simple T-SQL script to install, so all you need do is set a couple of options at the top of the script and hit F5 to install. If you can write T-SQL, you can use his library's functions.

    For those that need a pure SQL solution because their shop doesn't allow CLRs, I'm hoping what I provided (again many thanks to ChrisM@Work) will be suitable at least as a stop gap.

    Right, but you did ask for a CLR implementation so here it is 🙂

    If you're interested, you might like to read Solomon's article on Simple Talk that explores some of the performance characteristics.

  • SQL Kiwi (11/29/2012)


    dwain.c (11/29/2012)


    Paul - Thanks for stopping by and making the CLR suggestion. Haven't seen you posting much of late so seeing your name pop up surprised me.

    I don't answer SSC forum questions very often (you can find me on http://dba.stackexchange.com/) but I do monitor question of the day and the articles feed from time to time.

    I'm a bit CLR-challenged (among other things :-P) but I'm going to try to take a look at this after I return next week from Papua New Guinea.

    There's no CLR understanding needed. Solomon's library is a simple T-SQL script to install, so all you need do is set a couple of options at the top of the script and hit F5 to install. If you can write T-SQL, you can use his library's functions.

    For those that need a pure SQL solution because their shop doesn't allow CLRs, I'm hoping what I provided (again many thanks to ChrisM@Work) will be suitable at least as a stop gap.

    Right, but you did ask for a CLR implementation so here it is 🙂

    If you're interested, you might like to read Solomon's article on Simple Talk that explores some of the performance characteristics.

    Well I must say I'm impressed that the download and install was pretty darn smooth!

    Looks like I've got a bit of reading and exploring to do now. Looks like a heck of a library. Gonna keep me away from writing articles for a spell I guess. But then, people are probably tired of me prattling on anyway.

    Thanks again Paul.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 15 posts - 16 through 30 (of 52 total)

You must be logged in to reply to this topic. Login to reply