Splitting Strings Based on Patterns

  • Dwain Camps

    SSC Guru

    Points: 86873

    Comments posted to this topic are about the item Splitting Strings Based on Patterns


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • apsalescoordinator

    Valued Member

    Points: 66

    dwain.c (11/28/2012)


    Comments posted to this topic are about the item <A HREF="/articles/String+Manipulation/94365/">Splitting Strings Based on Patterns</A>

  • dplaut 49149

    Mr or Mrs. 500

    Points: 531

    Is there a way to download these examples, copying text out of those scrolling boxes is a pain and carries along formatting code.

  • Mike DiRenzo

    SSCrazy

    Points: 2013

    WOW. What a great read!

  • peter-757102

    SSCertifiable

    Points: 6877

    I haven’t read the article properly and just scanned it to see what it is about and if it is interesting.

    But one thing I noticed quickly was a quote at end saying:

    SQL Server Execution Times: CPU time = 249 ms, elapsed time = 247 ms.

    Which fired all alarm bells in my head at once as one shoud not use set statistics time on for any serious benchmarking, ever! Measuring performance this way can have a really big impact on the actual execution times themselfs. This has to do with how SQL Server internally processes various constructs and its points of measurement.

    The most reliable way of measuring is to have long sufficiently long runs and start with a GetDate() and end with a GetDate() and compute the difference after the test has run. You will require a variable to store the start time for this, but it won’t affect the results as much.

    Other then that, the article looks as quite a bit of decdication and work has gone in, and I will sure read it in full when I have the time. In the meantime, please recheck the conclusions based on this different method of measuring, you might be surprised!

  • MarbryHardin

    Old Hand

    Points: 333

    It does seem like that would be one instance where a CLR udf would do very well, especially to support more complex examples. That’s precisely the type of thing regular expressions are good at.

    Although, string parsing like that in SQL typically smells like bad design or something that should have been done outside SQL in the first place.

  • Peter H

    SSC Enthusiast

    Points: 127

    My version of SQL (2005) does not seem to handle the “VALUES” in

    SELECT TOP(DATALENGTH(@List))

    n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n),

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n),

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n),

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n)

    Could anyone help me expand on this please?

  • fregatepllada

    SSCommitted

    Points: 1648

    Why we have to be inventive and create such acrobatic solution in a first place? Just to compensate for bad design and implementation decisions.

    If original DB developers did not use it as a data dump, there is no need for such acrobatics. Good article to refresh my memory.

    GRUMPY OLD MEN

  • Jeff Moden

    SSC Guru

    Points: 993410

    Irozenberg 1347 (11/29/2012)


    Why we have to be inventive and create such acrobatic solution in a first place? Just to compensate for bad design and implementation decisions.

    In a word, YES! 😉

    If original DB developers did not use it as a data dump, there is no need for such acrobatics. Good article to refresh my memory.

    I make a good part of my living by fixing such data dumps. Thank goodness there’s a need for such acrobatics! 😀

    “Fellow GRUMPY OLD MAN”.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems

  • Jeff Moden

    SSC Guru

    Points: 993410

    Peter H (11/29/2012)


    My version of SQL (2005) does not seem to handle the "VALUES" in

    SELECT TOP(DATALENGTH(@List))

    n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n),

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n),

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n),

    (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n)

    Could anyone help me expand on this please?

    That form of the VALUES clause came out in 2008. The following can be used as a replacement that will work at approximately the same speed…

    –===== “Inline” CTE Driven “Tally Table” produces values from 0 up to

    — 10,000… enough to cover VARCHAR(8000)

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), –10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), –10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b) –10E+4 or 10,000 rows max

    –===== Generate the numbers from 1 to the length of @List

    SELECT TOP (DATALENGTH(@List))

    n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM E4

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems

  • Jeff Moden

    SSC Guru

    Points: 993410

    peter-757102 (11/29/2012)


    I haven't read the article properly and just scanned it to see what it is about and if it is interesting.

    But one thing I noticed quickly was a quote at end saying:

    SQL Server Execution Times: CPU time = 249 ms, elapsed time = 247 ms.

    Which fired all alarm bells in my head at once as one shoud not use set statistics time on for any serious benchmarking, ever! Measuring performance this way can have a really big impact on the actual execution times themselfs. This has to do with how SQL Server internally processes various constructs and its points of measurement.

    The most reliable way of measuring is to have long sufficiently long runs and start with a GetDate() and end with a GetDate() and compute the difference after the test has run. You will require a variable to store the start time for this, but it won't affect the results as much.

    Other then that, the article looks as quite a bit of decdication and work has gone in, and I will sure read it in full when I have the time. In the meantime, please recheck the conclusions based on this different method of measuring, you might be surprised!

    Although strongly I agree with what you said (proof is here… http://www.sqlservercentral.com/articles/T-SQL/91724/), don’t forget that CPU time can and frequently does exceed DURATION if parallelism is involved.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems

  • Jeff Moden

    SSC Guru

    Points: 993410

    Nicely done, Dwain! I’ve not done a deep dive on it all but it’s a great read that you really did put your heart into. Due to a NEC problem (Not Enough Coffee :-D), I couldn’t figure out where you were going with all of this until I saw the first result set (3rd string) and the lights suddenly went on. Well done!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems

  • Dwain Camps

    SSC Guru

    Points: 86873

    dplaut 49149 (11/29/2012)


    Is there a way to download these examples, copying text out of those scrolling boxes is a pain and carries along formatting code.

    Sorry but in this article I didn’t post downloadable examples except for the final test harness for DelimitedSplit8K. I’ll make an attempt to do so in a couple of days, as I am traveling out of the country over the weekend on business and I’ll have limited time.

    I’ll try to remember to PM you once that is done.

    Edit: I have uploaded the scripts to here: http://www.sqlservercentral.com/Forums/Topic1390297-3122-4.aspx


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Dwain Camps

    SSC Guru

    Points: 86873

    Mike DiRenzo (11/29/2012)


    WOW. What a great read!

    Thanks Mike! Glad you stopped by.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Dwain Camps

    SSC Guru

    Points: 86873

    MarbryHardin (11/29/2012)


    It does seem like that would be one instance where a CLR udf would do very well, especially to support more complex examples. That's precisely the type of thing regular expressions are good at.

    Although, string parsing like that in SQL typically smells like bad design or something that should have been done outside SQL in the first place.

    To your first point, yes and no. Certainly there are regular expression CLR parsers out there that you could adapt. So if you substitute the regular expression for the PATINDEX argument, this could work well. However if you need to use the SQL PATINDEX for some reason, I suspect creating that in a CLR would be a major headache (waiting for someone to prove me wrong).

    To you second point, in general I agree but often times we forget that we’re not the only designers out there, so unfortunately there are a few bad ones. 🙂 But more importantly, there are times where incoming data to our SQL Server will take the much more rudimentary format of strings that must be parsed to derive the incoming information. Any application that takes feeds from a device (e.g., GPS unit) will typically receive messages that need to be parsed in some format.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 15 posts - 1 through 15 (of 53 total)

You must be logged in to reply to this topic. Login to reply