Finding a String Anywhere in Another String

  • Comments posted to this topic are about the item Finding a String Anywhere in Another String

  • Thank you for taking the time to write the article. I found out about this option last year when Aaron Bertrand wrote about it. If someone wants to read more about it, here's his article.
    https://sqlperformance.com/2017/02/sql-indexes/seek-leading-wildcard-sql-server

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Hi Janeta,

    Thank you for taking the time to write this article. I agree this is a very good technique for doing these types of searches.
    There are a few points of constructive criticism I'd like to offer...
    #1) You didn't supply any sample/test data. Without that, your reader isn't able to follow along and replicate your results.
    #2) You didn't provide the code necessary to generate the "StringSplit" values. Yes it's easy enough to do but you shouldn't assume that your reader knows how to do this in an efficient manner.
    #3) If you get rid of the "StringID" and just use the original string, you'll then have the ability to join back to the original table. You'll also be able to eliminate the duplicate issue with a query like the fillowing...

    DECLARE @_search_value VARCHAR(50) = '40BB-ABF3';

    SELECT
        *
    FROM
        dbo.TableToBeSearched ttbs
    WHERE
        EXISTS (SELECT 1 FROM dbo.StringSplit ss WHERE ttbs.StringOfText = ss.StringOfText AND ss.StringSplit LIKE @_search_value + '%');

  • @luis, I was unaware of Aaron's post and I could not find anything similar online, so I decided to describe what we implemented at work. I hope that this article will further popularise the technique.

    @jason, thank you very much for the suggestions. This is my first article, and I will try to be more thorough in the future. I hope that my attempt to illustrate the idea will still be helpful to the SQL  Server Central readers in its current format.

  • Janeta Parakosova - Tuesday, July 31, 2018 11:03 AM

    @luis, I was unaware of Aaron's post and I could not find anything similar online, so I decided to describe what we implemented at work. I hope that this article will further popularise the technique.

    @jason, thank you very much for the suggestions. This is my first article, and I will try to be more thorough in the future. I hope that my attempt to illustrate the idea will still be helpful to the SQL  Server Central readers in its current format.

    No worries. I'm glad to see a new person creating articles. Hopefully I didn't say anything that would dissuade you from writing more in the future.
    As far as how well it illustrates to idea... I'm probably not a good person to ask. Like Luis, I was already familiar with the technique (and Aaron's previous article) so I immediately recognized what  you were doing and why you were doing it.

  • Janeta Parakosova - Tuesday, July 31, 2018 11:03 AM

    @luis, I was unaware of Aaron's post and I could not find anything similar online, so I decided to describe what we implemented at work. I hope that this article will further popularise the technique.

    @jason, thank you very much for the suggestions. This is my first article, and I will try to be more thorough in the future. I hope that my attempt to illustrate the idea will still be helpful to the SQL  Server Central readers in its current format.

    Folks are just making suggestions.  I've found that the "discussions" that follow an article help everyone including the author of the article (and that includes me!) .

    And don't feel bad about someone pointing out an article that predates yours.  Even Aaron Bertrand missed at least one article that predates his... one that has even more information about such a thing...
    http://www.sqlservercentral.com/articles/Tally+Table/142316/   (Hat's off to  Alan Burstein for his great article)

    And, hat's off to you, Janeta!  Anyone that takes the time to write such a nice clear article with examples is OK in my book and I'm pretty sure that I speak for many of the others in this discussion.  It's a part of the reason why they're taking the time to do so.

    Welcome to SSC!!!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi, thanks for the post! Have you considered trying parallellism such as partition the table or similar to speed things up further?

    Br Håkan B

  • hakan_l_borg - Wednesday, August 1, 2018 2:53 AM

    Hi, thanks for the post! Have you considered trying parallellism such as partition the table or similar to speed things up further?

    Br HÃ¥kan B

    Parallelism is usually just used when needed and it should be correctly configured. Partitioning the table is not a performance solution for queries.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Hi Luis, well the APS sure ships as a parallellism solution to query perf issues so not sure what you mean?

  • hakan_l_borg - Wednesday, August 1, 2018 9:58 AM

    Hi Luis, well the APS sure ships as a parallellism solution to query perf issues so not sure what you mean?

    APS (Analytics Platform System) is a different solution all by itself and requires a special implementation. Parallelism well used can be a great option for performance, but it can also ruin it if you think of it as a magic option that will take care by itself.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Thats not what I said either.

  • hakan_l_borg - Thursday, August 2, 2018 12:09 AM

    Thats not what I said either.

    Understood but a whole lot of people take it the way Luis stated.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Tuesday, July 31, 2018 12:09 PM

    Janeta Parakosova - Tuesday, July 31, 2018 11:03 AM

    @luis, I was unaware of Aaron's post and I could not find anything similar online, so I decided to describe what we implemented at work. I hope that this article will further popularise the technique.

    @jason, thank you very much for the suggestions. This is my first article, and I will try to be more thorough in the future. I hope that my attempt to illustrate the idea will still be helpful to the SQL  Server Central readers in its current format.

    Folks are just making suggestions.  I've found that the "discussions" that follow an article help everyone including the author of the article (and that includes me!) .

    And don't feel bad about someone pointing out an article that predates yours.  Even Aaron Bertrand missed at least one article that predates his... one that has even more information about such a thing...
    http://www.sqlservercentral.com/articles/Tally+Table/142316/   (Hat's off to  Alan Burstein for his great article)

    And, hat's off to you, Janeta!  Anyone that takes the time to write such a nice clear article with examples is OK in my book and I'm pretty sure that I speak for many of the others in this discussion.  It's a part of the reason why they're taking the time to do so.

    Welcome to SSC!!!

    Speaking of suggestions... (from the article)...

    We can populate the table with a sample dataset of 5M rows, with exactly 20 characters in the String column and a sequential StringId. We will use this to test the performance of different queries.

    I cannot duplicate your findings of taking 14 seconds for the first two COUNT queries even on my slowest box .  Can you provide some examples of what the STRING column contains because I've apparently not generated the String column to contain the same kind of "sequential StringId" that you have.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    I used GUIDs to generate random strings with LEFT(NEWID(), 20) for the article. I downgraded the SQL Azure instance to the S0 tier since I wrote the article and the query now takes around 20 seconds.

    The "sequential StringId" I referred to is the identity column.

  • Janeta Parakosova - Thursday, August 2, 2018 1:34 PM

    Hi Jeff,

    I used GUIDs to generate random strings with LEFT(NEWID(), 20) for the article. I downgraded the SQL Azure instance to the S0 tier since I wrote the article and the query now takes around 20 seconds.

    The "sequential StringId" I referred to is the identity column.

    Thanks, Janeta.  I'll be back.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply