Improving Performance for Some LIKE “%string%” Searches (SQL Spackle)

  • Comments posted to this topic are about the item Improving Performance for Some LIKE “%string%” Searches (SQL Spackle)


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • Thank you for your most interesting (and concise) post. When I created a filtered index (AString IS NOT NULL) I got results very similar to the CTE (Logical Reads = 3271). This might be a simpler solution for some. However, for whatever reason my SQL Server never included the parallel plan, so it's not exactly a like-for-like comparison.

  • Hi,

    thanks for the post; just a few comments.

    You do not need to write a CTE to get an index seek in your case (where you have a lot of sparse data). It is enough to put the IS NOT NULL condition directly in the WHERE clause.

    SELECT AString
    FROM dbo.StringsTest
    WHERE AString IS NOT NULL AND AString LIKE '%221%';

    However, when there is a lot of sparse data, a filtered index will be better, as it will not need to store all the NULLs and the overall index size will be lower.

    CREATE INDEX s_ix1 ON dbo.StringsTest (AString) WHERE AString IS NOT NULL;

    You will again have an index scan instead of an index seek, but the amount of data read will be the same as in the case of the filtered index scan.

    Anyway, in the case of really frequent searches for words inside comments or other longer strings, you can gain performance by utilizing full-text indexes.
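    As a sketch of that full-text route (the catalog and key-index names here are hypothetical, and note the semantic difference: CONTAINS matches whole words or prefixes, not arbitrary substrings the way LIKE '%221%' does):

    ```sql
    -- Full-text indexing requires a unique, single-column, non-nullable key index
    -- on the table (assumed here to be the primary key index PK_StringsTest).
    CREATE FULLTEXT CATALOG ftc_strings;              -- hypothetical catalog name
    CREATE FULLTEXT INDEX ON dbo.StringsTest (AString)
        KEY INDEX PK_StringsTest                      -- assumed PK index name
        ON ftc_strings;
    GO
    -- Prefix search: finds words starting with "221". Unlike LIKE '%221%',
    -- it will not match "221" in the middle of a longer token, but it avoids
    -- scanning every row.
    SELECT AString
    FROM dbo.StringsTest
    WHERE CONTAINS(AString, '"221*"');
    ```

    Whether the word-boundary semantics are acceptable depends entirely on what the application is actually searching for.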

  • Just noticed, you have the NOT NULL directly in the WHERE clause in your blog post.

  • Pavel Pawlowski (4/6/2014)


    Hi,

    thanks for the post; just a few comments.

    You do not need to write a CTE to get an index seek in your case (where you have a lot of sparse data). It is enough to put the IS NOT NULL condition directly in the WHERE clause.

    SELECT AString
    FROM dbo.StringsTest
    WHERE AString IS NOT NULL AND AString LIKE '%221%';

    However, when there is a lot of sparse data, a filtered index will be better, as it will not need to store all the NULLs and the overall index size will be lower.

    CREATE INDEX s_ix1 ON dbo.StringsTest (AString) WHERE AString IS NOT NULL;

    You will again have an index scan instead of an index seek, but the amount of data read will be the same as in the case of the filtered index scan.

    Anyway, in the case of really frequent searches for words inside comments or other longer strings, you can gain performance by utilizing full-text indexes.

    Hi Pavel,

    I probably should have mentioned that there were two reasons why I left the CTE version in this article:

    1) I thought it set off the filter a little better (clarity)

    2) It seemed to be slightly faster (although I can't tell why) than the version that I showed in the blog.



  • Dear Dwain

    1. You write

    "Because we know that SQL Server ignores NULL values when it constructs an INDEX. "

    If I interpret your statement the way it was intended, then to my knowledge it is true with ORACLE, but not with SQL Server:

    create table test(i int);
    create unique index test_i_uq on test(i);
    insert into test(i) select null;
    insert into test(i) select null;

    -->

    Msg 2601, Level 14, State 3, Line 1
    Cannot insert duplicate key row in object 'test' with unique index 'test_i_uq'.
    The statement has been terminated.

    (0 row(s) affected)

    2. If you know what you do not want to see, why not exclude it with a filtered index:

    create index ... where stringvar <> ''

    and then query (do not forget to include the same filter)

    select ... from mytable .. where stringvar <> '' and stringvar like '%xyz%'

    This is essentially the same as your solution, without the overhead of building rows for the empty search strings.
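    A concrete version of the suggestion above, using the article's table (the index name is hypothetical; note the query must repeat the exact filter predicate so the optimizer can match it to the filtered index):

    ```sql
    -- Filtered index excluding the empty strings we never want to see.
    CREATE INDEX ix_StringsTest_AString_filtered    -- hypothetical name
        ON dbo.StringsTest (AString)
        WHERE AString <> '';

    -- The same predicate must appear in the query for the index to qualify.
    SELECT AString
    FROM dbo.StringsTest
    WHERE AString <> ''                             -- matches the index filter
      AND AString LIKE '%221%';
    ```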

  • Nice write up Dwain.

    As with anything like this, whether this technique benefits you or not will depend on so many factors.

    On my laptop, for instance, I see no difference between the query forms (SQL 2012) you have included.

    In my case, the fastest result (elapsed time) comes from this form:

    select Astring
    from StringsTest
    where patindex('%221%', AString) > 0;

    ... as it does not prevent parallelism.

    That does, however, come with a CPU cost that is nearly double that of your CTE / straight SELECT versions, but it completes almost three times as fast.

    *It depends* is the phrase, as usual.

    MM




  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • Nice one Dwain! It seems that your refactor has woken up a slice of the optimizer to the fact that it doesn't actually have to "rip through" NULL entries to return the answer. This really SHOULD BE included in the optimizer for all such cases. Obviously that hasn't percolated up the hit-list for Mr. Cunningham's group on the SQL Server Optimization Team. 🙂

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thanks for the great article!

  • ...and on my SQL 2014 test machine, these are all identical:

    WITH Strings AS
    (
        SELECT ID, AString
        FROM dbo.StringsTest
        WHERE AString IS NOT NULL
    )
    SELECT AString
    FROM Strings
    WHERE AString LIKE '%221%';

    SELECT AString
    FROM StringsTest
    WHERE AString > '' AND AString LIKE '%221%';

    SELECT AString
    FROM StringsTest
    WHERE AString IS NOT NULL AND AString LIKE '%221%';

    And all perform better than the winner on my laptop using patindex...it depends...

    MM




  • mister.magoo (4/7/2014)


    Nice write up Dwain.

    As with anything like this, whether this technique benefits you or not will depend on so many factors.

    On my laptop, for instance, I see no difference between the query forms (SQL 2012) you have included.

    In my case, the fastest result (elapsed time) comes from this form:

    select Astring
    from StringsTest
    where patindex('%221%', AString) > 0;

    ... as it does not prevent parallelism.

    That does, however, come with a CPU cost that is nearly double that of your CTE / straight SELECT versions, but it completes almost three times as fast.

    *It depends* is the phrase, as usual.

    Hi there MM. I ran my tests on SQL 2008 R2, so I can't vouch for what this does or doesn't do on SQL 2012. It is interesting if the performance characteristic has changed. I've found that to be the case on at least one other occasion.



  • h.tobisch (4/7/2014)


    Dear Dwain

    1. You write

    "Because we know that SQL Server ignores NULL values when it constructs an INDEX. "

    If I interpret your statement the way it was intended, then to my knowledge it is true with ORACLE, but not with SQL Server:

    create table test(i int);
    create unique index test_i_uq on test(i);
    insert into test(i) select null;
    insert into test(i) select null;

    -->

    Msg 2601, Level 14, State 3, Line 1
    Cannot insert duplicate key row in object 'test' with unique index 'test_i_uq'.
    The statement has been terminated.

    (0 row(s) affected)

    2. If you know what you do not want to see, why not exclude it with a filtered index:

    create index ... where stringvar <> ''

    and then query (do not forget to include the same filter)

    select ... from mytable .. where stringvar <> '' and stringvar like '%xyz%'

    This is essentially the same as your solution, without the overhead of building rows for the empty search strings.

    In truth, I haven't gotten around to running a test with a filtered index. While I intend to before I do anything about the Prod system this is intended for, my feeling is kind of like "why go to the bother of a filtered index if you can do as well without one."



  • From my perspective, the filtered index means you do not need to change your source code to get a performance increase. i.e. if you have a dozen bits of badly performing code, you can fix all of them with one filtered index without needing to change/test/deploy a dozen pieces of source code.

  • nick.mcdermaid (4/7/2014)


    From my perspective, the filtered index means you do not need to change your source code to get a performance increase. i.e. if you have a dozen bits of badly performing code, you can fix all of them with one filtered index without needing to change/test/deploy a dozen pieces of source code.

    In my experience with numerous clients, your statements are soOO not true, Nick. The SET option requirements can break many a database app out there. Also, parameterized queries often cannot use the filtered index because the value(s) of any parameter(s) are not known at compile time. Filtered indexes are a GREAT idea in THEORY, but the actual implementation leaves a lot to be desired.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service
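    For reference, these are the SET options Kevin alludes to: SQL Server's documented requirements for creating a filtered index, and also for any connection performing INSERT/UPDATE/DELETE against the filtered table. A legacy app connecting with, say, ANSI_NULLS OFF will hit errors once the index exists:

    ```sql
    -- Required settings for creating filtered indexes, and for DML
    -- against tables that have them:
    SET ANSI_NULLS ON;
    SET ANSI_PADDING ON;
    SET ANSI_WARNINGS ON;
    SET ARITHABORT ON;
    SET CONCAT_NULL_YIELDS_NULL ON;
    SET QUOTED_IDENTIFIER ON;
    SET NUMERIC_ROUNDABORT OFF;
    ```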

  • dwain.c (4/7/2014)


    In truth, I haven't gotten around to running a test with a filtered index. While I intend to before I do anything about the Prod system this is intended for, my feeling is kind of like "why go to the bother of a filtered index if you can do as well without one."

    Why go to the bother of a filtered index? Because of performance and index maintenance. If you have a table with 1 billion rows and an index on a column where 90% of the values are NULL, the index will be large, and if you search for NULLs and the index does not cover the query, it will not be used. So there is no sense in storing the NULLs there.

    If you create a filtered index, it will cover only those 10% and will not have to store pointers to the other 90% of rows with NULL in that particular column. You will save space and processing power.
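    One way to verify the space argument above is to compare page counts per index via sys.dm_db_partition_stats (table and index names follow the article; the filtered index itself is hypothetical until you create it):

    ```sql
    -- Compare size and row counts of each index on the table:
    SELECT i.name            AS index_name,
           ps.used_page_count,
           ps.row_count
    FROM sys.dm_db_partition_stats AS ps
    JOIN sys.indexes               AS i
      ON i.object_id = ps.object_id
     AND i.index_id  = ps.index_id
    WHERE ps.object_id = OBJECT_ID('dbo.StringsTest');
    ```

    With 90% NULLs, the filtered index's row_count and used_page_count should come out at roughly a tenth of an unfiltered equivalent.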

  • Viewing 15 posts - 1 through 15 (of 38 total)
