LIKE with and without wildcards in WHERE clause

  • I have a set of pretty well tuned queries with indexes and all the appropriate trinkets. This is a 'last resort' query, for when the user is absolutely unable to find what he needs. Speed is not crucial here, since it will not be used all that much - if any regular pattern of searches develops from this, I'll build new, separate queries to handle them, but this one will remain as a fallback for when the user has been unable to find anything using more orthodox approaches.

    The WHERE clause may comprise any number of fields, with any or all of them containing wildcards. A sample (air code) may look like this:

    WHERE fldA = 'abc' AND fldB LIKE '%def' AND fldC LIKE 'ghi_jkl' AND fldD = 'mno' ...

    The string literals here are actually all optional parameters, passed to the stored procedure from the user application. To deal with potentially missing parameters, I use the following syntax in my WHERE clause:

    WHERE (@System = '' OR Oink.[System] LIKE @System)

    AND (@Stratigrafie = '' OR Oink.Stratigrafie LIKE @Stratigrafie)

    AND (@Ulozeni = '' OR Oink.Ulozeni LIKE @Ulozeni)

    AND (@DrEvid = '' OR Oink.DrEvid LIKE @DrEvid)

    This all works reasonably well, though obviously the performance is never going to be stellar. I recently started fiddling with scanning the parameters for wildcard characters, with the notion of building a conditional phrase that would use 'LIKE' when there is a wildcard and '=' when there is not. However, the syntax is quite messy and I haven't even gotten it to work yet. Then it occurred to me that I may be doing some completely unnecessary work, considering how smart the SQL engine is in general.

    So, here (finally) is my question: Is the query engine smart enough on its own to recognize the absence of wildcard characters and use a straight compare in such a case, even though I have coded a 'LIKE' comparison? Or, if the engine is not that smart, is this worth bothering about anyway? It's almost certainly going to be a full table scan in any case, with multiple joined tables. Is the extra overhead of a LIKE comparison even relevant, considering how much other work this query has to do?
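    The wildcard-detection idea described above can be sketched roughly like this (an illustration only, using one of the parameter names from this thread; PATINDEX returns 0 when neither '%' nor '_' appears in the string):

```sql
-- Sketch: choose between '=' and LIKE by testing a parameter for
-- wildcard characters. The pattern '[%_]' matches either '%' or '_'.
DECLARE @System varchar(50) = 'abc%';

IF PATINDEX('%[%_]%', @System) > 0
    PRINT 'wildcard present - use LIKE';
ELSE
    PRINT 'no wildcard - a plain = compare would do';
```

    One caveat: '_' is itself a LIKE wildcard, so a parameter containing a literal underscore would be misclassified by this test unless it is escaped.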

    If speed isn't an issue here, I doubt it'd be worth it. I imagine there are easier ways to get bigger improvements too.

    Best bet would be to test for each case and see!

    Are you using OPTION (RECOMPILE) in your select? I've seen it have pretty dramatic effects on this type of query.

    Cheers

  • I have two options for you.

    The first is to replace the empty string with a '%'; that way you get rid of the ORs. You need to do that before the select.

    Another option is to check Gail Shaw's article on Catch-All queries.

    Try what's best for your situation.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Gazareth (11/23/2012)


    If speed isn't an issue here, I doubt it'd be worth it. I imagine there are easier ways to get bigger improvements too.

    Best bet would be to test for each case and see!

    Are you using OPTION (RECOMPILE) in your select? I've seen it have pretty dramatic effects on this type of query.

    Cheers

    Well, speed is always somewhat of an issue, isn't it? 🙂 But it's not critical, since this is not intended to be used much. Testing numerous variations in this case is rather a pain, since I have around thirty parameters, any of which may or may not be present. I would have to try numerous field lengths, various placements of the searched-for text in the fields, and various combinations of parameters to have any real idea, all of which would take quite a bit of work. If someone already knows the answer, that would be a much better use of resources.

    I'm not using the recompile - I keep forgetting about it. I'll give it a try on some of the queries that I have working but are a little slow. Thanks.

  • Luis Cazares (11/23/2012)


    I have two options for you.

    The first is to replace the empty string with a '%'; that way you get rid of the ORs. You need to do that before the select.

    Another option is to check Gail Shaw's article on Catch-All queries.

    Try what's best for your situation.

    Gail's articles are always a good bet; I sometimes spend hours going through her material, and there are still things I miss. Thanks for the link.

    I'm confused about the '%' for empty string, though. How exactly would you code that to eliminate the OR? Gail has ORs in her article, and I don't see how putting a wildcard in for the empty '' would affect that.

  • The easiest way is to do it like this

    WHERE Oink.[System] LIKE @System + '%'

    AND Oink.Stratigrafie LIKE @Stratigrafie + '%'

    AND Oink.Ulozeni LIKE @Ulozeni + '%'

    AND Oink.DrEvid LIKE @DrEvid + '%'

    Or you can add it to the variables before the query if it's easier for you.
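    The "before the query" variant might look something like this (a sketch using the variable names from above; the ISNULL is my own assumption, in case any of the parameters can arrive as NULL):

```sql
-- Append the trailing wildcard once, up front, so the WHERE clause
-- can then use plain LIKE predicates with no ORs at all.
SET @System       = ISNULL(@System, '')       + '%';
SET @Stratigrafie = ISNULL(@Stratigrafie, '') + '%';
SET @Ulozeni      = ISNULL(@Ulozeni, '')      + '%';
SET @DrEvid       = ISNULL(@DrEvid, '')       + '%';
```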

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (11/23/2012)


    The easiest way is to do it like this

    WHERE Oink.[System] LIKE @System + '%'

    AND Oink.Stratigrafie LIKE @Stratigrafie + '%'

    AND Oink.Ulozeni LIKE @Ulozeni + '%'

    AND Oink.DrEvid LIKE @DrEvid + '%'

    Or you can add it to the variables before the query if it's easier for you.

    I see - it wasn't clear to me that you meant a concatenation. I thought that you simply suggested substituting a single percent for the empty string, and I couldn't see how that would change anything.

    But wouldn't this always compare all variables, though? In the syntax with OR, the query engine can see that a variable is empty, thereby satisfying one side of the OR condition, and need not even look in the database. If I use your construct, I think it would require examining every field, every time, to see if there is something in it. Also, it discards any record with nulls in any of the fields. I just tried a small test, and LIKE '%' does not pass a Null.
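    That NULL behaviour is easy to confirm with a one-line test:

```sql
-- NULL LIKE '%' evaluates to UNKNOWN, not TRUE, so a WHERE clause
-- built on LIKE '%' silently drops rows that are NULL in that column.
SELECT CASE WHEN NULL LIKE '%' THEN 'matches' ELSE 'does not match' END;
```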

  • But wouldn't this always compare all variables, though? In the syntax with OR, the query engine can see that a variable is empty, thereby satisfying one side of the OR condition, and need not even look in the database.

    I am not sure that this is true. This isn't .NET where code branches based on one side of an IF or SWITCH. From what I have seen, the query engine runs both sides and then compares.

    But that's just me.

  • DiverKas (11/26/2012)


    But wouldn't this always compare all variables, though? In the syntax with OR, the query engine can see that a variable is empty, thereby satisfying one side of the OR condition, and need not even look in the database.

    I am not sure that this is true. This isn't .NET where code branches based on one side of an IF or SWITCH. From what I have seen, the query engine runs both sides and then compares.

    But that's just me.

    I don't know either. In VBA, such a comparison does get all parameters evaluated before the comparison is performed. In the Access Jet engine, it does not - a SWITCH function, for instance, stops evaluating when the first true condition is reached, so you can put in computations that yield a Null or unknown value, if something else stops the eval first, which is not possible in VBA.

    But the SQL Server optimizer is so smart about avoiding unnecessary work, it seems to me that in a situation when an already known value of a local variable can -completely- obviate the need to examine table data, it would most certainly do so. Maybe examining a few query plans would tell the story, but I'm still not very good at reading those, despite many hours spent over tutorials on the subject.

  • What you mention about choosing the best execution plan is covered in Gail's article.

    And no, it won't give you better performance, as it will use a "safe plan".

    You might not believe this, so the best thing to do, instead of guessing, is to test.

    However, as you mention you need NULL values, my option would be a dynamic query. As with most things regarding SQL Server, the best solution depends on the situation. ;-)
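    A dynamic query for this pattern might be sketched as follows (an illustration only, using column names from earlier posts and assuming @System and @Stratigrafie are procedure parameters; sp_executesql keeps the query parameterised, and predicates for empty parameters are simply never added):

```sql
-- Build the WHERE clause from only the parameters the user supplied.
DECLARE @sql nvarchar(max) = N'SELECT * FROM Oink WHERE 1 = 1';

IF @System <> ''       SET @sql += N' AND Oink.[System] LIKE @System';
IF @Stratigrafie <> '' SET @sql += N' AND Oink.Stratigrafie LIKE @Stratigrafie';

EXEC sp_executesql @sql,
     N'@System varchar(50), @Stratigrafie varchar(50)',
     @System = @System, @Stratigrafie = @Stratigrafie;
```

    Each distinct combination of supplied parameters then gets its own plan, which is the point of the approach Gail describes.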

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • The T-SQL query optimizer doesn't really shortcut the way you're thinking.

    I took a relatively large table on one of my test servers, added an index to an nvarchar(75) column, and tried both query constructs. The Select statement only covered the clustered index key, so no lookups were needed.

    One query had a Where clause like this:

    WHERE MyColumn LIKE @Input1 + '%'

    AND MyColumn2 LIKE @Input2 + '%'

    The other had:

    WHERE (@Input1 = '' OR MyColumn LIKE @Input1 + '%')

    AND (@Input2 = '' OR MyColumn2 LIKE @Input2 + '%')

    The first version, without the OR statements, gets an Index Seek in the Actual Execution Plan, the second gets an Index Scan. Both use the index I created to test this.

    I added a column to the query that wasn't included in the index, and the first one got an Index Seek + a Key Lookup, while the second remained as a full Index Scan (of course).

    Not that it means very much, but the one with the seek and the lookup was estimated at 17% of the total work, while the scan was estimated at 83%.

    On the data volume I'm dealing with (a few thousand rows in this case), the actual compile+execution time for both was comparable, with the Index Seek being consistently slightly faster - like 15 milliseconds vs 16 milliseconds. With larger data volumes and more complex searches (like '%' + @Input + '%', to find strings inside a column instead of just on the leading edge), the speed advantage will be more pronounced.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • pdanes (11/26/2012)


    DiverKas (11/26/2012)


    But wouldn't this always compare all variables, though? In the syntax with OR, the query engine can see that a variable is empty, thereby satisfying one side of the OR condition, and need not even look in the database.

    I am not sure that this is true. This isn't .NET where code branches based on one side of an IF or SWITCH. From what I have seen, the query engine runs both sides and then compares.

    But that's just me.

    I don't know either. In VBA, such a comparison does get all parameters evaluated before the comparison is performed. In the Access Jet engine, it does not - a SWITCH function, for instance, stops evaluating when the first true condition is reached, so you can put in computations that yield a Null or unknown value, if something else stops the eval first, which is not possible in VBA.

    But the SQL Server optimizer is so smart about avoiding unnecessary work, it seems to me that in a situation when an already known value of a local variable can -completely- obviate the need to examine table data, it would most certainly do so. Maybe examining a few query plans would tell the story, but I'm still not very good at reading those, despite many hours spent over tutorials on the subject.

    SQL can indeed do short-circuiting, and sometimes it will. It might also test things in a different order than you've specified. For example, if you write "A = B OR C = D", SQL might test "C = D" first and then skip "A = B"; you really can't be sure which, if either, will happen.

    A CASE statement, however, is guaranteed to work in order. So you might try coding it like this:

    WHERE
        (1 = CASE
                 WHEN @input_variable = '' THEN 1
                 WHEN column_name LIKE @input_variable THEN 1
                 ELSE 0
             END)
    AND (1 = CASE ... END)
    AND ...

    No guarantees, but it might be worth trying.

    SQL DBA, SQL Server MVP (07, 08, 09)
    A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (11/26/2012)


    pdanes (11/26/2012)


    DiverKas (11/26/2012)


    But wouldn't this always compare all variables, though? In the syntax with OR, the query engine can see that a variable is empty, thereby satisfying one side of the OR condition, and need not even look in the database.

    I am not sure that this is true. This isn't .NET where code branches based on one side of an IF or SWITCH. From what I have seen, the query engine runs both sides and then compares.

    But that's just me.

    I don't know either. In VBA, such a comparison does get all parameters evaluated before the comparison is performed. In the Access Jet engine, it does not - a SWITCH function, for instance, stops evaluating when the first true condition is reached, so you can put in computations that yield a Null or unknown value, if something else stops the eval first, which is not possible in VBA.

    But the SQL Server optimizer is so smart about avoiding unnecessary work, it seems to me that in a situation when an already known value of a local variable can -completely- obviate the need to examine table data, it would most certainly do so. Maybe examining a few query plans would tell the story, but I'm still not very good at reading those, despite many hours spent over tutorials on the subject.

    SQL can indeed do short-circuiting, and sometimes it will. It might also test things in a different order than you've specified. For example, if you write "A = B OR C = D", SQL might test "C = D" first and then skip "A = B"; you really can't be sure which, if either, will happen.

    A CASE statement, however, is guaranteed to work in order. So you might try coding it like this:

    WHERE
        (1 = CASE
                 WHEN @input_variable = '' THEN 1
                 WHEN column_name LIKE @input_variable THEN 1
                 ELSE 0
             END)
    AND (1 = CASE ... END)
    AND ...

    No guarantees, but it might be worth trying.

    A CASE statement like that in the WHERE clause guarantees index/table scans.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Have you considered a full text index? I have had very good luck in some cases with issues such as yours. Check it out here and see if it would work for your case.

    http://msdn.microsoft.com/en-us/library/ms345119(v=sql.90).aspx

    It will not always fit your needs, but may be an option to explore.
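    For what it's worth, a full-text search predicate looks quite different from LIKE. A hypothetical example (it requires a full-text catalog and a full-text index to be set up on the table first; the table and column names here are just borrowed from earlier posts):

```sql
-- CONTAINS with a prefix term - roughly the full-text analogue of
-- LIKE 'jura%', but served from the full-text index.
SELECT * FROM Oink WHERE CONTAINS(Stratigrafie, N'"jura*"');
```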

    Steve Jimmo
    Sr DBA
    “If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan

  • sjimmo (11/27/2012)


    Have you considered a full text index? I have had very good luck in some cases with issues such as yours. Check it out here and see if it would work for your case.

    http://msdn.microsoft.com/en-us/library/ms345119(v=sql.90).aspx

    It will not always fit your needs, but may be an option to explore.

    I have never used one of those, but this query can address derived fields in up to five joined views. I'll take a look at it, but that doesn't seem like the best place to start with something I know nothing about. Thanks for the tip, though.

