Concat columns in Where clause

  • Hello everyone,

    I'm searching to see how performance works on columns concat in the where clause.

    What are the search / performance ramifications?

    Any suggestions?

    WHERE

    [column1] + [column2]

  • Jonathan Marshall (11/12/2013)


    Hello everyone,

    I'm searching to see how performance works on columns concat in the where clause.

    What are the search / performance ramifications?

    Any suggestions?

    WHERE

    [column1] + [column2]

    That would become a nonSARGable predicate. The value will have to be calculated for every single row in the result set. To make things even worse you will likely end up wrapping each column with an ISNULL because if either column is NULL the concatenation will be null. Perhaps if you can provide more details we can help come up with a solution that will work and be fast.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Very interesting I did not think of that at all.

    The query was passed on to me to call from a package.

    I will have to ask the creator more questions but it seems as they are trying to create unique values by not only using the concat in the WHERE clause but also in the SELECT statement.

    The statements are used to update and insert into look-up tables.

  • Your question is incomplete since ther WHERE clause is missing the "= something" part. But that's just as a side note...

    If my limited knowledge is correct, you'll end up with an index or table scan since there's a function applied to both columns (either string concatenation or to sum both values).

    Here's a simple example:

    WHERE [column1] + [column2] ='abc'

    In order to figure out which rows would qualify SQL Server would need to compare all values of column1 that would either be empty (but not NULL), 'a', 'ab', or 'abc' together with each value of column2 either being empty (but not NULL), 'a', 'ab', or 'abc' and finally eliminate all rows not matching [column1] + [column2] ='abc'. It's even more complicated if we're talking about numeric values...

    I guess, SQL Server would simply perform a table or index scan.

    My preferred solution would be a computed, persisted, indexed column added to the table (if the number of queries and /or executions using such a where clause is significant). It depends 😉

    PS: I'm sure Gail (or any other one of the gurus available) will provide a much more reliable answer...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • The queries are updating lookup tables that do not have a value by using a subquery to the lookup table.

    I like that idea of using a computed column as these will be scanning through lots of data.

    WHERE [column1] + [column2] NOT IN

    (SELECT [column3] + [column4] from dbo.table order by [column1] + [column2])

  • In addition to what others have said another ramification of what you seek (pun intended!!) to do will also prevent the optimizer from getting good estimates on rows to be hit. So you will also wind up with bad query plans to boot.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Jonathan Marshall (11/12/2013)


    The queries are updating lookup tables that do not have a value by using a subquery to the lookup table.

    I like that idea of using a computed column as these will be scanning through lots of data.

    WHERE [column1] + [column2] NOT IN

    (SELECT [column3] + [column4] from dbo.table order by [column1] + [column2])

    Step 1: remove the ORDER BY, it's useless.

    Step 2: add a computed, persisted, indexed column on your source table

    Step 3: add a computed, persisted, indexed column on your lookup table

    Step 4: final query:

    WHERE tableY.computedColumns1and2 NOT IN

    (SELECT table.computedColumns3and4 from dbo.table)

    Unfortunately, you'll usually still end up with an index scan on the large table since each and every value needs to be verified (except for the source table holding values of 1 .. 9 and the lookup table only holds 1 .. 2 (as a basic example), then there might be a chance of an index seek...

    My approach would be an index as narrow as possible with just the computed column being a key column and the other values required being used as included columns.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Wow that is not good as the size of data will get quite large.

    Thank you for the feedback. I will have to continue to research the options as far as a computed columns etc....

    I will update with chosen path or changes.

    Thanks again....

    Jonathan

  • Jonathan Marshall (11/12/2013)


    The queries are updating lookup tables that do not have a value by using a subquery to the lookup table.

    I like that idea of using a computed column as these will be scanning through lots of data.

    WHERE [column1] + [column2] NOT IN

    (SELECT [column3] + [column4] from dbo.table order by [column1] + [column2])

    Now I understand why you're wanting concatenate columns in WHERE clause.

    First, I don't understand why you have an ORDER BY clause in that sub-select. I'm surprised that would even parse.

    When I have a similar situation to select where two or more columns in one set don't intersect with two or more columns in another set, I typically do an outer join. You can experiment to see which performs better.

    select ...

    from TableA

    left join TableB on column1 = column3 and column2 = column4

    where column3 is null;

    Also, if basically all you need is to return column1 and column2 from one set where not contained in column3 and column4 of another set, then you can use the EXCEPT clause. It's essentially the reverse of UNION clause, because you're taking 2nd set away from 1st set rather than unionizing them.

    SELECT column1, column2 from TableA

    EXCEPT

    SELECT column3, column4 from TableB;

    If these sets will be large, then you'll probably benefit from indexing (column1, column2) and (column3, column4).

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Jonathan Marshall (11/12/2013)


    The queries are updating lookup tables that do not have a value by using a subquery to the lookup table.

    I like that idea of using a computed column as these will be scanning through lots of data.

    WHERE [column1] + [column2] NOT IN

    (SELECT [column3] + [column4] from dbo.table order by [column1] + [column2])

    If updating one table from another, then consider MERGE statement.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • @eric:

    From my point of view, every approach will lead to the same execution plan including the table/index scan (including the MERGE approach).

    A rather dirty approach would be a separate column storing the Id of the lookup table and have a filtered index for empty columns. However, this would not only violate any normalization rule, it would also add the additional effort to cascade any change of the lookup table (e.g. DELETE or UPDATE).

    But this could be managed by foreign key references and cascade of update/delete (CASCADE for update and SET NULL for delete).

    It depends on the given scenario whether I would go down the "denormalized path" or not. But it's an option to consider. At least from my point of view.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • LutzM (11/12/2013)


    @Eric:

    From my point of view, every approach will lead to the same execution plan including the table/index scan (including the MERGE approach).

    A rather dirty approach would be a separate column storing the Id of the lookup table and have a filtered index for empty columns. However, this would not only violate any normalization rule, it would also add the additional effort to cascade any change of the lookup table (e.g. DELETE or UPDATE).

    But this could be managed by foreign key references and cascade of update/delete (CASCADE for update and SET NULL for delete).

    It depends on the given scenario whether I would go down the "denormalized path" or not. But it's an option to consider. At least from my point of view.

    LEFT JOIN vs WHERE NOT IN () vs EXCEPT can potentially yield different execution plans. You totally don't know until you unit test each variation.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Jonathan Marshall (11/12/2013)


    The queries are updating lookup tables that do not have a value by using a subquery to the lookup table.

    I like that idea of using a computed column as these will be scanning through lots of data.

    WHERE [column1] + [column2] NOT IN

    (SELECT [column3] + [column4] from dbo.table order by [column1] + [column2])

    WHERE NOT EXISTS (SELECT 1 FROM dbo.table t WHERE t.[column3] = [column1] and t.[column4] = [column2])

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (11/13/2013)


    Jonathan Marshall (11/12/2013)


    The queries are updating lookup tables that do not have a value by using a subquery to the lookup table.

    I like that idea of using a computed column as these will be scanning through lots of data.

    WHERE [column1] + [column2] NOT IN

    (SELECT [column3] + [column4] from dbo.table order by [column1] + [column2])

    WHERE NOT EXISTS (SELECT 1 FROM dbo.table t WHERE t.[column3] = [column1] and t.[column4] = [column2])

    By guessing there's a reason for performing the concatenation before the comparison I don't think this solution would provide the same result as the original query.

    Example: [column1] ='ab', [column2] ='cd', [column3] ='a' and [column4] ='bcd'

    The original query would exclude this row since [column1] + [column2] = [column3] + [column4]) . Your query would consider the comparison not being equal.

    The question is: What business logic is intended?



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • LutzM (11/13/2013)


    GilaMonster (11/13/2013)


    Jonathan Marshall (11/12/2013)


    The queries are updating lookup tables that do not have a value by using a subquery to the lookup table.

    I like that idea of using a computed column as these will be scanning through lots of data.

    WHERE [column1] + [column2] NOT IN

    (SELECT [column3] + [column4] from dbo.table order by [column1] + [column2])

    WHERE NOT EXISTS (SELECT 1 FROM dbo.table t WHERE t.[column3] = [column1] and t.[column4] = [column2])

    By guessing there's a reason for performing the concatenation before the comparison I don't think this solution would provide the same result as the original query.

    I wouldn't be so sure. I've seen 'concatenate and IN' very often, done that way because IN is what the developer knows and concatenate is the only way he can figure out to do an IN on two columns.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply