Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12

Concat columns in Where clause Expand / Collapse
Author
Message
Posted Tuesday, November 12, 2013 1:24 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 2:48 AM
Points: 6,841, Visits: 13,359
@Eric:
From my point of view, every approach will lead to the same execution plan including the table/index scan (including the MERGE approach).

A rather dirty approach would be a separate column storing the Id of the lookup table and have a filtered index for empty columns. However, this would not only violate any normalization rule, it would also add the additional effort to cascade any change of the lookup table (e.g. DELETE or UPDATE).
But this could be managed by foreign key references and cascade of update/delete (CASCADE for update and SET NULL for delete).

It depends on the given scenario whether I would go down the "denormalized path" or not. But it's an option to consider. At least from my point of view.




Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question
How to post performance related questions
Links for Tally Table , Cross Tabs and Dynamic Cross Tabs , Delimited Split Function
Post #1513618
Posted Wednesday, November 13, 2013 7:36 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Yesterday @ 12:14 PM
Points: 1,706, Visits: 4,848
LutzM (11/12/2013)
@Eric:
From my point of view, every approach will lead to the same execution plan including the table/index scan (including the MERGE approach).

A rather dirty approach would be a separate column storing the Id of the lookup table and have a filtered index for empty columns. However, this would not only violate any normalization rule, it would also add the additional effort to cascade any change of the lookup table (e.g. DELETE or UPDATE).
But this could be managed by foreign key references and cascade of update/delete (CASCADE for update and SET NULL for delete).

It depends on the given scenario whether I would go down the "denormalized path" or not. But it's an option to consider. At least from my point of view.

LEFT JOIN vs WHERE NOT IN () vs EXCEPT can potentially yield different execution plans. You totally don't know until you unit test each variation.
Post #1513892
Posted Wednesday, November 13, 2013 7:57 AM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 3:50 AM
Points: 40,180, Visits: 36,585
Jonathan Marshall (11/12/2013)
The queries are updating lookup tables that do not have a value by using a subquery to the lookup table.
I like that idea of using a computed column as these will be scanning through lots of data.

WHERE [column1] + [column2] NOT IN
(SELECT [column3] + [column4] from dbo.table order by [column1] + [column2])


WHERE NOT EXISTS (SELECT 1 FROM dbo.table t WHERE t.[column3] = [column1] and t.[column4] = [column2])



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1513904
Posted Wednesday, November 13, 2013 10:13 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 2:48 AM
Points: 6,841, Visits: 13,359
GilaMonster (11/13/2013)
Jonathan Marshall (11/12/2013)
The queries are updating lookup tables that do not have a value by using a subquery to the lookup table.
I like that idea of using a computed column as these will be scanning through lots of data.

WHERE [column1] + [column2] NOT IN
(SELECT [column3] + [column4] from dbo.table order by [column1] + [column2])


WHERE NOT EXISTS (SELECT 1 FROM dbo.table t WHERE t.[column3] = [column1] and t.[column4] = [column2])


By guessing there's a reason for performing the concatenation before the comparison I don't think this solution would provide the same result as the original query.
Example: [column1] ='ab', [column2] ='cd', [column3] ='a' and [column4] ='bcd'
The original query would exclude this row since [column1] + [column2] = [column3] + [column4]) . Your query would consider the comparison not being equal.

The question is: What business logic is intended?




Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question
How to post performance related questions
Links for Tally Table , Cross Tabs and Dynamic Cross Tabs , Delimited Split Function
Post #1513975
Posted Wednesday, November 13, 2013 10:22 AM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 3:50 AM
Points: 40,180, Visits: 36,585
LutzM (11/13/2013)
GilaMonster (11/13/2013)
Jonathan Marshall (11/12/2013)
The queries are updating lookup tables that do not have a value by using a subquery to the lookup table.
I like that idea of using a computed column as these will be scanning through lots of data.

WHERE [column1] + [column2] NOT IN
(SELECT [column3] + [column4] from dbo.table order by [column1] + [column2])


WHERE NOT EXISTS (SELECT 1 FROM dbo.table t WHERE t.[column3] = [column1] and t.[column4] = [column2])


By guessing there's a reason for performing the concatenation before the comparison I don't think this solution would provide the same result as the original query.


I wouldn't be so sure. I've seen 'concatenate and IN' very often, done that way because IN is what the developer knows and concatenate is the only way he can figure out to do an IN on two columns.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1513981
Posted Wednesday, November 13, 2013 11:04 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 2:48 AM
Points: 6,841, Visits: 13,359
I'm not saying either of the solutions is right or wrong. Both are just different in terms of the final result. They're just not equivalent.

Whether the current approach works just because of plain luck (or the current data distribution) or if it's really intended needs to get clarified.

But for the taks itself (taking aside the concatenation) your preferred solution is NOT EXISTS?

Is this "usually" the fastest solution for a large table against a small lookup table or is this your preferred method to start with?




Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question
How to post performance related questions
Links for Tally Table , Cross Tabs and Dynamic Cross Tabs , Delimited Split Function
Post #1513994
Posted Wednesday, November 13, 2013 11:42 AM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 3:50 AM
Points: 40,180, Visits: 36,585
LutzM (11/13/2013)
But for the taks itself (taking aside the concatenation) your preferred solution is NOT EXISTS?

Is this "usually" the fastest solution for a large table against a small lookup table or is this your preferred method to start with?


On non-nullable columns, NOT IN and NOT EXISTS are usually equivalent. When the columns are nullable, NOT IN performs terribly and can produce different results. EXISTS makes it easier to compare multiple columns without making the predicate non-SARGable
I also find EXISTS easier to read, but that's a personal thing.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1514010
« Prev Topic | Next Topic »

Add to briefcase ««12

Permissions Expand / Collapse