Bug with IN?

Question

Bug with IN?

nycdotnet

Right there with Babe

Points: 799
More actions
May 11, 2015 at 10:45 am

#398438

Update: This is NOT a bug and is the correct behavior due to how SQL handles scoping of column names that are not explicitly qualified with a table alias.
Hello,
I've run into what seems like a bug in SQL Server.
If I use an IN clause with a SELECT statement where the inner query uses a field name that doesn't exist, but DOES exist in the outer query, the statement will succeed by returning "true" for all rows rather than returning an error.
Note that if a field name is used that exists in neither the inner query nor the outer query, an error is raised, and the statement is terminated with "Msg 207, Level 16, State 1 Invalid column name 'the column name'." as I would have expected.
The below script demonstrates the issue. When you run this script, all rows are deleted from #Test_A and SQL Server returns no error. Instead of succeeding (and deleting all rows!!) I would have expected the DELETE command to fail with 'Msg 207, Level 16, State 1 Invalid column name 'ID'.
Does anyone know if this has been reported to Connect previously? It is happening in SQL 2008 and SQL 2014. Is this the correct behavior for some reason? If it sounds like this is really a bug, I'll report it to Connect. Sorry if this has already been reported - I did look around and didn't find anything.
Thanks very much!
BEGIN TRAN
CREATE TABLE #TEST_A (
ID INT NOT NULL PRIMARY KEY,
VALUE VARCHAR(50) NOT NULL
)
CREATE TABLE #TEST_B (
OTHERID INT NOT NULL PRIMARY KEY,
VALUE VARCHAR(50) NOT NULL
)
INSERT INTO #TEST_A (ID, VALUE) VALUES (1,'Test One');
INSERT INTO #TEST_A (ID, VALUE) VALUES (2,'Test Two');
INSERT INTO #TEST_B (OtherID, VALUE) VALUES (1,'Some other test one');
-- Would expect this to throw an error because there is no field named ID in #TEST_B!!!
DELETE FROM #TEST_A WHERE ID IN (SELECT ID FROM #TEST_B)
-- This code will correctly return an error because YetAnotherID is in neither of #TEST_A or #TEST_B
-- DELETE FROM #TEST_A WHERE ID IN (SELECT YetAnotherID FROM #TEST_B)
SELECT * FROM #TEST_A
SELECT * FROM #TEST_B
/*
DROP TABLE #TEST_A
DROP TABLE #TEST_B
*/
ROLLBACK

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 1

It's actually "working as designed". The behavior has to do with sub-queries, in that the subquery has access to the data from both the tables named in the subquery as well as all columns visible in the outer query. This is how you can have correlated subqueries which return results based on the values in a given row of the outer query.

Unfortunately, the weakness you're looking at is not fully qualifying your column names, which essentially allows SQL to resolve it as it sees fit.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

nycdotnet Right there with Babe Points: 799 More actions · Answer 2

That's interesting. If I run this with execution plans turned on it does indeed do a clustered index scan with ID as the predicate from #TEST_A and do a clustered index scan with no predicate from #TEST_B and then do a left semi join. So I guess SQL Server is logically treating it as if I had written this...

DELETE FROM #TEST_A WHERE ID IN (SELECT ID FROM #TEST_B CROSS JOIN #TEST_A)

... which would be true for all ID values in #TEST_A.

Wow this feels like a bug wrapped in a feature. Thanks Matt.

Luis Cazares SSC Guru Points: 183698 More actions · Answer 3

nycdotnet (5/11/2015)
That's interesting. If I run this with execution plans turned on it does indeed do a clustered index scan with ID as the predicate from #TEST_A and do a clustered index scan with no predicate from #TEST_B and then do a left semi join. So I guess SQL Server is logically treating it as if I had written this...
DELETE FROM #TEST_A WHERE ID IN (SELECT ID FROM #TEST_B CROSS JOIN #TEST_A)
... which would be true for all ID values in #TEST_A.
Wow this feels like a bug wrapped in a feature. Thanks Matt.

Technically, it does not do that. You're adding an additional read to table #TEST_A.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 4

nycdotnet (5/11/2015)
Is this the correct behavior for some reason?

It is indeed correct (and documented) behaviour.

Binding order for columns is first to tables within the subquery, if no matching column is found in any table within the subquery, then tables in the outer query are checked. Only if there are no columns in any tables within or outside the subquery that match the column name is an error raised.

Without this, correlated subqueries (like EXISTS) would not work.

This is also why you should always, always, always qualify column names with their tables. If you don't, you're depending on a binding order that most developers don't understand and you're hoping that no changes are made to the tables in the future to cause ambiguous column errors.

DELETE FROM #TEST_A a WHERE a.ID IN (SELECT b.ID FROM #TEST_B b)

will fail with the expected error because there's no column called ID in the table aliased as b

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 5

nycdotnet (5/11/2015)
That's interesting. If I run this with execution plans turned on it does indeed do a clustered index scan with ID as the predicate from #TEST_A and do a clustered index scan with no predicate from #TEST_B and then do a left semi join. So I guess SQL Server is logically treating it as if I had written this...
DELETE FROM #TEST_A WHERE ID IN (SELECT ID FROM #TEST_B CROSS JOIN #TEST_A)

No it's not. It's logically treating it as if you wrote this (which is essentially what you did write)

DELETE FROM #TEST_A a WHERE a.ID IN (SELECT a.ID FROM #TEST_B b)

All tables and their columns in the outer query are visible inside the subquery, if they weren't, then this would not be allowed

DELETE FROM #TEST_A WHERE EXISTS (SELECT 1 FROM #TEST_B b WHERE a.ID = b.ID)

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

nycdotnet Right there with Babe Points: 799 More actions · Answer 6

CELKO (5/11/2015)
Perhaps you would like to actually learn SQL before you post?

Ouch! Nevermind.

Luis Cazares SSC Guru Points: 183698 More actions · Answer 7

nycdotnet (5/12/2015)
CELKO (5/11/2015)
Perhaps you would like to actually learn SQL before you post?
Ouch! Nevermind.

Or you could continue to post more questions to learn more about SQL, even if that means getting some unproductive rants from Mr. Celko. 😀

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

nycdotnet Right there with Babe Points: 799 More actions · Answer 8

nycdotnet

Right there with Babe

Points: 799

May 12, 2015 at 8:48 am

#1797297

I think I need an umbrella.

Thanks Luis.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 9

CELKO (5/11/2015)
>> If I use an IN clause [sic] with a SELECT statement where the inner query uses a field [sic] name that doesn't exist, but DOES exist in the outer query, the statement will succeed by returning TRUE for all rows rather than returning an error. <<
Perhaps you would like to actually learn SQL before you post?
IN() is a search condition, and nothing like clause!
A field is part of a temporal value {YEAR, MONTH, DAY, HOUR MINUTE SECOND}.
But your intimidate big conceptual error is not knowing that that SQL is a block structured language with scoping rules. Your mindset is still back in BASIC, FORTRAN or COBOL.
Like any of the Algol family languages, the search begins locally and goes thru the nested layers. This is a fundamental programming concpet.

Perhaps he IS trying to learn SQL and has found himself thrown into the deep end of the pool and is trying to swim. Not everyone has the opportunity to go to school to learn how to program in SQL or any other language.

I have written code in BASIC, FORTRAN, COBOL, and ALGOL. I also have been gainfully employed for almost 20 years working with T-SQL and MS SQL Server. Having worked in those languages has actually helped me when it comes to working with SQL. Not because I write SQL like it is one of those languages, but because I have learned to harness the power of the language at hand.

I am glad I never had to deal with you when first learning SQL as you would have been of no help at all, unlike many of the heavy hitters here on SSC. Why don't you learn how to be a real mentor to the people who are trying to learn instead of the site bully which is all you are at the moment.

ijaz SSCommitted Points: 1912 More actions · Answer 10

I totally agreed with Lynn Pettis. Criticizing is too easy and everyone can do this. But be a mentor and/or helpful to other is not everyone domain?

cheers

TomThomson SSC Guru Points: 104773 More actions · Answer 11

CELKO (5/11/2015)
But your intimidate big conceptual error is not knowing that that SQL is a block structured language with scoping rules. Your mindset is still back in BASIC, FORTRAN or COBOL.

Actually no, his error is not realising that SQL is a block structured language with utterly idiotic scoping rules.

Like any of the Algol family languages, the search begins locally and goes thru the nested layers. This is a fundamental programming concpet.

Well, that's accurate, but I'm amazed to see you admitting that SQL is a language that uses procedural language scoping rules, not a language that makes an attempt to have the syntax reflect the semantics.

Although this is very clearly documented it sits badly with the idea that a relational system would make it easier to understand what is going on.

I was quite unhappy with this sort of thing a long time ago, and actually took the trouble to specify two different reference syntaxes to reflect two diffent semantics (one which conformed to the traditional procedural language style, and one which didn't - despite the language being absolutely procedural) when I wrote the NAL language definition, subsequently adopted by CTL, ICL, and some others as the language used for system software for systems based on CTL hardware. I can't fault SQL for not supplying both options, because it does: but I reckon it got the options the wrong way round, just as I did a decade or so earlier - the default should be local.

Tom

nycdotnet Right there with Babe Points: 799 More actions · Answer 12

Ah - it's over folks. I've made cranky comments on forums before too. And despite using SQL Server since version 7 (though mostly as an evil dev) this is certainly not the last dumb question I'll ask, so any encouragement otherwise may fall on deaf ears. 🙂

Thanks again.

DonlSimpson SSCertifiable Points: 7043 More actions · Answer 13

But your intimidate big conceptual error is not knowing that that SQL is a block structured language with scoping rules. Your mindset is still back in BASIC, FORTRAN or COBOL.

I gotta say I don't understand the use of the word "intimidate" used as an adjective here.

Don Simpson

I'm not sure about Heisenberg.