T-SQL not throwing error for invalid column name in a subselect

Question

T-SQL not throwing error for invalid column name in a subselect

MBroos

SSC Enthusiast

Points: 172
More actions
February 14, 2020 at 5:51 am

#3725823

I have a situation where SQL Server should throw an error but, instead, returns an incorrect result. I simplified it down to this:
CREATE TABLE #PROVINCES (ProvinceCode CHAR(2), ProvinceName VARCHAR(200))
CREATE TABLE #CITIES (ProvinceCode CHAR(2), CityName VARCHAR(100) )
INSERT INTO #PROVINCES VALUES ('NY','New York')
INSERT INTO #PROVINCES VALUES ('FL','Florida')
INSERT INTO #PROVINCES VALUES ('BC','British Columbia')
INSERT INTO #PROVINCES VALUES ('ON','Ontario')
INSERT INTO #CITIES VALUES ('FL','Miami')
INSERT INTO #CITIES VALUES ('NY','New York')
INSERT INTO #CITIES VALUES ('BC','Manitoba')
INSERT INTO #CITIES VALUES ('ON','Toronto')
SELECT * FROM #CITIES WHERE ProvinceCode IN (SELECT ProvinceCode FROM #PROVINCES WHERE ProvinceName = 'Florida')
SELECT * FROM #CITIES WHERE ProvinceCode IN (SELECT ProvinceCode FROM #PROVINCES WHERE CityName = 'Toronto')
The first SELECT returns "Miami", as it should. The second SELECT does NOT throw an error, even though "CityName" is not a column in the #PROVINCES table. Instead, it incorrectly returns "Toronto". This only happens when the invalid column name in the subselect is a valid column name in the outer table.
Does anyone understand what is going on here? This is insidious. A syntax error in the query causes the query to return an incorrect result, instead of throwing an error. There is no way to detect this syntax error unless you rigorously cross-check your query results and find something that does not make any sense.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 1

That is a simple question to answer. The reason it doesn't throw an error in the second query is because of how SQL Server works with subqueries. Since you are not using table aliases in your queries, since SQL Server cannot find the column name specified in the subquery it looks to the outside query for the column name which it finds.

This is one of the reasons you should always use table aliases. You would catch the error if your code looked like this:

-- This query will work
SELECT
  c.*
FROM
  #CITIES AS c
WHERE
  c.ProvinceCode IN (SELECT
                         p.ProvinceCode
                   FROM
                         #PROVINCES AS p
   WHERE
                         p.ProvinceName = 'Florida');

-- The following query will error
SELECT
  c.*
FROM
  #CITIES AS c
WHERE
  c.ProvinceCode IN (SELECT
                       p.ProvinceCode
                     FROM
                       #PROVINCES p
                     WHERE
                       p.CityName = 'Toronto');

MVDBA (Mike Vessey) SSC-Insane Points: 21797 More actions · Answer 2

I did not know this... are there any ms documents around it?

MVDBA

Jonathan AC Roberts SSCoach Points: 18407 More actions · Answer 3

Your query is equivalent to this:

SELECT *
  FROM #CITIES
 WHERE ProvinceCode IN (SELECT ProvinceCode
                          FROM #PROVINCES
                         WHERE #CITIES.CityName = 'Toronto')

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 4

MVDBA (Mike Vessey) wrote:

I did not know this... are there any ms documents around it?

Not that I've seen. The observed behaviour is necessary to support correlated subqueries, where columns from both inside and outside of the subquery have to be visible. As Lynn has pointed out, you only really fall foul of the OP's example if you fail to use table aliases.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Jonathan AC Roberts SSCoach Points: 18407 More actions · Answer 5

In situations like this SQL Server will use the most local column it can find if there are no aliases. So if the column name is present on both #PROVINCES and #CITIES it would pick the column on #PROVINCES, but if it doesn't exist on #PROVINCES it will use the one in #CITIES. I can't find the documentation that states this but I have definitely seen it documented. Good advice by everyone else to always alias columns.

Sue_H SSC Guru Points: 90860 More actions · Answer 6

Jonathan AC Roberts wrote:

In situations like this SQL Server will use the most local column it can find if there are no aliases. So if the column name is present on both #PROVINCES and #CITIES it would pick the column on #PROVINCES, but if it doesn't exist on #PROVINCES it will use the one in #CITIES. I can't find the documentation that states this but I have definitely seen it documented. Good advice by everyone else to always alias columns.

I knew I read it too and had to go find it. It's in the Qualifying column names in subqueries section in the Subqueries documentation:

Subqueries

Sue

Michael L John One Orange Chip Points: 27077 More actions · Answer 7

If you had written the query like this the error would have appeared.

SELECT * 
FROM #CITIES C
INNER JOIN #PROVINCES P ON C.ProvinceCode - P.ProvinceCode
AND C.CityName = 'Toronto'

Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/

T-SQL not throwing error for invalid column name in a subselect

Cookies on SQLServerCentral