T-SQL not throwing error for invalid column name in a subselect

  • MBroos

    SSC Enthusiast

    Points: 112

    I have a situation where SQL Server should throw an error but, instead, returns an incorrect result.  I simplified it down to this:

    CREATE TABLE #PROVINCES (ProvinceCode CHAR(2), ProvinceName VARCHAR(200))

    CREATE TABLE #CITIES (ProvinceCode CHAR(2), CityName VARCHAR(100) )

    INSERT INTO #PROVINCES VALUES ('NY','New York')

    INSERT INTO #PROVINCES VALUES ('FL','Florida')

    INSERT INTO #PROVINCES VALUES ('BC','British Columbia')

    INSERT INTO #PROVINCES VALUES ('ON','Ontario')

    INSERT INTO #CITIES VALUES ('FL','Miami')

    INSERT INTO #CITIES VALUES ('NY','New York')

    INSERT INTO #CITIES VALUES ('BC','Manitoba')

    INSERT INTO #CITIES VALUES ('ON','Toronto')

    SELECT * FROM #CITIES WHERE ProvinceCode IN (SELECT ProvinceCode FROM #PROVINCES WHERE ProvinceName = 'Florida')

    SELECT * FROM #CITIES WHERE ProvinceCode IN (SELECT ProvinceCode FROM #PROVINCES WHERE CityName = 'Toronto')

    The first SELECT returns "Miami", as it should.  The second SELECT does NOT throw an error, even though "CityName" is not a column in the #PROVINCES table.  Instead, it incorrectly returns "Toronto".  This only happens when the invalid column name in the subselect is a valid column name in the outer table.

    Does anyone understand what is going on here?  This is insidious.  A syntax error in the query causes the query to return an incorrect result, instead of throwing an error.  There is no way to detect this syntax error unless you rigorously cross-check your query results and find something that does not make any sense.

  • Lynn Pettis

    SSC Guru

    Points: 442312

    That is a simple question to answer.  The reason it doesn't throw an error in the second query is because of how SQL Server works with subqueries.  Since you are not using table aliases in your queries, since SQL Server cannot find the column name specified in the subquery it looks to the outside query for the column name which it finds.

    This is one of the reasons you should always use table aliases.  You would catch the error if your code looked like this:

    -- This query will work
    SELECT
    c.*
    FROM
    #CITIES AS c
    WHERE
    c.ProvinceCode IN (SELECT
    p.ProvinceCode
    FROM
    #PROVINCES AS p
    WHERE
    p.ProvinceName = 'Florida');

    -- The following query will error
    SELECT
    c.*
    FROM
    #CITIES AS c
    WHERE
    c.ProvinceCode IN (SELECT
    p.ProvinceCode
    FROM
    #PROVINCES p
    WHERE
    p.CityName = 'Toronto');
  • MVDBA (Mike Vessey)

    SSC-Insane

    Points: 21757

    I did not know this... are there any ms documents around it?

    MVDBA

  • Jonathan AC Roberts

    SSCoach

    Points: 17273

    Your query is equivalent to this:

    SELECT *
    FROM #CITIES
    WHERE ProvinceCode IN (SELECT ProvinceCode
    FROM #PROVINCES
    WHERE #CITIES.CityName = 'Toronto')
  • ChrisM@Work

    SSC Guru

    Points: 186094

    MVDBA (Mike Vessey) wrote:

    I did not know this... are there any ms documents around it?

     

    Not that I've seen. The observed behaviour is necessary to support correlated subqueries, where columns from both inside and outside of the subquery have to be visible. As Lynn has pointed out, you only really fall foul of the OP's example if you fail to use table aliases.

    [font="Arial"]“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw[/font]


    For fast, accurate and documented assistance in answering your questions, please read this article[/url].
    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]
    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
    [url

  • Jonathan AC Roberts

    SSCoach

    Points: 17273

    In situations like this SQL Server will use the most local column it can find if there are no aliases. So if the column name is present on both #PROVINCES and #CITIES it would pick the column on #PROVINCES, but if it doesn't exist on #PROVINCES it will use the one in #CITIES. I can't find the documentation that states this but I have definitely seen it documented.  Good advice by everyone else to always alias columns.

  • Sue_H

    SSC Guru

    Points: 90671

    Jonathan AC Roberts wrote:

    In situations like this SQL Server will use the most local column it can find if there are no aliases. So if the column name is present on both #PROVINCES and #CITIES it would pick the column on #PROVINCES, but if it doesn't exist on #PROVINCES it will use the one in #CITIES. I can't find the documentation that states this but I have definitely seen it documented.  Good advice by everyone else to always alias columns.

    I knew I read it too and had to go find it. It's in the Qualifying column names in subqueries section in the Subqueries documentation:

    Subqueries

    Sue

  • Michael L John

    One Orange Chip

    Points: 25917

    If you had written the query like this the error would have appeared.

    SELECT * 
    FROM #CITIES C
    INNER JOIN #PROVINCES P ON C.ProvinceCode - P.ProvinceCode
    AND C.CityName = 'Toronto'

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply