View Column Resolving Issue in Sub-Query

  • peterhe

    SSChampion

    Points: 11363

    Comments posted to this topic are about the item View Column Resolving Issue in Sub-Query

  • Hugo Kornelis

    SSC Guru

    Points: 64645

    Hi Peter,

    It's a nice theory, but unfortunately (or maybe I should say: fortunately), completely incorrect.

    IF NOT EXISTS(SELECT 1 FROM sys.server_principals b

    WHERE b.sid=(SELECT [sid] FROM sys.databases a WHERE a.name=DB_NAME()))

    The reason that this works has nothing to do with the fact that sys.databases is a view, or that sid is a column in one of the base tables. It's related to how unqualified names in a subquery are resolved - SQL Server will first check for a column named sid in the inner query, and if it doesn't find one, it checks the outer query. This query is therefor equivalent to

    IF NOT EXISTS(SELECT 1 FROM sys.server_principals b

    WHERE b.sid=(SELECT b.[sid] FROM sys.databases a WHERE a.name=DB_NAME()))

    It's a shame that you didn't check your theory more carefully before submitting it to SQLServerCentral.com, and it's also a shame that Steve didn't check your claims before publishing. One very easy wway to check it would have been to replace [sid] with the name of a different column that is also aliased in the sys.databases view, like this:

    IF NOT EXISTS(SELECT 1 FROM sys.server_principals b

    WHERE b.sid=(SELECT [color="#FF0000"][b]indepid[/b][/color] FROM sys.databases a WHERE a.name=DB_NAME()))

    Msg 207, Level 16, State 1, Line 2

    Invalid column name 'indepid'.

    If your theory would have held, SQL Server would've used the indepid column from sys.syssingleobjref, which is aliased as source_database_id in thhe sys.databases view, instead of generating an invalid column name error.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • RichB

    SSCrazy Eights

    Points: 9651

    Thanks Hugo for a very concise and informative post, far better than that which I was going to attempt!

    It would be nice if the initial article could be updated with at least a caveat.

    Happy new year btw...

  • Jambu Krishnamurthy

    SSCrazy

    Points: 2054

    Thank you very much SSCCommitted for that correction.

    It was too scary, and I was almost about to talk to my DBA friends

    in different companies. That would have not been a good start with

    my friends on a New Year.

    Before talking to my friends, I just wanted to ask in the forum about

    what was the behavior in Oracle. And Luckily I had you reply ready

    to quell all dounts.

    thankz

    again and Have a New Year

  • Robert Davis

    One Orange Chip

    Points: 28027

    I have to agree with the others. I think blaming SQL Server because someone doesn't code something correctly is just plain illogical. Using that logic, every mistake I ever made (and there have certainly been mistakes) could be called a SQL Server bug.


    My blog: SQL Soldier[/url]
    SQL Server Best Practices:
    SQL Server Best Practices
    Twitter: @SQLSoldier
    My book: Pro SQL Server 2008 Mirroring[/url]
    Microsoft Certified Master: SQL Server, Data Platform MVP
    Database Engineer at BlueMountain Capital Management[/url]

  • Steve Jones - SSC Editor

    SSC Guru

    Points: 716562

    Hopefully Peter will respond soon to this and thanks for a nice post on what you see, Hugo.

    We don't necessarily check every article for accuracy. It's a time constraint as well as the fact that I don't necessarily have the expertise to check every section of every article,QOD, etc. It's a good learning experience and people sometimes find things that they feel are correct or it's what they have observed. We want to give them their voice and let them show the world from their perspective.

    We have had lots of debates and often I wouldn't necessarily say that any particular side is correct. It's often a question of where something applies or how it might fit your situation. I do think that Hugo seems to be correct in this case, but I haven't dug into it today to see.

  • Remaks

    SSC Enthusiast

    Points: 171

    I wish I'd read the discussion before going down the same road Hugo did to prove what I believed I saw up front. I've made the mistake too many times of including an unqualified, invalid inner query field that was a valid outer query field and getting unexpected results, but not an error. I felt that before I responded I should verify that what I thought I saw was in fact the error I believed it to be. That was a waste of 5 minutes. (So I figured I'd waste a few more actually posting this response) 🙂

  • Ian Yates

    SSCoach

    Points: 19738

    The article should really be removed or altered - it makes the site look bad 🙁

  • Alexander Kuznetsov

    SSCrazy

    Points: 2217

    The best practice is to always prefix column names, which prevents such errors.

    Cheers,

    Alex Kuznetsov,

    SQL Server MVP

  • Alexander Kuznetsov

    SSCrazy

    Points: 2217

    Also Steve: I think Hugo is absolutely correct.

    Cheers,

    Alex Kuznetsov,

    SQL Server MVP

  • jimmy-481471

    Grasshopper

    Points: 20

    Hi,

    Hugo is very much right. I would like you to go through the correlated subqueries in case if you want any further details on this.

    cheers,

    Jimz.

  • peterhe

    SSChampion

    Points: 11363

    I am on vacation till Jan.14. I have no reach to any SQL server and have very limited access to internet during my vacation. So I am sorry for the delay to the reply of the comments.

    1) I don't think it's necessary to scope the column in the subquery. Actually you can only select the columns from the tables in the FROM subclause in the sub-query. It does not make sense to select columns from the tables refereneced by the main query. What result set are you expecting from a query like:

    SELECT * FROM sys.server_principals b WHERE b.sid=(SELECT b.[sid] FROM sys.databases a WHERE a.name=DB_NAME())

    2) Since I cannot reach SQL Server, can anyone run the following and let me what you get:

    CREATE TABLE Ta (Ca1 int not null, Ca2 int not null)

    GO

    CREATE TABLE Tb (Cb1 int not null, Cb2 int not null)

    GO

    INSERT Ta (Ca1,Ca2) VALUES (1,1)

    INSERT Ta (Ca1,Ca2) VALUES (2,2)

    INSERT Tb (Cb1,Cb2) VALUES (2,2)

    INSERT Tb (Cb1,Cb2) VALUES (3,3)

    If we make a mistake in the following query

    SELECT * FROM Ta WHERE Ca1 IN (SELECT Cb1 FROM Tb WHERE Cb2=2)

    and write it as:

    SELECT * FROM Ta WHERE Ca1 IN (SELECT Ca1 FROM Tb WHERE Cb2=2)

    What result you get? Does SQL Server report error?

    3) As I said in the article, it's talking about a scenario when a developer makes a mistake and SQL Server does not report the problem. If SQL server impliments the sub-query as a join (as mentioned by Hugo), it's an even bigger issue. The sub-query returns a scalar value in the example. It has nothing to do with the main query.

  • Hugo Kornelis

    SSC Guru

    Points: 64645

    peterhe (1/4/2008)


    1) I don't think it's necessary to scope the column in the subquery. Actually you can only select the columns from the tables in the FROM subclause in the sub-query.

    That is not true. You can also reference columns from the outer query. That is called a correlated subquery, and many people use them on a daily basis.

    It does not make sense to select columns from the tables refereneced by the main query. What result set are you expecting from a query like:

    SELECT * FROM sys.server_principals b WHERE b.sid=(SELECT b.[sid] FROM sys.databases a WHERE a.name=DB_NAME())

    Exactly one row in sys.databases will match the WHERE clause. For this row, b.[sid] will be taken from the outer query and used as the result of the subquey. This is of course always equal to b.sid, so the WHERE clause is True for each and every row in sys.server_principals. The result set I expect from this query would be every single row in sys.server_principals.

    2) Since I cannot reach SQL Server, can anyone run the following and let me what you get:

    CREATE TABLE Ta (Ca1 int not null, Ca2 int not null)

    GO

    CREATE TABLE Tb (Cb1 int not null, Cb2 int not null)

    GO

    INSERT Ta (Ca1,Ca2) VALUES (1,1)

    INSERT Ta (Ca1,Ca2) VALUES (2,2)

    INSERT Tb (Cb1,Cb2) VALUES (2,2)

    INSERT Tb (Cb1,Cb2) VALUES (3,3)

    If we make a mistake in the following query

    SELECT * FROM Ta WHERE Ca1 IN (SELECT Cb1 FROM Tb WHERE Cb2=2)

    and write it as:

    SELECT * FROM Ta WHERE Ca1 IN (SELECT Ca1 FROM Tb WHERE Cb2=2)

    What result you get? Does SQL Server report error?

    The first query returns this (as expected):

    Ca1 Ca2

    ----------- -----------

    2 2

    The second query returns this, as expected by me (but probably not by you):

    Ca1 Ca2

    ----------- -----------

    1 1

    2 2

    3) As I said in the article, it's talking about a scenario when a developer makes a mistake and SQL Server does not report the problem. If SQL server impliments the sub-query as a join (as mentioned by Hugo), it's an even bigger issue. The sub-query returns a scalar value in the example. It has nothing to do with the main query.

    SQL Server doesn't necessarily implement the subquery as a join (though that is, in fact, exactly what the optimizer will do in many cases - but it's irrelevant here). It assumes that you wanted to write a correlated subquery, since you are referencing columns from the outer query in the subquery. I'd love there to be a provision that references to outer queries always require explicit prefixing of table name or table alias, but such a provision is unfortunately not in any of the SQL standards that I know of (SQL-92, SQL-1999, and SQL-2003), and is not implemented by Microsoft either.

    In fact, I'd put it one step further and say that I'd prefer it to be required to prefix each column in every query that uses more than a single table. That would prevent lots of those pesky, hard to find bugs caused by typos.

    Thanks for taking the time to respond. Enjoy the rest of your holiday, I'll still be here, ready to read your reply, when you get back.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • RichB

    SSCrazy Eights

    Points: 9651

    For what its worth, I remembered a thread about this on google a while back.

    I think it was this one, if this link works!

    Covers the main points of the argument in detail - rather than anyone spend too much time trying to figure it out.

  • peterhe

    SSChampion

    Points: 11363

    Thanks Hugo for your clarification and thanks Richard for the link.

    I do use correlated subqueries in a daily basis but in my mind, the outer table can only be referenced in the WHERE clause in the subquery, not in the SELECT list of the subquery.

    1) I checked BOL about subquery, it has a Caution section in the end:

    " If a column is referenced in a subquery that does not exist in the table referenced by the subquery's FROM clause, but exists in a table referenced by the outer query's FROM clause, the query executes without error. SQL Server implicitly qualifies the column in the subquery with the table name in the outer query."

    So I can't blame MS. It does a good job to document what SQL Server does in this scenario.

    2) I also checked Oracle (10g for Windows, express edition), its behaviour is the same as MS SQL Server.

    So two major RDBMS are handling this in the same way. I am not sure this is defined by SQL-99/2003 spec or not.

    My apologize to everyone if the article caused any confusion. However, frankly i am not fully convinced. The subquery that select a column from the outer table is cumbersome and confusing in terms of semantics. It always has more concise and clear way for this type of queries. To me, it brings in more troubles than a feature. And I guess that's why MS put a caution section in the BOL.

    I haven't used and will not use this type of correlated subqueries. To avoid the typos or mistakes, I'll follow Hugo's suggestion to scope the referenced columns.

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply