View Column Resolving Issue in Sub-Query

Question

Post reply

View Column Resolving Issue in Sub-Query

peterhe

SSChampion

Points: 11469
More actions
January 1, 2008 at 11:02 pm

#181716

Comments posted to this topic are about the item View Column Resolving Issue in Sub-Query

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply

Hugo Kornelis SSC Guru Points: 64789 More actions · Answer 1

Hi Peter,

It's a nice theory, but unfortunately (or maybe I should say: fortunately), completely incorrect.

IF NOT EXISTS(SELECT 1 FROM sys.server_principals b

WHERE b.sid=(SELECT [sid] FROM sys.databases a WHERE a.name=DB_NAME()))

The reason that this works has nothing to do with the fact that sys.databases is a view, or that sid is a column in one of the base tables. It's related to how unqualified names in a subquery are resolved - SQL Server will first check for a column named sid in the inner query, and if it doesn't find one, it checks the outer query. This query is therefor equivalent to

IF NOT EXISTS(SELECT 1 FROM sys.server_principals b

WHERE b.sid=(SELECT b.[sid] FROM sys.databases a WHERE a.name=DB_NAME()))

It's a shame that you didn't check your theory more carefully before submitting it to SQLServerCentral.com, and it's also a shame that Steve didn't check your claims before publishing. One very easy wway to check it would have been to replace [sid] with the name of a different column that is also aliased in the sys.databases view, like this:

IF NOT EXISTS(SELECT 1 FROM sys.server_principals b

WHERE b.sid=(SELECT [color="#FF0000"][b]indepid[/b][/color] FROM sys.databases a WHERE a.name=DB_NAME()))

Msg 207, Level 16, State 1, Line 2

Invalid column name 'indepid'.

If your theory would have held, SQL Server would've used the indepid column from sys.syssingleobjref, which is aliased as source_database_id in thhe sys.databases view, instead of generating an invalid column name error.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

RichB SSCrazy Eights Points: 9651 More actions · Answer 2

Thanks Hugo for a very concise and informative post, far better than that which I was going to attempt!

It would be nice if the initial article could be updated with at least a caveat.

Happy new year btw...

Jambu Krishnamurthy SSCrazy Points: 2054 More actions · Answer 3

Thank you very much SSCCommitted for that correction.

It was too scary, and I was almost about to talk to my DBA friends

in different companies. That would have not been a good start with

my friends on a New Year.

Before talking to my friends, I just wanted to ask in the forum about

what was the behavior in Oracle. And Luckily I had you reply ready

to quell all dounts.

thankz

again and Have a New Year

Robert Davis One Orange Chip Points: 28027 More actions · Answer 4

I have to agree with the others. I think blaming SQL Server because someone doesn't code something correctly is just plain illogical. Using that logic, every mistake I ever made (and there have certainly been mistakes) could be called a SQL Server bug.

My blog: SQL Soldier[/url]
SQL Server Best Practices: SQL Server Best Practices
Twitter: @SQLSoldier
My book: Pro SQL Server 2008 Mirroring[/url]
Microsoft Certified Master: SQL Server, Data Platform MVP
Database Engineer at BlueMountain Capital Management[/url]

Steve Jones - SSC Editor SSC Guru Points: 742729 More actions · Answer 5

Hopefully Peter will respond soon to this and thanks for a nice post on what you see, Hugo.

We don't necessarily check every article for accuracy. It's a time constraint as well as the fact that I don't necessarily have the expertise to check every section of every article,QOD, etc. It's a good learning experience and people sometimes find things that they feel are correct or it's what they have observed. We want to give them their voice and let them show the world from their perspective.

We have had lots of debates and often I wouldn't necessarily say that any particular side is correct. It's often a question of where something applies or how it might fit your situation. I do think that Hugo seems to be correct in this case, but I haven't dug into it today to see.

Remaks SSC Enthusiast Points: 171 More actions · Answer 6

I wish I'd read the discussion before going down the same road Hugo did to prove what I believed I saw up front. I've made the mistake too many times of including an unqualified, invalid inner query field that was a valid outer query field and getting unexpected results, but not an error. I felt that before I responded I should verify that what I thought I saw was in fact the error I believed it to be. That was a waste of 5 minutes. (So I figured I'd waste a few more actually posting this response) 🙂

Ian Yates SSCoach Points: 19738 More actions · Answer 7

The article should really be removed or altered - it makes the site look bad 🙁

Alexander Kuznetsov SSCrazy Points: 2217 More actions · Answer 8

The best practice is to always prefix column names, which prevents such errors.

Cheers,

Alex Kuznetsov,

SQL Server MVP

Alexander Kuznetsov SSCrazy Points: 2217 More actions · Answer 9

Also Steve: I think Hugo is absolutely correct.

Cheers,

Alex Kuznetsov,

SQL Server MVP

jimmy-481471 Grasshopper Points: 20 More actions · Answer 10

Hi,

Hugo is very much right. I would like you to go through the correlated subqueries in case if you want any further details on this.

cheers,

Jimz.

peterhe SSChampion Points: 11469 More actions · Answer 11

I am on vacation till Jan.14. I have no reach to any SQL server and have very limited access to internet during my vacation. So I am sorry for the delay to the reply of the comments.

1) I don't think it's necessary to scope the column in the subquery. Actually you can only select the columns from the tables in the FROM subclause in the sub-query. It does not make sense to select columns from the tables refereneced by the main query. What result set are you expecting from a query like:

SELECT * FROM sys.server_principals b WHERE b.sid=(SELECT b.[sid] FROM sys.databases a WHERE a.name=DB_NAME())

2) Since I cannot reach SQL Server, can anyone run the following and let me what you get:

CREATE TABLE Ta (Ca1 int not null, Ca2 int not null)

GO

CREATE TABLE Tb (Cb1 int not null, Cb2 int not null)

GO

INSERT Ta (Ca1,Ca2) VALUES (1,1)

INSERT Ta (Ca1,Ca2) VALUES (2,2)

INSERT Tb (Cb1,Cb2) VALUES (2,2)

INSERT Tb (Cb1,Cb2) VALUES (3,3)

If we make a mistake in the following query

SELECT * FROM Ta WHERE Ca1 IN (SELECT Cb1 FROM Tb WHERE Cb2=2)

and write it as:

SELECT * FROM Ta WHERE Ca1 IN (SELECT Ca1 FROM Tb WHERE Cb2=2)

What result you get? Does SQL Server report error?

3) As I said in the article, it's talking about a scenario when a developer makes a mistake and SQL Server does not report the problem. If SQL server impliments the sub-query as a join (as mentioned by Hugo), it's an even bigger issue. The sub-query returns a scalar value in the example. It has nothing to do with the main query.

Hugo Kornelis SSC Guru Points: 64789 More actions · Answer 12

peterhe (1/4/2008)
1) I don't think it's necessary to scope the column in the subquery. Actually you can only select the columns from the tables in the FROM subclause in the sub-query.

That is not true. You can also reference columns from the outer query. That is called a correlated subquery, and many people use them on a daily basis.

It does not make sense to select columns from the tables refereneced by the main query. What result set are you expecting from a query like:
SELECT * FROM sys.server_principals b WHERE b.sid=(SELECT b.[sid] FROM sys.databases a WHERE a.name=DB_NAME())

Exactly one row in sys.databases will match the WHERE clause. For this row, b.[sid] will be taken from the outer query and used as the result of the subquey. This is of course always equal to b.sid, so the WHERE clause is True for each and every row in sys.server_principals. The result set I expect from this query would be every single row in sys.server_principals.

2) Since I cannot reach SQL Server, can anyone run the following and let me what you get:
CREATE TABLE Ta (Ca1 int not null, Ca2 int not null)
GO
CREATE TABLE Tb (Cb1 int not null, Cb2 int not null)
GO
INSERT Ta (Ca1,Ca2) VALUES (1,1)
INSERT Ta (Ca1,Ca2) VALUES (2,2)
INSERT Tb (Cb1,Cb2) VALUES (2,2)
INSERT Tb (Cb1,Cb2) VALUES (3,3)
If we make a mistake in the following query
SELECT * FROM Ta WHERE Ca1 IN (SELECT Cb1 FROM Tb WHERE Cb2=2)
and write it as:
SELECT * FROM Ta WHERE Ca1 IN (SELECT Ca1 FROM Tb WHERE Cb2=2)
What result you get? Does SQL Server report error?

The first query returns this (as expected):

Ca1 Ca2

----------- -----------

2 2

The second query returns this, as expected by me (but probably not by you):

Ca1 Ca2

----------- -----------

1 1

2 2

3) As I said in the article, it's talking about a scenario when a developer makes a mistake and SQL Server does not report the problem. If SQL server impliments the sub-query as a join (as mentioned by Hugo), it's an even bigger issue. The sub-query returns a scalar value in the example. It has nothing to do with the main query.

SQL Server doesn't necessarily implement the subquery as a join (though that is, in fact, exactly what the optimizer will do in many cases - but it's irrelevant here). It assumes that you wanted to write a correlated subquery, since you are referencing columns from the outer query in the subquery. I'd love there to be a provision that references to outer queries always require explicit prefixing of table name or table alias, but such a provision is unfortunately not in any of the SQL standards that I know of (SQL-92, SQL-1999, and SQL-2003), and is not implemented by Microsoft either.

In fact, I'd put it one step further and say that I'd prefer it to be required to prefix each column in every query that uses more than a single table. That would prevent lots of those pesky, hard to find bugs caused by typos.

Thanks for taking the time to respond. Enjoy the rest of your holiday, I'll still be here, ready to read your reply, when you get back.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

RichB SSCrazy Eights Points: 9651 More actions · Answer 13

For what its worth, I remembered a thread about this on google a while back.

I think it was this one, if this link works!

Covers the main points of the argument in detail - rather than anyone spend too much time trying to figure it out.

peterhe SSChampion Points: 11469 More actions · Answer 14

Thanks Hugo for your clarification and thanks Richard for the link.

I do use correlated subqueries in a daily basis but in my mind, the outer table can only be referenced in the WHERE clause in the subquery, not in the SELECT list of the subquery.

1) I checked BOL about subquery, it has a Caution section in the end:

" If a column is referenced in a subquery that does not exist in the table referenced by the subquery's FROM clause, but exists in a table referenced by the outer query's FROM clause, the query executes without error. SQL Server implicitly qualifies the column in the subquery with the table name in the outer query."

So I can't blame MS. It does a good job to document what SQL Server does in this scenario.

2) I also checked Oracle (10g for Windows, express edition), its behaviour is the same as MS SQL Server.

So two major RDBMS are handling this in the same way. I am not sure this is defined by SQL-99/2003 spec or not.

My apologize to everyone if the article caused any confusion. However, frankly i am not fully convinced. The subquery that select a column from the outer table is cumbersome and confusing in terms of semantics. It always has more concise and clear way for this type of queries. To me, it brings in more troubles than a feature. And I guess that's why MS put a caution section in the BOL.

I haven't used and will not use this type of correlated subqueries. To avoid the typos or mistakes, I'll follow Hugo's suggestion to scope the referenced columns.