• Koen Verbeeck (6/22/2012)


    Very interesting question. I never use this syntax with the correlated subquery, so it would have been nice to give a bit more explanation on that.

    Logically (actual execution may differ, as long as the results are the same), a correlated subquery is evaluated once for each row in the outer query, substituting the reference to the outer query with the value from the current outer query row.

    In this example (which I hope was designed specifically to test understanding of SQL and was not taken from actual code), the logical processing is as follows:

    1. The query has a simple FROM and no WHERE, so all rows in #tmp1 are processed. The alias (t1) means that in the context of the query, the table is temporarily renamed to t1, so references to #tmp1 become invalid. (The official ANSI specification doesn't involve temporarily renaming the table, but making a copy of the table's data under the new name - but for the non-ANSI update ... from construction, that would not work as the updates have to flow to the original table and not to the copy).

    2. The rows in t1 are processed one by one. The order is irrelevant. For this discussion, let's assume they are processed in order of ascending EnrollmentKey value.

    3. For the first row (101), the subquery is evaluated. t1.Enrollmentkey references the outer query, so is replaced with its value. The subquery now reads "(select '5/2/2012' where 101 = 102)". There is no FROM clause, so one zero-column dummy row is assumed (I'm not sure if ANSI allows this; I know Oracle and DB2 don't -you have to add FROM DUAL (for Oracle) or FROM SYSIBM.SYSDUMMY (for DB2) to mimic this behaviour-). The WHERE clause evaluates to false, so this dummy row is kicked out.

    The subquery returns an empty set, which is converted to NULL. The outer query will now set PacketDate to NULL for the first row.

    4. For the second row (102), the subquery is evaluated again. t1.Enrollmentkey is replaced with its value, so the subquery now reads "(select '5/2/2012' where 102 = 102)". This evaluates to true, so the dummy row is kept. For each row in this result (all one of them), the SELECT clause will return the constant character value '5/2/2012'.

    The subquery returns an set consisting of one row with one column, holding the character data '5/2/2012'. The outer query will implicitly convert this to datetime (resulting in either February 5th 2012 or May 2nd 2012, depending on locale settings - or maybe even in a runtime error in some cultures!) and set PacketDate to that value for the second row.

    5. For the third row (103), the subquery is evaluated again. t1.Enrollmentkey is replaced with its value, so the subquery now reads "(select '5/2/2012' where 103 = 102)". This evaluates to false, so the dummy row is kicked out.

    The subquery returns an empty set, which is converted to NULL. The outer query will now set PacketDate to NULL for the third row.

    It is important for any SQL developer to understand correlated subqueries, as there are many situations where they are a great tool. It is also important to understand that the scenario of this QotD is no such situation!


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/