Selecting from sub-queries as tables

Question

Post reply

Selecting from sub-queries as tables

bkubicek

SSChampion

Points: 10948
More actions
January 25, 2015 at 4:03 am

#302508

Comments posted to this topic are about the item Selecting from sub-queries as tables

Viewing 15 posts - 1 through 15 (of 27 total)

You must be logged in to reply to this topic. Login to reply

alex pilsworth SSC Enthusiast Points: 132 More actions · Answer 1

Common table expressions provide a neat way of doing this...

WITH table1 as

(Select OrderNbr, max(UpdateDate) as UpdateDate FROM Orders

Group by OrderNbr)

SELECT * from table1

deroby SSC-Addicted Points: 416 More actions · Answer 2

alex pilsworth (1/26/2015)
Common table expressions provide a neat way of doing this...

Under the hood, CTE is pretty much identical to using sub-queries; only the syntax is different: the actual query plans turned out to be identical whenever I took the time to compare them.

Admittedly, you can't do "recursive-sub-queries' like you can with CTE's... Otherwise I'd simply suggest to use what you like best.

Koen Verbeeck SSC Guru Points: 259085 More actions · Answer 3

Nice basic article.

I encountered the same issue when I was learning T-SQL, so I'm sure people will benefit from this article.

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP

e.dicesare Valued Member Points: 59 More actions · Answer 4

looking at performances, I do always prefer to use temp tables instead of sub-queries

Koen Verbeeck SSC Guru Points: 259085 More actions · Answer 5

e.dicesare (1/26/2015)
looking at performances, I do always prefer to use temp tables instead of sub-queries

It all depends on the query and how many subqueries there are.

I've encountered cases where temp tables were slower because in your script you can create them only one at a time (because a script is serially executed), while a big query with lots of temp tables can read everything in parallel.

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP

aparker 50461 SSC Rookie Points: 29 More actions · Answer 6

I agree, a CTE is the better way to do this now. It provides greater readability and flexibility.

sgmunson SSC Guru Points: 110639 More actions · Answer 7

Not sure if anyone is aware, but for this particular article, a sub-query isn't even necessary. All that is needed is a column perspective instead of a row perspective. When one is trying to write SQL code, one should try to think about what you are doing to a column, instead of what you are trying to do to a row. The original query is as follows:

Select t.OrderNbr, isnull(t2.minDate,t.UpdateDate) as UpdateDate

FROM

(Select OrderNbr, max(UpdateDate) as UpdateDate FROM @Orders

Group by OrderNbr) t

left join (Select OrderNbr, min(UpdateDate) as minDate FROM @Orders

Where StatusCode <> 'Closed'

Group by OrderNbr) t2 on t.OrderNbr = t2.OrderNbr

All that is being done to get the UpdateDate result is to take the minimum value of the UpdateDate field for any non-closed row, and if there are no such rows, then take the maximum value of UpdateDate field instead. This can be done without any sub-queries at all. One need only use a CASE statement to NULL out any rows where the StatusCode field = 'Closed', and then take the MIN value of that CASE statement, as the MIN aggregate function ignores NULL values. This MIN value can then be the first element of the ISNULL function, and the MAX value of UpdateDate can then be the 2nd element. Here's the query:

SELECT OrderNbr, ISNULL(MIN(CASE WHEN StatusCode <> 'Closed' THEN UpdateDate ELSE NULL END), MAX(UpdateDate)) as UpdateDate

FROM @Orders

GROUP BY OrderNbr

If you take a look at the picture of the execution plans I put out on DropBox (link below), you can see that as part of my overall query set, my query only cost 34% of the total, whereas the original cost 42% of the total, with the remainder taken by the INSERT into the @Orders table. I chose to use a table variable because I have to keep my database clean of any extraneous objects. Here's the DropBox link: https://www.dropbox.com/sh/cupsrdsdu06ylf3/AACnGcO-lF-wYfdLwPwfj0fma?dl=0

The file name is: EXECUTION_PLANS_SUBQUERY_ALIAS_ARTICLE.JPG

See the attachments for the actual execution plans.

EDIT: Had to fix the link to DropBox.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

RonKyle SSC-Dedicated Points: 31537 More actions · Answer 8

This kind of sub-query is called a derived table. With the addition of CTEs to TSQL I don't know that they have much use anymore.

George W (no relation) Ten Centuries Points: 1339 More actions · Answer 9

Steve - Seems to me that for this particular article the subquery is absolutely essential, as the article topic is selecting from subqueries as tables.

sgmunson SSC Guru Points: 110639 More actions · Answer 10

George W (no relation) (1/26/2015)
Steve - Seems to me that for this particular article the subquery is absolutely essential, as the article topic is selecting from subqueries as tables.

Purely from that perspective, that's true. However, it also ends up serving as a less than stellar example of the use of a sub-query, in a scenario that just doesn't need it. I can't tell you how often people just copy code from an article or forum post and just change table and field names and use it without really thinking much about performance. As the article author actually says he needed a sub-query, when that's just not the case, I thought it worth bringing that fact to the attention of the article readers (and it's author), so that everyone can benefit from the additional information.

There are a number of scenarios that I can envision where a sub-query is a useful thing to join to. A particularly good example is the case of the existence of multiple "audit trail" type tables, where one needs the latest audit trail event from each of the audit trail tables in order to determine which events to deal with. Audit tables typically are designed to audit multiple "types" of things, and using a sub-query to select a maximum event date for a given event type is a common reporting requirement.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

Charles Kincaid SSChampion Points: 13593 More actions · Answer 11

Nice article Ben. Thanks. I had often wondered about some of these super complex queries that I have seen. Why is that alias there? This makes reading that old code clearer.

I agree that a CTE is more readable. If you are still facing databases on older servers (like SQL 2000) you can't use the modern construct. While temp tables for large subset data would be a good way to go you might be faced with a situation where you only have SELECT permissions and can't create anything.

ATBCharles Kincaid

srajpx Newbie Points: 7 More actions · Answer 12

I believe that for each row of the first subquery, the second subquery is executed. Say, the first sub query returns 100 rows, for each of this row the second subquery is executed.

sgmunson SSC Guru Points: 110639 More actions · Answer 13

srajpx (1/26/2015)
I believe that for each row of the first subquery, the second subquery is executed. Say, the first sub query returns 100 rows, for each of this row the second subquery is executed.

I would expect that only if the APPLY type of JOIN were involved. Take a good look at the execution plans for 2nd and 3rd queries in the following code, which incorporates the original code, but uses a table variable instead of a table to avoid cluttering the database with extraneous objects that I might forget to get rid of:

DECLARE @Orders AS Table

(id uniqueidentifier not null primary key,

OrderNbr varchar(50) not null,

OrderLineNbr varchar(50) not null,

Quantity numeric(8,2) not null,

StatusCode varchar(20) not null,

UpdateDate datetime not null)

Insert into @Orders (id,OrderNbr, OrderLineNbr, Quantity, StatusCode,UpdateDate)

Values(NEWID(),'1','1',1.0,'Closed','2014-12-01'),

(newid(),'1','2',2.0,'InProcess', '2014-12-10'),

(newID(),'1','3',2.0,'Open','2014-12-31'),

(newID(),'2','1',1.5,'Open','2014-01-02'),

(newID(),'3','1',3.0,'InProcess','2014-12-28')

SELECT OrderNbr, ISNULL(MIN(CASE WHEN StatusCode <> 'Closed' THEN UpdateDate ELSE NULL END), MAX(UpdateDate)) as UpdateDate

FROM @Orders

GROUP BY OrderNbr

Select t.OrderNbr, isnull(t2.minDate,t.UpdateDate) as UpdateDate

FROM

(Select OrderNbr, max(UpdateDate) as UpdateDate FROM @Orders

Group by OrderNbr) t

left join (Select OrderNbr, min(UpdateDate) as minDate FROM @Orders

Where StatusCode <> 'Closed'

Group by OrderNbr) t2 on t.OrderNbr = t2.OrderNbr

For the original (3rd) query, You'll find a Nested Loop branching to a Stream Aggregate, Sort, & Clustered Index Scan on the main branch, and secondly to a Compute Scalar, a Stream Aggregate, and Clustered Index Scan. If you hover over the two index scans, you can see the number of executions being 1 on the main branch, and 3 on the second. That suggests that an execution of the LEFT joined sub-query once for each row of the first sub-query isn't accurate. Thus I went ahead and tried changing the original query as follows:

Select t.OrderNbr, isnull(t2.minDate,t.UpdateDate) as UpdateDate

FROM

(Select OrderNbr, max(UpdateDate) as UpdateDate FROM @Orders

Group by OrderNbr) t

outer apply (Select OrderNbr, min(UpdateDate) as minDate FROM @Orders

Where StatusCode <> 'Closed'

Group by OrderNbr) t2 where t.OrderNbr = t2.OrderNbr

The execution plan was nearly identical, but the number of executions of the 2nd branch's Clustered Index Scan was just one instead of 3, so clearly, something different is happening, and it's probably the optimizer getting smart about things. When you compare the execution plan of the original query to the one for the "improved query" that doesn't use any sub-queries at all, you get one and only execution of a Clustered Index Scan.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

C64DBA SSC Veteran Points: 249 More actions · Answer 14

They do have use. For some complex queries, the engine sometimes needs help. Adding a derived table in just the right part of the query can make it easier for the engine to find the optimal plan.