Blog Post

SQL Server Select – Back to Basics

,

Prelude in SQL Minor

translucentman_greenBack in late December of 2015, a challenge of sorts was issued by Tim Ford (twitter) to write a blog post each month on a SQL Server Basic. Some have hash-tagged this as #backtobasics. Here is the link to that challenge sent via tweet.

I did not officially accept the challenge. Was an official acceptance required? I don’t know. I do know that I think it is a good challenge and that I intend to participate in the challenge. I hope I can meet the requirements and keep the posts to “basics”. Let’s just call this first post in the challenge to be my official acceptance.

With this being another installment in a monthly series, here is a link to review the other posts in the series – back to basics.

SQL Server Select

atlasian_manOf all the fundamental concepts within SQL Server, nothing is potentially more basic and fundamental than the often overlooked Select statement. Think about that statement for just a moment.

How often is the select statement overlooked and under-analyzed? Yet, this tiny little word is a rather large part of the transactions that occur within each instance of SQL Server on a routine basis.

Today, I want to break down the SQL Select statement and show the components of the statement along with the logical processing order of those components.

Let’s put on our thinking caps for a moment and think of all the possible components of this super fundamental piece of the TSQL language – the Select statement.

Anatomy of the Select

The first important piece of information with the Select is to understand what it does. What is a Select statement? What is the purpose of a Select? Really, the purpose of this statement should make sense based on the word alone – to select. In other words, to get or retrieve something that is of interest to you at that moment. In our case, that means to retrieve some piece of data from the database to meet our needs. Very simple, right? You issue a SELECT statement to get something out of the database.

Thinking through the Select statement, certain common elements probably stick out pretty easily. These elements might include the following:

  • Select
  • From
  • Join
  • Where

Then with a slightly more complex query, one might see the following components associated with a Select statement.

  • Top
  • Distinct
  • Group By
  • Order By

Then stepping it up to another level again, one might see the following:

  • Having
  • For
  • Into
  • With (Cube or Rollup)

Soak that in for a moment. Suddenly this fundamental piece of TSQL is starting to look a little more like black voodoo magic. It can get rather complex with this fundamental statement. Now soak in what the purpose of the SELECT is, as was stated just a bit ago. That purpose is to retrieve data that you need. One cannot always retrieve the desired data without some options to go with the SELECTion criteria. Suddenly, this complexity renders itself more as raw DBA power.

Knowing the various aspects of the Select, and not diving too far into what each does, you may be wondering how these pieces fit together into a query. Let’s take a quick look at a few different examples illustrating how these puzzle pieces fit together into a SELECT statement.

USE AdventureWorksDW2014;
GO
SELECT OrderDateKey
, SUM(SalesAmount) AS TotalSales
FROM FactInternetSales
WHERE OrderDateKey > '20020801'
GROUP BY OrderDateKey
ORDER BY OrderDateKey

Here is an example showing a SELECT that utilizes the FROM, WHERE, GROUP BY, and ORDER BY clauses. When writing a SELECT statement, we will write the query in the order shown in the query. Using these clauses in a different location doesn’t read very well to humans and will throw a syntax error.

In this next example, I have used a few different clauses:

SELECT DISTINCT TOP 5 e.BusinessEntityID
FROM HumanResources.Employee AS e
INNER JOIN Sales.SalesPerson AS s
ON e.BusinessEntityID = s.BusinessEntityID
ORDER BY e.BusinessEntityID DESC

The real differences here being that I have used the DISTINCT, TOP and JOIN clauses. Again, trying to write this query with the TOP preceding the TOP or maybe with the JOIN preceding the top doesn’t really make much sense when reading it. Nor would it work if we tried to execute the query due to syntax errors.

This next example uses another option to output the results into an XML document using the FOR clause:

USE AdventureWorks2014;
GO
SELECT p.BusinessEntityID, FirstName, LastName, PhoneNumber AS Phone
FROM Person.Person AS p
JOIN Person.PersonPhone AS pph ON p.BusinessEntityID  = pph.BusinessEntityID
WHERE LastName LIKE 'G%'
ORDER BY LastName, FirstName 
FOR XML AUTO, TYPE, XMLSCHEMA, ELEMENTS XSINIL

And in this last example, I take the previous example to output the results into a temp table.

USE AdventureWorks2014;
GO
SELECT *
INTO #logop
FROM (SELECT p.BusinessEntityID, FirstName, LastName, PhoneNumber AS Phone
FROM Person.Person AS p
JOIN Person.PersonPhone AS pph ON p.BusinessEntityID  = pph.BusinessEntityID
WHERE LastName LIKE 'G%'
ORDER BY LastName, FirstName 
FOR XML AUTO, TYPE, XMLSCHEMA, ELEMENTS XSINIL) lo (xmlout)

These are all examples of the possibilities that can be used with the SELECT statement to retrieve data from the database in order to meet our needs. Each one illustrates different variations and some level of complexities between each of the examples.

Now the question is, did you know that SQL Server does not process these statements in the same syntactic sequence or English friendly forms that we write these SELECT statements? You may be surprised to learn that there is a different processing order to these pieces of the SELECT. This processing order is called the LOGICAL Processing order.

Let’s take one of the queries already shown and compare.

The query as we may write it will look like this:

Anatomy_select

The query as SQL Server will process it will look like this:

anatomy_logop

Or in other words, we may see it as shown in this video:

That is significantly different than we might think, but it makes sense if you think about it long enough.

How does one know that SQL Server processes these steps in this order? Well, besides the documentation in BOL that supports this, one can dive into the internals and generate a tree that will show this in action. For instance, let’s use the same query as just shown and see what the processing tree would look like.

select_tree_map

Besides looking very colorful and almost as if it were in Greek, this processing tree is somewhat revealing in how SQL Server breaks down that SELECT statement we have been using. One more pic, and then a quick explanation.

select_showplan_text

This last image was generated using the SHOWPLAN_TEXT setting. I want to show this because it helps to reinforce a concept from the processing tree. In this last image, note that the OUTER REFERENCE is to the SalesPerson table. This also means (without stating it) that the inner reference is to the Employee table. I have added the INNER and OUTER table references to the image of the processing tree. Why is this important? Well, when trying to interpret the tree it is important to understand that one starts from the inner most node and then read from bottom to top, right to left (as you ascend the nodes).

In this case, we will see that the Employee table is first referenced and processed, then the SalesPerson table, then the ON clause is evaluated. In the tree, I highlighted the ON that pertains to the join in bright green. The vertical dotted lines on the left are to help visually keep the nodes aligned as you step in/out from one level to the next. As we climb the tree to the top left corner, we will see that the last piece of the select to be processed in this case is the TOP statement. We can also see that the top is dependent on the ORDER BY (highlighted in RED) in order to get the correct desired rows.

Recap

This article jumps through the SELECT statement very briskly. The SELECT statement is a powerful tool within SQL Server. Gaining an understanding how the SELECT statement is processed can be helpful to write an appropriate SELECT and how to potentially pseudo code that SELECT. If you can understand that the engine first determines the source of the data and then starts to break down the additional requirements that have been sent with the SELECT. The more options that are thrown in with the SELECT statement, the larger and more complex the processing tree becomes for that statement.

Used well, a SELECT will become your ally and a handy tool in the toolbox. This has been an introduction to the SELECT and how it is processed behind the scenes. If you would like to see more information about the different clauses related to the select statement, check out the BOL entry here.

As an aside, the tree that was displayed is called the “converted tree.” Here is a listing of the other possible trees.

SELECT mv.name, mv.map_value
FROM sys.dm_xe_map_values mv
WHERE mv.name = 'query_optimizer_tree_id';

optimizertrees

Original post (opens in new tab)
View comments in original post (opens in new tab)

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating