Stairway to T-SQL DML

Stairway to T-SQL DML Level 1: The Basic SELECT Statement

,

Here are the different levels that will be covered during this stairway series:

Level 1: The Basic SELECT Statement

There are lots of different aspects of managing data in a SQL Server database. Before you can get into the complex management issues associated with managing application data you need to start with the basic of retrieving data from a table. To return data from a SQL Server table you use a SELECT statement. In this level I will be cover the components of the basic SELECT statement and how you can use it to retrieve data from a single SQL Server table.

The Three Parts in a SELECT Statement

A basic SELECT statement that returns data from a single table consists of three different parts: The Column list, the FROM clause, and the WHERE Clause. The syntax for constructing a basic SELECT statement using these different components looks like this:

SELECT <Column List>
FROM <table name>
WHERE <where criteria>;

The “<Column List>” will contain a list of columns that you want returned from the query, “<table_name>” will contain the table from which the data is selected, and the “<where criteria>” well identify the search criteria that will be used to constrain the rows that will be returned from the SELECT statement. Note that the WHERE clause is optional.

Note that all of my examples in this Stairway will use the AdventureWorks SQL Server 2005 database that can be obtained from Codeplex at this location: http://sqlserversamples.codeplex.com/.

Let’s look at the following very simple SELECT statement that selects some data from a table in the AdventureWorks database. You can follow along by running each of the query statement described in this article using the query window within SQL Server Management Studio, while having the database set to AdventureWorks.

SELECT  ProductCategoryID ,
        Name
FROM    Production.ProductCategory
WHERE   ProductCategoryID < 2 ;

Here I selected two different columns, ProductCategoryID and Name, from the Production.ProductCategory table. Since this SELECT statement has a WHERE clause it limits the rows returned from the table to only those rows that have a ProductCategoryID value less than 2.

Now that you have a basic idea of a SELECT statement let me explore each of the components of the SELECT statement in a little more detail.

The Column List

The column list follows the SELECT keyword and is where you specify the columns you want to return from your table. The columns are identified by specifying the column name. If multiple columns are listed they are separated by commas. In later levels we’ll look at the possibilities for returning values other than columns from the specified table, in the first level, we’re sticking to the basics. Building on the example above let’s explore what a column list might look like if it only selects a single column, or all of the columns of a table.

If I only wanted to return the Name column of the Production.ProductCategory table my query would look like this:

SELECT  Name
FROM    Production.ProductCategory
WHERE   ProductCategoryID < 2 ;

Here you can see I only specified only the Name column between the SELECT and FROM keywords in the above query. But if I wanted to specify all of the columns in the Production.ProductCategory table I would run the following query:

SELECT  ProductCategoryID ,
        Name ,
        rowguid ,
        ModifiedDate
FROM    Production.ProductCategory
WHERE   ProductCategoryID < 2 ;

In this SELECT statement you can see that I have identified 4 different columns each one separated by a comma. These columns can all be listed together on a single line or separated on different lines for readability, as I have done. There is another way to select all the columns from a table, and that is by specifying an asterisk instead of specify the individual column names. Here is a SELECT statement that uses the asterisk specification:

SELECT  *
FROM    Production.ProductCategory
WHERE   ProductCategoryID < 2 ;

Care needs to be taken when using the asterisk (usually referred to as “star”) in your applications. Since the asterisk will return all columns in a table, if you happen to alter a table to include an additional column then the additional column will be returned without modifying the actual SELECT statement. Whereas if you have specified all the columns by name, then when a new column is added it will not be returned, unless you add it to the column list. Using the asterisk is acceptable for testing, but if you want to follow best practices do not use it in your application code. The reason this is a best practice is because most applications expect a fixed number of columns to be returned from a given SELECT statement. When columns have been added to a table and you have used the asterisk method to identify table columns, the additional columns returned can break applications that have not been coded to handle this extra data.

The FROM Clause

In the FROM clause you identify the table from which you want to select data. In this level we are only discussing selecting data from a single table in the FROM clause. Note that as you progress in your knowledge of SQL the FROM clause can identify multiple tables from which to select data.

There are a number of different ways to identify the table in which you want to select data. In fact there are four different ways, three of which I will cover in this article and the fourth I will just mention.

The first way is to specify a table by identifying both the table name and the schema in which it belongs. This is how I have identified all my table names in all my examples so far. In each of my examples I said the table name was Production.ProductCategory. The actual table name is just ProductCategory and it is contained in the Production schema.

The second way to identify a table in the FROM clause is to just state the table name. When the FROM clause only contains the table name, SQL Server will assume the table is contained under the default schema for the database user, or under the dbo schema. Let me explain this concept in a little more detail.

When you submit a query to SQL Server that only identifies the table name, SQL Server will need to resolve which schema this table is under. This is because in a given database there can be multiple tables with the same name, as long as they are in different schemas. To determine which schema a table resides in SQL Server goes through a two-step process. The first step is to use the default schema for the database user who submitted the query and append their default schema the table name and then look for that table. If SQL Server finds the table using the user default schema then step two is not performed. If SQL Server did not find the table using the users default schema, then SQL Server checks in the dbo schema to find the table. Regardless of whether your database contains a single schema or not, it is best practice to get in the habit of identifying your tables in the FROM clause by using the schema name followed by the table name. By doing this you reduce the amount of work SQL Server has to do to resolve the table name and promote plan cache re-usage.

The third way to identify a table is to use a three part name, where the last two parts are schema and table name, and the first part is database name. Here is a SELECT statement similar to my previous SELECT statement that uses a three part name:

SELECT  *
FROM    AdventureWorks.Production.ProductCategory
WHERE   ProductCategoryID < 2 ;

Here you can see that I appended the database name AdventureWorks to the Production.ProductCategory table identified in the FROM clause.

By using a three part naming convention for tables in the FROM clause the database context of your query window in SQL Server Management Studio could be set to any database, and the database engine will still know which database, schema and table to use for the query. When building applications that need to retrieve data from multiple databases within an instance, using a three part name helps facilitate retrieving data from multiple databases in a single application.

The last method is a using a four-part name, with the fourth part (preceding the database name) identifies the name of a linked server. Since linked servers are outside the scope of this Stairway, I will not discuss this topic further. If you should run across table name that contains 4 parts, you will know the table is associated with a linked server.

The WHERE Clause

The WHERE clause of a SELECT statement is optional. The WHERE clause is used to constrain the rows that are returned from a SELECT statement. The database engine evaluates each row against the WHERE clause and then only returns rows if they meet the search condition or conditions identified in the WHERE clause. As you write more SELECT statements you will find most of your SELECT statements will likely contain a WHERE clause.

A simple WHERE clause will contain a single search condition, whereas a more complex WHERE clauses might contain many conditions. When multiple conditions are used in a WHERE clause they will be combined together logically by using AND and OR logical operators. There is no limit to the number of different condition that might be included in a SELECT statement. In my examples so far I’ve only used a single condition. Let’s review a couple of examples that have more complex WHERE clauses.

Here is an SELECT statement that has two different search conditions:

SELECT  *
FROM    Production.Product
WHERE   Color = 'Blue'
        AND ProductID > 900 ;

In this statement the first condition checks to see if the row has the value Blue in the Color column. The second condition checks to see if the value in the ProductId column is greater than 900. Since the AND operator is used both of these conditions need to be true in order for a row to be returned from this query.

Let’s look at a WHERE clause that gets even more complicated:

SELECT  *
FROM    Production.Product
WHERE   ProductID > 900
        AND ( Color = 'Blue'
              OR Color = 'Green'
            ) ;

This example returns Production.Product rows where the ProductID value is greater than 900 and the value in the Color column is either Blue or Green. By looking at this query you can see that I have placed parenthesis around the two different Color column conditions. This sets the order in which conditions are evaluated against each other, just like you would do in a mathematical expression. By using the parentheses the OR operation between the two color conditions will be evaluated before evaluating the AND operator. When using AND and OR together without parentheses, the order in which these logical operators are processes is based on the rules of logical operator precedence. The precedence rules dictate that AND operations are performed before OR operations if no parentheses are included.

Let me build on my prior WHERE clause by adding a NOT operator:

SELECT  Name ,
        Color
FROM    Production.Product
WHERE   ProductID > 900
        AND NOT ( Color = 'Blue'
                  OR Color = 'Green'
                ) ;

I added the NOT operator just after the AND operation to indicate that I want products that are not Blue or Green. When I run this query against my AdventureWorks database I get products that are Silver, Yellow and Black.

For a complete list of all the different possibilities of a search condition refer to Books Online: http://msdn.microsoft.com/en-us/library/ms173545.aspx, and for more information on logical operator precedence you can read this topic: http://msdn.microsoft.com/en-us/library/ms190276.aspx.

Putting it All Together: Business Cases

Now it is your turn to use the information I’ve shared to build your own basic SELECT statements. In this section I will be providing you a couple of exercises with which you can practice writing a SELECT statement to meet the business case described.

Exercise #1:

Assume you have been asked by the head of the Human Resources department to produce a list of all EmployeeID values for employees who have lots of sick leave and vacation hours. For the purpose of this list the Human Resources head tells you the definition of “lots of leave” is as follows: an employee must have a SickLeaveHours value greater than 68 and a VacationHours value greater than 98. You can find these columns along with the EmployeeID column in the Human.Resources.Employee table. Write and test your code against the AdventureWorks Database. When done, check your code against the answer in the section below.

Exercise #2:

There are some questions your manager has about a few specific order numbers. Your manager wants you to produce a list of all the columns associated with a few specific orders so she can review which sales person is associated with each order. Your manager specifies that she is only interested in a report that contain SalesOrderHeader rows for orders that have a SalesOrderId between 43702 and 43712. When you are done writing this SELECT statement, check your answer below.

Answers to Exercise #1:

Your query should look something like this:

SELECT  EmployeeID ,
        SickLeaveHours ,
        VacationHours
FROM    HumanResources.Employee
WHERE   SickLeaveHours > 68
        AND VacationHours > 98 ;

The results of your query should return the following three rows:

EmployeeID  SickLeaveHours VacationHours

----------- -------------- -------------

109         69             99

179         69             99

224         69             99

Answers to Exercise #2:

Your query should look something like this if you read the document referenced above to learn a little more about search conditions and the BETWEEN operator:

SELECT  *
FROM    Sales.SalesOrderHeader
WHERE   SalesOrderID BETWEEN 43702 AND 43712 ;

Or you might have written your code similar to this:

SELECT  SalesOrderID ,
        RevisionNumber ,
        OrderDate ,
        DueDate ,
        ShipDate ,
        Status ,
        OnlineOrderFlag , 
        SalesOrderNumber ,
        PurchaseOrderNumber ,
        AccountNumber ,
        CustomerID ,
        ContactID ,
        SalesPersonID ,
        TerritoryID ,
        BillToAddressID ,
        ShipToAddressID ,
        ShipMethodID ,
        CreditCardID ,
        CreditCardApprovalCode ,
        CurrencyRateID ,
        SubTotal ,
        TaxAmt ,
        Freight ,
        TotalDue ,
        Comment ,
        rowguid ,
        ModifiedDate
FROM    Sales.SalesOrderHeader
WHERE   SalesOrderID >= 43702
        AND SalesOrderID <= 43712 ;

Your query should have returned 11 rows. Five rows have an OrderDate vale of 2001-07-02, five rows have a OrderDate vale of 2001-07-03 and 2 rows have an OrderDate value of 2001-07-04.

One thing to note about this solution versus the previous solution is all the column names are identified, whereas the previous solution used the asterisk to specify all the columns. Identifying all the columns by name is a better coding practice, because your SELECT statement will always return the same number of columns even if an additional column is added to your table. This is important when your application is expecting a specific number of columns to be returned from your SELECT statements.

Understanding the Basic SELECT Statement

In order to be a SQL Server application programmer you need to first understand the basic SELECT statement. Understanding how to retrieve all the data in a table and constraining what is returned are fundamental to developing applications. This article and exercises have provided you with the SELECT statement basics, which is the first building block of this series.

This article is part of the parent stairway Stairway to T-SQL DML

Rate

4.5 (2)

You rated this post out of 5. Change rating

Share

Share

Rate

4.5 (2)

You rated this post out of 5. Change rating