Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

The Voice of the DBA

Steve Jones is the editor of SQLServerCentral.com and visits a wide variety of data related topics in his daily editorial. Steve has spent years working as a DBA and general purpose Windows administrator, primarily working with SQL Server since it was ported from Sybase in 1990. You can follow Steve on Twitter at twitter.com/way0utwest

Common SQL Server Mistakes - SELECT *

I’ve been trying to work on some new presentations so that I have a variety, including some spares, when I go to events. One of the topics that I think has some value, especially for .NET and sysadmin groups, is a list of common mistakes, how to fix them, and why they’re bad.

I was going to call this Common Developer Mistakes, but I’m not sure that would go over well at Developer events, and I see DBAs making these mistakes along with Windows admins.

I decided to build a series of blog posts as I work through the presentation to document some of the issues, and help me work through speaking points. Please feel free to comment.

SELECT * Is For Short Term Use Only

The first mistake that I often see in application code is that too often people write things like

SELECT * 
FROM Sales.Customer


What does this do? If I run this in my 2008 AdventureWorks database, I get something like this:


 SelectStar_b

You can see that I end up with multiple columns (CustomerID, TerritoryID, CustomerType, rowguid, ModifiedDate). That’s handy, and cool, and allows me to get all the data in the table.

But do I really need it?

In most applications, my guess is that we don’t. Why do we need TerritoryID? That’s a foreign key to the SalesTerritory table, and typically what I want instead is the SalesTerritory.Name column instead of the ID value.

I could do this:

SELECT *
FROM Sales.Customer c
INNER JOIN Sales.SalesTerritory t
ON c.TerritoryID = t.TerritoryID

but that’s any better. Now I’ve returned even more columns, 10 more to be exact, including TerritoryID twice, once from each table. In AdventureWorks, this is 19k rows, and at a minimum this query has returned 19k rows x 8 bytes (int data type) too much data. That doesn’t sound like a lot, but what if this runs in your application 500 times a day? That’s a lot of wasted:


  • bandwidth
  • disk access
  • memory from caching
  • CPU work on the server AND client

I would also guess that most of the time when you access a customer, you don’t even want all the rows. Likely you want to filter this somehow, and you will with a WHERE clause, but it’s still wasted time and resources.

We know that the database often is a bottleneck. It’s a shared resource, it’s one machine, and it doesn’t scale as easy as multiple clients or web servers, or even developers, so we should avoid wasting resources when we don’t have to.

What Do You Do?

Here’s what I recommend:

You can write this, and it’s what I often do:

SELECT TOP 2
*
FROM Sales.Customer

And I get a limited result set:


SelectStar_e


Why is this better? I do this so I can easily see the column names. I can then include those in my SELECT statement, with a quick rewrite.


SelectStar_f

I could also quickly use the Object Explorer to find the columns like this:


SelectStar_g


And you can right click, and choose “script” and “as SELECT”


SelectStar_h

and paste the code into your query window. The results would look something like this:



SelectStar_i

Alternatively, my employer, Red Gate Software, makes a fantastic product called SQL Prompt that will help you quickly grab columns. For me, I can do an SSF, and get a SELECT * FROM and then choose the table:


SelectStar_c


Not that I see the columns to the right. I could also just select the table with a tab and then if I remove my askterisk, I get a list of columns I can easily pick:



SelectStar_d

SQL Prompt makes this easier, but it isn’t that hard to just do this by hand. You could easily grab the columns you need from SSMS and add them to queries.

The database is a limited resource, even if you have a 256 core server with 1024GB of RAM. You still want to query the data you need and only return what’s necessary. A little more effort when building code will pay off later with much better performing applications.

Comments

Posted by Gethyn Ellis on 1 September 2010

Nice post Steve, looking forward to the rest of the series too.

Posted by Shawn Melton on 1 September 2010

Not that I write T-SQL queries much but I like using sp_help to get columns of a table/view. This way I can see what data type it is and column length too. I do like the TOP 2 query though to get a quick view of the column names, nice touch.

Posted by Steve Jones on 1 September 2010

Please feel free to let me know what you think are other common mistakes.

sp_help is a good way to quickly view tables, though I'm not sure I find it much more convenient than flipping through SSMS.

Posted by Jason Brimhall on 1 September 2010

Nice post steve.  I have another one in queue along the Select * series that I have been working on - I think the results might be intriguing.

Posted by DaniSQL on 1 September 2010

Great write up! I usually drag and drop the folder columns under each table and delete the columns that I dont want returned by the query.

Posted by Glenn Berry on 1 September 2010

SELECT * is so evil that most static code analyzers flag it as an issue (for good reason). Using it is a sign of lazyness or ignorance.

Posted by Gethyn Ellis on 2 September 2010

Not really TSQL related. But I have seen in a few places, maintenance tasks being run to rebuild indexes from a Maint PLan, all indexes being rebuilt, the next task in the maintenance plane was to shrink all databases! These were on SQL instances where the shop didn't have a DBA looking after SQL Server full time but a Sys Admin wearing many hats.

Posted by david.soussan on 5 September 2010

Since most of my SELECTs would be selecting from a view I have absolutely no qualms about using SELECT * myself. Just make sure that the view only contains the columns and the rows required.

Posted by TheSQLGuru on 5 September 2010

Hey Steve - just grab a copy of my Common TSQL Mistakes presentation materials from one of the recent SQL Saturday Schedule pages (Nashville I think would work).  I got it from Plamen Ratchev quite some time ago, and it has a lot of good stuff and I have received VERY good feedback about it.  

Posted by vgrimes on 5 September 2010

I have a unique situation where we have a few (very few) queries where we absolutely do need every single column in a table.  It would be my habit to still select each individual column myself, but I had a jr dev ask me why bother when performance-wise it makes no difference, and select * is quicker to write?  I found that I didn't have a very good answer and mumbled something about being explicit and readability, but I fear I wasn't very convincing.  In such rare cases, I'd be curious what answer any of the experts would give in that case?

TIA

-VG

Posted by Steve Jones on 5 September 2010

I understand the idea that the view handles the row/column stuff for you, and perhaps if you're pulling that data back for a report or client app, it's not a big deal. I'd argue, however, that it's a bad habit to get into and it proliferates lazy coding.

Posted by Steve Jones on 5 September 2010

In terms of SELECT * being quicker to write, I agree, and performance is the same. However it's still lazy coding, and if that code were to be re-used somewhere, or the values changed in the result set, it could be an issue. It might be confusing to someone debugging your code as well since they need to dig through and go look at the table instead of seeing the list of columns.

There are so many ways to easily get the list of columns without typing, I'm not sure it's worth the bad habit of SELECT *

Posted by Michael Nelson on 5 September 2010

One real problem is that if you include SELECT * in views and then later change the schema of the underlying tables, you'll need to re-CREATE or ALTER the views so they are recompiled and recognize the new changes.

For example, if you have table a with columns (a, b, c)

Then CREATE VIEW vFoo AS SELECT * FROM a

SELECT * from vFoo will return columns a, b and c as expected.

But if you now make a schema change so table a has columns (a, b, d, c)...

SELECT * from vFoo will still return column headers a, b and c... but the data listed for column c will actually be the data in column d!

That caused us major headaches in a production system.

Posted by Charles Kincaid on 5 September 2010

What? Am I the first to mention INFORMATION_SCHEMA.Tables as a way to see the columns in a table.

Nobody yelled RTFD with D being documentation.  Oh wait, most people don't HAVE database documentation.  I forgot.  I gess it might a security thing to prevent folks from looking at VIEW definitions.

The 2008 intellesense is great and SQL Prompt if you are using older versions.  You shold give it a look even on 2008.

Then I have heard "But I have to use * to count rows."  Oh no!  SELECT COUNT(1) AS 'ToProcess' FROM [Order] WHERE Filled = False works just fine.

Posted by ben-732740 on 7 September 2010

SELECT *, is pure laziness these days. If you are serious about writing sql, you'll have a tool like SQL prompt, and or use intellisense with VS.

There are so many problems with it, I can't even start to list them. Also, I would say that 90% of the time, we should be using table aliases as well, so that if you add columns to a table, you don't have problems with joins.

You should try to be as explicit as possible when writing sql. I've seen so many bugs pop up when tables change, and then suddenly views stop working at some point, only because the server needed to recompile, and now additional columns change the semantics of a DML statement.

Posted by bridlon on 7 September 2010

Another option in addition to the 'Top 2' to get column names is:  Select * from TableName Where 1 = 2.

Posted by Nakul Vachhrajani on 7 September 2010

Good post, Steve. Another issue that I find most commonly is mismatching and/or incorrect data-types.

If a field is just going to store values from 1 to 10, use a tinyint and put a check constraint on it. Yet, I do find people using INT columns for such a purpose.

Or, assume a situation where a particular table had a flag introduced in the days of developer generation# 1. Now, in times of developer generation# 2, it is required that the flag is propagated to another table (bad, but let's assume it's okay for a moment). The developer generation# 2 might typically end up using an incorrect data type which would start causing all sorts of problems during data migration and transfers.

Not to mention, one needs to be very meticulous not only during design of the primary storage tables, but also in the temporary tables used. After all, tempdb is also a precious resource and must not be abused.

Posted by Steve Jones on 7 September 2010

Nice point on datatypes. I'll add that to my series.

Posted by Andrew Morton-340351 on 8 September 2010
Leave a Comment

Please register or log in to leave a comment.