The SQL * Wildcard

Question

The SQL * Wildcard

namtaru

Valued Member

Points: 50
More actions
December 26, 2007 at 11:28 am

#184975

So I was on a interview at Microsoft and they threw me up in front of a whiteboard. They described some tables they wanted me to illustrate and then Plain english asked me for data results, for which I wrote the queries to get the results they were after. It all went well and yay I got the job.
BUT
I wrote out one Query with a Select * to which the manager asked me, You know that is a inefficient way to call out the data from that table, a much better way is to list all the columns you wish to show, even if it is all of them. Do you know why?
My answer was " Well from a writing standpoint this way is faster, But i imagine by doing it with a * I am forcing a scan on the tables and not using indexes that could cover some of the Retrieval. He said I was partially right and that I should take this as homework. Well ive searched everywhere for the answer and either everyone is woefully as misinformed as me, or they all know it as common knowledge and I just dont. Anyone have a DEFINITAVE answer to this question?
Thanks a bunch in anticipation to your answers!

Viewing 15 posts - 1 through 15 (of 60 total)

You must be logged in to reply to this topic. Login to reply

jsheldon SSCertifiable Points: 5250 More actions · Answer 1

I would be curious to the answer to this as well. I do know as knowledge that using Select * on large databases you are going to take a hit on performance.

I tried a test and did it on a small table with seven columns (i disagree that you would list all columns on a table, I wouldn't use a select * but select only the columns I needed)

I did a select * and a select list all columns

This table has a clustered index so when I first ran select * it built the clustered index scan, when I did the select list columns, the results were in milliseconds but the cost from the execution plan was the same Clustered index scan.

I guess the answer is 'depends on what type of indexes are applied to the table/views'????

Jeff Moden SSC Guru Points: 1003853 More actions · Answer 2

The best answer would be "to protect the results". I've seen it happen before... DBA's can decide there's a problem with a table or they need to instantiate some security measures (hide a column, whatever). So, they rename the table to something else, and create a view with the old table name... in their efforts, they change the order of the columns. Either the code goes BOOM or, worse yet, they put in an extra column of the same datatype as, say, your column "2" and suddenly you're selecting the wrong data. They may even change the order using the "surrogate view" just because some hot-dog thinks it should be in a different order.

Being a bit of a "SysOp from hell", it might be fun to do that once a week or so just to see whose code breaks 😛

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

namtaru Valued Member Points: 50 More actions · Answer 3

I am wondering the more I think of it,that perhaps this is what he was trying to move me towards, Lets say:

Select *

From dbo.customers

Would that plan be cached? and be reusable? I am not sure it would be, where as if you selected all the columns. with a

Select c.fname,c.lname,c.contactnumber

From dbo.customers c

Would be cached, that sound right to you guys?

Jeff Moden SSC Guru Points: 1003853 More actions · Answer 4

Yep... maybe that too. Same thing goes with statistics, I suppose.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Steve Jones - SSC Editor SSC Guru Points: 734418 More actions · Answer 5

While I don't like SELECT * in code, I'm not sure it's less efficient. I'd like to know why. I always learned that it resolved the column list at compile time, which is why new columns don't show up in a view if you've used SELECT *.

I suppose that it might be slightly less efficient because of that, but that has to be noise compared to the query time. If this is between SELECT * and SELECT (all columns), I would like to know why it's less efficient.

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 6

namtaru (12/26/2007)
Select *
From dbo.customers
Would that plan be cached? and be reusable?

Yes and yes. It's a simple plan and there;'s no reason for it not to be cached and reused. You can check by looking in the plan cache

SELECT creation_time, last_execution_time, execution_count, dbid, objectid, text

FROM sys.dm_exec_query_stats qs

CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st

The point with SELECT * is not that it's always less efficient, it's that it could result in more data been returned than necessary. It may be less efficient, depending what you're doing and what's in the table

Consider a table with 6 columns.

CREATE TABLE Test (

a int,

b varchar(10),

c datetime,

d datetime,

e varchar(500),

f smallint)

If you only want columns b and c and your use select *, you're returning, across a network, 4 columns that you don't need (between 14 and 514 bytes per row)

If there's an index on B include a and c, and you write a query

SELECT a,b,c from Test where B='Rob'. The index is covering for that query and the query can be satisfied with an index seek. No lookup to the cluster/heap is required.

Now consider SELECT * from Test where B='Rob'

An index seek can still be done, but now a lookup is required to the cluster/heap to retrieve columns d,e and f. More IOs, slower query.

Select * makes creating covering indexes near impossible to create (as all the columns in the table are required by the query)

Now, consider a modification is made to that table and column g is added, a varbinary(max) that holds an image averagng 700kb per row.

Now any query that does a SELECT * will start returning that image, even if it's not required. Since it's a LOB and stored out of row that means more IOs (lots more IOs), less efficient use of memory and a lot more data going over the network.

Worse, any query that has assumed that there are 6 columns in the table

( Like

insert into Table2 (Col1, col2, col3, Col4, Col5, Col6)

Select * From Test

)

will break when run.

Does that help?

jsheldon: The reason your second select ran faster than your first had nothing to do with the structure of the query. The first had to fetch the data from disk into cache, the second read from cache. If you're doing time tests like that, either ensure that the cache has the data before you start timing, or run DBCC DropCleanBuffers between runs to ensure an empty data cache

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Jeff Moden SSC Guru Points: 1003853 More actions · Answer 7

Understood... but the interviewer from Microsoft asked why it's better to list all of the columns than to use Select *...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 8

Other than protecting against future changes, I can't think of a good reason.

Possibly SELECT * requires an extra lookup to the system tables to get the column names. Kinda like the sp_ double lookup.

Would be interesting to check, but I'm not quite sure how. Maybe profiler and the object Access events

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Jeff Moden SSC Guru Points: 1003853 More actions · Answer 9

Now that would be a good reason... can't see it making much of a difference in non-looping batch code, but I can see where it could make a heck of a difference in a high hit volume GUI or RBAR code.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 10

Or in a very high volume OLTP, where lots and lots of small queries are run ver frequently.

Is just conjecture until tested though.

If it is the case, then is another thing against Select *. Unnecessary overhead should be avoided even if it won't make much of a difference

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Jeff Moden SSC Guru Points: 1003853 More actions · Answer 11

high hit volume GUI = very high volume OLTP 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 12

Ah. I took it to be something else (website or other app where the DB defines and controls the Gui)

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Jeff Moden SSC Guru Points: 1003853 More actions · Answer 13

No problem... I absolutely agree with you... unnecessary overhead should always be avoided even if performance is ok for now... someday, the scale will change and if the code isn'tready for it, s---l---o---w will be sure to occur.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Mauve SSChampion Points: 11316 More actions · Answer 14

Always select just the columns that you need.

I've seen more code/applications break due to SELECT * vs. SELECT column1, column2, etc.

The tables in application systems are never permanant. Changes happen. Therefore, when a new column is added to the table, for whatever business reasons, code that is written with SELECT * is now retrieving a column that is never being used. If the column is a CLOB or BLOB ("max", TEXT, IMAGE) then there is a lot of wasted bandwidth.

Given a scenario where the database schema is a product (i.e., sold) application and not a one-off internal application code breaks in the following case:

The application requires a new field (column) in a table. When the new column is added (via an ALTER TABLE) the column goes at the end. This occurs during the schema upgrade process for existing customers that use the application. Yet the schema DDL that creates the table for a new customer may not specify the [new] column at the end. Now, when the application performs a SELECT *, the wrong data is in the wrong columns. This usally causes a crash due to mismatched data types.

I've seen the above happen in a production application.

The same problem occurs if referencing columns in an ADO result set by relative position and not by column name.

So our coding standards are to:

- Always specify the column list for every SELECT and INSERT statement

- Always use table aliases when referencing columns

- For ADO.NET, always reference columns by name.

[font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]