Why is "Select *" bad in any SQL code?

  • SanDroid

    SSChampion

    Points: 10068

    I had to have a discussion with a developer where I work about having put "Select *" in a stored procedure. The developer asked me "What is wrong with using Select * anyway. It seems a lot easier to read and faster to write than all the detailed column names."

    I told the developer that I would make a forum post on the internet and see just how many different answers I could get. So here is you chance to tell an application development team lead and self proclaimed software architect what is wrong with using "Select *" in stored procedures and views to create result sets. 😎

    Please supply your reference in any answer you provide. For example a link to the internet or the name of the book that backs up your post.

  • Ninja's_RGR'us

    SSC Guru

    Points: 294069

    You mean aside from ddl change, performance, network overuse, column order change, performance, indexes being useless, not seing the list of columns when working on the code?

    so aside from bad design that breaks the application and abysmal performance, you want more than that?

    :hehe:

  • PB_BI

    SSCoach

    Points: 16990

    If you use it in SSIS, for example, it can break your package if a new column is added.

    Also, staying on the ETL thoughts, it's not always prudent to use SELECT * as you may not need all of the columns.


    I'm on LinkedIn

  • Michael Valentine Jones

    SSC Guru

    Points: 64818

    The biggest issue is that when you add or drop a column the query will return a different result set and the application will not be prepared for the change and will produce an error.

    Another issue is that most of the time the application does not need all of the columns so returning them is just a waste of server resources. Also, if you return columns that you don't need, then you may get a different, less efficient query plan.

  • SQLRNNR

    SSC Guru

    Points: 281205

    Run it down to execution plans and perf test for the developer. Ensure that developer truly needs every single column, generate tests and show the developer why it is bad.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • GSquared

    SSC Guru

    Points: 260824

    Same problem with anything implicit in code instead of explicit.

    The code calling the proc is then subject to negative impacts from table refactoring. If the sequence of columns is changed, or if columns are added, the code that consumes the results of that proc is likely to end up unexpectedly and unnecessarily broken.

    If the columns are listed explicitly, underlying tables can be refactored without impacting the output of the stored procedure. That keeps the API to the database intact, while still allowing necessary refactoring to accommodate new business needs, or performance improvements, et al. (Note: "can be" and "will be" aren't the same thing. A good DBA will leave the database APIs intact while refactoring tables, but a poor one will break the APIs even if he doesn't have to, just through neglect or ignorance.)

    The prior DBA that I took over for believed strongly in implicit coding. A lot of "INSERT INTO TableName SELECT * FROM Table2Name", with both the inserted columns list and the selected columns list implicit. Whenever anything would change in either source or destination, many/most/all automated processes on multiple servers would start throwing errors, and he would spend the next several days fixing them. This was a very regular occurance during his tenure. I prefer explicit code, with backwards API-compatibility, and changed all the processes to fully list both target and source columns. When a table is changed, I make sure it's done in such a way that the API is still solid. So, instead of spending several days per month fixing dozens of automated processes, I work on new projects, because the automated processes just keep running and producing correct results.

    So, yes, it saves a few seconds in writing the code, at the cost of potentially days of work in the future. I prefer to spend the necessary seconds now and have those days for something more interesting than fixing broken database APIs.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Gail Shaw

    SSC Guru

    Points: 1004446

    Let me give you an example and you tell me.

    SELECT * INTO #Temp

    FROM TableWith50ColumnsAnd5MillionRows

    -- update that uses 2 columns from #temp

    -- update that uses 3 columns from #temp

    -- select that uses 5 columns from #temp

    DROP TABLE #Temp

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • SanDroid

    SSChampion

    Points: 10068

    Ninja's_RGR'us (11/14/2011)


    You mean aside from ddl change, performance, network overuse, column order change, performance, indexes being useless, not seing the list of columns when working on the code?

    so aside from bad design that breaks the application and abysmal performance, you want more than that?

    :hehe:

    I agree with all those opinions, but unless I have public available reference material to go with your statement it is just an opinion and has less weight than the developer in question.

  • Sean Lange

    SSC Guru

    Points: 286482

    Seems the dev is really trying to find an excuse for being lazy and not typing the column names. Now I have to admit that typing column names for a query with 20 columns is totally irritating. Fortunately for us MS had the same gripe and added the ability to drag the "columns" folder from within SSMS to a query window and it will create a perfectly formatted comma separated list of every column in the table. It is then really easy to remove the columns you don't want/need. There are tons of other utilities out there for databases that can make this type of thing a lot easier.

    And if you table has large columns (varchar(max) or varbinary(max)) performance is in the toilet.

    So let's discuss the whole concept of ddl change. I have heard the argument in the past that if we add a new column we don't need to change the query to get the new column. This simply is not a valid argument. Unless you just want to display this new column in a grid you need code to handle the new data anyway.

    Now of course the grid...it should be populated using a stored proc and the grid itself should be able to handle new columns to the underlying dataset via dynamic column creation.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • GSquared

    SSC Guru

    Points: 260824

    SanDroid (11/14/2011)


    Ninja's_RGR'us (11/14/2011)


    You mean aside from ddl change, performance, network overuse, column order change, performance, indexes being useless, not seing the list of columns when working on the code?

    so aside from bad design that breaks the application and abysmal performance, you want more than that?

    :hehe:

    I agree with all those opinions, but unless I have public available reference material to go with your statement it is just an opinion and has less weight than the developer in question.

    We're not talking about natural law here, we're talking about something that's going to be based on annecdotal material. Any "public available reference material" on this is just going to be that author's opinion on it. If it's an author the developer considers more knowledgeable than himself, that might help, but it'll still just be an opinion.

    In my particular case (and many others), the opinion is based on extensive experience, and the weighing of known costs and benefits. But it's still "just an opinion".

    On the other hand, disliking severe pain is also "just an opinion", and one that's based on personal experience. Some like it. Same difference here.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Sean Lange

    SSC Guru

    Points: 286482

    SanDroid (11/14/2011)


    Ninja's_RGR'us (11/14/2011)


    You mean aside from ddl change, performance, network overuse, column order change, performance, indexes being useless, not seing the list of columns when working on the code?

    so aside from bad design that breaks the application and abysmal performance, you want more than that?

    :hehe:

    I agree with all those opinions, but unless I have public available reference material to go with your statement it is just an opinion and has less weight than the developer in question.

    google.com

    Type "why is select * bad" in the textbox.

    Click Search

    You will be rewarded with a couple billion links.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • SanDroid

    SSChampion

    Points: 10068

    GSquared (11/14/2011)


    SanDroid (11/14/2011)


    Ninja's_RGR'us (11/14/2011)


    You mean aside from ddl change, performance, network overuse, column order change, performance, indexes being useless, not seing the list of columns when working on the code?

    so aside from bad design that breaks the application and abysmal performance, you want more than that?

    :hehe:

    I agree with all those opinions, but unless I have public available reference material to go with your statement it is just an opinion and has less weight than the developer in question.

    We're not talking about natural law here, we're talking about something that's going to be based on annecdotal material. Any "public available reference material" on this is just going to be that author's opinion on it. If it's an author the developer considers more knowledgeable than himself, that might help, but it'll still just be an opinion.

    In my particular case (and many others), the opinion is based on extensive experience, and the weighing of known costs and benefits. But it's still "just an opinion".

    On the other hand, disliking severe pain is also "just an opinion", and one that's based on personal experience. Some like it. Same difference here.

    I can't believe that in 40+ years of RDBMS systems reference materials that none of them back up any of the post that have been made.

    That pain hurts is a medical fact and backed up by literal TONs of literature and reference material on the neverous system. It is WAY beyond opinion.

  • Cadavre

    SSC-Forever

    Points: 41582

    SanDroid (11/14/2011)


    Ninja's_RGR'us (11/14/2011)


    You mean aside from ddl change, performance, network overuse, column order change, performance, indexes being useless, not seing the list of columns when working on the code?

    so aside from bad design that breaks the application and abysmal performance, you want more than that?

    :hehe:

    I agree with all those opinions, but unless I have public available reference material to go with your statement it is just an opinion and has less weight than the developer in question.

    How's this?

    BEGIN TRAN

    --Standard TestEnvironment of 1,000,000 rows of random-ish data

    IF object_id('tempdb..#testEnvironment') IS NOT NULL

    BEGIN

    DROP TABLE #testEnvironment

    END

    --1,000,000 Random rows of data

    SELECT TOP 1000000 IDENTITY(INT,1,1) AS ID,

    RAND(CHECKSUM(NEWID())) * 30000 + CAST('1945' AS DATETIME) AS randomDate,

    ABS(CHECKSUM(NEWID())) AS randomBigInt,

    (ABS(CHECKSUM(NEWID())) % 100) + 1 AS randomSmallInt,

    RAND(CHECKSUM(NEWID())) * 100 AS randomSmallDec,

    RAND(CHECKSUM(NEWID())) AS randomTinyDec,

    RAND(CHECKSUM(NEWID())) * 100000 AS randomBigDec,

    CONVERT(VARCHAR(6),CONVERT(MONEY,RAND(CHECKSUM(NEWID())) * 100),0) AS randomMoney

    INTO #testEnvironment

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3

    PRINT '========== ID =========='

    SET STATISTICS TIME ON

    SELECT ID FROM #testEnvironment

    SET STATISTICS TIME OFF

    PRINT REPLICATE('=',80)

    PRINT '========== * =========='

    SET STATISTICS TIME ON

    SELECT * FROM #testEnvironment

    SET STATISTICS TIME OFF

    PRINT REPLICATE('=',80)

    PRINT '========== randomDate =========='

    SET STATISTICS TIME ON

    SELECT randomDate FROM #testEnvironment

    SET STATISTICS TIME OFF

    PRINT REPLICATE('=',80)

    GO 10

    ROLLBACK

    Beginning execution loop

    (1000000 row(s) affected)

    ========== ID ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 297 ms, elapsed time = 5230 ms.

    ================================================================================

    ========== * ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 733 ms, elapsed time = 12864 ms.

    ================================================================================

    ========== randomDate ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 359 ms, elapsed time = 6242 ms.

    ================================================================================

    (1000000 row(s) affected)

    ========== ID ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 234 ms, elapsed time = 5161 ms.

    ================================================================================

    ========== * ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 749 ms, elapsed time = 12237 ms.

    ================================================================================

    ========== randomDate ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 203 ms, elapsed time = 5780 ms.

    ================================================================================

    (1000000 row(s) affected)

    ========== ID ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 219 ms, elapsed time = 4948 ms.

    ================================================================================

    ========== * ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 764 ms, elapsed time = 11919 ms.

    ================================================================================

    ========== randomDate ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 203 ms, elapsed time = 5801 ms.

    ================================================================================

    (1000000 row(s) affected)

    ========== ID ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 608 ms, elapsed time = 7869 ms.

    ================================================================================

    ========== * ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 2028 ms, elapsed time = 17155 ms.

    ================================================================================

    ========== randomDate ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 421 ms, elapsed time = 6102 ms.

    ================================================================================

    (1000000 row(s) affected)

    ========== ID ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 343 ms, elapsed time = 4873 ms.

    ================================================================================

    ========== * ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 749 ms, elapsed time = 11704 ms.

    ================================================================================

    ========== randomDate ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 343 ms, elapsed time = 6524 ms.

    ================================================================================

    (1000000 row(s) affected)

    ========== ID ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 390 ms, elapsed time = 5321 ms.

    ================================================================================

    ========== * ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 702 ms, elapsed time = 12186 ms.

    ================================================================================

    ========== randomDate ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 312 ms, elapsed time = 5673 ms.

    ================================================================================

    (1000000 row(s) affected)

    ========== ID ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 343 ms, elapsed time = 4771 ms.

    ================================================================================

    ========== * ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 702 ms, elapsed time = 11705 ms.

    ================================================================================

    ========== randomDate ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 297 ms, elapsed time = 5962 ms.

    ================================================================================

    (1000000 row(s) affected)

    ========== ID ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 250 ms, elapsed time = 5523 ms.

    ================================================================================

    ========== * ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 811 ms, elapsed time = 12512 ms.

    ================================================================================

    ========== randomDate ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 171 ms, elapsed time = 5983 ms.

    ================================================================================

    (1000000 row(s) affected)

    ========== ID ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 234 ms, elapsed time = 5099 ms.

    ================================================================================

    ========== * ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 765 ms, elapsed time = 12919 ms.

    ================================================================================

    ========== randomDate ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 327 ms, elapsed time = 6139 ms.

    ================================================================================

    (1000000 row(s) affected)

    ========== ID ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 328 ms, elapsed time = 5241 ms.

    ================================================================================

    ========== * ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 811 ms, elapsed time = 12986 ms.

    ================================================================================

    ========== randomDate ==========

    (1000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 374 ms, elapsed time = 5814 ms.

    ================================================================================

    Batch execution completed 10 times.

    --edit--

    Added a "go 10" to make the batch execute 10 time to show the timings are consistent.

    Bear in mind that this is by no means the biggest issue. Breaking your application because a new column appears in the result-set would be far worse.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • SanDroid

    SSChampion

    Points: 10068

    Cadavre (11/14/2011)


    SanDroid (11/14/2011)


    Ninja's_RGR'us (11/14/2011)


    You mean aside from ddl change, performance, network overuse, column order change, performance, indexes being useless, not seing the list of columns when working on the code?

    so aside from bad design that breaks the application and abysmal performance, you want more than that?

    :hehe:

    I agree with all those opinions, but unless I have public available reference material to go with your statement it is just an opinion and has less weight than the developer in question.

    How's this?

    BEGIN TRAN

    --Standard TestEnvironment of 1,000,000 rows of random-ish data

    ================================================================================

    I love these examples and have plenty of my own.

    Unfortunately the Database tables and code in question store information that is exchanged by differant middleware components for a limited time. The data stored in them is cleared out when the exchange is complete. The most rows that will ever be in this developers tables at any time is 100.

  • SanDroid

    SSChampion

    Points: 10068

    Sean Lange (11/14/2011)


    SanDroid (11/14/2011)


    Ninja's_RGR'us (11/14/2011)


    You mean aside from ddl change, performance, network overuse, column order change, performance, indexes being useless, not seing the list of columns when working on the code?

    so aside from bad design that breaks the application and abysmal performance, you want more than that?

    :hehe:

    I agree with all those opinions, but unless I have public available reference material to go with your statement it is just an opinion and has less weight than the developer in question.

    google.com

    Type "why is select * bad" in the textbox.

    Click Search

    You will be rewarded with a couple billion links.

    Yes I get it. I just had asked in the original post/question to include a link or quote the book that backs up what you post.

    As for google, my favorite comes from the second link if you search this.

    Oct 10, 2009 – I'm too busy defending my "non-objective opinions" about why SELECT * is bad news outside of the EXISTS case.

    http://sqlblog.com/blogs/aaron_bertrand/archive/2009/10/10/bad-habits-to-kick-using-select-omitting-the-column-list.aspx

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply