Need Ammunition for Developers Doing Select *'s

  • I just started working at a software company that has never had a DBA. The database that evolved over time has 1157 Tables with over 28000 columns. Some table are small but many have 200+ columns.

    The Problem: All of the SQL is built dynamically in the application and the SQL in Most cases looks like 'SELECT t1.*, t2.* from ........'. Even when the application only uses a couple of the columns returned, the SQL does a SELECT *.

    I have spoken to the lead developer and told him that it is a very bad practice and his response is always, 'Prove It'. So, I am looking for good ways to Prove that doing a SELECT * from a 200 column table when you only need 2 of the columns is a Very bad practice. I want to show things like performance, resource utilization, caching issues, and anything else that you bright professionals could turn me on to. Descriptions of how to do it are good, scripts that show it are even better.

    Thanks in advance!

    Mark.

  • Mark

    It's not just performance - if columns are added to or removed from a table then your application may break.

    But with regard to performance itself, get a copy of your database on to a test server, and compare the execution plans of SELECT * versus SELECT <column list>. Then SET STATISTICS IO ON and SET STATISTICS TIME ON and run the two queries, making sure you clear the data cache in between. Finally, compare the result sets. If they're sent across the network, that could mean many more megabytes trying to squeeze through your already overused pipes than is actually necessary.

    John

  • For a point lookup (one row) against a clustered index, there really isn't a major performance hit.

    But, some of the places where there are issues. First, pages in memory. By loading everything from a table, and then only consuming part of it, you're loading all that data into memory and then not using it. Covering indexes, will never work, so you'll always have a key lookup operation for nonclustered indexes. Minimum, that's three extra reads for each row. It seriously limits what you can do to help performance in places where there is poor performance.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Ask him why he's ever want to try to parse a 200+ column X however many rows array to get the data he wants instead of just parsing a 5 X however many rows array.

    Oh and then change the column order in a table and ask him to try to trouble shoot the problem when he's doing a SELECT *.

  • Hahaha. That would truly be funny.

    His explanation of why he issues SELECT *'s instead of SELECT <column names> is that he is lazy and does not like to type nor does he want to spend the time up front and determine which columns he is actually going to need. Doing it his way he says means that I will always have the columns I need.

    It totally drives me insane.

  • Maybe a real life scenario can help. Someone posted a problem that happens when using SELECT *.

    http://www.sqlservercentral.com/Forums/Topic1685042-392-1.aspx

    I'm sure that it's faster to code using * instead of the column names. The same goes for naming variables as a,b,c. It's a bad practice that will hit back at the worst moment. There are tools that can help to get column names faster, some are even included in SQL Server or SSMS (sp_help, Alt+F1, drag columns folder from object explorer, etc).

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • You can apply what I'm going to show you to your data. If you want to prove that what you are recommending is faster then visual proof is often the best way to do that.

    Let's say you had some data that looks like this (I'm including an example of how to show how to generate sample data):

    -- a db we both have

    -- Note, not the best data model - was using for a different example...

    USE tempdb

    GO

    SET NOCOUNT ON;

    /********************************************************************

    SAMPLE DATA

    ********************************************************************/

    -- not the greatest design, just putting something together real quick

    IF OBJECT_ID('tempdb.dbo.sales') IS NOT NULL DROP TABLE dbo.sales;

    IF OBJECT_ID('tempdb.dbo.offices') IS NOT NULL DROP TABLE dbo.offices;

    GO

    CREATE TABLE dbo.offices

    (

    OfficeID int primary key,

    Country varchar(100) NOT NULL,

    OfficeName varchar(100) NOT NULL

    );

    CREATE TABLE dbo.sales

    (

    SaleID int NOT NULL,

    OfficeID int NOT NULL,

    CustomerID int NOT NULL,

    SaleDate date NOT NULL,

    SaleTotal money NOT NULL

    );

    GO

    INSERT dbo.offices

    VALUES(1,'US', 'Midwest'),(2,'UK', 'London'),(3,'NZ', 'Aukland'),(4,'US', 'SW');

    WITH iTally(N) AS

    (

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns a, sys.all_columns b

    )

    INSERT dbo.sales

    SELECT TOP(200000)

    N, ABS(CHECKSUM(NewId()))%4+1, ABS(CHECKSUM(NewId()))%1000+1000,

    DATEADD(day,(ABS(CHECKSUM(NewId()))%1000)*-1,getdate()),

    (ABS(CHECKSUM(NewId()))%100000)*.01

    FROM iTally;

    ALTER TABLE dbo.sales

    ADD CONSTRAINT pk_sales PRIMARY KEY(SaleID);

    ALTER TABLE dbo.sales

    ADD CONSTRAINT fk_offices FOREIGN KEY(OfficeID) REFERENCES dbo.offices(OfficeID);

    CREATE INDEX nc_sales ON dbo.sales(OfficeID, SaleDate) INCLUDE (SaleTotal);

    CREATE INDEX nc_office ON dbo.offices(OfficeName, OfficeID);

    GO

    You can copy/paste this onto a server you are working with, play around with it and then apply it to your own data. The query takes about 10 sec to run on my laptop.

    Now let's say you needed the OfficeName, SaleDate and SaleTotal for 2014 through today. Here are two examples... The 1st that returns all columns and the second one that only returns what you need:

    SELECT o.*, s.*

    FROM dbo.sales s

    JOIN dbo.offices o ON s.OfficeID = o.OfficeID

    WHERE SaleDate >= '1/1/2014';

    SELECT OfficeName, SaleDate, SaleTotal

    FROM dbo.sales s

    JOIN dbo.offices o ON s.OfficeID = o.OfficeID

    WHERE SaleDate >= '1/1/2014';

    First thing you can do is run this query in SSMS with "Include actual query plan" turned on...

    Exhibit A:

    At a quick glance you can see that the SELECT * query is twice as slow. Nonclustered Indexes are usually better than clustered ones. Index Seeks are better than index scans. You can see that the SELECT * query scans but clustered indexes whereas the other one scans a non-clustered index and performs a seek against the other no-clustered index. When you hover over the SELECT statement in each one you can see that the SELECT * query has double the subtree cost - that basically means twice the work.

    Exhibit B:

    Run the query (as previously mentioned) with STATISTICS IO and STATISTICS TIME ON (and "include query plan OFF) like so:

    SET STATISTICS IO ON;

    SET STATISTICS TIME ON;

    PRINT 'WITHOUT "SELECT *" :'

    SELECT OfficeName, SaleDate, SaleTotal

    FROM dbo.sales s

    JOIN dbo.offices o ON s.OfficeID = o.OfficeID

    WHERE SaleDate >= '1/1/2014';

    PRINT CHAR(10)+'WITH "SELECT *"'

    SELECT o.*, s.*

    FROM dbo.sales s

    JOIN dbo.offices o ON s.OfficeID = o.OfficeID

    WHERE SaleDate >= '1/1/2014';

    SET STATISTICS TIME OFF;

    SET STATISTICS IO OFF;

    Results:

    WITHOUT "SELECT *" :

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Table 'sales'. Scan count 4, logical reads 329, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'offices'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 219 ms, elapsed time = 5474 ms.

    WITH "SELECT *"

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'sales'. Scan count 1, logical reads 796, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'offices'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 718 ms, elapsed time = 7862 ms.

    Ignore the elapsed time, the CPU time is what matters in this example. As you can see - the CPU had to work 3 times as hard to produce the results. SELECT * produced like double the reads (e.g. more IO).

    This is how I sell my suggested changes and it usually works. Play with what I posted and apply it to your data.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Careful now... I wouldn't ignore the elapsed time. Frequently, it's a strong indication of disk activity even if stats I/O isn't turned on to see. You just have to make sure that the elapsed time isn't due mostly to the time it takes to display the return in SSMS.

    Sometimes it's an indication that you're using the wrong tool to measure with.

    http://www.sqlservercentral.com/articles/T-SQL/91724/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/13/2015)


    Careful now... I wouldn't ignore the elapsed time. Frequently, it's a strong indication of disk activity even if stats I/O isn't turned on to see. You just have to make sure that the elapsed time isn't due mostly to the time it takes to display the return in SSMS.

    Sometimes it's an indication that you're using the wrong tool to measure with.

    http://www.sqlservercentral.com/articles/T-SQL/91724/

    I've actually stopped using STATS I/O and STATS TIME entirely. Too much observer effect[/url]. I keep extended events running most of the time now so I can pull this stuff out easily without it being skewed. But, I agree. I always look at both I/O and time. Neither measure is complete alone.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning


  • My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • I Really appreciate everyone who contributed information. I must also apologize for neglecting to let everyone know the other issue that brings me such joy in this new position: Almost all of the queries that contain the SELECT * from one or more tables is contained inside of a Cursor. In fact, every screen within the system operates using Cursor Processing. An example of this is the Shipment Movement screen. This screen has a single record which contains the header information about a given shipment and 1 - N detail rows containing delivery stop information. The details are displayed in a grid which allows any item to be updated and stops to be added and/or deleted. Once the user makes any needed changes and submits the delivery, a cursor processes all of the detail rows.

    When I asked why the software was written using such antiquated patterns, They just said 'Thgat is the way we have always done it and we see no reason to change.

    Aint' Life Grand!

  • Vyper426 (5/14/2015)


    I Really appreciate everyone who contributed information. I must also apologize for neglecting to let everyone know the other issue that brings me such joy in this new position: Almost all of the queries that contain the SELECT * from one or more tables is contained inside of a Cursor. In fact, every screen within the system operates using Cursor Processing. An example of this is the Shipment Movement screen. This screen has a single record which contains the header information about a given shipment and 1 - N detail rows containing delivery stop information. The details are displayed in a grid which allows any item to be updated and stops to be added and/or deleted. Once the user makes any needed changes and submits the delivery, a cursor processes all of the detail rows.

    When I asked why the software was written using such antiquated patterns, They just said 'Thgat is the way we have always done it and we see no reason to change.

    Aint' Life Grand!

    That's extremely problematic.

    However, if they're not experiencing pain, I guess it's ok. However, if they are experiencing pain, the worst possible answer on earth is "That's the way we have always done it." Actually, any time you ask "Why" and someone gives you that answer, it's a bad answer.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (5/15/2015)


    Vyper426 (5/14/2015)


    I Really appreciate everyone who contributed information. I must also apologize for neglecting to let everyone know the other issue that brings me such joy in this new position: Almost all of the queries that contain the SELECT * from one or more tables is contained inside of a Cursor. In fact, every screen within the system operates using Cursor Processing. An example of this is the Shipment Movement screen. This screen has a single record which contains the header information about a given shipment and 1 - N detail rows containing delivery stop information. The details are displayed in a grid which allows any item to be updated and stops to be added and/or deleted. Once the user makes any needed changes and submits the delivery, a cursor processes all of the detail rows.

    When I asked why the software was written using such antiquated patterns, They just said 'Thgat is the way we have always done it and we see no reason to change.

    Aint' Life Grand!

    That's extremely problematic.

    However, if they're not experiencing pain, I guess it's ok. However, if they are experiencing pain, the worst possible answer on earth is "That's the way we have always done it." Actually, any time you ask "Why" and someone gives you that answer, it's a bad answer.

    Or perhaps it's a bad question. A different way of asking the same thing:

    "Would you like to switch to using SQL Server coding best practices and

    a) Improve performance noticeably?

    b) Reduce network bandwidth requirements significantly?

    c) Reduce CPU utilisation considerably?

    d) Improve developer morale and end-user satisfaction dramatically?"

    I could go on, but I'm running out of adverbs.

    Then again, I suppose the answer could be: "No, we like it this way because that's the way it's always been." :w00t:

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • A SELECT * in a cursor...So that means if a column is added to a table the system uses, the interface breaks at every point where it reads from that table immediately on doing the FETCH. The cursor fetches into variables, which won't match the column list returned by *, so it breaks.

    Well, I guess that's one round of ammunition. It sounds like you have your work cut out for you and I don't envy you in the cultural battle you have to engage in.

    Let me guess:

    1. Is the application is connecting with the 'sa' account too?

    2. Did they solve their slow query problem by using NOLOCK on every table to speed things up?

    3. They don't have any foreign keys defined because they just get in the way of data being inserted?

    4. They have no indexes because they got too fragmented years ago, so the natural answer was to drop them. Besides, the SELECT * wasn't using them for some strange reason.

    5. There's no reason to worry about SQL injection because they've never seen it happen

    I honestly hope I'm wrong on all these guesses.

  • There's absolutely no sense in arguing with people that obviously know it all. The "developer" that retorted "Prove it" is absolutely spot on whether I agree with his use of SELECT * or not. Either prove it with a code comparison or sit down. Diametrically opposed "Best Practices" can be bantered about between warring parties all bloody day and nothing is resolved except you both now know how far the vein on each other's forehead sticks out and what the maximum intensity of their dermal flush is.

    The thing is, you're more knowledgeable about the thing you're speaking of than your opponent. You should be able to easily prove such problems even with the simplicity of right clicking on the instance name in the Object Explorer, selecting {reports}, and then following your nose into the performance related reports.

    Arguing about "SELECT *" is a total waste of time on both your parts. You need evidence and then you need an easily demonstrable comparison of performance between your opponent's code and the your code. And both MUST be IN SITU or your well-convinced opponent will either call foul or chide you for changing the environment in your favor.

    In this case, it does NOT depend. PROVE IT! 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 60 total)

You must be logged in to reply to this topic. Login to reply