Amount of data

  • I was wondering if there is a way to "trap" or track the amount of data (in bytes) that a select statement pulls when querying a table or tables. I'm trying to show some co-workers that using "select *" instead of "select <column_name>" is better. The only way I feel I can accomplish this is to show them the difference in the amount of data that they are pulling.

    Thanks in advance

  • You can look at the execution plan, you're looking for the last arrow, the one pointing at the select operator. It is an estimate, not an actual, assuming stats are accurate it should be a decent estimate if the query is simple.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • You could take a look at the sys.dm_exec_query_stats information. You can see the total_logical_reads for all executions of the query. Compare the output of the two, possibly using the number of executions, also there, to get an average.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • sqldummy79 (11/25/2012)


    I was wondering if there is a way to "trap" or track the amount of data (in bytes) that a select statement pulls when querying a table or tables. I'm trying to show some co-workers that using "select *" instead of "select <column_name>" is better. The only way I feel I can accomplish this is to show them the difference in the amount of data that they are pulling.

    Thanks in advance

    Wait - whaaa? How is using "select *" better than using "select <column_name>"?

    Selecting every column (i.e., "SELECT *") is going to result in, at a minimum, the retrieval of rows from the leaf level of the clustered index or from the heap (or from the leaf level of a non-clustered index that includes all columns of the table). If the optimizer uses a non-clustered index that is not a covering index in evaluating the WHERE clause or join conditions, there will be a index scan or seek followed by a key lookup, which means that SQL Server must read even more pages to get to the final result set.

    On the other hand, "SELECT <column_name, n>" may require SQL Server to read less data than "SELECT *" when a covering index exists that includes fewer than all the columns of the table and the optimizer can use it to satisfy the query. SQL Server will read the data from the leaf level of the covering index, where the rows will be narrower than the clustered index/heap rows. More rows may fit on each page, so SQL Server may be able to satisfy the query by reading fewer pages.

    While there may be some use cases that demonstrate the opposite, I feel fairly confident that "SELECT <column_name, . . . n>" queries will generally require fewer page reads than "SELECT *" queries when well-constructed covering indexes are available. In the absence of covering indexes, SELECT <column_name, . . . n>" queries generally should not require SQL Server to read more data than "SELECT *" queries.

    If the OP's question pertains to the amount of data returned by SQL Server to the requesting application or client (i.e., the number of bytes SQL Server sends through the network pipeline), then surely "SELECT <column_name, . . . n" will return fewer bytes than "SELECT *" where n < the total number of columns in the table.

    I hope that the OP simply mistyped his question and meant to say that "SELECT <column_name>" will be better than "SELECT *". There are many more, and more important, reasons that this is true. If I'm missing something here, please show me where I've gone astray. 🙂

    Jason Wolfkill

  • wolfkillj (11/26/2012)


    I hope that the OP simply mistyped his question and meant to say that "SELECT <column_name>" will be better than "SELECT *". There are many more, and more important, reasons that this is true. If I'm missing something here, please show me where I've gone astray. 🙂

    You haven't missed anything. SELECT * is never better, both mechanically and for long-term support for eventual schema changes.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • sqldummy79 (11/25/2012)


    I was wondering if there is a way to "trap" or track the amount of data (in bytes) that a select statement pulls when querying a table or tables. I'm trying to show some co-workers that using "select *" instead of "select <column_name>" is better. The only way I feel I can accomplish this is to show them the difference in the amount of data that they are pulling.

    Thanks in advance

    SET STATISTICS IO ON is likely what you're looking for.

    I believe that your logic is flawed though. Your statement is true only if the idexes on the table are not used even on a small table such as the following...

    USE AdventureWorks;

    SET STATISTICS IO ON

    --===== All columns from an indexed table (9 reads)

    SELECT * FROM HumanResources.Employee;

    --===== All columns from an identical but non-indexed table (8 reads)

    SELECT * FROM #NoIndexes;

    --===== 2 columns from an indexed table (5 reads)

    SELECT EmployeeID, LoginID

    FROM HumanResources.Employee;

    --===== 2 columns from an identical but non-indexed table (8 reads)

    SELECT EmployeeID, LoginID

    FROM #NoIndexes;

    ---------------------------------------------------------------------------

    --===== All columns from an indexed table (2 reads)

    SELECT * FROM HumanResources.Employee WHERE EmployeeID = 137;

    --===== All columns from an identical but non-indexed table (8 reads)

    SELECT * FROM #NoIndexes WHERE EmployeeID = 137;

    --===== 2 columns from an indexed table (2 reads)

    SELECT EmployeeID, LoginID

    FROM HumanResources.Employee

    WHERE EmployeeID = 137;

    --===== 2 columns from an identical but non-indexed table (8 reads)

    SELECT EmployeeID, LoginID

    FROM #NoIndexes

    WHERE EmployeeID = 137;

    If you compare what looks like a savings in the number of reads (for the single row criteria examples) with the last arrow to the left of the "Actual Execution Plan", as Gail suggested, you will see that even the single row criteria returned less (estimated, in this case) data and less data is better for the "Pipe".

    Unless you're thinking of something else, the code above (along with the "Actual Execution Plan") proves that including only the columns you need is better for the "Pipe" whether you have an index or not and whether you have criteria or not.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply