IO Statistics

  • IO Statistics

  • what would you like to know about IO statistics?

    ----------------------------------------------
    Try to learn something about everything and everything about something. - Thomas Henry Huxley

    :w00t:
    Posting Best Practices[/url]
    Numbers / Tally Tables[/url]

    SQL-4-Life
  • Yes, they are very handy, aren't they?

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • suhasini.m (3/27/2009)


    IO Statistics

    http://technet.microsoft.com/en-gb/library/cc966500.aspx

  • I have 2 queries

    SELECT * FROM Table1

    JOIN Table2

    ON Table1.Column1 = Table2.Column1

    SELECT * FROM Table1

    JOIN Table2

    ON Table1.Column1 = Table2.Column1

    AND Table1.Column2 = XXX

    Both these query returns same number of results say 7000. And Table1 is a partitioned table with Column2 is the partition key. When i compared the execution plan of these 2 queries, the relative cost of these 2 queries is 65:35 which shows 2nd query is around 40% better than 1st query in terms of performance. But my concern is with the IO statistics.

    The IO statistics of these two queries are as follows.

    1st Query

    ---------

    Table1 - Scan Count:0 Logical Reads:100000Physical Reads:200

    2nd Query

    ---------

    Table1 - Scan Count:7000 Logical Reads:80000Physical Reads:100

    There has been a decrease in the no of logical and physical reads. But the scan count has increased to the number of rows returned by the query. I have the following questions on this.

    1. Which is the better query in terms of IO?

    2. What does Scan Count exactly mean. And howz it related to IO.?

    3. As per the Scan count defenition its a table or index scan. If its a table or index scan howz it better in performance. (There are no scans happening on this table in the execution plan)?

    4. Is there a better way to compare the IO of queries?

    5. Is there a way to get the IO in terms of size (like MBs) other than the no of reads?

  • suhasini.m (3/27/2009)


    I have 2 queries

    SELECT * FROM Table1

    JOIN Table2

    ON Table1.Column1 = Table2.Column1

    SELECT * FROM Table1

    JOIN Table2

    ON Table1.Column1 = Table2.Column1

    AND Table1.Column2 = XXX

    Both these query returns same number of results say 7000. And Table1 is a partitioned table with Column2 is the partition key. When i compared the execution plan of these 2 queries, the relative cost of these 2 queries is 65:35 which shows 2nd query is around 40% better than 1st query in terms of performance. But my concern is with the IO statistics.

    The IO statistics of these two queries are as follows.

    1st Query

    ---------

    Table1 - Scan Count:0 Logical Reads:100000Physical Reads:200

    2nd Query

    ---------

    Table1 - Scan Count:7000 Logical Reads:80000Physical Reads:100

    There has been a decrease in the no of logical and physical reads. But the scan count has increased to the number of rows returned by the query. I have the following questions on this.

    1. Which is the better query in terms of IO?

    2. What does Scan Count exactly mean. And howz it related to IO.?

    3. As per the Scan count defenition its a table or index scan. If its a table or index scan howz it better in performance. (There are no scans happening on this table in the execution plan)?

    4. Is there a better way to compare the IO of queries?

    5. Is there a way to get the IO in terms of size (like MBs) other than the no of reads?

    1. The first query, by a long shot. Although it has a higher number of physical reads. But if both these queries are accessing the same data then it was probably in cache when you ran the second query.

    2. Scans are the number of times that the engine had to scan an index or a table. It tells you how the data is being accessed and can help explain performance issues. Scans are not inherently bad, but high numbers of scans should give you pause.

    3. There must be scans going on. Do you have user defined functions in the query? These can mask all kinds of performance problems including scans. Post the execution plans if you can.

    4. Generally, that's the best way.

    5. Not specific to the execution of a particular query, no. You can get these measures through a server-side trace. Otherwise you're looking at performance counters that measure what's happening within the server, probably not helpful for performance tuning a particular query.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks for the detailed reply.

    I am running both the quries at the same time and also doing DBCC DROPCLEANBUFFER and DBCC CLEARPROCCACHE before for both the queries.

    If this is a table or index scan howz the execution cost of 2nd query is less than the first one. Ultimately a table or index scan is costlier than an index seek on this table as it has some millions of rows. I have checked the query plan and there was no index or table scan on this table. I will try and post the plan.

    Is there a way to check the number of bytes of data that is being fetched by a sproc per execution if not a query?

  • suhasini.m (3/27/2009)


    Thanks for the detailed reply.

    I am running both the quries at the same time and also doing DBCC DROPCLEANBUFFER and DBCC CLEARPROCCACHE before for both the queries.

    If this is a table or index scan howz the execution cost of 2nd query is less than the first one. Ultimately a table or index scan is costlier than an index seek on this table as it has some millions of rows. I have checked the query plan and there was no index or table scan on this table. I will try and post the plan.

    You just have to remember that those are estimated costs, not actual costs. They're useful as a general measure, but not as a real determining factor. Check out the situation in this blog post[/url] to see why.

    Is there a way to check the number of bytes of data that is being fetched by a sproc per execution if not a query?

    Try checking out sys.dm_exec_query_stats. It's an aggregate, but it'll tell you what you want to know.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Hi,

    In order to decide which is the 'better' query, we would need to know what defines 'best' for you:

    1. Logical IO

    2. CPU

    3. Concurrency

    4. Response time

    5. Overall server throughput

    4. Query plan coolness

    5. ...and so on

    I would imagine that the second query is prettier as an execution plan. The equality predicate on the partitioning column should mean that the query optimizer (QO) just searches the one partition instead of the whole table.

    It may then need to scan (either a full partition scan, or a range scan) to satisfy the join condition. It sounds (from the scan count) as if a loop join is being used.

    One would think that the QO might prefer a HASH or MERGE join here - try forcing it with a hint and check out the effects.

    The first query does 20%-odd more logical IO, so in that narrow sense it is less efficient.

    It's difficult to go into heaps of detail without seeing the actual execution plan - so if you can attach those that'd be great.

    BTW a range scan is a very efficient way of grabbing a load of rows all at once (the scan range can be small or large) and will usually outperform the equivalent number of key/rid lookups.

    Cheers,

  • suhasini.m (3/27/2009)


    2. What does Scan Count exactly mean. And howz it related to IO.?

    The scan count is actually not that useful. It should indicate how many times an object is read, however it doesn't do that consistently. Best bet is to completely ignore that and look at the logical IOs

    see - https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=387322

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Grant Fritchey (3/27/2009)


    1. The first query, by a long shot. Although it has a higher number of physical reads. But if both these queries are accessing the same data then it was probably in cache when you ran the second query.

    Why do you say that? It has higher logical IOs (100 000 vs 80 000) as well as the higher physical

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi!

    I've just looked at the connect article and MS seem to have closed it as 'by design'.

    My understanding is that a 'seek' always selects one row (either by key or rid).

    A scan may be a scan of any part of an index, which involves more than a 1-row seek.

    In the example given, a composite primary key on two columns is searched using a value for only the first column.

    There is nothing to say that this will always select one row (unless there is another unique constraint or index on the first column alone, I guess). Since the engine is searching a range in the index (all keys with the value given in the first position) it must be reported as a range scan rather than as a seek, even if it only returns on row.

    This makes a kind of weird M$-sense to me.

    Unless I am missing something there?

    Thanks,

    Paul

    GilaMonster (4/2/2009)


    suhasini.m (3/27/2009)


    2. What does Scan Count exactly mean. And howz it related to IO.?

    The scan count is actually not that useful. It should indicate how many times an object is read, however it doesn't do that consistently. Best bet is to completely ignore that and look at the logical IOs

    see - https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=387322

  • Paul White (4/2/2009)


    My understanding is that a 'seek' always selects one row (either by key or rid).

    A scan may be a scan of any part of an index, which involves more than a 1-row seek.

    Not at all. Seeks can return any number of rows, up to the total number of rows in the table.

    It's a seek if SQL used the b-tree to find the row or the start or end of the range. It's a scan if the entire table/index is read with no search.

    See - http://sqlinthewild.co.za/index.php/2009/03/05/when-is-a-seek-actually-a-scan/

    Unless I am missing something there?

    Basically what it means is that the scan count can't be trusted as not all operators set it in the same way.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Unless I am missing something there?

    Basically what it means is that the scan count can't be trusted as not all operators set it in the same way.[/quote]

    I'm inclined to agree - though I'd love to try a repro script for it!

    GilaMonster (4/2/2009)


    Not at all. Seeks can return any number of rows, up to the total number of rows in the table.

    It's a seek if SQL used the b-tree to find the row or the start or end of the range. It's a scan if the entire table/index is read with no search.

    See - http://sqlinthewild.co.za/index.php/2009/03/05/when-is-a-seek-actually-a-scan/

    Not so sure. The article you wrote simply shows that a range seek/scan can cover all the rows in a table. Well, yeah 😉

    The 'Index Seek' operator in that example has a tool-tip description of "*scan* a particular range of rows from a nonclustered index". So it's a seek logical operator doing a physical scan of a portion of the index? Even a full scan of an index is a seek operation (ordered or unordered) down the b-tree. To complicate things further, a single-row seek done as part of a key lookup operation (if the QO estimates that it will access many rows eventually) may implement ordered or unordered prefetch, which is a scan, isn't it?

    I guess my point is that it is a complex area, and probably just a question of semantics - are we talking logical or physical operations, for example? What is the real different between a load of consecutive single-row seeks and an ordered scan?

    I love SQL Server 😀

    -- edited cos I can't quote properly...

  • Paul White (4/2/2009)


    Not so sure. The article you wrote simply shows that a range seek/scan can cover all the rows in a table. Well, yeah 😉

    I was challenging your statement that a seek returns 1 row. It does not.

    In the terms used by the query optimiser and query processor, a seek returns 0 or more rows and does a search down the b-tree to find the start or end of the range. A scan reads all the rows in the table and may or may not traverse the b-tree. Even if it does, it doesn't search, but just uses the b-tree to find the leaf pages

    The 'Index Seek' operator in that example has a tool-tip description of "*scan* a particular range of rows from a nonclustered index". So it's a seek logical operator doing a physical scan of a portion of the index?

    English language semantics here. Perhaps 'Reads a particular range of rows' would be more correct.

    Even a full scan of an index is a seek operation (ordered or unordered) down the b-tree.

    Not necessarily. There are other ways to get at the leaf pages than via the b-tree. Especially in an unordered scan of a cluster.

    To complicate things further, a single-row seek done as part of a key lookup operation (if the QO estimates that it will access many rows eventually) may implement ordered or unordered prefetch, which is a scan, isn't it?

    We're crossing portions of the engine now.

    Seek/scan is a QO/QP operation. Prefetch is done and controlled by the storage engine. Prefetch is what affects the readahead reads in the IO stats

    What is the real different between a load of consecutive single-row seeks and an ordered scan?

    A lot. Each of the single-row seeks would traverse the b-tree. The scan would do it at most once, and may not need to access the b-tree at all if it's doing an unordered scan, as that just uses the IAM pages.

    So, if you were to compare 500 single row seeks with a scan that reads 500 pages, the number of IOs would be dramatically different. If we say those 500 rows are on 100 data pages (at 5 rows per page) and the b-tree is 2 levels deep (just the root and the leaf) then 500 single row seeks would read a total of 1000 pages. The scan would read 100 or 101 pages.

    That, by the way, is why bookmark/key lookups are so expensive. They are single row seeks.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply