Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12

IO Statistics Expand / Collapse
Author
Message
Posted Thursday, April 2, 2009 2:26 AM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 3:52 PM
Points: 42,849, Visits: 35,978
Grant Fritchey (3/27/2009)
1. The first query, by a long shot. Although it has a higher number of physical reads. But if both these queries are accessing the same data then it was probably in cache when you ran the second query.


Why do you say that? It has higher logical IOs (100 000 vs 80 000) as well as the higher physical



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #688609
Posted Thursday, April 2, 2009 2:27 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Saturday, August 30, 2014 4:20 PM
Points: 11,194, Visits: 11,142
Hi!

I've just looked at the connect article and MS seem to have closed it as 'by design'.
My understanding is that a 'seek' always selects one row (either by key or rid).
A scan may be a scan of any part of an index, which involves more than a 1-row seek.
In the example given, a composite primary key on two columns is searched using a value for only the first column.
There is nothing to say that this will always select one row (unless there is another unique constraint or index on the first column alone, I guess). Since the engine is searching a range in the index (all keys with the value given in the first position) it must be reported as a range scan rather than as a seek, even if it only returns on row.
This makes a kind of weird M$-sense to me.
Unless I am missing something there?

Thanks,
Paul
GilaMonster (4/2/2009)
suhasini.m (3/27/2009)

2. What does Scan Count exactly mean. And howz it related to IO.?


The scan count is actually not that useful. It should indicate how many times an object is read, however it doesn't do that consistently. Best bet is to completely ignore that and look at the logical IOs

see - https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=387322




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #688610
Posted Thursday, April 2, 2009 2:39 AM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 3:52 PM
Points: 42,849, Visits: 35,978
Paul White (4/2/2009)

My understanding is that a 'seek' always selects one row (either by key or rid).
A scan may be a scan of any part of an index, which involves more than a 1-row seek.


Not at all. Seeks can return any number of rows, up to the total number of rows in the table.
It's a seek if SQL used the b-tree to find the row or the start or end of the range. It's a scan if the entire table/index is read with no search.
See - http://sqlinthewild.co.za/index.php/2009/03/05/when-is-a-seek-actually-a-scan/

Unless I am missing something there?

Basically what it means is that the scan count can't be trusted as not all operators set it in the same way.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #688622
Posted Thursday, April 2, 2009 2:59 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Saturday, August 30, 2014 4:20 PM
Points: 11,194, Visits: 11,142
Unless I am missing something there?

Basically what it means is that the scan count can't be trusted as not all operators set it in the same way.[/quote]

I'm inclined to agree - though I'd love to try a repro script for it!


GilaMonster (4/2/2009)
Not at all. Seeks can return any number of rows, up to the total number of rows in the table.
It's a seek if SQL used the b-tree to find the row or the start or end of the range. It's a scan if the entire table/index is read with no search.
See - http://sqlinthewild.co.za/index.php/2009/03/05/when-is-a-seek-actually-a-scan/


Not so sure. The article you wrote simply shows that a range seek/scan can cover all the rows in a table. Well, yeah ;)

The 'Index Seek' operator in that example has a tool-tip description of "*scan* a particular range of rows from a nonclustered index". So it's a seek logical operator doing a physical scan of a portion of the index? Even a full scan of an index is a seek operation (ordered or unordered) down the b-tree. To complicate things further, a single-row seek done as part of a key lookup operation (if the QO estimates that it will access many rows eventually) may implement ordered or unordered prefetch, which is a scan, isn't it?

I guess my point is that it is a complex area, and probably just a question of semantics - are we talking logical or physical operations, for example? What is the real different between a load of consecutive single-row seeks and an ordered scan?

I love SQL Server

-- edited cos I can't quote properly...




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #688634
Posted Thursday, April 2, 2009 3:49 AM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 3:52 PM
Points: 42,849, Visits: 35,978
Paul White (4/2/2009)
[quote]Not so sure. The article you wrote simply shows that a range seek/scan can cover all the rows in a table. Well, yeah ;)

I was challenging your statement that a seek returns 1 row. It does not.

In the terms used by the query optimiser and query processor, a seek returns 0 or more rows and does a search down the b-tree to find the start or end of the range. A scan reads all the rows in the table and may or may not traverse the b-tree. Even if it does, it doesn't search, but just uses the b-tree to find the leaf pages

The 'Index Seek' operator in that example has a tool-tip description of "*scan* a particular range of rows from a nonclustered index". So it's a seek logical operator doing a physical scan of a portion of the index?

English language semantics here. Perhaps 'Reads a particular range of rows' would be more correct.

Even a full scan of an index is a seek operation (ordered or unordered) down the b-tree.

Not necessarily. There are other ways to get at the leaf pages than via the b-tree. Especially in an unordered scan of a cluster.

To complicate things further, a single-row seek done as part of a key lookup operation (if the QO estimates that it will access many rows eventually) may implement ordered or unordered prefetch, which is a scan, isn't it?

We're crossing portions of the engine now.
Seek/scan is a QO/QP operation. Prefetch is done and controlled by the storage engine. Prefetch is what affects the readahead reads in the IO stats

What is the real different between a load of consecutive single-row seeks and an ordered scan?

A lot. Each of the single-row seeks would traverse the b-tree. The scan would do it at most once, and may not need to access the b-tree at all if it's doing an unordered scan, as that just uses the IAM pages.
So, if you were to compare 500 single row seeks with a scan that reads 500 pages, the number of IOs would be dramatically different. If we say those 500 rows are on 100 data pages (at 5 rows per page) and the b-tree is 2 levels deep (just the root and the leaf) then 500 single row seeks would read a total of 1000 pages. The scan would read 100 or 101 pages.
That, by the way, is why bookmark/key lookups are so expensive. They are single row seeks.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #688670
Posted Thursday, April 2, 2009 5:31 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Saturday, August 30, 2014 4:20 PM
Points: 11,194, Visits: 11,142
Hi again Gail,

Sure. Everything you say is true. The only thing I would say (in a small voice) is that my previous comments were 'off the top of my head' - you clearly spent more time on that last one! I do know really that a seek can return 0 or many rows - I was thinking of a key (+ unique-ifier) or RID lookup when I wrote that. The brain was in a different place from the fingers

The points I was failing to make clearly are:

1. What we understand and see in plans as scans and seeks won't necessarily tally exactly with stats IO ideas of scans and seeks.
2. The range seek in your article is not all that different from a full scan really.

Point 1

If you look at the XML show plan for example, you'll see all sorts of index scan, seek, and range references, nested within the same operation. Some 'seeks' are EQ operations, others are GE and LE ranges, and so on.

Just because that is the logical plan we can see, doesn't mean that the deep dark internals of the engine aren't free to do it slightly differently in practice, or to record different things as 'seeks' or 'scans' if they choose to.

You mentioned semantics in your last post - I think there is a lot to that - and it is also true that 'scan' in particular seems to have different meanings in different places (e.g the tooltip).

I also think that the terms in question might mean different things in different places. I'm happier with that answer than thinking that the reported scan count is somehow 'random' or just 'wrong'.

Point 2

So a full scan runs through the whole thing, and a range scan seeks to find a start and an end. The middle bit is quite similar though, eh? And sure, sometimes we'll get the thing traversing a linked list, and sometimes it'll scan from the IAM. That complexity doesn't change the fact that the operations are extremely similar aside from the initial seek: It's not as if a range seek starts from the very top of index and seeks down the b-tree to the leaf for every row. For a range-start, range-end operation I mean. Multiple prefix EQ seeks (e.g IN query), or key/RID lookups from a loop join, sure they do traverse the structure that way.

Finally, yes prefetch is read-ahead and in a separate area. But in principle, those actions are physically a scan, even if the pages fetched that way are then seeked on a row basis, or range-scanned - it doesn't matter that it's in a separate area, pages were scanned to satisfy the query. I'm just sayin'.

I'm not trying to be difficult at all, just explaining that I'm talking in a looser, wider sense. Hey I hope that makes some sort of sense to you.

Thanks though for the articulate and accurate post, appreciated :)




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #688733
Posted Thursday, April 2, 2009 5:38 AM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: Today @ 3:23 PM
Points: 15,664, Visits: 28,062
GilaMonster (4/2/2009)
Grant Fritchey (3/27/2009)
1. The first query, by a long shot. Although it has a higher number of physical reads. But if both these queries are accessing the same data then it was probably in cache when you ran the second query.


Why do you say that? It has higher logical IOs (100 000 vs 80 000) as well as the higher physical


Mainly based on the scan count. I'm still assuming, possibly incorrectly, that the second query was able to read from cache where as the first had to go to disk. I think if the second query had to go to disk as well, it'd be quite a bit longer.

However, I wasn't aware that the scans were actually a little less of a decent measure than I had originally believed (by the way, do you just memorize Connect bugs or something) so maybe my original estimate is off.

Generally, I don't count on the logical IOs as a single measure. I combine them with what I see in the execution plan and, possibly incorrectly, the scans, along with the actual execution times. What's not known here is how the queries were tested. Did the OP clear the cache and the buffers prior to each execution? If not, you're not getting a thorough test. At the very least we'd need to run the queries in reverse order and see how that affects the results.


----------------------------------------------------
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." Theodore Roosevelt
The Scary DBA
Author of: SQL Server 2012 Query Performance Tuning
SQL Server 2008 Query Performance Tuning Distilled
and
SQL Server Execution Plans

Product Evangelist for Red Gate Software
Post #688736
Posted Thursday, April 2, 2009 5:53 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Saturday, August 30, 2014 4:20 PM
Points: 11,194, Visits: 11,142
Hey Grant,

You might well be right - the first query does do more logical IO, but that's not the whole story. If the whole table is scanned in allocation order it might be more efficient in practice - might. The other thing is CPU. 2005 often made a bit of a mess of queries against a single partition - especially if the tables were large enough to warrant an parallel plan (one thread per partition), or if a constant scan with a loop join to a lookup or whatever was executed many times. It's impossible to know for sure which is the 'better query' just from looking at the SQL.

As an aside, the addiction to measuring logical IO alone that some have is a bit one-dimensional. See Joe Chang's excellent blogs on CPU usage and table scanning at [url=http://sqlblog.com/blogs/joe_chang/default.aspx][/url]. Makes you think!

Cheers,

/Paul




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #688744
Posted Thursday, April 2, 2009 6:09 AM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: Today @ 3:23 PM
Points: 15,664, Visits: 28,062
Yeah, I've read those. I was never completely dependent on IO, but it sure made me second guess even how much I used it.

But Gail was write to call me on such an unequivocal statement based on inadequate information. I made some undefined assumptions. I should have asked the OP a couple of more questions first.


----------------------------------------------------
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." Theodore Roosevelt
The Scary DBA
Author of: SQL Server 2012 Query Performance Tuning
SQL Server 2008 Query Performance Tuning Distilled
and
SQL Server Execution Plans

Product Evangelist for Red Gate Software
Post #688755
Posted Thursday, April 2, 2009 7:22 AM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 3:52 PM
Points: 42,849, Visits: 35,978
Grant Fritchey (4/2/2009)

Mainly based on the scan count. I'm still assuming, possibly incorrectly, that the second query was able to read from cache where as the first had to go to disk. I think if the second query had to go to disk as well, it'd be quite a bit longer.


Well they both had to read partially from disk, both have physical IOs.
Thing with the logical IOs is (if I'm remembering correctly), they're cumulative over all 'scans'. So a query that does 500 reads in one scan is doing the same number of IOs as a query that does 500 in 100 scans

(by the way, do you just memorize Connect bugs or something)

No. This came up in the private newsgroups a couple weeks back and the connect was referenced there.

Generally, I don't count on the logical IOs as a single measure.

Likewise, but in this case it's all we have so far.
Of course, in this case it's a moot point since the two queries that we're comparing aren't equivalent. One just has a join, the other a join and a filter.
Apples and pears.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #688854
« Prev Topic | Next Topic »

Add to briefcase ««12

Permissions Expand / Collapse