|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: 2 days ago @ 5:24 AM
Points: 177,
Visits: 821
|
|
[b]GilaMonster (1/7/2013)
All processing is done in memory, regardless of whether it's an index seek, scan, lookup, etc. The query processor has no idea what a file is or how to do disk access.
Something has to do the disk accesses. And because the physical reads on disk is often the determining factor for throughput, this must be somehow included in the optimizer. (I am aware that the optimizer does not know what is present in the cache, but it does base it's optimizing on how many pages are needed).
I am not completely aware what the query processor query engine RDBMS engine SQL-server engine Is.
For me the database is the part which is still there if you pull the plug out of the machine. But do not realy understand the distinction between the above mentioned processors/engines.
For me the RDBMS engine does deliver the results and does take all the neccesary actions to get those results. (Both for select and mutation statements).
The query processor ??? processes the query (compilation and execution) I assume.
The query engine ??? invokes the query processor and gets the neccesary data ???
SQL-server engine is a RDBMS engine.
But for me the borders between the different parts are a bit vague. Sorry.
Ben
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 3:04 AM
Points: 38,109,
Visits: 30,398
|
|
ben.brugman (1/7/2013) Something has to do the disk accesses.
The storage engine
And because the physical reads on disk is often the determining factor for throughput, this must be somehow included in the optimizer.
The optimiser assumes that none of the required data will be in cache.
You have (in general) The optimiser - responsible for generating the execution plans The query processor - executes the plans (also called the query execution engine) The storage engine - responsible for all disk access, for giving rows to the query processor, for handling transactions and a whole bunch more.
The query processor (what actually executes queries) has no clue what a disk or file is. It asks the storage engine for the rows it needs. If the rows are on pages in cache, the storage engine just gives it rows. If they're on pages not in cache, the storage engine fetches them into cache, then reads the rows off and hands them to the query processor
Gail Shaw Microsoft Certified Master: SQL Server 2008, MVP SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
We walk in the dark places no others will enter We stand on the bridge and no one may pass
|
|
|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Monday, June 17, 2013 10:11 AM
Points: 1,333,
Visits: 1,803
|
|
With a clustered index on ( A, B, C ) and a
WHERE A = 6 AND C = 7
I would expect SQL to a clustered index seek.
That is, it will do an indexed search to go straight to the "A = 6" rows and read just those; no other (non-index) rows would have to be read. SQL will search those rows looking for "C = 7".
With only nonclustered indexes, the process is more involved.
SQL first has to determine if you have a "covering index": that is, an index with ALL columns used in the query. If so, SQL will use that; if multiple indexes with all columns, SQL uses the one with the shortest row.
If there is no fully covering index, SQL will decide if an index with all the WHERE column(s) is available, and if it is worth using. Since reads using a nonclustered index involve lots of additional I/O -- SQL must use the rows in the nonclus index to go back and do random reads on the parent row in the heap/clus index -- there is a "tipping point" where SQL won't use a nonclustered index at all and instead just scan the table itself.
There is no specific % of rows where SQL will or won't use a nonclus index. The exact % varies. For example, if only 1% of rows have "A = 6", SQL is almost certain to use the nonclus index. If 90% of the rows have "A = 6", SQL is almost certain to do a table scan. Typically the "tipping point" is somewhere between 10% and 30% of rows, but even that is just a rough guideline and could vary for specific situations.
SQL DBA,SQL Server MVP('07, '08, '09) One man with courage makes a majority. Andrew Jackson
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 3:04 AM
Points: 38,109,
Visits: 30,398
|
|
ScottPletcher (1/7/2013) If so, SQL will use that; if multiple indexes with all columns, SQL uses the one with the shortest row.
The most selective one (the one that, based on the statistics, is going to return the fewest rows after the seek). If multiple indexes are equally selective, then the one with the fewest number of pages.
For example, if only 1% of rows have "A = 6", SQL is almost certain to use the nonclus index.
If 1% of the rows have A=6 AND C=7 (the index has both columns, therefore both conditions limit the number of rows needing a key lookup), SQL is almost certain to do a table scan.
Typically the "tipping point" is somewhere between 10% and 30% of rows, but even that is just a rough guideline and could vary for specific situations.
The tipping point is where the number of rows needing a lookup = 30% of the number of pages in the table. Usually this is somewhere around 0.1% - 0.5% of the table, but depends on the size of the rows.
Gail Shaw Microsoft Certified Master: SQL Server 2008, MVP SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
We walk in the dark places no others will enter We stand on the bridge and no one may pass
|
|
|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Monday, June 17, 2013 10:11 AM
Points: 1,333,
Visits: 1,803
|
|
GilaMonster (1/7/2013)
ScottPletcher (1/7/2013) If so, SQL will use that; if multiple indexes with all columns, SQL uses the one with the shortest row.The most selective one (the one that, based on the statistics, is going to return the fewest rows after the seek). If multiple indexes are equally selective, then the one with the fewest number of pages. For example, if only 1% of rows have "A = 6", SQL is almost certain to use the nonclus index. If 1% of the rows have A=6 AND C=7 (the index has both columns, therefore both conditions limit the number of rows needing a key lookup), SQL is almost certain to do a table scan. Typically the "tipping point" is somewhere between 10% and 30% of rows, but even that is just a rough guideline and could vary for specific situations. The tipping point is where the number of rows needing a lookup = 30% of the number of pages in the table. Usually this is somewhere around 0.1% - 0.5% of the table, but depends on the size of the rows.
Fewest rows? The number of rows that match all the criteria will be the same regardless of the index used, right?
What is the basis for you claim that if only 1% of rows in the table match that SQL will almost certainly do a table scan? If the table has 1M rows, but only 10K match, you state SQL is almost certain to do a table scan??
Where did you get that 30% is a hard number for a tipping point? How could 0.1% of the rows in a table equal 30% of the table pages??
SQL DBA,SQL Server MVP('07, '08, '09) One man with courage makes a majority. Andrew Jackson
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 3:04 AM
Points: 38,109,
Visits: 30,398
|
|
ScottPletcher (1/7/2013) Fewest rows? The number of rows that match all the criteria will be the same regardless of the index used, right?
Match all criteria, yes. What I was thinking about is the case where there are multiple indexes, none of which have all the criteria.
SELECT <columns> FROM SomeTable WHERE Col1 = @A and Col2 = @B
and we have a two indexes, one on Col1 and one on Col2. The one that SQL will pick to seek is the one that is estimated to return the fewest rows so that SQL can do the fewest lookups in order to fetch the other column(s), for the secondary filter and the column list if necessary
If all the indexes have the columns for all the criteria, then the index used will be the one with the fewest leaf pages.
What is the basis for you claim that if only 1% of rows in the table match that SQL will almost certainly do a table scan? If the table has 1M rows, but only 10K match, you state SQL is almost certain to do a table scan??
Experience and testing. Yes, SQL is almost certain to scan at 1% of the rows in the table if it does not have a covering index to seek on (Almost certain. I can probably conjure an example where at 1% with accurate row estimations seeks and key lookups are done). Feel free to test it, or to see the many, many blog posts and articles on this subject.
Where did you get that 30% is a hard number for a tipping point?
iirc it's in Kalen's book. Not in the mood for hunting for it tonight.
How could 0.1% of the rows in a table equal 30% of the table pages??
Let's say rows of 100 bytes each. That's 80 per page. 1 000 000 rows is 12 500 pages. 30% of that is 3750 pages. 3750 rows would be 0.375% of the total rows in the table.
In case you misunderstood my comment, tipping point is when the number of rows to be looked up is equal to 30% of the total page count in the table
Gail Shaw Microsoft Certified Master: SQL Server 2008, MVP SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
We walk in the dark places no others will enter We stand on the bridge and no one may pass
|
|
|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Monday, June 17, 2013 10:11 AM
Points: 1,333,
Visits: 1,803
|
|
 GilaMonster (1/7/2013)What is the basis for you claim that if only 1% of rows in the table match that SQL will almost certainly do a table scan? If the table has 1M rows, but only 10K match, you state SQL is almost certain to do a table scan?? Experience and testing. Yes, SQL is almost certain to scan at 1% of the rows in the table if it does not have a covering index to seek on (Almost certain. I can probably conjure an example where at 1% with accurate row estimations seeks and key lookups are done). Feel free to test it, or to see the many, many blog posts and articles on this subject. Where did you get that 30% is a hard number for a tipping point? iirc it's in Kalen's book. Not in the mood for hunting for it tonight. How could 0.1% of the rows in a table equal 30% of the table pages?? Let's say rows of 100 bytes each. That's 80 per page. 1 000 000 rows is 12 500 pages. 30% of that is 3750 pages. 3750 rows would be 0.375% of the total rows in the table. In case you misunderstood my comment, tipping point is when the number of rows to be looked up is equal to 30% of the total page count in the table
I'm pretty sure the ~30% is the default, but based on what I've read, that is not a hard value because other factors affect it, including memory available, table size, row size, I/O affinity and parallelism.
0.375% is almost 4 times 0.1%; yes, the number goes very low, but I don't think it reaches quite down to 0.1% .
SQL DBA,SQL Server MVP('07, '08, '09) One man with courage makes a majority. Andrew Jackson
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 3:04 AM
Points: 38,109,
Visits: 30,398
|
|
ScottPletcher (1/7/2013) 0.375% is almost 4 times 0.1%; yes, the number goes very low, but I don't think it reaches quite down to 0.1% .
I did say "Usually somewhere around 0.1%-0.5%"
p.s. I have seen the tipping point as low as 0.15%
Gail Shaw Microsoft Certified Master: SQL Server 2008, MVP SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
We walk in the dark places no others will enter We stand on the bridge and no one may pass
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: 2 days ago @ 5:24 AM
Points: 177,
Visits: 821
|
|
Thanks to all contributing to this thread, especially Gail and the others who triggered her in providing more information.
This made me wonder about Segments, do they come into the 'picture' at all with the discussed aspects. Does the optimizer 'know' about segments? Or are segment only a way to 'bundle' pages and therefore help to limit the fragmentation?
Thanks for all the input, Ben Brugman
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 3:04 AM
Points: 38,109,
Visits: 30,398
|
|
ben.brugman (1/8/2013) This made me wonder about Segments, do they come into the 'picture' at all with the discussed aspects. Does the optimizer 'know' about segments? Or are segment only a way to 'bundle' pages and therefore help to limit the fragmentation?
Segments?
Gail Shaw Microsoft Certified Master: SQL Server 2008, MVP SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
We walk in the dark places no others will enter We stand on the bridge and no one may pass
|
|
|
|