Index usage and RID lookup

Posted Monday, January 7, 2013 9:02 AM
ben.brugman (SSC Veteran)
GilaMonster (1/7/2013)

All processing is done in memory, regardless of whether it's an index seek, scan, lookup, etc. The query processor has no idea what a file is or how to do disk access.


Something has to do the disk accesses. And because the physical reads on disk are often the determining factor for throughput, this must somehow be included in the optimizer. (I am aware that the optimizer does not know what is present in the cache, but it does base its optimizing on how many pages are needed.)

I am not completely sure what the
query processor
query engine
RDBMS engine
SQL Server engine
each are.

For me, the database is the part that is still there if you pull the plug on the machine.
But I do not really understand the distinction between the processors/engines mentioned above.

For me, the RDBMS engine delivers the results and takes all the necessary actions to get those results (both for select and mutation statements).

The query processor, I assume, processes the query (compilation and execution).

The query engine, perhaps, invokes the query processor and gets the necessary data?

The SQL Server engine is an RDBMS engine.

But for me the borders between the different parts are a bit vague. Sorry.

Ben
Post #1403686
Posted Monday, January 7, 2013 9:10 AM
GilaMonster (SSC-Forever)
ben.brugman (1/7/2013)
Something has to do the disk accesses.


The storage engine

And because the physical reads on disk are often the determining factor for throughput, this must somehow be included in the optimizer.


The optimiser assumes that none of the required data will be in cache.

You have (in general)
The optimiser - responsible for generating the execution plans
The query processor - executes the plans (also called the query execution engine)
The storage engine - responsible for all disk access, for giving rows to the query processor, for handling transactions and a whole bunch more.

The query processor (what actually executes queries) has no clue what a disk or file is. It asks the storage engine for the rows it needs. If the rows are on pages in cache, the storage engine just gives it rows. If they're on pages not in cache, the storage engine fetches them into cache, then reads the rows off and hands them to the query processor.
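A quick way to see that split in practice is STATISTICS IO. The sketch below is purely illustrative (the table and column names are invented): the plan and the query processor's work stay identical between runs; only the storage engine's disk access changes once the pages are cached.

-- Illustration only: dbo.SomeOrders and its columns are hypothetical.
SET STATISTICS IO ON;

SELECT OrderID, OrderDate
FROM dbo.SomeOrders
WHERE OrderID = 42;

-- First run on a cold cache: physical reads > 0 while the storage engine
-- pulls the pages from disk into the buffer pool.
-- Second run: physical reads = 0, logical reads unchanged -- same plan,
-- same query processor work; only the disk access disappeared.
SET STATISTICS IO OFF;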



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1403690
Posted Monday, January 7, 2013 12:50 PM
ScottPletcher (SSCrazy)
With a clustered index on ( A, B, C ) and a

WHERE A = 6 AND C = 7


I would expect SQL to do a clustered index seek.

That is, it will do an indexed search to go straight to the "A = 6" rows and read just those; no other (non-index) rows would have to be read. SQL will search those rows looking for "C = 7".
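As a minimal sketch of that shape (table, column, and index names invented for illustration): the clustered index leads on A, so the seek jumps straight to the "A = 6" range and "C = 7" is checked as a residual predicate within it.

-- Hypothetical table for illustration only.
CREATE TABLE dbo.Demo
(
    A       int NOT NULL,
    B       int NOT NULL,
    C       int NOT NULL,
    Payload varchar(100) NULL
);

CREATE CLUSTERED INDEX CIX_Demo ON dbo.Demo (A, B, C);

SELECT A, B, C, Payload
FROM dbo.Demo
WHERE A = 6 AND C = 7;   -- seek predicate on A, residual predicate on C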


With only nonclustered indexes, the process is more involved.

SQL first has to determine if you have a "covering index": that is, an index with ALL columns used in the query. If so, SQL will use that; if multiple indexes with all columns, SQL uses the one with the shortest row.

If there is no fully covering index, SQL will decide if an index with all the WHERE column(s) is available, and if it is worth using. Since reads using a nonclustered index involve lots of additional I/O -- SQL must use the rows in the nonclus index to go back and do random reads on the parent row in the heap/clus index -- there is a "tipping point" where SQL won't use a nonclustered index at all and will instead just scan the table itself.

There is no specific % of rows where SQL will or won't use a nonclus index. The exact % varies. For example, if only 1% of rows have "A = 6", SQL is almost certain to use the nonclus index. If 90% of the rows have "A = 6", SQL is almost certain to do a table scan. Typically the "tipping point" is somewhere between 10% and 30% of rows, but even that is just a rough guideline and could vary for specific situations.
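Continuing the hypothetical dbo.Demo sketched above (and assuming it is left as a heap, so that lookups are RID lookups), the difference between a covering and a non-covering index looks roughly like this:

-- Assumes the dbo.Demo table from the earlier sketch; drop the clustered
-- index so the table is a heap and any lookups are RID lookups.
DROP INDEX CIX_Demo ON dbo.Demo;

-- Not covering: seek on A, then one RID lookup per qualifying row to fetch C and Payload.
CREATE NONCLUSTERED INDEX IX_Demo_A ON dbo.Demo (A);

-- Covering: every column the query touches is in the index, so no lookups at all.
CREATE NONCLUSTERED INDEX IX_Demo_A_C ON dbo.Demo (A, C) INCLUDE (Payload);

SELECT A, C, Payload
FROM dbo.Demo
WHERE A = 6 AND C = 7;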


SQL DBA,SQL Server MVP('07, '08, '09)

Carl Sagan said: "There is no such thing as a dumb question." Sagan obviously never watched a congressional hearing!
Post #1403827
Posted Monday, January 7, 2013 1:07 PM
GilaMonster (SSC-Forever)
ScottPletcher (1/7/2013)
If so, SQL will use that; if multiple indexes with all columns, SQL uses the one with the shortest row.


The most selective one (the one that, based on the statistics, is going to return the fewest rows after the seek). If multiple indexes are equally selective, then the one with the fewest number of pages.

For example, if only 1% of rows have "A = 6", SQL is almost certain to use the nonclus index.


If 1% of the rows have A=6 AND C=7 (the index has both columns, therefore both conditions limit the number of rows needing a key lookup), SQL is almost certain to do a table scan.

Typically the "tipping point" is somewhere between 10% and 30% of rows, but even that is just a rough guideline and could vary for specific situations.


The tipping point is where the number of rows needing a lookup = 30% of the number of pages in the table. Usually this is somewhere around 0.1% - 0.5% of the table, but depends on the size of the rows.
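As a back-of-the-envelope check of that rule of thumb (a sketch only -- the optimiser's actual costing is more involved, and dbo.Demo is the hypothetical table from the earlier sketch), you can compare a table's data-page count and row count:

-- Rough estimate: tipping point ~= 30% of the table's data pages,
-- expressed as a percentage of its rows.
SELECT  ps.row_count,
        ps.in_row_data_page_count                                           AS data_pages,
        ps.in_row_data_page_count * 0.30                                    AS tipping_point_rows,
        100.0 * ps.in_row_data_page_count * 0.30 / NULLIF(ps.row_count, 0)  AS tipping_point_pct_of_rows
FROM sys.dm_db_partition_stats AS ps
WHERE ps.object_id = OBJECT_ID('dbo.Demo')
  AND ps.index_id IN (0, 1);   -- heap or clustered index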



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1403838
Posted Monday, January 7, 2013 1:22 PM
ScottPletcher (SSCrazy)
GilaMonster (1/7/2013)
ScottPletcher (1/7/2013)
If so, SQL will use that; if multiple indexes with all columns, SQL uses the one with the shortest row.


The most selective one (the one that, based on the statistics, is going to return the fewest rows after the seek). If multiple indexes are equally selective, then the one with the fewest number of pages.

For example, if only 1% of rows have "A = 6", SQL is almost certain to use the nonclus index.


If 1% of the rows have A=6 AND C=7 (the index has both columns, therefore both conditions limit the number of rows needing a key lookup), SQL is almost certain to do a table scan.

Typically the "tipping point" is somewhere between 10% and 30% of rows, but even that is just a rough guideline and could vary for specific situations.


The tipping point is where the number of rows needing a lookup = 30% of the number of pages in the table. Usually this is somewhere around 0.1% - 0.5% of the table, but depends on the size of the rows.



Fewest rows? The number of rows that match all the criteria will be the same regardless of the index used, right?

What is the basis for your claim that if only 1% of rows in the table match, SQL will almost certainly do a table scan? If the table has 1M rows, but only 10K match, you state SQL is almost certain to do a table scan??


Where did you get that 30% is a hard number for a tipping point? How could 0.1% of the rows in a table equal 30% of the table pages??


SQL DBA,SQL Server MVP('07, '08, '09)

Carl Sagan said: "There is no such thing as a dumb question." Sagan obviously never watched a congressional hearing!
Post #1403845
Posted Monday, January 7, 2013 2:54 PM
GilaMonster (SSC-Forever)
ScottPletcher (1/7/2013)
Fewest rows? The number of rows that match all the criteria will be the same regardless of the index used, right?


Match all criteria, yes. What I was thinking about is the case where there are multiple indexes, none of which have all the criteria.

SELECT <columns> FROM SomeTable WHERE Col1 = @A and Col2 = @B

and we have two indexes, one on Col1 and one on Col2. The one that SQL will pick to seek is the one that is estimated to return the fewest rows, so that SQL can do the fewest lookups to fetch the other column(s) for the secondary filter and the column list if necessary.

If all the indexes have the columns for all the criteria, then the index used will be the one with the fewest leaf pages.
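A minimal sketch of that scenario (table, column, and index names invented for illustration): two single-column indexes, neither covering the query on its own, with the optimiser seeking on whichever one its statistics say is more selective and doing lookups for the rest.

-- Hypothetical table; the clustered primary key means the lookups are key lookups.
CREATE TABLE dbo.SomeTable
(
    ID   int IDENTITY PRIMARY KEY,
    Col1 int NOT NULL,
    Col2 int NOT NULL,
    Col3 varchar(50) NULL
);

CREATE NONCLUSTERED INDEX IX_SomeTable_Col1 ON dbo.SomeTable (Col1);
CREATE NONCLUSTERED INDEX IX_SomeTable_Col2 ON dbo.SomeTable (Col2);

DECLARE @A int = 6, @B int = 7;

-- The optimiser seeks on whichever index is estimated to return fewer rows,
-- does key lookups for the remaining columns, and applies the other predicate
-- as a residual filter.
SELECT ID, Col1, Col2, Col3
FROM dbo.SomeTable
WHERE Col1 = @A AND Col2 = @B;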

What is the basis for your claim that if only 1% of rows in the table match, SQL will almost certainly do a table scan? If the table has 1M rows, but only 10K match, you state SQL is almost certain to do a table scan??


Experience and testing. Yes, SQL is almost certain to scan at 1% of the rows in the table if it does not have a covering index to seek on (almost certain; I could probably conjure an example where, at 1% with accurate row estimates, seeks and key lookups are still done). Feel free to test it, or to see the many, many blog posts and articles on this subject.
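A sketch of the sort of test being suggested (all names and data invented; numbers are rough): load about a million ~100-byte rows, index a single column so roughly 1% of rows match each value, then compare the actual plan.

-- ~1,000,000 rows, single-column index, ~1% of rows per value of A.
CREATE TABLE dbo.TipTest
(
    ID     int IDENTITY PRIMARY KEY,
    A      int NOT NULL,
    Filler char(100) NOT NULL DEFAULT ''
);

INSERT INTO dbo.TipTest (A)
SELECT TOP (1000000) ABS(CHECKSUM(NEWID())) % 100      -- values 0-99, ~1% each
FROM sys.all_columns AS c1 CROSS JOIN sys.all_columns AS c2;

CREATE NONCLUSTERED INDEX IX_TipTest_A ON dbo.TipTest (A);

-- ~10,000 matching rows: with no covering index, the plan typically shows a
-- clustered index scan rather than a seek plus ~10,000 key lookups.
SELECT ID, A, Filler
FROM dbo.TipTest
WHERE A = 42;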

Where did you get that 30% is a hard number for a tipping point?


iirc it's in Kalen's book. Not in the mood for hunting for it tonight.

How could 0.1% of the rows in a table equal 30% of the table pages??


Let's say rows of 100 bytes each: that's about 80 per page, so 1,000,000 rows is 12,500 pages. 30% of that is 3,750, and 3,750 rows is 0.375% of the total rows in the table.

In case you misunderstood my comment, tipping point is when the number of rows to be looked up is equal to 30% of the total page count in the table



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1403884
Posted Monday, January 7, 2013 3:45 PM
ScottPletcher (SSCrazy)
GilaMonster (1/7/2013)
What is the basis for your claim that if only 1% of rows in the table match, SQL will almost certainly do a table scan? If the table has 1M rows, but only 10K match, you state SQL is almost certain to do a table scan??


Experience and testing. Yes, SQL is almost certain to scan at 1% of the rows in the table if it does not have a covering index to seek on (Almost certain. I can probably conjure an example where at 1% with accurate row estimations seeks and key lookups are done). Feel free to test it, or to see the many, many blog posts and articles on this subject.

Where did you get that 30% is a hard number for a tipping point?


iirc it's in Kalen's book. Not in the mood for hunting for it tonight.

How could 0.1% of the rows in a table equal 30% of the table pages??


Let's say rows of 100 bytes each: that's about 80 per page, so 1,000,000 rows is 12,500 pages. 30% of that is 3,750, and 3,750 rows is 0.375% of the total rows in the table.

In case you misunderstood my comment, tipping point is when the number of rows to be looked up is equal to 30% of the total page count in the table


I'm pretty sure the ~30% is the default, but based on what I've read, that is not a hard value because other factors affect it, including memory available, table size, row size, I/O affinity and parallelism.

0.375% is almost 4 times 0.1%; yes, the number goes very low, but I don't think it reaches quite down to 0.1%.


SQL DBA,SQL Server MVP('07, '08, '09)

Carl Sagan said: "There is no such thing as a dumb question." Sagan obviously never watched a congressional hearing!
Post #1403897
Posted Monday, January 7, 2013 3:50 PM
GilaMonster (SSC-Forever)
ScottPletcher (1/7/2013)
0.375% is almost 4 times 0.1%; yes, the number goes very low, but I don't think it reaches quite down to 0.1% .


I did say "Usually somewhere around 0.1%-0.5%"

p.s. I have seen the tipping point as low as 0.15%



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1403900
Posted Tuesday, January 8, 2013 3:51 AM
ben.brugman (SSC Veteran)
Thanks to all contributing to this thread, especially Gail and the others whose questions prompted her to provide more information.

This made me wonder about segments: do they come into the picture at all with the aspects discussed? Does the optimizer 'know' about segments?
Or are segments only a way to 'bundle' pages, and therefore a help in limiting fragmentation?

Thanks for all the input,
Ben Brugman
Post #1404111
Posted Tuesday, January 8, 2013 4:07 AM
GilaMonster (SSC-Forever)
ben.brugman (1/8/2013)
This made me wonder about segments: do they come into the picture at all with the aspects discussed? Does the optimizer 'know' about segments?
Or are segments only a way to 'bundle' pages, and therefore a help in limiting fragmentation?


Segments?



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1404120