SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Table Scan vs. Clustered Index Scan


Table Scan vs. Clustered Index Scan

Author
Message
Levente Kupás
Levente Kupás
Grasshopper
Grasshopper (18 reputation)Grasshopper (18 reputation)Grasshopper (18 reputation)Grasshopper (18 reputation)Grasshopper (18 reputation)Grasshopper (18 reputation)Grasshopper (18 reputation)Grasshopper (18 reputation)

Group: General Forum Members
Points: 18 Visits: 134
Everyone agrees with presence of an index- or table-scan it means missing an index or the shaping of predicates is not really clear.
So far, anyone asked he replyed to me that nothing is worst like table scans. I think we must review this predicate.

Theoretically reading pages from the heap could be faster than reading cluster-index pages.

Background:

- Table scans or serial reads of a heap can be performed by scanning the IAM pages to find the extents that are holding pages for the heap. Because the IAM represents extents in the same order that they exist in the data files, this means that serial heap scans progress sequentially through each file.

- For a clustered index, the root_page column in sys.system_internals_allocation_units points to the top of the clustered index for a specific partition. SQL Server moves down the index to find the row corresponding to a clustered index key. To find a range of keys, SQL Server moves through the index to find the starting key value in the range and then scans through the data pages using the previous or next pointers. To find the first page in the chain of data pages, SQL Server follows the leftmost pointers from the root node of the index.
The storage engine scans the intermediate index page and builds a list of the leaf pages that must be read. The storage engine tschedules all the reads in key order.

- For sequential I/O performance, it is important to distinguish between allocation ordered and index ordered scans. An allocation ordered scan tries to read pages in the order in which they are physically stored on disk while an index ordered scan reads pages according to the order in which the data on those index pages is sorted. (Note that in many cases there are multiple levels of indirection such as RAID devices or SANS between the logical volumes that SQL Server sees and the physical disks. Thus, even an allocation ordered scan may in fact not be truly optimally ordered.) Although SQL Server tries to sort and read pages in allocation order even for an index ordered scan, an allocation ordered scan is generally going to be faster since pages are read in the order that they are written on disk with the minimal number of seeks. Heaps have no inherent order and, thus, are always scanned in allocation order. Indexes are scanned in allocation order only if the isolation level is read uncommitted (or the NOLOCK hint is used) and only if the query process does not request an ordered scan. Defragmenting indexes can help to ensure that index ordered scans perform on par with allocation ordered scans.


Measurement:

The test server is 2008 R2 developer edition, and not any load on it.
I chosed to test a table with 22278012 record, with following statistics of the Heap and Clustered index structure:


Index Type Rows Pages Size (MB) Frag. % Index Level
----------------------------------------------------------------------------------------
CLUSTERED 22278012 397090 3102 2,06 3
CLUSTERED 22278012 2382 18 6,85 2
CLUSTERED 22278012 23 1 95,65 1
CLUSTERED 22278012 1 0 0 0
HEAP 22278012 387701 3028 0 0


The result-set of the query are written out into the tempdb which are placed in different physical disk.
First the buffer was cleared out with dbcc dropcleanbuffers command, to prevent reading pages from cache.


select * into #heap from [dbo].[heap_Table]
select * into #heap from [dbo].[Cl_Table]



Writing out the STATISTICS IO/TIME I got the below results:


Scan Type Logical r. Physical r. Read-ahead r. CPU time ms. Elapsed time ms.
----------------------------------------------------------------------------------------
Clustered 399475 878 399557 35084 57013
Clustered (nolock) 399496 59 399552 36286 50016
Heap 387701 28 387708 35319 49016
Heap (nolock) 387701 37 387708 35880 49141



Second time I let the pages to be kept in the buffer:


Scan Type cached_pages_count cached_pages_in_MB
----------------------------------------------------------------------------------------
Clustered 252842 1929
Clustered (nolock) 260936 1990
Heap 387709 2957
Heap (nolock) 387709 2957



Scan Type Logical r. Physical r. Read-ahead r. CPU time ms. Elapsed time ms.
----------------------------------------------------------------------------------------
Clustered 399475 54 277101 38641 46511
Clustered (nolock) 399496 188 397757 36863 46564
Heap 387701 0 17389 38376 44554
Heap (nolock) 387701 24 31 37643 43225




1. Clustered Index Scan - comitted reads
2. Clustered Index Scan - uncomitted reads (nolock)



Third I used the count methode to eliminate any writes.


select COUNT(ID) from [dbo].[heap_Table]
select COUNT(ID) from [dbo].[Cl_Table] with (index = IX_ID)




Scan Type Logical r. Physical r. Read-ahead r. CPU time ms. Elapsed time ms.
----------------------------------------------------------------------------------------
Clustered 399475 0 0 5445 5438
Clustered (nolock) 399496 0 0 5444 5464
Heap 387701 0 0 5366 5386
Heap (nolock) 387701 0 0 5413 5415



Empirical conclusion:

- In case of Clustered Index Scan we may read up the root + some pages from intermediate levels + leaf level
- The Clustered Index Scan with (nolock) hint we must read up all pages (root + all pages/all intermediate levels + leaf level)
- Scaning the Heap we have to read up the leaf level

From the first measurement table we can observe easily that the Table Scan is faster (with near 15%) than the Clustered Index Scan, however the CPU usage is higher with a bit but not significant (0,5%).
Reading up the top 1 million of rows I found that the Table San is still 11% faster than the Clustered Index Scan, moreover the Clustered Index Scan with (nolock) hint was 20% slower than Table Scan and the CPU usage increased with 12% too.

Very exciting the disk usage performance counter statistics in second methode.
In case of Heap all the pages are kept in the buffer and not any phisycal read observed in both of comitted or uncomitted read.
When we scan the Clustered Index we have lot of phisycal reads even if near all the pages are kept in memory.
In case of Clustered Index Scan in read uncomitted mode we read up again near all the pages from disk vainly we have the pages already in cache.
In read comitted mode is better the situation, because we read some from buffer, but still a lot from disk.

In third methode no significant difference between scan types (2%) which is expected since all the pages are scanned from buffer.

For me is very interesting why so different the operation principle of the scan methode depend we writing out the result or not.

http://kladna.blog.com/
Attachments
Index_scan.gif (697 views, 18.00 KB)
Index_scan_wn.gif (700 views, 18.00 KB)
Evil Kraig F
Evil Kraig F
SSC-Insane
SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)

Group: General Forum Members
Points: 21237 Visits: 7660
kladna (3/2/2011)

From the above table we can observe easily that the Table Scan is faster (with near 15%) than the Clustered Index Scan, however the CPU usage is higher with a bit but not significant (0,5%).
Reading up the top 1 million of rows I found that the Table San is still 11% faster than the Clustered Index Scan, moreover the Clustered Index Scan with (nolock) hint was 20% slower than Table Scan and the CPU usage increased with 12% too.


An interesting analysis, but you've taken the table as an independent entity and your results are assumed on always wanting the entire table loaded, neither of which is the common desire, especially amongst 22 million row tables.

Have you further pursued your testing to multiple tables being used (joins) and where clause mechanics? Even a HEAP with non-clustered indexing will still scan the leaf level in allocation method, so there are a number of options here.


- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
Levente Kupás
Levente Kupás
Grasshopper
Grasshopper (18 reputation)Grasshopper (18 reputation)Grasshopper (18 reputation)Grasshopper (18 reputation)Grasshopper (18 reputation)Grasshopper (18 reputation)Grasshopper (18 reputation)Grasshopper (18 reputation)

Group: General Forum Members
Points: 18 Visits: 134
As you expected my article focusing strictly to this two entity, showing the background of the scaning method in different cases. I am surprized the difference between read comitted and uncomitted type of scans.
Also I meant to continue my measurment in multi-table query too, when is better to use Heap instead of Clustered Index, but that will be an another article.
Bhuvnesh
Bhuvnesh
SSChampion
SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)

Group: General Forum Members
Points: 13626 Visits: 4077
these are the expected stats reason being .when we are calling every records (without any filer/where clause ) definitely Sql optimizer will go for table scan(or i would say clustered index scan ) because every records needs to be pulled out. but when we have any where clause condition or join operation definitely Clus index will give us better results.

-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search