October 29, 2010 at 7:29 pm
I was wondering how Clustered Indexing work with Striping. Clustered indexing tries to maintain physical ordering on disk based on clustered keys for sequential retrieval. Whereas, striping aims to distribute data over multiple disks to provide for parallel retrieval. Seems to be mutually conflicting concepts. Any links or literature than can clarify these concepts for me will be highly appreciated. I am also looking for any particular RAID configuration that takes better advantage of clustered indexes.
October 30, 2010 at 3:17 am
The clustered index enforces logical ordering, not physical.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 5, 2010 at 5:18 pm
I thought non-clustered index had the RIDs of the actual rows, whereas clustered index stored the actual row as leaf nodes. It is precisely because of physical ordering that we can only have one clustered index per table.
November 5, 2010 at 5:40 pm
biju.pad (11/5/2010)
I thought non-clustered index had the RIDs of the actual rows, whereas clustered index stored the actual row as leaf nodes. It is precisely because of physical ordering that we can only have one clustered index per table.
You're misunderstanding what Gail meant by logical vs. physical. You're correct in regards to non-clustered RIDs (and the indexed columns + included columns) at the leaf level, and the clustered index having the full record at the leaf level.
The logical vs. physical ordering of the clustered index is merely this. A clustered index can jump around the physical file while following the logical order, because of the way pages and extents are assigned. The index itself maintains the logical order.
The striping itself won't affect the clustered index in any way differently then any other striping will. N kilobytes per page per stripe. The heads dancing to find the next logical page won't be affected by a single drive, a RAID 10, or a RAID 5. It'll happen in all cases.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
November 5, 2010 at 5:57 pm
Thanks for the detailed reply. Makes sense now.
Indulge me, I am being simplistic here. So, If I am doing a scan on a range on an indexed column.
Non-Clustered Index.
1. Relevant index pages are brought into memory.
2. The pages are inspected and we retrieve a bunch of RIDs.
3. We next bring in all the pages associated with these RIDs.
There is some parallelism in step 1 and 3 because of striping and multiple heads reading.
Clustered Index.
1. Relevant Pages of the index are brought into memory. Because the leaf nodes have data pages, I am assuming index sizes are much bigger and more I/Os are required.
2. Since leaf nodes have the data row, there is no need for separate data page requests.
There is parallelism in step 1 because of striping and multiple heads reading.
Would clustered work better than non-clustered? Where is the efficiency coming from?
Thanks for helping clear this in my head.
November 5, 2010 at 6:17 pm
biju.pad (11/5/2010)
1. Relevant Pages of the index are brought into memory. Because the leaf nodes have data pages, I am assuming index sizes are much bigger and more I/Os are required.
Depends on seek or scan. In a scan, yes, absolutely correct. In a seek, the Index pages (not the leaf level, but instead the components of the B-Tree) are brought into memory and searched, then only the seeked pages are dragged up because the match. The beauty of the seek.
Also, in regards to the size differences: It depends. You can get just as large with a non-clustered index. But I would say under most circumstances, yes, the non-clustered leaves will be smaller. Also, if the index contains all the necessary data for the query, you never use the RID lookup.
2. Since leaf nodes have the data row, there is no need for separate data page requests.
Correct.
There is parallelism in step 1 because of striping and multiple heads reading.
Would clustered work better than non-clustered? Where is the efficiency coming from?
Thanks for helping clear this in my head.
Clustered should be your primary method of accessing the table (note, clustered does not mean it's your PK...) because it contains all the leaf information already.
The efficiency for Clustered vs. Non-clustered is that the Clustered is already existing data and doesn't need to be duplicated for usage. Nonclustered may also have to return to the clustered for additional columns. NC indexes basically perform data duplication to speed up querying.
The parallelism for this is correct on why, multiple heads across the stripe, and will affect any time you need to to go to the drive for a page to bring to memory. So, no matter if the index is scattered or whole physically, you will get your parallelism.
Now, if you want to dig into logical vs. physical ordering, do some googling on index fragmentation. You'll get a brainful. 😀
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
November 5, 2010 at 6:23 pm
Thanks for the great reply.
November 6, 2010 at 2:07 am
biju.pad (11/5/2010)
1. Relevant index pages are brought into memory.2. The pages are inspected and we retrieve a bunch of RIDs.
3. We next bring in all the pages associated with these RIDs.
If and only if the base table is a heap, not a clustered index. If the base table is a clustered index, that changes:
1. Relevant index pages are brought into memory.
2. The pages are inspected and we retrieve the relevant clustered index keys.
3. We perform a seek on the clustered index for each of those keys to fetch the actual data row.
What you're describing here is a RID/Key lookup.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 8 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply