Clustered Indexes? Sedimentary, my dear Watson

  • GilaMonster (8/29/2011)


    Please, please, please read that indexing article I referenced.

    It doesn't matter which way around the keys are, the index tree is the same depth (and SQL pages are always 8k in size) and it takes the same number of page traversals to find a row.

    Well, that's mostly correct. The number of page traversals may be different if page fragmentation has caused the depth to change in one case but not in the other, but usually not significantly different (I only say "usually not" instead of "never" because like so many things "it depends").

    In the situation described by Ken, if the non-unique small range field comes before the always increasing unique field in the key there will usually be k (where k is the number of values the small field is allowed to take) incomplete leaf pages in the tree after a load of inserts where the unique column always increases and the small range column repeatedly cycles round its range, while if those two fields are in the opposite order in the index there will only be one - so there may be k-1 extra leaf pages; that could result in a tree one level deeper (or more than one level if k is very large, but for Ken I think k was 7 or 8 so it can't be more than 1 deeper and will almost always be the same depth; and antway an index reorganise will always take it to the same depth). This can't have anything like the order of magnitude effect that Ken reported, so it appears that there was something else going on there that we don't know about.

    Tom

  • I understand a page is 8000 bytes (close to but less than 8KB, close enough to call it 8k) and it reads pages to get the information from the DB, but the minimum amount of data that is ever read from the disk is 1 segment. That's true for indexes as well as tables. 1 segment is 8 pages or about 64KB. SQL usually does a good job of collecting related data into a segment. (See, I didn't just pull 64K out of thin air.)

    In the one case I was talking about there were 7 values, however, I was also talking about 7 tables. Each table had a check constraint so only 1 value was valid in each. (That's how partitioned views work, you insert the row into the view, it determines which table's to be inserted because of the check constraint.)

    The varchar field was a composite value composed first with a fixed set of letters used to identify the company involved and there were a lot fewer companies involved than the numbers used to that basically organized it into segments of the table. One letter was used to identify which of 8 servers and which day of the week when the record was being inserted. Finally a random number was generated and multiplied to calculate a 4 digit base 62 set of letters. That's 7 place decimal accuracy. A float only has 6 place, so 99% of them used 4 digits and generally successfully inserted on the first try. (How many times have you seen a random number that when multiplied by 62 would be less than 1?)

    Yes, switching the two fields made the insert 300 to 1000 times faster. Basically the second field in an index is totally useless in finding the location for the insert. Leading me to the erronious conclusion that the index only stores the first field of the index.

    Since it does nothing useful about finding the record in the table or organizing the index, I don't understand why it bothers to store the information in the index.

    Yes, this was a clustered primary key.

  • Tom.Thomson (8/29/2011)


    Well, that's mostly correct. The number of page traversals may be different if page fragmentation has caused the depth to change in one case but not in the other, but usually not significantly different (I only say "usually not" instead of "never" because like so many things "it depends").

    No, a b tree in SQL can never have an uneven depth. If page splits (or inserts or anything else) result in a level being filled and still needing space (typically it's the root page that gets too big eventually), SQL adds another intermediate level and increases the depth of the entire tree.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (8/30/2011)


    Tom.Thomson (8/29/2011)


    Well, that's mostly correct. The number of page traversals may be different if page fragmentation has caused the depth to change in one case but not in the other, but usually not significantly different (I only say "usually not" instead of "never" because like so many things "it depends").

    No, a b tree in SQL can never have an uneven depth. If page splits (or inserts or anything else) result in a level being filled and still needing space (typically it's the root page that gets too big eventually), SQL adds another intermediate level and increases the depth of the entire tree.

    I said "one case not the other" not "one branch of the tree in a given case, not the other", so I think you have misread what I wrote. I made no suggestion that two parts of a given index tree could have different depths, just that two indexes with on same columns but in different orders could sometimes have different depths.

    Maybe I'll knock together a demonstration and post it - probably a chunk of code and some measurement is better than a lot of words. But for now (as I have to catch a plain) I'll try to explain it again =in simple words.

    An example of where this depth difference happens: There is a table with at least two columns, A and B. (We don't care about the other columns). There's an index on a key consisting of two attributes A and B, and A has a very small range and B has a much much larger range and, evey B value inserted is greater than any B value previously inserted (so B depends on and increases with time), and the A values inserted with those B values cycle regularly through the range of A (perhaps A could be a computed column, for example A could be computed as B%7). Then there are two possible indexes like this, one on (A,B) and the other on (B,A). Starting with an empty table and doing lots of inserts can lead to different depths for an index whose key is (A,B) from those obtained for an index whose key is (B,A) - so the order can affect the index depth. Whether the depths are the same or different depends on the number of rows so far inserted, and since A is small they usually be the same but every now and again they will be different. This happens because on the (B,A) index there is never a page split, so that there is at most one unfull page at leaf (or, for an clustered index, the layer above leaf) level, whereas with the (A,B) index page splits will occur regularly and there can be more unfull pages at that level. Since there can be more unfull pages in one case, and these pages can all be almost full, this can mean that there are more pages at the index level immediately above the complete data in the (A,B) index than in the (B,A) index, so whenever the number of pages at this level in the (B,A) index is exactly the maximum that can be supported by the number of index levels above it the (A,B) inex has to have an extra level - the splits will have propagated all the way up.

    Tom

  • Tom.Thomson (8/30/2011)


    ... Whether the depths are the same or different depends on the number of rows so far inserted, and since A is small they usually be the same but every now and again they will be different. This happens because on the (B,A) index there is never a page split, so that there is at most one unfull page at leaf (or, for an clustered index, the layer above leaf) level, whereas with the (A,B) index page splits will occur regularly and there can be more unfull pages at that level...

    Assuming there are no deletes in the table and no extra maintenance is applied to (A,B) because of it's inheirent indexing fill problems. And neither one is a clustered index.( Well, that would only affect the leaf page, not the interviening pages leading to them.)

Viewing 5 posts - 46 through 49 (of 49 total)

You must be logged in to reply to this topic. Login to reply