• Paul White NZ (3/31/2010)


    Hugo Kornelis (3/31/2010)


    I've got to correct you AND myself now.

    Did you read the comments in Alex's blog entry (including mine) or the previous article?

    My point there, as here, is that page splits can cause rows to be read twice or not at all.

    I felt sure that you and I had discussed this effect before.

    Alex's previous article demonstrates COUNT(*) totals going awry, and IIRC Tony R's blog entry is about roughly the same thing, but I am not a fan 🙂

    Please do read the comments on Alex K's blog, because I don't want you to think I am making this up as I go along...;-)

    I'm sorry that I keep arguing, but I just can't let this rest...

    I must admit that I did not read the comments on Alex' bllog entry. I looked at the blog post, recognised it as one I have seen when it was published and recalled what it was about, then went back here to respond. I have now read the comments as well. And just for the record, I have never thought you are making things up, though I still think you don't understand exactly what is happening in the various cases.

    Even though, after my initial "never", I now have found that there are indeed cases where even under read committed a row can be skipped or read twice, I have to maintain that this can NOT be caused by page splits. Allow me to explain (almost everyone except Paul should probably skip the next part; it's pretty deep...)

    SQL Server has two ways of scanning a table or index - using the pointers to process all leaf pages in the "logical" order, or using the IAM (Index Allocation Map) to process pages assigned to the table or index in the order in which they happen to be layed out on disk. The latter version is indeed susceptible to reading rows twice or skipping them, if page splits happen during the read. That's because they can move data from a page that has already been read to a page that has not yet been read, or vice versa. When data is read by following the pointer chain, a page split doesn't change anything - as part of the page split process, the next page/prev page pointer chain is updated so that the "oold" page is replaced by two "new pages" in the original "logical" location.

    However, SQL Server will only use an IAM scan to read data if two requirements are both met. First, the query execution plan must ask for an *unordered* scan (which means: the data doesn't have to be ordered, but no problem if it happens to be) - for an ordered scan, only following the pointers to read data in logical order can be used. The second requirement is that there must be either a table lock or no locks used in the query. Now, with a table lock you can get no page splits, because they can only be caused by modifications, which are precluded by the table lock. Which leaves only the no locks option available for this scenario - that is, either a NOLOCK hint or a READ_UNCOMMITTED isolation level.

    The issues that Tony point out in his blog, and the one pointed out by Alex in the blog with all the C# code are not caused by page splits, but by a different issue. Taking Alex' post as an example, what happens is that the COUNT(*) will get the fastest performance by reading the smallest index, which is the index on the AssignedTo column. And the concurrent connection happens to be updating exactly that column. So what happens here, is that a row is read for the COUNT(*), and then the update causes to the indexed column causes it to be relocated to a different page in the same index (which might or might not cause a page split - that is irrelevant here) which is at that time not yet read. And once the scan for the COUNT(*) gets there, it waits for the locks to be released, then reads the page where that row has been added in the mean time, so that it has now been read twice. (And of course, a similar scenario where the row is moved in a different direction would cause the row to be skipped). It's a bit like trying to count children in toddlers' school - you put them in a line, then start counting heads from left to right, but since they can't stay still long enough, you are bound to count some twice and skip others.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/