Version Store and Index Seeks

  • Hi there, this is just a question out of curiosity really.

    I've recently been playing around with optimistic concurrency and snapshot isolation for the first time and I think I have a high level understanding of how it works.

    I get that updates and deletes need to maintain a copy of pre-updated rows in the version store for as long as there might be an open transaction that may need them, and that when such a query does need these rows it reads the correct version according to the XSN of the read query vs the XSN(s) of the row versions.

    This is all great if the read query knows what row it is trying to read, but I was just wondering how index seeks work when the update affects the index key.

    If, for a simple example, I have a StatusID field in a largeish table and for a single row this is 1.

    I start a transaction in SNAPSHOT isolation level which contains some work prior to a select query which finds rows with StatusID = 1 (and there is a suitable index on the StatusID column).

    Then a separate process runs to update the StatusID of this row to 2.

    Then the read query actually starts to read the table to look for rows with StatusID = 1

    The query will return the row as the transaction was started before the update, but my question is how does it seek to find this row when the index entry's key is now = 2 and the entry with key = 1 is sitting in a separate structure?

    Are there two separate b-trees (one in the main DB, and one in the tempDB for the version store) and SQL Server traverses both to find any potential rows that match on both the index key and the correct XSN?

    Or does it do something else clever?

    I can't find any articles on the web that explain this - most just mention that the correct rows are pulled from the version store, I'm just asking HOW it finds the correct rows to pull when it seeks on an updated index key? If there are any articles that someone can point me to rather than explain it here, that would be great.

    Thanks for your help!

    Graham

  • Once you enable snapshot isolation, SQL Server will add an extra invisible column to all tables. This column is a pointer to the version store. For committed data, this pointer is a NUL pointer, so SQL Server will use this. For uncommitted data, it will be a pointer to the version store.

    So the process is:

    * Transaction 1 updates the row - it copies the original data to the version store, updates the data with the changes you made and a pointer to the version store in the invisible column, then stores that data.

    * Transaction 2 does an index seek. It follows the index to find the first matching row, reads it and checks the invisible column. If it finds a pointer, it then does a second read from the location in the version store to find the old version.

    Note: that the version store can even contain multiple versions of the row if there have been multiple changes, in which case that invisible pointer will form a chain of pointers from the most recent to the oldest version. There is some additional information in the row that helps SQL Server te determine which version is valid for a given query.

    Note 2: I think that those pointers are added to all data structures (clustered index, nonclustered index, heap), but I am not sure. It is also possible that they are only used in clustered index and heap.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hi Hugo. Thanks for the reply, but I'm still a bit in the dark.

    I understand how, once a row is identified as matching the selection criteria, the appropriate version is then read via the pointer(s) to return the data.

    My query really is how the row is identified in the first place if, as in my example, the index key's value has been updated, or in your example this part: It follows the index to find the first matching row.

    If the value of the index key has been changed (so if StatusID has been updated in my example), the index row will be given the version pointer in its invisible column, but its key will have changed, so how does the select query find this row in the first place? So really, I'm asking how versioned data is searched, rather than how it is retrieved.

    Sorry if I'm being difficult!

    Thanks.

  • Ah okay, I missed the detail that the updated column is indexed.

    I did some digging (and some experiments with DBCC IND and DBCC PAGE - always fun!) and I now know how this works.

    First: the 14 bytes of versioning information are indeed added to all indexes, not just the clustered index and heap.

    Second: when an indexed column is updated, it is processed by deleting the old entry and inserting a new entry in the now correct location.

    Third: until the UPDATE is committed, the deleted entry in the index is not physically removed, it is marked as a "ghost record". So the index seek can still see the data, and check the 14-byte pointer into tempdb for previous versions.

    The rest is trivial: follow the pointer to the version store to determine whether the index entry has to be considered valid or deleted based on the transaction timestamp of the select versus the transaction timestamp of the update.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Thanks Hugo, that makes perfect sense now!

    I do remember reading something about ghost rows a while ago (think it was in the SQL Server Internals book), but thinking "this is a bit deep, I probably won't have to know this." Ho hum.

    You learn something new every day on this site.

    Cheers.

  • For a little extra fun, you can even see that ghost records are being retained because of snapshot isolation if you run sys.dm_db_index_physical_stats using DETAILED mode. There's a handy column in the results just for that purpose, version_ghost_record_count.

    Cheers!

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply