Table Valued Functions and Repeatable Reads

  • I'm trying to confirm my understanding of the repeatable read problem and if the following example suffers from it.

    Say I have a table valued function defined as:

    CREATE FUNCTION dbo.MyFunc()
    RETURNS @tbl TABLE (TestColumn VARCHAR(255))
    AS
    BEGIN
        INSERT INTO @tbl
        SELECT TestColumn
        FROM dbo.TestTable;
        RETURN;
    END

    Then in another stored procedure I join against the table returned by this function, as such:

    SELECT *
    FROM dbo.TestTable
    INNER JOIN (SELECT TestColumn FROM dbo.MyFunc()) Tab1
        ON TestTable.TestColumn = Tab1.TestColumn

    Would the fact that my function call is inline with the rest of the query make it not prone to the Repeatable Read problem? (Assuming I'm using the default Isolation level of Read Committed).

    Basically I'm concerned that essentially I'm performing 2 selects. In this case, a SELECT from the outer query and a SELECT from within the function call. In between the call of these 2 selects, is it possible for another transaction to update TestTable such that the SELECT from the function returns a different value to that of the outer SELECT?
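    To make this concrete, here is a sketch of the interleaving I'm worried about (the session labels and the column values 'original'/'changed' are hypothetical):

    -- Session 1: the outer query starts; dbo.MyFunc() runs its INSERT ... SELECT,
    -- copying rows from dbo.TestTable into @tbl. Under READ COMMITTED the shared
    -- locks are released as each row is read.

    -- Session 2: commits a change in the window between the two reads:
    UPDATE dbo.TestTable
    SET TestColumn = 'changed'
    WHERE TestColumn = 'original';

    -- Session 1: the outer SELECT from dbo.TestTable now sees 'changed', while
    -- the copy already taken into @tbl still holds 'original', so the join can
    -- miss rows that matched a moment earlier.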

    In a similar vein, if I did not use the table valued function, but instead used a CTE in the example ie:

    WITH CTE AS
    (
        SELECT TestColumn
        FROM dbo.TestTable
    )
    SELECT *
    FROM dbo.TestTable
    INNER JOIN (SELECT TestColumn FROM CTE) Tab1
        ON TestTable.TestColumn = Tab1.TestColumn

    it is my understanding that the CTE will be treated as part of the same (one single) query, and will therefore not suffer from any non-repeatable reads. Is this correct?

    I hope I'm making myself clear in this example. If not, happy to clarify.

    Thanks.

  • The short answer is no: neither of your two examples is guaranteed to read the same value at the default read committed isolation level, unless the database option READ_COMMITTED_SNAPSHOT is ON. See http://msdn.microsoft.com/en-us/library/ms175095.aspx. If that option is enabled, only the second example (the CTE) is guaranteed statement-level repeatable reads. The guarantee does not extend to scalar or multi-statement function calls; see http://sqlblog.com/blogs/alexander_kuznetsov/archive/2011/08/02/reads-involving-udfs-under-read-committed-snapshot-may-seem-inconsistent.aspx

    The only guarantee offered by the default read committed isolation (without the snapshot option turned on) is that you will never read uncommitted data. If you need reads to be repeatable, you can use the REPEATABLE READ, SERIALIZABLE, or SNAPSHOT isolation levels. READ_COMMITTED_SNAPSHOT (a separate database option) provides statement-level read consistency: each statement (except non-inline function calls, as already mentioned) sees a consistent, committed database state. SNAPSHOT isolation extends this further, providing transaction-level read consistency (though it has a number of disadvantages too). SNAPSHOT and SERIALIZABLE are also the only isolation levels that prevent reading 'phantoms' - rows that didn't exist when you first ran the statement, but do now. Which isolation level you need depends on your requirements, and on how strictly you interpret 'repeatable reads'.
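    For reference, the settings mentioned above are changed like this (MyDatabase is a placeholder name; note that switching READ_COMMITTED_SNAPSHOT requires the database to be free of other active connections, or a termination clause such as WITH ROLLBACK IMMEDIATE):

    -- Statement-level read consistency for ordinary read committed reads:
    ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON;

    -- Transaction-level read consistency (the database must allow it first):
    ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

    -- Lock-based alternatives:
    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;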

  • Thanks Paul for your answer and the link.

    I guess what I need for my purposes is to set the isolation level to SNAPSHOT, since my function will actually be a more complicated multi-statement function.

    The function actually SELECTS from a VIEW, stores that view in a table variable and then does a SELECT from the table variable joined with another table.

    I am still a little unsure of why you said my 2nd example will still suffer from unrepeatable reads.

    I thought that CTEs will always be inlined with the main query?

    For example, wouldn't my 2nd example above be equivalent to this:

    SELECT *
    FROM dbo.TestTable
    INNER JOIN dbo.TestTable Tab1
        ON TestTable.TestColumn = Tab1.TestColumn

    which is just one single select?

  • elogen (1/10/2012)


    I guess what I need for my purposes is to set the isolation level to SNAPSHOT since my function will actually be a more complicated multi statement function. The function actually SELECTS from a VIEW, stores that view in a table variable and then does a SELECT from the table variable joined with another table.

    Do you mean (transaction-level) snapshot isolation (SI) or (statement-level) read committed snapshot isolation (RCSI)? I can't think of a way to use SI in a function, since it requires an explicit SET TRANSACTION ISOLATION LEVEL statement, and that's not allowed in a function. Why not join the view and the table directly and use RCSI? Perhaps it would be worth explaining a little more about what makes you think you need higher isolation in the first place, that is, what problem are you seeking to avoid? People often over-think isolation levels.
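    To sketch that suggestion (the object names dbo.MyView and dbo.OtherTable are placeholders for your real view and table), an inline table-valued function is expanded into the calling query just like a view, so it avoids the multi-statement function issue entirely:

    CREATE FUNCTION dbo.MyInlineFunc()
    RETURNS TABLE
    AS RETURN
    (
        SELECT v.TestColumn
        FROM dbo.MyView AS v
        JOIN dbo.OtherTable AS o
            ON o.TestColumn = v.TestColumn
    );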

    I am still a little unsure of why you said my 2nd example will still suffer from unrepeatable reads. I thought that CTEs will always be inlined with the main query? For example, wouldn't my 2nd example above be equivalent to this [...code...] which is just one single select.

    It might still encounter non-repeatable reads (I said, there is no guarantee). Non-recursive CTE definitions are indeed in-lined and optimized with the main query text, but that is not the issue. Any time a query plan accesses the same table more than once, there is a window where the table contents might change between accesses (unless prevented by the isolation level, naturally).

    At default read-committed, any shared locks taken when reading a row are generally released immediately. This is the behaviour that provides the window for changes to occur before the table is read a second time. For example, query execution might read a row from the test table, release any shared lock that might have been taken, and then be moved off the processor by SQL Server so another task can run. When it is next scheduled on the processor, it could go to read the same table again (to perform the join) and find the row was no longer there. Or consider this query plan:

    CREATE TABLE dbo.TestTable (TestColumn integer NOT NULL);

    SELECT *
    FROM dbo.TestTable AS tt
    JOIN dbo.TestTable AS tt2
        ON tt2.TestColumn = tt.TestColumn;

    This query happens to use a hash join, so all rows from the top branch are read first and used to build a hash table. If the table is large, this process might take some time, and remember that any shared locks are released as soon as each row is read. When hash table building is complete, SQL Server starts reading the table a second time, probing the hash table for matches on the join condition. It's easy to see the large window here where concurrent activity could change the table, causing a row not to join to itself 🙂
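    Incidentally, if a single statement does need to see a stable set of rows across both accesses without changing the session isolation level, table hints will hold the shared locks for the duration of the (here, auto-commit) transaction. A sketch:

    SELECT *
    FROM dbo.TestTable AS tt WITH (REPEATABLEREAD)
    JOIN dbo.TestTable AS tt2 WITH (REPEATABLEREAD)
        ON tt2.TestColumn = tt.TestColumn;

    -- REPEATABLEREAD keeps shared locks on rows already read; it does not
    -- block phantoms. SERIALIZABLE (or HOLDLOCK) would be needed for that.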

  • Thanks again Paul for the detailed explanation.

    You're exactly right, my problem really had nothing to do with isolation levels and was more to do with the inefficiency of how I defined my View.

    I'll try to explain.....

    I initially tried to get around the "slow query" by moving the view definition into a multi-statement function, thinking I would get a quick win because I could make use of table variables, which I couldn't use in the view. (In my view, I had made use of CTEs, which I *thought* were supposed to be efficient, but later found that my queries would return much faster using a table variable instead.)

    In moving my view query to a TVF, I suddenly found myself joining to the same table twice, in 2 separate SELECTs, which is why I posted about the repeatable read problem. But as you correctly pointed out, at the default isolation level of READ COMMITTED the repeatable read problem inherently exists whenever you access the same table twice, whether it's defined in a view or a TVF.

    However even using a TVF, performance would still noticeably degrade as the number of rows in my table increased.

    So I had to rethink how I was defining the query.

    In short, I was basically retrieving rows I wasn't interested in, so I improved performance by filtering on only those rows that I needed.

    I do have another question on whether my query as it is defined now is as efficient as it can be. But I might save that for another time when I get a bit more time up my sleeve.

    But for the time being, thanks again Paul for pointing me in the right direction. :-)

  • Thanks very much for letting me know, I appreciate it.
