SERIALIZABLE Transaction Isolation Level Is Not Behaving As I Expected

  • I have 2 goals to accomplish:

    1. Continuously and atomically migrate data from a primary table to an archive table.

    2. Ensure that a VIEW of a UNION ALL between the primary and the archive tables never shows duplicate rows.

    Current results:

    1. Code working correctly.

    2. Code NOT working correctly.

    This is a subset of the data migration query executed continuously as a SQL Agent job. AUDITMSG is the primary table. OLDAUDIT is the archive table:

    DECLARE @DayCount INT = 90;
    DECLARE @DayCutOff DATETIME = DATEADD(day, -@DayCount, GETDATE());
    DECLARE @TempTable TABLE (ID INT NOT NULL);

    BEGIN TRY
        WHILE (1 = 1)
        BEGIN
            SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

            BEGIN TRANSACTION;

            -- Pick the next batch of 10 rows old enough to archive
            INSERT INTO @TempTable
            SELECT TOP 10 ID FROM AUDITMSG WHERE TRANS_DATE < @DayCutOff;

            -- Copy the batch to the archive table
            INSERT INTO OLDAUDIT (ORIGINAL_ID, Vendorid)
            SELECT ID, Vendorid FROM AUDITMSG WHERE ID IN (SELECT ID FROM @TempTable);

            -- Remove the batch from the primary table
            DELETE FROM AUDITMSG WHERE ID IN (SELECT ID FROM @TempTable);

            COMMIT TRANSACTION;

            DELETE FROM @TempTable;

            WAITFOR DELAY '00:00:01.000';
        END
    END TRY
    BEGIN CATCH
        DECLARE @ErrorMessage VARCHAR(255);
        SET @ErrorMessage = ERROR_MESSAGE();
        ROLLBACK TRANSACTION;
        RAISERROR(@ErrorMessage, 20, 0) WITH LOG;
        RETURN;
    END CATCH

    Below is a test query designed to prove that duplicate rows do not exist in either table:

    SELECT COUNT(*)
    FROM AUDITMSG a
    INNER JOIN OLDAUDIT o ON o.ORIGINAL_ID = a.ID;

    When the SQL Agent job is not executing, the count returned is always zero, which means no row exists in both tables.

    When the SQL Agent job is executing, the count returned is 30, 40, or 50, which indicates that rows exist simultaneously in both tables and that the VIEW over AUDITMSG and OLDAUDIT can show duplicates.

    This is not the way I expected the SERIALIZABLE transaction isolation level to work.

    From Books Online:

    SERIALIZABLE - Specifies the following:

    Statements cannot read data that has been modified but not yet committed by other transactions.

    No other transactions can modify data that has been read by the current transaction until the current transaction completes.

    Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes.

    Range locks are placed in the range of key values that match the search conditions of each statement executed in a transaction. This blocks other transactions from updating or inserting any rows that would qualify for any of the statements executed by the current transaction. This means that if any of the statements in a transaction are executed a second time, they will read the same set of rows. The range locks are held until the transaction completes. This is the most restrictive of the isolation levels because it locks entire ranges of keys and holds the locks until the transaction completes. Because concurrency is lower, use this option only when necessary. This option has the same effect as setting HOLDLOCK on all tables in all SELECT statements in a transaction.

    What have I overlooked?

    What do I need to change to prevent the VIEW from showing duplicate rows?

  • Hi Gail,

    In what isolation level is the query that checks the results running? Do you have Read Committed Snapshot on?
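    If you're not sure about the second part, a quick way to check is below (just a sketch; it assumes you run it in the database in question):

    SELECT name, is_read_committed_snapshot_on
    FROM sys.databases
    WHERE name = DB_NAME();   -- 1 = Read Committed Snapshot is on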

    Also, have you considered using the DELETE...OUTPUT clause to capture the results that should be inserted into the new table, saving you from first having to SELECT into your table variable?

    http://msdn.microsoft.com/en-gb/library/ms189835.aspx#CaptureResults
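    Purely as an illustration, a minimal sketch of how that might look with the tables and columns from your code (treating OLDAUDIT(ORIGINAL_ID, Vendorid), TRANS_DATE and @DayCutOff as given; note that OUTPUT...INTO requires the target table to have no enabled triggers):

    DELETE TOP (10) FROM AUDITMSG
    OUTPUT DELETED.ID, DELETED.Vendorid
    INTO OLDAUDIT (ORIGINAL_ID, Vendorid)
    WHERE TRANS_DATE < @DayCutOff;

    Because it is a single statement, the copy and the delete happen together, so there is no committed state in which a row sits in both tables.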

    Jamie

  • HunterJ (2/18/2013)


    Hi Gail,

    In what isolation level is the query that checks the results running? Do you have Read Committed Snapshot on?

    Also, have you considered using the DELETE...OUTPUT clause to capture the results that should be inserted into the new table, saving you from first having to SELECT into your table variable?

    http://msdn.microsoft.com/en-gb/library/ms189835.aspx#CaptureResults

    Jamie

    Thanks for responding, Jamie.

    The query that checks the results is operating at the READCOMMITTED isolation level. And, in fact, the data migration query I wrote was originally operating at the same level which should be completely adequate. I escalated the isolation level to SERIALIZABLE just to see if it would fix the problem.

    I have considered the DELETE...OUTPUT clause but I think what I've already written should work. I want to understand what the problem is before moving on to another way of implementing the code.

    Correct me if I am wrong here... it looks like you are trying to archive data.

    How big is the data in the table?

    If the data size is significantly big, why don't you use partitioning? I see in your code you are going against a date to move data. You could use that date column as the partitioning column and deal with the data movement once a month or quarterly (depending on your requirements and/or archival strategy).

    This way you can keep isolation levels out of the picture. I am not against isolation levels, but I would avoid them as much as I can in my system unless absolutely necessary.
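    Just as a rough illustration (the function, scheme, boundary values and filegroup here are made up; adjust them to your own archival strategy), a monthly partitioning of the date column might look something like:

    -- hypothetical names and boundaries, purely for illustration
    CREATE PARTITION FUNCTION pfAuditByMonth (DATETIME)
        AS RANGE RIGHT FOR VALUES ('2013-01-01', '2013-02-01', '2013-03-01');

    CREATE PARTITION SCHEME psAuditByMonth
        AS PARTITION pfAuditByMonth ALL TO ([PRIMARY]);

    -- then rebuild/create AUDITMSG on psAuditByMonth(TRANS_DATE)
    -- and use ALTER TABLE ... SWITCH PARTITION for the monthly/quarterly move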

    Thanks
    Jagan K

  • Thanks for confirming that - was hoping that might have been READ UNCOMMITTED 🙂

    As you would expect, as soon as you insert the data into your audit table, the check query should block until the transaction is committed. Even with read committed snapshot I'd still expect a consistent set of results reflecting the state before the uncommitted changes, and no blocking. Every test I have been able to perform has behaved as expected.

    It's unlikely, but could it be a timing thing, where the initial RangeS-S KEY locks are not enough? Maybe try adding a hint to force an exclusive lock, or be more aggressive with more restrictive locks. Not recommending that as a solution though 🙂
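    Something along these lines on the first SELECT, purely as an experiment (the choice of hints is a guess, not a recommendation):

    INSERT INTO @TempTable
    SELECT TOP 10 ID
    FROM AUDITMSG WITH (UPDLOCK, HOLDLOCK)   -- or XLOCK to force exclusive key locks
    WHERE TRANS_DATE < @DayCutOff;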

    Are you getting errors from the CATCH block? Table variables aren't affected by transactions (they don't roll back), so there's an error scenario where the data being inserted/deleted might not have the range locks in place, although I still don't really see what's going on.
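    What I mean about table variables, as a quick standalone demo:

    DECLARE @t TABLE (ID INT NOT NULL);

    BEGIN TRANSACTION;
    INSERT INTO @t VALUES (1);
    ROLLBACK TRANSACTION;

    SELECT COUNT(*) FROM @t;   -- returns 1: table variables are not rolled back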

    It could be that your queries are using a different locking strategy and what you are seeing is the side effect known as phantom rows, which can happen when reading from a table at any isolation level other than SERIALIZABLE.

    It might be worth trying to capture the locks on both sessions to see what's happening.
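    For example, something like this from a third session while the archive loop and the checker are both running (just a sketch, filtered to the current database):

    SELECT request_session_id, resource_type, request_mode,
           request_status, resource_description
    FROM sys.dm_tran_locks
    WHERE resource_database_id = DB_ID();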

    Sorry I can't be more definitive.

    Jamie

  • jvkondapalli (2/19/2013)


    Correct me if I am wrong here... it looks like you are trying to archive data.

    How big is the data in the table?

    If the data size is significantly big, why don't you use partitioning? I see in your code you are going against a date to move data. You could use that date column as the partitioning column and deal with the data movement once a month or quarterly (depending on your requirements and/or archival strategy).

    This way you can keep isolation levels out of the picture. I am not against isolation levels, but I would avoid them as much as I can in my system unless absolutely necessary.

    Thanks

    Jagan K

    I think the question was more focused on explaining what she is seeing, rather than discussing other solutions (further emphasised by Gail's response to my suggestion of an alternative :)). Also, partitioning is only available in Enterprise Edition, so it may not be an option.

    Jamie

  • Gail,

    Apart from forcing a hint, I would like you to revisit the TOP clause. I have seen TOP behave unexpectedly, particularly when you don't have an ORDER BY. It's a wild guess, but please let us know if you get a breakthrough.
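    Just to illustrate what I mean (the ORDER BY column is only a guess at what makes sense for your data):

    SELECT TOP 10 ID
    FROM AUDITMSG
    WHERE TRANS_DATE < @DayCutOff
    ORDER BY TRANS_DATE;   -- pins down which 10 rows qualify; ties are still arbitrary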

    ----------------------------------------------------------------------------------------------------------------------------------------------------
    Roshan Joe

    Jeff Moden - Forum Etiquette: How to post data/code on a forum to get the best help

  • jvkondapalli,

    I am archiving data. Currently, the AUDITMSG table contains 17,303,002 rows and the OLDAUDIT table contains 69,227,085 rows. The tables are not partitioned but use different filegroups. Before I was hired, the data archiving was implemented to execute several times nightly, non-atomically, in increments of 100,000 rows. This was causing table locks, and reports executing in the middle of the night were timing out. I have implemented the archiving as a "PACMAN" style job that executes continuously, around the clock, in small "nibbles" of 10 rows per second, so no one is impacted. I have fallen back to using the READCOMMITTED isolation level, which should be, and should have been, adequate in the first place.

    Jamie,

    I asked a DBA friend of mine to try to duplicate the problem. He couldn't. In testing to eliminate the problem I'm seeing, I implemented HOLDLOCK hints for every step of the transaction. They had no effect at all. That's why I escalated the entire transaction to SERIALIZABLE, to try to eliminate the problem. I'm getting no errors in the CATCH block. The query executed all weekend on our production server and worked perfectly, except that I know there is a small window of time where the rows can be seen in both tables, even with READCOMMITTED isolation. I am going to change the TABLE variable to a TempDB table and see if that fixes the problem. I hadn't considered that table variables might be treated differently.

    joeroshan,

    I appreciate your suggestion on the TOP clause. I could use an ORDER BY clause to force an order but that would be very expensive. I've used TOP in similar queries with success. The main thing is that I need that part of the operation to read and lock 10 rows that can be escalated to exclusive locks to perform the deletions. So any 10 rows that meet that criterion will do.

    Thank you all for your responses and ideas. I'm going to run some more tests based on them and I'll repost, even if I have no success.

  • I substituted a TempDB table for the TABLE variable. No change.

    Given the rare occurrences, and that neither I nor your DBA friend has been able to reproduce it, I can only suggest it is a side effect of phantom rows, which can exist outside of SERIALIZABLE - both the archive and the checker need to run at this isolation level. It is, after all, why this isolation level exists.

    I think it would be fairly involved to capture enough system data and analyse it to work out exactly what's happening. I'd certainly be interested in hearing your results if you go down this path!

    Personally, I'd reimplement the solution (using DELETE...OUTPUT or one of the other suggestions - whichever fits your situation best) to try and minimise the time spent within the transaction, before forcing such a high isolation level.

    I guess you could also try using PAGE locks rather than ROW locks, and up the number of rows you archive at a time. It might be more aggressive, but it could give you the room you need to work at a lower isolation level.
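    A hedged sketch of what the batch SELECT might look like with page locks and a bigger batch (the batch size and hints here are placeholders, not recommendations):

    INSERT INTO @TempTable
    SELECT TOP 1000 ID
    FROM AUDITMSG WITH (PAGLOCK, UPDLOCK)
    WHERE TRANS_DATE < @DayCutOff;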

    Jamie

  • Thanks for the advice, Jamie.

    I've fallen back to my original READCOMMITTED transaction isolation level. It should work. I don't know why it appears not to.

    As you suggested, I experimented with setting both the data archive/delete query and the test query to the SERIALIZABLE transaction isolation level. This produced a deadlock on the test query, which was not a surprise. I increased the interval between loop executions in the data archive/delete query and the test query executed without incident. Unfortunately, it was not a complete test because at this moment there are no rows to be archived/deleted.
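    For anyone following along, setting the test query to SERIALIZABLE amounts to something like this (a sketch of the session settings, reusing the COUNT(*) query from my first post):

    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    BEGIN TRANSACTION;
    SELECT COUNT(*)
    FROM AUDITMSG a
    INNER JOIN OLDAUDIT o ON o.ORIGINAL_ID = a.ID;
    COMMIT TRANSACTION;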

    Your comments, this test, and some additional research I've done have made me realize that my mistake was almost certainly having two different queries looking at the same data at different transaction isolation levels. Phantom reads were definitely the symptom I experienced.

    Thank you, all of you, for your responses.
