Diagnosing Page Latch Issue

  • I'm working with both a Vendor's DB and a purge script they've provided, and I need some outside opinions.

    This purge runs incredibly long, especially on our SAN. Our first thought, was because the purge was querying tables using non-indexed columns, especially several large tables the big one being 60m rows and 160GB.

    We added indexes and we definitely saw the performance improvements you'd expect when just purging other tables. Additionally, after about an hour of the purge, I can query the big table being purged and according to SQL Server receive the expected post-purge counts/values.

    The catch here is that 5-6 hours after the table "looks" purged, the job is continuing to alternate in pagiolatch waits. Checking the resource information confirms that it's the same table requiring the wait.

    It doesn't appear to be a script issue, per se. But, does this suggest anything that I can give back to my SAN admins or the vendor?

    Portion of the script (implicit transactions is off by default in this environment):

    BEGIN

    SELECT@table_name=TABLE_NAME,@duration=PURGE_DURATION FROM OLTPDB_PURGE WHERE TABLE_NAME = 'CHECKPOINTDATA';

    SET @purge_date = CURRENT_TIMESTAMP-@duration;

    DECLARE m_cursor CURSOR FOR SELECT ID FROM CHECKPOINTDATA WHERE CREATEDATE < @purge_date

    OPEN m_cursor

    FETCH NEXT FROM m_cursor INTO @str_id;

    WHILE @@FETCH_STATUS = 0

    BEGIN

    DELETE FROM CHECKPOINTDATA WHERE ID = @str_id;

    SET @record_cnt = @record_cnt + 1;

    IF (@record_cnt >=1000)

    BEGIN

    COMMIT;

    SET @record_cnt = 0;

    END

    FETCH NEXT FROM m_cursor INTO @str_id;

    END

    CLOSE m_cursor

    DEALLOCATE m_cursor

    END

  • well it sounds like an io wait .. use the function sys.fn_virtualfilestats to see what's going on initially.

    the partial script you've pasted doesn't look like it's written for sql server - commit without begin tran won't do very much one way or another.

    you might want to check for open transactions - that might cause issues.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Have them stop deleting data one row at a time!!! Using a cursor for this is ridiculous! This should be done in batches of X records at a time using an index seek to get to the proper ranges of batches to delete. I have done this type of stuff at many clients over the years and it is both incredibly efficient AND avoids locking large ranges of data.

    Oh, and there isn't a hint of error handling in that code!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • brownph99 (3/10/2013)


    I'm working with both a Vendor's DB and a purge script they've provided, and I need some outside opinions.

    This purge runs incredibly long, especially on our SAN. Our first thought, was because the purge was querying tables using non-indexed columns, especially several large tables the big one being 60m rows and 160GB.

    We added indexes and we definitely saw the performance improvements you'd expect when just purging other tables. Additionally, after about an hour of the purge, I can query the big table being purged and according to SQL Server receive the expected post-purge counts/values.

    The catch here is that 5-6 hours after the table "looks" purged, the job is continuing to alternate in pagiolatch waits. Checking the resource information confirms that it's the same table requiring the wait.

    It doesn't appear to be a script issue, per se. But, does this suggest anything that I can give back to my SAN admins or the vendor?

    Portion of the script (implicit transactions is off by default in this environment):

    BEGIN

    SELECT@table_name=TABLE_NAME,@duration=PURGE_DURATION FROM OLTPDB_PURGE WHERE TABLE_NAME = 'CHECKPOINTDATA';

    SET @purge_date = CURRENT_TIMESTAMP-@duration;

    DECLARE m_cursor CURSOR FOR SELECT ID FROM CHECKPOINTDATA WHERE CREATEDATE < @purge_date

    OPEN m_cursor

    FETCH NEXT FROM m_cursor INTO @str_id;

    WHILE @@FETCH_STATUS = 0

    BEGIN

    DELETE FROM CHECKPOINTDATA WHERE ID = @str_id;

    SET @record_cnt = @record_cnt + 1;

    IF (@record_cnt >=1000)

    BEGIN

    COMMIT;

    SET @record_cnt = 0;

    END

    FETCH NEXT FROM m_cursor INTO @str_id;

    END

    CLOSE m_cursor

    DEALLOCATE m_cursor

    END

    What's going on in the rest of the script? Are you certain the waits are related to this portion of the code?

    As was previously stated, this is a majorly inefficient way of deleting multiple rows from a table. I would question why your vendor is sending you this script - it suggests a real lack of depth of understanding of T-SQL, SQL Server, and set-based operations.

    Jason Wolfkill

  • wolfkillj (3/13/2013)


    What's going on in the rest of the script? Are you certain the waits are related to this portion of the code?

    As was previously stated, this is a majorly inefficient way of deleting multiple rows from a table. I would question why your vendor is sending you this script - it suggests a real lack of depth of understanding of T-SQL, SQL Server, and set-based operations.

    The rest of the script is essentially a repetition of the block I sent, with some exceptions where additional child-table records are deleted. I watched the page resource waits, and getting the info via DBCC page(), the pages are on the same table that's been purged.

    As to the vendor logic, I'm in full agreement there. At the minimum I'm looking at turning IMPLICIT_TRANSACTIONS on before executing the script.

  • colin.Leversuch-Roberts (3/11/2013)


    well it sounds like an io wait .. use the function sys.fn_virtualfilestats to see what's going on initially.

    the partial script you've pasted doesn't look like it's written for sql server - commit without begin tran won't do very much one way or another.

    you might want to check for open transactions - that might cause issues.

    Actually, with SET IMPLICIT_TRANSACTIONS ON, the DELETE would implicitly open a transaction that must then be either committed or rolled back by the code.

    On my SQL Server 2008R2 instance, with SET IMPLICIT_TRANSACTIONS OFF, a COMMIT without a BEGIN TRANSACTION throws this error:

    Msg 3902, Level 16, State 1, Line 10

    The COMMIT TRANSACTION request has no corresponding BEGIN TRANSACTION.

    But of course the statements preceding the COMMIT are auto-committed when they complete, so the results are not affected.

    @brownph99, are you sure this code runs in with IMPLICIT_TRANSACTIONS OFF (if you have SET ANSI_DEFAULTS ON, then IMPLICIT_TRANSACTIONS will be ON also)? If so, does your code have a BEGIN TRANSACTION statement somewhere above the snippet you posted? If not, do you see the error message above? If you don't see the error message, that suggests that the code actually is running on a connection with SET IMPLICIT_TRANSACTIONS ON.

    If you are indeed running with SET IMPLICIT_TRANSACTIONS ON, I think I see the cause of your problem. If the code that populates your cursor returns 1125 rows, it loops 1000 times to delete 1000 rows, then the IF condition evaluates to TRUE and the code block with the COMMIT runs. Then, it loops through 125 times to delete 125 rows, but the IF condition never evaluates to TRUE, so the COMMIT never runs and you end up with an open transaction that is not committed or rolled back.

    Even if you are running with SET IMPLICIT_TRANSACTIONS OFF and have opened a transaction somewhere above this code block, that last loop of <1000 rows coming out of your cursor will not invoke the COMMIT and the transaction will remain open.

    So, do you have open transactions on this connection after the table "looks" purged?

    Jason Wolfkill

  • wolfkillj (3/14/2013)


    Actually, with SET IMPLICIT_TRANSACTIONS ON, the DELETE would implicitly open a transaction that must then be either committed or rolled back by the code.

    On my SQL Server 2008R2 instance, with SET IMPLICIT_TRANSACTIONS OFF, a COMMIT without a BEGIN TRANSACTION throws this error:

    Msg 3902, Level 16, State 1, Line 10

    The COMMIT TRANSACTION request has no corresponding BEGIN TRANSACTION.

    But of course the statements preceding the COMMIT are auto-committed when they complete, so the results are not affected.

    @brownph99, are you sure this code runs in with IMPLICIT_TRANSACTIONS OFF (if you have SET ANSI_DEFAULTS ON, then IMPLICIT_TRANSACTIONS will be ON also)? If so, does your code have a BEGIN TRANSACTION statement somewhere above the snippet you posted? If not, do you see the error message above? If you don't see the error message, that suggests that the code actually is running on a connection with SET IMPLICIT_TRANSACTIONS ON.

    If you are indeed running with SET IMPLICIT_TRANSACTIONS ON, I think I see the cause of your problem. If the code that populates your cursor returns 1125 rows, it loops 1000 times to delete 1000 rows, then the IF condition evaluates to TRUE and the code block with the COMMIT runs. Then, it loops through 125 times to delete 125 rows, but the IF condition never evaluates to TRUE, so the COMMIT never runs and you end up with an open transaction that is not committed or rolled back.

    Even if you are running with SET IMPLICIT_TRANSACTIONS OFF and have opened a transaction somewhere above this code block, that last loop of <1000 rows coming out of your cursor will not invoke the COMMIT and the transaction will remain open.

    So, do you have open transactions on this connection after the table "looks" purged?

    Right. With IMPLICIT_TRANSACTIONS ON, a transaction would be started and need to be explicitly committed. The difference in this case is instead of 1000 transactions when OFF, it's 1 transaction w/ 1k deletes.

    I haven't actually seen the compile-time error you described, but I did confirm that it does need IMPLICIT_TRANSACTIONS for the COMMIT to be worth anything.

    The problem with the <1000 rows is an actual bug in the script that I pointed out to the Vendor over a month ago, but it apparently fell on deaf ears.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply