PAGELATCH Contention

  • Summary information: Exclusive, Update, and Share Page Latch events were identified on specific temporary tables within TEMPDB during the analyzed period. A SQL Server background process was being blocked on a temporary table owned by the suspect process. This contention caused the LOGON TRIGGER to continually fail preventing other sessions from logging into the instance. During the analyzed period, the suspect process is performing a large number of inserts into the temporary table in question.

    NOTE: Posted on SQLTeam.com as well

    http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=159128

    Enviroment:

    WINDOWS 2003 Enterprise (Cluster Active/Passive)

    SQL Server 2005 Enterprise (SP3 CU6)

    Problem Statement:

    Time window of occurrence: 5 minutes.

    This Instance utilizes a logon trigger to prevent unauthorized access via SSQL Server Accounts. The trigger has been in place for two years without any issues. Recently we had been experiencing failed login attempts with many backend process accounts.

    “Logon failed due to trigger execution”

    Upon further research we noticed excessive buffer latch contention (PAGELATCH_UP) referenicng the MASTER database during this specific timeframe. Compared to our baseline buffer latch waits have increased by 1000%. Also, CPU consumption increased by 36% compared to the baseline. Overall workload increased from 7 active sessions to 55 active sessions during the analyzed period.

    Additional Information:

    Server has plenty of memory to accommodate the additional sessions. 64GB RAM allocated to SQL Server. Page Life 18K.

    Has anyone run into similar issues with the use of Login Triggers

    1. We’ve found that there is a new process running inserting massive amounts of data into a temporary table. This process loops through 1,000’s of records calculating financial returns. Once we’ve killed this particular thread the latch contention within MASTER subsided. Note: this process runs against an user defined database, not master directly. However, the logon trigger is associated with MASTER, which is my opinion.

    2. The duration of today's incident lasted for 1 hour.

    More information: During the time we are experiencing heavy PAGELATCH contention within the MASTER database a SQL Server background process, running out of the MASTER database, is being blocked. The background process, command=TASK MANAGER, is attempting to acquire a LCK_M_Sch_S on a temporary table in TEMPDB. This temporary table is locked by the process mentioned earlier. I need to understand what this background process is doing and why does it require a LCK_M_Sch_S on this processes temporary table.

    Finding what the background process is may shed some light on why the logon trigger is failing.

  • I'm afraid I don't know, in particular, why Task Manager wants a shared shema lock on a temp table.

    However, regarding your increase in pressure on the Logon Trigger, check for newly deployed code that doesn't persist its connection. I've seen pressure like that show up when a front end process (for example) runs an iterative CRUD call to update a series of records, but the connection isn't persisted. When that happens, it logs in hundreds of times a second, which caused us serious issues.

    I would confer with whatever app teams have done recent deployments against databases on that box.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • @Craig Farrell has given a plausible explanation.

    What is the default dB for the app account? It may not have permissions in Master... maybe change its default to the app dB.

    Regards

  • Good Information, I will take a look the connections for this particular application. Thanks again!

  • 1) can you post the code for the login trigger?

    2) if you are having an 'outage' of this nature, I recommend you get a professional on board to quickly isolate/fix the issue.

    3) this definitely reeks of a "what has changed lately" problem. 😀

    4) an obvious workaround is to simply disable the login trigger while this other proces is running - at least until you can find out WHY the process (may) is causing the issue and get it fixed.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • If you are referring to Microsoft CAT as the professional, we are considering it. We cannot disable the logon trigger due to audit compliance. What we have done is disabled the rouge process until it can be optimized.

    The question still stands, why is the SQL Server back ground process attempting to gain a schema lock on a temporary table owned by another session?

  • 1) lock reason: snapshotting perhaps? or deferred temptable dropping?

    2) still wondering about trigger code. can you post that?

    3) have you tried sp_whoisactive to see if you can get details about what each spid is actually doing? awesome freebie from Adam Machanic

    4) no idea what company you work for, but CAT focuses on BIGGIES and typically for longer-term engagements from what I know. If you go through MS support it could be quite some time before you get escalated to someone with enough chops to help you out. There are other consultants that can probably help you out more quickly.

    5) has someone identified massive login attempts by this rogue process you disabled?

    6) Just noticed that Steve Jones has tweeted about this using the #sqlhelp hash tag. Maybe someone will chime up - but we really don't have many details about the processes themselves so probably can't get much without that.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Trigger Code:

    BEGIN

    SET ANSI_PADDING ON

    SET CONCAT_NULL_YIELDS_NULL ON

    SET ANSI_WARNINGS ON

    DECLARE @LogonTriggerData xml,

    @PostTime datetime,

    @LoginName varchar(50),

    @HostName varchar(50),

    @LoginType varchar(50),

    @AppName varchar(128);

    set @LogonTriggerData = eventdata();

    set @PostTime = @LogonTriggerData.value('(/EVENT_INSTANCE/PostTime)[1]', 'datetime');

    set @LoginName = @LogonTriggerData.value('(/EVENT_INSTANCE/LoginName)[1]', 'varchar(50)');

    set @HostName = @LogonTriggerData.value('(/EVENT_INSTANCE/ClientHost)[1]', 'varchar(50)');

    set @LoginType = @LogonTriggerData.value('(/EVENT_INSTANCE/LoginType)[1]', 'varchar(50)');

    set @Appname = APP_NAME();

    if @LoginType = 'SQL Login'

    and @LoginName = (SELECT LogonAcct

    FROM dbamaint.dbo.audit_include

    WHERE LogonAcct = @LoginName

    AND LogonApp = @Appname)

    and @Appname = (SELECT LogonApp

    FROM dbamaint.dbo.audit_include

    WHERE LogonApp = @Appname

    AND LogonAcct = @LoginName)

    BEGIN

    ROLLBACK;

    END;

    end;

  • 3) have you tried sp_whoisactive to see if you can get details about what each spid is actually doing? awesome freebie from Adam Machanic

    Reply: have the procedure. We know what the process is doing.

    4) no idea what company you work for, but CAT focuses on BIGGIES and typically for longer-term engagements from what I know. If you go through MS support it could be quite some time before you get escalated to someone with enough chops to help you out. There are other consultants that can probably help you out more quickly.

    Reply: We are a very large company and have a relationship with MSFT.

    5) has someone identified massive login attempts by this rogue process you disabled?

    Reply: Have not had a chance, however, will get to this soon and post my results.

    6) Just noticed that Steve Jones has tweeted about this using the #sqlhelp hash tag. Maybe someone will chime up - but we really don't have many details about the processes themselves so probably can't get much without that.

    Reply: the process uses a parent procedure with nested child procedures. The nesting level is very deep which becomes a nightmare to troubleshoot. Also, this code was written when the database resided on SQL Server 6.x. Development continues to push back on spending the time and money to optimze the code.

  • Poking around in BOL for schema locks:

    Schema modification (Sch-M) locks are used when a table data definition language (DDL) operation (such as adding a column or dropping a table) is being performed.

    Schema stability (Sch-S) locks are used when compiling queries. Schema stability (Sch-S) locks do not block any transactional locks, including exclusive (X) locks. Therefore, other transactions can continue to run while a query is being compiled, including transactions with exclusive (X) locks on a table. However, DDL operations cannot be performed on the table.

    So it appears the query engine is preparing a query and is attempting to get a picture of the temporary object in question but the other process is preventing it. Sch-S seem like they would be lightweight by the description.

    PAGELATCH contention could come from heavy inserts on the same dB page.

    I know I haven't answered your question but maybe this will help you determine your next path...

  • There is heavy insert and update activity being performed against the temporary table, also with an active explicit transaction. however, the background process being blocked is causing the login trigger to fail.

    I have asked development to batch their inserts which has appeared to help the cause. Moreover, an understanding of what this background process is actually doing is something I need an answer to. Eventually I will have to open a case with MSFT to help me understand this.

  • Alfredo Giotti (4/26/2011)


    There is heavy insert and update activity being performed against the temporary table, also with an active explicit transaction. however, the background process being blocked is causing the login trigger to fail.

    I have asked development to batch their inserts which has appeared to help the cause. Moreover, an understanding of what this background process is actually doing is something I need an answer to. Eventually I will have to open a case with MSFT to help me understand this.

    The Task Manager, as I understand it, pairs batches (tasks) with a worker thread. A thread will be responsible for the entire life of the batch. It is part of the SQLOS and through co-operative scheduling allows for high concurrency. The TM polls the workers constantly and attempts to balance the load on them.

  • I wonder if the SET statements in your trigger aren't causing extra recompiles - which can cause some interesting bottlenecks. Check for that, and if they are happening try removing the SET statements. Personally I question why you need those - what is the reason behind them (6.x backwards compat perhaps)?

    Been a LONG time since I have heard 6.5 mentioned!! Oh, those were the days - we actually had knobs we could tweak in the engine! 🙂

    Do you have any versioning going on (i.e. other non-login triggers)? Still trying to find things that might be monkeying with tempdb.

    Also, try changing the @var = (select field from table) code to EXISTS statements and check the plans and costing. If this login trigger gets called a bunch it needs to be absolutely as efficient as possible. Not sure, but EXISTS could be a tad better. Also see if grouping all those separate SET statements into one SELECT gives improved performance. I have had wins in the past doing that, but haven't tried it with XML parsing like you are doing. Oh, and since this is a login trigger, can the type ever be anything other than 'SQL Login'? If not, that is an unnecessary IF statement.

    Good luck with your troubleshooting! This sounds like a REALLY interesting problem, and I don't get to see much that is interesting any more. Please keep us (or me via PM) up to date with what you discover!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Alfredo Giotti (4/26/2011)


    Trigger Code:

    BEGIN

    SET ANSI_PADDING ON

    SET CONCAT_NULL_YIELDS_NULL ON

    SET ANSI_WARNINGS ON

    DECLARE @LogonTriggerData xml,

    @PostTime datetime,

    @LoginName varchar(50),

    @HostName varchar(50),

    @LoginType varchar(50),

    @AppName varchar(128);

    set @LogonTriggerData = eventdata();

    set @PostTime = @LogonTriggerData.value('(/EVENT_INSTANCE/PostTime)[1]', 'datetime');

    set @LoginName = @LogonTriggerData.value('(/EVENT_INSTANCE/LoginName)[1]', 'varchar(50)');

    set @HostName = @LogonTriggerData.value('(/EVENT_INSTANCE/ClientHost)[1]', 'varchar(50)');

    set @LoginType = @LogonTriggerData.value('(/EVENT_INSTANCE/LoginType)[1]', 'varchar(50)');

    set @Appname = APP_NAME();

    if @LoginType = 'SQL Login'

    and @LoginName = (SELECT LogonAcct

    FROM dbamaint.dbo.audit_include

    WHERE LogonAcct = @LoginName

    AND LogonApp = @Appname)

    and @Appname = (SELECT LogonApp

    FROM dbamaint.dbo.audit_include

    WHERE LogonApp = @Appname

    AND LogonAcct = @LoginName)

    BEGIN

    ROLLBACK;

    END;

    end;

    Can you audit the the process how many times it login to the system when its active, could this part in the trigger cause this ?

    if @LoginType = 'SQL Login'

    and @LoginName = (SELECT LogonAcct

    FROM dbamaint.dbo.audit_include

    WHERE LogonAcct = @LoginName

    AND LogonApp = @Appname)

    and @Appname = (SELECT LogonApp

    FROM dbamaint.dbo.audit_include

    WHERE LogonApp = @Appname

    AND LogonAcct = @LoginName)

    BEGIN

    ROLLBACK;

    END;

  • We had to include the SET statements because the trigger was failing without them. We first attempted to place the SET statement outside the procedure which did not work. Therefore, we had to place them inside.

    There can be either SQL logins or Windows Logins. The reason we test on SQL Logins is due to old legacy application, hundreds, which the development and end-user community know. therefore, we need to prevent unauthorized access via this trigger from certain applications. i.e. SSMS, QA... We are currently looking at a product by IBM, Guardian, to help prevent this type of access, amongst other cool features. Once this is implemented, I will remove the logon trigger.

    Versioning? The only versioning operations that is occurring is our online index rebuilds. Index maintenance is completed hours prior to the execution of this rouge process. Snapshot Isolation and Read Committed Snapshot Isolation are not allowed.

    I will work on tweaking the SELECT statement to include the EXISTS clause and will post back results.

    I am still researching the number of connections this application utilizes, more to come on this.

    Yes this is very interesting, I will keep everyone posted on what I find.

    Thank you very much for your valuable input.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply