Logon trigger rejects connections when CPU use is high

  • Hi,

    I've implemented a logon trigger designed to prevent a specific user from accessing the SQL Server instance other than via our website (so I'm checking the host is in the list of webserver IP addresses, and the program_name in sys.dm_exec_sessions for the session in question is either '.Net SqlClient Data Provider' or 'EntityFramework'.

    When rejecting a logon attempt, I write the details to a dedicated table in my event recording database.

    This all runs perfectly most of the time (it's been live for 2 days so far on a very busy system), but I have seen three patches of 3 or 4 minutes where I have a large number of rejected logons with the log entry saying 'Logon failed for login '[random user]' due to trigger execution.' for logon attempts from accounts the trigger does not target, and at all other times allows through OK.

    None of these are written to my rejection table, so the trigger doesn't appear to be executing successfully (the trigger logic doesn't rollback for these logins, and always records whenever it does rollback - neither are happening correctly so it seems the trigger is failing somehow?)

    Investigation shows that these patches of erroneous rejections coincide with periods of high (>90%) CPU use on the server...

    Has anyone seen this before? Is the trigger failing because of a resource issue (timeouts due to CPU contention perhaps?)?

    The CPU spikes are a regular feature before I implemented the trigger, so I don't think the connection rejection is causing the CPU usage to rise...

    Does anyone have anything to suggest - an explanation, a workaround, or anything else that might help me sort this out?

    Thanks in advance...

  • i would lean more towards a logic error in the trigger.

    the login trigger clearly rejected the login, but that might be because the login user does not have read permissions to sys.dm_exec_sessions, or the logic to write to a log might be killed due to permissions to write to the table, or goes down a logic tree of IF statements that raise the error, but don't write to your log.

    show us the trigger itself, and we can offer some peer review.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • I would have thought a logic error would be a logic error all the time, not just for 4 minute bursts? That's why I am looking at outside factors.

    This is the trigger (with IP addresses and login name redacted) - it works 99% of the time, no other users get rejected apart from in those short periods of high CPU, and when 'illegal' attempts are made to logon with the restricted login it is rejected and details of the attempt are written to the table...

    CREATE TRIGGER RAC_logon_trigger

    ON ALL SERVER

    WITH

    EXECUTE AS 'sa'

    FOR LOGON

    AS

    BEGIN

    DECLARE @User NVARCHAR (100) = ORIGINAL_LOGIN();

    DECLARE @IP TABLE (IPAddress VARCHAR (15));

    DECLARE @data XML;

    SET @data = EVENTDATA();

    DECLARE @SessionId INT;

    DECLARE @AppName SYSNAME;

    DECLARE @HostName SYSNAME;

    SELECT @SessionId = @data.value ('(/EVENT_INSTANCE/SPID)[1]', 'int');

    SELECT @HostName = @data.value ('(/EVENT_INSTANCE/ClientHost)[1]', 'sysname');

    -- Select the application name

    SELECT @AppName = [program_name]

    FROM sys.dm_exec_sessions

    WHERE session_id = @SessionId;

    INSERT INTO @IP

    (

    IPAddress

    )

    VALUES

    ('xxx.xxx.xxx.xx')

    ,('xxx.xxx.xxx.xx')

    -- , <etc. list of IP addresses>

    IF @User = '<Specific user name to be restricted>'

    AND (

    @AppName NOT IN ('.Net SqlClient Data Provider' ,'EntityFramework')

    OR

    NOT EXISTS (SELECT 1

    FROM @IP

    WHERE IPAddress = @HostName)

    )

    BEGIN

    ROLLBACK;

    INSERT INTO <rejected login logging table>

    (

    [User]

    ,HostName

    ,LoginTime

    ,SessionId

    ,AppName

    )

    SELECT @User

    ,@HostName

    ,GETDATE()

    ,@SessionId

    ,@AppName

    END

    END;

    GO

  • i only see some minor suggestions so far, still reading, but it looks fine to me.

    my permissions concert is alleviated by the EXECUTE AS statement.

    Besides TCP that you are testing for, it's possible to connect via named pipes or Shared memory.

    either of those would produce an unexpected value in clienthost from the XML, right? "<local machine>" would be something you still want to block for that login, right? no one logs in by RDP to the SQL server.

    i might change the test to include where net_transport='TCP' if it's relevant

    SELECT s.[program_name],c.[net_transport]

    FROM sys.dm_exec_sessions s

    inner join sys.dm_exec_connections c

    ON s.session_id = c.session_id

    WHERE s.session_id = @SessionId;

    I'm not sure, but can the @data.value return null? should @HostName be bulletproofed for nulls?

    lastly make sure the insert statement is a fully qualified name INSERT INTO master.dbo.<rejected login logging table> for example;

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thanks for looking at this for me.

    I guess connection is possible by other means, but for ordinary users, they would still pass the trigger, and the restricted user would be rejected - same for "<local machine>" and any nulls in the @HostName variable (not sure it' possible, but if it is, it will still process through the trigger and reject just the one targetted user).

    The logging table is fully qualified 🙂

    I'm happy to have any and all strange logon attempts fail the logic and be rejected cleanly, and logged. If something crops up that way I can deal with it. The fact that other users are being rejected, and not logged, but only in short time intervals where there is high CPU (I suppose that could be coincidence, but the odds are heavily in favour of the two being at least related and probably causal) is my concern, and my question...

  • It's still happening - clearly related to the 90%+ CPU on the server (not the instance, that isn't reporting CPU pressure, just the server).

    Do you have any idea what mechanism could cause a trigger failure when the CPU is busy (though not 100%)?

  • Why do you need all those horrible XML and dmv queries?

    SELECT @SessionId = @data.value ('(/EVENT_INSTANCE/SPID)[1]', 'int');

    SELECT @HostName = @data.value ('(/EVENT_INSTANCE/ClientHost)[1]', 'sysname');

    -- Select the application name

    SELECT @AppName = [program_name]

    FROM sys.dm_exec_sessions

    WHERE session_id = @SessionId;

    Do you realise that any string processing is horribly slow in SQL, and parsing XML makes things even worse?

    What's wrong with system functions?

    SELECT @SessionId = @@SPID, @HostName = HOST_NAME(), @AppName = APP_NAME()

    _____________
    Code for TallyGenerator

  • It's also not such a bright idea to create a new object in tempdb on every logon.

    Get rid of @IP and replace it with a static table:

    1. Hardcoding parameters is not a good practice;

    2. no need to create new objects in the database;

    3. Because the trigger is executed quite often, and the table is updated not so often, the data from it will be kept in cache all the time, causing less overhead than using the table variable.

    _____________
    Code for TallyGenerator

  • Try this trigger:

    CREATE TRIGGER RAC_logon_trigger

    ON ALL SERVER

    WITH

    EXECUTE AS 'sa'

    FOR LOGON

    AS

    BEGIN

    IF ORIGINAL_LOGIN() = '<Specific user name to be restricted>'

    AND (

    APP_NAME() NOT IN ('.Net SqlClient Data Provider' ,'EntityFramework')

    OR

    NOT EXISTS (SELECT 1

    FROM dbo.Login_Allowed_IP

    WHERE IPAddress = CONNECTIONPROPERTY('client_net_address') )

    )

    BEGIN

    ROLLBACK;

    INSERT INTO <rejected login logging table>

    (

    [User] ,HostName ,LoginTime ,SessionId ,AppName

    )

    SELECT ORIGINAL_LOGIN() , HOST_NAME() ,GETDATE() ,@@SPID ,APP_NAME()

    END

    END;

    GO

    _____________
    Code for TallyGenerator

  • Thanks for the guidance Sergiy, I looked into what you suggested and implemented most of your ideas.

    CREATE TRIGGER [<the trigger>]

    ON ALL SERVER

    WITH

    EXECUTE AS 'sa'

    FOR LOGON

    AS

    BEGIN

    IF ORIGINAL_LOGIN() = '<the restricted user>'

    AND (APP_NAME() NOT IN ('.Net SqlClient Data Provider' ,'EntityFramework')

    OR

    NOT EXISTS (SELECT 1

    FROM <table of allowed host names> AS ah

    WHERE ah.HostName = HOST_NAME()))

    BEGIN

    ROLLBACK;

    INSERT INTO <denied connections auditing table>

    (

    [User]

    ,HostName

    ,LoginTime

    ,SessionId

    ,AppName

    )

    SELECT ORIGINAL_LOGIN()

    ,HOST_NAME()

    ,GETDATE()

    ,@@SPID

    ,APP_NAME()

    END

    END;

    It appears to have improved matters, but the problems still exists. When CPU is high, innocent connections get blocked, seemingly by the trigger failing to complete (again, there's no entry in the audit table, unlike when I try to log in in violation of the rules).

    There was a batch of blocked connections today where the CPU was fine, but there was a spike in disk queue length and disk transfers/sec... I'm more and more sure this is a resource contention issue, but I don't know what I can do about it.

    Is there a timeout value for the trigger itself I can adjust? I can't find any references to such a thing, but some SQL features are a bit obscure...

  • You may take the performance optimisation to the extreme, minimising the number of checks required for "ordinary" logons:

    BEGIN

    IF ORIGINAL_LOGIN() = '<the restricted user>'

    -- if it's not the restricted user we are not interested in anything else. Further checks for that restricted user only.

    BEGIN

    IF APP_NAME() IN ('.Net SqlClient Data Provider' ,'EntityFramework')

    -- If restricted user logs in from the allowed application we do not need to check for hosts.

    RETURN

    ELSE

    BEGIN

    -- only if all previous checks did not clear the alarm we're going to query allowed host names.

    IF NOT EXISTS (SELECT 1

    FROM <table of allowed host names> AS ah

    WHERE ah.HostName = HOST_NAME())

    BEGIN

    ROLLBACK;

    INSERT INTO <denied connections auditing table>

    (

    [User]

    ,HostName

    ,LoginTime

    ,SessionId

    ,AppName

    )

    SELECT ORIGINAL_LOGIN()

    ,HOST_NAME()

    ,GETDATE()

    ,@@SPID

    ,APP_NAME()

    END

    END

    END

    END;

    I did not check the code for errors, it's a template only.

    _____________
    Code for TallyGenerator

  • I had done much the same (the logic is a little different, but separating out the username test)... and removing the logging completely - the rejected logins get written to the SQ Server log and my monitoring software (Red Gate) picks them up from there too.

    I think it's as skinny as it can be now, so I just wait to see how it copes with the resource issues. I don't think there's anything else I can do to the trigger itself now.

    I'm still eager to hear from anyone who can explain what's happening (I have my theory of timeouts when contending for cpu or queueing for disk access, but nothing to say I'm right), and whether there's anything I can do about it - any configuration, settings etc. I can adjust to prevent the issue?

  • Did you check sp_configure 'user connections'?

    _____________
    Code for TallyGenerator

  • When you get this fixed to some level of satisfaction, you'll need to starting working on the real problem and that's to find out and fix what has your server at 90% CPU all the time. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sergiy (6/1/2016)


    Did you check sp_configure 'user connections'?

    User connections is 0, but that wouldn't make any difference to the trigger operation - if the users were being rejected bit that would happen with or without the trigger, and I assume would have it's own error message for refused connections?

    Jeff Moden (6/1/2016)


    When you get this fixed to some level of satisfaction, you'll need to starting working on the real problem and that's to find out and fix what has your server at 90% CPU all the time. 😉

    Ah, I know what the cause of the high CPU issue is - it's the backups of a large database on memory-restricted non-live instances on my server (UAT and Stage)... Not really much I can do about them in my current environment - I can't give those instances more memory, and I don't have the disk space available to allow me to turn off backup compression. I'm looking into the possibility of turning down the compression a little, but I can't turn it off altogether.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply