Job schedule with availability groups (4 nodes)

  • Hello,
    We seem to be having an issue with an SSIS job running on all nodes instead of only the primary node when not in a failover state. It is spawning errors from the secondary nodes, even though we have created a step in the job to check the active node, as follows:

    DECLARE @role int

    SET @role = (SELECT [role]
                 FROM [sys].[dm_hadr_availability_replica_states] hars
                 INNER JOIN [sys].[availability_databases_cluster] adc
                     ON hars.[group_id] = adc.[group_id]
                 WHERE hars.[is_local] = 1
                   AND adc.[database_name] = 'SSISDB')

    IF @role <> 1
    BEGIN
        RAISERROR('Database is secondary.', 11, 1);
    END

    The logs indicate that the job is running at the same time on several of the nodes, which in turn produces object errors because the staging tables do not exist there.

    Any ideas?

    Much appreciated

  • What does your step behavior look like? For example, after the is-primary check runs in your first step, is the job configured to proceed to the next step even on failure?

    I run similar checks on my secondaries, but I don't make it its own step. I have the script begin the execution if the node is primary; otherwise it simply ends:

    DECLARE @ServerName NVARCHAR(256) = @@SERVERNAME
    DECLARE @RoleDesc NVARCHAR(60)

    SELECT @RoleDesc = a.role_desc
      FROM sys.dm_hadr_availability_replica_states AS a
      JOIN sys.availability_replicas AS b
       ON b.replica_id = a.replica_id
    WHERE b.replica_server_name = @ServerName

    IF @RoleDesc = 'PRIMARY'
    BEGIN
        -- Run your SSIS execution script here
    END

  • I apologize if I was unclear 🙂
    The job is set up on the primary node and replicated on the secondary nodes (4 of them). We are getting a failure mail from the secondary nodes (a random 2 to 4 of them), although the job completes on the primary.

    I was not the one who set this up; the check is the first step of all jobs on the HA nodes.
    So STEP 1 is the check above, and then:

    ON SUCCESS
    go to next step

    ON FAILURE
    quit the job reporting success

    My question is: why does it even go to step 2 when it reports the node as secondary, and why does it send us a FAIL?
    Is the logic correct?
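    For reference, the configured step actions can be confirmed from msdb. A minimal sketch (the job name here is only a placeholder for ours):

    ```sql
    -- List each step's on-success / on-failure action for the job.
    -- on_fail_action: 1 = quit reporting success, 2 = quit reporting failure,
    --                 3 = go to next step, 4 = go to a specific step.
    -- N'OurSSISJob' is a placeholder job name.
    EXEC msdb.dbo.sp_help_jobstep @job_name = N'OurSSISJob';

    -- If step 1 is not set to quit with success on failure, it can be changed:
    EXEC msdb.dbo.sp_update_jobstep
        @job_name = N'OurSSISJob',
        @step_id = 1,
        @on_fail_action = 1;  -- quit the job reporting success
    ```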

    Thanks

  • If you run your check manually in a query window on a secondary, does it properly raise the error? I would start looking at the actual SQL at this point, now that you have confirmed proper step behavior is in place. The SQL below should do what you are trying to do: it raises the error when run on a secondary, and does not raise it when run on the primary.


    DECLARE @ServerName NVARCHAR(256) = @@SERVERNAME
    DECLARE @RoleDesc NVARCHAR(60)

    SELECT @RoleDesc = a.role_desc
      FROM sys.dm_hadr_availability_replica_states AS a
      JOIN sys.availability_replicas AS b
       ON b.replica_id = a.replica_id
    WHERE b.replica_server_name = @ServerName

    IF @RoleDesc = 'SECONDARY'
    BEGIN
        RAISERROR('Database is secondary.', 11, 1)
    END

  • One more thing to add: node 4 has an error with AG synchronisation.
    Could this be a factor with the job?

  • An error synchronizing should not affect whether the instance is considered a primary or a secondary under asynchronous mode.
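    You can sanity-check that node's role and health directly from the AG DMVs. A minimal sketch, run locally on the node in question:

    ```sql
    -- Role and health of the local replica for each availability group.
    -- A synchronization problem shows up in synchronization_health_desc,
    -- but role_desc should still read PRIMARY or SECONDARY correctly.
    SELECT ar.replica_server_name,
           ars.role_desc,
           ars.operational_state_desc,
           ars.synchronization_health_desc
    FROM sys.dm_hadr_availability_replica_states AS ars
    JOIN sys.availability_replicas AS ar
      ON ar.replica_id = ars.replica_id
    WHERE ars.is_local = 1;
    ```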

  • thalsell - Tuesday, September 25, 2018 6:57 AM

    An error synchronizing should not affect whether the instance is considered a primary or a secondary under asynchronous mode.

    It is strange that the job is sending a failure message from the other nodes even with the HA check step.

  • Have you manually run the check-step SQL in a query window on a secondary yet? I am really leaning towards that step not being correct. It should raise an error in the results; if it doesn't, you know the step doesn't work. What I provided above works.

  • We have to rethink this. Thanks a lot for your help!

    Will report back.

  • johnnyrmtl - Tuesday, September 25, 2018 8:02 AM

    We have to rethink this. Thanks a lot for your help!

    Will report back.

    Since I did not write the job and do not have access to all the nodes, is there a way to check from the primary node (or the HA listener) where the job ran, and whether it ran multiple times?

  • As far as I know, you need some level of access to view the Agent history / Job Activity Monitor on each node.

    Since they are all trying to execute the same SSIS package: if you have access to the server that hosts the package, you can right-click the package -> Reports -> All Executions.
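    You can also query the SSIS catalog directly; catalog.executions records which machine ran each execution. A sketch:

    ```sql
    -- Most recent package executions, with the machine that ran each one.
    -- status: 4 = failed, 7 = succeeded.
    SELECT TOP (20)
           e.execution_id,
           e.package_name,
           e.server_name,
           e.machine_name,
           e.status,
           e.start_time
    FROM SSISDB.catalog.executions AS e
    ORDER BY e.start_time DESC;
    ```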

  • Travis Halsell - Thursday, September 27, 2018 9:38 AM

    As far as I know, you need some level of access to view the Agent history / Job Activity Monitor on each node.

    Since they are all trying to execute the same SSIS package: if you have access to the server that hosts the package, you can right-click the package -> Reports -> All Executions.

    I completely agree, and the admin does not seem to understand my reasoning.
    I did view that report, and I see two identical executions of the job (same start times): one failed and one succeeded, but I'm not sure which node the failure came from.

  • I believe you can see the server that attempted to execute the package if you dig a little deeper in the report.

  • Travis Halsell - Thursday, September 27, 2018 10:04 AM

    I believe you can see the server that attempted to execute the package if you dig a little deeper in the report

    I drilled down within the execution info for both the one that worked and the failure, and they both point to the primary node 🙁

  • Do you have 2 jobs on the primary executing the same package at the same time?
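    One way to check would be to search msdb for every job step that references the package. A sketch (the package name is a placeholder):

    ```sql
    -- Find every Agent job step whose command mentions the package.
    -- 'MyPackage.dtsx' is a placeholder for the real package name.
    SELECT j.name AS job_name,
           s.step_id,
           s.step_name,
           s.subsystem
    FROM msdb.dbo.sysjobs AS j
    JOIN msdb.dbo.sysjobsteps AS s
      ON s.job_id = j.job_id
    WHERE s.command LIKE N'%MyPackage.dtsx%';
    ```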
