Stairway to SQL Server Agent - Level 12: Using MSX/TSX to scale out Job Management

  • Comments posted to this topic are about the item Stairway to SQL Server Agent - Level 12: Using MSX/TSX to scale out Job Management

  • This is nice article..

    when i tried to implement the Enlist and Ensure process doesn't work. so do i have to make any security changes ?

  • Are the service accounts the same? What was the exact error you got?

  • Says Access Denied..

  • What are some of the bigger MSX/TSX deployments out there? Are there upper limits? Has anyone done or seen large scale enterprise deployments?

  • I would like to use a master job to deploy maintenance scripts. One of the requirements is to stagger the schedule. For example, separate the DBCC a few minutes for instances sharing the same set of disks (either on direct attached disk or on a SAN). I can't think of a way to do this except perhaps to create a master job that creates and schedules a regular job. Sure would be nice if there was an option to add some customization for each target.

    Also, when supporting 2008+, do you have to use a SQL 2008 server for the master? Can you use SQL 2012 with jobs provided everything in the job is compatible with a SQL 2008 system?

    RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉

  • I also noticed that there is no option to run the MSX job on the master. The master can't enlist itself or another master. It appears I will need to keep a duplicate copy of the maintenance jobs (different names) on the master server. It's a nice feature, but would be a lot nicer if it had a little flexibility.

    RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉

  • Hi

    To stagger the jobs on the different target servers you would have to, in the 1st job step for example, have a 'randomizer' based on the host name.

    Eg:

    DECLARE @HostN VARCHAR(128) = @@ServerName

    IF @HostN = ServerA WAITFOR DELAY '00:01:00'

    Etc

  • Re the 2012 / 2008 question. Yes, you can use a SQL 2012 MSX server and enlist 2008 R1 and R2 and SQL 2005 servers if you have those still. Obviously be aware of using 2008 or above features in jobs that will target 2005 servers. These will not work. What works quite well is to create Target server groups, see BOL for detail. You could then create a 2005 group if this is an issue. You can also populate your Target server groups from a SQL CMS server which may have groups defined in it.

  • I was thinking of a waitfor, but that would make the run time information harder to use. I want to know how long the backup takes, for example. If it runs 20 minutes rather than 10 minutes, I will want investigate. However, perhaps the metric is the actual end time of the last backup - one metric for all servers. I could also create a report that gets the backup times directly instead.

    In the long run, I was hoping the master job script can include a list of average runtimes. (With a job that collects this data and updates the job being deployed.) I suppose I could collect the second step time from each instance and create a wait that depends on the sum from instances "scheduled" before it (e.g., those prior in the list).

    I also can't have it completely random because of other non-maintenance jobs. However, I do like the idea. It would be an easy way to handle some of the servers on the SAN.

    RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉

  • If I either remove the target or delete the master job, the job on the target gets removed - as it should. I'm curious - is it possible to not delete the job and convert it to a local job instead? Thanks.

    RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉

  • Hi Randy

    Yes there is a way to convert an MSX job to a local job. But remember, with great power comes great responsibility. ;-).

    In the msdb db on the target server in the sysjobs table, is a column - originating_server_id. For jobs that originate from a Master server, these are a 1. Update the 1 to a 0 and the job becomes local.

  • Hi Randy

    Re your average run time per server job = backup time. I would, as you intimated, rather directly query the backupset table in msdb and do a DATEDIFF on the backup_start_date and backup_end_date columns to get the backup run time. You could do this to each server individually or through a CMS distributed query or using powershell to loop through a list of servers and populate a central table. Just some ideas.

  • Thanks for all the advice. I'm still trying to figure out the "best" way to go. I'm also considering creating a group of "identical" master jobs that very only by schedule. Each would be a "schedule bin" of sorts to place instances. One master job for all servers sharing the same schedlue for that job. Might be too much clutter.

    I think the idea of deploying a single job, coverting it to a local job, and then updating the schedule is interesting. Some of the targets will be production servers. A solution based on hacking system tables will not likely be used. I tried hacking the record on a local test instance. I had to also change the category_id to 0 to "unlock" it. A refresh of the jobs folder in SSMS allowed it to open without error. Sure enough, I was able to add a schedule and rename the job. I ended up enabling one of the master schedules, but it had to be also "localized".

    After a call to sp_resync_targetserver on the master, the "converted" job gets replaced by the master copy. It uses the job_id for the sync, so it would not matter if the job was renamed.

    If I also include a version number in the job name, I can effectively move the test servers first by removing them from the old bin and adding them to the new bin.

    Here is the first step in my test job if you are curious. The master job has 5 schedules named with "Bin 0" to "Bin 4". Only "Bin 0" is enabled so that the job can run within 5 minutes - every 5 minutes. Only the first step is configured to run. The first step removes the "Bin 0" schedule, enables the correct schedule, and then removes the first step of the job. Then the second step is now the first step and will run with the new scheduled. Crazy, but it actually works. I thought the removal of the "Bin 0" would not work as coded, but it appears the @delete_unused_schedule=1 applies to MSX schedules.

    -- This job step only exists because the job has not been localized.

    declare @JobID uniqueidentifier

    set @JobID = $(ESCAPE_SQUOTE(JOBID))

    declare @StepID int

    set @StepID = $(ESCAPE_SQUOTE(STEPID))

    declare @LocalOpName nvarchar(128)

    set @LocalOpName = 'RStone'

    declare @LocalOpAddress nvarchar(128)

    set @LocalOpAddress = 'rstone@marincounty.org'

    declare @OperatorID int

    select @OperatorID = id from msdb.dbo.sysoperators where name = @LocalOpName

    -- 1. Determine which of the job schedules to enable.

    declare @ScheduleBin int

    /*

    if @@SERVERNAME like '[a-g]%' set @ScheduleBin = 1

    else if @@SERVERNAME like '[h-m]%' set @ScheduleBin = 2

    else if @@SERVERNAME like '[n-s]%' set @ScheduleBin = 3

    else set @ScheduleBin = 4

    */

    if @@SERVICENAME in ('STG') set @ScheduleBin = 2

    else if @@SERVICENAME in ('TST') set @ScheduleBin = 3

    else if @@SERVICENAME in ('TRN') set @ScheduleBin = 4

    else set @ScheduleBin = 1

    declare @ScheduleID int

    select @ScheduleID = s.schedule_id

    from msdb.dbo.sysschedules s

    inner join msdb.dbo.sysjobschedules js

    on js.schedule_id = s.schedule_id

    where job_id = @JobID and s.name like '% Bin ' + LTRIM(STR(@ScheduleBin )) + ' %'

    -- 2. Create local operator to replace MSXOperator

    if not exists(select * from msdb.dbo.sysoperators where name = @LocalOpName)

    begin

    EXEC msdb.dbo.sp_add_operator

    @name=@LocalOpName,

    @enabled=1,

    @weekday_pager_start_time=90000,

    @weekday_pager_end_time=180000,

    @saturday_pager_start_time=90000,

    @saturday_pager_end_time=180000,

    @sunday_pager_start_time=90000,

    @sunday_pager_end_time=180000,

    @pager_days=0,

    @email_address=@LocalOpAddress,

    @category_name=N'[Uncategorized]'

    end

    -- 3. Localize this job and the schedule to be enabled

    if exists(select * from msdb.dbo.sysjobs where job_id = @JobID and originating_server_id = 1)

    begin

    update msdb.dbo.sysjobs

    set originating_server_id = 0,

    category_id = 0,

    owner_sid = 0x01

    where job_id = @JobID

    update msdb.dbo.sysschedules

    set originating_server_id = 0

    where schedule_id = @ScheduleID

    end

    -- 4. Change operator for email notification

    if not exists(select * from msdb.dbo.sysjobs where job_id = @JobID and notify_email_operator_id = @OperatorID)

    begin

    execute msdb.dbo.sp_update_job

    @job_id = @JobID,

    @notify_email_operator_name = @LocalOpName

    end

    -- 5. The "Bin 0" schedule only exists to run this step once - remove it

    declare @Bin0ScheduleID int

    select @Bin0ScheduleID = s.schedule_id

    from msdb.dbo.sysschedules s

    inner join msdb.dbo.sysjobschedules js

    on js.schedule_id = s.schedule_id

    where job_id = @JobID and s.name like '% Bin 0 %'

    EXEC msdb.dbo.sp_detach_schedule

    @job_id=@JobID,

    @schedule_id=@Bin0ScheduleID,

    @delete_unused_schedule=1

    -- 6. Enable the appropruate schedule

    EXEC msdb.dbo.sp_update_schedule

    @schedule_id=@ScheduleID,

    @enabled=1

    -- 7. Delete this step. The job is not configured to go to the next step.

    execute msdb.dbo.sp_delete_jobstep @job_id = @JobID, @step_id = @StepID

    RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉

  • After all that, I am wondering if perhaps I can't just change the bit on the target schedule, edit the schedule, and then change the bit back.

    RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply