Prior to the release of Contained Availability Groups, SQL Agent job management always seemed to be a little bit trickier than it needed to be?
Create the jobs on every replica? Yarp, most definitely. But, how do I control which jobs run on the replica that I want them to?
If you have an automated release pipeline that detects AGs and amends the jobs for you , then fantastic, you've got it sorted. I'll save you some time, this blog post isn't for you. But that isn't the majority of what's out there in my experience.
A Standard Approach
A common method is to code logic into the job, either around the existing step(s), or by adding a new step into the job to run first, and quit out if the AG replica state isn't in the expected state?
DECLARE @RoleDesc NVARCHAR(60)
--Work out Current Role
SELECT @RoleDesc=hars.role_desc FROM sys.availability_groups ag
INNER JOIN sys.dm_hadr_availability_replica_states hars ON ag.group_id=hars.group_id
INNER JOIN sys.availability_replicas ar ON hars.replica_id=ar.replica_id
WHERE ar.replica_server_name=@@SERVERNAME
AND ag.name=@AGName
IF @RoleDesc = 'PRIMARY'
BEGIN
...
etc.
Or even simpler at a database level...
Something like:
If sys.fn_hadr_is_primary_replica ( @dbname ) <> 1
BEGIN
-- If this is not the primary replica, exit ...ie
RAISERROR ('This is not the Primary Replica, exiting',16,1)
END
-- Rest of your wonderful code goes here if in the same job step.
If this is a separate Job step, then setting the logic to On Failure - Quit the job reporting Success will work nicely. I know I sound stranger than usual, but its exactly what we want to do. It won't light up our monitoring screens or alert inboxes, (that I know you all have) or wake us up on call at 2am. It just stops naughty bad things from trying to run where they shouldn't. Great, exactly what we want.
Or is it....?
<Cue dramatical music here...>
This is great when its fully accepted, and the job code itself can be amended? But what about:
The vendor deployments that don't care about customer set ups and availability groups? - "Its the customer's responsibility as they know their estate" etc.
The Developers who don't know what an Availability Group is, never mind terminology around Primaries, Secondaries etc.
The additional admin to make sure that every job is covered by the correct logic.
Also, you do get the inconvience of the Agent job looking like its executing on the Secondaries, which may cause confusion to the uninitiated. As ultimately the job IS running, just not running all code.
Tip: Don't enable/disable jobs on failovers manually... just don't, you're asking for additional pain and suffering.
So, how do we make this easier to manage?...how about a framework?...
A Framework Approach?
What if we could just have automation that monitors the state of the Availability Group(s), and then enables/disables the jobs, performs any config you want based on that state?
"Who knows? It’s a long shot, but it just might work..."
It may sound fiddly, but it isn't when its all laid out.
Assign Job Categories to the jobs that you want to run based on A.G. Name
Create Detect Jobs and Activity Jobs
Detect Jobs - monitor the state of an A.G, and as part of their actions, call the activity jobs dependant on the state change, then self disable and enable the other state check job. (ie... if a replica role transitions to Secondary, the jobs stop checking for the AG to become Secondary and start checking if its Primary instead.
Activity jobs perform the customizable steps that you require.
Par Exemple...
So, 4 jobs for an Availability Group named AG1, created on each replica
Replica1 (Primary)
Replica2 (Secondary)
AG1PriRoleCheck
AG1SecRoleCheck
AG1PriConfigJob (Unscheduled)
AG1SecConfigJob (Unscheduled)
On Replica1 (Primary), the Role check jobs are as follows.
AG1PriRoleCheck (Disabled)
AG1SecRoleCheck (Enabled)
So at every schedule interval specified (1 min, although could be 10sec dependent on your detection requirements). Check if Replica1 transitions to the Secondary Role.
At the same time...
on Replica2 (Secondary) - Role job checks are:
AG1PriRoleCheck (Enabled)
AG1SecRoleCheck (Disabled)
The jobs monitor if Replica2 transitions to the Primary Role.
Then a Role transition occurs...
So now,
Replica1 - AG1SecRoleCheck Job detects that Replica1 is now the secondary
Runs the AG1SecConfigJob - Setting any/all config to how we want on the Secondary
Disables the AG1SecRoleCheck Job - monitoring for the next transition back to primary
Enables the AG1PriRoleCheck Job
Replica2 - AG1SecRoleCheck Job detects that Replica2 is now the primary
Runs the AG1PriConfigJob - Setting any/all config to how we want on the Primary
Disables the AG1PriRoleCheck Job
Enables the AG1SecRoleCheck Job - monitoring for the next transition back to primary
Great, this now effectively has a closed loop self monitoring routine for role changes, and running whatever config that we want.
Yes, thats great, what about the logic behind?
Throw it in a procedure, and schedule that through the RoleCheck jobs...no messing around. This is the primary role, with the secondary just being the reverse.
CREATE OR ALTER PROCEDURE [dbo].[sp_AGPrimaryReplicaRoleCheck] @AGname sysname, @PrimaryConfigJob sysname, @PrimaryCheckJob sysname, @SecondaryCheckJob sysname
AS
BEGIN
--Check if AG is primary, if so, run Primary configuration job, disable primary check job and enable secondary check job
--Usage: EXEC sp_AGPrimaryReplicaRoleCheck @AGName='AG1', @PrimaryConfigJob='AG1PriJobConfig' , @PrimaryCheckJob='AG1PriRoleCheck' , @SecondaryCheckJob='AG1SecRoleCheck'
DECLARE @RoleDesc NVARCHAR(60)
--Work out Current Role
SELECT @RoleDesc=hars.role_desc FROM sys.availability_groups ag
INNER JOIN sys.dm_hadr_availability_replica_states hars ON ag.group_id=hars.group_id
INNER JOIN sys.availability_replicas ar ON hars.replica_id=ar.replica_id
WHERE ar.replica_server_name=@@SERVERNAME
AND ag.name=@AGName
IF @RoleDesc = 'PRIMARY'
BEGIN
PRINT 'Primary ROLE Detected'
--Run Config Steps for Primary
EXEC msdb..sp_start_job @job_name=@PrimaryConfigJob
--Now disable Primary check as server is now Primary
EXEC msdb.dbo.sp_update_job @job_name=@PrimaryCheckJob,
@enabled=0
--Now Enable Secondary Check that will monitor from this point
EXEC msdb.dbo.sp_update_job @job_name=@SecondaryCheckJob,
@enabled=1
END
END
And the secondary...
CREATE OR ALTER PROCEDURE [dbo].[sp_AGSecondaryReplicaRoleCheck] @AGname sysname, @SecondaryConfigJob sysname, @PrimaryCheckJob sysname, @SecondaryCheckJob sysname
AS
BEGIN
--Usage: EXEC sp_AGSecondaryReplicaRoleCheck @AGName='AG1', @SecondaryConfigJob ='AG1SecJobConfig' , @PrimaryCheckJob='AG1PriRoleCheck' , @SecondaryCheckJob='AG1SecRoleCheck'
DECLARE @RoleDesc NVARCHAR(60)
--Work out Current Role
SELECT @RoleDesc=hars.role_desc FROM sys.availability_groups ag
INNER JOIN sys.dm_hadr_availability_replica_states hars ON ag.group_id=hars.group_id
INNER JOIN sys.availability_replicas ar ON hars.replica_id=ar.replica_id
WHERE ar.replica_server_name=@@SERVERNAME
AND ag.name=@AGName
IF @RoleDesc = 'SECONDARY'
BEGIN
PRINT 'Secondary ROLE Detected'
--Run Config Steps for Secondary
EXEC msdb..sp_start_job @job_name=@SecondaryConfigJob
--Now disable Secondary check as server is now Secondary
EXEC msdb.dbo.sp_update_job @job_name=@SecondaryCheckJob,
@enabled=0
--Now Enable Primary Check that will monitor from this point
EXEC msdb.dbo.sp_update_job @job_name=@PrimaryCheckJob,
@enabled=1
END
END
GO
We've auto detected our role change, and called our config Job...
Q: what do we want in there?
A: Anything we like. Everyone's A.G. configs are different, so with this, we can specifically set what we would like running on our Primaries, and secondaries.
Job Categories.
I mentioned earlier that job categories play a part in this, and they do, as they can significantly narrow down the complexity of our Config Jobs.
Say that our Replicas above, have 10 Jobs, Job1 to Job10. Job 1, 2, 6, 7, 9 are only to run on the primary, Job 3,4,5,8,10 are to run on all replicas. I know...I know, what sort of ship is this i'm running with a set up like that, but its a valid question.
We could set each job to enable / disable accordingly as a JobStep in our ConfigJobs above, or we could just assign a category to those we want to address, and sweep through those in a standard first step.
ie... Create a Job Category "AG1"
EXEC msdb.dbo.sp_add_category
@class=N'JOB',
@type=N'LOCAL',
@name=N'AG1' ;
GO
Assign the jobs you want to only run on the Primary to this category. ie...
EXEC dbo.sp_update_job
@job_name = N'Job3',
@category_name = N'AG1';
GO
Then as the first step of your Config Job, call the code to sweep down these jobs and enable in the AG1PriConfJob or disable in the AG1SecConfJob.
The code is super simple: Pass a Job Category, and whether to Enable or Disable it.
CREATE OR ALTER PROCEDURE [dbo].[sp_AGJobStatusSet] @JobCat sysname, @Action char(1)
AS
BEGIN
--Disables or Enables all jobs in specified job category.
--Use for AG Failover
--USAGE: sp_AGJobStatusSet @JobCat='AG1', @Action=1
--@JobCat - Job Category to target
--@Action 1 = Enable, 0 = Disable
DECLARE @SQL varchar(100)
DECLARE JobLoopCursor CURSOR
FOR
select j.name from msdb.dbo.sysjobs j inner join msdb.dbo.syscategories c
on j.category_id=c.category_id
where c.name=@JobCat
DECLARE @jobname sysname
OPEN JobLoopCursor
FETCH NEXT FROM JobLoopCursor INTO @jobname
WHILE (@@fetch_status <> -1)
BEGIN
IF (@@fetch_status <> -2)
BEGIN
EXEC msdb.dbo.sp_update_job @job_name=@Jobname, @enabled=@Action
END
FETCH NEXT FROM JobLoopCursor INTO @jobname
END
CLOSE JobLoopCursor
DEALLOCATE JobLoopCursor
END
Potential
So that handles the Job enabled states on the Primary or Secondary for you. But these are just jobs remember, running automatically on transition to the specific AG role of primary or secondary. Do we need specific logic in for each role too?
ie...
Setting specific server config, max memory etc. (not that i'm suggesting you do.. but some shops have to use Active-Active clusters with multiple A.G.s, primaries spread across replicas. ) You may have to lower the memory footprint of the secondary. (I wouldn't personally, but budgets and resources can be tight sometimes)
Revoke User Access from a Primary so report users have to use the secondary and that lovely ReadOnlyRouting you sweat blood over setting up.
You have the ability to do anything you like, just add a Job step in the relevant Config job(s) and boom! Robert's your father's brother.
And its all in one place.
If you need to see exactly what config is going to be done for a specific AG in a specific state, all the configuration is held in a single job. Easy life!
I have no problem at all with the Standard approach of amending all the jobs to have the check logic in as job steps. Just sometimes others make it difficult for us by not knowing/caring about what has to be done with A.G.s and Agent Jobs.
"Long live Contained Availability Groups!"
"Hip! Hip!...Hoor.."
No, lets not.
As always, thanks for reading.
Rod, currently running as a read only secondary, as its late.
Komentarze