The On-Call Load

  • Comments posted to this topic are about the item The On-Call Load

  • "Have you never been called outside of working hours?"  I'm really intrigued to know if anyone can answer yes to this question!

    With modern flexible working, working from home, etc., plus the use of multiple devices and apps to talk to employees, the line between working hours and non-working hours will be quite blurred for many people; does responding to a work-related question on Microsoft Teams count as being called?

    I do my best to keep work separate; at my employer, we have a defined on-call rota, with one week (full seven days) on-call, which is shared between 5 or 6 people. It is reasonable, as long as the number of calls is very low; more of a stand-by in case of a serious incident. I do know of places where the on-call is effectively people working overtime, rather than incident response, which is a very different situation.

    Andy

  • When I was a DBA in a team we shared the on-call rota.  We were given a specific mobile phone when we were on-call.

    Senior DBAs would be a 2nd line of defence if the DBA on-call could not resolve the issue.  There was remuneration if being on-call exceeded a bare minimum.  We were expected to check that certain jobs had run and various other logs and metrics.

    In theory the development squads had the same setup and deal.  HOWEVER, the beleaguered support desk would ALWAYS ring the DBA phone 1st because they trusted us to ALWAYS pick up.  In most cases the DBA had enough knowledge of the various systems to be able to help resolve the issue, even though, technically, it wasn't a DBA issue.

    Over time the reliability of the various systems became such that fully staffed 24/7 support wasn't needed.  Re-organisation pushed responsibility for systems reliability down into the squads rather than having a specific support function.

  • I worked for a managed service provider providing DBA services.  We had an on-call rotation, where we were primary on call one week and secondary on call one week.  When there were seven of us, it wasn't too bad.  The on-call involved about two calls a week during the middle of the night.  Then 3 people quit and we were basically two weeks on call, two weeks off. It was challenging.  Fortunately, the number of calls went down about that time.  2 months later one more person quit (our company couldn't hire if they wanted to) and I decided to call it quits and left with no real prospects, which was one of the best decisions I have made in my long career.  I certainly didn't want to get to where I was permanently on call.

    Russel Loski, MCSE Business Intelligence, Data Platform

  • We don't have an official off hours call support duty. The occasional call may get fixed if the person feels up to it, otherwise it is postponed to working hours.

     

     

  • I did about 10 years of on-call work as a DBA. From my experience, it seems like 10 years of on-call is about the limit for most people in data. Flexibility with hours never seemed to really balance. Is an unexpected work assignment from 3 am to 7 am really equivalent to getting to go home 4 hours early on a Friday?

    I'd like to see some standardization of on-call expectations across the industry. I've had too many coworkers who approached on-call too casually. They either took way too long to get logged on, or made really poor decisions due to fatigue/anxiety and being the lone person working (no other team members around to consult).

    I wonder if there is a market for dedicated on-call/off hours DBA work -- I'm talking DBAs working in a different timezone, i.e. follow-the-sun support.  Does anyone work at a company with follow-the-sun support or use some third party in a different timezone? I'd be interested in hearing about your experience with either.

  • Our DBA team supports 24/7 production facilities so we definitely have an on call rotation.  We organize our systems so each has a primary and backup DBA.  When an after hours call comes in, the expectation is that the on call DBA handles the issue if they are qualified and it is short of a full system rebuild.  Past that they call the primary or backup for that system.

    Back in the mid-2000s being on call was a huge burden on your life.  There were calls every evening (usually job failures or backup drive space issues) and you typically had at least two calls in the middle of the night.  The saving grace of it all was that with a team of 22, on call only happened 2-3 times a year.  Years of improvements to hardware and software have made on call far less of a burden, but it is still a week that you need to be within 20 minutes of being able to get online.  It is now normal to go an on-call week without ever getting a call.  The flip side is that the team has shrunk to three DBAs, so it happens far more often.

  • We have a weekly on-call rotation between our team of six DBAs. There are scheduled jobs that proactively monitor things like disk space and replication errors, which send alerts via PagerDuty to the DBA currently on-call. Disaster recovery events are rare, but alerts about AOG failovers are common. Tempdb / log out of space errors are even more common, typically because someone started a 5,000 line ad-hoc SQL query inserting into a staging table, and then clocked out for the day - so it can finish (or fill the disk) overnight. One week of on-call duty every six weeks is acceptable considering the benefits of the job.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • A common theme is that there is a minimum number of people needed for a support rota.  Below that and you risk burn-outs and resignations.

     

  • About 10 years ago, I worked on the Microsoft Exchange team for a hospital. The helpdesk was severely undermanaged. Combined with certain doctors who felt like they were gods, that meant we would often get called for non-emergency issues that should have been easy for the helpdesk technician to resolve. It wasn't the tech's fault; it was the lack of training provided by their management.

    Most of the calls were easy to resolve but kind of a nuisance. Luckily we were compensated a small amount for our on-call time.

  • "A common theme is that there is a minimum number of people needed for a support rota.  Below that and you risk burn-outs and resignations."

    Definitely.  In the 2007-2009 timeframe our DBA team dropped from 22 down to six.  The only way to make that sustainable was to very publicly reduce service levels to what the new team size could support.  First off was that non-production servers were only going to be supported during corporate HQ business hours.  For teams with offshore developers that was not popular.  Next, a whole lot of servers which were not built to standards and were listed as "best effort support" had the best effort level reduced to not being able to page us.  Fortunately we had senior management support to do anything we needed to, as the whole company went through a period of "keep the lights on" mode.

  • Andy sql wrote:

    "Have you never been called outside of working hours?"  I'm really intrigued to know if anyone can answer yes to this question!

    ...

    I wouldn't be surprised if some people haven't been called, but it's likely not many.

  • David.Poole wrote:

    When I was a DBA in a team we shared the on-call rota.  We were given a specific mobile phone when we were on-call.

    Senior DBAs would be a 2nd line of defence if the DBA on-call could not resolve the issue.  There was remuneration if being on-call exceeded a bare minimum.  We were expected to check that certain jobs had run and various other logs and metrics.

    In theory the development squads had the same setup and deal.  HOWEVER, the beleaguered support desk would ALWAYS ring the DBA phone 1st because they trusted us to ALWAYS pick up.  In most cases the DBA had enough knowledge of the various systems to be able to help resolve the issue, even though, technically, it wasn't a DBA issue.

    Over time the reliability of the various systems became such that fully staffed 24/7 support wasn't needed.  Re-organisation pushed responsibility for systems reliability down into the squads rather than having a specific support function.

    Similar situation for me in a number of places, multiple escalations of support. I haven't seen an org that was fully staffed getting away from that, but maybe because I haven't been employed long enough in one place.

    I do see lots of places that were never fully staffed and just had contact lists in case things fell apart.

  • Coffee_&_SQL wrote:

    I...

    Is an unexpected work assignment from 3 am  to 7 am really equivalent to getting to go home 4 hours early on a Friday?

    ...

    It's not, though that's a good question. However, I'm not sure the idea is to be equivalent or the same. It's to recognize that time taken needs to be given back in some way. I don't feel this is a bad trade, as long as it's not every week.

  • TL wrote:

    ...

    Years of improvements to hardware and software have made on call far less of a burden, but it is still a week that you need to be within 20 minutes of being able to get online.  It is now normal to go an on-call week without ever getting a call.  The flip side is that the team has shrunk to three DBAs, so it happens far more often.

    Hmm, an interesting evolution. Is that better or worse? I think a week out of every three being close to a phone/laptop is tough. I might rather be on call 2 out of 6.
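
The rota sizes mentioned in this thread (22, 7, 6, 4 and 3 people) can be turned into rough numbers for comparison. The following is only a minimal sketch in Python, assuming a weekly rotation shared evenly and ignoring leave, holidays and swaps. The function names and the `slots_per_week` parameter (1 for a single on-call person, 2 for primary plus secondary) are illustrative and do not come from any poster's tooling.

```python
from fractions import Fraction

WEEKS_PER_YEAR = 52  # assumption: weekly rotation, leave and holidays ignored


def on_call_share(team_size: int, slots_per_week: int = 1) -> Fraction:
    """Fraction of weeks each person is on call in an evenly shared weekly rota."""
    if team_size < 1 or slots_per_week < 1:
        raise ValueError("team_size and slots_per_week must be positive")
    # If there are more slots than people, everyone is on call every week.
    return min(Fraction(slots_per_week, team_size), Fraction(1))


def weeks_per_year(team_size: int, slots_per_week: int = 1) -> float:
    """Approximate number of on-call weeks per person per year."""
    return float(on_call_share(team_size, slots_per_week) * WEEKS_PER_YEAR)


if __name__ == "__main__":
    # (description, team size, on-call slots per week), taken from the posts above
    scenarios = [
        ("22 DBAs, one on call", 22, 1),
        ("7 DBAs, primary + secondary", 7, 2),
        ("6 DBAs, one on call", 6, 1),
        ("4 DBAs, primary + secondary", 4, 2),
        ("3 DBAs, one on call", 3, 1),
    ]
    for label, size, slots in scenarios:
        share = on_call_share(size, slots)
        print(f"{label}: {share} of weeks, about {weeks_per_year(size, slots):.1f} weeks/year")
```

Under these assumptions, a team of four covering primary and secondary is on call half the time, which matches the "two weeks on call, two weeks off" described above, and a team of three with a single slot is on call one week in three. The sketch says nothing about how many calls arrive in a given week, which several posters point out matters as much as the frequency.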
