The Mindset of the Enterprise DBA: Creating and Applying Standards to Our Work

Although many professions, such as pilots, surgeons and IT administrators, require judgement and skill, they also require the ability to do many repeated standard procedures in a consistent and methodical manner. These procedures leave little room for creativity since they must be done right, and in the right order. For DBAs, standardization involves providing and following checklists, notes and instructions so that the results are predictable, correct and easy to maintain

   On Adopting the Mindset of an Enterprise DBA
The Mindset of the Enterprise DBA: Creating and Applying Standards to Our Work
  The Mindset of the Enterprise DBA: 8 Ways To Centralize Your Work
  The Mindset of the Enterprise DBA: Harnessing the Power of Automation
  The Mindset of the Enterprise DBA: Delegating Work

Previously I wrote about the idea of adopting the mindset of an enterprise DBA, and how this might help us in our daily work as database professionals. I laid out a set of four core practices or skills that chart our course towards fully embracing this way of thinking: Standardization, Centralization, Automation, and Delegation.

In this and future articles, we’ll dive into each in more detail. This first article looks at the skills and patterns that comprise the practice of standardization.

Standardization

If you have only enough time to work on developing just one of the four skills, focus on this one, and focus on it first. It’s difficult to say this strongly enough or repeat it sufficiently.

Why might I say this? Standardization is all about ensuring that we do tasks consistently. It lays the foundation for the way we go about the rest of our work. If we don’t do our daily work methodically, and don’t have that work clearly defined, how can we write a script to do it for us (automation), or teach someone else to do it instead of us (delegation)?

Here’s a simple example: let’s say that there’s a series of steps that need to be done whenever you set up a new instance of SQL Server. These steps would include such things as acquiring a license, installing SQL itself, and running some post-installation scripts. If we don’t list these steps at the time we tackle the task (a checklist will do, there’s no need for a novel) and then subsequently follow these notes, we risk forgetting a step, human memory being as transient as it is. Then, once we discover our mistake (if we’re fortunate enough to), we must go back and correct it after the fact, which is always more expensive than doing it the first time around. We also can document certain kinds of known problems (“Hey, anyone remember how we got Kerberos working that one time?”), which saves us the pain and time of looking up obscure answers buried in internet forums.

Unfortunately, standardizing things is one of the least enjoyable of the steps. Why? Probably because it involves a lot of documentation, something which no one in IT enjoys. Heck, I love writing as much as anything, but sitting down and trying to pound out a good set of instructions gives me the willies. The problem is, without clear and concise documentation to refer to, any attempt at standardization is going to fail, and fail miserably. There’s simply no way that we could consistently do our work without some kind of written guideline. If you doubt the truth of this, just watch the pre-flight rituals of airline captains, or the set of steps a doctor must go through before entering the surgical suite. In both cases (and many more), you’ll see extensive use of checklists and documentation, because it’s been shown repeatedly how this simple act greatly reduces the chance of errors of omission.

The following list gives some examples of what I’d add to the general bucket I’ll refer to as “standards”:

  • Checklists – usually good for simple step-wise operations.
  • Rules – descriptions of standards or guidelines about how things should / must be done.
  • Knowledge bases (KBs) – a store of knowledge gained through experience, good for things like documenting known problems and solutions.

Although it is likely that some tasks need to be documented in more detail than others, it’s difficult to argue that there are parts of the role that need not be documented at all. So, it’s probably best to err on the side of doing too much. After all, is there a downside to documenting something?

Take something as simple as restoring a database. Sure, this seems easy to most DBAs, who do it on a daily basis. But what happens when you’re not around? If you’ve got a checklist with all the steps listed, you’ve at least a chance of not getting a phone call and seeing just how bad the internet connection is from your cruise ship. You can also leverage this checklist in training your junior staffers.

The examples I’ll give below are by no means an exhaustive list. They’re merely something to get your brain warmed up, so that you might have a good chance of obtaining reasonably wide coverage. By all means, give suggestions for additional things in the comments section, so that we can improve as we go.

To keep this article relatively brief, I’m going to focus on two main areas: server setup and standard operating procedures.

Server Setups / Installations

We need to standardize the setup of all the systems we own. By doing this in a consistent manner, not only does it save us work down the road, but it also makes it easier to do our current work. Let’s take an easy example: the name of the Active Directory account which runs the SQL Server service. If we always name this in a consistent way, place them in the same OU in Active Directory, and put them in the same security groups, we will always know where to find them and can leverage technologies such as delegation (i.e. the domain admins can let us do things ourselves) or group policy (which I’ll argue later is a form of centralization). It’s also something that we (hopefully) don’t have to do that often, so it’s probably not a good idea to trust our memory.

Here’s some specific examples of things we might standardize about our servers and installations:

  • Naming Standards
    • Account names – Make them something meaningful as to what they are for.
      • SQL Server service account
      • SQL Agent service account
      • Application service accounts that connect to the database (may be outside our control, but worth trying)
    • Server and Instance names – In these cases, we should include things like environment identifiers (i.e. Dev, Test, QA, Prod) and versions (i.e. 2K5, 2K8, 2K12).
  • Layout, size, and labeling of disk drives:
    • Do we split up the volumes for the system databases from TEMPDB and the user databases?
    • Do we have dedicated drives for backups?
    • Do we use mount points or drive letters (or some combination)?
  • OS versions / patch levels:
    • What OS (i.e. Windows 2003, 2008, 2012, etc) do we install and is it related to what version of SQL Server we install (i.e. do we always match SQL 2012 with Windows Server 2012)?
    • Is there a standard service pack or set of patches we always install?
  • SQL Server versions / patch levels:
    • What versions of SQL do we allow / support?
    • What patch levels should they be at (i.e. “SQL 2008 is always patched to SP3 CU2”)?
  • Post / Pre Installation Steps – are there actions that we take on all servers either before or after the installation, such as:
    • Restricting accounts from logging in locally or interactively.
    • Further reducing the security of standard roles like PUBLIC on SQL (I know this is sometimes frowned upon, but some like to do it), a.k.a “hardening”.
    • Adding additional TEMPDB data files based on some formula.
    • Setup of standard agent jobs for maintenance or other purposes.
    • Adding the newly created server to repositories or third party tools such as SQL Monitor.
    • Installation of standard third party tools or software.

Procedures (a.k.a. SOP – Standard Operating Procedures)

We must carry out our daily work in a consistent manner. By doing so, it not only makes our jobs easier (do you really want to try and remember how to restart the TPS report at 3 in the morning after a night at the bar?), but also ensures good service to our customers.

Here are some areas to think about when we write our SOP:

  • What are some common questions we get and what are their answers?
  • What are the steps to do common tasks, such as:
    • Granting access to systems for users.
      • Do we use domain security groups? If so, which ones for what roles?
      • Are SQL accounts allowed, or must all access be via Windows credentials?
      • Are there any restrictions on what levels of privilege we give without some kind of senior approval?
    • Moving backups from production to test systems.
      • Is there any kind of approval required, and if so, by whom?
      • Must any sensitive data be scrubbed or otherwise removed?
      • Are there any scripts or other actions that must be done after the database is restored, either to any or specific application’s databases?
        • Do we need to grant developers access?
        • Do we need to re-link orphaned SQL accounts?
        • Do we need to change any static configurations to point to development systems (don’t miss this one!)?
    • Releasing new code into the production systems.
      • Is someone’s approval or some kind of change management process required?
      • Who handles the release activities?
    • Responding to incidents (system down, slow, degraded)
      • Are there standard metrics to capture or baselines to compare against?
      • Are there any common problems for which known solutions are documented?
      • Is there an incident management process that needs to be triggered if certain things happen (i.e. we need to notify management if the system goes down between 8AM and 5PM EST)?
  • Are there any SLAs (Service Level Agreements) in place for things like:
    • Incident response times (i.e. “all Severity 1 incidents must be resolved within four hours” – and make sure you define what “Severity 1” means).
    • Off hours support or response times (and what defines “normal hours”).
    • Completion of standard activities (i.e. when a request is received to provide a backup for testing, it must be done by the end of the next business day).
    • Standard down times / maintenance windows.
    • Backup and recovery, such as Recovery Time Objectives or levels of acceptable amounts of data loss.
  • Is there contact information for teams which we commonly interface with, such as infrastructure or system administrators, domain admins, or SAN admins? Do we need to follow any specific procedures when asking them to do something for us?

Phew! That’s admittedly quite a list, and by no means complete. If you’re feeling a little overwhelmed at the thought of documenting all this, don’t worry, it’s a completely natural response. Attempting to tackle all of this at once is a monstrous and intimidating task, which is why I wouldn’t recommend trying to do that, especially while juggling your existing duties.

Instead, adopt the gradual “pay-down” approach that I’ve talked about before (and trust me, I will again). A good start might be to simply document things as you go, since it’s a good way to capture your workload in the moment and it shouldn’t be too invasive. For example, the next time you do a server installation, take note of the steps you do and liberally take screenshots as you go. You probably won’t get it completely the first time, but as you keep doing it your documentation will get more accurate. And if you find yourself with some spare time (yes, a highly unlikely event I know) you can always go back and step through everything as well.

Keeping a “work journal” can also be useful, where you simply record what you do and how you did it. I like to use a text file for this, since I can easily paste in scripts or commands I use and search for them later. As a bonus, the next time Joe from accounting asks for “another copy of that report you sent me last year”, you can just search your notes and find the query. I know it’s saved me quite a lot of work on repeated occasions.

My next point, while hopefully obvious, bears repeating due to its importance: once you have your standards, make sure that you follow them. The more consistently you do this, the more value they will bring to your work. If you don’t follow them, they’re largely meaningless and you may as well not have them at all. This isn’t to say they can’t change (they will), or that there won’t be exceptions (which you’ll document, right?), but just that you should give as best an effort as possible to keep true to them. Consider trying to bring existing pieces into compliance as you touch them. For example, if you have to add some drives to an old server, at least make them follow your standards. It’s likely not feasible to go back and correct everything you have, but fixing over time is a better option than simply allowing things to stay as they are.

A final note of caution: these documents, once written, need to be carefully guarded. I don’t mean in the sense of keeping them behind steel doors guarded by vicious dogs and laser security systems, but be prudent about who can access them. Often times we inadvertently put sensitive information in these, such as what our standard ‘sa’ password might be. In the wrong hands this could cause a lot of damage, so make sure they are reasonably secured (perhaps password protect them or put them on a locked down file share / web site).

It may be a slow process, but defining and following standards in our work as DBAs is as critical a piece of our journey as any. These procedures will become your requirements for automation, and your guides for those to whom you delegate your work. Capture them, keep them updated, and use them every day. Do these three things and you will be well on your way towards adopting the mindset of an enterprise DBA.