Must-have hands-on reports for SQL Server DBA

  • Hello!

    I'm fairly new to SQL Server and i have been wondering if you guys masters, professionals who have been in the IT Field as DBA for years give me some kind of idea or ideas for reports and health checks for everyday routine.
    ¿What kind of health check you guys do everyday, weekly or monthly to your SQL Server Database?
    ¿What kind of reports you guys made to check for instant diagnostics besides the SQL Server Log?

    Sorry for the english if misunderstood something, my native language is spanish.

    Greetings and thank you for your time.

  • Daily:
    The first thing I check each morning when I start work is if any of my backups failed.  
    Next I check for any failed SQL jobs on any instance.
    Then I check for stuck or long running SSIS jobs.  

    Weekly:
    I check the server memory utilization.  Under utilized memory is a waste of money and over utilized memory doesn't leave any room for things to run.  You need to find a good balance.
    Company specific metrics (largest serial number for example)

    Constantly:
    blocking and deadlock alerts are generally a good thing to set up.

    Randomly:
    I check if the backups are restorable.  A backup is only "good" if you know you can restore from it.

    We purchased a license for RedGate SQL Monitor and the tool works fairly well.  We have been using it for a number of years and are quite happy with it.  If you have any 3rd party reporting tool available, look at what sort of metrics it collects and what you can report on.  Get a test setup of that tool (if you don't have one already) and play with it.  Fiddle with settings.  Get it to work with your company.

    I would also try to find out your RTO and RPO of your comapny and ensure that your backup schedule can meet those.  

    And my last tip - make an excel spreadsheet of all of the SQL Instances you are to be in control of.  In that, keep track of the instance name, the Alias (if applicable), the port, the host machine (physical or VM), the failover machine (if applicable), the connection string, the version (such as SQL Server 10.50.4422.0 - Standard Edition (64-bit)).  In the event of disaster recovery, this will work as a nice checklist for you to ensure that everything is back up and working as expected on the appropriate versions.  Plus it makes it handy for the next DBA who comes in to have a quick and easy spreadsheet indicating all the instances that exist.
    If possible, it is also nice to have a list of instance purpose (such as having an instance named IntApps and a description of "this is used for internal application data storage") as well as a data owner.  In the event you need to take something offline (updates or upgrades for example) or you need to prune some of the data, it is good to know who you need to contact (person or department).
    As a DBA (at least where I work), you are the gatekeeper of the data, not the data owner.  You need to ensure end users can access the data they need to and not access the data they shouldn't be able to see.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Alejandro Santana - Tuesday, March 6, 2018 1:44 PM

    Hello!

    I'm fairly new to SQL Server and i have been wondering if you guys masters, professionals who have been in the IT Field as DBA for years give me some kind of idea or ideas for reports and health checks for everyday routine.
    ¿What kind of health check you guys do everyday, weekly or monthly to your SQL Server Database?
    ¿What kind of reports you guys made to check for instant diagnostics besides the SQL Server Log?

    Sorry for the english if misunderstood something, my native language is spanish.

    Greetings and thank you for your time.

    The absolute most important thing to check on is... did the backups work and are they viable for a restore.  Don't think of another thing until that's hammered down 110%.  Nothing else matters until that's guaranteed and the backups are on a different machine than the databases themselves  and any tapes are safely stored offsite.  That includes at least weekly test restores.  I do daily test restores (fully automated, of course).

    The second most import thing is to determine who has what privs.  Lock down the server.  Devs and other so called "super users" should not have privs to deploy their own code and access to data must be locked down lest you read about yourself in the morning news.

    The third most important thing is DBCC CHECKDB and it's not far behind backups/restores in criticality.

    All three things must be done in very short order and maintained at all times.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Alejandro Santana - Tuesday, March 6, 2018 1:44 PM

    Hello!

    I'm fairly new to SQL Server and i have been wondering if you guys masters, professionals who have been in the IT Field as DBA for years give me some kind of idea or ideas for reports and health checks for everyday routine.
    ¿What kind of health check you guys do everyday, weekly or monthly to your SQL Server Database?
    ¿What kind of reports you guys made to check for instant diagnostics besides the SQL Server Log?

    Sorry for the english if misunderstood something, my native language is spanish.

    Greetings and thank you for your time.

    Realistically, the first thing I check is my email. If things are automated or monitored with notifications sent for faillures, errors, etc you'd find those.
    While I'm checking my email, the next thing I check is if all of the instances are up and available, you can connect to them and if the databases are online. If you use a Central Management Server, that can be done with one query doing a multiserver query of sys.databases.
    Backups are incredibly important. You usually need to do refreshes from production to lower environments which can be automated and that provides a way to check the validity of backups. For those that's aren't refreshed, a monthly test restore is a good practice if feasible. 
    I also check windows, sql server and sql agent logs. With some work, that can be automated to check for errors - especially with Windows Event logs. It's a bit more of a challenge with the SQL Server/Agent logs but still doable for the most part. I still manually check them whenever I connect to an instance.
    Then we have a lot of jobs that look for issues/problems and send out email for things like failed jobs, long running jobs, checking the backups on disk to make sure they are there and the older ones are being deleted.
    A lot of people will use a central repository database of different metrics for different servers, databases and those can be checked with notifications sent out if needed - and that can be automated as well. We've used that central database to store all the information about the servers and databases we support - server specs, databases and what apps use them or the general usage, who are the contacts, specific notes for databases/servers - such as using fulltext, using service broker, replication, log shipping, CDC, etc just to get a quick overview without manually running a bunch of queries. Then we have SSRS reports pointing to those with several parameters so you can quickly just pull up a report for everything or certain servers or things used by certain apps. etc.
    Most places have a DBA database or something similar on their servers - it's a good idea to have something like that for DBA only procedures, functions and other things like that. It can also be used for capturing baseline metrics for that instance as well as the basic CPU, disk, memory usage, storage. You can compare the current metrics against your baseline. It's a good idea to setup alerts and notifications for the alerts.
    You want to automate as much as you can. And have baselines for different metrics. Many metrics mean nothing if you don't have baseline measurements to compare them to. You will need to know if something was always an issue or just an issue today or this week.
    Over time you will be able to figure out more of what to check and how often. There should be a lot of list of activities like the replies you have here if you do a search. I'd go through a lot of those to pick up things. Search on something like: DBA Checklist or DBA Daily Procedures.
    I know I've seen quite a few of them in the past. And if you do things all the time for a log while, you do things without even thinking about it. So everyone is going to forget some things they do regularly. That's why I'd recommend checking as many lists as you can - before you get sick of doing it 🙂

    Sue

  • Thanks alot for your feedback guys!! 
    Really appreciated!!

  • Regarding backups verifications, do you guys even restore the system databases?

  • Alejandro Santana - Monday, March 12, 2018 10:18 AM

    Regarding backups verifications, do you guys even restore the system databases?

    You should test restoring any database (user or system) that you care about the backup of.  If you don't test restore a backup, how do you know for certain that it is valid?
    I do test restores of the system databases to ensure I can restore a system database in the event of DR.  They are pretty small and thus restores of them are quick.
    I think the the only database I don't backup and restore is tempdb.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Alejandro Santana - Monday, March 12, 2018 10:18 AM

    Regarding backups verifications, do you guys even restore the system databases?

    Yes. Keep in mind there are two different processes with the system databases - restore or rebuild. You could end up having to do either one of them.
    Back Up and Restore of System Databases
    Rebuild System Databases

    Another thing to consider that is inferred in the docs is that the resource database could be corrupted. That's a different process to keep in mind as well. If you copy the files for backups, you can often just replace the files that were copied as your backups and then restart. Otherwise you need to do the repair as stated in the article.
    The second link to the rebuild also lists some things you would want to regularly script out as rebuilding creates new system databases. Especially pay attention to the section about msdb as there are many other objects that you would want to include in backup/script plans.

    Sue

  • Thanks again guys!

  • The first and most frequent thing I check is emails. Locally we get emails for job failures, deadlocks, long running processes, tempdb growth, and a few other health checks. The majority of our databases have associated dev/qa environments that get refreshed from a combination of full/diff/log backups every day and I get notified if something goes sideways on that. Other databases are not critical at all and get tested infrequently. They all also get synced over to a DR site that's always live.

    I've got a number of auditing jobs that reach out to our servers and check for various failures like job and database mail failures. They also populate and update a central DBA database that houses inventory information (instance level, database level, server level), job run history for all the instances, database file information history, server drive information history, etc... Even though we have failure notifications set up locally on jobs, sometimes who gets notified gets changed without my seeing it - my auditing notifies me of all job failures on any of our instances whether I'm set up to get notified or not. It also notifies me when security changes are made on the servers and if there are connection failures while attempting to audit anything. At the start and end of every day I get a summary email from the central DBA database of various failures, security changes, and an overview of how much each database grew in the last 24 hours. 

    Most of our resource monitoring is done in WhatsUp Gold so I have a few dashboards set up there so I can just watch things like memory and cpu and disk throughout the day without thinking about it.

  • This was removed by the editor as SPAM

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply