September 17, 2008 at 8:16 am
Only 3 what?
September 17, 2008 at 8:32 am
paul.t.silvey (9/8/2008)
Automate this stuff... why look manually? But do not have your database server watch itself. That is like an internal investigation: not much in the way of checks and balances there.
I agree it's not the best idea to rely on a server checking itself. If you have multiple servers then have the servers checking each other. Have them provide daily reports, that way you have an extra chance of becoming aware of a problem if you do not receive the report(s).
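As a rough illustration of the "no report means a problem" idea, here is a minimal Python sketch. The server names, the dictionary of received reports, and the function name are all made up for the example; a real setup would fill them from wherever the daily reports land.

```python
from datetime import date

def servers_missing_report(expected_servers, reports_received, today):
    """Return the servers whose daily report is absent or older than today.

    expected_servers: names we expect to hear from (hypothetical list)
    reports_received: server name -> date of the latest report received
    """
    missing = []
    for server in sorted(expected_servers):
        last_seen = reports_received.get(server)
        if last_seen is None or last_seen < today:
            missing.append(server)
    return missing

# Hypothetical data: sql02 last reported yesterday, sql03 never reported.
expected = {"sql01", "sql02", "sql03"}
received = {"sql01": date(2008, 9, 17), "sql02": date(2008, 9, 16)}
print(servers_missing_report(expected, received, date(2008, 9, 17)))
```

The check is meant to run on a machine other than the ones being watched, so a dead server cannot silence its own alarm.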
For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help
August 27, 2010 at 12:01 am
great work! I like it but could not do much for remote servers...
August 27, 2010 at 1:14 am
I use SQL Response to monitor all my servers in various locations around the globe.
August 27, 2010 at 1:23 am
Hello, everybody.
It is a good idea to check servers every morning. But if you have 50 servers, like me, you will be dead before you finish such a check. As head of a DBA team, I can give some advice on how to save a lot of time and be sure that everything works perfectly:
1. Collect information about all the IT infrastructure in your company and record it in a CMDB.
2. Create alerts on the SQL Servers, and operators for those alerts to notify.
3. Messages from high-priority alerts and jobs have to be sent to the Service Desk system and to the operators.
4. It is very useful to deploy a monitoring system (HP, SCOM, and so on).
5. Use SMS services to inform the operator about disasters outside working hours.
These steps will help you: if there is a disaster you will know about it right away, and if there is a lesser problem you will be informed as soon as you start your computer.
My group consists of 4 people, and we need only one hour to be sure that we do not have any problems.
August 27, 2010 at 1:44 am
I was excited by the article when I saw the title, but reading through I realised that most of these checks are already done here. I only started in the DBA role 8 months ago. My first job was to set up db monitoring and morning checks.
I have a system which monitors job failures, replication, indexing, db growth, etc. I get mails throughout the day and a review each morning. I reply to that email as confirmation (including any action taken), and the reply is stored in SourceSafe.
August 27, 2010 at 2:09 am
rubes (4/14/2008)
Nice article. I would just like to point out that for those of us that have numerous servers, automation of the checklist is critical. If you're dealing with only one server, manually checking these things does not take a lot of time. But imagine checking job failures or drive space on 50 sql servers. We get paid too much to perform these menial tasks by hand. There are many 3rd party tools out there that do this for us. It's also pretty easy to write your own scripts and sql jobs... many starter scripts could probably be found on this forum.
One benefit of automating your checklist is time. The other benefit is proactive in nature. If a drive is out of space because tempdb exploded in size over night, it's better to get notified via email at 4 am. Sure, the cell phone disturbs your precious sleep, but you now have 4 hours to fix the situation before business opens at 8 am and people start screaming.
Also, if there are numerous DBAs on your team, automating these checks helps greatly with standardization.
Seconded.
For Oracle we aggregate the results of overnight operations across all databases into one email (this includes known databases where something should have happened and didn't).
We then have an auto generated service call where the reactive DBA for the day lists the errors and what he/she did about them.
I would like to do the same for SQL Server, but my knowledge is limited and hence we still have a very manual process on SQL Server.
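The aggregation step itself does not depend on the database product. A minimal Python sketch of the "one email, including things that should have happened and didn't" idea follows; the database names, status values, and function name are hypothetical, and how the results get collected from each server is left out.

```python
def build_digest(expected_dbs, results):
    """Build one summary text from overnight results.

    expected_dbs: databases where an overnight operation should have run
    results: database name -> (status, detail) for operations that reported
    """
    lines = []
    for db in sorted(expected_dbs):
        if db not in results:
            # Nothing reported at all: flag it rather than staying silent.
            lines.append(f"{db}: NO RESULT (expected operation did not report)")
            continue
        status, detail = results[db]
        if status != "ok":
            lines.append(f"{db}: {status.upper()} - {detail}")
    if not lines:
        return "All expected overnight operations succeeded."
    return "\n".join(lines)

# Hypothetical overnight results: finance never reported, hr failed.
expected_dbs = ["sales", "hr", "finance"]
results = {"sales": ("ok", ""), "hr": ("failed", "backup job error")}
print(build_digest(expected_dbs, results))
```

Driving the loop from the expected list rather than from the results is what makes a silent database show up in the email.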
August 27, 2010 at 4:09 am
rubes (4/14/2008)
Nice article. I would just like to point out that for those of us that have numerous servers, automation of the checklist is critical. If you're dealing with only one server, manually checking these things does not take a lot of time. But imagine checking job failures or drive space on 50 sql servers. We get paid too much to perform these menial tasks by hand. There are many 3rd party tools out there that do this for us. It's also pretty easy to write your own scripts and sql jobs... many starter scripts could probably be found on this forum.
One benefit of automating your checklist is time. The other benefit is proactive in nature. If a drive is out of space because tempdb exploded in size over night, it's better to get notified via email at 4 am. Sure, the cell phone disturbs your precious sleep, but you now have 4 hours to fix the situation before business opens at 8 am and people start screaming.
Also, if there are numerous DBAs on your team, automating these checks helps greatly with standardization.
I signed in to make exactly these points. Automation, monitoring, and making use of job failure notifications, etc., are key if you have multiple SQL Servers. There is no way I could check 150+ servers each day.
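For the drive space case mentioned in the quote, the check itself is small once the numbers have been collected centrally. A Python sketch, assuming the readings have already been gathered by some other means; the server names, drive letters, and the 10% threshold are invented for the example:

```python
GB = 1024 ** 3

def low_space_drives(readings, min_free_pct=10.0):
    """Return (server, drive, free_pct) for drives below the threshold.

    readings: iterable of (server, drive, free_bytes, total_bytes)
    min_free_pct: hypothetical alert threshold, in percent free
    """
    alerts = []
    for server, drive, free, total in readings:
        if total <= 0:
            continue  # skip unusable readings instead of dividing by zero
        free_pct = 100.0 * free / total
        if free_pct < min_free_pct:
            alerts.append((server, drive, round(free_pct, 1)))
    return alerts

# Hypothetical readings from two servers.
readings = [
    ("sql01", "C:", 50 * GB, 100 * GB),
    ("sql02", "T:", 4 * GB, 100 * GB),
]
print(low_space_drives(readings))
```

A scheduled job could email or text whatever this returns, so nobody has to look at 150 servers by hand.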
August 27, 2010 at 4:32 am
I use Quest's Spotlight and Solarwinds IP Monitor, which are displayed on 2 wall mounted wide screen TVs.
In the morning I can see immediately what is going on the moment I walk in the office and all I do is check and fix whatever is red.
Any items flagged as orange need to be looked into, but not urgently.
As the monitoring is constant, any issues are flagged up within minutes without me having to run any repetitive tasks myself.
On the weekends certain critical events (like disk space etc) are sent to me automatically by text message so I can remotely log on and fix them rather than waiting until Monday, by which time the critical event may have become a fatal one. I also receive a text message to confirm that the issue is resolved - handy when you know some issues will clear themselves within a certain timeframe.
With these two tools set up efficiently, there are no logs or emails to go through and no manual tasks leaving me to get on with the 'interesting stuff'.
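The red/orange triage and the weekend text messages amount to a simple routing rule. This Python sketch is not how Spotlight or the SolarWinds tool is configured; the severity names, channel names, and function are assumptions used only to show the logic.

```python
from datetime import datetime

def route_alert(severity, when):
    """Decide where an alert goes, based on severity and day of week.

    severity: "red" (fix now) or "orange" (look into it, not urgent)
    when: datetime at which the alert fired
    """
    is_weekend = when.weekday() >= 5  # Saturday is 5, Sunday is 6
    if severity == "red":
        # Critical events also go out by text message at the weekend.
        return ["dashboard", "sms"] if is_weekend else ["dashboard"]
    if severity == "orange":
        return ["dashboard"]
    return []

print(route_alert("red", datetime(2010, 8, 28, 3, 0)))  # a Saturday
print(route_alert("red", datetime(2010, 8, 27, 9, 0)))  # a Friday
```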
August 27, 2010 at 6:43 am
Very cool. Interesting how valid this article still is today. I also find it interesting, all this talk about 50 or a hundred SQL Servers and DBA teams. In every contract that I have worked for the last 10 years, I have usually been the only one qualified as a DBA, but was expected to be spending most of my time programming. NO ONE spent any time monitoring the server(s) (usually one, very occasionally two). In fact, the only thing I would check would be to verify that the previous night's DB backup had run and that the instance had successfully restarted after the server backup had run. Other than that, the only time anyone touched the server was to create stored procedures, restore backup copies of the production databases, and generate DB diagrams for the monthly report or after changes in the DB structure. So it is truly fascinating to hear about all these people who don't have the time to do these morning checks. It is also fascinating to hear about these people who are saying we get paid too much to be spending time doing this manually. (Really! Where do you work? I want to work there.) Around here, if all you do is DBA work without being a web developer, a GIS developer, and a desktop developer as well, your skills are not valued.
I will implement this morning checklist and maybe that will proactively handle the rare issues that come up so that I can get back to development quicker.
August 27, 2010 at 6:48 am
i've been slowly building a monitoring system over the last few years
around 2 years ago due to SOX we had a requirement to save all security logs from domain controllers and some servers. i set up a system to dump them into a database and use SSRS to present the data to people.
the first year was mostly learning and this year i wrote some more reports and transferred them to a new scale out SSRS deployment we did. this also enabled the emailing of data to people. then i added to the system by exporting application logs as well.
every morning i get an email from SSRS with any application log errors from all our SQL servers in the last week. i don't check it every day which is why the report goes a week back.
another report has security log events from SQL servers and there is another one for failed jobs
for security i also get a few emails about wrong passwords for admin accounts as well as any AD group changes. this past week i caught someone adding a person to one of our AD groups that we use for Windows Authentication on a server that is in SOX scope and that gives rights to change revenue data on several servers and databases. the policy is to issue a ticket that has to be approved to add anyone to that group.
for backups i have a daily job to export the tables from msdb to a central database and query it. i get emails for any database that has never been backed up, no full backup in 7 days, a general report of the latest full/diff backups for all servers and databases and a few others i made up. i used to audit backups once every 6 months or so and always found databases not being backed up. sometimes it was a developer creating a database on a server they have access to and not telling anyone. other times it was a mistake when changing a script. Netbackup isn't very good in reporting the backup status of databases so i had to write my own process.
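A minimal Python sketch of the two backup checks described above (never backed up, no full backup in 7 days), assuming the backup history has already been exported to one place. The server and database names and the function are hypothetical; on a real system the latest full backup times would come from the msdb backup history tables.

```python
from datetime import datetime, timedelta

def audit_backups(databases, last_full_backup, now, max_age_days=7):
    """Split databases into never-backed-up and stale lists.

    databases: every (server, database) pair that exists
    last_full_backup: (server, database) -> finish time of latest full backup
    """
    never, stale = [], []
    cutoff = now - timedelta(days=max_age_days)
    for db in sorted(databases):
        finished = last_full_backup.get(db)
        if finished is None:
            never.append(db)   # e.g. a database created without telling anyone
        elif finished < cutoff:
            stale.append(db)   # latest full backup is older than the window
    return never, stale

# Hypothetical inventory and backup history.
now = datetime(2010, 8, 27, 6, 0)
databases = [("sql01", "sales"), ("sql01", "dev_scratch"), ("sql02", "hr")]
last_full = {
    ("sql01", "sales"): datetime(2010, 8, 26, 23, 0),
    ("sql02", "hr"): datetime(2010, 8, 15, 23, 0),
}
print(audit_backups(databases, last_full, now))
```

The important design choice is that the list of databases comes from the servers themselves and not from the backup tool, which is how an unannounced database gets noticed.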
for performance i've been collecting perfmon counters for 9 months now and email an hourly report. we also bought a third party tool to monitor servers that does this as well, except it started emailing alerts and we had no data of our own since it was controlled by someone else. so i wrote a report to query the last few hours of perfmon data and send it out hourly. it used to send only values outside the accepted range, but i changed that because of the alerts from the above application. going to code another report just for alert data.
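The "only anything out of the accepted range" filter could look roughly like this in Python. The counter names are real perfmon counters, but the accepted ranges, server names, and sample values are placeholders and not recommendations.

```python
def out_of_range(samples, accepted):
    """Return samples whose value falls outside the accepted range.

    samples: list of (server, counter, value)
    accepted: counter name -> (low, high), inclusive; placeholder limits
    """
    flagged = []
    for server, counter, value in samples:
        if counter not in accepted:
            continue  # no range defined for this counter, so nothing to judge
        low, high = accepted[counter]
        if not (low <= value <= high):
            flagged.append((server, counter, value))
    return flagged

# Placeholder ranges and samples for illustration only.
accepted = {
    "% Processor Time": (0, 80),
    "Page life expectancy": (300, float("inf")),
}
samples = [
    ("sql01", "% Processor Time", 35.0),
    ("sql02", "% Processor Time", 97.5),
    ("sql02", "Page life expectancy", 120),
]
print(out_of_range(samples, accepted))
```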
i also have a report that sends hourly the amount of commands waiting to be replicated. have plans to write another one for the amount of commands at distributor waiting to be replicated
and the final report is an hourly report of all SSRS report modifications. our BI devs have access to create/modify reports and we've had a few tickets where people complained that some report didn't work. set this up so we know if anyone is modifying a report people are complaining about.
all this is done using logparser and normal SQL Server features with a central SQL Server used to store the data. i wanted to use powershell but version 1 had some limitations and looking to see if i can use version 2. once in a while i get calls about buying some expensive monitoring software and there is never any value compared to what you can do yourself.
some things like backup monitoring i coded from examples in the articles here and just modified them. other reports like querying log data i wrote myself and used http://www.ultimatewindowssecurity.com for explanations of what all the event IDs mean
August 27, 2010 at 8:40 am
Hello everyone,
I think that besides verifying the basics, you should also check how users are accessing the database; 2) you should keep an eye on the growth of the data files, and separate the huge tables with thousands of rows into their own data files, analyzing the data and preparing changes for tuning.
Regards
Arturo Caceres S.
DBA Nicaragua
August 27, 2010 at 8:43 am
great article. I'm a big fan myself of checklists. I.T. is just too complex to manage without such a discipline.
The main exception I have is paper (though always a great place to start).
I'd rather have the automation check these things out then send the DBA a report. Here is an example:
August 27, 2010 at 8:46 am
arturo_caceres (8/27/2010)
Hello everyone, I think that besides verifying the basics, you should also check how users are accessing the database; 2) you should keep an eye on the growth of the data files, and separate the huge tables with thousands of rows into their own data files, analyzing the data and preparing changes for tuning.
Regards
Arturo Caceres S.
DBA Nicaragua
i've wanted to do that for a while but it seems that storage is the most expensive part of a server and it's hard to buy enough hard drives to do it the right way.