Busy Servers

  • I saw an image on the SQL Sentry site as the winner of their "How Busy Is Your Server?" contest. The winner used SQL Sentry to show just how much activity they have taking place on a single server. That's quite a busy server, and I'm sure that tracking down issues with any particular job isn't fun, especially for inter-related jobs.

    So it got me thinking about how busy my environments have been. Not that I want you to count up the jobs on your servers, but I am curious about one thing.

    How do you roll up job/status information from multiple servers?

    By this I mean that if you manage multiple servers, and I'm sure most everyone has 2 or more, how do you track the job failures, database sizes, etc. across those servers?

    I've seen DBAs who manually checked every server, every day: go through the logs, check the jobs for failures, even note database sizes. But that's cumbersome, a waste of time most days, and doesn't leave you free to do much else. And what do you do if there's a problem? Skip it for a day?

    When I worked at JD Edwards, we had literally hundreds of servers to watch. Fortunately I'd developed some techniques in smaller environments that scaled up well and worked for me. I basically set up a database and a series of jobs on each server that allowed it to track itself. Job completions, database sizes, even some performance information would be stored locally on each server and updated daily by jobs running on that server. This scales beautifully since each server manages itself and only depends on itself. If the server is up, it tracks its own information.
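    A minimal sketch of what that per-server tracking setup might look like on SQL Server 2000; the DBAdmin database, the JobHistory table, and the column names are hypothetical, not Steve's actual schema:

    -- Hypothetical local tracking database; each server runs this itself.
    CREATE DATABASE DBAdmin;
    GO
    USE DBAdmin;
    GO
    CREATE TABLE dbo.JobHistory (
        ServerName  sysname  NOT NULL,
        JobName     sysname  NOT NULL,
        RunDate     int      NOT NULL,  -- msdb stores run dates as yyyymmdd integers
        RunStatus   int      NOT NULL,  -- 0 = failed, 1 = succeeded
        CollectedAt datetime NOT NULL DEFAULT (GETDATE())
    );
    GO
    -- Daily job step: snapshot yesterday's job outcomes from msdb.
    INSERT INTO dbo.JobHistory (ServerName, JobName, RunDate, RunStatus)
    SELECT @@SERVERNAME, j.name, h.run_date, h.run_status
    FROM msdb.dbo.sysjobs j
    JOIN msdb.dbo.sysjobhistory h ON j.job_id = h.job_id
    WHERE h.step_id = 0  -- step 0 is the overall job outcome
      AND h.run_date >= CONVERT(char(8), GETDATE() - 1, 112);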

    But that didn't roll things up, so I designated a single SQL Server as the central repository. On this server there was a specific job that would scan a list of servers and pull the information from each server and store that in a central table. Then I had a separate job to read the current day's information, generate a report, and email it to the DBA group.
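    The central pull could be as simple as a cursor over a server list, assuming each monitored server is registered as a linked server and carries the hypothetical DBAdmin database sketched above; the MonitoredServers and CentralJobHistory tables here are likewise illustrative:

    -- Hypothetical central-repository job step (names are illustrative).
    DECLARE @server sysname, @sql nvarchar(500);
    DECLARE server_cursor CURSOR FOR
        SELECT ServerName FROM dbo.MonitoredServers;
    OPEN server_cursor;
    FETCH NEXT FROM server_cursor INTO @server;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Four-part name through the linked server; pull today's rows only.
        SET @sql = N'INSERT INTO dbo.CentralJobHistory '
                 + N'SELECT * FROM [' + @server + N'].DBAdmin.dbo.JobHistory '
                 + N'WHERE CollectedAt >= CONVERT(char(8), GETDATE(), 112)';
        EXEC (@sql);
        FETCH NEXT FROM server_cursor INTO @server;
    END
    CLOSE server_cursor;
    DEALLOCATE server_cursor;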

    If we missed the report, we knew to check the central server. If there was information missing on the report from a specific server, then we checked that server. For individual jobs, we could check the appropriate job.

    This system worked well and seemed to scale up across dozens of critical servers and many non-critical ones. By keeping a standard configuration on each server, we could easily troubleshoot things. We also added the package to create the local database and jobs to the server installation procedure. As long as we got the new server name onto the central repository, it would work. And since each server stored its own information, we had a backup for the central server if we needed to check individual reports.

    So I'm curious if anyone has any better solutions or any other ways they've used to check their servers.

    Steve Jones

  • We've got around twenty production servers (6.5, 7.0, 2000, 2005). Most of them are obviously 2000, and only one is 2005. We've got an ASP 3.0 application that shows us, in a grid, all the jobs and their executions as well as all the ETL processes (they are fired when files (.nul or .txt) arrive in a repository, and we've always got a service running all day long watching that file structure). Fortunately the critical jobs and DTS packages are only on our cluster, so it isn't very tricky at the moment.

    The real problem is that we are obliged to move to SQL Server 2005, and hundreds of DTS packages will become SSIS; the main question is how on earth we will check all of them. I think the next development will be a .NET solution for that, since VB 6.0 and ASP 3.0 can't reach the Dts.Runtime object model.

    Sometimes we run a trace, but nothing else. With that we discovered illegal accesses by our outsourcing consultants.

  • I have about twenty servers running SQL 2000 and one (new this week) with SQL 2005.

    I use the advanced properties of the job step to jump, on failure, to a step which sends a mail to a DBAadmin mailbox. A group of us have that mailbox showing in Outlook, so any mail to it is seen and acted upon. Only one server has mail enabled, but the others have that one set up as a linked server and call a stored proc on it which does the sendmail. The server name and job name are, of course, included in the mail.
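    In outline, that failure step might look something like this; MAILSRV, usp_NotifyDBA, the mailbox address, and the job name are all assumed names for illustration, not the actual setup:

    -- On the one mail-enabled server (SQL 2000 SQL Mail assumed),
    -- a thin wrapper around xp_sendmail:
    CREATE PROCEDURE dbo.usp_NotifyDBA
        @recipients varchar(255),
        @subject    varchar(255)
    AS
        EXEC master.dbo.xp_sendmail @recipients = @recipients,
                                    @subject    = @subject;
    GO
    -- "On failure" job step on every other server (MAILSRV is the
    -- linked server pointing at the mail-enabled box):
    DECLARE @subject varchar(255);
    SET @subject = @@SERVERNAME + ': job NightlyBackup failed';
    EXEC MAILSRV.master.dbo.usp_NotifyDBA
         @recipients = 'DBAadmin@example.com',
         @subject    = @subject;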

    I always work on the principle of fault reporting, not success reporting - even in life. My ex used to complain that I never rang to say I'd arrived somewhere (not a special trip) safely, and I always said, "Who needs a program that pops up a dialog box every couple of minutes saying 'I'm OK, click here to continue'?"

    This is another good one, Steve!

    Like the folks above, I've built applications to monitor and report job and package status. Usually these applications have used a combination of internal logging queries that write job status to some sort of centralized (or eventually collected and centralized) table(s), then display that information in an email, on the Web, or both.
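    For what it's worth, one of those internal logging queries might boil down to something like this (a sketch against msdb, not the actual application code):

    -- Latest outcome per job; msdb.dbo.sysjobservers keeps the last-run summary.
    SELECT  j.name          AS JobName,
            s.last_run_date AS LastRunDate,  -- yyyymmdd integer
            s.last_run_time AS LastRunTime,  -- hhmmss integer
            CASE s.last_run_outcome
                WHEN 0 THEN 'Failed'
                WHEN 1 THEN 'Succeeded'
                WHEN 3 THEN 'Canceled'
                ELSE 'Unknown'
            END             AS LastOutcome
    FROM msdb.dbo.sysjobs j
    JOIN msdb.dbo.sysjobservers s ON j.job_id = s.job_id
    ORDER BY j.name;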

    :{> Andy

    Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

  • Since I have over 100 DB servers (7.0, 2000 and now 2005), it took a long time to check just the status of the jobs that ran on these servers. So I created a batch file that executes osql and isql against all the servers. This script, below, shows only which jobs have failed.

    SELECT '*** Failed Jobs Report for ' + @@SERVERNAME + ' SQL Server'

    SELECT name
    FROM msdb.dbo.sysjobs A
    JOIN msdb.dbo.sysjobservers B ON A.job_id = B.job_id
    WHERE B.last_run_outcome = 0

    This script is executed against all the servers, and a file is created for each server by using the -o option. Next the script makes a text file which is a dir listing of that directory, and finally it is emailed to me. If the size of an entry is over 1 KB then I know that something has been reported, and I open up that particular file to see which jobs have failed. From there... well, you know the rest.

    This is all running on my PC and is scheduled with the XP scheduler app. No database or server required.

    I see that there is no easy tool for this and am still adding more information to my report. Too bad MS did not add a function like this within SQL 2005.
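    One way that report could grow, for instance, is to pull in when each failed job last ran and its last message, again from sysjobservers; this is a sketch along those lines, not Rudy's actual script:

    SELECT  A.name                 AS JobName,
            B.last_run_date        AS LastRunDate,
            B.last_run_time        AS LastRunTime,
            B.last_outcome_message AS LastMessage
    FROM msdb.dbo.sysjobs A
    JOIN msdb.dbo.sysjobservers B ON A.job_id = B.job_id
    WHERE B.last_run_outcome = 0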


    Rudy

  • I have 28 servers: one 7.0 and the remaining 2000. I use Idera Diagnostic Manager's SMTP alerts on jobs/thresholds and it works fine. The bulk of my morning routine consists of running and reviewing reports written in MS Access for backup status, account changes, disk space, logs, etc.

  • "I ask what would Chuck Norris do?"

    Then respond to email alerts and such from DTS....

    But there's nothing like taking a good look at the servers. It's not only revealing of errors, but you get a feel for how the servers are doing: some sluggishness that could never be truly represented in an email alert might indicate bad sectors on the drives. I love, and this is sad, being able to "see" a failing drive before Computer Management lets you know.

  • Steve,

    Something pretty similar to what you are / were doing. The only variation that we have in place is the use of SQLH2 and Reporting Services for the reviews. With SQLH2 we just add the new server to its monitoring and scan the SQLH2 database for servers: one-stop shopping to find all my SQL Servers in the environment.

    Additionally, we are in the process of converting the reports to SQL Server Reporting Services so that we can publish them to a Tier 1 group for review and failure remediation.

    Somehow, homegrown stuff is always more fun to me...

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • I too have a job on one 2005 server that goes out to the other servers daily, queries MSDB, and sends me an email. But nothing can replace a visual check, and you cannot predict everything that could happen and write scripts for every possible case. That is why I have a habit of checking certain things on a server whenever I am logging in to a machine for other reasons anyway. You would not believe what I found yesterday (I will post my question on Administration for 7, 2000 shortly). No script would be able to detect this.

    Also, there are servers where I am not a sysadmin but a sort of "advisor". In those cases I remind the support staff to check on the backups on a regular basis.

    Regards,
    Yelena Varsha

  • When I was at some of my previous companies, we wrote some things similar to what has been mentioned here and checked them manually each day. But when I came into a heavily replicated world and tried to use that approach while still telling my regular jobs from the replicated ones, it got much harder. We chose Sentry to help out with it, and it's really been a help for finding out what's going on, and when, with a quick view. I still use a space checker that I wrote to monitor the growth of our databases.
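    A space checker along these lines can be quite small; this sketch (illustrative names, not Pat's actual tool) snapshots every database's file sizes into a history table once a day:

    -- sysaltfiles is the SQL 2000-era catalog; size is in 8 KB pages.
    CREATE TABLE dbo.SpaceHistory (
        CapturedAt   datetime      NOT NULL DEFAULT (GETDATE()),
        DatabaseName sysname       NOT NULL,
        FileName     sysname       NOT NULL,
        SizeMB       decimal(12,2) NOT NULL
    );
    GO
    INSERT INTO dbo.SpaceHistory (DatabaseName, FileName, SizeMB)
    SELECT DB_NAME(dbid), RTRIM(name), size / 128.0
    FROM master.dbo.sysaltfiles;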

    Pat Wright
    DBA

  • I have several SQL 2000 servers. The way we monitor the jobs is by adding a final step to every job that produces a .ctl file, and we have an internal service that erases those files. We record every job's information in a database, and the service reads that database continuously; every job has its own information, depending on how many times a day it runs, and has a unique .ctl name. When a job fails, it does not generate its .ctl file, so the service updates the job's record and an intranet page shows that job "xxx" failed to execute. This method has worked great for us: we immediately know when a job fails.
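    The registry that service reads might boil down to something like this; the table and column names are guesses for illustration, not the actual schema:

    -- Hypothetical job/.ctl-file registry the service polls.
    CREATE TABLE dbo.JobSchedule (
        JobName     sysname      NOT NULL,
        CtlFileName varchar(255) NOT NULL,            -- unique .ctl name per run
        ExpectedBy  datetime     NOT NULL,            -- when the file should exist
        CtlFileSeen bit          NOT NULL DEFAULT (0),
        Failed      bit          NOT NULL DEFAULT (0)
    );
    GO
    -- The service sets CtlFileSeen = 1 when it finds (and erases) the file;
    -- anything past its deadline with no file gets flagged for the intranet page.
    UPDATE dbo.JobSchedule
    SET Failed = 1
    WHERE CtlFileSeen = 0
      AND Failed = 0
      AND ExpectedBy < GETDATE();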

  • Total automation... SQL Server MSX server, custom scripts, errorlog scanning, event log scanning, SMTP email and MOM, to be brief. Basically the servers email/page me whenever there is any type of issue that is out of the norm.

    Regards,
    Rudy Komacsar
    Senior Database Administrator
    "Ave Caesar! - Morituri te salutamus."

  • I only have a few servers with critical jobs, and we monitor them primarily through email. However, I have been planning on writing a script to monitor status information. Thanks, discovery2003, for giving me a jump start in the right direction.
