How to administer 100+ SQL Servers?

  • I was looking on Jobserve and saw a job at a merchant bank that asked for the ability to manage 100+ SQL Servers. This provoked several questions which perhaps some of you can answer for me.

    In my current position we have several environments, but only 5 SQL Servers. In what way would 100 differ? How would they typically be managed effectively (e.g. one central EM with groups of registered servers, plus replication)? Also, why would it typically be necessary to have so many? And what skills would I need, so that I could learn this in a limited environment?

    Thanks for any insights.

    Paul Ibison

    Paul.Ibison@btinternet.com

  • Hey Paul,

    Not sure I have the answers, but I at least have comments! EM might be a start, though I have to wonder how it would hold up at that level - refreshes might be slow. I'm using EM to manage about 250 databases on one server without problems, so maybe it would work. Worst case, maybe you'd build your own EM, or look at third-party tools. Seems like the whole master/target server (MSX/TSX) job administration might come into its own for something like this - nice to manage all jobs in one place.
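
    For what it's worth, the MSX/TSX setup is roughly this in T-SQL - just a sketch, and the server, job and path names here are made up:

      -- run on each target server: enlist it with the master (MSX) server
      EXEC msdb.dbo.sp_msx_enlist @msx_server_name = N'CENTRALMSX';

      -- run on the master: define the job once, then point it at targets
      EXEC msdb.dbo.sp_add_job @job_name = N'Nightly full backup';
      EXEC msdb.dbo.sp_add_jobstep
          @job_name  = N'Nightly full backup',
          @step_name = N'Backup master',
          @subsystem = N'TSQL',
          @command   = N'BACKUP DATABASE master TO DISK = ''D:\backups\master.bak''';
      EXEC msdb.dbo.sp_add_jobserver @job_name = N'Nightly full backup', @server_name = N'TARGET01';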

    Servers are cheap these days, so I think in a lot of cases it makes sense to put servers in local offices, then use replication or DTS to push the transactions around - much more fault tolerant than everyone connecting to one central box.
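
    Just to give the flavour, a minimal transactional publication from a head-office server to a branch might look like this - names are invented, and the distributor setup and snapshot/distribution agent jobs are omitted:

      -- on the publisher: enable the database and publish one table
      EXEC sp_replicationdboption @dbname = N'Sales', @optname = N'publish', @value = N'true';
      EXEC sp_addpublication @publication = N'SalesPub', @status = N'active', @repl_freq = N'continuous';
      EXEC sp_addarticle @publication = N'SalesPub', @article = N'Orders', @source_object = N'Orders';

      -- push the data out to a branch-office server
      EXEC sp_addsubscription
          @publication       = N'SalesPub',
          @subscriber        = N'BRANCH01',
          @destination_db    = N'Sales',
          @subscription_type = N'Push';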

    Aside from volume, how different can it be? I think you just have to learn to work a little differently. I know it took a while to get used to having 60 databases... that seems puny compared to the 250 I have now! At some point the number doesn't matter, unless the tools just blow up trying to handle it.

    Or maybe I'm way off track!

    Andy

  • Thanks Andy - this gives me some ideas. I'd also welcome any details from people out there who work in this sort of 'extreme' environment.

    Paul Ibison

    Paul.Ibison@btinternet.com

  • Main things to think about are:

    How do you install?

    How do you maintain?

    How do you monitor?

    With 5 servers you can easily install and check things manually.

    With 100+ it becomes a lot more difficult (still possible at that number, but you wouldn't want to get behind).

    You need to install monitoring systems so that you are warned when something is going wrong (and not when it isn't). The servers need to be built so that all maintenance is done automatically. You wouldn't want too many interdependencies between the servers, or you will find it difficult to trace the source of problems. You will need a security strategy that perhaps encompasses groups of servers. The backup/restore strategy may have to include scripts to resynchronise the other servers, as there may be too many to do manually.
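
    As a concrete example of being warned when something is going wrong - a severity-based alert wired to an operator (operator name and address are placeholders):

      -- notify the DBA group whenever a severity 17 error is raised
      EXEC msdb.dbo.sp_add_operator @name = N'DBA Team', @email_address = N'dba@example.com';
      EXEC msdb.dbo.sp_add_alert @name = N'Severity 17 errors', @severity = 17;
      EXEC msdb.dbo.sp_add_notification
          @alert_name          = N'Severity 17 errors',
          @operator_name       = N'DBA Team',
          @notification_method = 1;  -- 1 = email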


    Cursors never.
    DTS - only when needed and never to control.

  • You'd also have to automate checking of disk space, performance monitoring, etc.
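
    For instance, disk space can be polled with the (undocumented, but long-standing) xp_fixeddrives:

      -- capture free space per drive and flag anything under ~1 GB
      CREATE TABLE #drives (drive CHAR(1), mb_free INT);
      INSERT #drives EXEC master.dbo.xp_fixeddrives;
      SELECT drive, mb_free FROM #drives WHERE mb_free < 1024;
      DROP TABLE #drives;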

    If there are few problems, then there isn't much to manage. It's when you have problems that the scale is overwhelming. If you have 100 separate physical servers and 20% of them fail, you have a long day and have to be able to prioritize. You won't have time to manually script restores for 20 servers. If 20% of your 5 fail (that's just 1 server), you have less to worry about.
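
    One way to avoid hand-scripting restores is to generate them from msdb's backup history - a rough sketch that assumes single-file full backups:

      -- build RESTORE statements from the most recent full backup of each database
      SELECT 'RESTORE DATABASE [' + bs.database_name + '] FROM DISK = '''
             + bmf.physical_device_name + ''' WITH REPLACE'
      FROM msdb.dbo.backupset bs
      JOIN msdb.dbo.backupmediafamily bmf ON bs.media_set_id = bmf.media_set_id
      WHERE bs.type = 'D'  -- full backups only
        AND bs.backup_finish_date =
            (SELECT MAX(backup_finish_date) FROM msdb.dbo.backupset
             WHERE database_name = bs.database_name AND type = 'D');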

    Personally, I managed 13 separate physical servers, and at times it was overwhelming when 2 or 3 had issues at once. Different departments are yelling and you are trying to manage 3 things at once in your head. I had 2 junior DBAs, but they constantly wanted to check with me, so it was as though I had to fight 3 separate fires at once.

    They need updating, but I wrote a series of articles on automated reporting: I had all the servers compile a report of potential issues each day and email it to me, so I could try to be proactive. I think that's the key - you can catch things before they become problems.
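
    The emailing side can be as simple as a scheduled query through SQL Mail - a sketch, assuming SQL Mail is configured, with a placeholder recipient and query:

      -- mail the output of a health-check query to the DBA each morning
      EXEC master.dbo.xp_sendmail
          @recipients = 'dba@example.com',
          @subject    = 'Daily SQL Server report',
          @query      = 'SELECT name, crdate FROM master.dbo.sysdatabases';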

    Steve Jones

    steve@dkranch.net

  • Other things to think about.

    Architecture - think about server interactions. If there are too many interdependencies, it becomes very difficult to upgrade a server in isolation. If the interactions go via a single server (a star or bridge pattern), then you need to think about the single points of failure you are introducing.

    How do you add a user to the system - what are users allowed access to - how do the domains interact?
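
    For example, granting a Windows group access on each server might look like this (group and database names invented):

      -- once per server: let the domain group log in
      EXEC sp_grantlogin N'DOMAIN\SalesUsers';
      -- once per database: map the login and grant read access
      USE Sales;
      EXEC sp_grantdbaccess N'DOMAIN\SalesUsers';
      EXEC sp_addrolemember N'db_datareader', N'DOMAIN\SalesUsers';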

    You need to have a means of keeping track of the servers - what their IP addresses are, what they are used for, who uses them, what hardware/software they have (which one is the problem one), when they are active. It becomes necessary to document, file and keep up to date all the information about the servers, or you will soon run out of Post-its.
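
    Even a simple inventory table beats the Post-its - something like this (columns are just a suggestion):

      CREATE TABLE dbo.ServerInventory (
          server_name  SYSNAME PRIMARY KEY,
          ip_address   VARCHAR(15)  NOT NULL,
          purpose      VARCHAR(200) NULL,   -- what it is used for
          users        VARCHAR(200) NULL,   -- who uses it
          hw_sw        VARCHAR(200) NULL,   -- hardware/software, known problem boxes
          active_hours VARCHAR(50)  NULL,   -- when it is active
          last_updated DATETIME     NULL
      );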

    You will need some means of administering requests from the servers' users, or you will soon be spending all your time just trying to plan things and keep people happy rather than doing them.

    How do you upgrade - one server a night can take most of a year, while all at once is a huge risk.

    How do you test that backups are being taken and are restorable?
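
    At minimum you can verify that each backup file is readable, though the only real proof is a test restore (file path is a placeholder):

      -- checks the backup media is complete and readable (not full restorability)
      RESTORE VERIFYONLY FROM DISK = N'\\backupserver\sql\Sales_full.bak';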


    Cursors never.
    DTS - only when needed and never to control.

  • I believe that Nigel touched on this in his first post, but just to clarify: an automated install process is necessary for Service Pack deployment, etc. Tough enough with 15 - I can't imagine 100+. Have fun, the challenge sounds great!

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • Service Pack deployment is probably the biggest bear.

    We manage somewhere in the neighborhood of 40-50 SQL Servers where I work. We've built some simple custom utilities to do basic monitoring (heartbeat checks and the like) because there are only 3 DBAs, and two of us have other responsibilities (VB development, web development, mainframe coding). There are monitoring products out there, including some you wouldn't think to use (for instance, HealthMon, part of SMS 2.0, is supposed to monitor the SQL Server service, though I've not tried it).
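
    A heartbeat check doesn't have to be much more than this kind of loop - a sketch, assuming each server in a dbo.ServerInventory table is also set up as a linked server:

      -- ping each server; in a real job, failures would be caught and logged
      DECLARE @server SYSNAME, @sql NVARCHAR(500);
      DECLARE hb CURSOR FOR SELECT server_name FROM dbo.ServerInventory;
      OPEN hb;
      FETCH NEXT FROM hb INTO @server;
      WHILE @@FETCH_STATUS = 0
      BEGIN
          SET @sql = N'SELECT * FROM OPENQUERY([' + @server + N'], ''SELECT GETDATE()'')';
          EXEC (@sql);
          FETCH NEXT FROM hb INTO @server;
      END
      CLOSE hb;
      DEALLOCATE hb;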

    The biggest thing for us was to determine a standard config and use it, making variances only when absolutely necessary. That simplifies things a lot. We also make extensive use of alerts via email (we carry BlackBerries, so we know immediately when there is an issue).

    K. Brian Kelley

    bk@warpdrivedesign.org

    http://www.sqlservercentral.com/columnists/bkelley/

