• I think there's another aspect of the "scale well" paradigm that needs to be considered. When you enter the next order of magnitude at the server or staffing level, you often need to add the next level of administrative unit, whether that is a person or a software module, geared more toward overall governance and performance monitoring. That resource faces different demands, so it should be purchased or hired based on proven ability in those areas.

    If you have 1,000 servers, you probably have some very high-level monitoring and reporting tools to govern them, and the responsibility for performance, outages, and exception reporting rises to that module. If you have 10 IT staff, do you have the equivalent manager in place, and do you expect them to perform the administrative tasks that deal with performance, outages, and exceptions? You should. In both cases it relieves some of the pressure on the layer below, which is not necessarily designed to carry out those functions at the current scale, and that will indirectly increase your uptime and throughput.

    Cheers.