As background, my company currently has about 3,000 virtualized servers, with about 400 TB of storage dedicated to VMs. Of these, 364 are virtualized SQL Servers. Our entire virtual environment runs on NetApp hardware and VMware.
We've been virtualizing our servers for about 8 months. The project started as preparation for moving our systems to a new data center. That move is under way now and is scheduled to be completed by the end of July. Below are my experiences and observations from this process. They also reflect the views of other team members and other departments.
Encountered difficulties in customer and vendor buy-in
Early on, customers and vendors immediately pushed back against virtualization, and especially against the aggressive timeline for virtualizing all our systems. For better or worse, the corporate response was very heavy-handed. In many cases we were told specifically not to mention virtualization to any customer or vendor. If a vendor refused to support virtualization, we were told to find a new vendor that would. In the end, the vendors came into line, and so did the customers.
Replacing older physical resources with fewer, faster virtual resources was a difficult sell
During the P2V (physical-to-virtual conversion) process, resources on the systems were reduced, originally without our knowledge. A physical server might have had 8 CPUs and 16 GB of RAM, but after the P2V we would see the system with 2 CPUs and 4 GB of RAM. This caused a lot of distrust and confusion. Some customers were very upset and worried about how this was handled. In the end, the faster VM hardware proved more than sufficient to make up for the smaller allocation. In some cases, resources were added, and the VM environment allowed this to happen seamlessly and transparently. The problem here was the lack of communication, not the logic behind reducing resources.
Post-VM errors are always attributed to the VM regardless of their actual cause
With any major change, every subsequent problem gets attributed to the change. As a database team, we spent many hours trying to determine whether post-VM errors were caused by the migration or by something else. We became heavily reliant on the VM team to ease our minds and explain the new VM architecture to us. Again, there was some distrust, but we began to realize that very few post-VM problems could be directly attributed to the migration. As we learned more about virtualization, our confidence in our ability to troubleshoot our systems grew.
DBAs must change the way they manage and think about SQL Server provisioning
Virtualization frees us from focusing on the particulars of a physical system, for example, how much disk space we need or whether there is enough RAM. These can easily be provisioned later. The problem we have encountered is server sprawl. Since virtualizing, our SQL Server inventory has grown by about 30%, and that number is still rising. Managing this sprawl is becoming a serious challenge.
Backup teams, storage teams, and database teams must work well together
Early in the process, my DBA team deeply distrusted the VM team. This was mostly because of a lack of communication about the process, but also because we did not fully understand the technology and how it might affect our database systems. We were ultimately responsible for the databases, yet new processes and architecture were being forced on us in ways that could jeopardize our ability to manage the environment. This is still a significant risk, and we have gone through a steep learning curve trying to understand new technologies such as SnapManager for SQL and vSphere. Still, we are working closely with the other teams, and the lines of communication are now open. We have learned that it is critical to work together and share knowledge because our systems are so tightly linked.
Virtualization alters traditional backup and recovery methods
This may not be the case at other companies, and it may be more a symptom of our underlying hardware than of virtualization, but we have been forced to use NetApp's SnapManager for SQL Server as our primary backup tool going forward. This was dictated to us, and implementing it has caused serious challenges, some of which have yet to be worked out. It has potential, but the product isn't quite ready for an environment as large as ours. The lesson is that virtualization puts decision-making power in the hands of the storage and server teams, and in some cases they will dictate which tools you use to monitor and back up your databases.
Overall, virtualization has been challenging but successful, though the long-term outlook is still unclear
Virtualization brings a lot to the table, but the long-term outlook is still unclear. Our systems have not become more stable because of virtualization, and in fact some operational costs have probably increased due to server sprawl and management overhead. Our systems may be virtualized, but we migrated so quickly that we have yet to take advantage of many of the benefits, such as cloning and setting up virtual test and development environments. Overall, virtualization will provide us with many new opportunities, but it also brings many potential risks.