Lessons Learned from a Large Virtualization Implementation

  • bstephens,

    You're right. Our LUN configurations have been our greatest source of pain, but that pain has come from our implementation of SnapManager for SQL Server (SMSQL), not from the P2V. To configure SMSQL properly you need to align the LUNs and files just right; the snapshot technology requires this alignment.

  • Scott, thanks for the great information. We are just starting to think about virtualizing SQL servers.

    ALZDBA (6/16/2011)


    Nice overview!

    I always keep in mind this very to-the-point article by Brent Ozar (@BrentO) regarding our job and virtualization:

    http://www.theinfoboom.com/articles/virtualizing-databases-too-big-to-fail/

    Pay special attention to the real question!

    Thanks for the reference - great questions!

  • henry.scott (6/16/2011)


    I think the key point in the article is that most of this was dictated to the SQL guys. This seems to be the way these things are done. There is no benefit to SQL itself, but there is to the organization with respect to overall manageability. The problem I am currently having is that our VM guys think the VM server is fire-and-forget. We end up having performance problems, and because they do not monitor properly we don't get any answers on solving the issues. The age-old mantra of "optimize the SQL code" becomes the crutch for VM server mismanagement.

    So true.

    Where I work we have some virtualization, although our main/live SQL Server instances are not part of it. Whenever there's an issue the VM heads automatically say it's something custom that the DB folks have done (implying that the VM layer is perfect, so it can't be anything on their end), only to find out eventually that it was something related to the VM implementation. I don't mind problems as such, since problems will always occur no matter what the setup, but the "Well, it has to be you because my stuff is perfect" attitude that so often seems to come from the VM team irks me to no end; at least in my experience.

    Kindest Regards,

    Just say No to Facebook!
  • As someone who loves the benefits that virtualization offers, I think implementing VMs across the board on everything simply because it's what's hot and being pushed at the time is as bad an idea as any prior 'hot tech of the minute' pushed as the be-all and end-all solution to every IT woe.

    Virtualization is another tool that will benefit everyone when used appropriately, but when used in excess, and for the sole reason that it's hot and everyone else is doing it, it will only cause new headaches even if it manages to solve old ones.

    Kindest Regards,

    Just say No to Facebook!
  • We have been mandated to virtualize all of our SQL Servers (Tier 3 -> Tier 1). We have had discussions with VMware SQL experts and, beyond being told that VMware performs great, we still lack the ability to monitor and make configuration changes the way we are used to in our physical environment. SQL Server must be looked at as an application running in a VMware environment. Even though the environment runs great (watch your vCenter console), I am not getting the response I want from the SQL Server application through this integration: application affinity errors, poor SQL query response times, memory paging, and, by the way, you cannot turn off Hyper-Threading (turning it off is a Microsoft best practice for SQL Server). Upcoming SQL Server releases such as 2008 R2 SP1 will enhance integration by creating better dynamic memory interaction; however, the question is why Microsoft should cater to VMware when they have their own product, Hyper-V. Are these improvements going to enhance VMware or just Hyper-V? If I had my wish I would shove off VMware and turn on Hyper-V. SharePoint 2010 in particular is a complete Microsoft product and you are going to run it on VMware. Good luck!!!

  • Jim Johnston (6/16/2011)


    We have been mandated to virtualize all of our SQL Servers (Tier 3 -> Tier 1). We have had discussions with VMware SQL experts and, beyond being told that VMware performs great, we still lack the ability to monitor and make configuration changes the way we are used to in our physical environment. SQL Server must be looked at as an application running in a VMware environment. Even though the environment runs great (watch your vCenter console), I am not getting the response I want from the SQL Server application through this integration: application affinity errors, poor SQL query response times, memory paging, and, by the way, you cannot turn off Hyper-Threading (turning it off is a Microsoft best practice for SQL Server). Upcoming SQL Server releases such as 2008 R2 SP1 will enhance integration by creating better dynamic memory interaction; however, the question is why Microsoft should cater to VMware when they have their own product, Hyper-V. Are these improvements going to enhance VMware or just Hyper-V? If I had my wish I would shove off VMware and turn on Hyper-V. SharePoint 2010 in particular is a complete Microsoft product and you are going to run it on VMware. Good luck!!!

    Does Hyper-V support all the features of VMware, like vMotion?

  • I work for a company that hosts our own VM environment for our customers, as well as for in-house systems. In setting up the environment, we learned a few lessons, mainly that I/O is as important in virtual as it is in physical. I've seen a few other posts touch on this as well.

    The initial build started with 1Gbps connectivity from the VM hosts back to the SAN. We quickly found that performance was suboptimal compared to physical, especially when testing multiple VMs in parallel. The solution was to upgrade to 10Gbps connectivity, which got us much closer. But that's not the whole story. Our early test VMs had essentially just a large C drive. It's virtual, right? Shouldn't make a difference. Wrong! After talking with the Storage team we realized that we had multiple redundant paths from the hosts back to the SAN. Each path has 10Gbps throughput. However, a VM with only one virtual disk can use only one of those paths. So, our next test was to create a VM with multiple virtual disks, each one residing on a different datastore (LUN), and to have the Storage team assign each LUN to its own storage processor on the SAN side.

    Using this setup, we were able to configure a super-VM where each drive (D:, E:, etc.) was getting a full 10Gbps pipe back to the SAN. At this point we were convinced that a VM could match or beat physical in raw I/O.

    To reiterate other posts, it's definitely true that virtual is not for everyone. Virtual can't match physical on number of cores and total system memory for very high-end systems. And, licensing comes into play as a physical server with a single quad-core CPU is one license, but the equivalent VM with four virtual CPUs requires four licenses. But for us it's a big win as we can now run hundreds of VMs in the same floor space and using far less hardware/power/network than equivalent physical boxes. That's not something necessarily visible to the typical DBA but it can have a big impact on the company's bottom line.
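
The multipath point above can be sketched as simple arithmetic. This is only a rough upper bound under the assumption described in the post (one virtual disk uses one path, each path carries 10Gbps); the function name and parameters are hypothetical, and real throughput also depends on host, switch, and array limits that this ignores.

```python
def aggregate_san_bandwidth_gbps(virtual_disks, san_paths, path_gbps=10.0):
    """Upper bound on VM-to-SAN bandwidth, assuming one path per virtual disk."""
    if virtual_disks < 0 or san_paths < 0:
        raise ValueError("disk and path counts must be non-negative")
    # A disk can drive only one path, and a path can be claimed only once,
    # so the number of busy paths is the smaller of the two counts.
    usable_paths = min(virtual_disks, san_paths)
    return usable_paths * path_gbps


# One big C: drive versus four disks spread over four paths.
single_disk = aggregate_san_bandwidth_gbps(virtual_disks=1, san_paths=4)
four_disks = aggregate_san_bandwidth_gbps(virtual_disks=4, san_paths=4)
print(single_disk, four_disks)
```

Adding disks beyond the number of paths does not raise the bound, which is why the layout in the post pairs each LUN with its own storage processor.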

  • Frankie-464050 (6/16/2011)


    Licensing: We have licensed the sockets of our physical VM nodes with SQL Server Enterprise, and this has allowed us to bring up as many SQL Servers as the licensed nodes can handle. We have many very small systems that require SQL Server, and this has saved us thousands and thousands in licensing costs.

    I'm not sure how you can say this. If you license each processor on a physical server, you can install as many instances as you want without the 10% - 15% performance penalty that virtualization gives you. I will grant that MS has made licensing for virtual machines much better than it used to be, but still...

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • DCPeterson (6/16/2011)


    Frankie-464050 (6/16/2011)


    Licensing: We have licensed the sockets of our physical VM nodes with SQL Server Enterprise, and this has allowed us to bring up as many SQL Servers as the licensed nodes can handle. We have many very small systems that require SQL Server, and this has saved us thousands and thousands in licensing costs.

    I'm not sure how you can say this. If you license each processor on a physical server, you can install as many instances as you want without the 10% - 15% performance penalty that virtualization gives you. I will grant that MS has made licensing for virtual machines much better than it used to be, but still...

    Virtual lets you run different configurations - 32-bit or 64-bit. SQL2005 or 2008. SP1, SP2, RTM. Etc. Can't do this with multiple instances running on a single physical box.

  • SQL Server 2008 R2 licensing is a major change from 2008. It allows one license on a virtual host to cover 4 virtual guests for Enterprise Edition (EE) and unlimited guests for Datacenter Edition. So now if you are on EE and you vMotion a guest to another host and get audited, you will be fined if the move pushes that host over its licensing quota. Just a gotcha.

    Also, Intel Nehalem processors bring back Hyper-Threading. It's configurable in the ESX server. It is supposed to be disabled by default, which might make the anti-Hyper-Threading crowd and Microsoft happy, but there are ramifications: how many cores are you actually not seeing because you don't have monitoring tools?
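
The vMotion gotcha above lends itself to a quick placement check. The sketch below takes the 4-guests-per-license figure exactly as the poster describes it, which is an assumption here and not verified licensing terms; the host names, guest names, and the `hosts_over_quota` helper are all hypothetical.

```python
from collections import Counter


def hosts_over_quota(guest_to_host, licenses_per_host, guests_per_license=4):
    """Return {host: (running_guests, allowed_guests)} for hosts over their limit."""
    guests_on_host = Counter(guest_to_host.values())
    over = {}
    for host, running in guests_on_host.items():
        # Assumption from the post: each license covers a fixed number of guests.
        allowed = licenses_per_host.get(host, 0) * guests_per_license
        if running > allowed:
            over[host] = (running, allowed)
    return over


# Two hosts with one license each, four SQL guests on each host.
placement = {f"sql{i:02d}": "esx1" for i in range(1, 5)}
placement.update({f"sql{i:02d}": "esx2" for i in range(5, 9)})
licenses = {"esx1": 1, "esx2": 1}

before = hosts_over_quota(placement, licenses)
placement["sql01"] = "esx2"  # simulate moving one guest to the other host
after = hosts_over_quota(placement, licenses)
print(before, after)
```

Running a check like this against the current guest placement before an audit would show whether any migration has left a host carrying more guests than its licenses cover under that rule.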

  • Very interesting article. Thank you for providing the info.

    At the place I work we've had a virtualization discussion recently with the (outsourced) hardware team. They presented a concept to reduce the number of servers to one third. The presentation was rather long, but at the end they presented the costs involved: we were supposed to pay the same for the virtualized environment as for the number of servers we currently run (replacement cycle). But at least it made us think about server consolidation. We were able to reduce the number of servers required to 60-65% of the current count.

    The question our vendor still has to answer: why did they present a virtualization solution instead of a consolidation concept?



    Lutz
    A pessimist is an optimist with experience.


  • Great article and discussion!

    We've converted most of our SQL servers to VMware (NetApp back-end) for many reasons: data center space, energy consumption, replacing aging hardware, etc. I'm a fan, but now disk I/O is a concern. Has anyone else encountered frequent I/O slowness messages, such as this:

    Date: 6/17/2011 10:47:38 AM

    Log: SQL Server (Current - 6/17/2011 10:50:00 AM)

    Source: spid2s

    Message:

    SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [j:\mssql.1\mssql\data\templog.ldf] in database [tempdb] (2). The OS file handle is 0x000006B4. The offset of the latest long I/O is: 0x0000000005d600

    Our SAN admins say there's no latency on the drives, but I see high numbers for Avg. Disk sec/Read and Avg. Disk sec/Write in perfmon.

    Any advice?
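
One way to bring evidence to the SAN conversation is to tally these warnings per file. The sketch below is a minimal example that assumes the error log has been exported as plain text lines; the regular expression is written against the message quoted above, and the `tally_long_io` helper is hypothetical.

```python
import re
from collections import Counter

# Pattern for the long-I/O warning quoted in the post above.
LONG_IO = re.compile(
    r"encountered (?P<count>\d+) occurrence\(s\) of I/O requests taking longer "
    r"than (?P<seconds>\d+) seconds to complete on file \[(?P<file>[^\]]+)\] "
    r"in database \[(?P<database>[^\]]+)\]"
)


def tally_long_io(lines):
    """Sum reported long-I/O occurrences per (database, file) pair."""
    totals = Counter()
    for line in lines:
        match = LONG_IO.search(line)
        if match:
            key = (match.group("database"), match.group("file").lower())
            totals[key] += int(match.group("count"))
    return totals


# Two copies of the message from the post stand in for a real log export.
message = (
    r"SQL Server has encountered 1 occurrence(s) of I/O requests taking longer "
    r"than 15 seconds to complete on file [j:\mssql.1\mssql\data\templog.ldf] "
    r"in database [tempdb] (2)."
)
totals = tally_long_io([message, message, "unrelated log line"])
for (database, path), count in totals.most_common():
    print(f"{database}  {path}  {count}")
```

A per-file count over a few days makes it easier to see whether the stalls cluster on one LUN or are spread across every volume.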

  • Seen that a lot. A lot of SAN vendors will use RAID5 arrays, and then the software will carve out your volumes from the big array. It's very possible the same physical disks that host your SQL databases also host MS Exchange and other servers.

    I've seen a SAN where every bit of space on every set of disks was filled by different apps because the PHB was a micromanager and couldn't stand to waste any space. One time we bought a bunch of disk and I ran some tests, and it outperformed the production volumes because it was new disk and nothing else was on there.

    I believe EMC and NetApp both say best practice is to host SQL on its own volumes, but a lot of companies will do otherwise due to costs.

  • LutzM (6/17/2011)


    Very interesting article. Thank you for providing the info.

    At the place I work we've had a virtualization discussion recently with the (outsourced) hardware team. They presented a concept to reduce the number of servers to one third. The presentation was rather long, but at the end they presented the costs involved: we were supposed to pay the same for the virtualized environment as for the number of servers we currently run (replacement cycle). But at least it made us think about server consolidation. We were able to reduce the number of servers required to 60-65% of the current count.

    The question our vendor still has to answer: why did they present a virtualization solution instead of a consolidation concept?

    If they own the virtualization infrastructure, then they can sell the same hardware to other customers.

  • AllyAnneA (6/17/2011)


    Great article and discussion!

    We've converted most of our SQL servers to VMware (NetApp back-end) for many reasons: data center space, energy consumption, replacing aging hardware, etc. I'm a fan, but now disk I/O is a concern. Has anyone else encountered frequent I/O slowness messages, such as this:

    Date: 6/17/2011 10:47:38 AM

    Log: SQL Server (Current - 6/17/2011 10:50:00 AM)

    Source: spid2s

    Message:

    SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [j:\mssql.1\mssql\data\templog.ldf] in database [tempdb] (2). The OS file handle is 0x000006B4. The offset of the latest long I/O is: 0x0000000005d600

    Our SAN admins say there's no latency on the drives, but I see high numbers for Avg. Disk sec/Read and Avg. Disk sec/Write in perfmon.

    Any advice?

    SANs and backend storage configurations are the greatest risk to VM implementations, especially for databases. VMware does recommend isolating databases on their own volumes. Another issue you need to look at is block size and alignment. VMware recommends larger block sizes for database files, but the storage team rarely implements storage focused solely on database files.

    First off, I would demand access to vSphere. There should be no reason why the VM team can't give you read access to vSphere. They can even create a separate folder for you and put all the SQL Servers in that folder so you don't see other machines on the same host (though that would be good information). You won't be able to win a bottleneck argument unless you have the same tools available to you. The tools you are normally used to for troubleshooting SQL Server performance may not be giving you correct information.
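
On the block size and alignment point above, the basic check is just a remainder test on the partition's starting offset. This is a sketch only: the 64 KB boundary is an assumed value that should be replaced with whatever your storage team reports, and `is_aligned` is a hypothetical helper.

```python
def is_aligned(partition_offset_bytes, boundary_bytes=64 * 1024):
    """True when the partition starts on a multiple of the storage boundary."""
    if boundary_bytes <= 0:
        raise ValueError("boundary must be positive")
    return partition_offset_bytes % boundary_bytes == 0


# The legacy 63-sector start (63 * 512 = 32,256 bytes) fails the check,
# while a 1 MiB start passes for any power-of-two boundary up to 1 MiB.
legacy_offset = 63 * 512
one_mib_offset = 1024 * 1024
print(is_aligned(legacy_offset), is_aligned(one_mib_offset))
```

The same test applies at each layer (guest partition, VMFS datastore, LUN), so an offset that passes inside the guest can still be misaligned further down the stack.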
