Performance Recommendations for Virtualizing Anything with VMware vSphere 4

(Derived from: Performance Recommendations for Virtualizing Zimbra with VMware vSphere 4, http://wiki.zimbra.com/wiki/Performance_Recommendations_for_Virtualizing_Zimbra_with_VMware_vSphere_4)

Introduction

VMware vSphere’s ability to deliver computing and I/O resources far exceeds the resource requirements of most x86 applications. This is what allows multiple application workloads to be consolidated onto the vSphere platform, reducing server costs, improving availability, and simplifying operations.

However, many administrators run into some common misconfigurations and design issues when virtualizing applications. This is especially true for enterprise workloads, which have higher resource demands than smaller departmental workloads.

We have compiled a short list of essential best practices and recommendations for a high-performance deployment on the vSphere platform. We have also provided a list of recommended reference material covering two areas: how to build and deploy a vSphere platform with performance in mind, and how to troubleshoot performance-related issues.

CPU Resources

NUMA

Non-Uniform Memory Access (NUMA) is a memory architecture used in multiprocessor systems. A NUMA node consists of a processor and the bank of memory local to that processor. In a NUMA architecture, a processor can access its own local memory faster than memory that is local to another processor (remote memory). NUMA “crosstalk” occurs when a processor accesses memory local to another processor, which incurs a performance penalty.

VMware ESX™ is NUMA aware and will schedule all of a virtual machine’s (VM’s) vCPUs on a single ‘home’ NUMA node. However, if the VM container size (vCPUs and RAM) is larger than a NUMA node on the physical host, NUMA crosstalk will occur. We recommend, but do not require, sizing your largest VM container to fit within a single NUMA node.

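For example, take a hypothetical two-socket host with six cores and 48 GB of RAM per socket. Each NUMA node on that host provides six cores and 48 GB of local memory. A VM with no more than six vCPUs and somewhat less than 48 GB of RAM therefore fits on a single node.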
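The sizing rule above can be sketched as a quick fit check. This is a minimal illustration under several assumptions. It assumes one NUMA node per socket, which is not true of every processor, because some multi-die parts expose more than one node per socket. It also assumes that memory is populated evenly across sockets. It ignores the memory ESX uses for itself and for per-VM overhead. The `Host` class and the `fits_in_numa_node` function are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical description of a physical ESX host."""
    sockets: int
    cores_per_socket: int
    memory_gb: float  # total physical RAM, assumed evenly populated across sockets

    @property
    def numa_node_cores(self) -> int:
        # Assumes exactly one NUMA node per socket.
        return self.cores_per_socket

    @property
    def numa_node_memory_gb(self) -> float:
        # Local memory per node when DIMMs are spread evenly across sockets.
        return self.memory_gb / self.sockets


def fits_in_numa_node(host: Host, vcpus: int, vm_memory_gb: float) -> bool:
    """Return True if a VM of this size can be scheduled within one NUMA node.

    ESX and per-VM memory overhead are ignored, so a VM that only just fits
    should be treated as borderline.
    """
    return vcpus <= host.numa_node_cores and vm_memory_gb <= host.numa_node_memory_gb


if __name__ == "__main__":
    host = Host(sockets=2, cores_per_socket=6, memory_gb=96)
    print(fits_in_numa_node(host, vcpus=4, vm_memory_gb=32))  # True
    print(fits_in_numa_node(host, vcpus=8, vm_memory_gb=32))  # False: too many vCPUs
    print(fits_in_numa_node(host, vcpus=4, vm_memory_gb=64))  # False: too much RAM
```

Both dimensions are checked, because a VM whose vCPUs fit but whose RAM does not will still need remote memory.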

CPU Over-commitment

It is okay to overcommit CPU resources, but it is not okay to overutilize them. In other words, you can allocate more virtual CPUs (vCPUs) than there are physical cores (pCores) in an ESX host, as long as the aggregate workload does not exceed the capabilities of the physical processors. Overutilizing the physical host can cause excessive wait states for VMs and their applications while the ESX scheduler is busy scheduling processor time for other VMs.
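The distinction between overcommitment and overutilization can be made concrete with two numbers. The first is the ratio of allocated vCPUs to pCores. The second is aggregate CPU demand as a fraction of host capacity. The sketch below is a simplified illustration. The host and VM figures are hypothetical, the model ignores hyper-threading, and it uses average demand, so short peaks are not captured.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VmDemand:
    vcpus: int      # vCPUs allocated to the VM
    avg_mhz: float  # measured average CPU demand of the VM, in MHz


def overcommit_ratio(vms: List[VmDemand], physical_cores: int) -> float:
    """Allocated vCPUs per physical core; values above 1.0 mean overcommitted."""
    return sum(vm.vcpus for vm in vms) / physical_cores


def host_utilization(vms: List[VmDemand], physical_cores: int, core_mhz: float) -> float:
    """Aggregate CPU demand as a fraction of the host's total core capacity."""
    return sum(vm.avg_mhz for vm in vms) / (physical_cores * core_mhz)


if __name__ == "__main__":
    # Hypothetical host: 8 cores at 2,500 MHz running four 4-vCPU VMs
    # that each average 2,500 MHz of demand.
    vms = [VmDemand(vcpus=4, avg_mhz=2500) for _ in range(4)]
    print(overcommit_ratio(vms, physical_cores=8))                 # 2.0
    print(host_utilization(vms, physical_cores=8, core_mhz=2500))  # 0.5
```

In this example the host is overcommitted 2:1 but only 50% utilized, which is the acceptable case described above. A utilization value approaching 1.0 is the warning sign.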

Most applications are not CPU bound when disk and memory resources are sized correctly, so it is generally fine to overcommit vCPUs to pCores on the ESX hosts where the workloads will run. However, in any overcommitted deployment we recommend monitoring host CPU utilization and VM Ready Time. We also recommend using the Distributed Resource Scheduler (DRS) to load balance VMs across hosts in a vSphere Cluster.

VM Ready Time, host CPU utilization, and other important resource statistics can be monitored with esxtop or from the Performance tab in the vSphere Client. You can also configure alarms and triggers in vCenter to email administrators or perform other automated actions when performance counters reach critical thresholds that would affect the end-user experience.
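The vSphere Client performance charts report CPU Ready as a summation: milliseconds of ready time accumulated during each sample interval. The real-time charts use a 20-second interval. The conversion below turns that summation into a percentage, averaged per vCPU. It assumes the value comes from the VM-level counter, which sums ready time across all of the VM's vCPUs. The function name is a hypothetical helper, not part of any VMware API.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0, num_vcpus: int = 1) -> float:
    """Convert a CPU Ready summation value to an average per-vCPU percentage.

    ready_ms:   milliseconds of ready time in one sample interval, summed
                across all of the VM's vCPUs
    interval_s: sample interval in seconds (20 s for real-time charts)
    num_vcpus:  number of vCPUs allocated to the VM
    """
    if interval_s <= 0 or num_vcpus < 1:
        raise ValueError("interval_s must be positive and num_vcpus at least 1")
    # percent = ready_ms / (interval_ms * vCPUs) * 100
    return ready_ms * 100.0 / (interval_s * 1000.0 * num_vcpus)


if __name__ == "__main__":
    # 2,000 ms of ready time in one 20 s real-time sample for a 2-vCPU VM.
    print(cpu_ready_percent(2000, interval_s=20, num_vcpus=2))  # 5.0
```

Longer historical chart intervals accumulate more milliseconds per sample, so always pass the interval that matches the chart you are reading.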

See the Performance Troubleshooting for VMware vSphere 4 guide for detailed information on performance troubleshooting.

vCPU Resources

Reduce the number of vCPUs allocated to your VM to the fewest required to sustain your workload. Over-allocating vCPUs causes excessive and unnecessary CPU overhead and idle time on the physical host. When memory and disk resources are sized appropriately, most applications are not CPU bound. If your VM sustains less than 60% CPU utilization during peak workloads, we recommend halving the number of vCPUs allocated to it.
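The 60% guideline can be sketched as a small helper. This is an illustration only. The function name and parameters are hypothetical. The decision to round odd vCPU counts down and never go below one vCPU is an assumption that the text above does not specify.

```python
def recommended_vcpus(current_vcpus: int, peak_sustained_util_pct: float,
                      threshold_pct: float = 60.0) -> int:
    """Apply the "halve vCPUs below 60% sustained peak utilization" guideline.

    peak_sustained_util_pct is the VM's sustained CPU utilization during peak
    workload, as a percentage of its allocated vCPU capacity.
    """
    if current_vcpus < 1:
        raise ValueError("current_vcpus must be at least 1")
    if peak_sustained_util_pct < threshold_pct:
        # Halve the allocation, rounding down, but never below one vCPU.
        return max(1, current_vcpus // 2)
    return current_vcpus


if __name__ == "__main__":
    print(recommended_vcpus(4, 45))  # 2: under 60%, so halve
    print(recommended_vcpus(4, 75))  # 4: busy enough, keep as-is
    print(recommended_vcpus(1, 20))  # 1: cannot go below one vCPU
```

After any reduction, keep monitoring the VM, because the utilization it sustains on fewer vCPUs will be proportionally higher.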

VM Memory Allocation

If you see periods of high, sustained CPU utilization on your VM, this may actually be caused by memory backpressure or a poorly performing disk subsystem. We recommend first increasing the memory allocated to the VM. As a best practice for Java workloads, set the VM memory reservation equal to the total allocated memory. Then, before increasing the number of vCPUs, watch VM CPU utilization, VM disk I/O, and in-guest swapping (which can cause excessive disk I/O) for signs of improvement or other issues.

Memory Resources

To configure memory reservations: ‘myVM’ -> Summary tab -> Edit Settings -> Resources -> Memory -> Reservation
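To audit whether a VM's reservation matches its allocated memory, you could read the VM's .vmx file. The sketch below makes some assumptions. It assumes that allocated memory is stored under the `memsize` key and the memory reservation under `sched.mem.min`, both in MB. Verify this against your own .vmx files. Make any changes through the vSphere Client path above rather than by editing the file. The helper names are hypothetical.

```python
from typing import Dict


def parse_vmx(text: str) -> Dict[str, str]:
    """Parse simple `key = "value"` lines from .vmx file text into a dict."""
    entries: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and anything unexpected
        key, _, value = line.partition("=")
        entries[key.strip()] = value.strip().strip('"')
    return entries


def reservation_matches_allocation(vmx: Dict[str, str]) -> bool:
    """True if the reservation (assumed sched.mem.min, MB) covers memsize (MB)."""
    memsize_mb = int(vmx["memsize"])
    reservation_mb = int(vmx.get("sched.mem.min", "0"))  # no key: no reservation
    return reservation_mb >= memsize_mb


if __name__ == "__main__":
    sample = 'memsize = "8192"\nsched.mem.min = "4096"\n'
    print(reservation_matches_allocation(parse_vmx(sample)))  # False
```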

Network Resources

Storage Resources

VMFS Datastores

Do not oversubscribe VMFS datastores. Disk I/O throughput and latency are bound by the physics of the underlying disks, so storage design affects performance in a virtual environment just as much as in a physical one. Design your VM’s storage with enough spindles to satisfy the I/O requirements of databases, indexes, redo logs, blob stores, and similar data.
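A common way to estimate spindle count starts by converting front-end IOPS into back-end IOPS using a RAID write penalty. You then divide by what one disk can sustain. The sketch below uses the commonly cited penalties of 2 for RAID 10, 4 for RAID 5, and 6 for RAID 6. The per-spindle IOPS figure is a hypothetical input that you should replace with your disk vendor's numbers. The model ignores array cache, which can change the real picture considerably.

```python
import math

# Commonly cited back-end write penalties (assumption: array caching and
# vendor-specific RAID implementations can change the effective penalty).
RAID_WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}


def spindles_required(read_iops: float, write_iops: float,
                      raid_level: str, iops_per_spindle: float) -> int:
    """Estimate the number of disks needed to serve a front-end I/O load."""
    penalty = RAID_WRITE_PENALTY[raid_level]
    # Each front-end write turns into `penalty` back-end I/Os.
    backend_iops = read_iops + write_iops * penalty
    return math.ceil(backend_iops / iops_per_spindle)


if __name__ == "__main__":
    # Hypothetical workload: 1,200 reads/s and 800 writes/s on disks
    # assumed to sustain about 180 IOPS each.
    print(spindles_required(1200, 800, "raid10", 180))  # 16
    print(spindles_required(1200, 800, "raid5", 180))   # 25
```

The same workload needs noticeably more disks on RAID 5 than on RAID 10 because of the higher write penalty. This is one reason write-heavy data such as redo logs deserves separate consideration.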

See the Performance Troubleshooting for VMware vSphere 4 guide for detailed information on performance troubleshooting. Remember that insufficient memory allocation can cause excessive memory swapping and disk I/O. See the Memory Resources section for information on tuning VM memory resources.

PVSCSI Paravirtualized SCSI Adapter

RDM devices versus VMFS Datastores

There is no significant performance benefit to using RDM devices instead of VMFS datastores. We recommend using VMFS datastores unless your storage vendor has specific requirements, such as supporting hardware snapshots or replication in a virtual environment.

VMDK Disk Devices

Configure your VM’s VMDK disk devices as eager-zeroed thick so that every block is zeroed when the VMDK is created. By default, new thick VMDK disk devices are created lazy-zeroed. This causes duplicate I/O the first time each block is written: the block is first zeroed, and then your application data is written. This can cause significant performance overhead for disk-I/O-intensive applications.
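The duplicate-I/O effect described above can be illustrated with a deliberately simplified model. In this model, the first write to each block of a lazy-zeroed disk costs one extra zeroing write. An eager-zeroed disk pays that cost once, at creation time. This is not how ESX accounts for I/O internally, because real zeroing happens in larger chunks and may be coalesced. The function name is hypothetical.

```python
from typing import Iterable


def physical_writes(block_writes: Iterable[int], eager_zeroed: bool) -> int:
    """Count back-end writes for a sequence of guest block writes.

    Simplified model: on a lazy-zeroed disk the first write to a block incurs
    an extra zeroing write; an eager-zeroed disk was zeroed when created.
    """
    touched = set()
    writes = 0
    for block in block_writes:
        if not eager_zeroed and block not in touched:
            writes += 1  # zero the block on first touch
            touched.add(block)
        writes += 1      # the application's own write
    return writes


if __name__ == "__main__":
    pattern = [0, 1, 2, 0, 1]
    print(physical_writes(pattern, eager_zeroed=False))  # 8
    print(physical_writes(pattern, eager_zeroed=True))   # 5
```

The penalty is largest for workloads that keep writing to fresh blocks, such as a newly deployed database or log volume. It fades once every block has been touched.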

To convert an existing lazy-zeroed thick VMDK disk device to eager-zeroed thick from the ESX CLI, run:

vmkfstools -k /vmfs/volumes/path/to/vmdk

For more information about the ESX CLI, see the vSphere Command-Line Interface Documentation at http://www.vmware.com/support/developer/vcli/

Fibre Channel Storage

If you use Fibre Channel storage, configure the maximum queue depth on the FC HBA according to your HBA and storage array vendors' recommendations.
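One rule of thumb published by some storage array vendors is T ≥ P × q × L. Here T is the queue depth of an array target port, P is the number of host paths connected to that port, q is the per-LUN queue depth on each HBA, and L is the number of LUNs presented through the port. The relation helps bound q so that hosts cannot flood the port. Treat this as an assumption to check against your own array vendor's documentation. The function name and example numbers below are hypothetical.

```python
def max_hba_queue_depth(target_port_queue_depth: int, host_paths: int, luns: int) -> int:
    """Largest per-LUN HBA queue depth q satisfying T >= P * q * L."""
    if host_paths < 1 or luns < 1:
        raise ValueError("host_paths and luns must be at least 1")
    # Integer division rounds down, so P * q * L never exceeds T.
    return target_port_queue_depth // (host_paths * luns)


if __name__ == "__main__":
    # Hypothetical array port with a queue depth of 2048, shared by
    # 8 host paths, each seeing 16 LUNs through that port.
    print(max_hba_queue_depth(2048, host_paths=8, luns=16))  # 16
```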

IP-Based Storage

vSphere Cluster Recommendations

VMware vMotion

Use dedicated physical NIC ports, teams, and VLANs for vMotion traffic to avoid contention between client/server I/O, storage I/O, and vMotion traffic.

VMware HA

Confirm that VMware HA is enabled on the vSphere Cluster so that your VMs are automatically restarted on the remaining hosts after an unplanned hardware failure.

VMware DRS

Reference Materials

Zimbra vSphere Best Practices

http://files2.zimbra.com/zca/zca-6.0.7_GA_341/doc/Zimbra_on_vSphere_Performance_Best_Practices.pdf

Performance Best Practices for VMware vSphere 4.0

http://www.vmware.com/pdf/Perf_Best_Practices_vSphere4.0.pdf

VMware vSphere 4 Performance with Extreme I/O Workloads

http://www.vmware.com/pdf/vsp_4_extreme_io.pdf

Performance Troubleshooting for VMware vSphere 4

http://communities.vmware.com/servlet/JiveServlet/download/10352-1-28235/vsphere4-performance-troubleshooting.pdf

Understanding Memory Resource Management in VMware ESX Server

http://www.vmware.com/files/pdf/perf-vsphere-memory_management.pdf

Comparison of Storage Protocol Performance in VMware vSphere 4

http://www.vmware.com/files/pdf/perf_vsphere_storage_protocols.pdf

Best Practices for Running vSphere on NFS Storage

http://vmware.com/files/pdf/VMware_NFS_BestPractices_WP_EN.pdf

Configuration Maximums for VMware vSphere 4.0

http://www.vmware.com/pdf/vsphere4/r40/vsp_40_config_max.pdf

What’s New in VMware vSphere 4: Performance Enhancements

http://www.vmware.com/files/pdf/vsphere_performance_wp.pdf