One of the key benefits of virtualization is the ability to achieve a high consolidation ratio, thereby getting higher utilization of the hardware. This is especially true of the CPUs, since software licenses are usually tied to the number of CPUs in the hardware. During times of heavy utilization, the environment needs to be configured to make sure VMs with SLAs have their resources protected.
Remember best practices are recommendations which are dependent on their context. One often looks at two sources, stating a best practice that conflicts with each other. The difference can be the context. Here is a perfect example; one best practice is to never overcommit VMs running Oracle applications with stringent performance SLAs. Instead, protect the VMs with features like memory reservations, right size virtual CPUs, resource pools, allocation management mechanisms; such as Storage I/O Control (SIOC) and Network I/O Control (NIOC), etc. This is strongly recommended when virtualizing high workload intensive critical Oracle applications. The virtualized environment should be able to guarantee the resource and the Quality of Service (QoS), needed to meet the business requirements.
Another best practice in conflict with the previous one is to allow some level of overcommitment from Oracle environments. This is a great way to leverage all the features of virtualization by squeezing every ounce of utilization from your hardware that you can. vSphere can manage resource sharing with its algorithms for fair share CPU scheduling, memory entitlement, NIOC, SIOC, and resource pools. However, this approach requires that the virtualization team have a lot of expertise and experience managing the overcommitting of resources and at the same time, ensuring the business SLA requirements are met. Latency-sensitive environments need to perform operations at the millisecond level. In order to achieve both goals, not having the required skill set will severely affect applications, especially when the environment grows or the application encounters increased utilization.
The best way to approach this issue is to start conservative and grow into aggressive, as and when you attain the required level of confidence with the workloads. The recommended way to go with over commitment would be:
- Overcommit your development and test environments as much as you can, staying within common sense and meeting defined requirements.
- Try not to overcommit production high profile environments, unless your expertise is ready for it. Initially, be conservative and do not overcommit production environments with SLAs. Then you can begin overcommitment of databases that do not have high utilization or strict performance SLAs. Use this strategy to build success and confidence with your users that the Oracle software will perform well in a VM.
- Once you develop the right level of expertise, you can overcommit some production environments, if you have guidelines that ensure SLAs are always met. Your team must be good with resource pools, setting DRSpriorities and rules, I/O controls (Storage and Network), and SR-IOV, etc. To say you absolutely do not overcommit production environments is a simple answer, but it is not always the correct one. Over committing allows much higher utilization of your hardware, but requires you to be smart as to when and how you overcommit.
Goals of Best Practices for Virtualizing Oracle
A goal for best practices is to reduce the possibility of errors and minimize variables when trouble shooting.
- Develop virtualization best practices and make sure they are consistently followed.
- Build analytical skills and metricknowledge around the four areas you are virtualizing: Memory, CPU, Storage, and Networking.
- Understand dependencies and inter-dependencies of the various layers of the stack.
- Educate the DBAs about the key metrics they need to understand about the virtual infrastructure, so they can determine if it is a virtualization issue or an Oracle issue.
- Building custom widgets, using vCOPSfor DBAs, to be able to look at the virtual infrastructure the same way they would look at storage and networking in physical server environments.
- Your bench marking should allow you to create consistent and reproducible results that you can compare against. Metrics should always be quantitative.
- With VMware, develop best practices around vCenterand vCenter Operations, or the management and monitoring software you are using. Understand this is going to take time as well as the development of skill and expertise.
- The approach that since the Oracle software has no knowledge if the underlying platform is physical or virtualized, the DBAs do not need to know about the virtual infrastructure will not help solve problems. DBAs and vAdmins need to work together as a team to effectively troubleshoot issues.
What are key metrics for virtualization? With any infrastructure it comes down to people, processes, and technology; learning to understand key metrics with tools like esxtop will be helpful. KISS (Keep It Simple Stupid) applies, because complex systems fail in complex ways. Good design is critical. It’s important to develop internal best practices, management processes, and guidelines for managing and monitoring a virtualization environment. It is vital to ensure your infrastructure management is ready to handle tier one workloads and the dynamics they can create.
Posted by Charles Kim, VMware vExpert, Oracle ACE Director