Optimizing Partition Performance

In 1973, 35 years ago, IBM first shipped a System/370 model with partitioning capabilities. The first partitioning support for AIX* arrived much more recently, in 2001, and the virtualization of 2008 goes beyond what could have been imagined all those years ago. Every new release of AIX has brought additional enhancements and features. To enable partitioning, a layer of code sits between the operating systems and the hardware. This layer is called a hypervisor, and its job is to control access to the hardware resources.

The IBM* POWER* Hypervisor* is efficient at allocating shared resources among partitions, but how those partitions are configured can affect the performance of each partition and of the system as a whole. The top three configuration choices for maximizing your partitions’ performance are using dedicated-processor partitions, allocating shared-processor partitions in whole-processor units, and minimizing the number of virtual processors configured. Let’s examine these settings along with their potential downsides, which, as with most performance-tuning tips, come with the territory.

Underlying Hardware Background

To understand partitioning performance, some background is needed on how partitions and hardware interact. Figure 1 shows a POWER5* chip, which contains two simultaneous multithreading (SMT) processor cores and a single L2 cache. The L2 cache stores instructions and data that the processors need. Though the L2 cache is relatively small (1.88 MB), accessing it is much faster than getting data “off-chip” from the L3 cache or main storage. The two processors share the L2 cache, a fact that matters for partition-performance optimization.

Each chip can also be thought of as being grouped with some amount of L3 cache and main storage. The chip can also access additional main storage that is grouped with the other chip. We refer to the main storage grouped directly with a chip as local and the rest of the main storage as remote. It’s slightly faster for a processor to access local memory.

Dedicated-Processor Partitions

Dedicated-processor partitions perform best. Each is configured so no other partition can use the processors assigned to it while the partition is active (powered on). A key advantage of dedicated-processor partitions is that the hypervisor tries to maximize the amount of the partition’s memory that’s local to the processors it uses, resulting in faster memory access. Another advantage is that, if the partition owns the entire chip (has both processors), it never has to share the L2 cache with any other partitions. This benefit drives the next recommendation about shared-processor partitions.

The lack of sharing gives dedicated-processor partitions the best performance, but it’s also the main disadvantage of configuring partitions this way. In dedicating processors to a partition, you optimize an individual partition’s performance at the potential expense of overall system utilization. For example, if you have a processor dedicated to a partition whose CPU utilization never exceeds 50 percent, the rest of the system can’t use the leftover half processor.
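The arithmetic behind that example can be sketched in a few lines of Python. This is only an illustration: the function name `stranded_capacity` and its parameters are hypothetical, not part of any IBM tool, and the calculation assumes the partition's peak utilization is known.

```python
def stranded_capacity(dedicated_processors, peak_utilization):
    """Return the processor capacity a dedicated partition never uses.

    dedicated_processors: number of processors dedicated to the partition.
    peak_utilization: highest CPU utilization the partition reaches (0.0 to 1.0).
    Both names are illustrative assumptions, not an IBM interface.
    """
    if not 0.0 <= peak_utilization <= 1.0:
        raise ValueError("peak_utilization must be between 0.0 and 1.0")
    # Whatever the partition never uses is unavailable to other partitions.
    return dedicated_processors * (1.0 - peak_utilization)


if __name__ == "__main__":
    # One dedicated processor that never exceeds 50 percent utilization.
    print(stranded_capacity(1, 0.50))
```

Running the same calculation across all dedicated-processor partitions gives a rough sense of how much capacity the rest of the system is giving up.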

Whole-Processor Shared Partitions

Shared-processor partitions configured in whole units (i.e., 1.00, 2.00, etc.) are a compromise worth considering to balance individual partition performance with overall system performance. The main difference from dedicated-processor partitions is that shared-processor partitions share processor resources with one another and aren’t assigned to specific processors. The other difference is that memory for shared-processor partitions is striped across all nodes in the system that are included in the shared-processor pool. This means that on a multi-node system, where multiple nodes exist in the shared pool, a portion of the partition’s main storage will be local and a portion remote.
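To get a feel for how striping splits memory into local and remote portions, here is a minimal Python sketch. It assumes the memory is striped evenly across the nodes in the shared pool and that the partition is running on a single node at a given moment. Both are simplifying assumptions made for illustration, and the function name is hypothetical.

```python
def local_memory_fraction(nodes_in_pool):
    """Estimate the fraction of a partition's memory that is local.

    Assumes memory is striped evenly across every node in the shared
    pool and that the partition is currently running on one node.
    These are illustrative assumptions, not documented hypervisor behavior.
    """
    if nodes_in_pool < 1:
        raise ValueError("nodes_in_pool must be at least 1")
    # With even striping, each node holds an equal share of the memory.
    return 1.0 / nodes_in_pool


if __name__ == "__main__":
    for nodes in (1, 2, 4):
        local = local_memory_fraction(nodes)
        print(f"{nodes} node(s): {local:.0%} local, {1.0 - local:.0%} remote")
```

Under these assumptions, the more nodes the shared pool spans, the larger the share of memory accesses that go to remote storage.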

The advantage of using whole units when configuring shared-processor partitions is that it limits the effect of partitions sharing the same processor. While shared-processor partitions can be assigned to run on any processor in the shared pool, and can move from processor to processor, the hypervisor tries to keep each partition on the same processors. This helps the partition by keeping more of its data in the same L2 cache. If a partition isn’t sized in whole-processor increments, it’ll be sharing at least one processor with another partition. Each time a partition gets its turn on the shared processor, it’ll find that the previously running partition has displaced much of the data it needs from the L2 cache. So at first it won’t run as efficiently, because it’s getting this data from main storage. This inefficiency may last only a short while, but it can have a noticeable effect on very small partitions that have the processor only for short bursts. A partition sized at 0.90 processing units might see little effect, while a 0.10 processing unit partition could have noticeably reduced performance.
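A rough model shows why the same cache refill cost hurts a small partition more than a large one. The sketch below is purely illustrative: the 10 ms dispatch window and the 0.2 ms warm-up cost are assumed values chosen for the example, not measurements, and the function name is hypothetical.

```python
def warmup_overhead(processing_units, dispatch_window_ms=10.0, warmup_ms=0.2):
    """Estimate the share of a partition's run time spent refilling the cache.

    processing_units: fractional entitlement, e.g. 0.10 or 0.90.
    dispatch_window_ms: assumed length of one dispatch window.
    warmup_ms: assumed time spent reloading displaced data on each dispatch.
    All defaults are illustrative assumptions, not measured values.
    """
    if not 0.0 < processing_units <= 1.0:
        raise ValueError("processing_units must be in (0.0, 1.0]")
    # Time the partition actually holds the processor in each window.
    run_time_ms = processing_units * dispatch_window_ms
    # The warm-up cost is fixed, so shorter bursts lose a larger share.
    return min(1.0, warmup_ms / run_time_ms)


if __name__ == "__main__":
    for units in (0.90, 0.50, 0.10):
        print(f"{units:.2f} processing units: "
              f"{warmup_overhead(units):.1%} of run time spent warming the cache")
```

Because the warm-up cost is fixed per dispatch while the run time scales with the entitlement, the modeled overhead for the 0.10 partition is about nine times that of the 0.90 partition.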

Eric Barsness is a software engineer at IBM in Rochester, Minn. He's a member of the IBM Systems and Technology Group, System i Lab Services. Eric has specialized in System i performance for 14 years.
