Implementing High Availability for Workload Partitions
The November 2007 release of the AIX 6 OS introduced workload partitions (WPARs), an exciting new feature that can improve administrative efficiency and help with server consolidation. System WPARs divide a single instance of the AIX OS into multiple virtual instances, each running as a separate virtual partition. Administrators can therefore deploy multiple AIX environments without the overhead of managing an individual AIX image for each one.
When deploying WPAR environments, careful consideration is needed to ensure maximum availability, because new single points of failure can be introduced into the environment: the network between the WPAR host and the NFS server, the NFS server itself, the WPAR(s), the OS hosting the WPAR(s) and the WPAR applications.
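To make that list of dependencies concrete, the following Python sketch flags which links in a WPAR's dependency chain are still single points of failure. It is a toy model under stated assumptions: the component names mirror the list above, a component counts as protected only if it has a redundant copy or can be relocated to another host, and the redundancy values in the "clustered" layout are illustrative choices rather than a description of any real cluster.

```python
"""Flag single points of failure in a WPAR's dependency chain.

Hypothetical model: a component is protected if it has a redundant copy or
can be relocated to another node; anything else is a single point of failure
for the WPAR application. The example layouts are illustrative only.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    copies: int = 1            # independent instances able to serve this role
    relocatable: bool = False  # can be restarted elsewhere (e.g. by moving the WPAR)


def single_points_of_failure(chain: list[Component]) -> list[str]:
    """Names of components with neither a redundant copy nor a way to relocate."""
    return [c.name for c in chain if c.copies < 2 and not c.relocatable]


# The dependencies listed in the article, with nothing made highly available.
standalone = [
    Component("network to NFS server"),
    Component("NFS server"),
    Component("WPAR"),
    Component("hosting OS"),
    Component("WPAR application"),
]

# Illustrative clustered layout: NFS server and hosting LPARs duplicated, the
# WPAR and its application able to move between hosts. The network is left
# single on purpose to show that it still needs separate attention.
clustered = [
    Component("network to NFS server"),
    Component("NFS server", copies=2),
    Component("WPAR", relocatable=True),
    Component("hosting OS", copies=2),
    Component("WPAR application", relocatable=True),
]

if __name__ == "__main__":
    print(single_points_of_failure(standalone))
    print(single_points_of_failure(clustered))  # ['network to NFS server']
```

Note that "relocatable" only helps if something actually performs the move; in the design recommended here, that is HACMP's job.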
The High Availability Cluster Multi-Processing (HACMP) 5.4.1 release introduces basic WPAR support, which allows a resource group to be started within a WPAR. However, several limitations currently exist. The biggest drawback is that HACMP doesn’t manage or monitor the WPAR itself; it manages only the applications that run within a WPAR. Because this is such a key point, and with best practice in mind, I recommend supporting WPARs with HACMP in a way that differs considerably from the default method HACMP provides.
Recommendations for High Availability
First, and most importantly, I recommend that every WPAR clustered for high availability be designed and created so that it isn’t fixed to any individual node or LPAR, using what’s known as WPAR mobility. WPAR mobility is an extension of WPAR that allows a WPAR to be moved from node to node independently of products such as HACMP. It requires at least a third node, which acts as an NFS server and hosts, at minimum, the /, /home, /var and /tmp filesystems for each WPAR.
This approach has several major advantages, not least that the WPARs can be moved by checkpointing and restoring them. The checkpoint facility saves the current state of the WPAR and its application(s), and a restore restarts them from that saved state. Checkpoints can be taken repeatedly at any interval, for example every hour, and during recovery the appropriate checkpoint is restored, returning the system to its last known good state. Furthermore, I recommend that:
- The service-type IP addresses for the WPARs are defined to the WPARs themselves and not to HACMP.
- WPAR mobility is set up and tested outside of HACMP control before the WPARs are put under HACMP control.
- HACMP controls the startup, shutdown/checkpoint and movement of the WPAR to the backup nodes.
- HACMP monitors the health of the WPAR applications using process and/or custom application monitoring.
- For multiple WPAR configurations, use a mutual takeover implementation to balance the workload of the AIX servers hosting the WPARs and of the NFS servers.
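The mutual takeover recommendation can be sketched in Python as a placement plan. The node names and every WPAR name except zion are hypothetical, there is assumed to be one resource group per WPAR, and the sketch models only where each group runs; it is not an HACMP resource group policy definition. Each node is home to half of the groups and acts as takeover node for the other half, so both hosts do useful work while either can absorb the whole load after a failure.

```python
"""Plan a two-node mutual takeover for a set of WPAR resource groups.

Hypothetical: node names and all WPAR names except zion are made up, and this
only models placement, not any HACMP configuration object.
"""
from typing import Optional


def mutual_takeover(groups: list[str], nodes: tuple[str, str]) -> dict[str, tuple[str, str]]:
    """Map each resource group to (home node, takeover node), alternating homes."""
    return {g: (nodes[i % 2], nodes[(i + 1) % 2]) for i, g in enumerate(groups)}


def placement(plan: dict[str, tuple[str, str]], failed: Optional[str] = None) -> dict[str, str]:
    """Node each group runs on; groups whose home node failed move to their backup."""
    return {g: (backup if home == failed else home) for g, (home, backup) in plan.items()}


def load(where: dict[str, str]) -> dict[str, int]:
    """Number of resource groups running on each node."""
    counts: dict[str, int] = {}
    for node in where.values():
        counts[node] = counts.get(node, 0) + 1
    return counts


if __name__ == "__main__":
    plan = mutual_takeover(["zion", "wpar2", "wpar3", "wpar4"], ("nodeA", "nodeB"))
    print(load(placement(plan)))                  # {'nodeA': 2, 'nodeB': 2}
    print(load(placement(plan, failed="nodeA")))  # {'nodeB': 4}
```

The same placement logic would apply to a pair of NFS servers, each serving the filesystems for half of the WPARs and standing by for the other half.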
Figure 1 shows an example of a highly available WPAR environment with resilience for both the NFS server and the WPAR hosting partitions. The WPAR zion is under the control of HACMP and uses filesystems from both the local host and the NFS server, as shown in Figure 2. Note that moving wparRG checkpoints all running applications, which then resume automatically from the checkpointed state on the backup node. No application startup is required, although a short period of downtime occurs during the move.
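The checkpoint behavior described in this article, a resource group move that resumes applications instead of restarting them and recovery to the last known good checkpoint, can be illustrated with a toy Python model. It rests on loud assumptions: SharedStorage is a hypothetical stand-in for the NFS server, a small dict stands in for the application's memory, and the known_good flag assumes some separate validation step marks checkpoints as trustworthy. It does not call any AIX or HACMP command.

```python
"""Toy model of checkpoint-based WPAR movement and recovery.

Hypothetical throughout: node names, SharedStorage standing in for the NFS
server, and a dict standing in for application state. A real checkpoint
captures the complete process image of the WPAR.
"""
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Checkpoint:
    label: str
    state: dict
    known_good: bool = True  # assumed to be set by a separate validation step


@dataclass
class SharedStorage:
    """Checkpoints readable by every node, oldest first, keyed by WPAR name."""
    saved: dict = field(default_factory=dict)

    def save(self, wpar: str, cp: Checkpoint) -> None:
        self.saved.setdefault(wpar, []).append(cp)

    def newest_good(self, wpar: str) -> Optional[Checkpoint]:
        good = [cp for cp in self.saved.get(wpar, []) if cp.known_good]
        return good[-1] if good else None


@dataclass
class Node:
    name: str
    running: dict = field(default_factory=dict)  # WPAR name -> live app state


def planned_move(wpar: str, source: Node, target: Node, storage: SharedStorage) -> None:
    """Checkpoint and stop on the source, then resume on the target.

    The application is unavailable between the pop() and the final assignment,
    but it is not restarted: it continues from the checkpointed state.
    """
    storage.save(wpar, Checkpoint("move", dict(source.running.pop(wpar))))
    target.running[wpar] = dict(storage.newest_good(wpar).state)


def recover_after_crash(wpar: str, target: Node, storage: SharedStorage) -> str:
    """The source node died without a fresh checkpoint, so fall back to the
    newest known-good periodic checkpoint. Returns that checkpoint's label."""
    cp = storage.newest_good(wpar)
    if cp is None:
        raise RuntimeError(f"no known-good checkpoint for {wpar}")
    target.running[wpar] = dict(cp.state)
    return cp.label


if __name__ == "__main__":
    storage = SharedStorage()
    node_a = Node("nodeA", {"zion": {"orders": 10}})
    node_b = Node("nodeB")
    storage.save("zion", Checkpoint("01:00", {"orders": 10}))  # periodic checkpoint
    node_a.running["zion"]["orders"] = 12                      # work done since then

    planned_move("zion", node_a, node_b, storage)
    print(node_b.running["zion"])  # {'orders': 12}

    storage.save("zion", Checkpoint("02:00", {"orders": 14}))
    storage.save("zion", Checkpoint("03:00", {"orders": 15}, known_good=False))
    node_b.running.clear()  # simulate nodeB crashing
    print(recover_after_crash("zion", node_a, storage))  # 02:00
```

The two paths differ in what they preserve: a planned move loses nothing because the checkpoint is taken at the moment of the move, whereas recovery after a crash can only return to the newest known-good periodic checkpoint, so the checkpoint interval bounds how much work can be lost.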
Using this method of integration makes WPAR support independent of the HACMP release, meaning that the same implementation steps can be carried out with any supported version of HACMP, not just 5.4.1.
Alex Abderrazag has worked for IBM since 1994 and has been part of IBM Training for the past six years, specializing in POWER technology, TCP/IP, security and high availability. Alex has more than 17 years of experience working with UNIX systems and has been actively responsible for managing, teaching and developing the AIX/Linux education curriculum.