Implementing a Three-Node Redundant GPFS Cluster
The IBM General Parallel File System (GPFS) is a highly available, scalable, clustered filesystem that provides both availability and performance when files must be shared between LPARs. It provides concurrent shared-disk access to a single global namespace and can be set up in multiple configurations, each providing a different level of redundancy. IBM has an excellent document available on setting up a two-node cluster; in this article, however, we’ll discuss setting up a three-node cluster for full redundancy.
GPFS provides a single global namespace for data across platforms, eliminating the need for NFS and multiple copies of the data. It can handle direct SAN connections from clients and servers, or it can connect over TCP/IP. Data access looks the same to the client either way, because GPFS transparently sends the block I/O request over the TCP/IP network instead of the SAN when the client is network-attached.
GPFS is very scalable and currently supports up to 2 billion files per filesystem, up to 256 filesystems and up to 8,192 nodes in the cluster. AIX, Linux and Windows systems can all participate in the cluster.
The current AIX setup we’re migrating from has two disk subsystems, each providing disks to the LPAR, and those disks are LVM mirrored for cross-subsystem redundancy. GPFS doesn’t support LVM mirroring, so we’ll use failure groups for redundancy instead: all the disks from one disk subsystem go into one failure group and all the disks from the other disk subsystem into a second failure group. In our case, we’re using NPIV, but this will work with vSCSI or direct SAN-attached disks. The reason for using three nodes instead of two is that we don’t want to use a tiebreaker disk; maintaining quorum (a majority of nodes present) without one requires at least three nodes.
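With failure groups, GPFS keeps each replica of data and metadata in a different group, so losing an entire disk subsystem still leaves a complete copy. When the NSDs are created later with mmcrnsd, the grouping is expressed in the stanza file; a sketch, where the NSD names are illustrative:

```
%nsd: device=/dev/hdisk1 nsd=nsd_sub1 servers=gpfs1,gpfs2,gpfs3 usage=dataAndMetadata failureGroup=1
%nsd: device=/dev/hdisk2 nsd=nsd_sub2 servers=gpfs1,gpfs2,gpfs3 usage=dataAndMetadata failureGroup=2
```

The filesystem then has to be created with two data and metadata replicas (-m 2 -r 2 on mmcrfs) for GPFS to actually mirror across the two failure groups.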
To keep it simple, we have three LPARs (gpfs1, gpfs2, gpfs3) and each will have a boot disk (hdisk0) and two GPFS disks (hdisk1 and hdisk2). We’re using NPIV to provide the disks.
The first step is to build the AIX nodes with only the boot disks attached and get them installed at the same version of AIX (7.1 TL03 SP1 in this example). Once the LPARs are booted, set the following on each of the fibre adapters:
chdev -l fcs0 -a max_xfer_size=0x200000 -a num_cmd_elems=1024 -P
Do this on each of the fcs adapters. It’s important that these values be set on the VIO servers first, and that the VIO servers then be rebooted, before the clients are changed.
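On a system with several adapters, the chdev invocations can be generated in a loop. This sketch only prints the commands rather than running them (remove the leading echo to apply the change), and the adapter names fcs0 and fcs1 are examples; on a live system you would list them with lsdev:

```shell
# Print (don't run) the tuning command for each fibre adapter.
# fcs0 and fcs1 are example names; list the real ones with
# "lsdev -Cc adapter | grep fcs" on AIX.
for adapter in fcs0 fcs1; do
  echo chdev -l "$adapter" -a max_xfer_size=0x200000 -a num_cmd_elems=1024 -P
done
```

The -P flag defers the change to the next boot, which is why the reboot order (VIO servers first) matters.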
Also set the same values on the NPIV client, and then zone and map the disks from the first disk subsystem to that LPAR, and run cfgmgr.
In my test, I tried using rendev to rename the disks to make their origin obvious; however, GPFS wouldn’t recognize the disk type when I ran mmcrnsd, so I kept the original names.
Set the following parameters and put a PVID on the disks:
chdev -l hdisk1 -a queue_depth=48 -a reserve_policy=no_reserve -P
chdev -l hdisk1 -a pv=yes
Perform this on every hdisk. At this point, reboot the client LPARs. After the reboot, check that all three LPARs see all the disks and that each shared disk shows the same PVID on every node.
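The two chdev commands can be wrapped in a loop over all the GPFS disks. As before, this sketch only prints the commands (drop the leading echo to run them), and hdisk1/hdisk2 are the example disk names from this setup:

```shell
# Print (don't run) the per-disk tuning and PVID commands for every
# GPFS disk. hdisk1 and hdisk2 are the GPFS disks in this example.
for d in hdisk1 hdisk2; do
  echo chdev -l "$d" -a queue_depth=48 -a reserve_policy=no_reserve -P
  echo chdev -l "$d" -a pv=yes
done
```

After the reboot, running lspv on each node is a quick way to confirm the PVIDs match across all three LPARs.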
The next step is to install the GPFS software and any fixes (in my case, GPFS 3.5 with FP15). Once it’s installed, set up passwordless ssh between all three LPARs. You can harden ssh to ensure that only a subset of servers can use this facility (see reference 5).
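The passwordless-ssh setup follows the usual key-exchange pattern; a sketch, assuming OpenSSH default paths and our example hostnames:

```shell
# Generate an RSA key pair for root if one doesn't already exist.
mkdir -p "$HOME/.ssh"
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa" -q
# The public key then has to be appended to ~/.ssh/authorized_keys on
# each of the other nodes, for example (gpfs2 is one of our nodes):
#   cat ~/.ssh/id_rsa.pub | ssh gpfs2 'cat >> ~/.ssh/authorized_keys'
```

Repeat on all three nodes, and verify with a plain `ssh gpfs2 date` from each node that no password prompt appears; GPFS administration commands will fail if any node pair still prompts.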
On each node, create a file called /usr/local/etc/gpfs-nodes.txt and put in it a list (one per line) of the nodes in the cluster. At this point, you must decide whether or not you’ll use fully qualified names. I used fully qualified names, which I also used for the hostname.
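For example, with fully qualified names, /usr/local/etc/gpfs-nodes.txt would look like this (the domain is illustrative):

```
gpfs1.example.com
gpfs2.example.com
gpfs3.example.com
```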
On all of the nodes, add the following to /etc/environment:
/usr/lpp/mmfs/bin should be added to the end of the PATH
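The resulting PATH line in /etc/environment would look something like this (the leading directories are the AIX defaults and may differ on your system; only the final /usr/lpp/mmfs/bin entry is the addition):

```
PATH=/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin:/usr/lpp/mmfs/bin
```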
Additionally, /etc/security/limits should be updated with fsize=-1 and nofiles=20000 (or another suitably large value). At this point, you’re ready to create the cluster.
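Creating the cluster itself then comes down to a single mmcrcluster invocation. A sketch using our example hostnames (the cluster name and domain are illustrative; each node is marked as a quorum node so that all three count toward the node majority):

```
mmcrcluster -N gpfs1.example.com:quorum,gpfs2.example.com:quorum,gpfs3.example.com:quorum \
  -p gpfs1.example.com -s gpfs2.example.com \
  -r /usr/bin/ssh -R /usr/bin/scp -C gpfscluster -A
```

Here -p and -s name the primary and secondary configuration servers, -r and -R point GPFS at ssh and scp for remote command execution (which is why the passwordless ssh setup matters), and -A starts GPFS automatically when the nodes boot.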