AIX Memory Performance : Large Memory Pages for AIX
AIX expert Mark Ray gives a primer on large memory pages.
By Mark J. Ray | 10/14/2019
In most modern OSes, it’s all about the memory. You can have the fastest processors on the planet driving the most sophisticated storage and have it all connected to a blazingly fast network. But what will happen if you take those three resources and couple them to inadequate amounts of memory? Try that little thought experiment for two seconds and you’ll all come up with the same answer: absolutely nothing. You’ll have a system that’s slow as molasses and good for little else than service as a rather large doorstop.
All flavors of UNIX do the same thing when you boot them: they attempt to load as much of the OS—certainly the kernel, at the very least—into memory so system initialization completes as quickly as possible. And when you start your applications, that code is probably modularized so that big chunks of it will follow the OS into RAM—or as many chunks as will fit. Paging, the process of moving pieces of code from memory out to disk storage and back again, occurs only when a program can’t find a physical memory address range in which to place its code, and previously loaded code must be moved out of that address range to disk to make room for the most recently called code. Severe paging can lead to extreme system slowness and, in the worst cases, can crash a system. And all of this can occur with the best CPUs, storage and network devices. So the first thing to do when installing a new system is to make sure your memory is sized correctly, with a healthy margin over and above your initial requirement to cover you when your workload grows—and it always grows.
In this series, we’re going to take a look at how AIX uses memory, how to deploy and tune that memory for maximum performance, some of the most popular utilities to monitor memory usage in an AIX system, and how to parcel that memory out in different configurations. We’ll also see that we can make different code in the same system use memory in different ways, altering installed executables such that they will adopt new patterns of memory usage. With that, let’s take a look at some basics.
In any computing system, memory is allocated in “pages,” chunks that vary in size depending on the OS. AIX supports four memory page sizes: small (4K), medium (64K), large (16MB) and supreme (16GB). The small page size is common to many OSes; it’s a legacy holdover from the early days of computing, when memory was very expensive and programming code assumed small amounts of it. You’ll find the small, 4K page size in every version of UNIX, and that includes Linux. Most OSes started life supporting only the 4K size, but over time they expanded their memory repertoire to include larger and larger page sizes.
In the case of AIX, it was found that coalescing many 4K pages into a larger 64K—or medium-sized—page significantly increased system performance. So the default memory scheme in AIX evolved into one that supports both the small- and medium-sized pages. Let’s take a look at this default memory scheme and how we can measure it.
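The coalescing arithmetic is easy to sanity-check. This is just back-of-the-envelope shell arithmetic, not an AIX utility:

```shell
# Page sizes in bytes
small=$((4 * 1024))      # one 4K small page
medium=$((64 * 1024))    # one 64K medium page

# How many small pages coalesce into one medium page?
echo $((medium / small))   # prints 16
```

Each medium page replaces 16 small pages, which is where the bookkeeping savings come from.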
Default Memory Scheme
When you install an AIX system, the small (4K) and medium (64K) page sizes will be active. How these page sizes are used is determined by AIX as it doles memory out to whatever you’re running on the box, be it applications, databases, middleware or utilities. The use of medium versus small pages is entirely transparent to running processes; AIX assigns the best page size allocation to a process as it runs, throughout its life cycle.
Generally, this assignment is a mix of small and medium pages for every process in a system. What AIX is aiming for is maximal efficiency in process execution. So how do you tell how any given system is “mixing” the allocation of small and medium pages? That’s easy: you use vmstat and svmon. Vmstat gives you the 10,000-foot view of how memory is being used in your system, while svmon lets you get far more detailed in your memory studies, with different flags yielding more and more information.
Let’s start our memory study with vmstat. Most of you run constant vmstat statistics on your systems to track memory and CPU usage over time. You can add statistics for different memory page sizes by adding the “-p” flag, like this:
vmstat -w -t -p ALL 2
This invocation says to start vmstat with a wide display so all of the data isn’t scrunched together, to timestamp each line, and to include statistics for every size of memory page the system currently has configured. The 2 says to output this data every two seconds. Most of you will see output with 4K and 64K under the pgsz column, like the display in the image below. It’s only if you’ve specifically configured the large (16MB) and supreme (16GB) pages that you will see entries for them.
The svmon utility gets more granular about how the various sized pages are allocated. There are two basic invocations for this purpose. You can list a summary of the memory page size breakdown in your system by simply typing “svmon” – working as root – at a command prompt and hitting <enter>. Your output will look like what you see in the image below.
In this output, we will only concern ourselves with a single column, “PageSize.” In this column, you will see the total number of pages that have been allocated in your system, broken down by size. Notice we have counters in the “s” (small) and “m” (medium) rows, but nothing in the “L” (large) or “S” (supreme) rows; recall that large and supreme pages must be configured manually, while the small and medium sizes are preconfigured by default. The next svmon invocation is done on a per-process basis. In this form, you can take a look at how an individual executable is using memory. The invocation looks like this.
svmon -P 19267612
Where “-P” is the process flag, which is followed by a PID, or process ID.
Example output for this form is in the image above. Here, you see whatever process you’re examining (in this case, a Korn shell process), with the number of memory pages of each size broken out in the “PageSize” column. Again, in this example, we see counters for small and medium pages only. There are many other tidbits of information in this output, like how the different page sizes are allocated to things like the process’ working segments and shared library code; with the addition of other flags, you can obtain this information and drill down deeply into a process’ architecture.
Those are the basics of measuring memory in its default configuration on any AIX system. Now, let’s move on to the really large (pun intended) memory and general system performance enhancers. The rest of this article will deal mostly with the large, 16MB pages, which are common in high-performance computing environments. From a performance angle, the supreme 16GB pages utilize the same concepts as the large pages, so we won’t break them out into a separate section, other than to run down their configuration method.
Large Memory Pages
Large memory pages in AIX made their debut way back with the POWER4 architecture. One of the reasons they were developed had to do with how memory was segmented in the Power architecture and the fact that Power Systems servers required each virtual page in these segments to be the same size.
Another reason had to do with a thing called the TLB or “Translation Lookaside Buffer.” The TLB is a memory cache that’s used to access a user memory location. TLBs are used in most hardware architectures, not just Power Systems boxes; they store recent translations of virtual memory to physical memory.
The advantage to using large pages in these constructs is reduced TLB misses; a larger page size lets a single TLB entry map a larger virtual memory range. Large pages also improve memory prefetching by eliminating the need to restart prefetch operations on 4KB boundaries. But arguably the biggest performance advantage large memory pages have over the small- or medium-sized pages is that they’re nonpageable. This means they’re “pinned” pages, which are exempt from page-outs to disk. Whatever other conditions obtain in your system, large memory pages will stay right where they are: in main memory.
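The “reach” argument is easy to illustrate with arithmetic. Assuming a hypothetical TLB with 1,024 entries (the entry count is an assumption for illustration, not a Power Systems specification), here is how much memory those entries can cover at each page size:

```shell
entries=1024

# Memory covered by 1024 TLB entries of 4K pages, in MB
echo $((entries * 4 * 1024 / 1024 / 1024))                  # prints 4

# Memory covered by 1024 TLB entries of 16MB pages, in GB
echo $((entries * 16 * 1024 * 1024 / 1024 / 1024 / 1024))   # prints 16
```

Same number of TLB entries, four thousand times the reach—which is why workloads with big, hot memory footprints see fewer TLB misses with large pages.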
Building a Large Page Pool
In AIX, you start your configuration of large pages by building a pool of them, carving this pool out of the totality of your available RAM. Building this pool is straightforward: you set two tunables available through vmo (the “Virtual Memory Options” command): lgpg_size (the large page size, in bytes) and lgpg_regions (here, “regions” simply refers to the number of large pages you want to configure). Recently, I configured large pages in a system to be used by several instances of a database. The command I issued was this:
vmo -r -o lgpg_regions=20000 -o lgpg_size=16777216
I used this command string to build a large page pool of roughly 312GB (20,000 pages at 16MB apiece).
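The pool size falls straight out of the arithmetic; this is a back-of-the-envelope check, not an AIX command:

```shell
pages=20000           # lgpg_regions
page_bytes=16777216   # lgpg_size: 16MB in bytes

# Total pool size in GB (integer division truncates 312.5 to 312)
echo $((pages * page_bytes / 1024 / 1024 / 1024))   # prints 312
```

Run the same arithmetic against your own RAM total before you commit: whatever you put in the pool is subtracted from what the 4K and 64K pools can use.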
One thing about building your large page pool: Although you can configure large pages dynamically, you should get in the habit of running a bosboot and then a reboot of your system to make sure the configuration is preserved across restarts. Also, when you build a pool of large pages, the memory you have assigned to this pool cannot be used by any other memory page size; in AIX, groups of 4K pages can be “promoted” into a 64K page and 64K pages can be “demoted” to the 4K size. But a large, 16MB page, in addition to being exempt from paging, is also exempt from this promotion/demotion scheme.
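On my systems, the persistence step looks something like the following command-sequence fragment. This is a sketch: the vmo -r flag already marks the change for the next boot, and /dev/ipldevice is the usual boot device on a standard install—verify yours before running bosboot.

```shell
# Rebuild the boot image so the vmo tunable changes survive restarts
bosboot -ad /dev/ipldevice

# Reboot at the next maintenance window to activate the pool
shutdown -Fr
```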
Pros and Cons of Large Page Pooling
As with any performance tuning endeavor, there are upsides and downsides to implementing large memory pages. What you need to do is make sure your application can make good use of them. Check with your vendor; most can give you a yea or nay quickly, and we will deal with the configuration of large pages for applications later in this series.
Databases, on the other hand, almost always benefit from the use of large memory pages. Individual tools and utilities can also benefit from large pages, depending on what they do. Later in this article I’ll show you how to alter individual programs to use large pages, but first, we need to be able to measure their activity in our systems. So let’s return to our old standby, vmstat.
There are two ways to zero in on large page activity in your system using vmstat. The first is to reuse the invocation I cited earlier in this article:
vmstat -w -t -p ALL 2
This invocation will add a 16M row under the pgsz column in your vmstat output. If you want to invoke vmstat to look at large page activity exclusively, do it like this:
vmstat -lw 2
In this form, a “large-page” column will be added to the output, with two sub-columns under it: “alp” and “flp.” These stand for the number of large pages currently in use and the number of large pages on the freelist, respectively. This output will look like that in the image below.
Svmon output will differ only in that the generic invocation (svmon <enter>) will now show data in the “L” row at the bottom, in addition to the s and m rows. At this stage of the game, your S row (for “supreme,” or 16GB, pages) will still show zeros. And while we’re on the subject of svmon, let’s talk about buffer sizes. What happens when you poll a process that’s using large pages with svmon? Well, for one thing, svmon has a much greater memory area to look at than if it were examining a process that only used small or medium pages, right? So unless you add a specific flag to your svmon command when looking at a large page database process, you’ll probably get this error:
svmon -P 7407798
             Pid Command          Inuse      Pin     Pgsp  Virtual 64-bit Mthrd  16MB
         7407798 *** Max Buffer Segment Size exceeded. Use -O maxbufsize to increase the size***
This is telling you that you need to increase your buffers to get a full snapshot of the large page process. Fortunately, there’s an option that specifically deals with this error. Invoke svmon with the “maxbufsize” option, like this:
svmon -O maxbufsize=16MB -P 7407798
You’ll see the output you expected. In this example, I use a buffer size of 16MB; adjust this size up or down as your needs require. Not all large page processes are subject to this error; it mostly occurs with database processes, regardless of the vendor. Application processes may or may not require the “maxbufsize” flag, while system processes hardly ever do. My best advice when learning svmon is to tinker with its dozens of flags on a test system. You can’t really hurt anything with svmon, but if you mess around with various flag combinations while looking at all of the processes in your system at the same time, you’re going to slow things down.
How Programs Use Large Pages
Now that we have the monitoring of large pages down, let’s look at how to make programs use them. In most database systems, there are settings that enable the use of large memory pages. For example, with Oracle, you enable a setting called “lock_sga,” and in Caché, the same functionality is enabled with the “memlock” parameter. These settings instruct the database instances to look to a configured large page pool for their memory needs, and to use the pages in that pool for various constructs. We should also note that in database systems, there’s a security mechanism that enables the user IDs that actually start database instances to use large pages. This is the “CAP_BYPASS_RAC_VMM” capability; the theory is that the use of large memory pages needs to be carefully considered. You can’t just use large pages for every process in your system, because doing so will quickly run you out of memory. To counter this scenario, the AIX developers came up with a scheme to limit the use of large pages on a user-by-user basis. Let’s say you start your Oracle database instances with the “oracle” user; you would enable this user to use large pages like this.
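A minimal sketch of granting the capability with chuser, assuming the instance owner is a user named “oracle” (substitute your own instance-owner ID):

```shell
# Grant the large-page capability to the database instance owner;
# CAP_PROPAGATE passes the capability on to the user's child processes
chuser capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE oracle

# Verify the attribute took effect
lsuser -a capabilities oracle
```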
And with other database systems, you’d use the same command structure to enable whatever UID you wish to use large pages. Note that the root user can use large pages by default, so there is no need to adjust its abilities in this regard.