Finding the Bottleneck

Using filemon for performance analysis

For whatever reason, I/O seems to be the one component of system performance that doesn't receive much scrutiny in performance analysis. There's also confusion about which tools to use and how to interpret the data they produce. How many vmstat-derived charts have you seen in which the analyst has included the CPU wait column among the utilized CPU resources? This practice is incorrect for two reasons. First, the CPU is typically put into a wait state upon a cache miss, and designers of modern processors use several technologies (fine- and coarse-grained multithreading and simultaneous multithreading) to mitigate that "wait" state. Second, modern disks use DMA, which relieves the processor of all of the I/O work except request initiation.

Also, if a logical volume is spread across numerous, or even hundreds of, disks in a large storage area network (SAN) environment, the I/O-related vmstat values are ambiguous. Therefore, non-zero values in the "wa" column are an indicator to look at I/O using another tool.
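The "wa" check described above is easy to automate. What follows is a minimal sketch, not part of the original analysis: the function name wa_nonzero is hypothetical, and it assumes vmstat prints a header row that labels the I/O wait column "wa". It finds the column by name rather than by position and reports each sample in which the value is above zero.

```shell
# Report vmstat samples whose "wa" (I/O wait) column is non-zero.
# Reads vmstat-style text on stdin. The column is located by its header
# label, so no fixed column position is assumed.
wa_nonzero() {
    awk '
        # Until a header row naming "wa" is seen, keep looking for it.
        col == 0 {
            for (i = 1; i <= NF; i++) if ($i == "wa") col = i
            next
        }
        # Data rows start with a number; report those with wa above zero.
        $1 ~ /^[0-9]+$/ && $col + 0 > 0 { print "line " NR ": wa=" $col }
    '
}
```

A typical use would be vmstat 5 12 | wa_nonzero. Any line it prints is a cue to move on to a more detailed I/O tool, not a measurement of disk load in itself.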

Without a doubt, my tool of choice for analyzing I/O on AIX* is filemon. While iostat can provide some details, this investigation will take me deeper than iostat can go. I also believe that analyzing I/O requires fundamental knowledge of queuing theory, functional knowledge of disk mechanics and I/O workloads, and some knowledge of the Logical Volume Manager (LVM) and the Virtual Memory Manager (VMM).
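To see why queuing theory matters here, consider the textbook single-server (M/M/1) approximation, in which response time is R = S / (1 - U) for service time S and utilization U. The sketch below is only an illustration under that assumption. It is not the model used in this analysis, and the function name mm1_response_ms is hypothetical.

```shell
# Textbook M/M/1 response time: R = S / (1 - U).
# $1: average service time in milliseconds
# $2: utilization as a fraction in the range [0, 1)
mm1_response_ms() {
    awk -v s="$1" -v u="$2" 'BEGIN {
        if (u < 0 || u >= 1) exit 1      # the formula is undefined at U >= 1
        printf "%.2f\n", s / (1 - u)
    }' || { echo "utilization must be in [0, 1)" >&2; return 1; }
}
```

With a 5 ms service time, mm1_response_ms 5 0.5 gives 10.00 and mm1_response_ms 5 0.9 gives 50.00. Under this simple model a disk that looks only moderately busier can respond several times slower.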

This article doesn't define the numerous parameters that can be measured with filemon. For this analysis, I'll focus on physical volume metrics. (Note: Any performance data contained in this article was determined in a controlled environment. Actual results may vary significantly and are dependent on many factors including system hardware configuration and software design and configuration.)

Actual Analysis
In a production environment, several machines were clustered as Lightweight Directory Access Protocol (LDAP) replica servers (using a DB2* database). A capacity study was initiated to determine the performance bottleneck and guide the direction of an upcoming hardware upgrade. The workload was determined to be open with two classes: database reads and database updates. The transaction component deemed most critical was the read response time.

The data shown in Figure 1 (see below) was collected on a production server using the following command:

# filemon -T 1000000 -u -O lv,pv -o /$HOME/filemon.out; sleep 3600; trcstop

(Note: The data-collection period in this analysis was 3,600 seconds. This long collection period follows best practices in capacity analysis. However, 60 to 180 seconds is typically used for performance analysis.)
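For the shorter 60- to 180-second runs mentioned in the note, only the collection window changes. Here is a minimal sketch of a helper that assembles the command line for a given duration without running it. The function name filemon_cmd and the default report name are hypothetical, and the flag usage assumes the AIX filemon convention that -O selects the levels to monitor and -o names the report file.

```shell
# Build (but do not run) a filemon collection command for a given duration.
# $1: collection time in seconds
# $2: report file (hypothetical default: filemon.out in the current directory)
filemon_cmd() {
    secs=$1
    report=${2:-filemon.out}
    # Reject an empty or non-numeric duration.
    case $secs in
        ''|*[!0-9]*) echo "usage: filemon_cmd SECONDS [REPORT]" >&2; return 1 ;;
    esac
    printf 'filemon -T 1000000 -u -O lv,pv -o %s; sleep %s; trcstop\n' \
        "$report" "$secs"
}
```

For example, filemon_cmd 120 /tmp/filemon.out prints a two-minute collection command that can be reviewed and then run as root on the AIX host.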

Tom Farwell is a technical editor for IBM Systems Magazine, Open Systems edition. He can be reached through www.tomfarwellconsulting.com.
