Don't Be Misled By MIPS
Companies are spending large amounts of money on processor capacity based on poorly understood indicators, but no one number describes capacity.
By Ted MacNeil, 11/02/2004
One of the most misused terms in IT has to be MIPS. It's supposed to stand for "millions of instructions per second," but many alternate meanings have been substituted:
- Misleading indicator of processor speed
- Meaningless indicator of processor speed
- Marketing indicator of processor speed
- Management's impression of processor speed
Jokes aside, management tends to want a single figure to represent a processor's capacity. And companies are spending large amounts of money on both software and hardware acquisitions based on a poorly understood indicator.
Unfortunately, no one number describes capacity. Processor speed varies depending on many factors, including (but not restricted to):
- Workload mix
- Memory/cache sizes
- I/O access density
- Software levels (OS and application subsystems)
- Hardware changes
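MIPS itself is simple arithmetic: instructions executed divided by elapsed seconds, expressed in millions. The minimal sketch below computes it for two hypothetical measurement intervals on the same processor. The instruction counts and the `samples` table are invented for illustration. The point is only that one machine yields different figures depending on the workload it runs, not which workload scores higher.

```python
# Hypothetical measurements from one processor running different workloads.
# The instruction counts and intervals are illustrative, not real data.

def mips(instructions: int, seconds: float) -> float:
    """Millions of instructions per second over a measurement interval."""
    return instructions / (seconds * 1_000_000)

# workload name -> (instructions executed, elapsed seconds); assumed values
samples = {
    "online (random access)": (3_000_000_000, 10.0),
    "batch (sequential)": (4_500_000_000, 10.0),
}

for workload, (instructions, seconds) in samples.items():
    print(f"{workload}: {mips(instructions, seconds):.0f} MIPS")
```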
Workload mix is the largest contributor to the variability of capacity figures. An online workload has more of a random access pattern than batch (sequential) processing. Online subsystems, by design, rarely access data in a sequential pattern; they constantly request new records from disk or (hopefully) from buffers.
This brings up the next point: memory and cache sizes can significantly improve the throughput of a processor. Online subsystems buffer data and manage it on a least recently used (LRU) basis, on the theory that recently referenced data is likely to be accessed again. If there is a hit (the data already resides in a buffer), we bypass the expensive physical I/O. The same principle applies to the processor cache. Memory access isn't direct; the processor moves data into a local cache and then accesses it from there. Frequent cache misses, which force the cache to be refilled from memory, make the processor run more slowly.
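The benefit of a buffer or cache hit can be estimated with the usual weighted-average (effective access time) calculation. This is a minimal sketch. The buffer and disk timings are assumptions chosen only to show the shape of the curve, and the same formula applies to the processor cache with memory as the slower tier.

```python
def effective_access_time(hit_ratio: float, t_hit: float, t_miss: float) -> float:
    """Average time per access: hits are served from the buffer/cache,
    misses pay the full cost of the slower tier."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * t_hit + (1.0 - hit_ratio) * t_miss

# Assumed timings in microseconds: buffer hit vs. physical disk I/O.
T_BUFFER_US = 1.0
T_DISK_US = 5_000.0

for h in (0.80, 0.95, 0.99):
    t = effective_access_time(h, T_BUFFER_US, T_DISK_US)
    print(f"hit ratio {h:.2f}: {t:,.1f} us per access")
```

Even a few percentage points of hit ratio change the average cost per access several-fold, which is why memory size shows up in capacity figures.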
The more I/O an application does, the less processor throughput it sustains. This is due to the overhead of interrupt processing, suspending the task until the I/O completes, and re-dispatching the task afterward. Again, larger memory sizes can reduce the impact, if the application makes use of them.
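One way to see the CPU side of this effect is a simple path-length model: each transaction costs a fixed number of instructions plus an extra CPU cost per I/O for interrupt handling and re-dispatch. The sketch below assumes that model and invents every number in it (processor rating, base path length, per-I/O cost). It captures only the processor instructions spent on I/O, not the elapsed time a task spends waiting for the I/O to complete.

```python
# Hypothetical model: each transaction costs a fixed CPU path length plus a
# per-I/O CPU cost (interrupt handling, suspend, re-dispatch). All numbers
# are assumptions for illustration only.

def transactions_per_second(mips: float, base_instr: float,
                            ios_per_txn: int, instr_per_io: float) -> float:
    """CPU-bound transaction rate when each I/O adds processor overhead."""
    instr_per_txn = base_instr + ios_per_txn * instr_per_io
    return mips * 1_000_000 / instr_per_txn

MIPS_RATING = 500.0     # assumed processor rating
BASE_INSTR = 200_000    # assumed instructions per transaction, excluding I/O
INSTR_PER_IO = 20_000   # assumed CPU cost of one I/O

for ios in (0, 5, 20):
    tps = transactions_per_second(MIPS_RATING, BASE_INSTR, ios, INSTR_PER_IO)
    print(f"{ios:2d} I/Os per transaction: {tps:,.0f} transactions/s")
```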
As software enhancements come out, they can and do use differing techniques that can improve or degrade the expected processor capacity. An example is the conversion from 31-bit to 64-bit addressing.
Prior to OS/390 2.10 running in 64-bit mode, mainframes were limited to 2 GB of central storage, with expanded storage satisfying any further requirement. Central storage is byte-addressable; expanded storage is only page-addressable. So if data resides in expanded storage, it has to be moved to central storage before it can be used. This is less expensive than retrieving the page from auxiliary storage, but it still consumes resources.
After implementing 64-bit addressing, all memory is byte-addressable, but managing memory at this level requires more resources. This is partly offset by no longer having the overhead of managing expanded storage, but overall processor throughput drops slightly.
The final major factor is that hardware (or microcode) changes can also lower the instruction rate. In the mid-1980s, before expanded storage existed, IBM changed the way pages were moved from memory to disk. This used to take approximately 5,000 instructions; IBM introduced a single instruction that performed the same movement. This improved performance, but the instruction rate dropped. Is this a capacity reduction? To my mind, no.
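The arithmetic behind that example shows why an instruction rate can fall while useful work rises. In this sketch the 5,000-instructions-per-page figure comes from the text; the time taken per page, before and after the change, is an assumption made up for illustration.

```python
# Illustrative only: moving one page took about 5,000 instructions before the
# change and a single (longer-running) instruction afterward. The times per
# page are assumptions, not IBM measurements.

def rates(instr_per_page: int, seconds_per_page: float) -> tuple[float, float]:
    """Return (MIPS, pages moved per second) for a page-movement workload."""
    pages_per_sec = 1.0 / seconds_per_page
    mips = instr_per_page * pages_per_sec / 1_000_000
    return mips, pages_per_sec

old = rates(5_000, 100e-6)  # assumed: 100 microseconds per page
new = rates(1, 50e-6)       # assumed: 50 microseconds per page

print(f"old: {old[0]:.2f} MIPS, {old[1]:,.0f} pages/s")
print(f"new: {new[0]:.2f} MIPS, {new[1]:,.0f} pages/s")
```

Under these assumptions the page rate doubles while the MIPS figure collapses, so judging the change by MIPS alone would get the answer backward.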
Partitioning a physical processor using Processor Resource/Systems Manager (PR/SM) introduces new factors that affect throughput. First, the processor cache(s) now support different images with, most likely, different workload and data mixes. So not all of the data an image needs will be in the cache, and re-fetching that data reduces throughput. Also, some cycles are required to manage the logical processors, assigning them to physical processors when work has to be done. Some of this management runs "outside" the individual images and some runs within them. Both reduce effective capacity.
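A rough way to reason about the two partitioning losses is to apply each as a fraction of the remaining capacity. This is a minimal sketch under that assumption; the loss percentages are invented, and real values depend on the configuration, the number of images, and their workloads.

```python
# Hypothetical: estimate the capacity a partitioned processor delivers after
# the two losses named in the text. Both percentages are assumptions.

def effective_capacity(raw: float, cache_loss: float, mgmt_overhead: float) -> float:
    """Capacity left after cache re-fetch losses and logical-processor management.

    Losses are fractions in [0, 1) and are applied multiplicatively.
    """
    for name, value in (("cache_loss", cache_loss), ("mgmt_overhead", mgmt_overhead)):
        if not 0.0 <= value < 1.0:
            raise ValueError(f"{name} must be in [0, 1)")
    return raw * (1.0 - cache_loss) * (1.0 - mgmt_overhead)

RAW = 1000.0  # rated capacity of the unpartitioned processor, in any consistent unit

print(f"{effective_capacity(RAW, 0.05, 0.03):.1f}")
```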
All of these factors introduce so many variables that a single, meaningful MIPS figure for any processor is impossible; hence, "M" for meaningless or misleading. But a major issue remains: Companies are spending large sums of money based on what MIPS are believed to be.
Some analysts have tried to classify zSeries hardware based on research, both good and shoddy, but no two analysts agree. Years ago, many companies used complex benchmarking schemes and many terms and conditions to guarantee performance. IBM even formalized the process with a methodology called Large Systems Performance Reference (LSPR). But all of these approaches only demonstrated how widely the derived MIPS figures vary, for the reasons described above.
Benchmarking also consumes many resources, both people and systems, and most companies depend on outside consultants to derive the figures. This again leads to problems, since nobody agrees on a consistent source for MIPS derivations. Even IBM doesn't measure all processors and workload mixes within LSPR. The problem has simply become too large to manage.
IBM and other vendors have tried many classification schemes over the years, such as MIPS, relative capacity, software groups, and MSUs (millions of service units). The only real constants are customer confusion and a lack of real understanding of processor capacity.
I've found the best approach is to accept that processor capacity isn't a constant and to negotiate with IBM a methodology to "prove" the capacity in your processing environment, with guarantees written into the contract. Take the same approach when negotiating software pricing, reaching an understanding on how you and the vendor will agree on the ratings to use for capacity-based software.
Ted MacNeil is a capacity/performance analyst with more than 25 years in the IBM mainframe environment.