IBM z13 Vector Facility Works to Accelerate Features
As part of the January announcement of the new IBM z13 machine, IBM introduced the Vector Facility, which lets programming languages and products that exploit it accelerate certain operations.
The Vector Facility in z13 was implemented using the AltiVec architecture with extensions. IBM and others developed AltiVec at the end of the 1990s. The architecture defines new registers and a large instruction set that supports Single Instruction, Multiple Data (SIMD) operations.
One of the first AltiVec implementations was in the PowerPC G5 processor, widely used in desktops and laptops, where it mainly accelerated multimedia workloads such as graphics and video. Several software producers also took advantage of the new instructions to accelerate their multimedia applications. IBM servers equipped with POWER processors used AltiVec as well, especially in large clusters of computers running parallel scientific applications. In mid-2006, IBM jointly developed the Cell processor, which powered the PlayStation 3 family of video game consoles and some specific IBM blades. The Cell processor also implemented a set of SIMD instructions and the AltiVec architecture; however, this implementation was done in the eight synergistic processors, known as SPEs, on each Cell processor chip. AltiVec is still present in modern POWER processors and servers made by IBM and other manufacturers.
How the Vector Facility Works
The Vector Facility in the z13 system implements 32 registers of 128 bits and 139 new instructions.
The first 16 vector registers (VRs) are shared with the 64-bit floating-point registers (see Figure 1) that already exist in the z Systems architecture and are used by traditional floating-point (FP) instructions. These registers, numbered 0 to 15, are mutually exclusive between the vector instructions and the FP instructions: any read or write of one of these registers by a non-vector floating-point instruction voids the contents of the companion vector register. Likewise, any vector instruction that reads or writes one of these registers voids the contents of the companion floating-point register.
Data is loaded into the VRs and manipulated there in five different formats: byte, halfword, word, doubleword or quadword (see Figure 2).
SIMD instructions can operate on several independent data elements held in the same VR in a single operation.
Suppose there are 16 integers to be summed in pairs, with the result of each sum saved. Done in the conventional way, this takes at least eight add-and-store operations (see Figure 3).
If SIMD instructions are used instead, the same 16 elements can be loaded as halfwords into two distinct VRs, eight per register, and the result of each independent sum can be stored in a third VR. All eight sums are then done with a single instruction, or step (see Figure 4).
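The contrast between the two approaches can be sketched in portable C. This is only a model: the type vr_halfwords and both function names are hypothetical, and the loop in sum_pairs_vector stands in for what the hardware does to all eight lanes in one instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HALFWORDS_PER_VR 8  /* 128 bits / 16 bits per halfword */

/* Hypothetical model of one 128-bit vector register viewed as halfwords. */
typedef struct {
    int16_t hw[HALFWORDS_PER_VR];
} vr_halfwords;

/* Conventional approach: one add and one store per pair of numbers. */
void sum_pairs_scalar(const int16_t *a, const int16_t *b,
                      int16_t *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = (int16_t)(a[i] + b[i]);
    }
}

/* SIMD-style approach: two source registers in, one result register out.
 * On real hardware the eight lane additions are a single instruction;
 * the loop here only models that lane-wise behaviour. */
vr_halfwords sum_pairs_vector(vr_halfwords v2, vr_halfwords v3)
{
    vr_halfwords v1;
    for (int lane = 0; lane < HALFWORDS_PER_VR; lane++) {
        v1.hw[lane] = (int16_t)(v2.hw[lane] + v3.hw[lane]);
    }
    return v1;
}
```

Both functions produce the same eight sums. The difference on a z13 is in how many instructions are issued, which this model cannot show.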
To support the Vector Facility in z/Architecture, 139 new instructions were added to the existing ones in the z Systems processor. They are grouped as:
- Vector support: 46 instructions that perform operations such as gather, generate mask, load, merge, pack, permute, replicate, scatter, select, sign, store and unpack.
- Vector integer: 66 instructions that perform operations such as add, and, average, checksum, element compare, compare, count, xor, galois field, load, maximum, multiply, nor, or, population count, rotate, shift, subtract, sum and test under mask.
- Vector floating point: 21 instructions that perform operations such as add, compare, convert, divide, load, multiply, perform sign operation, square root, subtract and test data.
- Vector string: Six instructions that accelerate the processing of character data strings, with operations such as find, isolate and string range compare.
All of these instructions are 48 bits long and begin with the same opcode, x’E7’; the operation to be performed is defined by the byte in bit positions 40 to 47 (see Figures 4, 6 and 8).
The vector integer instruction that adds the numbers in the example of Figure 4 is VECTOR ADD, which has opcode x’E7’ and operation byte x’F3’ (see Figure 5).
This instruction adds each element of the second operand, a VR (V2), to the corresponding element of the third operand, a VR (V3), and stores the result in the first operand, a VR (V1). Each element is treated as a signed binary integer. The size of the elements in the VR is defined by the value of M4 in bits 32 to 35 (see Figure 6).
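The field positions just described can be picked out of a 6-byte instruction image with a few lines of C. This is a minimal sketch with hypothetical function names. The mapping of M4 values to element sizes (0 for byte through 4 for quadword) is not given in this article; it is assumed here from the z/Architecture Principles of Operation.

```c
#include <assert.h>
#include <stdint.h>

/* First opcode byte (bits 0-7) shared by the vector instructions. */
#define VECTOR_OPCODE 0xE7

/* Returns the operation byte held in bits 40-47 (the sixth byte). */
uint8_t vector_operation(const uint8_t insn[6])
{
    assert(insn[0] == VECTOR_OPCODE);
    return insn[5];
}

/* Returns the M4 field, bits 32-35: the high nibble of the fifth byte. */
uint8_t vector_m4(const uint8_t insn[6])
{
    return (uint8_t)(insn[4] >> 4);
}

/* Element size in bits for an element-size code.
 * Assumed encoding: 0=byte, 1=halfword, 2=word, 3=doubleword, 4=quadword.
 * Returns 0 for any other code. */
unsigned element_bits(uint8_t m4)
{
    return (m4 <= 4) ? (8u << m4) : 0u;
}

/* Number of elements of that size that fit in one 128-bit VR. */
unsigned elements_per_register(uint8_t m4)
{
    unsigned bits = element_bits(m4);
    return bits ? 128u / bits : 0u;
}
```

For an instruction image whose first byte is x’E7’, whose fifth byte has a high nibble of 1 and whose last byte is x’F3’, these helpers report a VECTOR ADD on eight halfword elements, which matches the pairwise-sum example.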
Another example is the VECTOR MULTIPLY EVEN (Vector Integer) that multiplies elements (see Figures 7 and 8).
In this instruction, the elements in the even positions (0, 2, 4 and 6) of the second operand, a VR (V2), are multiplied by the corresponding elements in the even positions of the third operand, a VR (V3), and each individual product is stored in the corresponding even+odd pair of the first operand, a VR (V1). The elements are treated as signed binary integers.
This instruction is exposed directly in XL C/C++ 2.1.1 for z/OS as the Vector Multiply Even function, which is coded as vec_mule in a C/C++ program:
x = vec_mule(a, b)
This statement returns a vector (x) containing the product of each even-positioned element of the first vector (a) and the corresponding even-positioned element of the second vector (b) (see Figure 9).
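Because vec_mule itself needs a compiler and machine with vector support, its effect on halfword elements can be illustrated with a portable C model instead. The types vec_s16 and vec_s32 and the function mule_model are hypothetical stand-ins, not part of XL C/C++; the sketch assumes eight signed halfword inputs and four signed word results.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models of a 128-bit register in two element formats. */
typedef struct { int16_t hw[8]; } vec_s16;  /* eight halfword elements */
typedef struct { int32_t w[4];  } vec_s32;  /* four word elements      */

/* Models Vector Multiply Even on signed halfwords: elements 0, 2, 4 and 6
 * of a and b are multiplied, and each double-width product fills the
 * even+odd halfword pair (one word) of the result. */
vec_s32 mule_model(vec_s16 a, vec_s16 b)
{
    /* Both views must describe the same 128 bits of register. */
    assert(sizeof(vec_s16) == 16 && sizeof(vec_s32) == 16);

    vec_s32 x;
    for (int i = 0; i < 4; i++) {
        x.w[i] = (int32_t)a.hw[2 * i] * (int32_t)b.hw[2 * i];
    }
    return x;
}
```

The odd-positioned inputs are ignored, and the result elements are twice as wide as the inputs so that no product can overflow.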
Searching and parsing text and characters is common in analytics, commercial programs, XML parsing and many other uses, and it usually requires many loops and instructions. To improve such functions, a set of Vector String instructions was also incorporated.
An example of Vector String manipulation is the VECTOR FIND ELEMENT EQUAL instruction (see Figure 10). It operates as follows: proceeding from left to right, each element of the second operand, a VR (V2), is compared with the corresponding element of the third operand, a VR (V3). If two elements are equal, the byte index of the leftmost equal element is stored in byte seven of the first operand, a VR (V1).
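The left-to-right comparison can be modelled in portable C for byte-sized elements. This is a sketch under stated assumptions: vr_bytes and find_element_equal_model are hypothetical names, the remaining bytes of the result are zeroed, and the value 16 stored when no elements match is assumed from the Principles of Operation rather than taken from this article.

```c
#include <assert.h>
#include <stdint.h>

#define VR_BYTES 16  /* one 128-bit vector register */

/* Hypothetical model of a vector register viewed as 16 byte elements. */
typedef struct { uint8_t b[VR_BYTES]; } vr_bytes;

/* Models VECTOR FIND ELEMENT EQUAL on byte elements: scan left to right,
 * compare element i of v2 with element i of v3, and record the index of
 * the first equal pair in byte seven of the result. */
vr_bytes find_element_equal_model(vr_bytes v2, vr_bytes v3)
{
    assert(sizeof(vr_bytes) == VR_BYTES);

    vr_bytes v1 = {{0}};
    uint8_t index = VR_BYTES;  /* assumed "no match" value */
    for (int i = 0; i < VR_BYTES; i++) {
        if (v2.b[i] == v3.b[i]) {
            index = (uint8_t)i;
            break;
        }
    }
    v1.b[7] = index;
    return v1;
}
```

Filling the second input with copies of one character turns the model into a search for that character: against the text "HELLO WORLD!!!!!" and sixteen copies of 'L', byte seven of the result is 2. The scalar loop is the kind of work that the hardware instruction replaces.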
More details about the vector instructions can be found in the “z/Architecture Principles of Operation” manual (SA22-7832-10).