What are the major changes to the z/OS V1R11 LSPR?
The LSPR ratios reflect the range of performance between System z servers as measured using a wide variety of application benchmarks. The latest release of the LSPR contains a major change to the workloads represented in the tables. In the past, workloads were categorized by their application type or software characteristics (for example, CICS, OLTP-T, LoIO-mix). The introduction of CPU MF (SMF 113) data on the z10 makes it possible to gain insight into the underlying hardware characteristics that influence performance. Thus, the LSPR now defines three workload categories, LOW, AVERAGE, and HIGH, based on a newly defined metric called “Relative Nest Intensity (RNI)”, which reflects a workload’s use of a processor’s memory hierarchy. For details on RNI and the new workload categories, see the LSPR Workload Categories section of the LSPR.
What is the multi-image table in the LSPR?
Typically, IBM System z processors are configured with multiple images of z/OS. Thus, the LSPR continues to include a table of performance ratios based on average multi-image z/OS configurations for each processor model as determined from the profiling data. The multi-image table is used as the basis for setting MIPS and MSUs for IBM System z processors.
What multi-image configurations are used to produce the LSPR multi-image table?
A wide variety of multi-image configurations exist. The main variables in a configuration typically are: 1) the number of images, 2) the size of each image (number of logical engines), 3) the relative weight of each image, 4) the overall ratio of logical engines to physical engines, 5) the number of books, and 6) the number of ICFs/IFLs. The configurations used for the LSPR multi-image table are based on the average values for these variables as observed across a processor family. The average number of images was found to range from 5 on low-end models to 9 at the high end. Most systems were configured with 2 major images (those defined with >20% relative weight). On low- to midrange models, at least one of the major images tended to be configured with a number of logical engines close to the number of physical engines. On high-end boxes, the major images were generally configured with a number of logical engines well below the count of physical engines, reflecting the more common use of these processors for consolidation. The overall ratio of logical to physical engines (often referred to as “the level of over-commitment” in a virtualized environment) averaged as high as 5:1 on the smallest models, hovered around 2:1 across the majority of models, and dropped to 1.3:1 on the largest models. The majority of models were configured with one book more than necessary to hold the enabled processing engines, and an average of 3 ICFs/IFLs were installed.
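As a minimal sketch of the over-commitment ratio described above (the per-image logical engine counts and the physical engine count below are hypothetical illustrations, not figures from the profiling data):

```python
# Hypothetical sketch of the logical-to-physical "over-commitment" ratio.
# The image sizes and physical engine count are illustrative values,
# not data from the LSPR profiling study.

def over_commitment(logical_engines_per_image, physical_engines):
    """Overall ratio of logical engines (summed across all images)
    to physical engines."""
    return sum(logical_engines_per_image) / physical_engines

# A hypothetical midrange box: 5 z/OS images totaling 16 logical
# engines, running on 8 physical engines.
ratio = over_commitment([6, 4, 3, 2, 1], 8)
print(f"{ratio:.1f}:1")  # 2.0:1
```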
Can I use the LSPR multi-image table for capacity sizing?
For high-level sizing, the multi-image table may be used. However, the most accurate sizings require zPCR’s LPAR Configuration Capacity Planning function, which can be customized to match a specific multi-image configuration exactly, rather than the average configurations reflected in the multi-image LSPR table. The zPCR tool is available to customers.
What model is used as the “base” or “reference” processor in the z/OS V1R11 LSPR table?
The 2094-701 processor is used as the base in the z/OS V1R11 table. Thus, the ITRR for the 2094-701 appears as 1.00.
What "capacity scaling factors" are commonly used with the z/OS V1R11 LSPR?
The LSPR provides capacity ratios among various processor families. It has become common practice to assign a capacity scaling value to processors as a high-level approximation of their capacities. The commonly used capacity scaling factor for a z/OS V1R11 single-image configuration is 593. Note that this is slightly reduced from the 602 and 604 used in previous LSPRs, primarily because of a need to “re-center” the PCIs (see below). Later releases of operating system and subsystem software, along with the improved representativeness of the LSPR workloads, have increased the scalability of workloads on both older and newer processors. To minimize the resulting changes to the PCIs of older processors, the scaling factor needed to be slightly reduced. For the z/OS V1R11 multi-image table, the commonly used scaling factor is 0.944 × 593 = 559.792. The 0.944 factor reflects the fact that the multi-image table has processors configured based on the average client LPAR configuration; on a 2094-701, the cost to run this complex configuration is approximately 5.6%. The commonly used capacity scaling values associated with the z196 may be approximated by multiplying the AVERAGE column of ITRRs in the LSPR z/OS V1R11 multi-image table by 559.792. The PCI (Processor Capacity Index) column in the z/OS V1R11 multi-image table shows the result of this calculation. Note that the PCI column was actually calculated using zPCR, so the full precision of each ITRR is reflected in the values; minor differences in the PCI calculation may be observed when using the rounded ITRRs from the LSPR table.
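The scaling arithmetic above can be sketched directly. The 593 single-image factor and the 0.944 multi-image factor come from the text; the sample ITRR of 1.00 is the 2094-701 base value:

```python
# Sketch of the PCI arithmetic described above. The 593 single-image
# scaling factor and the 0.944 multi-image factor come from the text.

SINGLE_IMAGE_FACTOR = 593                         # z/OS V1R11 single-image
MULTI_IMAGE_FACTOR = 0.944 * SINGLE_IMAGE_FACTOR  # 559.792

def pci(itrr):
    """Approximate Processor Capacity Index: a multi-image ITRR scaled
    by the commonly used multi-image scaling factor."""
    return itrr * MULTI_IMAGE_FACTOR

# The 2094-701 is the base processor, so its ITRR is 1.00 by definition:
print(round(pci(1.00), 3))  # 559.792
```

Note that the published PCI column was computed in zPCR from full-precision ITRRs, so applying this calculation to the rounded ITRRs in the table can differ slightly.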
How much variability in performance should I expect when moving a workload to a z196 processor?
As with the introduction of any new server, workloads with differing characteristics will see some variation in performance when moved to a z196. The performance ratings for a server are determined by the performance of the average workload that represents what we understand to be the major components of our customers' production environments. While the ratings provide good "middle-of-the-road" values, they are an average; by definition, some workloads fall above the average and some fall below. The z196 has an improved microprocessor design (higher clock speed and out-of-order execution) and a significantly improved memory hierarchy (on-chip shared cache and a much larger book-level shared cache). Workloads that stress the memory hierarchy on the z10 EC can expect above-average performance on the z196, while those that made light use of the memory hierarchy may see slightly below-average performance. The range of performance variation on moves to the z196 will be tighter than the range seen on moves to the z10 EC.
Once my workload is up and running on a z196, how much variability in performance will I see?
Minute-to-minute, hour-to-hour and day-to-day performance variability generally grows with the size (capacity) of the server and the complexity of the LPAR configuration. With its improved microprocessor and memory hierarchy design, and the ability to be configured with up to 80 engines, the z196 can deliver over 1.6 times the capacity of the largest previous server. Continued enhancements to HiperDispatch help reduce the potential for increased performance variability. In the spirit of autonomic computing, PR/SM and the z/OS dispatcher cooperate to automatically place and dispatch logical partitions, helping to optimize the performance of the hardware and minimize the interference of one partition with another. However, while the average performance of workloads is expected to remain reasonably consistent, performance viewed over small increments of time, or for individual jobs and transactions, can show some variation simply due to the larger and more complex LPAR configurations that the z196 can support.
What is HiperDispatch and how does it impact performance?
HiperDispatch is the z/OS exploitation of PR/SM’s Vertical CPU Management (VCM) capabilities and is exclusive to the z10 and z196. Rather than dispatching tasks randomly across all logical processors in a partition, z/OS ties tasks to small queues of logical processors and dispatches work to a “high priority” subset of the logicals. PR/SM provides processor topology information and updates to z/OS, and ties the high-priority logical processors to physical processors. HiperDispatch can improve efficiency in both the hardware and the software in two ways: 1) work may be dispatched across fewer logical processors, thereby reducing the “multi-processor (MP) effects” and lowering the interference among multiple partitions; 2) specific z/OS tasks may be dispatched to a small subset of logical processors, which PR/SM ties to the same physical processors, thus improving hardware cache re-use and locality-of-reference characteristics such as reducing the rate of cross-book communication. A white paper on HiperDispatch is available at: http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101229
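The locality idea behind HiperDispatch can be illustrated with a toy sketch. This is not the actual PR/SM or z/OS dispatching algorithm, only an illustration of why concentrating work on a small subset of logical processors shrinks the number of processors (and caches) the work touches:

```python
# Toy illustration of the HiperDispatch idea described above; NOT the
# actual PR/SM/z/OS algorithm. Dispatching work to a small "high
# priority" subset of logical processors touches fewer processors
# (and thus fewer caches) than spreading it across every logical.

def distinct_logicals_used(num_tasks, logicals):
    """Round-robin num_tasks across the given logical processors and
    return how many distinct logicals were touched (a rough proxy
    for cache footprint)."""
    return len({logicals[i % len(logicals)] for i in range(num_tasks)})

all_logicals = list(range(16))     # a partition with 16 logical CPUs
high_priority = all_logicals[:4]   # a small high-priority subset

print(distinct_logicals_used(100, all_logicals))   # 16: every logical touched
print(distinct_logicals_used(100, high_priority))  # 4: work stays concentrated
```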