What are the major changes in the z/OS V1R9 LSPR?
The LSPR ratios reflect the range of performance between zSeries servers as measured using a wide variety of application benchmarks. The latest release of the LSPR contains several updates. First, the z10 BC models have been added to the tables. Second, the z/OS V1R9 single-image table shows ITRRs for greater-than-32-way single-image z/OS configurations. Third, a Processor Capacity Index (PCI) value has been added to the multi-image table (see below for further discussion of PCI).
Why are there two tables in LSPR?
When the z9 was introduced, the LSPR was enhanced to include performance ratios reflecting both "single-image" z/OS and "multi-image" z/OS environments. Typically, zSeries processors are configured with multiple images of z/OS. Thus, the LSPR continues to include a table of performance ratios based on average multi-image z/OS configurations for each processor model, as determined from profiling data. Since the multi-image z/OS table is much more representative of the vast majority of customer configurations, it is used as the basis for setting MIPS and MSUs for the z10 EC.
What multi-image configurations are used to produce the LSPR multi-image table?
A wide variety of multi-image configurations exist. The main variables in a configuration typically are: 1) the number of images, 2) the size of each image (number of logical engines), 3) the relative weight of each image, 4) the overall ratio of logical engines to physical engines, 5) the number of books, and 6) the number of ICFs/IFLs. The configurations used for the LSPR multi-image table are based on the average values for these variables as observed across a processor family. The average number of images was found to range from 5 on low-end models to 8 at the high end. Most systems were configured with 2 major images (those defined with >10% relative weight). On low- to mid-range models, at least one of the major images tended to be configured with a number of logical engines close to the number of physical engines. On high-end boxes, the major images were generally configured with a number of logical engines well below the count of physical engines, reflecting the more common use of these processors for consolidation. The overall ratio of logical to physical engines (often referred to as "the level of overcommitment" in a virtualized environment) averaged as high as 5:1 on the smallest models, hovered around 2:1 across the majority of models, and dropped to 1.3:1 on the largest models. The majority of models were configured with one book more than necessary to hold the enabled processing engines, and an average of 2 ICFs/IFLs were installed.
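The configuration variables above can be captured in a small illustrative model. This is only a sketch: the class, field names, and example partition names are hypothetical and are not taken from zPCR or any IBM tool.

```python
# Illustrative sketch of an LPAR configuration and its logical:physical
# overcommit ratio, using the variables described above. All names and
# example values are hypothetical, not from zPCR or the LSPR.
from dataclasses import dataclass

@dataclass
class LparImage:
    name: str
    logical_engines: int
    relative_weight: float  # fraction of total weight, 0.0-1.0

def overcommit_ratio(images, physical_engines):
    """Overall ratio of logical to physical engines across all images."""
    total_logicals = sum(img.logical_engines for img in images)
    return total_logicals / physical_engines

def major_images(images, threshold=0.10):
    """Images defined with more than 10% relative weight."""
    return [img for img in images if img.relative_weight > threshold]

# Example: a mid-range box with 8 physical engines and 5 images,
# two of them "major" (>10% weight), roughly matching the averages above.
images = [
    LparImage("PROD1", 7, 0.45),
    LparImage("PROD2", 5, 0.30),
    LparImage("TEST", 2, 0.10),
    LparImage("DEV", 1, 0.08),
    LparImage("SANDBOX", 1, 0.07),
]
print(overcommit_ratio(images, physical_engines=8))   # 16 logicals / 8 physicals = 2.0
print([img.name for img in major_images(images)])     # ['PROD1', 'PROD2']
```

Note how the example lands near the observed averages: roughly 2:1 overcommitment, with two major images whose logical engine counts are close to the physical engine count.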
Which LSPR table should I use for capacity sizing?
For high-level sizing, most users will find the multi-image table to reflect configurations closest to their own. This is simply due to the fact that most systems are run with multiple z/OS images. However, the most accurate sizings require zPCR's LPAR Configuration Capacity Planning function, which can be customized to exactly match a specific multi-image configuration rather than the average configurations reflected in the multi-image LSPR table.
If I compare the two tables, why are the capacity ratios for some models higher in the single-image table while other models have higher ratios in the multi-image table?
Just as capacity ratios are sensitive to workload characteristics (note the varying capacity ratios within a table associated with different workloads), capacity ratios will also be sensitive to the configuration of z/OS images on a processor. If one compares a processor configured only with a single, large z/OS image to the same processor configured with multiple z/OS images, there are both pluses and minuses that come into play. There is a cost incurred to manage multiple z/OS images and their associated logical processors. There is also a cost incurred as the size of a z/OS image increases. Thus, if one compares a configuration of a single large z/OS image to a configuration of multiple but smaller z/OS images, the net result can vary as the magnitude of the pluses and minuses varies. The sensitivity of the multi-image configurations to the number of images, size of each image, relative weights, and overall logical:physical ratio will cause a fair amount of variability in the capacity ratios of these configurations. The multi-image table provides a representative view of these ratios based on average configurations. However, "your mileage will vary" as configurations deviate from average. zPCR's LPAR Configuration Capacity Planning function can provide capacity ratios customized to specific configurations.
What model is used as the "base" or "reference" processor in the z/OS V1R9 LSPR tables?
The 2094-701 processor is used as the base in both the single-image and multi-image z/OS V1R9 tables. Thus, the ITRR for the 2094-701 appears as 1.00 in both tables. Note that this is a change from the z/OS V1R6 and z/OS V1R8 LSPR tables, in which the multi-image and single-image tables shared a common base taken from the single-image table.
What "capacity scaling factors" are commonly used for the z/OS V1R9 tables?
The LSPR provides capacity ratios among various processor families. It has become common practice to assign a capacity scaling value to processors as a high-level, gross approximation of their capacities. The commonly used capacity scaling factor for the z/OS V1R9 single-image table is 604. For the z/OS V1R9 multi-image table, the commonly used scaling factor is 0.944 x 604 = 570.176. The 0.944 factor reflects the fact that the multi-image table has processors configured based on the average client LPAR configuration; on a uniprocessor, the cost to run this complex configuration is approximately 5.6%. The commonly used capacity scaling values associated with the z10 BC may be approximated by multiplying the "Mixed" ITRRs in the LSPR z/OS V1R9 multi-image table by 570.176. The new PCI (Processor Capacity Index) column in the z/OS V1R9 multi-image table shows the result of this calculation. Note that the PCI column was actually calculated using zPCR, so the full precision of each ITRR is reflected in the values. Minor differences in the resulting PCI calculation may be observed when using the rounded values from the LSPR table.
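The arithmetic above can be sketched in a few lines. The constants come directly from the text; the calculation is the approximation described there, not the full-precision zPCR computation.

```python
# Sketch of the PCI (Processor Capacity Index) approximation described above.
# The multi-image scaling factor is 0.944 * 604 = 570.176, and PCI is
# approximated as the "Mixed" ITRR times that factor.

SINGLE_IMAGE_SCALE = 604.0
MULTI_IMAGE_FACTOR = 0.944  # ~5.6% cost of the average multi-image configuration
MULTI_IMAGE_SCALE = MULTI_IMAGE_FACTOR * SINGLE_IMAGE_SCALE  # 570.176

def pci(mixed_itrr):
    """Approximate PCI from a multi-image 'Mixed' ITRR."""
    return mixed_itrr * MULTI_IMAGE_SCALE

# The 2094-701 base processor has an ITRR of 1.00 by definition:
print(round(pci(1.00), 3))  # 570.176
```

As the text notes, using rounded table ITRRs in this way can differ slightly from the published PCI values, which zPCR computes at full precision.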
How much variability in performance should I expect when moving a workload to a z10 EC?
As with the introduction of any new server, workloads with differing characteristics will see some variation in performance when moved to the z10 EC. The performance ratings for a server are determined by averaging the performance of a variety of workloads that represent what we understand to be the major components of our customers' production environments. While the ratings provide good "middle-of-the-road" values, they do represent an average, and by definition some workloads fall above the average and some fall below. The z10 EC has been specifically designed to focus on new and emerging workloads where the speed of the processor is a dominant factor in performance. The result is a quantum jump in clock speed: the z10 EC runs at 4.4 GHz compared to the z9 EC, which ran at 1.7 GHz. The storage hierarchy design of the z10 EC is also improved over the z9 EC; however, the improvement is somewhat limited by the laws of physics, so the latencies have increased relative to the clock speed. Thus, workloads that are CPU-intensive will tend to run above average while workloads that are storage-intensive will tend to run below average, and the spread around the average will likely be larger than seen in recent processors. Additionally, newer applications, such as those with compiler optimizations for the z10 EC, may see even higher benefits, particularly those that may be enhanced over time to exploit some of the new instructions provided with the z10 EC. The LSPR measurements can provide an indication of the potential variability when moving z/OS workloads to a z10 EC. For example, using the single-image z/OS measurements on a 2097-716 versus a 2094-716, we saw performance ratios of: a) 1.51x for the average workload mix, b) 1.62x for the highest workload, ODE-B (CPU-intensive), and c) 1.42x for the lowest workload, OLTP-W (storage-intensive).
The variation of individual jobs or transactions can be even larger, for example, the average job in our CB-L workload improved 1.58x but the range in individual job improvement was from 1.2x to 2.1x.
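The measured ratios quoted above can be restated as a percentage spread around the average mix, which may be a more intuitive way to gauge the variability. The numbers are taken directly from the text; the percentage framing is just arithmetic.

```python
# Express the 2097-716 vs 2094-716 workload ratios quoted above as a
# spread around the average workload mix. Ratios are from the text.
ratios = {
    "average mix": 1.51,
    "ODE-B (CPU-intensive)": 1.62,
    "OLTP-W (storage-intensive)": 1.42,
}

avg = ratios["average mix"]
for name, r in ratios.items():
    delta_pct = (r / avg - 1.0) * 100  # percent above/below the average mix
    print(f"{name}: {r:.2f}x ({delta_pct:+.1f}% vs average)")
```

This puts the CPU-intensive workload roughly 7% above the average mix and the storage-intensive workload roughly 6% below it, illustrating the wider spread discussed above.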
Once my workload is up and running on a z10 EC, how much variability in performance will I see?
Minute-to-minute, hour-to-hour and day-to-day performance variability generally grows with the size (capacity) of the server and the complexity of the LPAR configuration. With its improved processor speed and the capability to be configured with up to 64 engines, the z10 EC can deliver nearly 1.7 times the capacity of the largest previous server. Significant enhancements to the z/OS dispatcher and the PR/SM management algorithms (see the HiperDispatch discussion below) have been made to help reduce the potential for increased performance variability. In the spirit of autonomic computing, PR/SM and the z/OS dispatcher cooperate to automatically place and dispatch logical partitions to help optimize the performance of the hardware and minimize the interference of one partition with another. However, while the average performance of workloads is expected to remain reasonably consistent, when viewed at small increments of time or by individual jobs or transactions, performance could potentially see more variation than in the past simply due to the larger and more complex LPAR configurations that can be supported by the z10 EC.
What is HiperDispatch and how does it impact performance?
HiperDispatch is the z/OS exploitation of PR/SM's new Vertical CPU Management (VCM) capabilities and is exclusive to the z10 EC. Rather than dispatching tasks randomly across all logical processors in a partition, z/OS ties tasks to small queues of logical processors and dispatches work to a "high priority" subset of the logicals. PR/SM provides processor topology information and updates to z/OS, and ties the high-priority logical processors to physical processors. HiperDispatch can lead to improved efficiencies in both the hardware and the software in two ways: 1) work may be dispatched across fewer logical processors, thereby reducing the "multi-processor (MP) effects" and lowering the interference among multiple partitions; 2) specific z/OS tasks may be dispatched to a small subset of logical processors, which PR/SM will tie to the same physical processors, thus improving hardware cache reuse and locality-of-reference characteristics, such as reducing the rate of cross-book communication.
What kind of performance improvement can I expect to see from HiperDispatch?
The magnitude of the potential improvement from HiperDispatch is related to: a) the number of physical processors, b) the size of the z/OS images in the configuration, c) the logical:physical overcommit ratio, and d) the memory reference pattern or storage hierarchy characteristics of the workload. Generally, a configuration where the largest z/OS image fits within a book will see minimal improvement. Workloads that are fairly CPU-intensive (like batch applications) will see only small improvements even for configurations with larger z/OS images, since they typically have long-running tasks that tend to stick on a logical engine anyway. Workloads that tend to have common tasks and high dispatch rates, as often seen in transactional applications, may see larger improvements, again depending on the size of the z/OS images involved. LPAR configurations that are overcommitted, i.e., have higher logical-to-physical ratios, may see some improvement, although the benefit of dispatching to a reduced number of logicals overlaps with benefits already available with IRD and various automation techniques that tend to reduce the number of online logical processors to match capacity needs. The range in benefit is expected to be from 0% to 10%, following the sensitivities described above. Specifically, configurations with z/OS images small enough to fit in a book or running batch-like workloads will tend to fall at the low end of the range; multi-book configurations with z/OS images in the 16-way to 32-way range and running transactional workloads will tend to fall toward the middle of the range; and very large multi-book configurations with very large z/OS images and running workloads with intense memory reference patterns will tend to fall toward the high end of the range.
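The sensitivities above can be sketched as a rough decision rule. This is an illustrative heuristic only, paraphrasing the 0%-10% guidance in the text; the function, its parameters, and the sub-ranges chosen are assumptions for illustration, not an IBM sizing formula.

```python
# Illustrative heuristic for the expected HiperDispatch benefit range,
# following the sensitivities described above. The sub-ranges within
# 0%-10% are an assumed sketch, not an IBM sizing formula.

def hiperdispatch_benefit(fits_in_book, workload, image_size):
    """Return an approximate (low, high) benefit range in percent.

    fits_in_book -- True if the largest z/OS image fits within one book
    workload     -- "batch", "transactional", or "memory-intensive"
    image_size   -- n-way size of the largest z/OS image
    """
    if fits_in_book:
        return (0, 2)   # image fits in a book: minimal improvement
    if workload == "batch":
        return (0, 2)   # long-running tasks already stick to an engine
    if workload == "transactional" and 16 <= image_size <= 32:
        return (3, 7)   # middle of the range
    return (7, 10)      # very large multi-book, memory-intensive workloads

print(hiperdispatch_benefit(fits_in_book=True, workload="transactional", image_size=8))
print(hiperdispatch_benefit(fits_in_book=False, workload="transactional", image_size=24))
```

A real sizing would come from zPCR or measurement; the point of the sketch is only that book fit, workload type, and image size drive where a configuration lands in the 0%-10% range.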