Under the Covers of Flash Express - Implementation Highlights
Posted by Caroline Exum
By Peter Szwed, Ed Chencinski, and Ken Oakes, System z Firmware / Hardware Development
With the release of the "RSM Enablement Offering" on zEnterprise, IBM now offers internal flash memory for mainframe-class computing. This new function improves the client experience for z/OS applications by providing additional storage with higher performance than external disk. In this post, we open up the covers to show how we have incorporated this exciting and rapidly maturing technology into our new server. We show how, through careful management by system software and a design policy of redundancy, we have taken industry-standard flash memory technology and packaged it for high performance and enterprise-class reliability, availability, and serviceability.
Much of the flash management is done through new panels on the Support Element (SE). Through the Flash Increment Allocation panel, flash memory increments can be assigned to partitions. The Flash Status panel displays the current status of the flash adapters, arrays, and inter-card cables. The SE also stores and serves the authentication key used to unlock the flash devices, as described below.
On the Central Electronics Complex (CEC), the flash software stack is split between z/OS, running in a partition on general-purpose processors, and the I/O firmware, running on System Assist Processors (SAPs). z/OS accesses the flash memory space, also called storage class memory (SCM), through system-allocated subchannels using I/O instructions such as Start Subchannel (SSCH), conforming to a new architecture called the Extended Asynchronous Data Mover Facility (EADMF). The I/O firmware processes the EADM subchannels by converting SCM requests into device requests targeted at the flash adapters. The I/O firmware also manages increment allocation, recovery and repair actions, firmware updates, and status tracking.
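The conversion step can be pictured with a short Python sketch. It is a minimal model under stated assumptions, not the actual I/O firmware: we assume a request names a byte range in a linear SCM address space, that the 16GB increments are 16 GiB in size, and that a hypothetical `increment_map` records, for each SCM increment, which adapter pair holds it and at which increment slot on that pair. A request that crosses an increment boundary is split into one device request per increment.

```python
from dataclasses import dataclass

# Assumption: the 16GB increments are modelled as 16 GiB.
INCREMENT_BYTES = 16 * 2**30


@dataclass(frozen=True)
class ScmRequest:
    scm_address: int  # byte address in the (assumed linear) SCM space
    length: int       # bytes to transfer


@dataclass(frozen=True)
class DeviceRequest:
    adapter_pair: int   # which adapter pair serves this piece
    device_offset: int  # byte offset within that pair's flash space
    length: int


def to_device_requests(req, increment_map):
    """Split an SCM request at increment boundaries.

    increment_map (hypothetical): {scm_increment: (adapter_pair, slot_on_pair)}
    """
    requests = []
    addr, remaining = req.scm_address, req.length
    while remaining > 0:
        increment, offset = divmod(addr, INCREMENT_BYTES)
        if increment not in increment_map:
            raise LookupError(f"SCM increment {increment} is not allocated")
        pair, slot = increment_map[increment]
        # Never let one device request run past the end of its increment.
        chunk = min(remaining, INCREMENT_BYTES - offset)
        requests.append(DeviceRequest(pair, slot * INCREMENT_BYTES + offset, chunk))
        addr += chunk
        remaining -= chunk
    return requests


# An 8 KiB request straddling the boundary between SCM increments 0 and 1.
imap = {0: (0, 5), 1: (1, 2)}
reqs = to_device_requests(ScmRequest(INCREMENT_BYTES - 4096, 8192), imap)
for r in reqs:
    print(r)
```

Splitting at increment boundaries keeps each device request within a single adapter pair, since consecutive increments of one partition may live on different pairs.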
The flash memory resides on PCI Express (PCIe) adapters plugged into I/O drawers. The adapters are installed in pairs, connected to each other by dual redundant external mirroring cables. The main components on each adapter are a custom IBM-designed RAID controller chip and four flash SSDs (solid state devices). Each SSD holds roughly 350GB, and the data is protected using RAID10 (striped and mirrored) across all eight SSDs in a pair. Because mirroring halves the roughly 2.8TB of raw capacity, a pair of adapters supplies approximately 1.4TB of addressable SCM. Up to four adapter pairs can be installed in a system.
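The capacity figures follow from simple arithmetic, sketched here in Python. The per-SSD size is the approximate 350GB given above, and the sketch ignores any space the RAID controller may reserve for metadata or spares, so the results are approximations.

```python
SSD_GB = 350              # approximate capacity of one SSD
SSDS_PER_ADAPTER = 4
ADAPTERS_PER_PAIR = 2
MAX_PAIRS_PER_SYSTEM = 4


def raw_gb_per_pair():
    return SSD_GB * SSDS_PER_ADAPTER * ADAPTERS_PER_PAIR


def usable_gb_per_pair():
    # RAID10 stores every stripe twice, so half the raw space is addressable.
    return raw_gb_per_pair() // 2


print(raw_gb_per_pair())                            # 2800
print(usable_gb_per_pair())                         # 1400
print(usable_gb_per_pair() * MAX_PAIRS_PER_SYSTEM)  # 5600
```

With all four pairs installed, that comes to roughly 5.6TB of SCM per system.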
The flash memory on an adapter pair is managed as a set of 16GB increments, and each increment may be allocated to only one partition. A pair of flash adapters can serve increments to up to 60 partitions. The I/O hub chip, operating in a special mode known as firmware managed partitioning (FMP) mode, translates between adapter-sourced PCIe addresses and partition-based system addresses.
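A minimal Python sketch of the bookkeeping these rules imply: each increment has at most one owning partition, and a pair serves at most 60 distinct partitions. The class name, the first-free allocation policy, and the error handling are illustrative assumptions; the post does not describe how the firmware actually chooses increments. With about 1.4TB per pair and 16GB increments, the example table has 1400 // 16 = 87 increments, again ignoring overhead.

```python
class IncrementTable:
    """Hypothetical ownership table for one adapter pair's 16GB increments."""

    MAX_PARTITIONS = 60  # per-pair limit stated in the text

    def __init__(self, num_increments):
        self.owner = [None] * num_increments  # owner[i] = partition or None

    def partitions(self):
        return {p for p in self.owner if p is not None}

    def allocate(self, partition, count):
        free = [i for i, p in enumerate(self.owner) if p is None]
        if len(free) < count:
            raise RuntimeError("not enough free increments on this pair")
        served = self.partitions()
        if partition not in served and len(served) >= self.MAX_PARTITIONS:
            raise RuntimeError("pair already serves the maximum number of partitions")
        chosen = free[:count]  # simple first-free policy (assumption)
        for i in chosen:
            self.owner[i] = partition
        return chosen

    def release(self, partition):
        # Return every increment owned by this partition to the free pool.
        for i, p in enumerate(self.owner):
            if p == partition:
                self.owner[i] = None


table = IncrementTable(1400 // 16)  # about 87 increments per pair
print(table.allocate(partition=1, count=2))  # [0, 1]
print(table.allocate(partition=2, count=1))  # [2]
```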
When processing a request from an EADM subchannel (1), the I/O firmware programs the adapter (2) to access partition memory directly by placing the partition number in the upper bits of the PCIe address. When the I/O hub, in FMP mode, processes an upbound (adapter-sourced) PCIe address (3), it uses the upper bits of the address as an index into the hub-resident zone relocation table and produces the partition-absolute address (4).
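The lookup can be sketched in a few lines of Python. The bit layout is an assumption for illustration (partition number in bits 63..56 of a 64-bit PCIe address, offset in the remaining bits), and each zone relocation table entry is modelled as a hypothetical (origin, size) pair; how a real entry combines with the offset is not described here, so the sketch simply relocates the offset by the zone origin. It also applies the kind of range check the I/O hub performs on adapter-sourced addresses, rejecting any transfer that falls outside the partition's zone.

```python
PARTITION_SHIFT = 56  # assumption: partition number lives in bits 63..56
OFFSET_MASK = (1 << PARTITION_SHIFT) - 1


class RangeCheckError(Exception):
    """Raised when an upbound address falls outside its partition's zone."""


def make_pcie_address(partition, offset):
    # What the firmware programs into the adapter: partition in the upper bits.
    return (partition << PARTITION_SHIFT) | (offset & OFFSET_MASK)


def translate(pcie_address, length, zone_table):
    """Model of the hub's FMP-mode lookup for one upbound transfer.

    zone_table (hypothetical layout): {partition: (zone_origin, zone_size)}
    """
    partition = pcie_address >> PARTITION_SHIFT
    offset = pcie_address & OFFSET_MASK
    if partition not in zone_table:
        raise RangeCheckError(f"no zone configured for partition {partition}")
    origin, size = zone_table[partition]
    if offset + length > size:
        raise RangeCheckError("transfer extends past the end of the zone")
    return origin + offset


zones = {3: (0x40_0000_0000, 0x10_0000_0000)}  # partition 3: 64 GiB zone
addr = make_pcie_address(3, 0x1000)
print(hex(translate(addr, 4096, zones)))  # 0x4000001000
```

Because the partition number travels with every address, the adapter itself never needs a per-partition translation table; the hub's table is the single point where isolation is enforced.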
A key characteristic of System z hardware is its resilience to failures through system redundancy. The Flash Express subsystem carries on this design principle: at all levels of the system, the hardware and software have been designed to recover from spontaneous failures without interrupting application operation. We also use this redundancy to permit non-disruptive service of the subsystem.
Throughout the lifetime of data on the SSDs and during its transit from SSD to the application, protection measures are in place to detect and, in most cases, correct errors.
On the SSDs, each 512-byte block of data is stored with a CRC and two sequence numbers, one related to the logical block address of the data on the disk and one related to the SCM address of the data. When data is read, these three check fields are compared against their expected values; if any check fails, the block is either re-read or retrieved from the mirrored SSD.
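A Python sketch of these read-side checks and the fallback order. It is illustrative only: zlib's CRC-32 stands in for whatever CRC the SSDs actually use, the two sequence numbers are modelled as the LBA and SCM address themselves, and trying one re-read before going to the mirror is an assumed policy. In this sketch, the address fields catch a block whose CRC is valid but which belongs to a different address.

```python
import zlib
from dataclasses import dataclass

BLOCK_SIZE = 512


@dataclass(frozen=True)
class StoredBlock:
    data: bytes
    crc: int      # stand-in: CRC-32 from zlib
    lba_tag: int  # stand-in for the LBA-related sequence number
    scm_tag: int  # stand-in for the SCM-address-related sequence number


def write_block(data, lba, scm_address):
    if len(data) != BLOCK_SIZE:
        raise ValueError("blocks are 512 bytes")
    return StoredBlock(data, zlib.crc32(data), lba, scm_address)


def block_ok(block, lba, scm_address):
    # All three check fields must match what the reader expects.
    return (zlib.crc32(block.data) == block.crc
            and block.lba_tag == lba
            and block.scm_tag == scm_address)


def read_block(read_primary, read_mirror, lba, scm_address, retries=1):
    """Re-read the primary copy, then fall back to the mirrored SSD."""
    for _ in range(1 + retries):
        block = read_primary()
        if block_ok(block, lba, scm_address):
            return block.data
    block = read_mirror()
    if block_ok(block, lba, scm_address):
        return block.data
    raise OSError(f"block at LBA {lba} failed checks on both copies")


good = write_block(bytes(BLOCK_SIZE), lba=10, scm_address=0x2000)
# Same check fields, but one data byte flipped on the primary SSD.
bad = StoredBlock(b"\x01" + bytes(BLOCK_SIZE - 1), good.crc, 10, 0x2000)
data = read_block(lambda: bad, lambda: good, lba=10, scm_address=0x2000)
print(data == bytes(BLOCK_SIZE))  # True
```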
When data is transferred through the I/O subsystem, the protocol standards (SAS and PCIe) provide their own protection and recovery mechanisms.
The I/O hub provides range checking on the PCIe addresses sent up from the adapter.
Lastly, all data in System z memory is protected using an error detection and correction scheme.
Data and Key Encryption
During installation of the flash feature, SE code directs the smart card to create an authentication key and to encrypt (wrap) it using a private key internal to the smart card. The wrapped authentication key, or key file, is stored on a disk in the SE and securely copied to the backup SE.
Before the I/O firmware formats the SSDs, it requests the authentication key from the SE, sending the public key from its own public-private key pair. The smart card unwraps the key file and then re-wraps the authentication key with the firmware's public key. The wrapped key is sent to the I/O firmware, which unwraps it using its private key. The SSDs are then formatted with this authentication key.
Once the SSDs have been formatted, the authentication key must be provided to them after every power-off cycle. During IML, the I/O firmware uses the same public-private key protocol to request the key from the SE, and the SE uses the smart card to unwrap the key file and re-wrap the key before serving it.
The smart card is configured so that it will only unwrap a key file on the same SE that was used to create it. Thus, the data on a flash adapter can only be accessed with a key served by the smart card and SE (or the redundant smart card and SE) that were used during the format process.
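The key flow can be traced with a toy Python model. Everything here is an illustrative assumption: textbook RSA with tiny primes stands in for the real cryptography (it is not secure), the smart card's internal key is modelled as a key pair that never leaves the `SmartCard` object, the binding to the creating SE is reduced to a hypothetical `se_id` comparison, and the redundant smart card and backup SE are not modelled. The sketch needs Python 3.8 or later for `pow(e, -1, phi)`.

```python
class ToyRsaKey:
    """Textbook RSA with tiny primes: for illustration only, NOT secure."""

    def __init__(self, p, q, e=17):
        self.n = p * q
        self.e = e
        self._d = pow(e, -1, (p - 1) * (q - 1))  # private exponent

    def public(self):
        return (self.n, self.e)

    def unwrap(self, wrapped):
        return pow(wrapped, self._d, self.n)


def wrap(public_key, secret):
    n, e = public_key
    if not 0 <= secret < n:
        raise ValueError("secret must be smaller than the modulus")
    return pow(secret, e, n)


class SmartCard:
    def __init__(self, se_id):
        self._se_id = se_id                 # hypothetical SE binding
        self._internal = ToyRsaKey(61, 53)  # internal key, never exported

    def create_key_file(self, auth_key):
        # Wrapped key file that the SE stores on disk (and copies to the backup SE).
        return wrap(self._internal.public(), auth_key)

    def serve(self, key_file, requester_public, current_se_id):
        if current_se_id != self._se_id:
            raise PermissionError("key file can only be unwrapped on the creating SE")
        auth_key = self._internal.unwrap(key_file)
        return wrap(requester_public, auth_key)  # re-wrap for the I/O firmware


# Installation: the card creates and wraps the authentication key.
card = SmartCard(se_id="SE-1")
key_file = card.create_key_file(auth_key=1234)

# Format time (and again at every IML): firmware sends its public key.
firmware_key = ToyRsaKey(47, 59)
wrapped = card.serve(key_file, firmware_key.public(), current_se_id="SE-1")
print(firmware_key.unwrap(wrapped))  # 1234
```

The authentication key only ever leaves the card wrapped, either under the card's own key (on the SE disk) or under the requesting firmware's public key (in transit).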
The design and implementation of the Flash Express subsystem was a complex, multi-year project. It involved many development groups around the world, working within IBM and with its key suppliers, to create an offering that meets the System z standards for client value. The hard work paid off: Flash Express was a spotlighted feature of the zEnterprise EC12 system announced late last year.
This blog post is part three of a four-part series on Flash Express from the Flash development team. Check out the other posts in the series on the Mainframe Insights blog.
Peter Szwed is a Senior Engineer in the Systems and Technology Group. He joined IBM in 1988 after graduating from the University of Illinois at Urbana-Champaign with a B.S. degree in Computer Engineering. Since then, he has held a wide variety of development roles including ASIC designer for IBM's supercomputer offerings, architecture author for System z and System p, and I/O firmware developer for System z PCI. He was a member of the system design team for Flash Express and is currently team leader for Parallel Sysplex firmware development.
Ed Chencinski is a Senior Technical Staff Member and chief engineer of an I/O subsystem development team. He received his B.S. degree in electrical engineering from Lehigh University. He joined IBM in 1980. He worked on the design of the ES/3090 system controller element and expanded storage hardware, and on the logic support element design of the ES/9000. He later joined the G3/G4 CMOS cryptographic hardware processor design team, focusing on pervasive functions, simulation, timing, and modular exponentiation. Since then, he has performed high-level design of numerous elements within the System z I/O subsystem.
Ken Oakes is a Senior Technical Staff Member in the Systems and Technology Group. He joined IBM in 1977 after receiving his B.S. degree in electrical engineering from the University of New Haven. Mr. Oakes has held various technical and technical leadership positions in the zSeries I/O design area. He is currently involved in I/O subsystem design for the next-generation zSeries Server.