Your business needs fewer large servers rather than lots of small single- and dual-socket systems. But how do you mitigate the risk of a large, mission-critical server failing? What you need is a server that is built from the ground up to survive in the event of component failure.
That’s why IBM is introducing the IBM System x3850 X6 server (see photo below) with enterprise-grade reliability and availability.
The new IBM System x3850 X6 four-socket server uses Intel Xeon E7-4800 v2 and E7-8800 v2 processors with enterprise-grade RAS features.
This new server is designed not only to continue operating in case of a component failure but also to help you reduce planned and even unplanned downtime. This is a big deal! If your servers remain operational even after a failure, your applications stay up, your support costs are lower, and your end users remain happy!
Here’s what you get with the reliability, availability and serviceability features of the new IBM X6 servers:
- Predict failures before they happen
Predictive Failure Analysis (PFA) allows the server to monitor the status of critical subsystems and to notify the system administrator when components appear to be degrading. Thanks to this information, in most cases, replacement of failing parts can be performed as part of planned maintenance activity. This reduces the need for unscheduled outages, so your system continues to run. (Did I mention lower support costs and happy end users?)
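To make the idea concrete, here is a minimal Python sketch of one common way to predict failures: count correctable errors in a sliding time window and raise a flag when they pile up. This is an illustration only, not how IBM's PFA is actually implemented. The class name `ErrorRateMonitor`, the threshold and the window length are all hypothetical.

```python
from collections import deque


class ErrorRateMonitor:
    """Hypothetical sketch: flag a component as degrading when it logs
    too many correctable errors within a sliding time window."""

    def __init__(self, threshold: int, window_seconds: float):
        self.threshold = threshold            # errors tolerated per window (assumed value)
        self.window_seconds = window_seconds  # length of the sliding window (assumed value)
        self._timestamps = deque()

    def record_error(self, timestamp: float) -> bool:
        """Record one correctable error and return True if the component
        now looks like it is degrading."""
        self._timestamps.append(timestamp)
        # Discard errors that have aged out of the window.
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold


# Two isolated errors are ignored; three errors within 60 seconds raise the flag.
monitor = ErrorRateMonitor(threshold=3, window_seconds=60)
alerts = [monitor.record_error(t) for t in (0, 10, 100, 110, 120)]
```

The point of the window is that an occasional corrected error is normal, while a burst of them suggests a part that should be swapped at the next planned maintenance slot.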
- Find failed components fast
Light path diagnostics, a feature of the X6 servers, allows systems engineers and administrators to easily and quickly diagnose hardware problems. The LCD display (see photo below) gives you more information about the problem at hand than a few LEDs can, so failures can now be evaluated in seconds and costly downtime can be reduced or avoided altogether. Reducing service time this significantly is a huge advancement!
The LCD display shows information about the current state of the server (pre-production model shown)
- Survive a processor failure
The server is designed to recover from a failed processor and restart automatically. Even if the primary processor (the one used for booting the operating system) fails, the X6 system is designed so it can boot from another processor using redundant links to key resources.
- Survive memory failures
The combination of IBM Chipkill and Redundant Bit Steering (RBS, also known as Double Device Data Correction or DDDC) allows the server to tolerate two sequential DRAM memory chip failures without affecting overall system performance.
- Survive an adapter failure… and replace it while the server is running!
The new servers have up to six adapter slots that support hot-swapping. This means the I/O Books (see photo below) can be removed and any failed adapters can be replaced without any server downtime.
The I/O Book can be removed and adapters added or replaced while the server is running
- Swap components easily thanks to lidless design
There is no need to pull this server in or out of the rack because all components can be accessed either from the front or from the rear. This design allows for faster maintenance by simplifying service procedures. If you’ve ever replaced servers in a BladeCenter chassis or Flex System chassis, you’ll know how quick and easy it is.
The Compute Book (like the Storage Book and I/O Books) pulls out for easy servicing
With a design that helps prevent component failures from bringing down the entire machine, you can feel confident that an X6 server is an ideal platform for any mission-critical application. It makes good business sense.
David Watts is an IBM Redbooks Project Leader. He writes books and papers on many areas related to IBM Flex System, IBM BladeCenter and IBM System x. Follow David on Twitter at @DavidAtRedbooks.