Of CPUs and GPUs
Graphics Processing Unit (GPU) is a potentially confusing term. Not so long ago there was very little mention of GPUs. The PC’s major processing load was catered for by the x86 CPU, except for the graphical display, which was handled by a graphics board on which a GPU resided. Unless you were deeply involved in graphical applications you probably neither knew nor cared what a GPU was or how it worked. Well, times have changed and now you may well care.
GPUs are purpose-designed for graphics processing, which means parallel processing across multiple cores. Conveniently, it turns out that there are other workloads that can profit substantially from being run on a GPU or divided between a CPU and a GPU - typically, applications that are mathematically intensive. This has led to the emergence of a new kind of x86 server that combines an x86 CPU with a GPU - typically, an Intel processor with an NVIDIA GPU.
I spoke with Scott Denham, IT Architect at IBM Deep Computing, who has been working in the area of high performance computing (HPC) for years. It was an education.
Bloor: Why are companies suddenly buying servers with GPUs on them?
Denham: Processors are just not getting any faster. The simple fact is that once you raise a chip’s clock speed to 3 gigahertz and above, you run into diminishing returns in computing power. So the model arose of coupling an accelerator with a general-purpose processor to deliver extra power. You have a standard Intel or Power processor running the standard logic of the program, and you have a very fast specialized processor for number crunching. This is by no means a new idea - it goes back to the 1970s - but now we’re using GPUs as the accelerator for floating-point operations.
Bloor: So where does the GPU server make most sense and how do you exploit it?
Denham: It’s very application dependent. It’s not a trivial thing to move an application from a conventional processing model to this “GPU model.” You have to rewrite the code; there are no automatic language translators that can move a program from one model to the other. So you have to organize the program to move data off the standard processor, do something with it on the GPU, and then move it back. You have to have the GPU do a lot with the data for this to be worthwhile.
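A minimal CUDA sketch makes this offload pattern concrete. The kernel, names, and sizes below are illustrative assumptions, not IBM code - the point is the three-step shape Denham describes: copy out, compute, copy back.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical number-crunching kernel: each GPU thread scales one element.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main(void) {
    const int n = 1 << 20;                // one million floats
    size_t bytes = n * sizeof(float);

    float *host = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) host[i] = (float)i;

    // Step 1: move the data off the standard processor onto the GPU.
    float *dev;
    cudaMalloc(&dev, bytes);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);

    // Step 2: do something with it on the GPU. A real application would
    // run far heavier kernels to amortize the cost of the two copies.
    scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);

    // Step 3: move the results back.
    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);

    printf("host[42] = %f\n", host[42]);  // expect 84.0
    cudaFree(dev);
    free(host);
    return 0;
}
```

Denham’s caveat is about Step 2: unless the GPU does enough work per byte transferred, the two copies dominate and the port isn’t worthwhile.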
Bloor: What kind of applications are we talking about here?
Denham: One of the biggest and most enthusiastic adopters of these servers is the upstream petroleum industry. It’s an interesting field because they have an almost limitless appetite for CPU capacity. They’re always running a few levels back from what’s theoretically possible. With the right amount of power they can “increase the resolution” of their problem - improve the accuracy of their algorithms - and it’s all gated by the compute capacity they can get their hands on. The kinds of problems they routinely solve today they knew how to solve 10 years ago, but it wasn’t practical then.
For example, we worked with a group at the University of Houston exploring a seismic algorithm. When we first investigated it, we concluded that it could be done, but with the best computer we could put our hands on it would have taken 25 years to run. Now that’s much reduced.
Bloor: And which apps are the low-hanging fruit?
Denham: The best suited applications are those characterized by a very small number of hot spots - so most of the processing effort goes into a small number of loops. In applications like weather modeling, the code profile is flat, so it takes much more effort to port the code to a GPU. Where you have a few hundred lines of code that take up 90 percent of the compute time, that’s a great candidate for GPU computing.
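Denham’s 90 percent figure maps straight onto Amdahl’s law, which bounds the overall speed-up when only part of a program is accelerated. A back-of-envelope example (the 10x kernel speed-up is an assumed figure, not Denham’s):

```latex
% Amdahl's law: overall speed-up S when a fraction p of the runtime
% is accelerated by a factor s.
\[
  S = \frac{1}{(1 - p) + p/s}
\]
% Hot-spot code (p = 0.9, assumed s = 10):
%   S = 1 / (0.1 + 0.09) = 1 / 0.19, approximately 5.3x
% Flat-profile code (p = 0.5, same s = 10):
%   S = 1 / (0.5 + 0.05), approximately 1.8x
```

The flat-profile case shows why weather codes are hard: even a fast GPU kernel buys little unless most of the runtime sits in the loops you port.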
Bloor: What level of acceleration can you get with a GPU server?
Denham: In general, people are seeing application speed-up in the 5x to 20x range. So you can take an application that was not economically feasible before, and now you can do it, or you can accelerate an application considerably. In the main, the oil companies are looking to get a competitive advantage by getting better answers sooner.
Bloor: Do engineering modeling applications, such as modeling the airflow over a new car chassis design, qualify as applications that can profit from a GPU?
Denham: I think they certainly can. It’s always been the upstream petroleum industry that has been the leading adopter of high performance computing, because the economics are really compelling. They create the marketplace that gets the product out there. So a lot of other problems can now start to fall into this space - but not necessarily at the scale of hundreds or thousands of GPUs.
Bloor: What size do you think the market for GPU servers is?
Denham: It’s too early to put a reliable number on it. We have some early adopters who put GPUs into production about 3 years ago, so this is an early stage market. Recently, using figures from several sources, we estimated the market opportunity at about $500 million over 3 years. The market is growing quickly. In some cases, we’re not talking about customers wanting just a few GPU servers, but wanting hundreds.
Bloor: How did IBM get into this market?
Denham: We’re customer-driven. We had enough customers asking for servers with GPUs to get our attention. So it became a matter of looking at our portfolio and finding the places where it made most sense to integrate these and where we could provide scalability. It’s a simple enough thing to just add a GPU to a workstation, but if you want real scale, servers need to be manageable and you need to address the cooling issue.
Bloor: So where is IBM’s edge in this market?
Denham: I think our iDataPlex and our blade platform are differentiators. They are both mature platforms. Adding GPUs to these, rather than starting with a new packaging model, allows us to attack the market.
Bloor: So how do you fit GPUs in the iDataPlex?
Denham: iDataPlex is designed primarily for large scale computing, so you typically buy compute capability by the rack rather than by the server. It is optimized at the rack level rather than the server level.
Starting with the rack, IBM did something different - it oriented it differently. Think of a standard enterprise rack that’s narrow and deep; we took that rack and rotated it 90 degrees. We take the same floorspace footprint, approximately two floor tiles, and instead of having one vertical column of servers we have two vertical columns of servers - and they’re shallower.
So that gives us a couple of things. It gets us density. In the same footprint we can put twice as many servers. And it turns out that there’s an advantage to making these servers shallow. A lot of the power consumption goes into moving air through the server boards and you use much less with this arrangement.
Bloor: Ultimately, how much of an advantage can you get?
Denham: Most of it is commodity components. If, say, 40% of the power goes to the processor, 25% to memory, and 10% to the chipset, these are things you can’t do anything about - and neither can your competitors. So the power consumed by cooling becomes much more important, because that’s the area where you can compete.
That’s what the iDataPlex guys went after. By making the racks shallower they reduced the air impedance, so air flows through easily. The boards were designed to present as little impedance to air flow as possible. And they only had to deal with a 20-inch depth, as opposed to the 30- to 36-inch depth you see on a typical server. That means you can use a less powerful fan, and that translates to fewer watts.
So this is a radically different approach from the packaging perspective, but it only uses commodity components so it doesn’t drive the costs up.
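Running Denham’s percentages makes the leverage clear. The fan share below is an assumed figure for illustration; only the 40/25/10 split comes from Denham.

```latex
% Fixed budget per Denham: processor + memory + chipset.
\[
  P_{\text{fixed}} = 0.40 + 0.25 + 0.10 = 0.75
\]
% That leaves roughly 25% (fans, power conversion, losses) to compete on.
% Assume fans draw 15% of total server power; halving fan power with
% low-impedance airflow then saves
\[
  \Delta P = 0.5 \times 0.15 = 0.075 \quad \text{(7.5\% of total power)}
\]
```

Per server that is modest; across a rack of 84 servers, or a data center of racks, it is a real competitive margin.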
Bloor: So the design is an optimization exercise?
Denham: Yes. It’s a balanced approach to try to find the sweet spot. iDataPlex is probably not the densest solution on the market, but if you drive the density up beyond where we’ve taken it, then you start paying on the power front, because you really have to force air through it at a very high rate.
Bloor: I’m presuming that iDataPlex was not built specifically for GPUs. Is that right?
Denham: Right. The basic building block is a two-server chassis. It’s two-U high and half depth, and those dimensions were chosen for power optimization. One large power supply is better than multiple small power supplies - that was a power economy we had already leveraged when we went to blades. We’d run a whole rack of blades off a set of four power supplies, whereas in the typical rack-mounted situation you’d have one or two power supplies in each server.
The power supply needs to be running at a high rate of efficiency so that you’re not throwing power away in the conversion from AC to DC. With the new chassis we could use bigger cooling fans than you see in a one-U unit. We use ones like those you see in a workstation. They are more efficient in terms of the air they can move per watt.
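The power-supply consolidation Denham describes is easy to quantify in outline. The efficiency figures below are assumptions for illustration - typical of lightly loaded versus well-loaded supplies - not numbers from IBM.

```latex
% 42 rack-mount servers with two supplies each means 84 PSUs, each
% lightly loaded; a bladed rack shares four larger, well-loaded supplies.
% Assumed conversion efficiencies: 75% (light load) vs 90% (heavy load).
% AC power drawn for the same DC load P:
\[
  \frac{P}{0.75} \approx 1.33\,P \qquad \text{vs} \qquad \frac{P}{0.90} \approx 1.11\,P
\]
% i.e. roughly 17% less wall power for the same DC load, before any
% savings from the larger, more efficient fans.
```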
So the iDataPlex building block is a two-U chassis with a pair of planar boards plugged into it, which you can pull out just like blades. They are independently serviceable and they share power. So when we went to the GPU model, instead of putting two servers in the two-U chassis we put in one server and a sidecar with a pair of GPUs in it.
Bloor: So you can mix and match?
Denham: We put 42 of these chassis in a rack, which, in effect, means 84 GPUs in a single rack. If you’re just starting out you can put in maybe just 8 GPUs and have the rest as standard servers. Most customers start with a small number just to test what’s possible. But you’ve got common technology and a common management support structure, and your support staff don’t have to learn about two kinds of machine. We’ve reduced the risk.