Comparing the available IBM POWER7 with Oracle's yet-to-be-released UltraSPARC T3 (Niagara 3) is like comparing a BMW X6 with a school bus, respectively. Certainly both vehicles transport people, and both are made of metal and burn hydrocarbon fuel, but this is where the comparison ends. Interestingly, Oracle itself uses a school bus analogy for its Chip Multi Threading (CMT) architecture, saying it represents the computing requirements of today's data center. Oracle says it is more efficient to transport, say, 40 students in a school bus at one time, although slowly, than to transport 8 groups of 5 students in an X6 running back and forth at lightning speed. Unfortunately, we don't have 40 students to transport, but perhaps fewer than 5. A school bus is an application-specific vehicle, just as Oracle's CMT is an application-specific processor architecture.
Oracle's CMT argument also claims that single-thread, heavyweight performance (the BMW X6) is not as important as the ability to execute many low-performance threads (the school bus). In contrast, IBM's POWER and Intel's x86 are designed for general-purpose computing requirements: heavyweight thread processing (the ability to execute the maximum number of instructions per clock) with fast clocks, large low-latency local caches, branch prediction, and out-of-order execution. Today, these general-purpose processors also execute many HW threads simultaneously, without having been designed to sacrifice per-thread execution quality for thread quantity. One of the few widespread application-specific execution environments that demands efficient execution of scores of lightweight threads is a web server under heavy load. Another is shuffling streams of data around. UltraSPARC T3-based systems make good web servers, but they are architecturally challenged when it comes to heavy processing of that data. Real-life benchmarks speak for themselves; see my previous blog entry.
Oracle claims that because the UltraSPARC T3 doubles the HW thread context count of its predecessor, the UltraSPARC T2 (Niagara 2), overall performance will double. Any such increase could only occur if the execution environment were thread starved, that is, if the workload offered more runnable threads than the T2's contexts could hold. Since few applications spawn scores of threads, running such an application on a processor with double the thread contexts of its predecessor will not provide any more performance. This is like designing a new school bus that holds 80 students but is still transporting only 5.
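To make the argument concrete, here is a deliberately idealized Python model. It assumes that every busy HW thread context contributes equal throughput and that a processor can keep at most one software thread per context resident. It also ignores clock speed, shared-cache, and memory-bandwidth effects. The T2 figure of 64 contexts is simply half the T3's 128, as implied by the doubling claim. Under these assumptions the T3's speedup over the T2 is the ratio of contexts that actually have work.

```python
# Idealized model (an assumption, not a benchmark): each busy HW thread
# context delivers the same throughput, and idle contexts deliver nothing.

T2_CONTEXTS = 8 * 8    # UltraSPARC T2: 8 cores x 8 contexts = 64 (half of T3)
T3_CONTEXTS = 16 * 8   # UltraSPARC T3: 16 cores x 8 contexts = 128


def busy_contexts(runnable_threads: int, contexts: int) -> int:
    """Contexts that actually have a thread to run; the rest sit idle."""
    return min(runnable_threads, contexts)


def t3_speedup_over_t2(runnable_threads: int) -> float:
    """Speedup from doubling the context count, under the idealized model."""
    if runnable_threads < 1:
        raise ValueError("need at least one runnable thread")
    return (busy_contexts(runnable_threads, T3_CONTEXTS)
            / busy_contexts(runnable_threads, T2_CONTEXTS))


if __name__ == "__main__":
    for n in (5, 64, 100, 128, 500):
        print(f"{n:4d} runnable threads -> speedup {t3_speedup_over_t2(n):.2f}x")
```

With 5 runnable threads, the model predicts no gain at all (1.00x). The full 2x appears only once the workload supplies 128 or more runnable threads, which is the "80-seat bus carrying 5 students" situation described above.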
Oracle's UltraSPARC T3 and IBM's POWER7 are both two-billion-transistor processors and dissipate about the same amount of heat. As seen in the UltraSPARC T3 die photograph just below, the processor has sixteen 1.6GHz cores, each holding 8 HW thread contexts, for a total of 128 HW thread contexts per socket. However, each core executes only one thread at any given time, and only if a thread is actually available for that core. Literature on this topic tends to blur this point, giving the impression that all 128 thread contexts are executing simultaneously. In fact, each core is so simple that it lacks even branch prediction, and any cache miss forces a switch to another thread. Cores communicate with a shared 6MB L2 cache via crossbar switches. The processor has on-board memory, PCIe, Ethernet, and SMP coherency controllers. With all of these pumping at full blast, the chip reaches a theoretical maximum bandwidth of 2.4 Tb/sec, but that rate can only be sustained with a large number of available threads and full-bore I/O running.
Oracle UltraSPARC T3
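The one-thread-per-core behavior described above can be illustrated with a toy cycle-by-cycle model of a single core. It follows the text's description: each cycle the core issues one instruction from at most one ready thread context, and a cache miss parks that context and forces a switch to another. The parameters `run_length` (instructions between misses) and `miss_latency` (stall in cycles) are hypothetical values chosen for illustration. They are not UltraSPARC T3 specifications.

```python
def core_utilization(num_threads: int, run_length: int = 10,
                     miss_latency: int = 30, cycles: int = 10_000) -> float:
    """Fraction of cycles in which a single core issues an instruction.

    Toy model with HYPOTHETICAL parameters: a context runs `run_length`
    instructions, then misses and is parked for `miss_latency` cycles.
    The core issues at most one instruction per cycle, from one context.
    """
    if num_threads < 1:
        raise ValueError("need at least one thread context")
    ready_at = [0] * num_threads            # cycle when each context may issue again
    run_left = [run_length] * num_threads   # instructions left before its next miss
    current = 0
    issued = 0
    for cycle in range(cycles):
        # Look for a ready context, starting with the one currently running.
        for offset in range(num_threads):
            idx = (current + offset) % num_threads
            if ready_at[idx] <= cycle:
                break
        else:
            continue  # every context is stalled on a miss: the core idles
        # Issue exactly one instruction from the selected context.
        current = idx
        issued += 1
        run_left[idx] -= 1
        if run_left[idx] == 0:
            # Cache miss: park this context and switch to the next one.
            run_left[idx] = run_length
            ready_at[idx] = cycle + 1 + miss_latency
            current = (idx + 1) % num_threads
    return issued / cycles


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, core_utilization(n))  # 0.25, 0.5, 1.0, 1.0
```

With these assumed parameters, a single thread keeps the core busy only 25% of the time, while four or more contexts hide the miss latency completely. Adding contexts beyond that point cannot push the core past one instruction per cycle, which is why its eight contexts are never all executing at once.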
In contrast, the POWER7 (see die below) has eight cores, each executing 4 threads fully simultaneously. The POWER7 can therefore execute twice as many threads simultaneously as the UltraSPARC T3 (8 cores × 4 threads = 32, versus 16 cores × 1 thread = 16). To reduce memory latency and ensure the cores are fed with instructions and data, the POWER7 has a huge on-board 32MB L3 cache feeding eight dedicated 256KB, 8-cycle-latency L2 caches, which in turn pump data into 2-cycle-latency 32KB L1 data caches. Together, the dual memory controllers and the SMP coherence controllers provide an aggregate bandwidth of 2.9 Tb/sec. The POWER7 also has as many floating-point units as threads.
IBM's POWER7 is the latest product in a successful roadmap of general-purpose processors designed with the horsepower to pound through heavyweight, compute-intensive tasks at nearly 4GHz.
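One way to see why the POWER7's deep cache hierarchy lowers the latency its cores experience is to plug its cache figures into the textbook average memory access time (AMAT) recurrence: the cost at each level is its hit time plus its miss rate times the cost of the level behind it. The Python sketch below uses the 2-cycle L1 and 8-cycle L2 latencies given for POWER7. The L3 latency, the memory latency, and every miss rate are hypothetical placeholders, not POWER7 measurements.

```python
from dataclasses import dataclass


@dataclass
class CacheLevel:
    name: str
    hit_latency: float      # cycles to return data on a hit at this level
    local_miss_rate: float  # fraction of accesses reaching this level that miss


def amat(levels: list[CacheLevel], memory_latency: float) -> float:
    """Average memory access time in cycles.

    Works from main memory inward: the cost seen at each level is its hit
    latency plus its local miss rate times the cost of the level behind it.
    """
    cost = memory_latency
    for level in reversed(levels):
        cost = level.hit_latency + level.local_miss_rate * cost
    return cost


# L1 and L2 hit latencies come from the text; all other numbers are HYPOTHETICAL.
L1 = CacheLevel("L1 32KB", hit_latency=2, local_miss_rate=0.05)
L2 = CacheLevel("L2 256KB", hit_latency=8, local_miss_rate=0.30)
L3 = CacheLevel("L3 32MB", hit_latency=25, local_miss_rate=0.20)  # assumed latency
MEMORY_LATENCY = 400  # cycles, assumed

WITH_L3 = [L1, L2, L3]
WITHOUT_L3 = [L1, L2]

if __name__ == "__main__":
    print(f"with L3:    {amat(WITH_L3, MEMORY_LATENCY):.3f} cycles")
    print(f"without L3: {amat(WITHOUT_L3, MEMORY_LATENCY):.3f} cycles")
```

Under these assumed numbers, the L3 roughly halves the average access time (about 3.98 versus 8.4 cycles) because most L2 misses are served on chip instead of going out to memory. The exact figures depend entirely on the assumed miss rates and latencies.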
It is worth noting that Oracle's UltraSPARC T3 is curiously missing from this month's Hot Chips Conference agenda (see: http://www.hotchips.org/printableprogram.php), even though its general availability is set for later this year. At least two IBM POWER-related sessions are scheduled at Hot Chips.
On August 17, 2010, IBM will continue its rollout of new POWER7-based systems, software, and solutions. Register for the webcasts: http://www-03.ibm.com/systems/power/advantages/