Processor architecture designs take years from initial conception to the first shipping CPU sliding out of the fab assembly. So it's clear that AMD's new CPU architecture has been in the works for some time. Whether that's a problem or not vis à vis the competition isn't yet known, since AMD isn't yet talking about performance details.
What is known is that Barcelona, as AMD has dubbed this first iteration, isn't so much a brand-new architecture as it is a highly refined, tweaked version of AMD's existing x86-64 design. Those tweaks are numerous and significant. It's probably fair to suggest that Barcelona is to the current Opterons as Intel's Core 2 is to the Pentium M: thoroughly reworked, but built on a base of the old with a lot of new stuff rolled in.
The details of Barcelona discussed in this article were presented by Ben Sander, who led the performance modeling group for Barcelona. Sander's team cranked real-world application traces through iterations of the new processor—both simulated and real. Although he's uniquely positioned to discuss performance, Sander didn't really comment on performance yet. What he discussed instead were some of the enhancements built into Barcelona.
With these thoughts in mind, let's take a look at some of the shiny new features.
Speeding Up Floating Point
The first thing to realize is that Barcelona is the core of the next Opteron CPU, AMD's server and workstation product line. While it will also serve as the basis for AMD's next-generation desktop CPU, there will undoubtedly be some differences, though AMD isn't commenting on what those might be.
As such, the target markets for a quad-core Opteron are twofold:
High performance technical computing, including applications such as financial analysis, gas and oil exploration, and biological sciences.
Media encode and decode: HD-DVD authoring, video compression, and similar applications.
The key area of commonality between these two application spaces is high-performance floating-point processing. Software has been transitioning to SIMD-style floating point for the last decade, so AMD is substantially beefing up Barcelona's SSE unit relative to previous Opterons. (SSE is actually an Intel term that stands for Streaming SIMD Extensions.) Here's a list of the changes and enhancements.
In addition, SSE MOV instructions can be performed in the floating-point "store" pipe, so two SSE operations and one SSE move can be executed per cycle. There's also a capability now to support an unaligned load/execute mode, which can improve instruction packing and decoding efficiency.
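For readers who haven't seen SIMD-style floating point in code, here is a minimal sketch in C using the standard SSE intrinsics from xmmintrin.h. It is only an illustration of the programming model, not anything AMD presented: the function name add_floats_sse is hypothetical, and the code says nothing about how Barcelona schedules these instructions internally. Each 128-bit SSE register holds four single-precision values, so one packed add does the work of four scalar adds. The loads here are the unaligned variety, which place no alignment requirement on the input pointers.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Hypothetical helper for illustration: out[i] = a[i] + b[i] for n floats. */
void add_floats_sse(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;

    /* Main loop: four single-precision values per iteration,
       i.e. one 128-bit SSE register's worth. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned 128-bit load */
        __m128 vb = _mm_loadu_ps(b + i);   /* unaligned 128-bit load */
        __m128 sum = _mm_add_ps(va, vb);   /* four additions in one packed operation */
        _mm_storeu_ps(out + i, sum);       /* unaligned 128-bit store */
    }

    /* Scalar tail for any leftover elements when n is not a multiple of 4. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Compilers targeting x86-64 enable SSE by default; on 32-bit x86, a flag such as -msse is typically needed with GCC or Clang.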
These changes are fairly similar to what Intel has done with the Core 2 processor line, so it should be interesting to compare performance on SSE-heavy applications when the processor ships.
Something Old, Something New
Barcelona is not as radical a change to AMD's microarchitecture as Intel's Core was, relative to NetBurst. But the new quad-core CPU will offer substantial improvements to performance. Just how substantial the improvements will be isn't yet known.
AMD will be showing demos of Barcelona-based systems before the end of the year, with actual CPUs shipping by mid-2007. Until we can actually see a system in action, it's impossible to quantify performance. It will be faster than today's Opteron, to be sure. Will it be fast enough to remain competitive? We'll have to see what both AMD and Intel will be offering next summer.
Desktop variants aren't scheduled to ship until the second half of 2007, giving Intel a solid six- to seven-month lead to cement its reputation as the quad-core leader. AMD would argue that Intel's Kentsfield isn't a "true" quad-core, since the Intel processor is really two dual-core CPUs packaged together. But it's unclear whether consumers will really be concerned about the distinction.