Introduction: Desktop Zen 5 fits in the same die area as Zen 4
It’s roughly two weeks until AMD releases processors with the new Zen 5 architecture. This week, we finally got proper details on these CPUs’ architecture, which AMD revealed at the Tech Day event. So, we can now break down the changes the company has made to the core, compared to Zen 4 – and they’re pretty extensive, probably more so than they seemed in June. And AMD also reiterated its promise of a 16% increase in IPC for these CPUs.
IPC increased by 16%
AMD already gave an indicative figure for the improvement in performance per unit of clock speed, the so-called IPC, back at Computex in June. It is supposed to improve by 16%, a value averaged over a certain selection of applications. It should be noted that this characteristic varies from application to application, so don’t take “IPC” or this “+16%” as some fixed property of the Zen 5 core.
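To make the “averaged from a selection of applications” point concrete, here is a minimal sketch of how a single headline number is typically derived from per-application uplifts. The uplift values below are illustrative assumptions (only the first two match figures AMD quoted); the actual workload list and weighting AMD used are not public.

```python
# Hypothetical per-application IPC uplifts (Zen 5 vs. Zen 4).
# 1.35 and 1.32 mirror the AES-XTS and ML figures mentioned above;
# the rest are made-up lighter gains for illustration.
uplifts = [1.35, 1.32, 1.10, 1.08, 1.05]

# The geometric mean is the usual way to average ratios across benchmarks,
# so that no single large outlier dominates the headline number.
geomean = 1.0
for u in uplifts:
    geomean *= u
geomean **= 1.0 / len(uplifts)

print(f"headline IPC uplift: +{(geomean - 1) * 100:.1f}%")
```

With this particular (assumed) mix, the geometric mean lands in the high teens – close to AMD’s quoted average even though individual workloads range from +5% to +35%, which is exactly why the single number shouldn’t be read as a fixed property of the core.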
The biggest improvements can be expected in tasks that can take advantage of the 2× higher compute throughput of the 512-bit AVX-512 instructions. As examples, AMD gives the AES-XTS subtest in Geekbench 5.4 (+35%) and unnamed machine learning tasks (+32%).
Where does the IPC come from?
By the way, AMD has shown an interesting chart that breaks down how much the various previously described core changes contribute to the resulting 16% improvement in IPC. The biggest impact seems to come from the addition of ALUs, together with the SIMD widening (which probably counts under this item as well), shown in lighter grey in the chart.
Right behind is the new decoder design with two clusters and the related redesign of the uOP cache (darker ochre). While the cache has reduced capacity, its ability to deliver up to 12 instructions per cycle (and from two places in the code simultaneously, to feed both decoder clusters) clearly brings a decent performance benefit. Note: it is possible, however, that this reflects multithreaded applications, not just single-threaded ones.
A very big (practically identical?) role seems to be played by the doubling of the L1 data cache bandwidth (dark grey), which is a bit surprising. Judging by the size of the benefit, it doesn’t help only code exploiting AVX-512 operations on the SIMD unit, which is the main beneficiary you’d think of. Only fourth in line (bright ochre) is the impact of improved fetch and branch prediction, but it’s still quite significant.
Really the biggest change in the core since the first Zen? It definitely is
As you could see, the changes in the core go really deep. For example, the reorganization of the ALU and FPU schedulers shows that the engineers have also been at work on things less visible than the long-awaited 50% increase in the number of ALUs or the 2× compute throughput of SIMD instructions (and cache bandwidth).
The frontend overhaul (of the decoders and the associated changes to the instruction cache and fetch) may end up being the most far-reaching thing in Zen 5, as development in this part will probably continue into the next generations and shape their design. It’s probably the biggest conceptual change in AMD’s CPU core we’ve seen since Zen 1 (Intel fans may take pride in the fact that this first appeared in the Tremont core, though the idea may be older).
We certainly can’t consider Zen 5 to be some kind of “mere refresh” (you can actually find such comments on the internet). On the technical level, not only is it a new core, it is definitely a “major new” architecture too.
So why is the increase in IPC only as big as it is, you may ask. The question is rather whether the number given is actually small, which is probably what a lot of people think. As we wrote at the beginning, perhaps the priority was sometimes to reach those “milestones” (wider data paths, more ALUs, a redesigned frontend) rather than to squeeze all the potential out of them right away. That rarely happens anyway; one need only recall how relatively low the IPC of the Zen 1 core started and how far Zen 4 got with the same four-ALU base.
Further performance leaps are likely to continue in future generations here as well, and according to the architect Mike Clark, many improvements in future cores will build on foundations already laid in Zen 5.
It’s possible IPC increases will keep slowing down from now on
Beyond that, though, it’s worth pointing out that increasing performance per 1 MHz of a CPU core is not something that can be done forever at the same rate as scaling a GPU up to more and more blocks and shaders – GPUs process highly parallel tasks and thus scale up relatively easily. Increasing performance per 1 MHz of clock speed is something where the law of diminishing returns almost certainly applies, and if you’re already at the cutting edge, further progress becomes harder and harder. This is probably also why rival Intel claims “only” a similar IPC improvement for its own big CPU core upgrade (the Lion Cove architecture promises +14%; however, don’t compare this number directly to Zen 5’s, as it’s measured in a different way on different applications, so Lion Cove may not actually have a smaller IPC increase than Zen 5).

It’s true that the absolute IPC of Apple’s cores is still quite a bit higher, but they reach significantly lower clock speeds in return, and the two things are related. Had Apple designed for higher clock speeds, its cores would have arrived at a lower IPC, and similarly AMD’s cores would have had better IPC had they kept designing for lower clock speeds. Future development will likely see both clock speeds and IPC converge between the two competitors.
The law of diminishing returns, or the fact that further increases in IPC are becoming more and more difficult, is also evident at Apple. Although the company is rightly considered the “king” of high processor IPC, its incremental growth has also slowed significantly of late. Measurements vary, but it looks like Apple’s core IPC has only increased by about a tenth since the M1 processors of 2020, if we compare it to the now-current M4 processor (much more performance was gained by increasing clock speeds). This is despite the fact that both the M3 processor (which was apparently launched more than a year late compared to the original plans) and the M4 processor have new core architectures (which didn’t seem to be the case with the M4 at first). So for that roughly 10% improvement in IPC, Apple actually needed the combined gains of these two new architectures.
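Since generational gains compound multiplicatively, a ~10% cumulative uplift spread across two new architectures implies only a few percent per generation. A minimal sketch, assuming (purely for illustration) that the M3 and M4 cores contributed equally:

```python
# If ~10% total IPC growth since M1 came from two new architectures (M3, M4),
# and the gains compound multiplicatively, the assumed equal per-generation
# gain is the square root of the total. Illustrative assumption only; the
# real split between the two generations is not known.
total_gain = 1.10
per_gen = total_gain ** 0.5

print(f"implied per-generation IPC gain: +{(per_gen - 1) * 100:.1f}%")
```

That works out to under +5% per architecture – a striking contrast with the mid-teens uplifts AMD and Intel are still quoting per generation.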
We do not yet know for sure whether this is a temporary slowdown and whether new techniques and approaches in future architectures might bring (for a time…) bigger generational advances. For now it appears likely that we are underestimating the significance of Zen 5’s 16% gain (and the same can be said in defense of Intel’s Lion Cove). We’re still waiting for independent tests to verify or revise this number a bit, but all in all, it wouldn’t be bad progress.
Sources: Chips and Cheese, AnandTech, HardwareLuxx
English translation and edit by Jozef Dudáš










