Gracemont, the (not so) little Alder Lake core (µarch analysis)

Speed: „Little“ core’s performance is up to two thirds of a big one

Intel has revealed the Alder Lake CPU architecture, or actually two architectures this time. The CPUs are hybrid, and besides the main „big“ cores there are „little“ cores called Gracemont. These are not just for marketing or for low-power idle tasks like in mobile ARM SoCs, however. Gracemont should add significantly to the overall performance, and the architecture is actually surprisingly beefy. Our analysis will show you more.

When presenting Gracemont, Intel compares its performance and efficiency with the 2015 Skylake microarchitecture, which is one of the most common cores in PCs (if not the most common) thanks to the exceptionally long period during which Intel kept selling it. Intel even claims that Gracemont has higher IPC than this core, so it should have higher single-thread performance when the clocks are equal. We should point out that Gracemont is unlikely to achieve the same high frequencies (up to 5.3 GHz in the fastest Skylake iteration).

Intel’s comparison charts show Gracemont (Efficient Core) beating Skylake in performance in the SPECrate2017_int_base benchmark while simultaneously keeping power consumption lower, and Gracemont’s absolute performance curve ends higher than Skylake’s. However, this might be just due to Intel limiting Skylake’s clock in this comparison to artificially low values (I would guess it could be just 4.2 GHz, which was the top speed of the original Skylake, the Core i7-6700K).

If both cores were clocked to achieve equal performance, Gracemont would require 40% or less of the power that Skylake would need. And looking at it the other way around, if Gracemont were allowed to consume as much power as Skylake, the new Efficient Core is said to be able to deliver up to 40% more performance.

Performance and power consumption of Gracemont (E-Core) architecture compared to Skylake, in single-thread tasks (Source: Intel)

Intel then shows a multi-thread performance comparison, in which they pit four Gracemont cores against two Skylake cores with HT (and thus also four threads). In this scenario, Gracemont achieves up to 80% better performance in SPECrate2017_int_base than Skylake while again consuming less power. Intel even claims that the four Gracemont cores should be able to provide the performance of a 2C/4T Skylake at less than 20% of the power consumed. Sadly, Intel again doesn’t specify the clock frequencies at which these performance comparisons are drawn.

Performance and power consumption of Gracemont (E-Core) architecture compared to Skylake, in multi-thread tasks (Source: Intel)
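To put the two iso-performance claims side by side, here is a minimal sketch that converts "same performance at a fraction of the power" into an implied performance-per-watt advantage. The function name is ours, and the 0.40 and 0.20 fractions are simply the upper bounds Intel quotes, not measured values.

```python
def efficiency_gain(power_fraction: float) -> float:
    """Perf-per-watt advantage when equal performance is reached at
    `power_fraction` of the reference core's power (0 < fraction <= 1)."""
    if not 0 < power_fraction <= 1:
        raise ValueError("power_fraction must be in (0, 1]")
    return 1.0 / power_fraction


# Upper bounds on the power fraction, taken from Intel's claims quoted above.
single_thread = efficiency_gain(0.40)  # one Gracemont vs. one Skylake core
multi_thread = efficiency_gain(0.20)   # four Gracemont vs. 2C/4T Skylake

print(f"single-thread: at least {single_thread:.1f}x perf/W")
print(f"multi-thread:  at least {multi_thread:.1f}x perf/W")
```

These ratios only hold at the one operating point Intel picked for each chart. They say nothing about efficiency at other clocks.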

It’s better to take these comparisons with a grain of salt, because when establishing the power efficiency of a core (which is mostly what all these charts are about in the end), it is incredibly important to know at which frequency (and voltage, which is related to frequency) the core is running. We can be almost certain that if we, for example, clocked Skylake to 5.0 GHz, Gracemont would never be able to achieve 40% more single-thread performance, simply because its architecture likely has a much lower frequency ceiling, meaning it will stop scaling much sooner. Due to the efficient architecture, Gracemont probably can’t even reach the same power consumption as Skylake does at the end of its frequency curve.
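Why the operating point matters so much can be sketched with the textbook approximation for CMOS dynamic power, P ≈ C · V² · f. The voltage/frequency pairs below are hypothetical values chosen for illustration only. They are not measured Gracemont or Skylake data, and the model ignores leakage.

```python
def dynamic_power(freq_ghz: float, voltage: float, capacitance: float = 1.0) -> float:
    """Textbook CMOS dynamic power approximation, P ~ C * V^2 * f (arbitrary units)."""
    return capacitance * voltage ** 2 * freq_ghz


# Hypothetical (frequency in GHz, voltage in V) operating points.
# These are illustrative assumptions, not data for any real core.
operating_points = [
    (2.0, 0.70),
    (3.0, 0.85),
    (3.6, 1.00),
    (4.0, 1.20),
]

base_freq, base_volt = operating_points[0]
baseline = dynamic_power(base_freq, base_volt)

for freq, volt in operating_points:
    rel_clock = freq / base_freq
    rel_power = dynamic_power(freq, volt) / baseline
    print(f"{freq:.1f} GHz @ {volt:.2f} V: {rel_clock:.2f}x clock, {rel_power:.2f}x power")
```

Because the voltage has to rise along with the clock and enters the formula squared, power grows much faster than frequency near the top of the curve. That is why an efficiency comparison at one undisclosed clock tells us little about any other.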

It is true that Intel has not yet revealed the depth (the number of stages) of Gracemont’s pipeline, which is one of the main factors limiting the frequency ceiling, but other factors (like the partially 3-cycle L1D latency) point to a lower frequency ceiling compared to Skylake (or Golden Cove). When comparing the power consumed by Skylake and by Gracemont, there is another important caveat to keep in mind, by the way: Skylake is manufactured on a 14nm node, while Gracemont will have the benefit of the more advanced Intel 7 manufacturing node (previously known as 10nm Enhanced SuperFin).

Gracemont vs. Golden Cove?

Perhaps more useful is the comparison where Intel shows Gracemont performing against the big core of the same generation, Golden Cove (P-Core in the chart). While Intel might be downplaying its old product, Skylake, to make the new Gracemont look better, the company has no incentive to do the same with Golden Cove, so we believe this comparison is at less risk of being biased.

Intel’s charts suggest that the big performance core (Golden Cove) can achieve up to 50% higher single-thread performance than the best of what a Gracemont (E-Core) can reach. This of course would be at much higher power consumption, which is kind of the point of Efficient Cores after all. It’s not clear what the power delta is; the P-Core might even need several times the watts consumed by the E-Core. However, the E-Core’s curve also shows a very pronounced knee at its top clocks. There is probably a relatively steep power cost to the last few percent of its maximum performance. Reaching 100% of the achievable clock might need far more power than running at 90% of the clock speed.

Performance and power consumption of Gracemont (E-Core) architecture compared to Golden Cove (Performance Core) (Source: Intel)

This showing would actually be pretty good for Gracemont. If Golden Cove achieves 50% better single-thread performance, this means Gracemont achieves two thirds of Golden Cove’s peak ST performance (1 / 1.5 ≈ 0.67), which again could warrant putting it in the big core category, even if it means being comparable to big cores of a few generations ago. Such single-thread performance is still quite acceptable today, particularly when we keep in mind that the point of the Efficient Core is multi-thread rather than single-thread performance.

For multi-thread performance, Intel shows a comparison between a processor with four Golden Cove cores and a hybrid configuration that combines two performance Golden Cove cores and eight Gracemont efficient cores (2+8; this configuration is reportedly what Intel will sell in the 15W notebook CPU segment). Intel’s chart shows the hybrid alternative achieving over 50% more multi-thread performance than the homogeneous Golden Cove config. This is an interesting data point if true. It means that four Gracemont cores can add the same multi-thread performance as two big Golden Cove cores, or even more. In other words, the first four efficient cores fully make up for the loss of two performance cores, and the second quad-core group adds the same amount of performance again as a bonus. If Intel’s projection is accurate, the 2+8 Alder Lake processors should be as fast as a hypothetical Golden Cove hexa-core or better.
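The arithmetic behind this conclusion can be written out as a short sketch. It measures throughput in "P-core equivalents" and assumes perfect scaling with core count, which is our simplification. The 1.5 factor is the lower bound read off Intel’s chart.

```python
# Throughput in "P-core equivalents": one Golden Cove core (with HT) = 1.0.
# Assumes perfect scaling with core count, which real workloads rarely reach.
P_CORE = 1.0

homogeneous_4p = 4 * P_CORE
hybrid_gain = 1.5  # Intel's chart shows "more than 50%", so this is a lower bound
hybrid_2p_8e = hybrid_gain * homogeneous_4p

# Subtract the two P-cores that remain in the hybrid chip to isolate the E-cores.
e_cores_total = hybrid_2p_8e - 2 * P_CORE
per_e_core = e_cores_total / 8
per_cluster = 4 * per_e_core

print(f"2+8 hybrid is worth at least {hybrid_2p_8e:.1f} P-cores")
print(f"one E-core is worth at least {per_e_core:.2f} P-cores")
print(f"one 4-core E-cluster is worth at least {per_cluster:.1f} P-cores")
```

Under these assumptions the hybrid chip lands at six P-core equivalents, which is where the hexa-core comparison comes from.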

This comparison, however, is not made at the same power consumption. The 2+8 processor is shown to draw more power in return for its extra performance. Since the eight Gracemont cores replace two Golden Cove cores, this means that four fully loaded Gracemont cores draw more power than one Golden Cove core. It has to be said, however, that the power consumption and efficiency of both architectures will depend greatly on the clock and voltage used in the particular scenario.

The exact performance and IPC are something that should be left for when independent reviews are available after launch. According to leaked specs of 125W desktop Alder Lake processors, though, Gracemont cores seem to reach clock speeds as high as 3.6 to 3.9 GHz, which promises nice performance if their IPC beats Skylake as Intel claims. At least in tasks that don’t benefit tremendously from Hyper-Threading (the second thread of big cores), that is.

The Gracemont core takes much less space in an Alder Lake processor than one Golden Cove core needs, but we again don’t know the exact values yet. Intel’s schematic illustrations would make you think that four Gracemonts fit in the same area as one Golden Cove, but this is unlikely to be a fully accurate representation. Elsewhere in Intel’s presentation, it was said that the cluster of four Gracemont cores takes a similar area to one Skylake core, so perhaps this 4:1 ratio could be close after all (Skylake is however manufactured on the much less dense 14nm node).

In practice, the “math” that removing the little E-cores would allow Intel to add merely one big P-core for every four E-cores removed might be pretty accurate. It looks like the big.LITTLE combination should indeed enable higher multi-thread performance, in line with what we were talking about in the first chapter. That’s the efficiency and scalability Intel is touting. But to be absolutely certain about this being true, we have to wait for Intel to reveal the actual core silicon area and power consumption data.
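To illustrate this trade-off, the following sketch compares a few core mixes under two assumptions that are not confirmed figures: that one Gracemont core takes a quarter of a Golden Cove core’s area (the unverified 4:1 ratio), and that it delivers half of its multi-thread throughput (the lower bound implied by Intel’s 2+8 chart). The `CoreType` class and `config` helper are hypothetical names used only for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CoreType:
    name: str
    area: float        # in units of one Golden Cove core
    throughput: float  # multi-thread throughput, one Golden Cove core = 1.0


# Assumed values: area from the unconfirmed 4:1 ratio,
# throughput from the lower bound implied by Intel's 2+8 chart.
P_CORE = CoreType("Golden Cove", area=1.0, throughput=1.0)
E_CORE = CoreType("Gracemont", area=0.25, throughput=0.5)


def config(p_cores: int, e_cores: int) -> Tuple[float, float]:
    """Return (total area, total throughput) for a hypothetical core mix."""
    area = p_cores * P_CORE.area + e_cores * E_CORE.area
    throughput = p_cores * P_CORE.throughput + e_cores * E_CORE.throughput
    return area, throughput


for p, e in [(4, 0), (3, 4), (2, 8)]:
    area, perf = config(p, e)
    print(f"{p}P+{e}E: area {area:.2f}, throughput {perf:.2f}, per area {perf / area:.2f}")
```

With these inputs all three mixes occupy the same core area, so any throughput difference comes purely from trading P-cores for E-core clusters. The result shifts directly with the two assumed ratios, which is why the real area and power data matter.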

The little core is worth more than a little attention

As with Golden Cove, the first performance reviews of the Gracemont cores will be very interesting reads, perhaps even more so than with the Performance core. When Alder Lake launches some two months from now, it should be one of the most noteworthy and intricate CPU architecture novelties of recent years, regardless of how well Intel manages to deliver on the potential and the performance promises. It will bring more new phenomena to study, benchmark and perhaps get used to than any other recent processor.

We have been focusing on Alder Lake all this time, but perhaps it should be said that the type of efficiently big (rather than as big as possible) core that Gracemont represents should be useful for more than just padding the multi-thread benchmark scores of hybrid CPUs. This core should be a viable architecture even on its own, in homogeneous all-Gracemont processors. One area where it should do well is cheap mobile processors, or CPUs for fanless devices. An SoC continuing the tradition of Intel’s low-power little-core processors (Apollo Lake, Gemini Lake, Jasper Lake…) would be a nice product. However, we have no information at this time as to whether Intel is working on one.

Sources: Intel, AnandTech (1, 2)

Jan Olšan, editor for Cnews.cz

