Intel unveils Meteor Lake processors: 4nm, tiles, Xe LPG graphics

First Core Ultra processors bring the biggest changes to Intel processors in many years

Meteor Lake is Intel’s first processor manufactured on its in-house 4nm node, an important milestone. It is also, paradoxically, Intel’s first processor manufactured at TSMC, as many of its parts are outsourced in this way – a milestone too. This is the first mainstream Intel processor to use chiplets (or tiles) and advanced 3D packaging. On top of all that, almost as an extra, there are new CPU cores, a new GPU, and a new NPU for AI acceleration.

Meteor Lake is not the first such processor from Intel, as the chiplet design was experimentally tested three years ago with the relatively unsuccessful mobile Lakefield SoC (which was also Intel’s first big.LITTLE processor). In mainstream processors, however, this change is coming only now.

Combining different technologies (even from different suppliers) in such a “disaggregated processor” makes it possible to use the optimal manufacturing process for each part of the resulting SoC – i.e. a cheaper and more mature older process for connectivity and chipset blocks (which are harder to scale to new processes) and a more expensive and more complicated newer process for CPU and GPU cores. It is also possible to vary individual components, for example by adding differently performing GPU and CPU tiles to the same base.

Individual tiles/chiplets contain their own power management controller (PMC) to help them save power when idle. Power draw and power saving modes are controlled independently for each tile.

Meteor Lake processors in BGA version for laptops

The chiplet design of the processor has disadvantages in terms of energy efficiency, where communication between chiplets increases power draw. Intel combats this by using advanced 3D Foveros packaging, where the tiles are interconnected by the silicon of the underlying tile, instead of using the usual traces running through the substrate underneath the silicon (this is the difference between this and AMD’s Ryzen chiplet processors).

Foveros Die Interconnect technology is employed. The interconnect density on the Base Tile is very high (36 µm bump pitch), and at a transfer frequency of 2 GHz the interface should draw only 0.15–0.3 pJ per transmitted bit.
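To get a feel for what that energy figure means: interface power is simply energy per bit multiplied by bits per second. The sketch below applies the quoted 0.15–0.3 pJ/bit range to a hypothetical 100 GB/s of tile-to-tile traffic. That traffic figure (and the name ASSUMED_TRAFFIC_GBYTES) is an assumption chosen for illustration, not a number from Intel.

```python
# Rough interconnect power estimate: power = energy per bit * bits per second.
# The pJ/bit range is the one quoted in the article; the traffic figure is a
# hypothetical assumption, not an Intel specification.

PJ = 1e-12  # one picojoule expressed in joules


def link_power_watts(energy_pj_per_bit, traffic_gbytes_per_s):
    """Return the power in watts needed to move the given traffic."""
    bits_per_second = traffic_gbytes_per_s * 1e9 * 8
    return energy_pj_per_bit * PJ * bits_per_second


ASSUMED_TRAFFIC_GBYTES = 100.0  # hypothetical tile-to-tile traffic in GB/s

low = link_power_watts(0.15, ASSUMED_TRAFFIC_GBYTES)
high = link_power_watts(0.30, ASSUMED_TRAFFIC_GBYTES)
print(f"{low:.2f} W to {high:.2f} W")
```

Under that assumed load the interface itself would cost only a fraction of a watt, which is the point of routing signals through the silicon of the Base Tile.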

4+1 tiles

The Meteor Lake processor is composed of four main tiles that are mounted on the fifth Base Tile, which essentially acts as a silicon interposer.

The Graphics Tile contains the compute units of the integrated GPU and is manufactured on TSMC’s 5nm node (for which the GPU architecture adopted from discrete GPUs seems to be optimized). The Compute Tile contains two to six large cores (P-Cores) and eight efficient E-Cores. It is manufactured on a 4nm node (Intel 4; it used to be labelled a 7nm process, but that was back when Intel was more conservative in its naming and its 7nm was roughly comparable to TSMC’s 5nm generation). It’s worth noting that Intel 4 is Intel’s first process technology using EUV.

These components are then connected by the SoC Tile, which contains the functionality of the chipset, the memory controller and more, including the NoC (Network-on-Chip) interconnect logic that links the components in the other chiplets. The fourth part, the IO Tile, does not have the same role as AMD’s IO chiplet (this role is played by the SoC Tile). It is actually just a sort of breakout extension of the PHY part (PCI Express, Thunderbolt and USB4) outside the SoC Tile.

The reason for this complication is that once assembled, the SoC Tile is in the central part of the entire chiplet assembly, and thus the space around the edges of this silicon that can be utilised to bring out various interfaces is limited. Adding the IO Tile (which is placed in the corner next to the Compute Tile in the finished processor) greatly increases the space around the edges usable to bring out external connectivity. Both the IO Tile and the SoC Tile should use TSMC’s 6nm node (N6).

CPU: a die-shrink, or an improved architecture?

Meteor Lake got new or at least updated processor cores, but not much was disclosed about them. The P-Core architecture is referred to as Redwood Cove and it should not be a radically new architecture, but rather an evolution of the Raptor Cove core of the 7nm (Intel 7) Raptor Lake processors, which itself was almost unchanged (except for a larger L2 cache) compared to Golden Cove of Alder Lake.

Tip: Intel Alder Lake/Golden Cove CPU core unveiled (µarch analysis)

According to Intel, the core has higher efficiency, but there are no explicit mentions of IPC (performance per MHz) increases. In the schematic that Intel showed, we can’t identify any major changes; it seems that the number of ALUs (five), AGUs (five), decoders (six) and ports hasn’t changed.

The core has a 2MB L2 cache like Raptor Cove and a 48KB L1 data cache. The capacity of the L1 instruction cache is also given: it has been increased from 32KB to 64KB. This probably won’t have as much effect as enlarging the L1 data cache would, but it may still increase the IPC.

It is still possible that the core will have additional changes. For example, the Reorder Buffer (ROB), the main queue within which Out-of-Order code execution optimization takes place, or other queues could be made larger (deeper). Improved prefetchers and branch predictors are also possible but likewise unconfirmed. Such changes would increase IPC, but admittedly, if the core had such changes, we would expect Intel to advertise them. Thus, it is also possible that these aspects of the architecture were left untouched and the IPC will not increase.

A possible hypothesis is that Redwood Cove is simply a 4nm version of the Raptor Cove core. It may have been a year since Raptor Lake’s release, but it’s important to remember that Meteor Lake and its core were originally supposed to come out much sooner. Intel, for example, announced the completion of the Compute Tile design two and a half years ago, but the release probably had to wait until the 4nm process and associated packaging technology (or possibly even some of the other chiplets) matured. So maybe Redwood Cove represents the state of Intel CPU architecture development from a while back, representing something similarly old or even slightly older than the Raptor Cove / Raptor Lake CPU core (which could actually be a derivative port of the Redwood Cove architecture to the 7nm process, and thus ironically a younger design).

Intel says it has improved the Performance Monitoring Unit in the processor and the related Intel Thread Director technology, which will assist the operating system in correctly assigning programs to large or small cores.

The E-Core also features a new or at least evolutionarily improved architecture called Crestmont. It should still be based on Gracemont. In this case, Intel does in fact promise increased IPC (improved performance per 1 MHz) in the slides. The functioning of Intel Thread Director technology should be improved as well, and Intel also explicitly mentions integrating an improved branch predictor into Crestmont.

Tip: Gracemont, the (not so) little Alder Lake core (µarch analysis)

The core should have improved support for instruction extensions and VNNI instructions will have up to twice the performance. That’s because they can be processed by twice as many ports/units in the core – this should speed up AI applications using VNNI.
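For readers unfamiliar with VNNI, the following minimal sketch models what one 32-bit lane of the VNNI instruction VPDPBUSD computes: a dot product of four unsigned 8-bit values with four signed 8-bit values, added to a 32-bit accumulator. It only illustrates the arithmetic that neural network inference leans on; the helper names are made up for this example and it says nothing about how Crestmont schedules these operations across its ports.

```python
# Scalar model of one 32-bit lane of the VNNI instruction VPDPBUSD:
# four unsigned 8-bit values are multiplied with four signed 8-bit values,
# the products are summed and added to a 32-bit accumulator.
# This models the arithmetic only, not any particular core's execution ports.


def to_int32(value):
    """Wrap an integer to signed 32-bit (the non-saturating behaviour)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def vpdpbusd_lane(acc, a_u8, b_s8):
    """Accumulate the dot product of 4 unsigned and 4 signed bytes into acc."""
    assert len(a_u8) == len(b_s8) == 4
    assert all(0 <= a <= 255 for a in a_u8)
    assert all(-128 <= b <= 127 for b in b_s8)
    dot = sum(a * b for a, b in zip(a_u8, b_s8))
    return to_int32(acc + dot)


# 1*5 + 2*(-6) + 3*7 + 4*(-8) = -18, plus the accumulator value 10
result = vpdpbusd_lane(10, [1, 2, 3, 4], [5, -6, 7, -8])
print(result)
```

A single instruction performs this for every 32-bit lane of a vector register at once, so doubling the number of units able to execute it roughly doubles peak throughput for quantized (INT8) inference.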

Low Power Island: even lower-power low-power cores

Speaking of small cores, one of the biggest highlights of the Meteor Lake processor is that it comes with a Low Power Island, a part that aims to reduce power draw as much as possible during idle and low loads to extend the battery life of laptops. It’s implemented by concentrating the components that the system needs to be active all the time – the video output block, the memory controller, the multimedia block (which is needed not during idle but during video playback, during which you also need to save power) within the SoC Tile.

In power saving modes, the processor puts other parts to sleep or shuts them down. Moving the GPU compute blocks to the Graphics Tile allows them to be shut down, and moving the CPU cores to the Compute Tile also allows them to be shut down. However, the problem would be that even when running idle, the CPU cores and Compute Tile would still need to be periodically woken up. Meteor Lake therefore adds two so-called Low Power Island E-Cores or LP E-Cores to this Low Power Island. They are physically located directly in the SoC Tile, and since they remain active and take over the running of the operating system and programs, the Compute Tile can be completely put to sleep even for long periods of time. This would be quite an interesting feature for AMD’s chiplet processors as well – we’ll see if the company applies something similar someday.

Intel Thread Director will try to prioritise running operating system processes and programs on these cores and activate the Compute Tile and send processes to the remote little and big cores only when high performance is needed.

These LP E-Cores should run at lower voltages and have finer-grained voltage and clock speed scaling thanks to an integrated digital linear voltage regulator (DLVR). Running code on these cores should draw less power than running it on the E-Cores in the Compute Tile, even though they are manufactured on a less advanced 6nm node (TSMC N6).

GPU Xe LPG: Integrated version of Arc graphics even with ray tracing

While the changes to the CPU cores may only be minor, the integrated GPU has seen a rather significant architectural upgrade. Current Intel CPUs still use GPUs based on the “Gen12” Xe LP architecture that made its debut in 2020 in Tiger Lake processors. Meteor Lake is transitioning to a new architecture that is derived from Intel’s latest standalone GPUs, the Arc “Alchemist” generation. For the purpose of Meteor Lake processors, it has been designated Xe LPG (Low Power Gaming).

For Xe LPG, Intel no longer lists the number of EUs but talks about Xe Cores, of which the GPU has eight, giving it 128 Vector Engines (these do correspond to the previously quoted EU blocks) and 1024 shaders. It is therefore a GPU basically at the level of the discrete Arc A380 graphics card (which uses the ACM-G11 die), although it will probably operate at a lower power draw and therefore the performance will be lower. Intel doesn’t list L2 cache capacity, but the GPU is supposed to contain 8 Pixel Backends, 8 Samplers and two geometry pipelines (all this means significantly increased resources compared to the GPU in Alder Lake and Raptor Lake processors).
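The unit counts above relate to each other by simple multiplication, which the short sketch below spells out. The per-core and per-engine ratios are derived from the quoted totals, not taken from an Intel spec sheet, and the 2.0 GHz clock used for the peak throughput estimate is purely a hypothetical placeholder, since Intel has not disclosed clock speeds.

```python
# Unit counts for Xe LPG as quoted in the article: 8 Xe Cores,
# 128 Vector Engines, 1024 shaders. The per-unit ratios are derived
# from those totals rather than taken from an Intel spec sheet.

XE_CORES = 8
VECTOR_ENGINES = 128
SHADERS = 1024

engines_per_core = VECTOR_ENGINES // XE_CORES    # Vector Engines per Xe Core
lanes_per_engine = SHADERS // VECTOR_ENGINES     # shader lanes per Vector Engine


def peak_fp32_tflops(shaders, clock_ghz):
    """Peak FP32 throughput, counting one fused multiply-add as 2 operations."""
    return shaders * 2 * clock_ghz / 1000.0


ASSUMED_CLOCK_GHZ = 2.0  # hypothetical clock, not an Intel figure

print(engines_per_core, lanes_per_engine)
print(peak_fp32_tflops(SHADERS, ASSUMED_CLOCK_GHZ))
```

Because the shader count matches the Arc A380, any performance gap between the two would come down to clocks, power limits and memory bandwidth rather than the number of execution units.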

Xe LPG supports hardware acceleration of ray tracing effects; the GPU in Meteor Lake contains eight RTUs (Ray Tracing Units), the same as the Arc A380 card. This is the first time that ray tracing acceleration is being supported by Intel integrated GPUs (the competing RDNA 2 based GPU in Ryzen 6000 “Rembrandt” and the GPU of Ryzen 7040 “Phoenix”, which already uses RDNA 3, have supported ray tracing for some time, however).

On the other hand, it appears that the Xe LPG GPU may not include the special XMX (Xe Matrix Extensions) AI acceleration units that are present in the discrete Arc Alchemist graphics cards. Unless Intel just accidentally forgot to mention them in the presentation, this may be one of the differences separating Xe LPG from the Xe HPG architecture in discrete GPUs.

There’s not much known about the performance yet, but Intel says Xe LPG is supposed to achieve higher clock speeds than the previous Xe LP architecture in Alder Lake and Raptor Lake. So performance will go up both on a per-unit basis and due to more units being present.

Neural Processing Unit

It’s possible that the XMX units have been dropped due to the Meteor Lake processors having a separate dedicated AI accelerator as a substitute. Intel calls it the NPU (Neural Processing Unit). This is a bit surprising, as back in the early summer they were talking about the VPU (Versatile Processing Unit) and it’s not clear if they are the same thing, or if the NPU is a unit for high AI performance and the VPU is some sort of auxiliary additional unit used for low-power AI acceleration while running on battery and for the needs of long-running background tasks.

The NPU will be a neural network accelerator consisting of two Neural Compute Engines that can be used for different applications simultaneously, or coupled to accelerate the same task. These Engines consist of a DSP block and a set of MAC units used for accelerating the matrix calculations of neural networks. The NPU is designed for inference (which means applying previously trained AI models), not for training AI models.

The NPU will be used through software frameworks and interfaces such as OpenVINO, PyTorch, Caffe, TensorFlow or WinML and DirectML. Intel is expected to work with Microsoft and Adobe to add support for using acceleration on these processors in their software applications.

Connectivity

The SoC Tile and IO Tile of the Meteor Lake processor will provide PCI Express 4.0, Thunderbolt 4 and USB4 connectivity (plus USB 3.2, of course). Wireless connectivity is also integrated (or at least the digital part of it, you’ll probably still need to connect a separate radio part via the CNVio interface). The digital part in the processor supports Bluetooth 5.4 and WiFi 6E. Intel even mentions WiFi 7 as an option, but it’s not clear if the silicon implementation is ready for it and just waiting for later certification and validation, or if Intel is talking about adding a dedicated (external) adapter and not about functionality integrated into the processor itself.

Support for the latest video outputs is also important: Meteor Lake supports both HDMI 2.1 and DisplayPort 2.1. We don’t know yet what UHBR speeds will be supported on DisplayPort (but at least UHBR 10 will probably be possible). The maximum supported resolution of a connected monitor is 8K including HDR at 60fps, alternatively you can have up to four 4K screens at 60Hz. 1080p and 1440p resolutions can be run at up to 360 Hz.
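Why the supported UHBR speed matters can be shown with a back-of-the-envelope calculation. The sketch below compares the uncompressed pixel data rate of an 8K 60Hz signal with the approximate payload of the three DisplayPort UHBR link rates on four lanes. It assumes 10 bits per colour channel (30 bits per pixel), ignores blanking intervals and DSC compression, and counts only the 128b/132b channel coding as overhead, so the results are rough estimates rather than exact link budgets.

```python
# Uncompressed pixel data rate versus DisplayPort UHBR link rates.
# Assumptions: 10 bits per colour channel (30 bits per pixel), RGB with
# no chroma subsampling, blanking intervals ignored, no DSC compression.

UHBR_GBIT_PER_LANE = {"UHBR 10": 10.0, "UHBR 13.5": 13.5, "UHBR 20": 20.0}
LANES = 4
ENCODING_EFFICIENCY = 128 / 132  # 128b/132b channel coding used by UHBR modes


def pixel_rate_gbit(width, height, refresh_hz, bits_per_pixel=30):
    """Raw pixel data rate in Gbit/s for the given video mode."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


def payload_gbit(mode):
    """Approximate usable link payload in Gbit/s over four lanes."""
    return UHBR_GBIT_PER_LANE[mode] * LANES * ENCODING_EFFICIENCY


need_8k60 = pixel_rate_gbit(7680, 4320, 60)
print(f"8K60 10-bit needs about {need_8k60:.1f} Gbit/s uncompressed")
for mode in UHBR_GBIT_PER_LANE:
    fits = payload_gbit(mode) >= need_8k60
    print(f"{mode}: {payload_gbit(mode):.1f} Gbit/s payload, fits uncompressed: {fits}")
```

Under these assumptions only UHBR 20 could carry 8K at 60Hz with 10-bit colour uncompressed; with a slower link the same mode would have to rely on DSC compression or chroma subsampling.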

As mentioned, to optimize power draw during video playback, multimedia functions have been decoupled from the GPU and moved to the SoC Tile, into the power saving island. The processor supports both playback and encoding of 10-bit 8K video including HDR and AV1 format. The processor contains two of these engines.

Real availability in December

During the reveal, Intel presented various aspects of these interesting processors, but they are not yet on sale. Just as little has been said about the details of the CPU cores, Intel is not yet disclosing the specs of the specific processor models it is preparing.

The actual release of the processors, or rather of laptops with these processors, will occur in December, supposedly on December 14. Intel typically prefers August to October launch dates for laptops, which catch the pre-Christmas sales season and ideally also the so-called back-to-school season. Based on the date, it appears that Intel didn’t manage to get Meteor Lake out in that earlier window, but it at least managed to get the release out before the end of 2023, which was the publicly announced target and therefore an important one to meet.

According to the unofficial reports that have come out so far, Meteor Lake is coming only as a mobile processor for laptops in a BGA package (i.e. CPUs soldered directly onto the motherboard). Intel has kept radio silence about a socketed desktop version, so there’s no change to the fact that it has probably been cancelled – instead, the 14th generation Core processors for desktop, based on a refresh of the 7nm Alder Lake and Raptor Lake chips, are to be released. These are supposed to go on sale on October 17.

Sources: Intel, AnandTech

English translation and edit by Jozef Dudáš

