TL;DR

Intel and AMD have announced the release of ACE, a new set of CPU extensions designed to optimize AI tasks. These extensions leverage existing AVX10 registers and add dedicated matrix multiplication silicon, promising better efficiency and easier development for AI workloads on x86 CPUs.

Intel and AMD have announced ACE, a set of CPU extensions intended to make AI workloads more efficient on x86 processors. The announcement matters because it targets power-efficient AI processing directly on the CPU, which is important for applications that cannot rely on a GPU or that need low latency. The extensions reuse existing AVX10 registers and add dedicated silicon for matrix multiplication, with the stated goals of higher performance and simpler development.

The ACE extensions give x86 processors dedicated hardware for matrix multiplication, a core operation in AI workloads. They build on AVX10, which already supports 512-bit data inputs, so they can be integrated into existing CPU designs. According to Tom’s Hardware, this enables up to 16 times more operations per cycle than previous AVX10 instructions, although actual speedups depend on implementation.
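To make the register arithmetic concrete, the short Python sketch below counts how many elements of each data type fit in one 512-bit input. It then compares the multiply-accumulate (MAC) count of a lane-by-lane operation with that of an outer product on the same two inputs. The outer-product model is an assumption used purely for illustration: the article does not describe ACE's actual datapath, and the names `lanes`, `elementwise_macs`, and `outer_product_macs` are hypothetical.

```python
REGISTER_BITS = 512  # vector input width cited in the article for AVX10

# Bit widths of the element types discussed in the article.
ELEMENT_BITS = {
    "INT8": 8,
    "FP8": 8,
    "FP16": 16,
    "BF16": 16,
    "INT32": 32,
    "FP32": 32,
}


def lanes(dtype):
    """Number of elements of `dtype` that fit in one 512-bit input."""
    return REGISTER_BITS // ELEMENT_BITS[dtype]


def elementwise_macs(n):
    """MACs produced by one lane-by-lane multiply-add on two n-lane inputs."""
    return n


def outer_product_macs(n):
    """MACs produced by one outer product of two n-lane inputs (an n x n tile).

    Assumption for illustration only; not a description of ACE hardware.
    """
    return n * n


if __name__ == "__main__":
    for dtype in ELEMENT_BITS:
        n = lanes(dtype)
        ratio = outer_product_macs(n) // elementwise_macs(n)
        print(f"{dtype}: {n} lanes, outer/elementwise MAC ratio = {ratio}")
```

In this toy model the ratio equals the lane count, so it happens to be 16 for 32-bit elements. That shows how a matrix-style operation can do more work from the same register inputs, but it should not be read as the source of the reported figure.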

Both Intel and AMD have specified that ACE supports a wide range of data types used in machine learning, including INT8, INT32, FP8, FP16, FP32, and BF16. It also supports formats from the Open Compute Project, such as the MX (Microscaling) block-scaled formats. The extensions aim to simplify development for ML frameworks like PyTorch and TensorFlow by enabling a single code path that works across hardware without hardware-specific modifications. Developers will also be able to shift certain workloads from NPUs back to CPUs, increasing flexibility and efficiency in AI processing.
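For readers unfamiliar with block-scaled formats, the Python sketch below shows the general idea: a block of values shares one power-of-two scale, and each value is stored as a small integer. It is a simplified illustration, not a bit-exact implementation of the OCP MX specification and not ACE code. The block length of 32 follows the MX formats; the rounding and scale-selection rules, and the helper names `quantize_block`, `dequantize_block`, and `block_scaled_dot`, are assumptions made for this example.

```python
import math

BLOCK_SIZE = 32  # block length used by the OCP MX formats
INT8_MAX = 127   # symmetric 8-bit integer range used in this sketch


def quantize_block(values):
    """Return (shared_exponent, int_elements) for one block of floats.

    Simplified rule (an assumption, not the spec's exact procedure): pick the
    smallest power-of-two scale that keeps every element within +/-INT8_MAX.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    exponent = math.ceil(math.log2(max_abs / INT8_MAX))
    scale = 2.0 ** exponent
    # Clamp guards against floating-point edge cases in the log2 computation.
    elements = [max(-INT8_MAX, min(INT8_MAX, round(v / scale))) for v in values]
    return exponent, elements


def dequantize_block(exponent, elements):
    """Reconstruct approximate floats from a shared exponent and integers."""
    scale = 2.0 ** exponent
    return [e * scale for e in elements]


def block_scaled_dot(a, b):
    """Dot product of two equal-length vectors, quantized block by block.

    Products are accumulated as integers inside each block; the two shared
    scales are applied once per block.
    """
    total = 0.0
    for start in range(0, len(a), BLOCK_SIZE):
        exp_a, q_a = quantize_block(a[start:start + BLOCK_SIZE])
        exp_b, q_b = quantize_block(b[start:start + BLOCK_SIZE])
        acc = sum(x * y for x, y in zip(q_a, q_b))  # integer accumulation
        total += acc * 2.0 ** (exp_a + exp_b)
    return total
```

Because the scale is shared, each block costs one exponent plus 32 narrow integers, and the inner loop is pure integer multiply-accumulate. That is the kind of operation matrix hardware is generally built to accelerate.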

Potential Impact of ACE on AI Processing

The introduction of ACE could significantly influence AI workloads by making CPU-based processing more viable for tasks traditionally handled by GPUs or specialized accelerators. Its power efficiency and simplified development could lead to broader adoption in data centers, edge devices, and consumer hardware. This shift might reduce reliance on dedicated AI hardware, lowering costs and increasing flexibility for developers and enterprises. Moreover, as both Intel and AMD dedicate more silicon to these instructions in future designs, performance improvements are expected, further expanding AI capabilities on mainstream processors.

Background on AI Acceleration and CPU Developments

AI processing has traditionally been dominated by GPUs and specialized accelerators because of their high parallelism and performance. CPUs have played a secondary role, mainly handling less demanding or latency-sensitive tasks. Efforts to improve CPU capabilities for AI have included the AVX and AVX10 instruction sets, which support vectorized operations. ACE builds on this foundation and targets matrix multiplication, which underpins many AI algorithms. Earlier efforts to improve CPU-based AI processing were limited by the lack of dedicated hardware for these operations, which explains the current focus on new instruction set extensions.

“The ACE extensions represent a significant step toward making CPUs more competitive for AI workloads, especially in scenarios where power efficiency and development simplicity are priorities.”

— an anonymous researcher

Unanswered Questions About ACE Performance and Adoption

It is not yet clear how quickly manufacturers will integrate ACE into future processors, or how large the performance gains will be in real-world applications. Whether ACE will replace or complement existing AI accelerators remains uncertain, as does its effect on current AI software ecosystems. Detailed benchmarks and developer adoption figures are not yet available, so the full scope of ACE’s influence cannot be judged at this stage.

Next Steps for Hardware Integration and Software Support

Manufacturers are expected to incorporate ACE into upcoming CPU architectures, with first implementations anticipated in future product lines. Software developers will likely begin optimizing frameworks for ACE, and early benchmarks will provide more clarity on performance gains. Industry analysts will monitor hardware adoption and software ecosystem support to assess how broadly ACE influences AI processing on x86 platforms. Further announcements from Intel and AMD are expected in the coming months, detailing implementation timelines and performance metrics.

Key Questions

How does ACE improve AI processing on CPUs?

ACE introduces dedicated silicon for matrix multiplication and leverages existing AVX10 registers, enabling more operations per cycle with better power efficiency. It also supports multiple data types used in machine learning, simplifying development and improving performance.

Will ACE replace GPUs for AI workloads?

It is unlikely that ACE will fully replace GPUs, but it may reduce reliance on dedicated accelerators for certain tasks, especially latency-sensitive or smaller-scale AI operations on CPUs.

When will new CPUs with ACE be available?

Manufacturers are expected to include ACE in upcoming processor generations, with specific product timelines yet to be announced. Industry insiders anticipate early implementations within the next year.

What data types does ACE support?

ACE supports a wide range of data types, including INT8, INT32, FP8, FP16, FP32, and BF16, as well as formats from the Open Compute Project like MX block-scaled formats.

Source: Tom’s Hardware: For The Hardcore PC Enthusiast

