Date: 2026-03-11

Modern encryption lets systems protect data while it is stored on a storage device, transferred over links inside the system, or held in DRAM waiting to be processed. But once the data reaches a CPU, GPU, or other type of processor, it is decrypted and appears essentially as plaintext, which leaves it vulnerable to several classes of attacks.

To protect data from risks like side-channel attacks, DMA attacks, or hypervisor snooping, Intel has developed a processor that operates on encrypted data without first decrypting it. The company has now demonstrated the chip, IEEE Spectrum reports.

Intel introduced and demonstrated its Heracles accelerator for fully homomorphic encryption (FHE) last month at the International Solid-State Circuits Conference (ISSCC). FHE means the chip ingests encrypted data, processes it, and outputs the result still encrypted. Heracles is by no means an x86 CPU: it cannot execute normal software or run an operating system, because it is designed exclusively to accelerate FHE math.

Because it is purpose-built for FHE math, Heracles, operating at 1.20 GHz, is roughly 1,074 to 5,547 times faster than a 24-core Intel Xeon W7-3455 'Sapphire Rapids' running at 2.50 GHz to 4.80 GHz across seven operations used in this type of workload, according to Intel.

From a technical standpoint, Heracles is a sharp departure from conventional CPUs and GPUs, both of which struggle with the mathematical demands of encrypted workloads. FHE math depends on extremely large integers, intensive polynomial calculations, and complex data transformations that quickly overwhelm general-purpose processors. Intel's Heracles relies on a purpose-designed architecture built around an 8192-way SIMD compute engine composed of 64 tile-pairs, each containing 128 parallel arithmetic lanes, arranged in an 8×8 mesh. Each tile integrates arithmetic units optimized for modular addition, subtraction, multiplication, and specialized butterfly operations that support number-theoretic transforms (NTTs) and inverse NTTs.
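
To make the butterfly primitive concrete, here is a minimal Python sketch of the two textbook butterfly forms used in NTTs, built only from modular add, subtract, and multiply. The modulus `Q`, the function names, and the constants are illustrative assumptions and are not taken from Intel's design; the lane count simply restates the 64 × 128 arithmetic from the article.

```python
# Illustrative sketch only: Q and all names below are hypothetical,
# chosen for readability, not taken from Intel's hardware.
Q = 65537  # small prime modulus used for the example

def ct_butterfly(a, b, w, q=Q):
    """Cooley-Tukey butterfly: (a, b) -> (a + w*b, a - w*b) mod q."""
    t = (w * b) % q
    return (a + t) % q, (a - t) % q

def gs_butterfly(a, b, w, q=Q):
    """Gentleman-Sande butterfly: (a, b) -> (a + b, (a - b) * w) mod q."""
    return (a + b) % q, ((a - b) * w) % q

# Lane count as described in the article: 64 tile-pairs x 128 lanes each.
TILE_PAIRS = 64
LANES_PER_TILE_PAIR = 128
TOTAL_LANES = TILE_PAIRS * LANES_PER_TILE_PAIR
```

Applying `gs_butterfly` with the inverse twiddle factor `pow(w, -1, Q)` to the output of `ct_butterfly` returns the original pair scaled by two, which is why forward and inverse transforms can share the same arithmetic units and differ mainly in data ordering.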

These NTTs and inverse NTTs are key to encrypted computation but require heavy data movement and tightly coordinated permutations. In addition, the accelerator supports automorphisms and bootstrapping operations to remove accumulated cryptographic noise and enable longer computational chains.
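
The role of the permutations is easier to see in code. The following is a small, hedged Python sketch of an iterative NTT with an explicit bit-reversal permutation, used to multiply two polynomials. It uses a toy modulus and length (`Q = 65537`, `N = 8`) and plain cyclic convolution; production FHE libraries typically use a negacyclic variant with far larger parameters, and nothing here describes how Heracles schedules the work.

```python
# Toy parameters (assumptions for illustration, not Heracles settings).
Q = 65537                         # Fermat prime; 3 is a primitive root mod Q
N = 8                             # transform length, a power of two dividing Q - 1
OMEGA = pow(3, (Q - 1) // N, Q)   # primitive N-th root of unity mod Q

def bit_reverse_permute(values):
    """Reorder values so index i moves to the bit-reversed index."""
    n = len(values)
    bits = n.bit_length() - 1
    out = [0] * n
    for i, v in enumerate(values):
        rev = int(format(i, "b").zfill(bits)[::-1], 2) if bits else 0
        out[rev] = v
    return out

def ntt(values, omega=OMEGA, q=Q):
    """Iterative decimation-in-time NTT built from Cooley-Tukey butterflies."""
    a = bit_reverse_permute(values)
    n = len(a)
    length = 2
    while length <= n:
        w_len = pow(omega, n // length, q)   # root of unity for this stage
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for j in range(half):
                u = a[start + j]
                v = (a[start + j + half] * w) % q
                a[start + j] = (u + v) % q
                a[start + j + half] = (u - v) % q
                w = (w * w_len) % q
        length *= 2
    return a

def intt(values, omega=OMEGA, q=Q):
    """Inverse NTT: forward transform with omega^-1, then scale by n^-1."""
    n = len(values)
    n_inv = pow(n, -1, q)
    return [(x * n_inv) % q for x in ntt(values, pow(omega, -1, q), q)]

def cyclic_multiply(a, b, q=Q):
    """Multiply polynomials modulo x^N - 1 via pointwise products in NTT space."""
    fa, fb = ntt(a), ntt(b)
    return intt([(x * y) % q for x, y in zip(fa, fb)])

def schoolbook_cyclic(a, b, q=Q):
    """Reference O(N^2) cyclic convolution used to check the NTT result."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            out[(i + j) % n] = (out[(i + j) % n] + a[i] * b[j]) % q
    return out
```

The transform turns an O(N²) polynomial multiplication into O(N log N) butterflies plus a pointwise product, but every stage touches data at a different stride, which is the data-movement cost the article refers to.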

The system-on-chip operates on 32-bit arithmetic slices: each lane inside a tile-pair processes one 32-bit slice, which preserves precision while keeping parallelism high and greatly improves the efficiency of processing encrypted math at scale. Efficient, explicitly parallel execution also requires high memory bandwidth, however. To that end, the chip is equipped with 48 GB of HBM3 memory in two stacks, along with custom data paths that push internal bandwidth into the terabytes-per-second range. The chip further includes 64 MB of internal scratchpad memory, large register files, and dedicated buffers that stage data close to the compute engines.
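
One common way to fit very large integers into narrow lanes is a residue number system (RNS), in which a big value is stored as its remainders modulo several small, pairwise-coprime moduli and each remainder is processed independently. The Python sketch below illustrates that idea with three hypothetical moduli just under 2^32. That Heracles maps its 32-bit slices exactly this way is an assumption made for illustration; the article does not spell out the scheme, and the moduli and function names are invented for the example.

```python
from math import gcd, prod

# Hypothetical pairwise-coprime moduli, each below 2**32 so that every
# residue fits in one 32-bit slice. Not taken from Intel's design.
MODULI = [4294967291, 4294967279, 4294967231]
M = prod(MODULI)  # values are represented modulo this product

def to_rns(x):
    """Split a large integer into one 32-bit residue per modulus."""
    return [x % m for m in MODULI]

def rns_mul(a, b):
    """Multiply lane by lane; no carries cross between residues."""
    return [(x * y) % m for x, y, m in zip(a, b, MODULI)]

def from_rns(residues):
    """Recombine residues with the Chinese remainder theorem."""
    total = 0
    for r, m in zip(residues, MODULI):
        partial = M // m
        total += r * partial * pow(partial, -1, m)
    return total % M
```

Because each residue is multiplied independently, a wide multiplication becomes several narrow ones that can run side by side, which is the kind of parallelism a many-lane SIMD engine can exploit.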

At peak, Heracles reaches approximately 29.5 TOPS for butterfly primitives, about 9.8 TOPS for modular arithmetic, and multi-terabit-per-second throughput for transform operations, according to Intel. The processor supports multiple major FHE schemes, including BGV, BFV, and CKKS, and allows programmability across different parameter sets and security levels.
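
The peak figures can be sanity-checked against the lane count and clock speed quoted in this article. The short Python calculation below assumes one modular operation per lane per cycle and counts a butterfly as three operations (one multiply, one add, one subtract). Both are inferences from the numbers, not statements from Intel.

```python
LANES = 8192             # 64 tile-pairs x 128 lanes, per the article
CLOCK_HZ = 1.2e9         # 1.2 GHz, per the article

# Assumption: each lane retires one modular operation per cycle.
modular_tops = LANES * CLOCK_HZ / 1e12

# Assumption: a butterfly counts as multiply + add + subtract.
OPS_PER_BUTTERFLY = 3
butterfly_tops = modular_tops * OPS_PER_BUTTERFLY

print(f"modular arithmetic: {modular_tops:.2f} TOPS")
print(f"butterfly primitives: {butterfly_tops:.2f} TOPS")
```

Under those assumptions the results land at roughly 9.8 and 29.5 TOPS, in line with the quoted peak figures.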

The processor runs at 1.2 GHz, occupies 197 mm², operates within a 176 W power envelope, and is fabricated using Intel 3 process technology. Heracles is currently implemented as a PCIe accelerator card installed alongside standard servers and uses liquid cooling to manage its thermals.

