
The AI Hardware Problem
One of the most significant barriers to progress in artificial intelligence today is not purely algorithmic but physical. The challenge lies in running AI systems efficiently on hardware: current approaches rely on vast amounts of specialized silicon, massive energy consumption, and complicated cooling systems. This approach is anything but sustainable; if AI continues to scale at its current pace, serious resource shortages could follow. Data centers already strain power grids, and the heat they generate requires industrial-scale cooling, often supported by water-intensive processes that introduce environmental and economic problems of their own.
Of all AI processes, training remains the most energy-intensive. While inference (running a model once it has been trained) is not trivial in cost, it pales in comparison to the resources demanded during training. The reason is simple: training requires high numerical precision, billions to trillions of parameters, and the ability to process and adjust enormous datasets in parallel. The GPUs and TPUs that underpin this effort are engineered for high-throughput floating-point operations, yet they still burn through staggering amounts of power.
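To make that gap concrete, a common back-of-the-envelope rule puts transformer training cost at roughly 6 × parameters × tokens floating-point operations, versus about 2 × parameters per generated token at inference. The sketch below uses that rule with purely illustrative model and corpus sizes, not figures from any specific system:

```python
# Rough rule-of-thumb estimates; all numbers are illustrative assumptions.
params = 70e9            # model size (parameters)
train_tokens = 2e12      # training corpus size (tokens)

train_flops = 6 * params * train_tokens        # total training cost
infer_flops_per_token = 2 * params             # cost of one generated token

print(f"training:            {train_flops:.1e} FLOPs")
print(f"inference (1 token): {infer_flops_per_token:.1e} FLOPs")
print(f"ratio:               {train_flops / infer_flops_per_token:.1e}x")
```

Under these assumptions, a single training run costs as much compute as tens of trillions of inference tokens.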
The underlying difficulty is that neural networks are fundamentally analog in nature. Human neurons operate with continuous signals, and artificial neurons attempt to model that behavior. Digital computers, however, work with binary digits, not continuous values. To approximate analog weights in a neural network, a system must fall back on floating-point representations, which demand considerable memory and bandwidth. This translation between the analog mathematics and the digital hardware is inherently inefficient and creates a widening gap between what the theory suggests and what the hardware can practically deliver.
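The storage cost alone is easy to see. A minimal illustration (the weight count is an assumption, not a reference to any particular model):

```python
# Illustrative only: the digital cost of approximating "analog" weights.
# Storing one billion weights at common floating-point precisions:
n_weights = 1_000_000_000

for name, bits in [("float64", 64), ("float32", 32), ("float16", 16)]:
    gigabytes = n_weights * bits / 8 / 1e9
    print(f"{name:>8}: {gigabytes:5.1f} GB just to hold the weights once")

# Every forward and backward pass must then stream these bytes between
# memory and compute, which is where the bandwidth cost appears.
```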
Updating these weights during training compounds the problem. Each adjustment requires writing to memory, yet memory architectures are not optimized for constant, parallel updates across billions of values. Current hardware shuttles data back and forth between processing units and memory banks (the classic von Neumann bottleneck), which not only increases energy consumption but also slows training. This inefficiency explains why training large AI models is so expensive and why it typically requires specialized chips clustered into supercomputing configurations.
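A rough sketch of the traffic involved, using round, assumed numbers rather than measurements from any real system:

```python
# Back-of-the-envelope sketch of the memory traffic implied by a single
# optimizer step on a large model. All numbers are assumptions.
params = 70e9          # parameter count, illustrative
bytes_per_value = 4    # float32

# One step touches weights, gradients, and optimizer state (Adam keeps
# two extra tensors per parameter), each read and written at least once.
tensors_touched = 1 + 1 + 2
reads_and_writes = 2

traffic_tb = params * bytes_per_value * tensors_touched * reads_and_writes / 1e12
print(f"~{traffic_tb:.1f} TB shuttled between memory and compute per step")
```

Multiply that by the millions of steps in a training run and the scale of the bottleneck becomes clear.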
Another important consequence of this architecture is that neural networks cannot realistically train while in use. Unlike biological brains, which learn continuously, artificial systems must pass through discrete phases of training and then deployment. This rigid separation arises from the difficulty of updating weights on the fly without destabilizing the entire model. Until hardware evolves to allow more flexible, efficient updates, AI will remain locked in this two-stage cycle.
French Researchers Develop Hybrid Memory for On-Chip AI Learning and Inference
Recognizing the challenges of AI inference and training, a French research team led by CEA-Leti has developed the first hybrid memory technology capable of supporting both adaptive local learning and inference on a single chip. The results, published in Nature Electronics on September 29, 2025, mark a significant step toward energy-efficient, autonomous AI at the edge.
The breakthrough combines two technologies previously thought to be incompatible: ferroelectric capacitors (FeCAPs) and memristors. Each has advantages, but also critical limitations when applied individually. Memristors are well suited to inference, as they can store analog weights and operate with high energy efficiency during read operations. They also support in-memory computing, which reduces the need to shuttle data between separate memory and processing units. However, memristors struggle with the fine-grained, repeated weight adjustments required for training. In contrast, FeCAPs allow rapid and low-energy weight updates, making them ideal for learning tasks, but their destructive readout mechanism has historically prevented them from being used for inference.
CEA-Leti’s team resolved this apparent no-win tradeoff by engineering a single memory stack based on silicon-doped hafnium oxide with a titanium scavenging layer. Depending on the electrical “forming” process applied, the device can behave either as a FeCAP or as a memristor. This duality enables a new training method: weight updates are first accumulated with high precision in the FeCAP mode, while inference computations rely on the analog behavior of the memristor mode. Periodically, the most significant bits preserved in the FeCAPs are reprogrammed into the memristors, striking a balance between accuracy, efficiency, and endurance. The method borrows from quantized neural networks, using low-precision operations for the forward and backward passes while maintaining stability through higher-precision references held in the FeCAPs.
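As a rough mental model (not the paper's actual implementation), the scheme resembles quantized training with high-precision latent weights. The sketch below uses invented names, bit widths, and update rules to show the rhythm of frequent, cheap FeCAP updates punctuated by occasional most-significant-bit transfers into the memristors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sketch: "fecap" plays the high-precision hidden weight
# (8-bit here), "memristor" the low-precision analog weight actually
# used for inference. All shapes and constants are assumptions.
FECAP_BITS, MEMRISTOR_BITS = 8, 4
TRANSFER_EVERY = 10  # how often MSBs are flushed into the memristors

fecap = rng.integers(-128, 128, size=1_000, dtype=np.int16)

def transfer_msbs(hidden):
    """Reprogram memristors from the most significant bits kept in FeCAPs."""
    return hidden >> (FECAP_BITS - MEMRISTOR_BITS)

memristor = transfer_msbs(fecap)

for step in range(100):
    # The low-precision forward/backward pass would use `memristor`;
    # a fake integer gradient stands in for that computation here.
    grad = rng.integers(-2, 3, size=fecap.shape, dtype=np.int16)
    # Frequent, cheap, high-precision updates accumulate in the FeCAPs.
    fecap = np.clip(fecap - grad, -128, 127)
    # Only occasionally are MSBs written into the memristors, sparing
    # their limited write endurance while bounding quantization drift.
    if (step + 1) % TRANSFER_EVERY == 0:
        memristor = transfer_msbs(fecap)
```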
The researchers fabricated and tested the technology on an 18,432-device array using a standard 130 nm CMOS process, demonstrating that the approach is compatible with established semiconductor manufacturing. Crucially, integrating training and inference on a single chip reduces the need for constant external updates, lowering energy costs and extending the lifespan of edge devices. Potential applications include autonomous vehicles, industrial monitors, and medical sensors, all of which must adapt to real-world data streams without relying on cloud infrastructure.
Could This New Concept Empower AI?
There is little doubt that today’s hardware is struggling to keep up with the demands of artificial intelligence. The issue is not that GPUs, TPUs, and other accelerators are poorly designed or inefficient; in fact, they are already operating at an impressive level of optimization for their architectures.
The deeper problem is that AI itself is a poor fit for conventional digital hardware. Training and inference demand enormous amounts of data movement, high-precision updates, and continuous computation that existing chips were never built to handle. This mismatch means that, even when operating at peak efficiency, current hardware consumes far more energy and cooling resources than is sustainable.
The arrival of devices such as memristors and ferroelectric capacitors has the potential to completely shift this paradigm. By combining analog-like computation with memory storage, these technologies bypass one of the biggest inefficiencies in AI hardware: the separation of memory and processing. Instead of constantly moving weights between compute cores and external memory, the weights themselves can exist in memory elements that also perform computation. This reduces latency, cuts energy use, and enables new forms of adaptive learning that would otherwise be impractical.
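The physical intuition behind in-memory computing can be captured in a toy model. In a memristor crossbar, the stored weights are conductances; applying input voltages makes each output line sum currents according to Kirchhoff's current law, so the array itself performs a matrix-vector multiply. The values below are arbitrary illustrations:

```python
import numpy as np

# Toy model of analog in-memory computing on a memristor crossbar.
G = np.array([[0.2, 0.5, 0.1],   # conductances (siemens), one row per output line
              [0.4, 0.3, 0.6]])
v = np.array([1.0, 0.5, 0.8])    # input voltages (volts), one per input line

i = G @ v  # in hardware this multiply-accumulate happens as physics
print(i)   # output currents = the layer's weighted sums
```

No weights are fetched from anywhere: the "memory read" and the "computation" are the same physical event.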
The most compelling possibility lies in the ability to perform both training and inference on the same device without switching between separate hardware or external systems. An AI system built on such a foundation could not only process information efficiently but also update and refine its own models in real time. This kind of self-improving system would mark a significant departure from the rigid two-stage process of today’s AI workflows, where training is carried out on large-scale clusters and inference is performed separately at the edge. In principle, hardware capable of both could unlock far more powerful and flexible AI, where systems adapt continuously to the environments in which they operate.
For example, autonomous vehicles could adjust their models to regional driving patterns without relying on constant cloud retraining; industrial sensors could adapt to gradual changes in machinery performance, spotting failures earlier; and medical monitoring devices could learn to identify patient-specific trends rather than relying solely on generic models. All of these examples depend on the ability to learn locally while conserving energy, something hybrid memory devices promise to deliver.
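What "learning locally while serving" means in practice can be sketched in a few lines. This is purely conceptual, with synthetic data and invented names, and is not tied to any of the devices discussed above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Conceptual sketch of on-device adaptation: a tiny linear model that
# serves a prediction and immediately nudges its own weights after each
# sample. The learning rate and data are illustrative assumptions.
w = np.zeros(4)
lr = 0.01

def sensor_stream(n_samples):
    """Stand-in for a device's incoming measurements (synthetic here)."""
    true_w = np.array([0.5, -0.2, 0.1, 0.3])
    for _ in range(n_samples):
        x = rng.normal(size=4)
        yield x, x @ true_w + rng.normal(scale=0.01)

for x, y in sensor_stream(1_000):
    pred = w @ x               # inference: answer the query right away
    w -= lr * (pred - y) * x   # learning: one cheap local SGD step
```

On conventional hardware, the update line is the expensive one; on hybrid memory devices, it is precisely the step that becomes cheap.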
However, there is an important caveat: these devices are still at the research stage, demonstrated in controlled experiments but not yet scaled to the billions of units needed for real-world systems. Issues of durability, variability, and large-scale manufacturability remain unresolved. Until memristor-ferroelectric hybrids or similar architectures are proven reliable at industrial scale, their promise remains just that: a promise. The concept is undeniably powerful, but the timeline for deployment will depend on whether researchers and manufacturers can translate laboratory breakthroughs into commercially viable products.