Date: 2025-09-16

A new AI architecture developed by a small Singaporean startup has made headlines after outperforming major large language models like OpenAI’s GPT-4 and Anthropic’s Claude on a notoriously difficult intelligence benchmark, using just a tiny fraction of the computational resources.

The technology, unveiled by researchers at Sapient, introduces a model called the Hierarchical Reasoning Model (HRM). According to a preprint posted on arXiv, HRM achieved 40.3% accuracy on the Abstraction and Reasoning Corpus (ARC-AGI), a benchmark designed to test general problem-solving abilities without prior training on specific tasks. In comparison, OpenAI’s o3-mini-high scored 34.5%, Claude 3.7 came in at 21.2%, and DeepSeek R1 at just 15.8%.

What’s caught the AI world off guard is that HRM has just 27 million parameters, roughly 1,000 times fewer than mainstream models, a ratio that puts the comparison models in the tens of billions of parameters. It was trained using only 1,000 training samples. No pretraining. No reinforcement learning. No fine-tuning on vast troves of internet data.

Inspired by the Brain, but Skipping the Buzzwords

Rather than throwing more data or compute at the problem, Sapient’s team opted for something different: structure. HRM’s design mimics the brain’s ability to process information across multiple timescales.

“It’s an architecture with two modules,” the authors explain in the arXiv paper. “A high-level controller plans abstract strategies, while a low-level executor handles rapid, granular computations.” The two modules operate together in a loop, with the controller updating on a slower timescale than the executor, allowing the system to refine its reasoning over time without relying on the Chain-of-Thought (CoT) method used by most language models today.
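For readers who want a feel for what a two-module loop looks like, here is a toy sketch in Python. It is not Sapient's code or the paper's actual update rule: the function names (`low_step`, `high_step`, `hierarchical_loop`), the fixed weights, and the cycle counts are all hypothetical, chosen only to show a fast executor that updates every step and a slow controller that updates once per cycle.

```python
import math

# Toy illustration only: the names, weights, and step counts below are
# assumptions made for this sketch, not details taken from the HRM paper.

def low_step(z_low, z_high, x):
    """Fast module: updates every step, conditioned on the slow state and the input."""
    return [
        math.tanh(0.5 * l + 0.3 * h + 0.2 * xi)
        for l, h, xi in zip(z_low, z_high, x)
    ]

def high_step(z_high, z_low):
    """Slow module: updates once per cycle from the fast module's latest state."""
    return [math.tanh(0.6 * h + 0.4 * l) for h, l in zip(z_high, z_low)]

def hierarchical_loop(x, n_cycles=3, steps_per_cycle=4):
    """Run n_cycles slow updates, each preceded by steps_per_cycle fast updates."""
    z_low = [0.0] * len(x)
    z_high = [0.0] * len(x)
    low_updates = 0
    high_updates = 0
    for _ in range(n_cycles):
        for _ in range(steps_per_cycle):
            z_low = low_step(z_low, z_high, x)
            low_updates += 1
        z_high = high_step(z_high, z_low)
        high_updates += 1
    return z_high, low_updates, high_updates

if __name__ == "__main__":
    state, low_updates, high_updates = hierarchical_loop([1.0, -0.5, 0.25])
    print(f"fast updates: {low_updates}, slow updates: {high_updates}")
    print("final slow state:", [round(v, 3) for v in state])
```

With the default settings the fast module runs 12 times and the slow module 3 times. A real model would replace the fixed weights with learned networks, but the nesting of the two loops is the point of the illustration.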

CoT, which breaks problems into step-by-step reasoning sequences, has become a dominant strategy in modern AI. But it comes with caveats: it requires massive datasets, introduces latency, and often produces brittle outputs. HRM, by contrast, executes tasks in a single forward pass, making it faster and more efficient in theory.

High Marks on Complex Logic Tasks

While many AI models can compose essays or generate images, few excel at logic-heavy tasks like Sudoku or maze navigation. HRM reportedly shines in these domains, solving complex puzzles with near-perfect accuracy.

The model’s performance hints at a broader shift in how researchers think about artificial general intelligence (AGI). Instead of scaling models endlessly, HRM’s creators argue that better reasoning may come from architectural innovation, not brute force.

Still, not everyone is convinced. When independent researchers attempted to reproduce Sapient’s results on ARC-AGI, they found that the hierarchical design itself played a limited role in boosting performance. Instead, much of the model’s success was linked to a novel training technique—a refinement loop—that was only briefly mentioned in the original paper.

This raised eyebrows in parts of the AI community. “We need more transparency about what’s really driving these gains,” one researcher involved in ARC’s evaluation process told Daily Galaxy. “The results are impressive, but we don’t yet know what’s under the hood.”

Small Model, Big Implications

If further validation confirms HRM’s capabilities, the implications could be wide-ranging. Models like GPT-4 and Claude require immense computational resources, contributing to the growing energy footprint of AI. Smaller, brain-inspired systems like HRM could offer a more sustainable alternative—faster to train, cheaper to deploy, and potentially better at reasoning.

That said, the architecture remains in its infancy. The Sapient paper has not yet been peer-reviewed, and no open-source version of HRM is currently available for testing. For now, the tech world will have to wait to see whether this new approach is the real deal—or another short-lived spark in the AI arms race.