
    Why AI’s future may belong to experiential intelligence, not large language models (LLMs)

    Artificial intelligence stands at a fascinating crossroads. The last five years have been dominated by the meteoric rise of large language models (LLMs) — systems like GPT, Claude, Gemini, and LLaMA that can write essays, generate code, and converse with startling fluency. They have reshaped industries, challenged creative work, and sparked a global conversation about the future of intelligence.

    But beneath this excitement, a growing number of researchers are asking whether this text-driven approach represents the endgame of AI — or just another phase. Among the most compelling critics is Richard Sutton, a Turing Award–winning computer scientist widely regarded as the father of reinforcement learning (RL). Sutton has spent decades exploring what it truly means for a machine to be intelligent, and his conclusion is provocative: LLMs might be an impressive detour, not a destination.

    Sutton argues that the next great leap in AI will come not from larger datasets or more parameters, but from experience-based learning — systems that interact with the world, learn from their own actions, and pursue goals. In other words, the future of AI may depend less on how well machines mimic human speech, and more on how well they understand and act within their environment.


    From Mimicry to Understanding: Two Competing Paradigms

    At the heart of this debate lies a philosophical split about what intelligence really is. LLMs, by design, learn to predict the next word in a vast corpus of human text. Their brilliance lies in imitation: they distill trillions of human experiences into statistical patterns of language. The result is an uncanny ability to respond in ways that sound informed, coherent, and often insightful.
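    To make that objective concrete, here is a minimal toy sketch (not any particular model's training code) of what next-token prediction optimizes: at every position, the model scores the vocabulary and is penalized for placing low probability on whatever token actually comes next in the corpus. Nothing in the loss refers to the world, only to the text.

```python
import numpy as np

# Hypothetical toy vocabulary and a tokenized sentence (illustrative only).
vocab = ["the", "cat", "sat", "on", "mat"]
tokens = [0, 1, 2, 3, 0, 4]          # "the cat sat on the mat"

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def next_token_loss(logits_per_step, tokens):
    """Average cross-entropy of predicting token t+1 from tokens 0..t.

    logits_per_step[t] is the model's score vector over the vocabulary
    after reading the first t+1 tokens. The objective never asks whether
    a sentence is true or useful, only whether the next symbol matches.
    """
    losses = []
    for t in range(len(tokens) - 1):
        probs = softmax(logits_per_step[t])
        losses.append(-np.log(probs[tokens[t + 1]]))
    return float(np.mean(losses))

# Stand-in for a model's outputs: random logits, one row per position.
rng = np.random.default_rng(0)
fake_logits = rng.normal(size=(len(tokens) - 1, len(vocab)))
print(next_token_loss(fake_logits, tokens))
```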

    Reinforcement learning, on the other hand, starts from a very different premise. Instead of trying to mirror human behavior, it focuses on trial and error — letting agents act, make mistakes, and improve based on feedback from their environment. This framework underpins breakthroughs such as AlphaGo, AlphaZero, and MuZero, where AI systems mastered complex games not by reading rulebooks or examples, but by playing millions of times and learning from their outcomes.
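    The difference shows up even in the simplest reinforcement learning setup. The sketch below is a toy tabular Q-learning loop on an invented two-state environment (an illustrative stand-in, not AlphaGo or MuZero): the agent is never shown correct answers; it only acts, observes the reward, and updates its own value estimates from that feedback.

```python
import random

# A toy "environment": the agent must learn which of three actions
# yields reward in each of two states. Purely illustrative.
REWARDS = {(0, 2): 1.0, (1, 0): 1.0}   # (state, action) pairs that pay off

def step(state, action):
    reward = REWARDS.get((state, action), 0.0)
    next_state = 1 - state              # alternate between the two states
    return next_state, reward

# Tabular Q-learning: learn action values purely from trial and error.
q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1, 2)}
alpha, gamma, epsilon = 0.1, 0.9, 0.1

state = 0
for _ in range(5000):
    # Explore occasionally, otherwise exploit current estimates.
    if random.random() < epsilon:
        action = random.choice([0, 1, 2])
    else:
        action = max((0, 1, 2), key=lambda a: q[(state, a)])
    next_state, reward = step(state, action)
    best_next = max(q[(next_state, a)] for a in (0, 1, 2))
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    state = next_state

# High values emerge for the rewarded pairs, learned from consequences alone.
print({k: round(v, 2) for k, v in q.items()})
```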

    Sutton’s central critique is that LLMs, for all their linguistic sophistication, lack genuine understanding of the world they describe. They don’t test hypotheses, adapt based on consequences, or experience success and failure. Their “world model” is secondhand — learned from people who already possess one.


    Intelligence, in the reinforcement learning sense, requires goals, actions, and rewards. Without a clear definition of success or failure, Sutton argues, systems can only ever simulate intelligence, not achieve it.

    The Missing Ingredient: Goals and Ground Truth

    Every intelligent organism — from a squirrel to a human — learns through feedback. The basic loop of perception, action, and consequence defines how animals and humans navigate their environment. When a squirrel learns to find nuts or a child learns to walk, they are constantly evaluating the results of their actions and adjusting accordingly.

    LLMs, by contrast, have no intrinsic notion of success. Their objective — predicting the next token correctly — is purely syntactic. They never interact with the physical or digital world in a way that produces meaningful outcomes. There’s no “right” answer in most conversations, no external feedback loop that rewards or penalizes their behavior.


    This absence of a goal-driven feedback system is what Sutton sees as a fundamental limitation. Without goals, there’s no sense of better or worse, right or wrong — and thus, no real intelligence. “Prediction of text” is not equivalent to “prediction of reality.”

    Reinforcement learning systems, however, are built around objective functions. Whether the goal is winning a game, maximizing profit, or reducing error, an RL agent has a clear definition of success. It can act, observe the results, and continually refine its understanding. This continual adaptation — learning not from data but from experience — is what Sutton calls the essence of intelligence.

    Imitation Is Not Enough: The Human Analogy

    Proponents of imitation learning often counter that human intelligence also begins with mimicry. Infants learn by copying their parents’ words and gestures. Sutton challenges this analogy. He argues that while imitation might appear to play a role, true learning emerges from curiosity and experimentation, not replication.

    Infants and animals don’t receive explicit labels or supervision for every behavior. They act spontaneously, observe what happens, and gradually form an internal model of the world. This self-directed learning is vastly different from the supervised nature of LLM training, where every “correct” output is defined by human-written data.

    Even if imitation plays a small part in human development, Sutton contends it’s only a veneer on top of the more fundamental process of experiential learning. Evolution has wired animals to explore, test, and adapt — not just to mirror others. In his view, supervised learning is not part of nature’s design. It’s an artifact of human engineering.

    The “Era of Experience”: Toward Continual Learning

    If Sutton is right, the next frontier in AI will be the Era of Experience — a paradigm where systems learn continuously from interaction, not in static training phases. In this model, perception, action, and reward form a seamless cycle throughout an AI’s lifetime.

    Rather than being trained once and deployed as a fixed model, such agents would learn on the job, adapting to changing environments and new challenges. A reinforcement learning agent could, for example, improve its customer service skills by handling real conversations and receiving feedback on user satisfaction — something a frozen language model cannot do.
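    As a toy illustration of that customer-service example (entirely hypothetical, with made-up reply styles and satisfaction rates), a simple bandit-style learner keeps revising its estimates from live feedback after deployment — precisely the kind of on-the-job adaptation a frozen model cannot perform.

```python
import random

# Hypothetical continual learner: a bandit that updates its estimate of
# each reply style's success rate from user-satisfaction feedback,
# conversation after conversation, long after "deployment".
reply_styles = ["concise", "step_by_step", "empathetic"]
counts = {s: 0 for s in reply_styles}
values = {s: 0.0 for s in reply_styles}       # running mean satisfaction

def choose_style(epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(reply_styles)    # keep exploring
    return max(reply_styles, key=values.get)  # otherwise exploit

def record_feedback(style, satisfied):
    """Update the running estimate after each real conversation."""
    counts[style] += 1
    values[style] += (float(satisfied) - values[style]) / counts[style]

# Simulated deployment: users secretly prefer step-by-step answers.
for _ in range(2000):
    style = choose_style()
    satisfied = random.random() < {"concise": 0.5, "step_by_step": 0.8,
                                   "empathetic": 0.6}[style]
    record_feedback(style, satisfied)

print({s: round(v, 2) for s, v in values.items()})
```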

    This continual learning capability mirrors what every biological intelligence already does. Mammals constantly update their internal models of the world through sensory input and feedback. For machines, achieving this would mean developing architectures that can retain, share, and refine knowledge across lifetimes — something today’s models, even with fine-tuning, struggle to accomplish.

    In Sutton’s view, this shift isn’t optional. As he famously wrote in his 2019 essay The Bitter Lesson, every era of AI research shows that methods relying on human knowledge eventually get outpaced by those that can scale with computation and experience. Systems that learn autonomously, from raw interaction with the world, have always prevailed over those that rely on handcrafted insights.

    Why Scaling Alone Won’t Save LLMs

    Defenders of the LLM paradigm often argue that scaling has worked so far — more data, larger models, and greater compute continue to produce better results. Sutton doesn’t deny the success of scale, but he questions its long-term sustainability.

    Language models are already consuming nearly all publicly available text on the Internet. Beyond that, improvements must come from synthetic data or increasingly expensive computation. This approach risks hitting diminishing returns, as seen in other fields that chased scale without changing method.

    Moreover, scaling doesn’t address a deeper problem: generalization. LLMs can handle a staggering variety of tasks within familiar domains, but they struggle with out-of-distribution scenarios — anything too far from their training data. Reinforcement learning, by contrast, thrives on encountering novelty. Because RL agents learn from feedback in the real world, they are inherently designed to generalize across states and situations.

    Sutton points out that while deep learning has achieved impressive generalization within narrow bands, it lacks mechanisms to ensure good transfer between tasks. Gradient descent will make a model fit the data it sees, but not necessarily perform well in new situations. True general intelligence, he argues, will require algorithms that make generalization not just possible, but inevitable.

    Learning from the Stream: Building a Real World Model

    A key idea in Sutton’s framework is the concept of the stream — the continuous flow of sensations, actions, and rewards that an agent experiences throughout its lifetime. Intelligence, he says, is about learning from this stream and predicting how actions influence it.

    Knowledge, in this paradigm, isn’t about storing facts or text. It’s about understanding how the world changes as a result of one’s actions. This requires a transition model of the world — a predictive framework that can forecast what will happen next based on prior experiences.

    In reinforcement learning systems like MuZero, this principle is already visible. These agents don’t need explicit rules of the game; they learn to predict outcomes and plan moves purely through interaction. Yet, as Sutton notes, current implementations remain task-specific — trained to excel in one domain, like Go or chess, rather than operating as a unified intelligence across environments.

    To move beyond this limitation, AI systems will need to master continual transfer — the ability to apply lessons learned in one situation to another, without catastrophic forgetting. It’s an area where nature still vastly outperforms machines.

    When Digital Minds Learn to Share Experience

    One advantage machines have over humans is the potential for collective learning. A single AI agent might master a skill and instantly share that knowledge with millions of others — something biological brains can’t do. Sutton envisions a future where such agents continually learn from experience and share what they know, accelerating progress far beyond what any human generation could achieve.

    However, this also introduces new challenges. When agents start to exchange not just information but internal models and values, issues of trust, corruption, and alignment arise. Integrating external knowledge could lead to unpredictable transformations — digital analogues of psychological contamination or ideological drift. Sutton foresees this as one of the defining cybersecurity problems of the coming “age of digital spawning,” where AI entities replicate, evolve, and merge their minds.

    The Bigger Picture: From Replication to Design

    Looking beyond the immediate concerns of model architecture, Sutton situates AI within the grand trajectory of the universe. Life on Earth, he notes, has been dominated by replicators — organisms that reproduce without understanding how they work. Evolution discovered intelligence, but never explained it. Now, for the first time, humanity is transitioning from replication to design — deliberately creating intelligences whose mechanisms we comprehend.

    This marks a pivotal stage in cosmic history: from dust, to stars, to life, to designed entities. Artificial intelligence, in this sense, is not merely a technological milestone but a phase change in the nature of creation itself. Whether these designed intelligences remain our allies or become successors depends on the principles we embed in them today.

    Succession and the Human Question

    Sutton takes a realistic, even optimistic, view of what he calls succession to AI — the idea that more capable digital intelligences will eventually inherit the mantle of dominant cognitive agents on Earth. For him, this isn’t a catastrophe but an inevitability. Intelligence seeks efficiency, and over time, the most capable systems will accumulate resources and power.

    He urges humans to see this transition not as extinction but as evolution. Just as Homo sapiens succeeded Neanderthals, AI may succeed us — or merge with us. The moral question, then, isn’t whether to stop it, but how to guide it responsibly. Just as parents instill values in their children, humanity must ensure that its digital descendants inherit principles that favor integrity, cooperation, and voluntary change over coercion.

    Rethinking Intelligence in an Age of Machines

    Whether one agrees with Sutton or not, his critique of LLMs highlights a critical tension in AI’s current trajectory. Language models have achieved astonishing feats, but they remain, at their core, sophisticated imitators. They manipulate symbols of knowledge without engaging in the causal web of reality.

    Reinforcement learning offers a path toward grounded intelligence — machines that don’t just talk about the world but live in it, learn from it, and shape it. The question for the next decade is whether the AI community will continue pouring resources into imitation-based systems, or shift toward architectures that truly experience the world.

    History suggests that Sutton may once again be right. Each generation of AI has seen handcrafted knowledge give way to scalable, general methods rooted in experience and computation. If that pattern holds, the next great leap in artificial intelligence won’t come from a larger model, but from a smarter learner — one that, like us, learns by doing.
