The arguments presented in this article are intentionally opinionated and express my intuitions about the relationship between time, entropy, and artificial intelligence. They are not intended as formal scientific proofs, nor are they backed by systematic experimentation. This piece sits at the intersection of AI theory and dynamical systems interpretation—read it as philosophical exploration rather than established consensus.

PS — Please reach out to me if you find anything inaccurate or overstated; I'll make sure to correct it.

Introduction: Three Kinds of Time, and Why Only One Matters

There is an important distinction that often gets blurred in discussions of AI and time. Consider three different things we might mean when we say a system "understands time":

Representational time: The training data encodes temporal structure—texts describe sequences of events, timestamps label documents, tokens appear in left-to-right order. A model can learn to represent this structure without having any ongoing relationship with time itself.
Architectural state: The model maintains an internal state that persists and updates across successive inference steps or episodes, allowing it to accumulate experience and revise its understanding as the world changes around it.
Subjective experience of time: The model experiences duration, the felt passage of moments—something closer to phenomenal consciousness.

Most of the interesting engineering questions cluster around the second level. The third level, subjective experience, is genuinely unknowable with current tools and is not the concern of this piece. The first level, representational time, is already present in large language models, which learn a great deal about temporal structure from text. But representational time alone is not enough. My central claim is this: for an AI system to function as a genuine world model, or to support the best plasticity-stability trade-off in continual learning, it requires architectural state that persists and updates across time, not merely representations of temporal concepts frozen in static weights.

This distinction matters because conflating the three levels makes current LLMs look both more and less capable than they are. They are fairly good at representing time, though that ability is still far from mature. They have almost no architectural relationship with it.

Why Exploration Is the Core Problem

We stand at a transition from "learning from data" to "learning what data to learn from." Large models under unified architectures have shifted the bottleneck from how to train to what to acquire, a second-order problem that can be framed as exploration. Exploration is a problem shared by all learning in open-ended domains: an intelligent agent must actively collect its own training experience, continually deciding which data is most informative for expanding its capabilities.

This requires evaluating data along three axes: learning potential, diversity, and grounding in real, valuable tasks. An agent cannot close this loop without architectural state that persists across episodes: the ability to take an action, wait for its consequences, and update its internal representations based on that delayed feedback. Without it, the agent can only passively ingest pre-existing datasets, with no mechanism connecting what it learns to what it does.
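To make the three axes concrete, here is a minimal Python sketch of how an agent might rank candidate data. It rests on assumptions of my own, not an established method: learning potential is approximated by the model's current loss on an item, diversity by its distance to the nearest item already selected, and grounding by a task-relevance score assumed to come from elsewhere. The names `Candidate`, `diversity`, `score`, and `select`, and the equal weights, are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical item of experience the agent could choose to collect."""
    name: str
    features: tuple[float, ...]  # embedding of the item (assumed to be given)
    predicted_loss: float        # proxy for learning potential: current model loss
    task_relevance: float        # proxy for grounding, in [0, 1] (assumed to be given)


def diversity(features, collected):
    """Distance to the nearest already-selected item; larger means more novel."""
    if not collected:
        return 0.0  # nothing selected yet, so diversity cannot separate candidates
    return min(math.dist(features, other) for other in collected)


def score(c, collected, w_learn=1.0, w_div=1.0, w_ground=1.0):
    """Weighted sum of the three axes; the default weights are illustrative, not tuned."""
    return (w_learn * c.predicted_loss
            + w_div * diversity(c.features, collected)
            + w_ground * c.task_relevance)


def select(candidates, k):
    """Greedily pick k items, re-scoring diversity after every pick."""
    pool, chosen, collected = list(candidates), [], []
    for _ in range(min(k, len(pool))):
        best = max(pool, key=lambda c: score(c, collected))
        pool.remove(best)
        chosen.append(best)
        collected.append(best.features)
    return chosen


cands = [
    Candidate("easy_dup", (0.0, 0.0), predicted_loss=0.1, task_relevance=0.5),
    Candidate("hard_near", (0.1, 0.0), predicted_loss=0.9, task_relevance=0.5),
    Candidate("novel_far", (5.0, 5.0), predicted_loss=0.4, task_relevance=0.5),
]
print([c.name for c in select(cands, 2)])  # ['hard_near', 'novel_far']
```

Re-scoring after every pick is the design choice that matters here: once `hard_near` is chosen, its near-duplicate `easy_dup` gains almost nothing on diversity, and the distant `novel_far` wins the second slot despite its lower loss.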

The Core Problem: Representational Time Is Not Enough

To understand the limitation precisely, we need to be careful about what LLMs actually lack.

A standard transformer processes an entire prompt in parallel through self-attention, using positional encodings to inject information about token order. This is representational time: the model learns that certain tokens tend to precede others, that narratives unfold in sequence, that causes typically appear before effects in text. It is real and useful, and LLMs demonstrably learn a great deal of temporal structure this way.
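As an illustration of how order enters the model, the sketch below implements the sinusoidal positional encoding from the original Transformer paper in plain Python. Many current LLMs use learned or rotary position embeddings instead, so treat this as one representative scheme rather than what any particular model does. The point it shows holds across these schemes: the positional signal is a fixed function of a token's index in the sequence, so the same prompt receives the same encoding no matter when the model is run. The function names are my own.

```python
import math


def sinusoidal_encoding(pos: int, d_model: int) -> list[float]:
    """Sinusoidal positional encoding for one position.

    Dimension pairs (2k, 2k+1) share a frequency; even dimensions use sin,
    odd ones cos, and wavelengths grow geometrically with k.
    """
    enc = []
    for i in range(d_model):
        k = i // 2
        angle = pos / (10000 ** (2 * k / d_model))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


def add_positions(token_embeddings: list[list[float]]) -> list[list[float]]:
    """Inject order by adding each index's encoding to that token's embedding."""
    d = len(token_embeddings[0])
    return [
        [e + p for e, p in zip(emb, sinusoidal_encoding(pos, d))]
        for pos, emb in enumerate(token_embeddings)
    ]
```

For example, `sinusoidal_encoding(0, 4)` returns `[0.0, 1.0, 0.0, 1.0]`. The only inputs are the index and the embedding width; nothing in the signature could carry the calendar date.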

What LLMs lack is architectural state across independent inference episodes. Suppose a standard LLM receives the same input $X$ at two different points in calendar time: after a week of real-world events, after new information has emerged, after its own previous outputs have caused downstream consequences. Both times, its output $Y$ is computed from the same frozen weights $\theta$, so under the same decoding settings it is drawn from the same distribution $p_\theta(Y \mid X)$. It has no internal mechanism to register what happened between those moments. What appears to the user as a dynamic conversation is, from the model's internal perspective, a series of isolated forward passes, each processing a static sequence (the conversation so far, re-supplied as input on every turn) against a fixed set of parameters.
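The contrast between frozen weights and persistent state can be made concrete with a deliberately tiny toy. Under loudly simplifying assumptions, a "model" here is a scalar function: `FrozenModel` stands in for a deployed LLM, and `StatefulModel` is a hypothetical system with a single state variable updated between calls by an illustrative decay rule. Neither corresponds to a real architecture; they only isolate the structural difference for the same input $X$ queried twice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenModel:
    """Stand-in for a deployed LLM: parameters fixed at training time."""
    weights: tuple[float, ...]

    def respond(self, x: float) -> float:
        # Output depends only on the input and the frozen weights.
        return sum(w * x for w in self.weights)


@dataclass
class StatefulModel:
    """Hypothetical system whose internal state persists across episodes."""
    weights: tuple[float, ...]
    state: float = 0.0  # running summary of what happened between calls
    decay: float = 0.5  # illustrative: how quickly older events fade

    def observe(self, event: float) -> None:
        # s_{t+1} = decay * s_t + event: the state changes between inference calls.
        self.state = self.decay * self.state + event

    def respond(self, x: float) -> float:
        return sum(w * x for w in self.weights) + self.state


frozen = FrozenModel(weights=(0.5, 1.5))
stateful = StatefulModel(weights=(0.5, 1.5))

before = (frozen.respond(2.0), stateful.respond(2.0))  # (4.0, 4.0)
stateful.observe(3.0)  # "a week of real-world events"; FrozenModel has no way to receive it
after = (frozen.respond(2.0), stateful.respond(2.0))   # (4.0, 7.0)
```

Keeping `observe` separate from `respond` mirrors the distinction in the text: between the two queries the input is unchanged, and only a system with somewhere to store the intervening events can answer differently.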

This is not a limitation of representation. The model may have read millions of texts describing how to update beliefs in light of new evidence. The limitation is architectural: there is no persistent internal state for those representations to update. The feedback loop between action and consequence—the loop that makes exploration meaningful—is severed at the boundary of each inference call.
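The action-consequence loop can be sketched as a two-armed bandit with delayed rewards. This is a toy of my own, not a proposed algorithm: `DelayedFeedbackAgent`, `run`, the learning rate, and the greedy policy are all hypothetical choices. What it shows is that the agent's value estimates and its queue of unresolved actions must survive from one step to the next; otherwise a reward arriving two steps late would have nothing to attach to.

```python
from collections import deque


class DelayedFeedbackAgent:
    """Hypothetical agent whose state survives between actions and their outcomes."""

    def __init__(self, actions, lr=0.5):
        self.values = {a: 0.0 for a in actions}  # persistent value estimates
        self.pending = deque()                   # actions whose consequences are unseen
        self.lr = lr                             # illustrative learning rate

    def act(self):
        # Greedy for brevity (ties go to the first action); a real explorer
        # would add an uncertainty bonus.
        action = max(self.values, key=self.values.get)
        self.pending.append(action)
        return action

    def receive(self, reward):
        # Credit the oldest unresolved action. This assumes outcomes arrive in
        # the order actions were taken, which holds when the delay is constant.
        action = self.pending.popleft()
        self.values[action] += self.lr * (reward - self.values[action])


def run(agent, reward_of, steps, delay):
    """Toy environment: the reward for the action at step t arrives at t + delay."""
    in_flight = deque()  # (delivery_step, reward)
    log = []
    for t in range(steps):
        action = agent.act()
        log.append(action)
        in_flight.append((t + delay, reward_of[action]))
        while in_flight and in_flight[0][0] <= t:
            agent.receive(in_flight.popleft()[1])
    return log


agent = DelayedFeedbackAgent(["a", "b"])
log = run(agent, {"a": -1.0, "b": 1.0}, steps=8, delay=2)
```

The log comes out as `a, a, a, b, b, b, b, b`. The bad outcome of `a` was fixed the moment it was first chosen, but the agent kept choosing it until that consequence arrived, and it could switch only because its state was still there to receive the news.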