Age of Empires II’s Goat AI Exposes Problem With Calling LLMs Human-Like

A provocative new research paper argues that if we are willing to attribute human-like qualities to large language models (LLMs), then the scripted artificial intelligence that herds goats in Microsoft’s Age of Empires II deserves the same recognition. The paper, titled “If LLMs Have Human-Like Attributes, Then So Does Age of Empires II,” by AI researcher Adrian de Wynter, was published in late May 2026 and immediately sparked debate in AI ethics and gaming communities alike.

The core of de Wynter’s argument is a satirical critique of the growing tendency to describe LLMs as possessing human-level understanding, consciousness, or intentionality. By pointing to the decades-old real-time strategy game—where non-player characters follow basic, predetermined scripts—he highlights how easily we project human traits onto systems that merely simulate surface-level behaviors. The paper does not deny the impressive capabilities of modern AI but questions the logical consistency of the language used to describe them.

The Research That Turned a Game Into a Mirror

Adrian de Wynter’s paper, which appeared on arXiv on May 27, 2026, uses Age of Empires II not merely as an analogy but as a direct comparative testbed. The game, first released by Ensemble Studios in 1999 and later remastered as Age of Empires II: Definitive Edition in 2019, features AI opponents that gather resources, build armies, and, crucially, corral goats and sheep toward their town centers. De Wynter examines the behavior of the game’s AI, specifically the villager tasked with herding livestock, and labels it the “Goat Computer.”

The paper meticulously documents the Goat Computer’s actions: it moves toward scattered goats, guides them back to base, and repeats the process. If a predator approaches, the unit flees. This rule-based system, de Wynter notes, exhibits behaviors that many observers might describe as “intelligent” or even “purposeful.” Yet no one would seriously claim the Goat Computer possesses a mind. It is, after all, just a few dozen lines of script written in the game’s proprietary AI language.

De Wynter then draws a parallel to LLMs. When a chatbot completes a sentence eloquently, recalls a fact, or seems to express empathy, engineers and media outlets frequently use human-centric language: “understands,” “knows,” “thinks,” “reasons.” But if we apply the same linguistic standards to the Goat Computer, we should also say it “decides” to herd goats or “plans” its escape from wolves. The absurdity of that conclusion, the paper contends, exposes a double standard in AI discourse.

Unpacking the “Goat Computer” Analogy

The “Goat Computer” term is deliberately playful but methodically employed. In Age of Empires II, goat herding is a simple task that early-game AI players execute with robotic precision. Players who have spent hundreds of hours with the game know the AI’s pattern: a villager walks to the nearest goat, targets it (the in-game equivalent of a click), and walks back to the town center. The goat follows. To a newcomer, this might look like deliberate planning. But the underlying code has no model of the world and no concept of hunger or economy; it simply steps through a finite state machine.
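The finite state machine described above can be sketched in a few lines. This is a hypothetical illustration, not the game's actual script: the state names, the `step` and `run` functions, and the three boolean inputs are all assumptions made for the example.

```python
from enum import Enum, auto


class State(Enum):
    SEEK_GOAT = auto()
    RETURN_TO_BASE = auto()
    FLEE = auto()


def step(state: State, at_goat: bool, at_base: bool, predator_near: bool) -> State:
    """Return the next state given the current state and three sensor flags."""
    if predator_near:
        # A nearby predator overrides everything else.
        return State.FLEE
    if state is State.FLEE:
        # Predator gone: go back to looking for a goat.
        return State.SEEK_GOAT
    if state is State.SEEK_GOAT and at_goat:
        return State.RETURN_TO_BASE
    if state is State.RETURN_TO_BASE and at_base:
        return State.SEEK_GOAT
    # No transition fires: keep doing the same thing.
    return state


def run(inputs):
    """Feed a sequence of (at_goat, at_base, predator_near) tuples through the machine."""
    state = State.SEEK_GOAT
    trace = []
    for at_goat, at_base, predator_near in inputs:
        state = step(state, at_goat, at_base, predator_near)
        trace.append(state.name)
    return trace
```

Nothing in the sketch represents hunger, food, or an economy. The whole "herder" is a lookup from the current state and a few flags to the next state, which is the point the paper makes about behavior that looks purposeful from the outside.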

De Wynter’s paper includes code snippets from the game’s original AI scripts (accessible via the modding community) to demonstrate the simplicity. He compares this to a simplified transformer network—the architecture behind LLMs—and shows that both operate on token-prediction or action-selection loops without internal experiential states. The key difference, he argues, is one of complexity, not fundamental category.

Yet this complexity gap is precisely why many researchers push back. An LLM with billions of parameters trained on internet-scale data produces far richer and more contextually adaptive output than a goat-herding script. De Wynter acknowledges this but insists that the leap from “complex pattern-matching” to “human-like attributes” is a philosophical one not justified by the engineering. He invokes the history of AI, from the ELIZA chatbot in the 1960s to modern voice assistants, noting that humans have a persistent tendency to anthropomorphize even the simplest automata—a phenomenon known as the ELIZA effect.

Anthropomorphism’s Long Shadow Over AI

Anthropomorphism in AI is not new. In 1966, Joseph Weizenbaum’s ELIZA program, mimicking a Rogerian psychotherapist, convinced many users that the machine understood their deepest feelings, despite consisting of simple pattern-matching scripts. Fast-forward to 2022, when a Google engineer famously claimed the LaMDA model was sentient. De Wynter’s paper places this historical context front and center, arguing that the language we use to describe AI systems has concrete consequences, from public misunderstanding to misguided regulation.

The paper cites several contemporary examples: news headlines declaring LLMs “empathetic,” companies framing their chatbots as “colleagues,” and research papers that benchmark models on “human values alignment” as if the models possess moral agency. Each time, de Wynter points to the Goat Computer as a reductio ad absurdum: if we accept these characterizations without clear, falsifiable criteria for what constitutes “understanding,” we must also extend them to the simplest game AI.

Critics of the paper have responded that LLMs, unlike game scripts, are capable of producing novel, context-relevant outputs that were not explicitly programmed. De Wynter preempts this in his work, noting that novelty alone does not indicate understanding; a random number generator can produce novel outcomes. He calls for more rigorous, operational definitions of terms like “reasoning,” “planning,” and “intelligence” in AI evaluation.

Why Age of Empires II Is the Perfect Testbed

Age of Empires II was not chosen at random. The game remains one of the most enduring titles in Microsoft’s catalog, with an active competitive scene and a robust modding community that has dissected its AI for decades. Its AI is well understood, deterministic, and entirely open to inspection, a stark contrast to the black-box nature of commercial LLMs.

De Wynter, an avid gamer himself, explains in the paper that the game’s AI illustrates a classic “behavioral equivalence” fallacy. Two systems can produce the same observable behavior through entirely different internal mechanisms. The Goat Computer herds goats because a series of if-then rules dictate it; an LLM generates a paragraph about herding goats because statistical correlations in its training data point to likely next tokens. Neither “wants” to herd goats, and neither “cares” about the goats, despite observable behaviors that might suggest otherwise.
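A toy sketch can make the behavioral-equivalence point concrete. The two "herders" below are hypothetical and do not come from the paper or the game: one uses hand-written if-then rules, the other picks whichever action most often followed an observation in a small invented log. The observation strings, action strings, and `LOG` data are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def rule_based_herder(observation: str) -> str:
    """Hand-written if-then rules."""
    if observation == "wolf nearby":
        return "flee"
    if observation == "goat far":
        return "walk to goat"
    return "walk to base"


class FrequencyHerder:
    """Chooses the action that most often followed an observation in logged data."""

    def __init__(self, log):
        self.counts = defaultdict(Counter)
        for observation, action in log:
            self.counts[observation][action] += 1

    def act(self, observation: str) -> str:
        seen = self.counts.get(observation)
        if not seen:
            # Unseen observation: fall back to a fixed default action.
            return "walk to base"
        return seen.most_common(1)[0][0]


# Invented log of (observation, action) pairs.
LOG = [
    ("wolf nearby", "flee"),
    ("wolf nearby", "flee"),
    ("wolf nearby", "walk to base"),
    ("goat far", "walk to goat"),
    ("goat near", "walk to base"),
]

frequency_herder = FrequencyHerder(LOG)
```

On the three observations in the log, both herders return the same actions, so an observer watching only the outputs could not tell them apart. One is a chain of conditionals and the other is a table of counts, and neither contains anything that could be called wanting or caring.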

The paper even includes a table comparing specific attributes often cited as evidence of “human-like” AI—such as goal-directedness, adaptability, and error recovery—and shows how the Goat Computer exhibits rudimentary versions of each. For instance, if a goat is stuck behind a tree, the game AI’s pathfinding algorithm finds a route, which an observer might call “problem-solving.” But this is no different, de Wynter argues, from an LLM adjusting its tone when a user expresses dissatisfaction; both are executing pre-designed routines triggered by input.
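The stuck-goat case can be sketched with a textbook search. The article does not say which pathfinding algorithm the game uses, so the breadth-first search below is a stand-in, and the grid, the `shortest_path` function, and the cell symbols are all assumptions for the example.

```python
from collections import deque


def shortest_path(grid, start, goal):
    """Breadth-first search on a grid of strings; '#' cells are blocked.

    Returns the path as a list of (row, col) cells, or None if the goal
    cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    came_from = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in came_from:
                came_from[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


# 'V' marks the villager, 'G' the goat, '#' a column of trees between them.
GRID = [
    "V#G",
    ".#.",
    "...",
]

path = shortest_path(GRID, (0, 0), (0, 2))
```

The search walks the villager down, around the bottom of the trees, and back up to the goat. An observer might call that problem-solving, but the routine only expands neighboring cells until it reaches the target.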

Microsoft’s Dual Role: Gaming Giant and AI Leader

The paper lands at a time when Microsoft is aggressively promoting AI across its product lines, from Windows Copilot to Copilot for Gaming, an AI assistant announced in March 2025 designed to help players with strategies, builds, and social interactions. While the Goat Computer hails from a game nearly three decades old, it inadvertently casts a shadow on these newer AI integrations.

Microsoft’s own researchers have contributed to LLM development and the ethical debates surrounding anthropomorphism. In 2023 and 2024, Microsoft published guidelines urging responsible AI communication, warning against language that implies consciousness or volition. Yet marketing materials for Copilot often blur these lines, describing the AI as a “copilot” that “works alongside you” and “understands your intent.” De Wynter’s paper, though not directly targeting Microsoft, provides a timely reminder of the tension between product promotion and technical accuracy.

Age of Empires II: Definitive Edition is still sold on the Microsoft Store and included in Xbox Game Pass, making it readily accessible. The game’s enduring popularity means the Goat Computer analogy resonates with a large, technically literate audience. Within hours of the paper’s publication, community forums lit up with memes comparing the Goat Computer to various LLMs, and modders began creating custom AI scripts named after de Wynter’s concept.

What the AI Community Is Saying

Reactions from the AI research community have been mixed but vigorous. Some hail the paper as a necessary corrective to AI hype, while others dismiss it as a rhetorical trick that ignores genuine advances in model interpretability and reasoning benchmarks.

Dr. Elena Morris, a cognitive scientist specializing in human-AI interaction, commented on social media: “De Wynter’s paper is a valuable philosophical gut check. We keep moving the goalposts for what counts as ‘intelligent,’ but the underlying critique—that we lack a shared framework—is spot on.” Others, like machine learning engineer Raj Patel, argued that the analogy breaks down because LLMs exhibit emergent capabilities not present in scripted game AI. “The Goat Computer can’t write a sonnet or debug code. LLMs can. That’s a categorical difference,” Patel wrote.

De Wynter addresses emergent capabilities in the paper, pointing out that many so-called emergent behaviors in LLMs are statistical artifacts that disappear under different evaluation methods. He references recent studies showing that tasks believed to require reasoning can be explained by sophisticated pattern-matching. The debate is far from settled, but the paper has succeeded in bringing a classic gaming touchstone into the heart of AI philosophy.
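A small arithmetic sketch shows one way an evaluation method can manufacture an apparent jump. It is an illustration under simplifying assumptions, not an analysis taken from the paper: it assumes each token of an answer is correct independently with the same probability, and the function name and the answer length of 10 are hypothetical.

```python
def exact_match_rate(per_token_accuracy: float, answer_length: int) -> float:
    """Probability that every token is right, assuming independent tokens."""
    return per_token_accuracy ** answer_length


# Per-token accuracy improves in steady steps...
per_token_scores = [0.5, 0.6, 0.7, 0.8, 0.9, 0.99]

# ...but an all-or-nothing metric on 10-token answers stays near zero
# for most of that range and then climbs steeply.
exact_match_scores = [exact_match_rate(p, 10) for p in per_token_scores]

for p, em in zip(per_token_scores, exact_match_scores):
    print(f"per-token {p:.2f} -> exact match {em:.4f}")
```

Scored per token, the hypothetical model improves gradually. Scored by exact match, the same model appears to acquire the ability abruptly, even though nothing about the underlying system changed between the two readings.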

Beyond Satire: Implications for AI Policy and Development

While the paper uses humor and satire, its implications are serious. If the AI industry continues to market and perceive LLMs as having human-like attributes, it could skew regulatory efforts, user trust, and safety assessments. For example, the European Union’s AI Act categorizes systems based on risk, but definitions of “autonomy” and “decision-making” often rely on the same anthropomorphic shortcuts de Wynter critiques.

Developers, too, might be misled. A programmer who believes an LLM “understands” a task may rely on it beyond its actual competency, leading to system failures in critical applications. De Wynter advocates for a shift toward “behavioral description without mental state attribution”—describing what a system does rather than what it “thinks” or “intends.” This aligns with a growing movement in AI transparency that pushes for mechanistic interpretability and rigorous evaluation without metaphorical language.

The paper also touches on the environmental and economic costs of LLMs versus the lightweight Goat Computer. The computing power required to train and run a state-of-the-art LLM is orders of magnitude greater than running a 1999 game AI, yet the gap in genuine “understanding” remains unproven. This, de Wynter suggests, should give the industry pause before rushing toward ever-larger models based solely on the appearance of intelligence.

The Bigger Picture: Understanding vs. Performance

At its core, de Wynter’s paper is not anti-AI. It is a call to arms for clearer language and more rigorous thinking. The Goat Computer is a mirror held up to the AI field, asking it to examine the assumptions embedded in everyday vocabulary. As Windows enthusiasts and gamers know well, impressive performance does not always indicate deep competence; a game can be beaten by an AI that has no idea what a “civilization” is.

Age of Empires II remains a testament to emergent gameplay within bounded rules, something that AI researchers still strive to achieve in virtual environments. Perhaps the most telling moment in the paper comes when de Wynter recounts watching the Goat Computer fail: a villager getting stuck on terrain while a wolf approaches, then abruptly correcting its path at the last second. To a human observer, it looks like a moment of fear and quick thinking. In reality, it is a pathfinding algorithm recalculating a route. “We are too willing,” de Wynter writes, “to mistake the act of watching a good performance for the presence of a performer.”

The paper ends not with a dismissal of LLMs but with a challenge: define what you mean by “human-like,” and then let the Goat Computer apply for membership. Until then, treat your language with the same care you treat your code.