In 2025, the AI industry is abuzz with the concept of the "Year of the Agent," with conferences and media outlets proclaiming that intelligent agents will soon dominate the digital landscape, automating tasks and transforming workflows. However, in a sobering counterpoint, Andrej Karpathy, a leading figure in AI, delivered a thought-provoking speech at the AI Startup School in San Francisco. He cautioned against short-term hype and outlined a systematic vision for AI’s evolution over the next decade, emphasizing the need for a fundamental overhaul of digital infrastructure. This article summarizes the core insights from his talk, exploring the deeper logic behind the agent frenzy and the path forward for AI.
A Paradigm Shift in Software: From Code to Agents
Karpathy began by asserting that the software industry is undergoing its most significant paradigm shift in 70 years. He categorized software development into three distinct eras:
Software 1.0: The Hand-Coded Era
In the first era, humans manually wrote code to instruct computers, from early languages like FORTRAN to modern ones like Python and Java. Programmers crafted precise instructions, resulting in billions of lines of code hosted on platforms like GitHub. While this era produced remarkable achievements, it was limited by human cognitive bottlenecks and the high cost of modifying complex codebases.
Software 2.0: The Data-Driven Era
The rise of deep learning ushered in Software 2.0, where neural networks replaced explicit code with weights trained on vast datasets. For example, image recognition models like AlexNet were not programmed with rules but trained on millions of images to "learn" features. Platforms like Hugging Face serve as the GitHub of this era, hosting pre-trained weights rather than source code (with visualizations such as Model Atlas mapping the model landscape), enabling developers to build on data-driven programs.
Software 3.0: The Natural Language Era
The advent of large language models (LLMs) marks Software 3.0, which Karpathy describes as the most disruptive shift yet. LLMs transform neural networks into general-purpose computers, with natural language as the programming interface. Tasks like sentiment analysis, once requiring custom models or code, can now be performed by prompting an LLM. This "prompt engineering" lowers the programming barrier, making anyone who can communicate a potential "programmer."
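The shift is easy to see in code. Below is a minimal sketch of prompt-as-program sentiment analysis: the "program" is an English prompt and the LLM is the interpreter. The `llm` parameter stands in for any function that sends a prompt to a model and returns its text reply; the stub used here is illustrative, not a real model.

```python
# Software 3.0 in miniature: the classifier is a prompt, not trained weights.
PROMPT_TEMPLATE = (
    "Classify the sentiment of the following review as exactly one word, "
    "'positive' or 'negative'.\n\nReview: {review}\nSentiment:"
)

def classify_sentiment(review: str, llm) -> str:
    """Prompt-based classifier: no labeled dataset, no custom model."""
    reply = llm(PROMPT_TEMPLATE.format(review=review))
    return reply.strip().lower()

# Offline stub standing in for a real model call (e.g. an API client),
# so the sketch runs without network access or an API key.
def fake_llm(prompt: str) -> str:
    return "positive" if "love" in prompt else "negative"

print(classify_sentiment("I love this phone's battery life.", fake_llm))
```

Swapping `fake_llm` for a real API client is the only change needed to make this a working classifier, which is exactly the barrier-lowering Karpathy describes.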
LLMs as Operating Systems
To illustrate the role of LLMs, Karpathy likened them to a new kind of operating system. He expanded on Andrew Ng’s analogy of AI as "the new electricity," suggesting that companies like OpenAI and DeepMind act as "power plants," training massive models in AI data factories and delivering services via APIs. Users demand low-latency, reliable access to these models, much like electricity, with platforms like OpenRouter enabling seamless switching between models. A simultaneous outage of major LLMs would be akin to a global "AI blackout," disrupting the digital world.
At a technical level, LLMs function like a CPU for reasoning, with their context window serving as memory to store task-relevant data. The surrounding ecosystem, like an operating system, orchestrates resources to handle multi-step tasks, such as data analysis. This architecture redefines software development, as LLMs can coordinate tasks that previously required multiple specialized modules.
Karpathy also drew parallels between the current LLM market and the early operating system wars. Closed-source models like GPT-4 and PaLM dominate with technical prowess, while open-source alternatives like Llama drive innovation through collaboration. This dual-track ecosystem balances commercial stability with rapid experimentation, fostering continuous progress.
The Strengths and Limitations of LLMs
Karpathy highlighted the remarkable strengths of LLMs:
- Unparalleled Knowledge Base: Trained on internet-scale text, LLMs encompass nearly all publicly available human knowledge, surpassing any individual scholar.
- Robust Short-Term Memory: Their context windows can hold hundreds of thousands of tokens — on the order of an entire book — within a single interaction.
- Cross-Domain Generalization: LLMs excel in diverse tasks, from coding to creative writing, due to their generalized training.
However, he also emphasized their limitations:
- Hallucinations: LLMs can generate false information, like claiming Einstein won three Nobel Prizes when he only won one.
- Jagged Intelligence: They may excel in complex tasks but fail at simple ones, displaying inconsistent performance.
- Anterograde Amnesia: LLMs reset their context after each interaction, lacking the ability to accumulate experience without external memory tools.
- Security Vulnerabilities: They are susceptible to prompt injection attacks, posing risks in real-world applications.
The Case for Partial Autonomy
Given these limitations, Karpathy advocated for "partially autonomous applications" that prioritize human-AI collaboration over fully automated systems. He cited two examples:
Cursor: A Collaborative Code Editor
Cursor, an AI-powered code editor, exemplifies partial autonomy. It integrates project-wide context into the model, uses multiple models for tasks like code generation and diffing, and provides a visual interface for users to review AI suggestions. A key feature is the "autonomy slider," allowing users to adjust AI’s control—from human-led code completion to AI-driven file edits—balancing efficiency and safety.
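The autonomy-slider idea can be sketched as a simple policy gate. This is a hypothetical illustration, not Cursor's actual implementation: the level names and the three-tier granularity are invented for the example.

```python
from enum import IntEnum

class AutonomyLevel(IntEnum):
    # Illustrative tiers inspired by the "autonomy slider" concept.
    SUGGEST = 1    # AI proposes completions; human applies every change
    EDIT_FILE = 2  # AI may edit the current file; human reviews the diff
    EDIT_REPO = 3  # AI may edit across the repo with minimal oversight

def requires_human_review(level: AutonomyLevel, scope: str) -> bool:
    """Return True if an AI edit of the given scope still needs sign-off.

    scope: 'line', 'file', or 'repo' — how large an edit the AI wants.
    """
    needed = {
        "line": AutonomyLevel.SUGGEST,
        "file": AutonomyLevel.EDIT_FILE,
        "repo": AutonomyLevel.EDIT_REPO,
    }[scope]
    return level < needed
```

The point of the gate is that raising the slider widens what the AI may do unsupervised, while everything beyond the current level still routes through a human — the efficiency-versus-safety trade-off made explicit.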
Perplexity: Enhanced Information Retrieval
Perplexity applies similar principles to search, aggregating data from multiple sources, cross-validating with LLMs, and presenting results with citations. Its interface allows users to verify the reasoning process, supporting both quick searches and in-depth analyses.
These applications emphasize human oversight for decision-making and validation, with AI handling repetitive tasks, creating an efficient collaboration loop. Visual feedback in interfaces leverages human visual processing, enabling millisecond-level error detection, while autonomy sliders address psychological needs for control, fostering trust.
The Infrastructure Bottleneck
Karpathy shared insights from developing MenuGen, a web app he built using LLMs. While LLMs enabled rapid prototyping — a working demo in hours — deployment took a week due to DevOps challenges like authentication, payment integration, and cloud configuration. This contrast highlights a critical issue: current digital infrastructure is designed for humans (via GUIs) or traditional programs (via APIs), not AI agents.
AI agents struggle with human-centric interfaces, like web forms, which require parsing visual elements and simulating clicks—an inefficient and error-prone process. This "last-mile" barrier hinders AI innovation.
A Systemic Solution: AI-Friendly Infrastructure
To address this, Karpathy proposed a "two-way" approach, where humans adapt infrastructure to better suit AI, rather than forcing AI to navigate human systems. His solutions include:
- LM.txt Files: Similar to robots.txt, these files would use Markdown to describe a website's functionality, APIs, and data structures (e.g., "/api/weather?city=[city]"). This machine-readable format would drastically improve AI interaction efficiency compared to parsing complex HTML.
- Dual-Mode Documentation: Documentation should cater to both humans and AI, combining human-readable instructions with structured API calls or command-line invocations. Companies like Vercel and Stripe are already optimizing docs for LLMs, enhancing both AI and human developer efficiency.
- Bridge Tools: Tools that convert human-centric data (e.g., GitHub pages or Excel tables) into AI-friendly formats (e.g., plain text or JSON) can bridge the gap without requiring extensive system overhauls.
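A minimal bridge tool of this kind might convert a spreadsheet export into JSON that an agent can consume directly. The sketch below uses only the Python standard library; the column names and data are made up for illustration.

```python
import csv
import io
import json

def csv_to_json(csv_text: str) -> str:
    """Convert a human-oriented spreadsheet export (CSV) into JSON records
    an agent can parse directly, instead of simulating clicks in a GUI."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)

# Hypothetical spreadsheet export.
table = "city,temp_c\nParis,18\nTokyo,22\n"
print(csv_to_json(table))
```

Real bridge tools would handle messier inputs (merged cells, rendered HTML pages), but the principle is the same: translate once at the boundary so the agent never has to parse a human interface.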
Karpathy argued that expecting multimodal models to mimic human clicks is inefficient, like teaching humans to type with their feet. Instead, humans should meet AI halfway by providing machine-readable interfaces, structured docs, and translation tools.
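In practice, an LM.txt file of the kind described might look like the following. This is a hypothetical sketch; the site, endpoints, and field names are invented for illustration.

```markdown
# Example Weather Site — LM.txt (hypothetical)

## What this site does
Current weather and 7-day forecasts for cities worldwide.

## API
- GET /api/weather?city=[city] — current conditions as JSON
  (fields: temp_c, humidity, conditions)
- GET /api/forecast?city=[city]&days=[n] — forecast as JSON

## Notes for agents
Prefer the API endpoints above to parsing the HTML pages.
```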
A Decade-Long Journey
Drawing from his years leading autonomous driving work at Tesla — and recalling a near-perfect Waymo demo ride he took in 2013 — Karpathy warned of the "reliability gap" between technical demos and production-ready systems. Despite that flawless demo, fully autonomous vehicles remain elusive in 2025. Similarly, the hype around 2025 as the "Year of the Agent" risks overlooking infrastructure gaps and pushing premature full automation.
He likened current AI strategies to Iron Man’s suit, blending human-controlled augmentation with semi-autonomous features. Partial autonomy, supported by intuitive interfaces, maximizes LLM strengths while mitigating weaknesses, offering a pragmatic path forward.
A Universal Revolution
Unlike past technological shifts driven by governments or corporations, the AI revolution is uniquely democratic. LLMs, accessible via the internet, empower billions to participate in programming through prompt engineering. Entrepreneurs can build on this infrastructure, making AI a global, participatory revolution.
Karpathy concluded by urging developers to embrace LLMs’ potential while remaining grounded. The path to AI’s future lies not in chasing hype but in building reliable, collaborative ecosystems. This steady approach, though less flashy, is the most dependable route to realizing AI’s transformative promise.