Karpathy's AutoResearch Passes 60,000 GitHub Stars as the 630-Line Script That Runs AI Experiments Overnight Reshapes How Labs Approach Optimization
A 630-line Python script by Andrej Karpathy that lets AI agents autonomously run and iterate on ML experiments has become one of the fastest-growing open source projects of 2026, with Shopify's CEO among early adopters reporting double-digit performance gains.
Overview
Andrej Karpathy, the former Tesla AI lead and OpenAI co-founder, released a deceptively simple open source project in early March that has rapidly become one of the most-starred repositories of 2026. Called AutoResearch, the tool is a 630-line Python script that lets AI coding agents autonomously conduct machine learning experiments while researchers sleep, according to VentureBeat. As of late March, the project has surpassed 62,000 GitHub stars and 8,800 forks, making it one of the fastest-growing open source releases of the year.
How It Works
The system operates as an autonomous optimization loop. An AI agent is given a training script and a fixed compute budget of five minutes per experiment on a single GPU. The agent reads the source code, forms a hypothesis for improvement, modifies the training code, runs the experiment, and evaluates the results before beginning the next cycle, as described in the project’s GitHub repository.
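The cycle described above amounts to a greedy hill-climb on the evaluation metric. The sketch below illustrates that structure; all names here (research_loop, StubAgent, the patch objects) are hypothetical and do not come from the AutoResearch source.

```python
# Illustrative sketch of the experiment loop; names and structure are
# assumptions, not taken from the AutoResearch script itself.

def research_loop(agent, run_experiment, n_cycles):
    """Greedy hill-climb on val_bpb: keep each change only if it
    improves on the best score seen so far."""
    best_bpb = run_experiment()              # baseline run of the training code
    history = []
    for _ in range(n_cycles):
        patch = agent.propose(history)       # read code, form a hypothesis
        agent.apply(patch)                   # modify the training code
        bpb = run_experiment()               # fixed compute budget per run
        improved = bpb is not None and bpb < best_bpb
        if improved:
            best_bpb = bpb                   # keep the winning change
        else:
            agent.revert(patch)              # discard the failed hypothesis
        history.append((patch, bpb, improved))
    return best_bpb, history


# Tiny stand-in agent for demonstration; the real loop drives an LLM
# coding agent that edits train.py directly.
class StubAgent:
    def __init__(self):
        self.patches = []

    def propose(self, history):
        return f"patch-{len(history)}"

    def apply(self, patch):
        self.patches.append(patch)

    def revert(self, patch):
        self.patches.remove(patch)
```

The revert-on-failure step is what lets failed hypotheses accumulate as history for the agent without degrading the training code.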
The core design centers on three files: prepare.py, which contains immutable data-loading utilities; train.py, which the agent is free to modify across experiments; and program.md, which holds human-authored instructions that guide the agent’s research direction. The primary evaluation metric is validation bits per byte (val_bpb), providing a single, clear optimization target.
At roughly 12 experiments per hour, a single overnight session can yield approximately 100 completed runs without any human intervention, according to VentureBeat.
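The throughput figure follows directly from the fixed budget. A back-of-envelope check (the eight-hour overnight window is an assumption for illustration, not a figure from the reporting):

```python
# Back-of-envelope check of the cited overnight throughput.
budget_minutes = 5
runs_per_hour = 60 // budget_minutes      # 12 back-to-back experiments
overnight_hours = 8                       # assumed overnight window
total_runs = runs_per_hour * overnight_hours
print(runs_per_hour, total_runs)          # 12 96 -- roughly the ~100 cited
```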
What We Know
Karpathy’s own two-day run of the system processed approximately 700 autonomous changes, discovered around 20 additive optimization techniques, and achieved an 11 percent efficiency gain when those improvements were applied to a larger model, according to Fortune. The stacked optimizations reduced a key benchmark, dubbed “Time to GPT-2,” from 2.02 hours to 1.80 hours.
The concept quickly attracted industry attention. Shopify CEO Tobias Lütke reported running 37 experiments overnight using the same approach and achieving a 19 percent performance gain on an internal model, according to Fortune. Industry analyst Janakiram MSV coined the term “the Karpathy Loop” to describe the three-component pattern: an agent with access to modifiable files, a measurable optimization metric, and fixed time limits per experiment.
Karpathy’s initial announcement post garnered more than 8.6 million views within two days, according to Fortune, and the project’s MIT license has encouraged a wave of community forks targeting macOS, Windows, and AMD GPU platforms.
What We Don’t Know
The current system operates on a simplified training codebase spanning just 630 lines of Python, which is orders of magnitude simpler than production AI training infrastructure. Whether the approach scales to the complexity of frontier model development remains untested.
Karpathy described the methodology as “the final boss battle” for frontier AI labs, envisioning multiple collaborative agents exploring different optimization pathways simultaneously rather than single-agent linear progression, according to Fortune. However, experts have noted that the system optimizes separate, smaller models rather than improving itself, placing it firmly outside the realm of recursive self-improvement scenarios that have drawn concern from AI safety researchers.
It also remains unclear how effectively the optimizations discovered on small models will transfer to the much larger architectures used by companies like OpenAI, Anthropic, and Google DeepMind. The 11 percent efficiency gain demonstrated by Karpathy is promising but was measured on relatively small-scale training tasks.
Analysis
AutoResearch represents a practical instantiation of a trend that has been building throughout 2025 and into 2026: the use of AI agents not just to write code, but to conduct the experimental science that drives AI progress itself. The project’s explosive popularity suggests significant latent demand for tools that automate the trial-and-error grind of hyperparameter tuning, architecture search, and training optimization.
The fixed five-minute compute budget per experiment is a particularly elegant design choice, making results comparable across different hardware configurations while keeping the barrier to entry at a single NVIDIA GPU. The MIT license and minimal codebase have further lowered the threshold for adoption, turning what might have been a research curiosity into a widely accessible tool.
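Enforcing a wall-clock budget like this is mechanically simple. A minimal sketch using Python's standard subprocess timeout; the helper name and return convention are assumptions, not AutoResearch's actual API:

```python
import subprocess

BUDGET_SECONDS = 5 * 60  # wall-clock budget, the same on any GPU

def timed_run(cmd, budget=BUDGET_SECONDS):
    """Run one experiment command, killing it if it exceeds the budget.
    Returns captured stdout on completion, or None on timeout."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=budget)
        return proc.stdout
    except subprocess.TimeoutExpired:
        return None  # an over-budget run simply counts as a failed experiment
```

Because the cutoff is wall-clock time rather than a step count, faster hardware fits more optimization steps inside the same budget, which is what keeps runs comparable as a benchmark.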
Whether the Karpathy Loop evolves from a community phenomenon into a standard practice at major AI labs will depend on whether the approach can be extended beyond toy-scale training runs. For now, it has demonstrated that autonomous AI agents can conduct meaningful optimization work overnight, a capability that could accelerate the pace of machine learning research across the industry.