In November 1973, Ken Thompson sat down at Bell Labs and solved a colleague's problem overnight. The colleague, Lee McMahon, needed to search the Federalist Papers for linguistic patterns — part of the long-running scholarly effort to determine who wrote the disputed essays. McMahon had been using the ed text editor, but ed was interactive. It choked on large files. It was not built for batch search across documents.
Thompson ripped the regular expression engine out of ed, wrapped it in a standalone program, and handed McMahon a new tool before breakfast. He named it after the ed command it replaced: g/re/p — global regular expression print.
Fifty-three years later, that overnight hack runs on every Linux server, every Mac, every Android phone, every supercomputer in the TOP500, and — most recently — inside the AI coding agents rewriting how software gets built. If Thompson had charged a royalty, he could be the richest person alive.
Ken Thompson's resume reads like a fabrication. He co-created UNIX (1969). He designed the B language, which Dennis Ritchie evolved into C. He published the foundational algorithm for compiling regular expressions into finite automata (1968 CACM paper). He co-designed UTF-8 on a placemat at a New Jersey diner with Rob Pike — then implemented it the next day. He co-created Go at Google. He won the Turing Award in 1983.
grep is not even in his top five contributions. He wrote it between dinner and dawn.
But grep may be the single most used piece of software Thompson ever produced. Not the most important — UNIX wins that — but the most frequently invoked. Every time a developer searches a codebase, every time a sysadmin filters a log, every time an AI agent scans a repository to understand its structure, a descendant of Thompson's overnight hack does the work.
The name encodes the entire specification. In ed, the command g/re/p meant: scan every line globally, match a regular expression, and print the results. Thompson extracted that pipeline into a command-line tool that accepted files as arguments and wrote matches to standard output.
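The lineage is still visible in any modern shell. A minimal sketch, with a hypothetical sample file standing in for McMahon's corpus:

```shell
# A tiny stand-in corpus (contents are hypothetical)
printf 'publius wrote this\nhamilton wrote that\npublius again\n' > sample.txt

# What ed expressed interactively as  g/publius/p  — globally match the
# regular expression, print matching lines — grep does as a standalone filter:
grep 'publius' sample.txt
# prints the two lines containing "publius"
```

The same scan-match-print pipeline, lifted out of the editor and pointed at files.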
The design was pure UNIX philosophy — the philosophy Doug McIlroy articulated as "write programs that do one thing and do it well." grep does exactly one thing. It filters. Data flows in, matching lines flow out. Compose it with pipes and you have an analytical engine:
```shell
cat access.log | grep "500" | grep -v "health" | sort | uniq -c | sort -rn
```
That pipeline — grep as a filter stage between other small tools — became the foundational paradigm of UNIX computing. Every log analysis workflow, every deployment script, every CI/CD pipeline that has ever searched for an error pattern owes its structure to this idea.
grep did not stay alone. Alfred Aho — another Bell Labs giant, co-author of the Dragon Book, later a Turing Award recipient himself — wrote egrep and fgrep in 1979. egrep added extended regular expressions. fgrep used the Aho-Corasick algorithm for multi-pattern matching at blistering speed.
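Both variants survive today, folded into grep itself as the standard `-E` and `-F` flags. A quick sketch against a hypothetical log file:

```shell
# Hypothetical log lines
printf 'error: disk full\nwarn: a.b seen\nerror: oom\n' > log.txt

# egrep's extended syntax lives on as grep -E: alternation without backslashes
grep -E 'disk|oom' log.txt
# matches both "error" lines

# fgrep's fixed-string mode lives on as grep -F: the pattern "a.b" matches
# literally, so the dot is not a metacharacter
grep -F 'a.b' log.txt
# matches only the "warn" line
```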
GNU grep (1988) unified the family. Mike Haertel's implementation introduced Boyer-Moore string searching — a trick that lets grep skip bytes it doesn't need to examine. His 2010 email "why GNU grep is fast" is a masterclass in systems engineering: mmap the file, search for the rarest character in the pattern first, avoid looking at most of the input.
Then came the programmer-focused era: ack (2005), The Silver Searcher (2011), and finally ripgrep (2016). Andrew Gallant wrote ripgrep in Rust. It respects .gitignore automatically, supports Unicode by default, parallelizes directory traversal, and uses SIMD-accelerated matching. It returned to Thompson's original NFA approach — guaranteeing linear-time matching with no catastrophic backtracking. The 1968 algorithm turned out to be better than its "modern" replacements.
ripgrep is now the default search engine inside Visual Studio Code. Over 15 million developers use it every time they press Ctrl+Shift+F without knowing it.
An AI coding agent without grep is a reader in a library with no card catalog. It can read any book — but it does not know which shelf to walk to.
Claude Code runs locally on a developer's machine, inside the terminal, with direct access to the filesystem. When a user says "fix the authentication bug," the agent faces an immediate problem: a typical codebase has thousands of files. The context window cannot hold all of them. The agent must search selectively — find the relevant files, read only what matters, and understand the structure before touching anything.
grep (specifically ripgrep, running under the hood) solves this. Claude Code's Grep tool is a first-class capability, elevated from shell command to core agent infrastructure. The system instructions are explicit: "ALWAYS use Grep for search tasks. NEVER invoke grep or rg as a Bash command." The tool is not an afterthought bolted onto a chat interface. It is the primary mechanism by which the agent comprehends code.
A single coding task might involve dozens of searches: find where a function is defined, find every file that imports it, check which tests cover it, verify that no other code depends on the behavior being changed, confirm that the fix is consistent across the codebase. Each search completes in milliseconds. The agent reads the results, forms a plan, and acts. Without that search loop, the agent would be guessing.
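That loop can be mimicked by hand. A sketch with hypothetical file and function names, not taken from any real agent transcript:

```shell
# Hypothetical miniature repository
mkdir -p src tests
printf 'def authenticate(user):\n    return user.is_valid\n' > src/auth.py
printf 'from src.auth import authenticate\n' > tests/test_auth.py

# Where is the function defined? (-r recurse, -n show line numbers)
grep -rn 'def authenticate' src/

# Which files reference it? (-l list filenames only)
grep -rl 'authenticate' tests/
```

Each query answers one structural question in milliseconds; the agent's advantage is only that it runs the loop tirelessly and reads every result.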
This pattern is not unique to Claude Code. Cursor, GitHub Copilot, Windsurf, Cline, Aider — every AI coding tool that operates on real codebases uses some form of pattern matching across files as a foundational capability. The architecture is universal because the problem is universal: AI agents need to search before they can understand, and they need to understand before they can modify.
Andrew Gallant's ripgrep has become critical infrastructure not just for human developers but for AI systems operating on code. The performance characteristics matter at AI scale. An agent running 50 searches per task across a repository with tens of thousands of files needs each search to complete in single-digit milliseconds. ripgrep delivers that. Thompson's NFA algorithm, channeled through Gallant's Rust implementation, powers the perception layer of modern AI coding.
Thompson never charged a royalty. grep was created at Bell Labs, AT&T owned the intellectual property, and the UNIX philosophy of freely shared tools enabled the ubiquity that makes grep valuable. The tool's worth comes from its universality, and its universality comes from its being free. Charge for it and it never spreads. Never charge and it becomes indispensable.
But indulge the thought experiment.
Over 600 million Linux servers run worldwide. Every Mac ships with grep. Every Android device — 3.5 billion of them — has access to grep utilities. The TOP500 supercomputers run Linux. CI/CD systems invoke grep in millions of pipeline runs per hour. VS Code's 15 million users trigger ripgrep searches constantly. AI coding agents add tens of millions of additional invocations daily.
A conservative estimate: grep and its direct descendants execute several billion invocations per day across all platforms. At $0.001 per invocation — one-tenth of a penny, less than a single API call to any cloud service — the annual royalty stream exceeds $1 billion. At $0.01, which is still cheap for a tool this essential, you approach $10 billion annually.
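The arithmetic is easy to check. A back-of-envelope sketch, assuming three billion invocations a day:

```shell
# 3e9 invocations/day x 365 days x $0.001/invocation ~= $1.1 billion a year
awk 'BEGIN { printf "%.2e dollars/year\n", 3e9 * 365 * 0.001 }'

# at a penny per invocation the stream is ten times larger
awk 'BEGIN { printf "%.2e dollars/year\n", 3e9 * 365 * 0.01 }'
```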
Capitalize that revenue stream at a modest multiple and the enterprise value of "grep, Inc." lands somewhere between Google and the GDP of New Zealand. For an overnight hack built to search the Federalist Papers.
The best detail in Thompson's career is the pattern: consequential work done casually, at speed, outside business hours.
UNIX began as a three-week project on a PDP-7 while Thompson's wife was out of town visiting relatives. UTF-8 was designed on a placemat at a New Jersey diner, implemented the next day. grep was extracted from ed overnight to help a colleague with a literary analysis problem.
Thompson and Pike designed UTF-8 so that ASCII characters — including the newline that grep uses as a line delimiter — retained their single-byte encoding. Existing tools, including grep, would "just work" on UTF-8 text. The backward compatibility was deliberate. Thompson was protecting his own overnight hack, among other things.
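The effect is easy to demonstrate. A sketch, relying only on the fact that ASCII bytes (including the newline, 0x0A) never appear inside a UTF-8 multi-byte sequence:

```shell
# "héllo" contains a two-byte UTF-8 character (é = 0xC3 0xA9, octal 303 251),
# but the newline and the ASCII letters keep their one-byte encodings
printf 'h\303\251llo\nworld\n' > utf8.txt

# grep still splits on the same newline byte and matches ASCII patterns,
# exactly as it did on 7-bit text in 1973
grep 'world' utf8.txt
# prints: world
```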
The throughline is a style of engineering that barely exists anymore: people who understood the full stack from hardware to application, who could produce correct systems code in hours rather than sprints, and who gave their work away because sharing it was the point. Bell Labs in the 1970s concentrated more consequential systems builders per square foot than any institution before or since. Thompson was first among them.
Every AI coding agent — Claude Code included — owes a direct, unbroken debt to Ken Thompson's overnight hack. Not a metaphorical debt. A literal one. The code path from a user typing "find the bug" to an agent searching the repository runs through the same NFA algorithm Thompson published in 1968. Gallant's ripgrep is a modern implementation of Thompson's construction, optimized for modern hardware but faithful to the original theory.
The tool that helped Lee McMahon search the Federalist Papers now helps AI agents search repositories with millions of lines of code. The problem — "find the pattern in the text" — has not changed. The scale has. The speed has. The consumer has shifted from a human at a terminal to an AI agent executing a plan. But the operation is the same one Thompson built in a single night at Bell Labs: g/re/p.
Born in '73. Still carrying the team.