AI utility: a quest for clarity
Toward an effective framework for leveraging LLMs in software engineering
Published on 21 February 2026
Last updated on 21 February 2026

Ever since agentic coding became all the rage in online and offline software engineering circles, I have felt what may be termed "vibe coder's impostor syndrome"—a worry that I am holding it wrong. The following is an attempt to overcome this impasse through a characterization of a class of problems that may be reliably delegated to LLM agents.

Intelligence & rigor

We naturally feel that a person's intelligence and rigor generally go hand in hand—this is not the case for LLMs; they are irredeemably untrustworthy oracles.

By rigor I mean the discipline required to steer clear of security vulnerabilities, performance degradation, broken features, and so on. Intelligence on the other hand is a measure of knowledge and analytical skill.

If an engineer's labor requires more rigor than what is achievable by LLMs—and that is not a difficult threshold to cross—then the engineer is not replaceable. Still, access to LLMs' immense knowledge and poor—but highly scalable—analytical skill may make one more efficient and more rigorous.

But how does one leverage LLMs effectively in a rigorous domain?

Existential statements vs. universal statements

Efficient methods of solution verification make LLMs (cost-)effective. Needless to say, if verifying LLM output requires more effort than producing it manually, then burning tokens is not worthwhile.

Universal statements in logic assert properties about arbitrary sets of objects, e.g. "the square of every real number is non-negative". To prove such a statement, one needs to carefully construct a sound step-by-step proof—the more complex the logic of the proof, the harder it is to verify it. Errors in some mathematical proofs have only been spotted decades after the fact. For a software engineer, a correct program must compute the desired data or side effects given any possible input—this is a universal statement as well.
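The universal statement above can be stated and machine-checked in a proof assistant. The following is a sketch assuming Lean 4 with Mathlib, whose `sq_nonneg` lemma packages the required argument; the point is that the proof must cover every real number, not just sampled ones.

```lean
import Mathlib

-- A universal statement: the square of every real number is non-negative.
-- `sq_nonneg` is Mathlib's proof of this fact for any linearly ordered ring.
example (x : ℝ) : 0 ≤ x ^ 2 := sq_nonneg x
```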

Existential statements, on the other hand, assert the existence of at least one object that satisfies a certain property, e.g. "there exists a valid solution to this 9×9 Sudoku puzzle". If we could somehow obtain a candidate solution from a dubious oracle, then verifying said solution by substitution would be more efficient than solving the Sudoku puzzle by hand. We don't stand to waste much effort if the solution turns out to be incorrect.
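The asymmetry is easy to make concrete: checking a candidate Sudoku solution touches each of the 81 cells a constant number of times, no matter how hard the puzzle was to solve. A minimal sketch (function name and representation are illustrative, not from any particular library):

```python
def is_valid_sudoku(grid):
    """Check that a 9x9 grid (list of 9 lists of ints) is a valid
    completed Sudoku: every row, column, and 3x3 box holds 1..9."""
    def ok(cells):
        return sorted(cells) == list(range(1, 10))

    rows_ok = all(ok(row) for row in grid)
    cols_ok = all(ok([grid[r][c] for r in range(9)]) for c in range(9))
    boxes_ok = all(
        ok([grid[br + r][bc + c] for r in range(3) for c in range(3)])
        for br in (0, 3, 6) for bc in (0, 3, 6)
    )
    return rows_ok and cols_ok and boxes_ok
```

Solving the puzzle requires search over a large space; the verifier above runs in a fixed, tiny amount of work, which is exactly what makes an untrustworthy oracle usable.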

If a problem may be framed in terms of an existential statement while also supporting an efficient verification method, then it may be reliably delegated to an LLM—not for the goal of utilizing AI in and of itself, but because we stand to spare ourselves the substantial effort of searching the solution space. The following table illustrates this perspective with three concrete use cases.

Use case                  | Example                                                  | Verification method
--------------------------|----------------------------------------------------------|-----------------------------------
Generate safe-to-fail code | Sandboxed throwaway shell scripts                        | Execute it
Root cause analysis        | Searching darwin-xnu for the source of an EADDRNOTAVAIL  | Test the hypothesis
Discovery of API elements  | Finding out how to achieve a style in CSS                | Cross-check with API documentation
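The first row's verification method—"execute it"—can be sketched in a few lines. This is an illustrative harness, not a hardened sandbox: the script content stands in for LLM output, and the generated code is judged purely by its observable behavior (exit code and stdout) under a timeout.

```python
import subprocess
import sys
import tempfile
import textwrap

# Stand-in for LLM-generated throwaway code (hypothetical content).
script = textwrap.dedent("""
    print(sum(range(10)))
""")

# Write the candidate script to a temporary file...
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(script)
    path = f.name

# ...and verify it by running it in a separate process with a timeout,
# capturing output instead of trusting any claim about what it does.
result = subprocess.run(
    [sys.executable, path],
    capture_output=True,
    text=True,
    timeout=5,
)
print(result.returncode, result.stdout.strip())
```

A real setup would add stronger isolation (containers, seccomp, or a jail), but the shape is the same: the cost of verification is one bounded execution.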

Adversarial advantage

Another important class of existential statements that may be effectively tackled with LLMs falls within security research, e.g. "is there an exploitable vulnerability in this codepath?". One can easily picture agentic systems that continuously scan open-source libraries and network services for exploitable bugs.

The idea of agentic coding—i.e. the process of building systems with large swathes of LLM-generated code and human Cursor-y verification—is doomed from the start within rigorous domains, and not only because of LLMs' inherent lack of rigor: it is sabotaged by the very existence of agents, since the same tools provide adversaries with a scalable means of exploitation. This asymmetric advantage applies retroactively too: pre-existing low-quality systems are also under increased risk of exploitation.

I hope for greater care in software—certainly not less.