It's Evolution, Baby
Objective functions show up everywhere once you start looking
“Don’t worry about what’s happening around you. Focus on what you’re trying to accomplish.” — John Mack
AlphaEvolve, a project from a team at Google DeepMind, applies Gemini models and an evolutionary search framework to try to discover new algorithms. Last summer I was listening to a podcast interview with the team on No Priors, and while the project itself is interesting, what stuck with me even more was the emphasis on the evaluation mechanism: what, in an optimization context, you would call the objective function. AlphaEvolve can start with a problem, or it can start with the current state-of-the-art version of an algorithm (for, say, scheduling work across a hyperscaler’s compute) and then seek a better variant. Borrowing from genetic programming, it recombines plausible solutions to see whether their performance exceeds that of the current best candidate algorithm.
The problem (and it’s not just a problem for AlphaEvolve; I think in many ways it is the heart of the matter in work and life) is that the method requires a well-defined evaluation mechanism. AlphaEvolve’s meta-algorithm needs to be able to run its proposed solutions against a simulator or some other mechanism to test them.
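To make that dependence concrete, here is a toy evolutionary search in Python. It is only a sketch of the general pattern (score, select, recombine, mutate) and not AlphaEvolve’s actual method; the target string, the evaluate function, and every parameter are assumptions chosen for illustration.

```python
import random

TARGET = "objective"  # hypothetical goal; stands in for "what good looks like"
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def evaluate(candidate: str) -> int:
    """Objective function: number of positions that match the target."""
    return sum(a == b for a, b in zip(candidate, TARGET))


def recombine(a: str, b: str, rng: random.Random) -> str:
    """Single-point crossover between two parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(candidate: str, rng: random.Random) -> str:
    """Replace one randomly chosen character."""
    i = rng.randrange(len(candidate))
    return candidate[:i] + rng.choice(ALPHABET) + candidate[i + 1:]


def evolve(generations: int = 500, population_size: int = 30, seed: int = 0) -> str:
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(ALPHABET) for _ in TARGET)
        for _ in range(population_size)
    ]
    for _ in range(generations):
        # Rank by the objective; without evaluate() there is nothing to rank on.
        population.sort(key=evaluate, reverse=True)
        if evaluate(population[0]) == len(TARGET):
            break
        # The top half survives unchanged, so the best score never gets worse.
        survivors = population[: population_size // 2]
        children = [
            mutate(recombine(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=evaluate)
```

Every step that decides which candidates survive goes through evaluate. Swap in a vague or noisy scoring function and the loop still runs, but it no longer has anything reliable to climb.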
While I agree with Charlie Munger that if you show me the incentive I can tell you the likely outcome, it is equally true that if I do not have a firm grasp on what good looks like, I am headed nowhere fast. This can apply to:
Software development — whether you call it the “definition of done” or something else, software efforts big and small founder far more often on an unclear specification … which is one reason vibe coding or AI co-pilots, while powerful accelerators, are not likely to fully replace great engineers, whose talents are as much about systematic thinking about business processes and problems as they are about implementation
Software testing — I used to say that if you write code without a test, you don’t actually know if your work is done; well-written unit tests are the accompanying objective function which returns true if the code works
Observability & instrumentation — if you lack time series data across different performance regimes, you cannot know what is normal or abnormal, let alone proactively alert … and these types of measures often have more of a gradient than the binary outcome of a test passing or failing; it’s important to plan ahead and capture this kind of data
Designing for AI — to avoid hallucinations the best defense is to give the AI a tool that it can run to check its own work; thinking deeply about what constitutes good evidence of success also helps you give the model guidance on explaining its reasoning, because it can cite that evidence
Finance, especially trading strategies — one thing I have always loved about working with traders is they have the world’s clearest objective function for software: if you are making them more money, you are brilliant; if you are not, you are an obscene waste of time
Decision-making, especially what Bezos calls one-way door decisions, ones of consequence because they are hard to reverse — if you are unclear on what you want to be on the other side of that one-way door, you can end up caught in analysis paralysis
More recently I have been trying to figure out how to make the most of Claude Code, and my approach is informed by this idea that LLMs do better and are more reliable with strong objective functions. Borrowing Martin Thompson’s idea of mechanical sympathy, you need to start with an understanding that gradient descent and hill-climbing are very much in their bones. Call it cognitive sympathy, maybe: your working partner here loves structured feedback, and drifts without it. What we call hallucination is, sometimes, the model taking a path that you failed to fence off.
The now-classic example of an LLM responding to a request to make all the tests pass by deleting the tests illustrates this. What was left out was a key constraint: fix the broken tests, and hold test coverage constant or increase it. Even this constraint is incomplete: in my experience, the difference between 75% test coverage and 95% is covering the error handling, the unhappy paths. You thus probably also want to stipulate that increases in test coverage should specifically consider gaps in less common branches in the code.
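A minimal sketch of that fence, assuming you can summarize a test run before and after the agent’s change. The SuiteSnapshot type and its fields are hypothetical; you would fill them in from whatever test runner and coverage tool you already use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteSnapshot:
    """Hypothetical summary of a test run, filled in from your own tooling."""
    tests_collected: int
    tests_failed: int
    line_coverage: float    # percent, 0-100
    branch_coverage: float  # percent, 0-100


def violations(before: SuiteSnapshot, after: SuiteSnapshot) -> list[str]:
    """Return every way the change made the suite weaker; empty means accept."""
    problems = []
    if after.tests_failed > 0:
        problems.append(f"{after.tests_failed} test(s) still failing")
    if after.tests_collected < before.tests_collected:
        problems.append("tests were deleted or are no longer collected")
    if after.line_coverage < before.line_coverage:
        problems.append("line coverage dropped")
    if after.branch_coverage < before.branch_coverage:
        problems.append("branch coverage dropped (check the error-handling paths)")
    return problems


# Made-up numbers: the suite goes green, but only because tests were removed.
before = SuiteSnapshot(tests_collected=120, tests_failed=3,
                       line_coverage=82.0, branch_coverage=71.0)
after = SuiteSnapshot(tests_collected=117, tests_failed=0,
                      line_coverage=80.5, branch_coverage=69.0)
print(violations(before, after))
```

Tracking branch coverage separately from line coverage is the design choice that matters here: it is the number most likely to move when the unhappy paths get covered.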
From these ideas, some core principles emerge:
1. Embed constraints in the SDLC
The rules and skill definitions fed into any coding setup should specify which quality checks must run after every change. This is the AI-specific branch of ideas I covered previously in Left of Launch: pushing broken code (where you define broken as strictly as you can) should be difficult if not impossible. In one large, complex multi-language monorepo I run 32 different checks in pre-commit. This category also includes test coverage checks and baseline checks for microbenchmarks, though these are more challenging to integrate into a tight change loop.
Reasonable people will differ on the trade-offs between iteration speed and code quality (I have 32 pre-commit checks; I may not qualify as reasonable) but the principle is more important than how far you take it.
Coding agents run in a feedback loop, and part of the “definition of done” needs to be embedded in the development process, or your very fast-working coding companion will generate yards of slop.
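As a rough sketch of what embedding the checks can look like, the script below runs a list of commands and collects the output of any that fail. The two entries in CHECKS and the src and tests directory names are placeholders, not my actual 32 checks; substitute whatever your repository uses.

```python
import subprocess
import sys

# Hypothetical check list; substitute the linters, type checkers, and test
# runners your repository actually uses.
CHECKS = {
    "syntax": [sys.executable, "-m", "compileall", "-q", "src"],
    "unit tests": [sys.executable, "-m", "unittest", "discover", "-s", "tests"],
}


def run_checks(checks: dict[str, list[str]]) -> dict[str, str]:
    """Run every check and return {name: captured output} for the failures."""
    failures = {}
    for name, command in checks.items():
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            failures[name] = (result.stdout + result.stderr).strip()
    return failures


def main() -> int:
    failed = run_checks(CHECKS)
    for name, output in failed.items():
        print(f"FAILED: {name}\n{output}\n")
    # Non-zero means "not done yet" to whatever called this script.
    return 1 if failed else 0
```

Called as sys.exit(main()) from a pre-commit hook, a non-zero return blocks the commit. The same failure text can be handed straight back to the agent as its next piece of feedback.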
2. Favor deterministic constraints
This is a generalization of the above, but worth calling out explicitly. If you are creating an objective function to guide the model, you should prefer deterministic checks where possible. This might seem obvious, but because it’s so easy to give fuzzy instructions to a model and get the illusion of understanding (“be skeptical!”, “make no assumptions!”), there’s a risk that these are more like magical incantations in the context. They might work for a while, but models are fundamentally probabilistic and are evolving all the time: ‘skepticism’ is a broad concept, and its meaning can change between runs and model upgrades.
3. Employ specialized agents
The frontier models are powerful generalists, but just as with teams of software engineers, there is value in specialization, though for different reasons. In the case of AI models, the value comes from narrowing the context. Each agent can be given specific skills and tools and, importantly, can exclude from its solution space knowledge that might leak in from irrelevant areas. It also lets you match the model to the task: certain tasks (some Python data science work; DevOps) are so well represented in the training corpus, and so well served by mature static analysis tools, that an earlier-generation model can still produce good-quality output. Others, particularly QA engineering for complex systems, demand much more sophisticated reasoning models to catch the most subtle issues.
While this is more practical advice for current generation coding tools than a principle, there is a key idea in it. The injected model context not only contains the explicit objective function but also implicit guidance about which paths are worth exploring simply because they are there.
Fencing that off as much as possible is essential. You want to provide clear instructions with clear success criteria, and nothing else. Otherwise you are increasing non-determinism. It could help you, but probably not.
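A small sketch of what that fencing might look like as data. Everything here is hypothetical: the AgentSpec fields, the tier labels, and the file paths are illustrations rather than the configuration format of any particular tool.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical description of one specialized agent."""
    name: str
    model_tier: str                 # made-up labels such as "small" or "frontier"
    tools: tuple[str, ...]          # the only commands this agent may run
    context_globs: tuple[str, ...]  # the only files this agent may read
    success_criteria: str           # what "done" means, stated as a check


def visible_files(agent: AgentSpec, repo_files: list[str]) -> list[str]:
    """Keep only the files that fall inside the agent's fenced-off context."""
    return [
        path for path in repo_files
        if any(fnmatchcase(path, glob) for glob in agent.context_globs)
    ]


QA_AGENT = AgentSpec(
    name="qa",
    model_tier="frontier",
    tools=("pytest", "coverage"),
    context_globs=("tests/*", "docs/test-plan.md"),
    success_criteria="all tests pass and branch coverage does not drop",
)

files = ["tests/test_dns.py", "src/dns/resolver.py",
         "docs/test-plan.md", "docs/roadmap.md"]
print(visible_files(QA_AGENT, files))
```

The point of the allow-list shape is that anything not named is excluded by default, so the roadmap document never reaches the QA agent’s context at all.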
4. Draw out clear contradictions
This is a slightly more open-ended principle, and could be restated as: don’t give the agent an out. Scientific discovery advances furthest when an observation breaks the model. This triggers questions: was this a fluke, or a hint that some deeper pattern exists that the model missed which would resolve the contradiction? The same applies in software engineering: a bug is a glitch in the Matrix.
Recently I was trying to resolve a complex race condition. Over and over I had pointed out the symptom (DNS lookup failures for a well-known host) and had plenty of edge cases and bugs flagged to me in return, but the problem persisted. Finally, I provided a sequence of logs with timestamps and pointed out that the ordering was impossible if the system were operating correctly. Where previously the model had stopped as soon as it found something wrong, this contradiction forced it to focus its reasoning on explanations that would produce this specific contradiction. The bug fix is the sequence of changes that resolves the contradiction. Given that, it one-shotted the fix.
This is a familiar motion for software engineers when debugging complex systems, but current coding models won’t do it unless asked.
5. Ask the model what it needs
To get great results (and, ultimately, do great work, because that’s the goal) you need to create feedback loops. The objective function serves as the check at the end of each iteration: are we closer to our target, or further away? So in addition to asking the model to do work or understand something for us, sometimes it helps to ask the model to plan what feedback it would find helpful.
Performance optimization is a good example. It requires microbenchmarks as well as latency instrumentation. While you can write these yourself or ask the model to write them, you can also ask the model what output would help characterize the performance of the system. Specifically, ask it to reason about the blank spaces that are not instrumented or benchmarked, constrained to its current understanding of which parts of the code the hot path runs through.
For instance, if you use Prometheus, feed the model a file with all the metrics in the exporter — what’s measured, and what’s observed in live runs. Then ask it to suggest additional metrics; target it on improving a set of those metrics; and repeat until you converge on the target latency or throughput. The result: self-learning SLO-seeking.
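As a starting point for that file, the sketch below reads Prometheus text exposition output, lists the metric families declared on its TYPE lines, and flags hot-path stages that have no latency histogram. The stage names, the sample metrics, and the convention that a metric’s name contains its stage name are all assumptions for illustration.

```python
def metric_families(exposition: str) -> dict[str, str]:
    """Map each metric family name to the type declared on its '# TYPE' line."""
    families = {}
    for line in exposition.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[:2] == ["#", "TYPE"]:
            families[parts[2]] = parts[3]
    return families


def blank_spaces(exposition: str, hot_path_stages: list[str]) -> list[str]:
    """Hot-path stages with no histogram whose name mentions the stage."""
    histograms = [
        name for name, kind in metric_families(exposition).items()
        if kind == "histogram"
    ]
    return [
        stage for stage in hot_path_stages
        if not any(stage in name for name in histograms)
    ]


# Made-up exporter output in the Prometheus text exposition format.
SAMPLE = """\
# HELP http_request_duration_seconds Request latency.
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.1"} 240
http_request_duration_seconds_bucket{le="+Inf"} 250
http_request_duration_seconds_sum 18.2
http_request_duration_seconds_count 250
# HELP cache_hits_total Cache hits.
# TYPE cache_hits_total counter
cache_hits_total 1027
"""

print(blank_spaces(SAMPLE, ["http_request", "db_query", "cache"]))
```

The list it prints is a reasonable first prompt: here is what we measure, here is where the hot path is dark, now propose what to add.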
Engineering is about optimization in the presence of constraints, whether of development time or budget, and you can understand budget here to mean everything from dollars to storage space to runtime performance; that’s not a new idea. But when prioritizing my work (and my personal life as well), I know I have regularly short-changed myself by not spending enough time on clarifying what I am trying to achieve. I spend too much time on possible solutions, and not enough time on the objective function. It is not just LLMs that hallucinate. People, too, hallucinate a sense of progress when they fail to anchor themselves to what matters.
This article was originally published on Medium in July 2025; it has been revised and substantially extended based on more recent developments in AI.

