Dreaming in Proofs

Novelty is not a better optimizer. It is a wild dreamer chained to a merciless judge.

Kabir Murjani · March 2026

Here is something I have come to believe after staring at search problems for too long: optimization cannot be creative. Not "is bad at being creative." Cannot be. It is the wrong kind of process.

Every reward-following method we use, gradient descent, soft reinforcement learning, simulated annealing with the temperature turned sensibly low, is a hill-climber. It gets better at the thing it already knows how to do. Point it at a landscape and it will faithfully walk uphill to the nearest peak and stop. That is its entire personality. And for an enormous number of problems that is exactly what you want, so we have built a whole field on it and it works.

But it will never hand you the solution that changes the field. Because the solution that changes the field is almost never at the top of the hill you are standing on. It is across a valley. And to get there you have to be willing to walk downhill, to get worse on purpose, to spend a long stretch in territory that looks strictly stupid, on the bet that something better is waiting on the far slope. No reward-following process does this, because the reward gradient is screaming at it the entire time to turn around.

This is the part everyone who has actually had an idea already knows in their bones. The breakthrough does not arrive as a smooth improvement on the last good thing. It arrives as a chain of moves that make no sense, that you would be embarrassed to say out loud, right up until the moment it clicks and suddenly the whole thing is obvious and you cannot believe nobody saw it before. The not-making-sense is not a bug in the process. It is the process. You were in the valley.

So if you actually want a system that discovers rather than optimizes, you cannot build a smarter hill-climber. You have to build something stranger: a two-stroke engine, where the two strokes are as different from each other as possible.

The hill-climber finds a local peak and stops. An explorer is willing to walk through the valley.

The Fitness Valley, and Why Hill-Climbers Die in It

Make this concrete in a combinatorial space, because that is where it is cleanest. Take routing, scheduling, tensor decomposition, any NP-hard construction problem. The set of good solutions is not one smooth bowl. It is a rugged range of peaks separated by valleys of garbage. The globally great solution and the locally decent one are often related by a sequence of intermediate moves where every single step lowers your score before any of them pays off.

The classic shape: a move that temporarily wrecks your current tour but unlocks a repacking that ends up far shorter. Pick up the item that ruins this leg of the route because it enables a better load three cities later. A Lin-Kernighan-style swap that has to pass through a worse configuration to reach a better one. A reward-following agent encounters the first downhill step and refuses. It is structurally incapable of the patience required. So it converges to "good," declares victory, and the genuinely new construction sits untouched on the other side of a valley it will never cross.
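To make the refusal concrete, here is a minimal sketch on a made-up one-dimensional landscape. The scores, the function names, and the `patience` parameter are all illustrative assumptions, not anything from a real solver. It only shows that a greedy climber halts at the first peak, while a walker that tolerates a run of non-improving steps can reach the higher peak beyond the valley.

```python
# Hypothetical landscape: index = configuration, value = score.
# Local peak at index 2 (score 5), valley at 3..6, better peak at index 8 (score 9).
LANDSCAPE = [1, 3, 5, 4, 2, 1, 3, 6, 9, 7]


def hill_climb(scores, start):
    """Greedy ascent: step to the best neighbour, stop when none improves."""
    pos = start
    while True:
        neighbours = [p for p in (pos - 1, pos + 1) if 0 <= p < len(scores)]
        best = max(neighbours, key=lambda p: scores[p])
        if scores[best] <= scores[pos]:
            return pos  # no neighbour is strictly better: local peak
        pos = best


def patient_walk(scores, start, patience):
    """Walk rightwards, tolerating fewer than `patience` consecutive
    steps that fail to beat the best score seen so far."""
    best = pos = start
    worse_streak = 0
    while pos + 1 < len(scores) and worse_streak < patience:
        pos += 1
        if scores[pos] > scores[best]:
            best, worse_streak = pos, 0
        else:
            worse_streak += 1
    return best


if __name__ == "__main__":
    print("greedy stops at", hill_climb(LANDSCAPE, 0))
    print("patient walker reaches", patient_walk(LANDSCAPE, 0, patience=5))
```

The patient walker is still a toy: it walks in one direction and knows nothing about structure. The point is only that some tolerance for getting worse is a precondition for reaching the second peak at all.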

You cannot fix this with a bigger network or cleverer reward shaping. The problem is not capacity. The problem is that the objective itself is the thing keeping you out.

The generator dreams across the whole space. Most candidates fail the verifier. One holds.

Stroke One: A Chaos Engine

So the first stroke has to be a process that is willing to be wrong on purpose, and not timidly. Not epsilon-greedy, not a little entropy bonus sprinkled on top of the same old objective. Something that deliberately goes looking for the improbable.

But, and this is the part people get wrong, it cannot be random. Pure noise is useless here, because the combinatorial space is so astronomically large that random sampling never lands on anything coherent. AlphaTensor's search space had more than 10¹² actions at each step. You could sample uniformly until the heat death of the universe and find nothing.

What you want is structured improbability: candidates that are unlikely under your current model yet still internally coherent. The machinery for this already exists in the GFlowNet, which samples candidates with probability proportional to their reward rather than greedily maximizing it, so it explores the whole basin of plausible-but-weird instead of collapsing onto the one path it currently likes best. Shape that reward with a surprise term, an intrinsic drive toward regions where the model's own uncertainty is highest, and now you have a generator that dreams hardest about exactly the things it understands least. It is the 2am part of your brain, entertaining the idea you would never propose in a meeting. It proposes the locally insane move precisely because it is locally insane.
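The difference between maximizing a reward and sampling in proportion to it can be sketched over a tiny, explicitly enumerated candidate set. This is not a GFlowNet: a real one learns a policy that builds objects step by step so that it matches such a distribution without enumerating anything. The candidate names, the uncertainty scores, and the multiplicative shaping `R(x) * exp(beta * u(x))` are assumptions chosen for illustration; the surprise term could be defined in other ways.

```python
import math
import random


def sampling_distribution(rewards, uncertainty, beta=0.0):
    """Return p(x) proportional to R(x) * exp(beta * u(x)).

    beta = 0 gives plain reward-proportional sampling; beta > 0 tilts
    probability mass toward candidates the model is unsure about.
    (Assumed form of the surprise term, for illustration only.)
    """
    shaped = {x: rewards[x] * math.exp(beta * uncertainty[x]) for x in rewards}
    total = sum(shaped.values())
    return {x: s / total for x, s in shaped.items()}


def greedy_choice(rewards):
    """What a pure maximizer does: always the single highest-reward candidate."""
    return max(rewards, key=rewards.get)


def sample(dist, rng):
    """Draw one candidate according to the distribution."""
    items = list(dist)
    return rng.choices(items, weights=[dist[x] for x in items], k=1)[0]


# Hypothetical candidates with made-up rewards and model uncertainties.
REWARDS = {"known_best": 8.0, "decent": 4.0, "weird_a": 2.0, "weird_b": 2.0}
UNCERTAINTY = {"known_best": 0.1, "decent": 0.2, "weird_a": 1.5, "weird_b": 2.0}

if __name__ == "__main__":
    rng = random.Random(0)
    plain = sampling_distribution(REWARDS, UNCERTAINTY)
    curious = sampling_distribution(REWARDS, UNCERTAINTY, beta=1.0)
    print("greedy always picks:", greedy_choice(REWARDS))
    print("reward-proportional:", plain)
    print("with surprise term: ", curious)
    print("one draw:", sample(curious, rng))
```

With `beta = 0` the weird candidates are still sampled, just less often than the known best. Raising `beta` is the knob that makes the generator prefer what it understands least.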

On its own, of course, this thing is a lunatic. It produces a firehose of beautiful nonsense. Which is why it is only half an engine.

Stroke Two: A Merciless Verifier

Chaos becomes discovery only when something catches the one dream in ten thousand that happens to be true. The generator's job is to be wild. The verifier's job is to have absolutely no sense of humor.

And here is the lucky thing about combinatorial spaces, the reason they are the right place to build this first: the verifier is free and it is absolute. A tour is valid or it is not. A schedule satisfies its constraints or it does not. A tensor decomposition reconstructs the matrix multiplication tensor exactly or it fails. There is no partial credit, no vibes, no "seems plausible." The oracle does not care how unhinged the path was that produced the candidate. It asks one question, "does it hold?", and it answers in binary.
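Binary verifiers of this kind are usually a few lines long. The sketch below assumes a tour is a list of city indices and a schedule is a set of start times with precedence constraints. Both representations and all function names are illustrative choices, not a standard API.

```python
def is_valid_tour(tour, n_cities):
    """True iff the tour visits every city 0..n_cities-1 exactly once."""
    return len(tour) == n_cities and set(tour) == set(range(n_cities))


def satisfies_schedule(start_times, durations, precedences):
    """True iff, for every (before, after) pair, `before` finishes
    no later than `after` starts."""
    return all(
        start_times[a] + durations[a] <= start_times[b]
        for a, b in precedences
    )


def filter_verified(candidates, verifier):
    """Keep only candidates the verifier accepts. No partial credit."""
    return [c for c in candidates if verifier(c)]


if __name__ == "__main__":
    proposals = [[2, 0, 1, 3], [0, 1, 1, 3], [0, 1, 2]]
    print(filter_verified(proposals, lambda t: is_valid_tour(t, 4)))
```

Note that neither verifier looks at how a candidate was produced. That indifference is what lets the generator be as strange as it likes.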

That binary stamp is the click. The instant a nonsensical chain of moves gets certified correct, it stops being nonsense. It retroactively becomes a method. The valley you crossed in the dark turns out to have a bridge on it, and now everyone can see the bridge, and they will spend the next decade wondering how it was ever invisible.

Why the Two Have to Be Opposite, and Bolted Together

This is the whole thesis, so let me say it plainly. Novelty lives in a single intersection: things that are improbable and true at the same time. Improbable-and-false is noise. Probable-and-true is rediscovery, the stuff already at the top of your current hill. The only territory that contains genuine discovery is the narrow band that is both surprising to you and certified by reality.

To search that band you need both poles cranked to their extreme and pointed in opposite directions. If the generator is too sensible, you only ever recover known solutions, because sensible means high-prior means already-discovered. If the verifier is too lenient, you flood your model with confident garbage, because a soft critic will wave through the merely-plausible. A creative system is therefore not a balanced, well-tempered thing. It is a high-variance dreamer handcuffed to a zero-tolerance judge, and the violence of the disagreement between them is where the new ideas fall out. Turn either knob toward moderation and the engine stops producing anything you did not already have.
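The intersection can be written down as a small classifier. Surprise is measured here as surprisal in bits under the model's prior, and the 10-bit threshold is an arbitrary assumption for illustration. The label for the fourth quadrant (probable and false), which the argument above does not name, is also my own.

```python
import math


def surprisal_bits(prior_prob):
    """Surprisal of a candidate under the current model, in bits."""
    return -math.log2(prior_prob)


def classify(prior_prob, verified, surprise_threshold_bits=10.0):
    """Place a candidate in one of the four quadrants.

    prior_prob: probability the current model assigned to the candidate.
    verified:   binary verdict from the verifier.
    The threshold is an assumed, tunable cut-off, not a derived constant.
    """
    surprising = surprisal_bits(prior_prob) >= surprise_threshold_bits
    if surprising and verified:
        return "discovery"
    if surprising:
        return "noise"
    if verified:
        return "rediscovery"
    return "expected failure"  # probable but false; label is an assumption


if __name__ == "__main__":
    for p, ok in [(1e-6, True), (1e-6, False), (0.5, True), (0.5, False)]:
        print(p, ok, "->", classify(p, ok))
```

A sensible generator never emits low-prior candidates, so the first branch is unreachable. A lenient verifier sets `verified` too often, so the first branch fills with noise.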

This is not just a metaphor I am hopeful about. It already happened. AlphaTensor was handed matrix multiplication as a single-player game with a brutal binary oracle (the decomposition is exact or it is not), and a search wild enough to wander a space of more than 10¹² moves per step, and it crossed a valley that human mathematicians had not crossed in fifty years. It found a way to multiply 4×4 matrices in a finite field using 47 multiplications where Strassen's celebrated method had needed 49 since 1969. It first rediscovered Strassen, climbed to the top of the human hill, and then kept going down the other side into territory no human had productively occupied.
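The binary oracle in this setting is easy to write down. The sketch below checks Strassen's classic 7-multiplication scheme for 2×2 matrices against the matrix multiplication tensor, and shows where the 49 comes from: applying the 2×2 scheme recursively to a 4×4 matrix costs 7 × 7 multiplications. It does not reproduce AlphaTensor's 47-multiplication algorithm, and the function names and the flattening order (a11, a12, a21, a22) are choices made for this example.

```python
from itertools import product


def matmul_tensor(n):
    """T[a][b][c] = 1 iff entry a of A times entry b of B contributes
    to entry c of C = A @ B, with entries flattened row-major."""
    size = n * n
    T = [[[0] * size for _ in range(size)] for _ in range(size)]
    for i, j, k in product(range(n), repeat=3):
        T[i * n + j][j * n + k][i * n + k] = 1
    return T


def verify(factors, n=2):
    """Exact check: does sum_r u_r (x) v_r (x) w_r equal the matmul tensor?"""
    size = n * n
    target = matmul_tensor(n)
    for a, b, c in product(range(size), repeat=3):
        total = sum(u[a] * v[b] * w[c] for u, v, w in factors)
        if total != target[a][b][c]:
            return False
    return True


# Strassen's seven products M1..M7 as (u, v, w) triples:
# M_r = (u . vec(A)) * (v . vec(B)), and w says how M_r enters each entry of C.
STRASSEN = [
    ((1, 0, 0, 1), (1, 0, 0, 1), (1, 0, 0, 1)),     # M1 = (a11+a22)(b11+b22)
    ((0, 0, 1, 1), (1, 0, 0, 0), (0, 0, 1, -1)),    # M2 = (a21+a22) b11
    ((1, 0, 0, 0), (0, 1, 0, -1), (0, 1, 0, 1)),    # M3 = a11 (b12-b22)
    ((0, 0, 0, 1), (-1, 0, 1, 0), (1, 0, 1, 0)),    # M4 = a22 (b21-b11)
    ((1, 1, 0, 0), (0, 0, 0, 1), (-1, 1, 0, 0)),    # M5 = (a11+a12) b22
    ((-1, 0, 1, 0), (1, 1, 0, 0), (0, 0, 0, 1)),    # M6 = (a21-a11)(b11+b12)
    ((0, 1, 0, -1), (0, 0, 1, 1), (1, 0, 0, 0)),    # M7 = (a12-a22)(b21+b22)
]

if __name__ == "__main__":
    print("Strassen verified:", verify(STRASSEN))
    print("multiplications for 2x2:", len(STRASSEN))
    print("recursive cost for 4x4:", len(STRASSEN) ** 2)
```

Change any single coefficient and `verify` returns `False`. That is the whole oracle: a candidate with fewer triples either reconstructs the tensor exactly or it does not.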

Making the Fluke Permanent

One more move, because finding the bridge once is a fluke and flukes do not change fields. What changes a field is when the discovered construction becomes cheap, automatic, the thing the solver now reaches for without searching.

That is consolidation, the slow stroke that comes after the dream: take the verified-but-weird trajectory and carve it into the policy's priors until constructing it costs almost nothing. Cross the valley once, pay the full search cost a single time, and then dig a tunnel so that every future traversal is free. This is exactly the lived experience of an unorthodox insight becoming "obvious in retrospect." The idea was expensive to find and is nearly free to use, and the gap between those two is the entire value of having discovered it. A discovery you cannot cheaply re-derive is just a lottery ticket.
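The economics of consolidation can be sketched with a toy policy: a table of move probabilities per step, nudged toward a verified trajectory. The mixing update, the `strength` parameter, and the problem sizes are assumptions for illustration. A real system would fine-tune a learned policy on the verified trajectory rather than edit a table.

```python
def trajectory_prob(policy, trajectory):
    """Probability that the policy reproduces the trajectory move for move."""
    p = 1.0
    for step, move in enumerate(trajectory):
        p *= policy[step][move]
    return p


def consolidate(policy, trajectory, strength=0.9):
    """Mix each step's distribution toward the verified move.

    Each new distribution is (1 - strength) * old + strength * one-hot,
    so it still sums to 1. `strength` is an assumed hyperparameter.
    """
    new_policy = []
    for step, dist in enumerate(policy):
        target = trajectory[step]
        new_policy.append([
            (1 - strength) * p + (strength if move == target else 0.0)
            for move, p in enumerate(dist)
        ])
    return new_policy


if __name__ == "__main__":
    n_moves, n_steps = 10, 6
    uniform = [[1.0 / n_moves] * n_moves for _ in range(n_steps)]
    verified_path = [3, 1, 4, 1, 5, 9]  # hypothetical verified trajectory

    before = trajectory_prob(uniform, verified_path)
    after = trajectory_prob(consolidate(uniform, verified_path), verified_path)
    print("expected samples to rediscover, before:", 1 / before)
    print("expected samples to rediscover, after: ", 1 / after)
```

Under these made-up numbers the expected number of samples needed to reproduce the path falls from about a million to fewer than two. That is the gap between finding and using.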

The Lesson, Compressed

Stop trying to build a more intelligent optimizer. Optimizers are loyal to the hill they are standing on, and the thing worth finding is across the valley.

Build instead the two things optimization refuses to be. A generator unhinged enough to propose the move that makes no sense. A verifier merciless enough to throw out everything except the rare proposal that turns out to be true. Then bolt them together and let the tension between them do the work. Novelty is not a quantity you maximize. It is what falls out when wild and strict are forced to argue, and one of them happens to be right.

Discovering faster matrix multiplication algorithms with reinforcement learning (Fawzi et al., Nature, 2022). The existence proof for everything above.
Abandoning Objectives: Evolution Through the Search for Novelty Alone (Lehman & Stanley, 2011). The foundational argument that chasing the objective directly is often the worst way to reach it.
Illuminating Search Spaces by Mapping Elites (Mouret & Clune, 2015). The formal machinery for keeping a portfolio of weird-but-good solutions.
GFlowNet Foundations (Bengio et al.) and recent combinatorial instantiations. The practical sampler that draws proportional to reward instead of maximizing it.