Back to news
AnalysisJune 8, 2026· 3 min read

Five models, one market: when emergence fails and control wins

A builder tested heterogeneous small-model agents in a simulated economy and found that emergent crashes vanish when you swap populations. The fix: author outcomes at settlement, not via input shocks.

Our Take

Emergence is contingent on population; treat a single impressive run as an anecdote until it survives under different model weights.

Why it matters

Anyone building agent-based systems (markets, simulations, multi-agent rollouts) is likely to mistake fragile single-population behavior for durable system properties. This matters now because small-model agent deployment is accelerating and the cheap simulators that let you iterate fast are the same ones most likely to flatter a broken approach.

Do this week

Agent builders: run your emergent behavior across at least two different model architectures before writing it up as a system property, and test whether input shocks (inventory floods, supply tweaks, price signals) actually steer your agents or just get ignored.

One crash, five models, zero reproducibility

A builder running the Build Small Hackathon in June 2026 first created a bank-run simulation using a single small model to control five creatures trading honey in a woodland economy. When the model read a rumor that the vault was empty, it dumped honey, crashing the price from 10 to 3. No script; the behavior emerged from the agent's decision to liquidate.

Then the builder rebuilt the system to test the strongest version of the claim: if small models can run a living economy, five different model architectures should all produce emergent trading behavior in the same market. Instead of one model wearing five hats, the rebuild used an OpenAI model, an NVIDIA model, an OpenBMB model, and a fine-tuned half-billion-parameter model, each controlling its own creature.

The crash vanished. When the financier shorted honey and sprang the legend, the council of heterogeneous models read the same rumor and hoarded instead of dumping. The price rose. The short lost money.

Emergence is contingent, not durable

The failure revealed why: the original crash was contingent on one model's disposition, not a property of the system itself. Change the population, and the behavior you documented disappears. This matters because it exposes a trap in agent-system design: a single impressive run under one set of conditions looks like a discovery until you realize it was a statistical accident of that specific cast.

The builder then spent three live attempts pushing on the economy from the outside, as you would with a textbook supply-and-demand model. A pure rumor without hoarding? Agents refused to sell. A windfall glut of honey to collapse demand? The test policy (a rule-based stand-in) folded immediately; the live models ignored it and traded on their own read of the room. A larger short? Only lost more. Every mechanical shock the builder applied was an input to the agents' decision, and the agents were free to decline.

The pattern revealed the core mistake: you cannot steer a heterogeneous population of models with input shocks because the shock only biases a choice they still get to make. The test policy had created false confidence by agreeing with the intended outcome. When the cheap stand-in and the real agents disagree, the stand-in is lying.

Author at the seam, not the input

The resolution was to stop trying to convince the agents and instead author the crash at a deterministic seam downstream of every decision. The legend now crashes its good at settlement, after trading clears, by overwriting the reference price directly. Agents trade, gossip, and hoard all they like. Then the run lands as a fact, the price halves, and the short settles into profit.

This is not giving up on emergence; it is knowing which layer deserves freedom and which deserves control. The emergent layer (five models trading, forming grudges, hoarding) does all the work that makes the system feel alive. The deterministic layer (the settlement override) makes sure the moments that have to happen actually do. The craft is knowing which is which and where the seam sits.

Three concrete lessons from the rebuild: emergence is contingent, not durable; you do not control a market of agents by shocking its inputs, you control it by authoring at the settlement point; and the cheap simulator that lets you iterate fast is also the one most likely to flatter a wrong fix.

#Agents#Open Source#Research
Share:
Keep reading

Related stories