When the crash disappears: control in 5 models | Keryc
I told a story I liked: a woodlands legend called "Run on Oona's Hoard" that rewrote a 1929 bank run as folklore. An owl who guards honey read the panic, started liquidating, and the honey price plunged from 10 to 3 in a few turns. Who programmed that directly? No one.
The thesis was simple: give a small model a role and a budget, and market behavior emerges on its own.
The experiment and the reconstruction
I reconstructed the wood with one key change. Instead of a single model controlling five creatures, I put together a council made of five different models: an OpenAI model, an NVIDIA model, an OpenBMB model, and two copies of a half-billion-parameter model I fine-tuned myself. The idea was honesty: if you're going to claim small models can sustain a living economy, let five distinct architectures make distinct decisions — not one model wearing five hats.
I also reworked the operator. In the new version the player is a shadowy financier: takes a short position, whispers a true tip to prime the fall, activates the legend and cashes out when the price crashes. I made the loop visible on-screen: objective, scoreboard and a first order with one click. Making the promise visible is the fastest way to discover the promise is false.
When I tried the move — short the honey and trigger Run on Oona's Hoard — the honey didn't crash. It rose. The council models, reading the rumor that the vault was empty and a bad harvest signal, didn't sell: they hoarded. Scarcity, not panic. The short lost money and the narrative I had written without irony was that the bet had gone sour.
Why the emergence failed
The technical lesson is clear: in an economy of agents, the reference price is not a dial you turn from outside. It is the residue of what agents actually choose to trade. The original crash was real, but it relied on the disposition of a single model; it wasn't a robust system property. Change the population and the emergent behavior can evaporate.
I tried three ways to force the fall, all failed live:
I left the legend as pure rumor and trusted the agents would react. They didn't sell.
I doused each creature with a windfall of honey, thinking the glut would collapse demand. It worked with my test policy mechanic (a quick stand-in) because that policy obeys rigid desire thresholds, but real models ignored the flood and acted on their own read.
I enlarged the short position, which only increased the loss.
Three live runs, three losses: -15, -26 and -27 pebbles. Every lever I moved was just an input to agents' decisions, and the agents were free to reject them. You can't steer a heterogeneous population with a mechanical shock: the shock only tilts a choice that remains theirs.
There is also an editorial trap worth naming: the cheap stand-in that lets you iterate fast is also the one most likely to flatter the wrong solution. When the test policy and the real agents diverge, the liar is the stand-in.
Attempt
Mechanism
Honey at settlement
Gambit P&L
Original, one model
that model chose to sell
10 to 3
showcase win
Council, rumor only
five models chose to hoard
rose due to scarcity
-15
Council, inventory glut
demand collapse, only test policy
almost no change
-26 to -27
Council, override in settlement
price halved by fiat
crashes post-cleanup
+40
Table 1. The same play in four worlds. The crash was emergent and fragile under one model, absent with a heterogeneous council, and reliable only when I authorized it in the settlement.
The solution: authorship in the settlement seam
I stopped trying to convince the agents and moved to making the run true by construction. A bank run is, by definition, a crash. So the legend now applies the drop in settlement: after the market finishes clearing each turn, I overwrite the reference price and the commodity devalues by half. Agents can trade and gossip all they want; then the run falls as fact and the short who anticipated it collects their gain.
That sounds like conceding to the emergence, but it isn't. The emergent layer — the five models negotiating, gossiping, hoarding, nursing grudges — still makes the wood feel alive. What I learned is practical: you don't get reliable results by pushing harder on the inputs of the emergence. You get reliable results by precisely choosing the seam where you impose a deterministic override, and leaving everything upstream free. Emergence for texture, authorship for the moments that must occur. The craft is knowing which is which and where the seam lies.
Technical implications and recommendations
Heterogeneity matters: validating an emergent behavior with a single agent is anecdotal. Test different populations and varied architectures before you declare a system-wide property.
Don't trust stand-ins blindly: a test policy that obeys rigid rules will give you speed, but also false positives. If the replication only happens with the stand-in, the finding isn't reliable.
Authorship in the settlement: if you need deterministic outcomes (liquidations, systemic losses, regulatory effects), implement the downstream change explicitly. That preserves the emergent richness upstream and guarantees critical conditions are met.
Control vs. texture: think of your game/market architecture in layers. Use agents for texture and experiential robustness. Use deterministic seams for invariants the system must uphold under any mix of agents.
Final reflection
The moral isn't technophobia or a rejection of emergence. It's methodological humility. Emergence is fascinating and useful, but contingent. If you treat an emergent run in a demo as a law of nature, you expose yourself to bitter surprises when the agent population changes or when you move from a stand-in to live models.
I've made these same mistakes at larger scales and with higher stakes. It was productive to be wrong again in a forest where the only things at risk were a few pebbles and a story I had told with too much confidence. Small models, big adventures, and sometimes a crash you have to authorize yourself so the story ends the way you want.