The Embarrassing Problem of Premature Exploitation
post by Richard Meadows (richard-meadows-1) · 2020-04-30T20:33:07.528Z · score: 38 (20 votes) · LW · GW · 7 comments
This is a link post for https://thedeepdish.org/exploring-exploiting/
tl;dr: summarising some handy concepts from Algorithms to Live By, as they relate to optionality and tinkering.
Babies love putting things in their mouths: dirt, insects, bits of grass, their own poo. They have no sense of fear or self-preservation, and come up with endlessly creative ways to place themselves in mortal peril. Once they learn to talk, their constant experimentation with the world transcends the physical to the philosophical. They want to know everything. They are bottomless pits of curiosity, with very little in the way of attention span or self-discipline. Your typical two-year-old can only concentrate on a task for six minutes at a time. Young children are not self-aware enough to feel much in the way of shame, or embarrassment. Nothing is off-limits.
In a word, very young people spend almost all of their time exploring.
The elderly are set in their ways. The only foreign objects they put in their mouths are dentures and hard caramels; occasionally followed by a fork to extricate said caramels from said dentures. They tend to have stable routines, rituals, hobbies, and social circles. They rarely try new things or experiment with new identities. They’ve lived long enough to know what they’re about, and they intend to wring out every ounce of enjoyment before the curtains come down.
In a word, very old people spend almost all of their time exploiting.
The ‘explore-exploit’ constraint is one of the most useful ideas I’ve come across. Don’t worry about the connotations; these terms are borrowed from computer science, where they’re used neutrally.
The point is that there’s an inescapable trade-off between these activities. Time spent investigating new opportunities is time you could have spent enjoying what you already have, and vice versa.
In the language of optionality: if you want to open up attractive new options—to cultivate the fat purple figs on the possibility tree—you have to spend time exploring. But unless you eventually switch to exploiting, all your precious figs will blacken and rot and fall to the ground.
This creates dilemmas great and small: Keep your job or pivot to a new career? Try the new Chinese place or return to your favourite haunt?
The Multi-Armed Bandit
The classic explore-exploit scenario is the ‘multi-armed bandit’.
You walk into a casino, and are faced with countless banks of slot machines. Some of the machines pay out more often than others, but there’s no way of finding out which ones are ‘hot’ without sitting down and pulling the arm over and over.
Let’s say the very first machine you try pays out an average of one in every 10 pulls. If you choose to wander off and start pulling different levers, you’re forsaking the opportunity to receive that tried-and-tested reward. But if you stick with the first machine, you’re forsaking the opportunity to search for an arm with a higher payoff.
There’s no need to get into the formal solutions, because the optimal strategy is fairly intuitive: you start out by exploring as much as possible, and gradually move towards exploiting as your time runs out.
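For the curious, here is a minimal sketch of that "explore early, exploit late" drift. It is not one of the formal solutions: it uses a simple epsilon-greedy rule whose exploration rate falls linearly from 1 to 0 as your time runs out. The function name `play_bandit`, the payout probabilities, and the linear schedule are all assumptions made up for illustration.

```python
import random

def play_bandit(payout_probs, total_pulls, seed=0):
    """Epsilon-greedy play with an exploration rate that decays from 1 to 0.

    payout_probs: hypothetical chance that each machine pays out on a pull.
    Returns (pulls per arm, wins per arm, log of which pulls were exploratory).
    """
    rng = random.Random(seed)
    n_arms = len(payout_probs)
    pulls = [0] * n_arms
    wins = [0] * n_arms
    explore_log = []

    for t in range(total_pulls):
        # Assumed schedule: explore with probability 1 at the start, 0 at the end.
        epsilon = 1.0 - t / total_pulls
        exploring = rng.random() < epsilon
        if exploring:
            # Explore: wander off and pull a machine at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: return to the machine with the best observed win rate
            # (machines never tried are scored as 0).
            arm = max(
                range(n_arms),
                key=lambda a: wins[a] / pulls[a] if pulls[a] else 0.0,
            )
        reward = 1 if rng.random() < payout_probs[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        explore_log.append(exploring)

    return pulls, wins, explore_log

# The first machine pays out one pull in 10, as in the example above;
# the other two probabilities are invented.
pulls, wins, log = play_bandit([0.10, 0.05, 0.25], total_pulls=1000)
print("pulls per machine:", pulls)
print("exploratory pulls, first half vs second half:",
      sum(log[:500]), sum(log[500:]))
```

Most of the random wandering happens in the first half of the session, and the later pulls concentrate on whichever machine has looked best so far.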
And this is exactly how we behave throughout our lives, even if we’re not consciously aware of it. Of course an old person isn’t going to try lots of new things with uncertain payoffs: their remaining time is limited. Of course a young person isn’t going to stick with the very first hobby, job, or lifestyle practice they try: they can almost certainly find better rewards by experimenting.
Which gives us our first heuristic for the timing component of collecting optionality:
When you have little to lose and everything to gain, volatility is your friend. Front-load as much randomness into your life as possible, run a lot of experiments, and then slowly temper over time.
Explore-exploit is not a binary. It’s more like a dimmer switch that you adjust during different stages of life, spending more or less time in each mode.
In the formal solutions to the multi-armed bandit, it pays to remain curious in the face of uncertainty. And in real life, old people rarely become completely calcified and closed off to opportunities. They can, and should, remain open to a little exploring, right up until the point when they’re no longer buying green bananas.
Which brings us to the second heuristic: we’re not exploring nearly enough.
You’re out in the countryside, in search of a scenic picnic spot. You stop to rest at the top of a hill, and look around. The surrounding landscape is obscured by mist, and you’re not sure if you’re at the highest point. If you want to explore any further, you’ll have to trudge all the way back down to the bottom, and set off in another direction. So you unpack the picnic basket, and settle in.
Here’s what you can’t see: there’s a much taller peak nearby, which rises above the mist and offers a spectacular view over the entire range. You might have a map that tells you it’s out there, or hear stories from passing hikers about how great the summit is, but life is comfortable here on the small hill.
If you choose to keep exploring, it’s not going to be easy. There’s a huge chasm between the first-order consequences (exertion, effort, possibility of failure) and the second-order consequences (getting to an elevated position).
You’re stuck in a local maximum. Here’s what it looks like:
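The picnic problem can be sketched in a few lines. This is a toy greedy hill-climber on a one-dimensional landscape; the `landscape` heights and the `hill_climb` function are invented for illustration, not taken from any particular source.

```python
def hill_climb(heights, start):
    """Greedy walk: keep stepping to the higher neighbour until none is higher."""
    pos = start
    while True:
        neighbours = [i for i in (pos - 1, pos + 1) if 0 <= i < len(heights)]
        if not neighbours:
            return pos
        best = max(neighbours, key=lambda i: heights[i])
        if heights[best] <= heights[pos]:
            # Every direction leads down (or sideways): settle in for the picnic.
            return pos
        pos = best

# Hypothetical terrain: a small hill (height 5), a valley, then a taller peak (height 9).
landscape = [1, 3, 5, 4, 2, 1, 3, 6, 9, 7]

print(hill_climb(landscape, start=0))  # → 2, the small hill
print(hill_climb(landscape, start=5))  # → 8, the tall peak
```

Starting from the left, the walker stops at height 5 and never finds the peak at height 9, because every route to it begins with a step downhill.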
Now it’s time to put your tinfoil hat on, and lean in while I do my best Alex Jones impression. A range of powerful forces—biological, cultural, economic—are conspiring to keep us trapped on that little hill, and prevent us from moving to higher ground. Those baby-harvesting demon maggots are tryin’ to take our freedom!
The first saboteur is the relentless optimising force of consumer capitalism—and it’s in cahoots with the DNA coiled through every cell in your body.
‘Superstimuli’ are distorted and amplified versions of the sensations we’re biologically hardwired to pursue. An apple has 100 calories and lots of vitamins; a MegaThirstyGulp soda has 1000 calories and no nutritional value whatsoever, but our brains don’t know the difference. Addictive drugs are the most obvious superstimuli; other mutant members of the extended family include fast food, alcohol, prescription meds, porn, reality TV, online gambling, virtual reality, and immersive video games.
Let’s unpack that last one. The rewards offered by a video game are a pale imitation of those available in real life, but you can get them with so much more certainty. Games reliably serve up hits of achievement on cue, with a near-instant feedback loop engineered to be as ‘sticky’ as possible. There’s a clearly defined pathway for completing quests, leveling up into a muscle-bound hero, and impressing scantily-clad elven princesses with your Very Big Sword.
The only problem is that none of your heroic endeavours carry over to real life, in which the only exercise you get is lifting cans of Mountain Dew, difficult quests don’t come with convenient save points, and the scantily-clad princess is a Russian dude named Stanislav.
Here’s the crux: if you want to climb to a higher peak, you have to go down before you can go up:
Hill-climbing is already hard enough. When you’re exploring unknown territory or attempting difficult things, the feedback loop between effort and reward is laggy and unpredictable. It’s doubly hard when you’re simultaneously trying to wean yourself off a reliable source of pleasure, and overcome the constant temptation to return to its mediocre but familiar embrace.
And so, it’s no surprise that the problem of premature exploitation is so pervasive.
The full force of consumer capitalism is aimed at keeping us trapped in ‘sticky’ local maxima, the moat only ever gets wider, our own biology is working against us, and we’re infamously bad at making these kinds of trade-offs: at weighing the second, third and nth-order consequences of our actions.
In some cases, cultural norms help to keep these forces in check. There’s a strong social stigma attached to being a ‘junkie’, for example, and various unpleasant stereotypes, like the basement-dwelling gamer trope I leaned on just before (sorry gamers, no offense intended).
In other cases, the norms are roughly neutral: for example, I’m not sure whether being an Extremely Online ‘activist’ or binge-watching a series on Netflix until the small hours counts as mildly embarrassing or as a badge of honour.
But in a few cases, our norms and cultural beliefs actively push us further towards premature exploitation.
Our Very Restless Bandit
For most of our human and human-ish ancestors, life came at you slow. Nothing changed. Like, ever. You’d almost certainly be born, grow old, and die without experiencing a single cultural or technological advance.
But you and I were born into a very weird little bubble. The same progress that once took two million years now takes all of two days. Entire new industries have come into existence since you visited the high school careers counselor, while others have shriveled and died. Imagine trying to explain Twitch streamers to your great-grandma.
Our increasingly dynamic world throws a spanner in the works of the multi-armed bandit. In that scenario, if you found a gig with a handsome payout, you could sit there pulling the lever over and over, getting fat and happy.
But in real life, the payouts change over time. Maybe the machine you’ve settled in on goes dead on the very next pull. Meanwhile, another machine that you previously ruled out as a dud might light up and start spraying coins everywhere.
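To make that concrete, here is a toy comparison under invented numbers. Machine 0 pays well and then goes dead halfway through; machine 1 starts as a dud and then lights up. One player samples briefly and commits for good; the other keeps a small, constant exploration rate and weights recent results more heavily. Neither is an optimal strategy, and every name and parameter here (`payout_prob`, `committed_player`, `curious_player`, the 0.1 rates) is an assumption for illustration.

```python
import random

def payout_prob(arm, t, switch_at):
    """Hypothetical restless payouts that flip at pull number `switch_at`."""
    before = [0.6, 0.1]   # machine 0 is hot, machine 1 is a dud
    after = [0.0, 0.9]    # machine 0 goes dead, machine 1 lights up
    return (before if t < switch_at else after)[arm]

def committed_player(total_pulls, switch_at, trial_pulls, rng):
    """Alternate machines for a short trial, then pull the winner forever."""
    wins, pulls, total, choice = [0, 0], [0, 0], 0, None
    for t in range(total_pulls):
        if t < trial_pulls:
            arm = t % 2
        else:
            if choice is None:
                choice = max((0, 1), key=lambda a: wins[a] / max(pulls[a], 1))
            arm = choice
        reward = 1 if rng.random() < payout_prob(arm, t, switch_at) else 0
        wins[arm] += reward
        pulls[arm] += 1
        total += reward
    return total

def curious_player(total_pulls, switch_at, epsilon, step, rng):
    """Keep exploring a little, and let old results fade from memory."""
    estimate, total = [0.0, 0.0], 0
    for t in range(total_pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(2)                          # explore
        else:
            arm = 0 if estimate[0] >= estimate[1] else 1    # exploit
        reward = 1 if rng.random() < payout_prob(arm, t, switch_at) else 0
        # Recency-weighted average: recent pulls count for more than old ones.
        estimate[arm] += step * (reward - estimate[arm])
        total += reward
    return total

committed = committed_player(2000, switch_at=1000, trial_pulls=100, rng=random.Random(1))
curious = curious_player(2000, switch_at=1000, epsilon=0.1, step=0.1, rng=random.Random(1))
print("committed:", committed, "curious:", curious)
```

With these made-up numbers, the committed player's winnings dry up the moment its machine dies, while the curious player notices the change and switches. The cost of curiosity is a few wasted pulls while times are good.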
With this in mind, the conventional wisdom around careers or education starts to look pretty dated.
Traditionally, the winning strategy was to work hard and climb through the ranks to specialise in your chosen field. This would either be the family trade, or whatever path your teenage self saw fit to send you down. You draw a monthly salary, pay your dues for several decades, and eventually collect your gold watch and pension.
Working for the sake of earning a paycheck early in one’s career is not quite as short-sighted as playing video games all day, but it’s still an example of premature exploitation.
Instead of repetitively pulling the first lever you find, it might be better to build knowledge, skills, and connections that open up more attractive opportunities—to keep exploring—even if it means earning less money in the short-term. You don’t necessarily want to specialise until you’ve explored quite a bit more of the possibility space.
Hence venture capitalist Marc Andreessen’s first rule of career planning: don’t.
As Andreessen explains, you have no idea what industries you’ll enter, what companies you’ll work for, what roles you’ll have, where you’ll live, what your preferences will be, or what you will ultimately contribute:
The world is an incredibly complex place and everything is changing all the time… trying to plan your career is an exercise in futility that will only serve to frustrate you, and to blind you to the really significant opportunities that life will throw your way.
It’s not as if older generations are deliberately dispensing stupid or malicious advice. It’s just that the rules of the game have changed, and so has the strategy.
Mathematicians were able to solve the multi-armed bandit by coming up with clever algorithms which tell you the optimal strategy for exploring and exploiting. But once you have a restless bandit, the problem becomes intractable: no efficient way of computing the optimal strategy is known.
This is what real life is like. And our bandit is very, very restless.
How to Go About Exploring More?
In a world ruled by uncertainty, all we have are heuristics:
If powerful forces consistently push us toward premature exploitation, we should almost always be biased towards exploring more.
Front-load as much randomness into your life as possible, run a lot of experiments, and then slowly temper over time.
OK. But what should these experiments look like? Size, duration, method? What kind of ‘randomness’? Where to find ideas to test out? How to think about the potential costs and benefits? When to stop tinkering?
Having made many lifestyle experiments great and small, and taken my fair share of lumps, I think I have some decent insights into this. Or at least, not-entirely-stupid ones.
In the second half of this post, I’ll share whatever lessons I've learned in the arena of self-experimentation.