You Get About Five Words

post by Raemon · 2019-03-12T20:30:18.806Z · LW · GW · 80 comments

Crossposted from the EA Forum [EA · GW].

Epistemic Status: all numbers are made up and/or sketchily sourced. Post errs on the side of simplistic poetry – take seriously but not literally.


If you want to coordinate with one person on something nuanced, you can spend as much time as you want talking to them – answering questions in realtime, addressing confusions as you notice them. You can trust them to go off and attempt complex tasks without as much oversight, and you can decide to change your collective plans quickly and nimbly.

You probably speak at around 100 words per minute. That's 6,000 words per hour. If you talk for 3 hours a day, every workday for a year, you can communicate 4.3 million words worth of nuance.
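
(A minimal sketch of that arithmetic, for readers who want to check it; the ~240-workday year is an assumption I'm adding, since the post doesn't specify one.)

```python
# Back-of-envelope check of the one-on-one bandwidth figure.
# All inputs are the post's admittedly sketchy estimates, except
# workdays_per_year, which is assumed here (~48 weeks x 5 days).
words_per_minute = 100
minutes_per_hour = 60
hours_per_day = 3
workdays_per_year = 240  # assumption, not from the post

words_per_year = words_per_minute * minutes_per_hour * hours_per_day * workdays_per_year
print(f"{words_per_year:,} words per year")  # 4,320,000 -- the post's ~4.3 million
```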

You can have a real conversation with up to 4 people.

(Last year the small organization I work at considered hiring a 5th person. It turned out to be very costly and we decided to wait; I think the reasons were related to this phenomenon.)


If you want to coordinate on something nuanced with, say, 10 people, you realistically can ask them to read a few books worth of words. A book is maybe 50,000 words, so you have maybe 200,000 words worth of nuance.

Alternately, you can monologue at people, scaling a conversation past the point where people realistically can ask questions. Either way, you need to hope that your books or your monologues happen to address the particular confusions your 10 teammates have.


If you want to coordinate with 100 people, you can ask them to read a few books, but chances are they won't. They might all read a few books worth of stuff, but they won't all have read the same books. The information that they can be coordinated around is more like "several blogposts." If you're trying to coordinate nerds, maybe those blogposts add up to one book because nerds like to read.


If you want to coordinate 1,000 people... you realistically get one blogpost, or maybe one blogpost worth of jargon that's hopefully self-explanatory enough to be useful.


If you want to coordinate thousands of people...

You have about five words.

This has ramifications for how complicated a coordinated effort you can attempt.

What if you need all that nuance and to coordinate thousands of people? What would it look like if the world was filled with complicated problems that required lots of people to solve?

I guess it'd look like this one.

80 comments

Comments sorted by top scores.

comment by Benquo · 2019-03-13T15:31:46.560Z · LW(p) · GW(p)

You're massively underestimating the upper bound.

I've interacted a bunch recently with members of a group of about 2 million people who recite a 245-word creed twice daily, and assemble weekly to read from an 80,000 word text such that the whole text gets read annually. This is nowhere near a complete accounting of engagement with verbal canon within the group. Each of these practices is preceded and followed by an additional standardized text of substantial length, and many people study full-time a much larger canonical text claiming to interpret the core text.

They also engage in behavior patterns that, while they don't necessarily reflect detailed engagement by each person with the content of the core text, do reflect a lot of fine-grained responsiveness to the larger interpretive canon.

You might be closer for what can be done very quickly (within a single generation) under current conditions. But a political movement that plenty of people are newly worried about, and which likely has thousands of members, has a 14-word creed.

Replies from: Raemon, CharlieHorse
comment by Raemon · 2019-03-13T20:09:14.539Z · LW(p) · GW(p)

Nod. Social pressure and/or organizational effort to read a particular thing together (esp. in public, where everyone can see that everyone else is reading) does seem like a thing that would work.

It comes with drawbacks such as "if it turns out you need to change the 80,000 word text because you picked the wrong text or need to amend it, I expect there to be a lot of political drama surrounding that, and the process by which people build momentum towards changing it would probably be subject to the bandwidth limits I'm pointing to [edit: unless the organization has specifically built in tools to alleviate that]".

(Reminder that I specifically said "all numbers are made up and/or sketchily sourced". I'm pointing to order of magnitude. I did consider naming this blogpost "you have about five words" or "you have less than seven words". I think it was a somewhat ironic failure of mine that I went with "you have four words" since it degrades less gracefully than "you have about five words.")

Replies from: Benquo
comment by Benquo · 2019-03-14T07:33:09.385Z · LW(p) · GW(p)

14 is still half an order of magnitude above 5, and I don't think neo-Nazis are particularly close to the most complex coordination thousands of people can achieve with a standardized set of words.

Replies from: Raemon
comment by Raemon · 2019-03-14T21:02:37.450Z · LW(p) · GW(p)

I suppose, but, again, "all numbers are made up" was the first sentence in this post, and half an order of magnitude feels within bounds of "the general point of the essay holds up."

I also don't currently know of anyone writing on LessWrong or EA forum who should have reason to believe they are as coordinated as the neo-Nazis are here. (See elsethread comment on my take on the state of EA coordination, which was the motivation for this post).

(In Romeo's terms, the neo-nazis are also using a social tech with unfolding complexity, where their actual coordinated action is "recite the pledge every day", which lets them encode additional information. But to get this you need to spend your initial coordinated action on that unfolding action)

comment by CharlieHorse · 2022-02-03T20:52:16.450Z · LW(p) · GW(p)

Are you talking about Judaism? 

comment by drethelin · 2019-03-13T06:47:32.511Z · LW(p) · GW(p)

Walmart coordinates 2.2 million people directly and millions more indirectly.

Even the Boy Scouts coordinate 2.7 million.

Religions coordinate, to a greater or lesser extent, far more.

The key to coordination is to not consider yourself an individual measuring out a ration of words you can force x number of people to read. Most people never read the Bible.

Replies from: ryan_b, Raemon
comment by ryan_b · 2019-03-13T14:21:19.480Z · LW(p) · GW(p)

These are good examples that drive the point home.

Most people never read the Bible.

They don't coordinate based on the nuanced information in it, either. Mostly they coordinate on a few very short statements, like:

Say you are Christian.

Go to church.

A much smaller group of people coordinates on a few more:

Give money to the church.

Run a food drive OR help build houses OR staff a soup kitchen OR ...

The Walmart example seems a little different, because it isn't as though working at Walmart is that different from any other kind of hourly employment. Mostly, all employers try to get people to coordinate on a few crucial things:

Show up on time.

Count the money correctly.

Stock the shelves.

Sweep the floor.

And it seems to me there is never a shortage of preachers or employers complaining about people's inability to do even these basic things.

It looks to me like successful coordination on the scale of millions largely amounts to iterating four-word actions.

Replies from: romeostevensit
comment by romeostevensit · 2019-03-14T02:03:16.761Z · LW(p) · GW(p)

Agree, and I'd roll in the incentives more closely. It feels more like:

you have space for at most a few feedback loops

you can improve this by making one of the feedback loops a checklist that makes calls out to other feedback loops

the tighter and more directly incentivized the feedback loop, the more you can pack in

every employer/organization is trying to hire/recruit people who can hold more feedback loops at once and do some unsupervised load balancing between them

you can make some of people's feedback loops managing another person's feedback loops

Now jump to this post https://slatestarcodex.com/2017/11/09/ars-longa-vita-brevis/

another frame is that instead of thinking about how many bits you can successfully transmit, think about whether the behaviors implied by the bits you transmit can run in loops, whether the loops are supervised or unsupervised and what range of noise they remain stable under.

Replies from: mr-hire
comment by Matt Goldenberg (mr-hire) · 2019-03-14T13:05:10.401Z · LW(p) · GW(p)

I didn't make the leap from bits of information to feedback loops but it makes intuitive sense. Transmitting information that compresses by giving you the tools to figure out the information yourself seems useful.

Replies from: Raemon
comment by Raemon · 2019-03-14T20:44:55.812Z · LW(p) · GW(p)

Heh, "read the sequences" clocks in at 3 words.

Replies from: Gunnar_Zarncke
comment by Gunnar_Zarncke · 2023-05-07T11:50:16.616Z · LW(p) · GW(p)

Doesn't include which sequences.

Replies from: Raemon
comment by Raemon · 2023-05-07T17:42:48.034Z · LW(p) · GW(p)

In the original context this means Eliezer's posts from 2007 to 2009 (which got compiled as The Sequences and later recompiled into the half-as-long Rationality A-Z)

comment by Raemon · 2019-03-13T20:00:38.121Z · LW(p) · GW(p)

The point is not that "rationing out your words" is the correct way to coordinate people. The point is that you need to attend, as part of your coordination strategy, to the fact that most people won't read most of your words. Insofar as your coordination strategy relies on lots of people hearing an idea, the idea needs to degrade gracefully as it loses bandwidth.

Walmart I expect to do most of its coordination via oral tradition. (At the supermarket I worked at, I got one set of cultural onboarding from the store manager, who gave a big speech... which began and ended with a reminder that "the four virtues of the Great Atlantic and Pacific Tea company are integrity, respect, teamwork and responsibility." Then, I learned most of the minutiae of how to run a cash register, do janitorial duties or be a baker via on-the-job training, by someone who spent several weeks telling me what to do and giving me corrective feedback)

(Several years later, I have some leftover kinesthetic knowledge of how to run a cash register, and the dangling words "integrity, respect, teamwork, responsibility" in my head, although also I probably only have that because I thought the virtues were sort of funny and wrote a song about it)

Replies from: catherio
comment by catherio · 2019-05-24T01:11:57.561Z · LW(p) · GW(p)

The recent EA meta fund announcement linked to this post (https://www.centreforeffectivealtruism.org/blog/the-fidelity-model-of-spreading-ideas), which highlights another parallel approach: in addition to picking idea expressions that fail gracefully, prefer transmission methods that preserve nuance.

comment by DanielFilan · 2020-12-21T05:53:51.520Z · LW(p) · GW(p)

I think this post, as promised in the epistemic status, errs on the side of simplistic poetry. I see its core contribution as saying that the more people you want to communicate to, the less you can communicate to them, because the marginal people aren't willing to put in work to understand you, and because it's harder to talk to marginal people who are far away and can't ask clarifying questions or see your facial expressions or hear your tone of voice. The numbers attached (e.g. 'five' and 'thousands of people') seem to not be super precise.

That being said: the numbers are the easiest thing to take away from this post. The title includes the words 'about five' but not the words 'simplistic poetry'. And I'm just not sure about the numbers. The best part of the post is the initial part, which does a calculation and links to a paper to support an order-of-magnitude calculation on how many words you can communicate to people. But as the paragraphs go on, the justifications get less airtight, until it's basically an assertion. I think I understand stylistically why this was done, but at the end of the day that's the trade-off that was made.

So a reader of this post has to ask themselves: Why is the number about five? Is this 'about' meaning that you have a factor of 2 wiggle-room? 10? 100? How do I know that this kicks in once I hit thousands of people, rather than hundreds or millions? If I want to communicate to billions of people, does that go down much? These questions are left unanswered in the post. That would be fine if they were answered somewhere else that was linked, but they aren't. As such, the discerning reader should only believe the conclusion (to the extent they can make it out) if they trust Ray Arnold, the author.

I think plausibly people should trust Ray on this, at least people who know him. But much of the readership of this post doesn't know him and has no reason to trust him on this one.

Overall: this post has a true and important core that can be explained and argued for. But the main surface claim isn't justified in the post or in places the post links to, so I don't think that this was one of the best posts of 2019, either by my standards or by the standards I think the LessWrong community should hold itself to.

Replies from: Raemon
comment by Raemon · 2020-12-21T06:06:25.358Z · LW(p) · GW(p)

The aspiring-rigorous-next-post I hope to write someday is called "The Working Memory Hypothesis", laying out more concretely that at some maximum scale, your coordination-complexity is bottlenecked on a single working-memory-cluster, which (AFAICT based on experience and working memory research) amounts to 3-7 chunks of concepts that people already are familiar with. 

So, I am fairly confident that in the limit it is actually about 5 words +/- 2, because Working Memory Science and some observations about what slogans propagate. (But, am much less sure about how fast the limit approaches and what happens along the way)

Replies from: DanielFilan, DanielFilan, DanielFilan
comment by DanielFilan · 2021-01-11T19:23:28.559Z · LW(p) · GW(p)

Aren't working memory chunks much bigger than one word each, at least potentially?

Replies from: Raemon
comment by Raemon · 2021-01-11T19:39:10.595Z · LW(p) · GW(p)

I think if you end up having a chunk that you use repeatedly and need to communicate about, it ends up turning into a word.

(like, chunks are flexible, but so are words)

Replies from: DanielFilan
comment by DanielFilan · 2021-01-12T19:27:47.943Z · LW(p) · GW(p)

To me, this suggests a major change to the message of the post. Reading it, I'd think that I have five samples from the bank of existing words, but if the constraint is just that I have five concepts that can eventually be turned into words, that's a much looser constraint!

Replies from: Raemon
comment by Raemon · 2021-01-12T19:50:58.639Z · LW(p) · GW(p)

Not 100% sure I understand the point, but for concepts-you-can-communicate, I think you are bottlenecked on already-popular-words.

Chunks and words don't map perfectly. But... word-space is probably mostly a subset of chunk-space?

I think wordless chunks matter for intellectual progress, where an individual thinker might have juuuust reached the point where they've distilled a concept in their head down into a single chunk, so they can then reason about how that fits with other concepts. But, if they want to communicate about that concept, they'll need to somehow turn it into words.

Replies from: DanielFilan
comment by DanielFilan · 2021-01-13T07:23:58.560Z · LW(p) · GW(p)

Is the claim that before I learn some new thing, each of my working memory slots is just a single word that I already know? Because I'm pretty sure that's not true.

Replies from: Raemon
comment by Raemon · 2021-01-13T07:43:01.801Z · LW(p) · GW(p)

First: the epistemic status of this whole convo is "thing Ray is still thinking through and is not very sure about."

Second, for your specific question: No, my claim is that wordspace is a (mostly) subset of chunkspace, not the other way round. My claim is something like "words are chunks that you've given a name", but you can think in chunks that have not been given names.

Third: I'm not taking that claim literally, I'm just sorta trying it out to see if it fits, and where it fails. I'm guessing it'll fail somewhere but I'm not actually sure where yet. If you can point to a concrete way that it fails to make sense that'd be helpful.

But, insofar as I'm running with this idea:

An inventor who is coming up with a new thing might be working entirely with wordless chunks that they invent, combine into bigger ideas, and compress into smaller chunks, without ever verbalizing them or giving them word form.

Replies from: ryan_b, Raemon, DanielFilan, Raemon
comment by ryan_b · 2021-01-21T15:35:13.542Z · LW(p) · GW(p)

might be working entirely with wordless chunks that they invent, combine into bigger ideas, and compress into smaller chunks, without ever verbalizing them or giving them word form.

This part points pretty directly at research debt and inferential distance, where the debt is how many of these chunks need to be named and communicated as chunks, and the distance is how many re-chunking steps need to be done.

comment by Raemon · 2021-01-13T07:58:33.039Z · LW(p) · GW(p)

Thinking a little more: I think when I'm parsing a written sentence, words are closer to a one-word-to-one-chunk correspondence. When I'm thinking, groups of words tend to be more like a chunk. "Politics is the mind killer" might collapse into a single slot that I'm not examining at super-high resolution, allowing me to reason something like "'Politics is the mindkiller' is an incomplete idea."

comment by DanielFilan · 2021-01-13T21:22:50.360Z · LW(p) · GW(p)

If wordspace is a subset of chunkspace and not the other way around, and you have about five chunks, do you agree that you do not have about five words, but rather more?

Replies from: Raemon
comment by Raemon · 2021-01-13T22:21:49.392Z · LW(p) · GW(p)

Yes, although I've heard mixed things about how many chunks you actually have, and that the number might be more like 4.

Also, the ideas often get propagated in conjunction with other ideas. I.e. people don't just say "politics is the mindkiller", they say "politics is the mindkiller, therefore X" (where X is whatever point they're making in the conversation). And that sentence is bottlenecked on total comprehensibility. So, basically the more chunks you're using up with your core idea, the more you're at the mercy of other people truncating it when they need to fit other ideas in. 

I'd argue "politics is the mindkiller" is two chunks initially, because people parse "is" and "the" somewhat intuitively or fill them in. Whereas Avoid Unnecessary Political Arguments is more like 4 chunks. I think you typically need at least 2 chunks to say something meaningful, although maybe not always.

Once something becomes popular it can eventually compress down to 1 chunk. But, also, I think "sentence complexity" is not only bottlenecked on chunks. "Politics is the mindkiller" can be conceptually one chunk, but it still takes a bunch of visual or verbal space up while parsing a sentence that makes it harder to read if it's only one clause in a multi-step argument. I'm not 100% sure if this is secretly still an application of working memory, or if it's a different issue.

comment by Raemon · 2021-01-13T08:00:44.218Z · LW(p) · GW(p)

Continuing to babble down this thought-trail:

I'm wondering how Gendlin Focusing interacts with working memory.

I think the first phase of focusing is pre-chunk, as well as pre-verbal. You're noticing a bunch of stuff going on in your body. It's more of a sensation than a thought.

The process of focusing is trying to get those sensations into a form your brain can actually work with and think about.

I... notice that focusing takes basically all my concentration. I think at some part of the process it's using working memory (and basically all of my working memory). But I'm not sure when that is.

One of the things you do in focusing is try to give your felt-sense a bunch of names and see if they fit, and notice the dissonance. I think when this process starts, the felt-sense is not stored in chunk form. I think as I try to give it different names, it gradually starts to solidify.

Gendlin Focusing might be a process where

a) first I'm trying to feel out a bunch of felt-data that isn't even in chunk form yet

b) I sort of feel it out, while trying different word-combos on it. Meanwhile it's getting more solid in my head. I think it's... slowly transitioning from wordless non-chunks into wordless chunks, and then when I finally find the right name that describes it I'm like "ah, that's it", and then it simultaneously solidifies into one-or-more chunks I can store properly in working memory, and also gets a name. (The name might be multiple words, and depending on context those words could correspond to one chunk or multiple)

Replies from: ryan_b
comment by ryan_b · 2021-01-21T15:59:37.832Z · LW(p) · GW(p)

Not about Gendlin, but following the trail of relating chunks to other things: I wonder if propaganda or cult indoctrination can be described as a malicious chunking process.

I've weighed in against taking the numbers literally elsewhere, but following this thread I suddenly wondered if the work that using few words was doing isn't delivering the chunk, but rather screening out any alternative chunk. If what we are interested in is common knowledge, it isn't getting people to develop a chunk per se that is the challenge; rather everyone has to agree on exactly which chunk everyone else is using. This sounds much more like the work of a filter than a generator.

When I thought about it in those terms, it occurred to me that it is perfectly possible to drive this in any direction at all; we aren't even meaningfully constrained by reality. This feels obvious in retrospect - there've been lots of times when common knowledge was utterly wrong - but doing that on purpose never occurred to me.

So now it feels like what cults do, and why they sound so weird to everyone outside of them, is deliberately create a different sequence of chunks for normal things for the purpose of having different chunks. Once that is done, the availability heuristic will sustain communication on that basis, and the artificially-induced inferential distance will tend to isolate them from anyone outside the group.

comment by DanielFilan · 2021-01-12T19:29:03.728Z · LW(p) · GW(p)

Do working memory chunks come in order? Like, I'd kind of expect that if you have 5 concepts in working memory, you can't additionally remember the order they should go in, because that's another working memory chunk. Or if you can remember the order they should go in, then introspectively I'd imagine they'd become one working memory chunk.

Replies from: Raemon
comment by Raemon · 2021-01-12T19:58:41.401Z · LW(p) · GW(p)

I don't really know, but my guess is that, well, it's a bit messy, and yes if your chunks need to fit in a particular combination that you don't have a good grasp on, that strains your working memory.

But, I don't think there are literal chunks and ordering them literally costs a chunk. Chunks are patterns of thought that can bring associations of other patterns of thought, and those associations can be stronger or weaker. If the associations are sufficiently strong it makes sense to model the chunk-cluster as a single chunk.

(I notice I'm somewhat confused about this, and somewhat going off "there's enough working memory research that I'm fairly confident 'chunks' is a useful abstraction, but I'm not sure why.")

I'm kinda brain-dead right now and can't introspect well enough to figure out how it subjectively feels for me.

I think this post of mine is... probably relevant, although it might require some additional inference to make the relevance obvious:

https://www.lesswrong.com/posts/n7vPLsbTzpk8XXEAS/what-s-your-cognitive-algorithm

comment by DanielFilan · 2020-12-21T06:23:10.021Z · LW(p) · GW(p)

The thing I'm unsure about here is why does that not apply to one-on-one communication? And if one-on-one communication doesn't suffer from this limit, why does it not hold for getting a message to thousands by mathematical induction? Perhaps the problem is that you lose bits in the retelling when people forget things or word things badly - but surely you also pick up bits in more people actually thinking about the message and seeing flaws in it and ways it can be tweaked to be more true?

Replies from: Raemon
comment by Raemon · 2020-12-21T08:37:42.484Z · LW(p) · GW(p)

I think all communication is bottlenecked by the working memory limit, but the limit has different ramifications in different contexts.

I agree with Romeo's take elsethread that part of what's going on here is "how many feedback loops you can have going on at once. Feedback loops can unpack into larger things, but you have to actually do the unpacking."

(I have a bunch more thoughts on this that probably need to be a top-level post)

Perhaps the problem is that you lose bits in the retelling when people forget things or word things badly - but surely you also pick up bits in more people actually thinking about the message and seeing flaws in it and ways it can be tweaked to be more true.

Note that if people are seeing flaws and improving your idea, then they aren't coordinating on a single thing, and if it matters that lots of people are moving in lockstep it can be actively harmful if they're 'improving' your idea.

But, more realistically: most people aren't necessarily improving things, they're adapting them to make them better/more-convenient/more-aligned for them. (Or, just forgetting or misremembering or whatever)

Preserving a complex idea at high fidelity is very hard.

comment by Zvi · 2020-12-03T16:50:49.887Z · LW(p) · GW(p)

I use this concept often, including explicitly thinking about what (about) five words I want to be the takeaway or that would deliver the payload, or that I expect to be the takeaway from something. I also think I've linked to it quite a few times.

I've also used it to remind people that what they are doing won't work because they're trying to communicate too much content through a medium that does not allow it. 

A central problem is how to create building blocks that have a lot more than five words, but where the five words in each block can do a reasonable substitute job when needed.

Replies from: Zvi, mr-hire
comment by Zvi · 2020-12-08T21:18:09.532Z · LW(p) · GW(p)

As an additional data point, a link to this post will appear in the 12/10 Covid weekly roundup.

comment by Matt Goldenberg (mr-hire) · 2020-12-03T17:41:11.067Z · LW(p) · GW(p)

including explicitly thinking about what (about) five words I want to be the takeaway or that would deliver the payload, or that I expect to be the takeaway from something.

This is pretty cool. Can you give some examples of about-five-word takeaways you've created for different contexts?

Replies from: Zvi
comment by Zvi · 2020-12-04T16:16:34.870Z · LW(p) · GW(p)

Here are some attempted takeaways for things I've written, some of which were explicit at the time, some of which were implicit:

Covid-19: "Outside, social distance, wear mask."

Simulacra (for different posts/models): "Truth, lies, signals, strategic moves" or "level manipulates/dominates level below" or "abstractions dominate, then system collapses"

Mazes: "Modern large organizations are toxic" or "middle management destroys your soul"

Asymmetric Justice:  "Unintentional harms count, benefits don't" or "Counting only harms destroys action" or similar. 

Or one can notice that we are abstracting out a conclusion from someone else's thing, or think about what we hope another will take away. Often but not always it's the title. Constantly look to improve. Pain not unit of effort. Interacting with system creates blameworthiness. Default AI destroys all value. Claim bailey, retreat to motte. Society stuck in bad equilibrium. Etc.

comment by Dagon · 2019-03-12T23:58:41.730Z · LW(p) · GW(p)

Hierarchies (which provide information-cheap mechanisms for coordination) and associative processes (which get people with shared information closer, so less information exchange is necessary) both would seem to expand the numbers greatly from those you suggest.

There are examples of fairly complicated cooperation across many millions. For example, all the expectations behind credit card usage take many pages of contracts, which implicitly depend on many volumes of law, which implicitly depend on uncountable bits of history and social norms.

Replies from: Raemon
comment by Raemon · 2019-03-13T00:50:33.186Z · LW(p) · GW(p)

Yes, but it's important to note that if you haven't purposefully built that hierarchy, you can't rely on it existing. (And, it's still a fairly common problem within an org for communication to break down as it scales – I'd argue that most companies don't end up successfully solving this problem)

The motivating example for this post at-the-time-of-writing was that in the EA sphere, there's a nuanced claim made about "EA being talent constrained", which large numbers of people misinterpreted to mean "we need people who are pretty talented" and not "we need highly specific talents, and the reason EA is talent constrained is that the median EA does not have these talents."

There were nuanced blogposts discussing it, but in the EAsphere, the shared information is capped at roughly "1 book worth of content and jargon, which needs to cover a diverse array of concepts, so any given concept won't necessarily have much nuance", and in this case it appeared to hit the literal four word limit.

Replies from: Dagon
comment by Dagon · 2019-03-13T21:05:53.759Z · LW(p) · GW(p)

It might be worth a second post examining the reasons that the standard and well-known coordination mechanisms (force, social pressure, hierarchy, broadcast/mass media, etc.) aren't available for the kind of coordination you think is needed, and what you're considering as replacements (or just accepting that a loosely-committed voluntary group with no direct rewards or sanctions has a cap on effectiveness).

(note: I'm not particularly EA-focused; this is a trap) Or perhaps a description of how "the EA community" can have needs that require such coordination, as opposed to actual projects that clearly need aggregated effort to have impact.

Replies from: Raemon
comment by Raemon · 2019-03-13T21:10:47.585Z · LW(p) · GW(p)

I do think that'd be a valuable post (and that sort of thing is going on on the EA forum right now, with people proposing various ways to solve a particular scaling problem). I don't know that I have particularly good ideas there, although I do have some. The point of this post was just "don't be surprised when your message loses nuance if you haven't made special efforts to prevent it from doing so" (or, if it gets out-competed by a less nuanced message that was designed to be scalable and/or viral).

I wrote this post in part so that I could more easily reference it later, at some point when I had either concrete ideas about what to do, or when I think someone is mistaken in their strategy because they're missing this insight.

Replies from: Dagon
comment by Dagon · 2019-03-13T22:10:52.115Z · LW(p) · GW(p)

Fair enough. Interestingly, if I replace "coordinate with" with "communicate a nuanced belief to", my reaction changes radically, in favor of numbers shaped like yours. I'll have to think more about why those concepts are so different.

Replies from: Raemon
comment by Raemon · 2019-03-13T22:14:40.609Z · LW(p) · GW(p)

Nod. The claim here is specifically about how much nuance can be relevant to your coordination, not how many people you can coordinate with. (If this failed to come across, that also says something about communicating nuance being hard)

Replies from: Dagon
comment by Dagon · 2019-03-14T00:15:44.804Z · LW(p) · GW(p)

I think I was taking "coordination" in the narrow sense of incenting people to do actions toward a relatively straightforward goal that they may or may not share. In that view, nuance is the enemy of coordination, and most of the work is simplifying the instructions so that it's OK that there's not much information transmitted. If the goal is communication, rather than near-term action, you can't avoid the necessity of detail.

Replies from: Raemon
comment by Raemon · 2019-03-14T00:32:52.125Z · LW(p) · GW(p)

The whole point is that coordination looks different at different scales.

So, I think I was looking at this through a nonstandard frame (Maybe more nonstandard than I thought). There are two different sets of numbers in this post:

— 4.3 million words worth of nuance

— 200,000 words of nuance

— 50,000 words

— 1 blogpost (1-2k words)

— 4 words

And separately:

— 1-4 people

— 10 people

— 100 people

— 1000 people

— 10,000 people+

While I'm not very confident about any of the numbers, I am more confident in the first set of numbers than the second set.

If I look out into the world, I see clear failures (and successes) of communication strategies that cluster around different strata of communication bandwidth. And in particular, there is clearly some point at which the bandwidth collapses to 3-6 words.
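
(To make that concrete, here's a minimal sketch pairing the two sets of numbers above; the 1,000-person figure uses the "1 blogpost (1-2k words)" bucket, and the five-word floor is assigned to the 10,000+ bucket. Comparing against a naive 1/N dilution of the one-on-one budget shows the tail collapsing far faster than simple dilution would predict.)

```python
# The rough nuance budgets listed above, paired with audience sizes.
# All figures are this comment's made-up numbers; the 1/N column shows
# what simple dilution of the one-on-one budget would predict instead.
budgets = {1: 4_300_000, 10: 200_000, 100: 50_000, 1_000: 1_500, 10_000: 5}
one_on_one = budgets[1]

for n, words in budgets.items():
    naive = one_on_one // n  # simple 1/N dilution
    print(f"{n:>6} people: ~{words:>9,} words (1/N would predict ~{naive:,})")
```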

comment by Raemon · 2019-03-13T19:50:11.615Z · LW(p) · GW(p)

So, I think I optimized this piece a bit too much as poetry at the expense of clarity. (I was trying to keep it brief overall, and have the sections sort of correspond in length to how much you could expect people to read at that scale.)

Obviously people in the real world do successfully coordinate on things, and this piece doesn't address the various ways you might try to do so. The core claim here is just that if you haven't taken some kind of special effort to ensure your nuanced message will scale, it will probably not scale.

Hierarchies are a way to address the problem. Oral tradition that embeds itself in people's socializing process is a way to address the problem. Smaller groups are a way to address the problem. Social pressure to read a specific thing is a way to address the problem. But each of these addresses it only in particular ways and comes with particular tradeoffs.

comment by Raemon · 2021-01-11T07:03:40.654Z · LW(p) · GW(p)

Partial Self Review:

There's an obvious set of followup work to be done here, which is to ask "Okay, this post was vague poetry meant to roughly illustrate a point. But, how many words do you actually precisely have?" What are the in-depth models that let you predict precisely how much nuance you have to work with?

Less obvious to me is whether this post should become a longer, more rigorous post, or whether it should stay its short, poetic self, and have those questions get explored in a different post with different goals.

Also less obvious to me is how the LessWrong Review should relate to short, poetic posts. I think it's quite important that this post be clearly labeled as poetry, and also, that we consider the work "unfinished" until there is some kind of post that delves more deeply into these questions. But, for example, I think Babble last year was more like poetry than like a clear model, and it was nonetheless valuable and good to have as part of the Best Of book.

So, I'm thinking about this post from two lenses. 

  1. What are simple net-improvements I can make to this post, without sacrificing its overall aim of being short/accessible/poetic?
  2. Sketch out the research/theory agenda I'd want to see for the more detailed version.

I did just look over the post, and notice that the poetry... wasn't especially good. There is mild cleverness in having the sections get shorter as they discuss larger coordination-groups. But I think I probably could write a post that was differently poetic. Or, find ways of making a short/accessible version that doesn't bother being poetic but is nonetheless clear.

I'm worried about every post having to be a rigorous, fully caveated explanation. That might be the right standard for the review, but not obviously.

Some points that should be made somewhere, whether editing the OP or in a followup:

1. Yes, you can invest in processes that help you reach more people, more reliably. But those tools are effortful to build. Those tools are also limited in their capability. You have to figure out how to incentivize people to use the tools.

The point of this post is about the default world where you haven't built those tools. And if you're reading this, you almost certainly haven't. (I think the LessWrong team has made some effort to build such tools, but nonetheless, even here, most people haven't actually read the sequences. Most people forget that "Politics is the Mind Killer" actually is making a nuanced claim about "don't use unnecessarily political examples.")

This post is not about "what is the upper bound". But it is about what constraints you (yes, you, personally) are probably working under.

2. Unless you have built a system that preserves perfect nuance, as you add more people, you will gradually lose nuance, until that nuance approaches some minimum. 

What is that minimum?

Well, if you're measuring in wordcount, the lowest conceivable bound is... 1 word. That probably doesn't make much sense. 1 word memes are pretty rare.

I think "2-7 words, but the context is missing" is a typical unit of meme transfer.

 "2-7 words with missing nuance" seems like the eventual minimum. This is also consistent with the "working memory hypothesis", where this bound is specifically based on "how many existing concepts a human can hold together at once." But, that's a bit more theoretical and I'm not sure I'd endorse it after thinking about it more.

It occurs to me that you can end up with negative wordcount/nuance, once people start actively garbling your message. 

This is actually one of my motivations for the OP, now that I think about it. One way or another, your message will eventually distill down to 2-7 words. If you designed the message to distill gracefully, then you get to pick a message that is at least reasonably aligned with your original intent. If you tried to convey a massive nuanced message, there is a good chance it collapses into something you didn't intend, as people attempt to distill it for themselves. (Or, adversaries deliberately misrepresent it)

Oh, speaking of which:

3. Adversaries

Some people will actively misrepresent your idea.

4. More people == less competent people

One gear not spelled out in the OP is that as you coordinate with more people, you're probably coordinating with less competent people. People who are less smart. People with less background knowledge. If you're running a company, people who aren't as skilled but maybe juuust skilled enough to be worth hiring anyway. If you're running a political or religious movement, you're probably taking on a lot of average joes.

This isn't necessarily true. You can run a company with 100,000 elite skilled craftsfolk (though it's hard to find/hire them all without creating a moral maze), or you could have just started with average joes from the beginning so there isn't actually anywhere less sophisticated to go.

But, this is an additional reason that your nuance will be lost as the coordination effort grows.

5. People just... don't read, not reliably, not most of the time. 

This was meant to be an implied basic background fact, but, some rationalists are surprised by this. People have a ton of things competing for their attention, and they don't actually sit and read most things. People skim. They read headlines.

Yes, this is even true of rationalists. (I've had to tell someone who printed out a lovely Solstice Program that included instructions about Solstice that most people will not read it and they have to include the instructions verbally at the event itself if they want anyone to be able to act on the instructions)

6. Games of Telephone

Most larger-scale coordination efforts require multiple chains of "Alice tells Bob tells Charlie." Alice tells Bob a reasonably faithful but incomplete version of the thing. Bob tells Charlie a slightly more garbled version. Charlie only bothers repeating the headline to Donna, and when Donna asks for clarifications Charlie only has Bob's half-remembered explanation, which he garbles further.

Edward only ever hears the headline, and only ever repeats that. (A toy sketch of this decay appears after this list.)

7. People don't remember that much

People remember stuff when they actually think deeply about it repeatedly (especially if they actually use the concept regularly). 

They also remember stuff when they tell their friends about it. They remember stuff when other people remind them.

The Game of Telephone isn't just relevant for how the message gets distorted. It's also relevant for how the message gets repeated. If Alice read the entire blogpost and tries to repeat it faithfully.... but then later mostly hears Donna and Edward and Francine repeating the headline, Alice might end up forgetting most of the details and eventually not even remembering that Politics is the Mindkiller [LW · GW] was largely about Avoiding Unnecessarily Political Examples [LW · GW].

8. Coordination vs "What concepts most people end up interacting with"

A slightly different take from "how many people you're coordinating with" is "given how memes spread, most people who interact with your concept will be interacting with a dumbed-down version of it, and the more people who know the concept, the more likely it is that most of them know only the minimal 2-7 word version."

I'm not sure whether this has implications or not, beyond "be ready for most people to only understand an oversimplified version of your meme."

9. Human Adversaries and Memetic Adversaries

Sometimes humans actively misrepresent your thing on purpose. Also, sometimes humans accidentally repurpose your thing because they had some more commonly-felt-need, and your square-peg concept was the nearest thing they could shove into a round hole. (See: "Avoiding Jargon Confusion [LW · GW].")

There's at least one concept I haven't written up, because I couldn't think of a title that wouldn't automatically get repurposed into a net-harmful bastardization of itself.

Also, people sometimes want your coordination concept to be something that is more convenient for them. (See: "Talent Gaps", which actually means "EA needs a few highly talented people with very specific qualities", but which people re-imagined to mean "I'm pretty talented! EA needs me!" and were then disappointed by.)

10. I have very little idea how the scaling effect actually kicks in. I'm much more confident that the eventual limit is 2-7 words than I am about the circumstances under which that limit kicks in.
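
(As a toy illustration of the Games of Telephone point above: a minimal simulation of nuance decay through a retelling chain. The starting size, the retention rate, and the five-word floor are all arbitrary assumptions chosen for illustration, not claims.)

```python
# Toy model of nuance decay through a retelling chain (Alice -> Bob -> ...).
# Assumes each retelling preserves a fixed fraction of the remaining detail;
# the 0.3 retention rate and 5,000-word starting message are arbitrary.
message_words = 5000.0   # a few blogposts' worth of nuance
retention_per_hop = 0.3  # fraction of detail surviving each retelling
minimum_meme = 5         # the ~five-word floor this post argues for

hop = 0
while message_words > minimum_meme:
    message_words *= retention_per_hop
    hop += 1
    print(f"hop {hop}: ~{max(message_words, minimum_meme):.0f} words of nuance survive")
# After about six retellings the message has collapsed to the five-word floor.
```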

Future Work

What is the actual upper bound for large groups, if you're trying? How hard is it to try that hard?

If you make people dedicate their lives to memorizing an extremely detailed Tome, you can probably get a pretty large amount of shared context. The problem comes if you ever need to change the shared context. (See: people memorizing the Bible, but 2000 years later it turns out a lot of the Bible is false or philosophically antiquated)

What's the most live nuance that you can have the power to shift over time?

I think this is easiest to get if you have an actual company where people are paid to pay attention. But even then people are pretty busy and lazy. You can start with instilling strong memes about "yes it's actually really god damn important that we all be aligned and read the memo every morning", but I think that requires a pretty significant cultural onboarding.

Scott Alexander's Ars Longa Vita Brevis seems relevant: you can get pretty far if you dedicate a culture to not only figuring out the new concepts to coordinate on, but also learning how to distill new concepts down, address common misconceptions, etc. 

comment by PhilGoetz · 2024-04-21T04:50:34.403Z · LW(p) · GW(p)

Isn't LessWrong a disproof of this?  Aren't we thousands of people?  If you picked two active LWers at random, do you think the average overlap in their reading material would be 5 words?  More like 100,000, I'd think.

comment by Benquo · 2019-03-13T16:46:58.394Z · LW(p) · GW(p)

A productive thing to do here would be to try to reconcile the claim that a large number of people can't reasonably be expected to read more than a few words, and the claim that something like EA or Rationalism is possible at anything like the current scale. These are in obvious tension.

Another claim to reconcile with yours would be a claim that there's anything like law going on, or really anything other than gang warfare.

Replies from: Raemon
comment by Raemon · 2019-03-13T20:33:50.022Z · LW(p) · GW(p)

My claim is "a large number of people can't reasonably be expected to read more than a few words in common", which I think is subtly different (in addition to the thing where this post wasn't about ways to address the problem, it was about the default state of the problem in the absence of an explicit coordination mechanism)

If your book-length treatise reaches 1000 people, probably 10-50 of those people read it and paid careful attention, another 100 read it less carefully, a couple hundred skimmed it, and the rest just absorbed a few key points secondhand.

I think it is in fact a failure of law that the law has grown to the point where a single person can't possibly know it all, and only specialists can know most of it (because this creates an environment where most people don't know what laws they're breaking, which enables certain kinds of abuse).

I think the way EA and LessWrong work is that there's a large body of work people are vaguely expected to read (In the case of LessWrong, I think the core sequences are around [edit: a million words, I initially was using my cached pageCount rather than wordCount] not sure how big the overall EA corpus is). EA and LW are filtered by "nerds who like to read", so you get to be on the higher end of the spectrum of how many people have read how much.

But, it still seems like a few things end up happening:

Important essays definitely lose nuance. "Politics is the Mind Killer" is one of the common examples of something where the original essay got game-of-telephoned pretty hard by oral culture.

Similarly, EA empirically runs into messaging issues where, even though 80k had intentionally tried to downplay the "Earning to Give" recommendation, people still primarily associated 80k with Earning to Give years later. And when they finally successfully switched the message to "EA is talent constrained", that got misconstrued as well.

Empirically, people also successfully rely on a common culture to some degree. My sense is that the people who tend to do serious work and get jobs and stick around are ones who have read at least a good chunk of the words, and they somewhat filter themselves into groups that have read particular subsets. The fact that there are 1000+ people misunderstanding "politics is the mind killer" doesn't mean there aren't also 100-200 people who remember the original claim.

(There are probably different clusters of people who have read different clusters of words, i.e. people who have read the sequences, people who have read Doing Good Better, people who have read a smattering of essays from each as well as the old Givewell blogs, etc)

One problem facing EA is that there is not much coordination on which words are the right ones to read. Doing Good Better was written with a goal of being "the thing you gave people as their cultural onboarding tool", AFAICT. But which 80k essays are you supposed to have read? All of them? I dunno, that's a lot and I certainly haven't, and it's not obvious that that's a better use of my time than reading up on machine learning or the AI Alignment Forum or going off to learn new things that aren't part of the core community material.

Replies from: Vaniver
comment by Vaniver · 2019-03-13T21:46:50.613Z · LW(p) · GW(p)

In the case of LessWrong, I think the core sequences are around 10,000 words, not sure how big the overall EA corpus is.

This feels like a 100x underestimate; The Sequences clock in at over a million words, I believe, and it's not the case that only 1% of the words are core.

Replies from: Raemon
comment by Raemon · 2019-03-13T21:51:55.158Z · LW(p) · GW(p)

Whoops. I was confusing pages with words.

Replies from: Raemon
comment by Raemon · 2019-03-13T22:06:59.620Z · LW(p) · GW(p)

(The mental-action I was performing was "observing what seems to actually happen and then grab the numbers that I remembered coinciding with those actions", rather than working backwards from a model of numbers, which may or may not have been a good procedure, but in any case means that being off by a factor of 100 doesn't influence the surrounding text much)

comment by ryan_b · 2020-12-26T18:44:37.776Z · LW(p) · GW(p)

I think this post is excellent, and judging by the comments I diverge from other readers in what I liked about it.

In the first, I endorse the seriously-but-not-literally standard for posting concepts. The community - rightly in my view - is under continuous pressure to provide high quality posts, but when the standard gets too high we start to lose introduction of ideas and instead they just languish in the drafts folder, sometimes for years. In order to preserve the start of the intellectual pipeline, posts of this level must continue to be produced.

In the second, I have long thought that communication is a neglected topic here, particularly when it comes to groups. This post did an excellent job of pointing out how the problems of nuance and scale are in tension, and this has implications for coordination. Putting more emphasis on communication for coordination is going to be important, in my view, if we want the other ideas developed here to spread beyond this community. Even a more rigorous version of the concept would be a huge contribution.

In the third, a meta observation: I notice there is a pretty large segment of commenters who always interpret any numbers present as the thesis. The original title of this post was "You Have Four Words", and commentary focused enough on whether four words was correct that the title was changed to "You Have About Five Words", which is a strictly worse title in my opinion. Amusingly, and I assume intentionally, the change does illustrate the point.

More interestingly, the title change caused me to reflect on keeping numbers out of concept posts (and basically any other kind of post not focused on quantifying something), and it occurred to me that this is pretty good advice in full generality, rather than just in the case of LessWrong. It avoids all sorts of things: accidentally triggering analysis of the numbers, which are not the point; a particular number or statistic being mistakenly repeated as part of the claim, or worse, replacing an actual core claim in the reader's memory; and it simply and easily preserves the understanding that rigorous quantification has not been done yet and is work that still needs doing.

As a consequence, in my head the conclusion is now Few Words, No Numbers.

comment by Raemon · 2019-03-14T21:53:54.550Z · LW(p) · GW(p)

I think the actual final limit is something like:

Coordinated actions can't take up more bandwidth than someone's working memory (which is something like 7 chunks, and if you're using all 7 chunks then they don't have any spare chunks to handle weird edge cases).

A lot of coordination (and communication) is about reducing the chunk-size of actions. This is why jargon is useful, habits and training are useful (as well as checklists and forms and bureaucracy), since that can condense an otherwise unworkably long instruction into something people can manage.

"Go to the store and get eggs" comes with a bunch of implicit knowledge about cars or bikes or where the store is and what eggs are, etc.

Replies from: Yoav Ravid
comment by Yoav Ravid · 2019-03-15T07:59:05.204Z · LW(p) · GW(p)

What is meant by 7 chunks? Seems like that in itself was condensed jargon that I didn't understand :P

Replies from: Raemon
comment by Raemon · 2019-03-15T18:11:23.026Z · LW(p) · GW(p)

"Something that your mind thinks of as one unit, even if it's in fact a cluster of things."

The "Go to the store" is four words. But "go" actually means "stand up. walk to the door. open the door. Walk to your car. Open your car door. Get inside. Take the key out of your pocket. Put the key in the ignition slot..." etc. (Which are in turn actually broken into smaller steps like "lift your front leg up while adjusting your weight forward")

But, you are capable of taking all of that and chunking it as the concept "go somewhere" (as well as the meta concept of "go to the place whichever way is most convenient, which might be walking or biking or taking a bus"), although if you have to use a form of transport you are less familiar with, remembering how to do it might take up a lot of working memory slots, leaving you liable to forget other parts of your plan.

Replies from: Yoav Ravid
comment by Yoav Ravid · 2019-03-15T19:42:45.729Z · LW(p) · GW(p)

So "7 chunks" was used as almost a synonym for "7 words"? I thought that was some cool concept from neuroscience about working memory :)

Replies from: Raemon
comment by Raemon · 2019-03-15T20:56:11.481Z · LW(p) · GW(p)

I think the near-synonym nature is more about convergent evolution. (i.e. words aim to reflect a concept; working memory is about handling concepts).

https://en.wikipedia.org/wiki/Working_memory

comment by TristanTrim · 2019-08-26T16:01:24.757Z · LW(p) · GW(p)

I like this direction of thought, and I suspect it is true as a general rule, but it ignores the incentives people have to correctly receive the information, and the structure through which the information is disseminated. Both factors (and probably others I haven't thought of) would increase or decrease how much information could be transferred.

Replies from: Mauricio_AG
comment by Mauricio_AG · 2020-12-27T18:22:01.125Z · LW(p) · GW(p)

This is a good point. We can explain why students in medical school carefully digest millions of words by discussing the near-term incentives of final exams and the long-term incentives of increased salary and social status.

comment by ryan_b · 2019-03-13T14:38:20.078Z · LW(p) · GW(p)

This puts me in mind of the mandatory reading of a narrative memo they use at Amazon, which appears to conform to the 'several blog posts' level of coordination. It is hierarchically enforced, and the people who use it are the senior leadership, which has, I assume, a capability distribution heavily weighted towards the top of the scale.

Also relevant is the Things I Learned From Working With a Marketing Advisor [LW · GW] post.

comment by orthonormal · 2021-01-10T05:51:11.810Z · LW(p) · GW(p)

This is a retroactively obvious concept that I'd never seen so clearly stated before, which makes it a fantastic contribution to our repertoire of ideas. I've even used it to sanity-check my statements on social media. Well, I've tried.

Recommended, obviously.

comment by DirectedEvolution (AllAmericanBreakfast) · 2020-12-28T17:02:34.529Z · LW(p) · GW(p)

I see where Raemon is going with this, and for a simplified model, where number of words is the only factor, this is at least plausible. Super-simplified models can be useful not only insofar as they make accurate predictions, but because they suggest what a slightly more complex model might look like.

In this case, what other factors play into the number of people you can coordinate with about X words?

Motivation (payment, commitment to a cause, social ties, status)

Repetition, word choice, presentation

Intelligence of the audience

Concreteness and familiarity of the message

Out of all the insights from the field of public communications, why is “you have about five words” the key insight for us to remember?

Maybe because it’s actionable and because our educational training asks us to write 10-page essays far more often than it asks us for five well-chosen words.

comment by Ben Pace (Benito) · 2021-01-10T23:18:08.717Z · LW(p) · GW(p)

Okay, whenever I read this post, I don't get it.

There's some fermi-estimation happening, but the fermi is obviously wrong. As Benquo points out, certain religions have EVERYONE read their book, memorize it, chant it, discuss it every Sunday (or Saturday).

I feel like the post is saying "there are lots of bandwidth problems. the solution to all of them is '5'." and I don't get why 5.

So I read Ray's comment on Daniel Filan's review, where he says:

...at some maximum scale, your coordination-complexity is bottlenecked on a single working-memory-cluster, which (AFAICT based on experience and working memory research) amounts to 3-7 chunks of concepts that people already are familiar with. 

So, I am fairly confident that in the limit it is actually about 5 words +/- 2, because Working Memory Science and some observations about what slogans propagate.

Now THAT is a great point. If you CANNOT assume shared context beyond this idea, and you want to be able to have common knowledge of the idea whilst continuing to make further points... sounds like you get about 5 words.

That does change my mind significantly about the idea. That said, I would want a basic version of that argument worked into the post. I think it can be done, even if it's not the 'rigorous' version Ray wants.

Before reading that, I was going to downvote the post in the review. Now I'm kinda neutral. If Ray says he'll very likely incorporate it in, should it pass review, then I'm moving toward something like voting with strength 1-3 on it.

P.S. Zvi suggests "You GET about five words" and I also like that. Would encourage Ray to seriously think about the alternative then pick which one seems best to him.

Replies from: Raemon, Benito
comment by Raemon · 2021-01-11T00:21:10.125Z · LW(p) · GW(p)

P.S. Zvi suggests "You GET about five words" and I also like that. Would encourage Ray to seriously think about the alternative then pick which one seems best to him.

I currently am neutral between "have" and "get", but prefer "have" just because changing a post title on a whim makes it harder to find. If most people preferred "get" I'd be happy to change.

Replies from: Benito
comment by Ben Pace (Benito) · 2021-01-11T00:33:53.670Z · LW(p) · GW(p)

If it were easy to make Elicit predictions, I'd post one here for people to give a probability that "You get about five words" is better than "You have about five words". Would appreciate someone doing that.

Replies from: kjz
comment by kjz · 2021-01-12T21:39:14.165Z · LW(p) · GW(p)

I prefer "get". It implies more strongly that if someone actually needs to convince others of their argument, they need to make sure their message is as concise and optimized as possible, before trying to convince anyone. As the original post says:

What if you need all that nuance and to coordinate thousands of people?

You still only get five words.

comment by Ben Pace (Benito) · 2021-01-10T23:19:39.291Z · LW(p) · GW(p)

I'm also interested in someone else (e.g. Kaj, Zvi, Orthonormal, etc.) who managed to get this stuff from the post trying to make me less confused about what people are getting from it.

comment by Kaj_Sotala · 2020-12-02T13:08:57.172Z · LW(p) · GW(p)

I've found this valuable to keep in mind.

comment by Jacob Falkovich (Jacobian) · 2019-03-13T15:58:32.025Z · LW(p) · GW(p)

This immediately got me thinking about politics.

How many voters could tell you what Obama's platform was in 2008? But 70,000,000 of them agreed on "Hope and Change". How many could do the same for Trump? But they agreed on "Make America Great Again". McCain, Romney, and Hillary didn't have a four-words-or-less memorable slogan, and so...

Replies from: Raemon
comment by Raemon · 2019-03-13T19:54:05.627Z · LW(p) · GW(p)

I'm actually two levels of surprised here. I'd have naively expected McCain, Romney and Hillary to have competent enough staffers to make sure they had a slogan, and sort of passively assumed they had one. It'd be surprising if they didn't have one, and if they did have one, surprising that I hadn't heard it. (I hung out in blue tribe spaces so it's not that weird that I'd have failed to hear McCain's or Romney's)

Quick googling says that Hillary's team considered 84 slogans before settling on "Stronger Together", which I don't remember hearing. (I think instead I heard a bunch of anti-Trump slogans like "Love Trumps Hate", which maybe just outcompeted it?)

Replies from: philh
comment by philh · 2019-03-18T07:58:21.794Z · LW(p) · GW(p)

I had been under the impression that Hillary's was "I'm with her"? But I think I mostly heard that in the context of people saying it was a bad slogan.

comment by Yoav Ravid · 2019-03-13T14:37:29.740Z · LW(p) · GW(p)

So, an action coordination website [? · GW] should be able to phrase actions in four words?

This idea seems interesting; I'd love to see it formulated more fully.

Do shorter kickstarter descriptions get funded more?

Do protest events on Facebook which have a shorter description get more attendees?

It probably also depends on personality – if you want to coordinate people who are high in conscientiousness, you may need more words; for low conscientiousness, fewer words. And if you want both, then you need to give a clear 4-word heading and a bunch of nuance below.
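A minimal sketch of how one might start testing the empirical questions above, assuming a hypothetical CSV of campaigns (the file name and column names here are made up for illustration):

```python
import csv
from statistics import mean

# Hypothetical dataset: one row per campaign. Assumed columns:
# "description" (full pitch text) and "funded" ("1" or "0").
with open("kickstarter_campaigns.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

# Word count of each campaign's description.
lengths = [len(row["description"].split()) for row in rows]
funded = [row["funded"] == "1" for row in rows]

# Compare average description length of funded vs. unfunded campaigns.
funded_lengths = [n for n, ok in zip(lengths, funded) if ok]
unfunded_lengths = [n for n, ok in zip(lengths, funded) if not ok]

print(f"funded:   mean {mean(funded_lengths):.0f} words ({len(funded_lengths)} campaigns)")
print(f"unfunded: mean {mean(unfunded_lengths):.0f} words ({len(unfunded_lengths)} campaigns)")
```

Of course, a raw length difference wouldn't separate "shorter pitches persuade better" from "stronger projects happen to need fewer words", so this would only be a first pass, not a real answer.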

Replies from: Raemon
comment by Raemon · 2019-03-13T22:08:38.879Z · LW(p) · GW(p)

I don't think this directly bears on how to build an action coordination website – it's more that, in lieu of such a site, you should expect action coordination to succeed at the 4-word level of complexity. I haven't thought as much about how to account for this when trying hard to build a coordination platform.

But, I do think that kickstarters tend to succeed more if the 4-word version of them is intuitively appealing.

comment by David James (david-james) · 2024-05-23T00:24:46.322Z · LW(p) · GW(p)

Preface: I feel like I'm wearing the clown suit to a black tie event here. I'm new to LW and respect the high standards for discussion. So, I'll treat this as an experiment. I'd rather be wrong, downvoted, and (hopefully) enlightened & persuaded than have this lingering suspicion that the emperor has no clothes.

I should also say that I personally care a lot about the topic of communication and brevity, because I have a tendency to say too much at one time and/or use the wrong medium in doing so. If anyone needs to learn how to be brief, it is me, and I'll write a few hundred words if necessary to persuade you of it.

Ok, that said, here are my top two concerns with the article: (1) This article strikes me as muddled and unclear. (i) I don't understand what "get" five words even means. (ii) I don't understand how coordination relates to the core claims or insight. My confusion leads to my second concern: (2) what can I take from this article?

Let's start with the second part. Is the author saying if I'm a CEO of a company of thousands I only "get" five words?

A quick aside: to me, "get" is an example of muddled language. What does the author mean w.r.t. (a) time period; (b) ... struggling for the right words here ... meaning? As to (a), do I "get" five words per message? Or five words some (unspecified) time frame? As to (b), is "get" a proxy for how many words the recipient/audience will read? But reading isn't enough for coordination, so I expect the author means something more. Does the author mean "read and understand" or "read and internalize" or "read and act on"?

Anyhow, due to the paragraph above, I don't know how to convert "You only get five words" into a prediction. In this sense, to me, the claim isn't even wrong, because I don't know how to put it into practice.

Normally I would stop here, put the article aside, and move on. However, this article is featured here on LW and has many upvotes, which suggests that others get a lot of value out of it. So I'm curious: what am I missing? Is there some connection to EA that makes this particularly salient, perhaps?

I have a guess that fans of the article have some translation layer that I'm missing. Perhaps if I could translate what the author means by get and coordination I would have the ah-ha moment.

To that end, would someone be so kind as to (a) summarize the key point(s) as simply as possible; with (b) clear intended meanings for "coordinate" and "get" (as in you only "get" X words) -- including what timeframe we're talking about -- and (c) the logic and evidence for the claims.

It is also possible that I'm not "calibrated" with the stated Epistemic Status:

all numbers are made up and/or sketchily sourced. Post errs on the side of simplistic poetry – take seriously but not literally.

Ok, but what does this mean for the reader? The standards of rationality still apply, right? There should still be some meaningful, clear, testable takeaway, right?

Replies from: Raemon
comment by Raemon · 2024-05-23T00:35:14.337Z · LW(p) · GW(p)

Thanks for the thoughts (no need to be nervous about arguing against a post – that's kinda the whole point of the site)

For an example of what I mean, here's another post on a pretty similar subject, by someone with experience seeing how it played out at different large companies [LW · GW] (Dan Luu)

One thing it took me quite a while to understand is how few bits of information it's possible to reliably convey to a large number of people. When I was at MS, I remember initially being surprised at how unnuanced their communication was, but it really makes sense in hindsight.

For example, when I joined Azure, I asked people what the biggest risk to Azure was and the dominant answer was that if we had more global outages, major customers would lose trust in us and we'd lose them forever, permanently crippling the business.

Meanwhile, the only message VPs communicated was the need for high velocity. When I asked why there was no communication about the thing considered the highest risk to the business, the answer was if they sent out a mixed message that included reliability, nothing would get done.

The fear was that if they said that they needed to ship fast and improve reliability, reliability would be used as an excuse to not ship quickly and needing to ship quickly would be used as an excuse for poor reliability and they'd achieve none of their goals.

When I first heard this, I thought it was odd, but having since paid attention to what happens when VPs and directors attempt to communicate information downwards, I have to concede that it seems like the MS VPs were right and nuanced communication usually doesn't work at scale.

I've seen quite a few people in upper management attempt to convey a mixed/nuanced message since my time at MS and I have yet to observe a case of this working in a major org at a large company (I have seen this work at a startup, but that's a very different environment).

I've noticed this problem with my blog as well. E.g., I have some posts saying BigCo $ is better than startup $ for p50 and maybe even p90 outcomes and that you should work at startups for reasons other than pay. People often read those posts as "you shouldn't work at startups".

I see this for every post, e.g., when I talked about how latency hadn't improved, one of the most common responses I got was about how I don't understand the good reasons for complexity. I literally said there are good reasons for complexity in the post!

As noted previously, most internet commenters can't follow constructions as simple as an AND, and I don't want to be in the business of trying to convey what I'd like to convey to people who won't bother to understand an AND, since I'd rather convey nuance.

But that's because, if I write a blog post and 5% of HN readers get it and 95% miss the point, I view that as a good outcome, since it was useful for 5% of people. If you want to convey nuanced information to everyone, I think that's impossible, and I don't want to lose the nuance.

If people won't read a simple AND, there's no way to simplify a nuanced position (which will be much more complex) enough that people in general will follow it, so it's a choice between conveying nuance to the people who will read, and avoiding nuance since most people don't read.

But it's different if you run a large org. If you send out a nuanced message and 5% of people get it and 95% of people do contradictory things because they understood different parts of the message, that's a disaster. I see this all the time when VPs try to convey nuance.

BTW, this is why, despite being widely mocked, "move fast & break things" can be a good value. It conveys which side of the trade-off people should choose. A number of companies I know of have put velocity & reliability/safety/etc. into their values and it's failed every time.

MS leadership eventually changed the message from velocity to reliability. First one message, then the next. Not both at once. When I checked a while ago, measured by a 3rd party, Azure reliability was above GCP's and close enough to AWS's that it stopped being an existential threat.

Azure has, of course, also lapped Google on enterprise features & sales and is a solid #2 in cloud despite starting with infrastructure that was a decade behind Google's, technically. I can't say that I enjoyed working for Azure, but I respect the leadership and learned a lot.

One motivating example at the time was seeing how the EA community organizers/leaders had lots of trouble communicating nuanced ideas. For example, "EA is talent constrained" was how people summarized a blogpost that actually meant "EA needs more extremely talented people in particular domains, more than it needs marginal money, right now". But people heard it as "EA needs people who are talented... I'm talented!" and then felt frustrated when they tried to apply for jobs, when, actually, the post was about specific talent gaps.

Replies from: david-james
comment by David James (david-james) · 2024-05-23T01:03:28.880Z · LW(p) · GW(p)

Thanks for your quick answer -- you answered before I was even done revising my question. :) I can personally relate to Dan Luu's examples.

This immediately makes me want to find potential solutions, but I won't jump to any right now.

For now, I'll just mention the ways in which Jacob Collier can explain music harmony at many levels.

comment by Decaeneus · 2023-04-05T15:59:21.822Z · LW(p) · GW(p)

Might LLMs help with this? You could have a 4.3 million word conversation with an LLM (with longer context windows than what's currently available) which could then, in parallel, have similarly long conversations with arbitrarily many members of the organization, adequately addressing specific confusions individually, and perhaps escalating novel confusions to you for clarification. In practice, until the LLMs become entertaining enough, members of the organization may not engage for long enough, but perhaps this lack of seductiveness is temporary.
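To make the fan-out concrete, here is a minimal sketch of the pattern being proposed, with a hypothetical `chat(history)` function standing in for whatever LLM API is available (everything named here is illustrative, not a real library):

```python
from concurrent.futures import ThreadPoolExecutor

def chat(history: list[str]) -> str:
    """Hypothetical stand-in for an LLM API call; stubbed for illustration."""
    return "model reply to: " + history[-1]

# 1. The leader front-loads one long, nuanced conversation with the model.
leader_briefing = ["<many turns of nuanced discussion with the leader>"]

def brief_member(member: str) -> tuple[str, str]:
    # 2. The model carries the leader's context into a fresh conversation
    #    with each member, addressing that member's specific confusions.
    history = leader_briefing + [f"{member} asks: what does this mean for my team?"]
    return member, chat(history)

# 3. The member conversations can run in parallel; novel confusions
#    would be escalated back to the leader (not implemented in this sketch).
members = ["alice", "bob", "carol"]
with ThreadPoolExecutor() as pool:
    for member, reply in pool.map(brief_member, members):
        print(f"{member}: {reply}")
```

The hard part, as the comment notes, is not the plumbing but getting members to engage with the conversation at all.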

comment by Mary Chernyshenko (mary-chernyshenko) · 2021-01-11T08:10:19.979Z · LW(p) · GW(p)

Seems like today the size of the phone screen defines how much text one (an unselected someone) is willing to read. It's still unclear what people do with it later and how much they retain. But reading in itself seems not so tightly limited; five-words-at-most is what I expect from billboards. But I also expect those to be more like road signs/reminders, not original messages (and really I would be surprised if someone treated the words as something beyond advertisement).

Also, repeated exposure is a thing, and it is often available when one coordinates many people. So is the ability of factions to work together even though their "core texts" are very different.