Why Have Sentence Lengths Decreased?
post by Arjun Panickssery (arjun-panickssery) · 2025-04-03T17:50:29.962Z · LW · GW · 74 comments
“In the loveliest town of all, where the houses were white and high and the elm trees were green and higher than the houses, where the front yards were wide and pleasant and the back yards were bushy and worth finding out about, where the streets sloped down to the stream and the stream flowed quietly under the bridge, where the lawns ended in orchards and the orchards ended in fields and the fields ended in pastures and the pastures climbed the hill and disappeared over the top toward the wonderful wide sky, in this loveliest of all towns Stuart stopped to get a drink of sarsaparilla.”
— 107-word sentence from Stuart Little (1945)
Sentence lengths have declined. The average sentence length was 49 words for Chaucer (died 1400), 50 for Spenser (died 1599), 42 for Austen (died 1817), 20 for Dickens (died 1870), 21 for Emerson (died 1882), 14 for D.H. Lawrence (died 1930), and 18 for Steinbeck (died 1968). J.K. Rowling averaged 12 words per sentence (wps) writing the Harry Potter books 25 years ago.
So the decline predates television, the radio, and the telegraph—it’s been going on for centuries. The average sentence length in newspapers fell from 35wps to 20wps between 1700 and 2000. The presidential State of the Union address has gone from 40wps down to under 20wps, and the inaugural addresses had a similar decline. (From Jefferson through T. Roosevelt, the SOTU address was delivered to Congress without any speech, and print was the main way that inaugural addresses were consumed for most of their history.) Warren Buffett’s annual letter to shareholders dropped from 17.4wps to 13.4wps between 1974 and 2013.
SlateStarCodex’s ten recommended blog posts have 22wps. My own top 10 posts have 20wps. Even top medical journals have under 25wps. The FAA, the European Commission, and various legal institutions have style guides that recommend staying under 20wps. Skimming r/writing, it looks like people recommend 10-15wps for fiction (HPMOR has 15wps). It’s possible that sentence lengths will stop declining only when we hit a physical limit on how short sentences can reasonably become. The best-selling hardboiled novella The Postman Always Rings Twice (1934) has 11wps, while I saw one source claiming that Jurassic Park (1990) has only 9wps.
Several explanations present themselves for why sentence lengths have decreased. They aren’t mutually exclusive; it could be that all of them contributed.
- The average reader has gotten dumber and prefers shorter, simpler sentences.
- Longer sentences are more suitable for reading out loud, but shorter sentences are more suitable for reading silently.
- Shorter sentences are just better, i.e. they promote faster reading and better comprehension.
The average reader could have been smarter in the past because literacy used to be more limited.
Full literacy didn’t appear until the turn of the 20th century in England. America had an earlier rise in literacy and the vast majority of free men could read by the 1800s, though like England it took until the 1900s to reach full literacy. It does seem broadly true that sentence lengths are higher in areas with more advanced readers; Stuart Little, the 1945 children’s book quoted at the top, has 13wps, while scientific journals often have 25wps. On the other hand, sentence lengths continued to decline throughout the 1900s, well after we reached full literacy.
Another theory is that journalists inspired a terser style. The newspaper industry grew throughout the 19th century and they saved money when they used fewer words. Many great American writers like Twain, Whitman, Hemingway, and Steinbeck were journalists and influenced by newspaper style. There are whole grammatical structures like the appositive noun phrase (the part set off by commas in “Mr. Smith, a Manhattan accountant, said…”) that are associated with newspapers and clearly have brevity in mind.
Another theory has to do with a transition from reading aloud to reading silently. Reading texts aloud to a group continued as a social practice into the Victorian era, and illiterates would even pay to listen to readings of Dickens. Works written up to this period would have often been written with listeners in mind. An interesting 2008 paper discusses how Dickens in particular uses punctuation and other markers to help orators read his novels. But eventually it became most common to read silently and one consequence was that punctuation became standardized on syntactic (i.e. grammatical) rather than prosodic grounds. I’m not sure if it follows that sentence lengths would also go down. Spoken language is surprisingly complex and actually contains more subordinate clauses than professional/academic writing. For example, I found some transcripts of interviews from Brandon Sanderson—a popular fantasy author whose Stormlight Archive series averages only 9 words per sentence—and measured his extemporaneous speech at ~20 words per sentence (and that includes a bunch of short sentences like “Yeah” or “I don’t know”).
The simplest theory is just that shorter sentences reflect better writing. When you see those ratings of a text’s reading difficulty in terms of a 4th-grade reading level or 10th-grade reading level and so on, those ratings are based on the Flesch-Kincaid readability score, which is just a weighted sum of the text’s words-per-sentence and syllables-per-word measures. A decrease of one grade level in readability thus comes from ~10 additional words per sentence or ~0.11 additional syllables per word. Studies invariably show that sentences with fewer words are easier for readers to understand quickly.
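To make the "weighted sum" concrete, here is a minimal sketch of the two standard Flesch formulas, using the coefficients as they are commonly published. The function names and the sample inputs are illustrative assumptions, not anything taken from a specific tool. Under the grade-level formula, one grade corresponds to roughly 2.6 extra words per sentence or 0.085 extra syllables per word; a figure near 10 words per sentence (or about 0.12 syllables per word) matches a 10-point move on the Reading Ease scale instead.

```python
# Illustrative helpers (names are assumptions); coefficients are the
# commonly published ones for the two Flesch readability formulas.

def flesch_reading_ease(words_per_sentence, syllables_per_word):
    """Higher score = easier text (roughly a 0-100 scale)."""
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def flesch_kincaid_grade(words_per_sentence, syllables_per_word):
    """Approximate U.S. school grade level needed to read the text."""
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# How much each input must change to move each score by one "step".
WORDS_PER_GRADE = 1 / 0.39             # extra words/sentence for +1 grade
SYLLABLES_PER_GRADE = 1 / 11.8         # extra syllables/word for +1 grade
WORDS_PER_10_EASE_POINTS = 10 / 1.015  # extra words/sentence for -10 ease
SYLLABLES_PER_10_EASE_POINTS = 10 / 84.6

if __name__ == "__main__":
    # Made-up sample: 20 words per sentence, 1.5 syllables per word.
    print(flesch_kincaid_grade(20, 1.5))
    print(flesch_reading_ease(20, 1.5))
```

Because both formulas are linear, holding syllables per word fixed makes the score a straight-line function of sentence length, which is why shorter sentences mechanically lower the reported grade level.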
Others have suggested this for a long time; in one of the earliest analyses of sentence length, Lucius Sherman in Analytics of Literature (1893) wrote that the “heaviness” of sentences also decreased over time as sentence lengths decreased, and that Elizabethan writers “are prevailingly either crabbed or heavy … ordinary modern prose, on the other hand, is clear, and almost as effective to the understanding as oral speech.”
Part of this was because older writers affected a Latinate style. The “periodic sentence,” which saves the main clause for the end after multiple dependent clauses are presented first, was common and exemplified in the extreme by writers like Samuel Johnson and Henry James. Consider the Stuart Little quote at the top: the main clause “Stuart stopped to get a drink of sarsaparilla” is preceded by a prepositional phrase “in the loveliest town of all” and four lengthy dependent clauses starting with “where.” This Latinate style included a preference for hypotaxis (connecting clauses with conjunctions or relative pronouns) over parataxis (presenting clauses sequentially without subordination):
Hypotaxis: When the alarm sounded, the firefighters, who had been sleeping, quickly jumped into action.
Parataxis: The alarm sounded. The firefighters had been sleeping. They quickly jumped into action.
It seems like the improved-readability effect provides most of the explanation. As more readers appeared and read more often (and read silently), selective pressure increased for styles that could be read and understood quickly. The telegraph and newspapers encouraged brevity as well. In principle, you could imagine that the Internet would have enabled a wordier style because it removed the financial costs of physically printing more words, but any effect like that hasn’t overcome the other trends.
74 comments
Comments sorted by top scores.
comment by TAG · 2025-04-03T21:23:30.405Z · LW(p) · GW(p)
What is a sentence anyway... is there something special about a period, as opposed to other punctuation marks? Many are available: the colon is a possibility; also its half-brother; and the comma,of course...also the ellipsis -- even the mighty m-dash!
↑ comment by bfinn · 2025-04-04T16:09:22.483Z · LW(p) · GW(p)
Orwell noted that the semicolon is almost redundant. I wonder if sentences that once would have had a semicolon half way through are now split into two sentences.
↑ comment by MondSemmel · 2025-04-05T16:55:40.902Z · LW(p) · GW(p)
This mermaid of the punctuation world—period above, comma below—is viewed with suspicion by many people, including well-known writers. George Orwell deliberately avoided semicolons in his novel Coming Up for Air (London: V. Gollancz, 1939). As he explained to his editor (Roger Senhouse) at the time, “I had decided … that the semicolon is an unnecessary stop and that I would write my next book without one” (quoted in George Orwell: The Collected Essays, Journalism & Letters, ed. Sonia Orwell and Ian Angus, in Vol. 4: In Front of Your Nose, Jaffrey, NH: David R. Godine, 2000). Kurt Vonnegut had this advice for writers: “First rule: Do not use semicolons. They are transvestite hermaphrodites representing absolutely nothing. All they do is show you’ve been to college” (A Man Without a Country, New York: Seven Stories Press, 2005).
[...] British journalist Lynne Truss affirmed that “a full stop ought always to be an alternative” to the semicolon (Eats, Shoots & Leaves, New York: Gotham Books, 2004). The American writer Noah Lukeman views the semicolon as a mark more suitable for creative writing. Otherwise, he argues, “The first thing to realize is that one could always make a case for not using a semicolon. As an unnecessary form of punctuation, as the luxury item in the store, we must ask ourselves: why use it at all?” (A Dash of Style: The Art and Mastery of Punctuation, New York: Norton, 2006).
And this article has an infographic "number of semicolons per 100,000 words" for a bunch of famous authors. And it includes this claim (though note that statistics from tools like Google Books Ngram Viewer can suffer from stuff like OCR idiosyncrasies):
You probably notice the older authors I’ve selected use far more than modern authors. Google Books Ngram Viewer, which includes novels, nonfiction, and even scientific literature, shows that semicolon use has dropped by about 70 percent from 1800 to 2000.
↑ comment by DirectedEvolution (AllAmericanBreakfast) · 2025-04-05T19:36:24.231Z · LW(p) · GW(p)
Semicolons are unnecessary? That doesn’t go far enough. Cormac McCarthy got rid of quotation marks and most commas, and almost exterminated the colon.
↑ comment by MondSemmel · 2025-04-06T10:30:40.413Z · LW(p) · GW(p)
The colon seems optional to me, but quotation marks absolutely aren't, as evidenced by how comparatively unreadable this author's dialogue looks. From his book "The Road":
He screwed down the plastic cap and wiped the bottle off with a rag and hefted it in his hand. Oil for their little slutlamp to light the long gray dusks, the long gray dawns. You can read me a story, the boy said. Cant you, Papa? Yes, he said. I can.
That already looks unnecessarily hard to read even though the dialogue is so short. I guess the author made it work somehow, but this seems like artificially challenging oneself to write a novel without the letter 'E': intriguing, but not beneficial to either reader or prose.
↑ comment by DirectedEvolution (AllAmericanBreakfast) · 2025-04-06T19:35:38.952Z · LW(p) · GW(p)
I respectfully disagree. As with the minor edit on the Boccaccio quote in another of my comments here, eliminating quotes fundamentally changes the way we interpret the scene.
With quotes (and especially with the way dialog is typically paragraphed), human speech is implicitly shown to be so drastically separate from the sensory component of the scene that it requires completely different formatting from the rest of the text.
By eliminating quotes and dialog paragraphing, human speech becomes just another element in the scene being depicted, not separate or any more or less important than the action of screwing down the plastic cap or the functional importance of the oil in the lamp.
The absence of quotes only makes it harder to read if you, the reader, resist this aesthetic and try to force the dialog to be of greater importance than McCarthy is allowing it to be in his novel.
He screwed down the plastic cap and wiped the bottle off with a rag and hefted it in his hand. Oil for their little slutlamp to light the long gray dusks, the long gray dawns.
"You can read me a story," the boy said. "Cant you, Papa?"
"Yes," he said. "I can."
See how the social interaction between Papa and the boy is now positioned as separate from and more important than Papa's work on the lamp?
He screwed down the plastic cap and wiped the bottle off with a rag and hefted it in his hand. Oil for their little slutlamp to light the long gray dusks, the long gray dawns. "You can read me a story," the boy said. "Cant you, Papa?" "Yes," he said. "I can."
Even if you just add quotation marks, the marks call special and separate attention to the dialog, placing it as a separate component of the paragraph.
↑ comment by MondSemmel · 2025-04-06T20:45:13.712Z · LW(p) · GW(p)
I see. I guess I can appreciate that the style is aiming for a particular aesthetic, but for me it's giving up more in clarity than it gains in aesthetic. In a phrasing like "Cant you, Papa? Yes, he said. I can." I have to think about who each part of the dialogue belongs to, and which parts are even dialogue, all due to the missing quotation marks.
This style reads to me like someone removed a bunch of parentheses from a math formula, ones which may not be strictly necessary if one knows about some non-universal order of operations. This may look prettier in some sense, but in exchange it will definitely confuse a fraction of readers. I personally don't think this tradeoff is worth it.
↑ comment by DirectedEvolution (AllAmericanBreakfast) · 2025-04-06T20:52:49.127Z · LW(p) · GW(p)
That's a valid reaction. However, my take is that removal of the quotes is aesthetically useful precisely because it complicates our ability to parse the words as dialog and muddles that sort of naive clarity. Spoken words are sounds, sounds are part of the environment, and it is both a choice and an effort to parse those sounds as dialog.
Most authors opt to do this work for the reader through punctuation, which also enforces interpreting these passages as dialog first and sounds second, if at all. McCarthy makes it easier to interpret spoken words as sounds that are part of the environment. If your aim as a reader is to parse dialog, it will be harder to do this in a McCarthy novel. If your aim is instead to have an aesthetic experience of spoken words as sensation interlaced with other impressions of the environment, then McCarthy's method of punctuation makes this simpler (and even plants the suggestion that this might be something you as a reader might want to do, if you hadn't considered the possibility before).
↑ comment by Roger Scott (roger-scott) · 2025-04-23T03:25:45.060Z · LW(p) · GW(p)
Given that the default, non-quotation text is not, in general, describing sounds in the environment, why do you think a reader would interpret unquoted text as environmental sounds rather than as simply more of the author's description of goings on in the scene? I can see that presenting spoken words in some format that allows or encourages their interpretation as environmental might be artistically useful, I just don't see that removing the quotation marks from otherwise-quoted dialog accomplishes that.
↑ comment by DirectedEvolution (AllAmericanBreakfast) · 2025-04-23T15:04:08.803Z · LW(p) · GW(p)
By environment, I mean the setting of the scene. Spoken words are sounds in the setting, like the sound of the wind, a gunshot, or an animal’s cry. It just happens that a human voice box is what’s making those particular sounds. McCarthy’s central theme across all the novels of his that I’ve read is the inhumanity of the Mexican-American frontier, and treating human speech as just a sound among other sounds is a key part of how he expresses that theme in his writing style.
↑ comment by Roger Scott (roger-scott) · 2025-04-23T16:55:34.890Z · LW(p) · GW(p)
That still leaves the question of how the reader is to distinguish a sound (speech) from a description of sounds.
↑ comment by DirectedEvolution (AllAmericanBreakfast) · 2025-04-24T14:49:24.312Z · LW(p) · GW(p)
Can you give an example?
↑ comment by Mateusz Bagiński (mateusz-baginski) · 2025-04-04T06:04:59.005Z · LW(p) · GW(p)
Since this is about written English text (or maybe more broadly, text in Western languages written in Latin or Cyrillic script), the criterion is: ends with a dot, starts with an uppercase letter.
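As a rough illustration of that criterion, the sketch below treats a sentence as text that starts with an uppercase letter and ends with a period followed by another capitalized word or by the end of the text. The names `split_sentences` and `words_per_sentence` are hypothetical, and the rule is deliberately naive (ASCII capitals only, no handling of abbreviations), so it is not a claim about how any of the cited word counts were produced.

```python
import re

# Naive rule (an assumption for illustration): a sentence starts with an
# uppercase ASCII letter and ends at the first period that is followed by
# whitespace plus an uppercase letter, or by the end of the text.
SENTENCE_RE = re.compile(r"[A-Z].*?\.(?=\s+[A-Z]|\s*$)", re.DOTALL)


def split_sentences(text):
    """Return the list of substrings that count as sentences under the rule."""
    return [match.group(0) for match in SENTENCE_RE.finditer(text)]


def words_per_sentence(text):
    """Average whitespace-separated words per sentence, or None if none found."""
    sentences = split_sentences(text)
    if not sentences:
        return None
    return sum(len(s.split()) for s in sentences) / len(sentences)


if __name__ == "__main__":
    sample = ("The alarm sounded. The firefighters had been sleeping. "
              "They quickly jumped into action.")
    print(split_sentences(sample))
    print(words_per_sentence(sample))
```

Two limitations follow directly from the rule: an abbreviation such as "Mr." before a capitalized name is counted as its own sentence, and text that ends only in question marks, exclamation points, or ellipses followed by lowercase words yields no sentences at all. Any measured sentence length therefore depends heavily on punctuation habits.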
↑ comment by TAG · 2025-04-05T14:19:31.339Z · LW(p) · GW(p)
Then the phenomenon could stem from punctuation habits, as @bfinn says. Did you notice that my original comment doesn't contain a sentence, by your standards?
What is a sentence anyway... is there something special about a period, as opposed to other punctuation marks? Many are available: the colon is a possibility; also its half-brother; and the comma,of course...also the ellipsis -- even the mighty m-dash!
↑ comment by Kenoubi · 2025-04-06T14:48:53.892Z · LW(p) · GW(p)
Question marks and exclamation points are dots with an extra bit. Ellipses may be multiple dots, but also indicate an uncertain end to the sentence. (Formal usage distinguishes "..." for ellipses in arbitrary position and "...." for ellipses coming after a full stop, but the latter is rarely seen in any but academic writing, and I would guess even many academics don't notice the difference these days.)
comment by kilgoar (william-walshe) · 2025-04-06T19:48:49.417Z · LW(p) · GW(p)
Should we extend the scope of the data to include pre-Carolingian texts, it would of course approach infinite sentence length as punctuation had rarely been implemented. Even worse, should we go back into ancient Roman or Greek texts, a naive appraisal might also lead us to believe that syllables per word also approach staggering levels of complexity, since the convention of placing spaces or interpuncts between words was uncommon.
Indeed, spacing between words, capitalization, and punctuation were expressly introduced for readability incidentally, a consequence of the practical ease in the mindless and error-prone process of manual copying of documents, which came before the invention of the printing press.
It's not controversial to say that writing with sentences and spacing between words is easier to understand. But what do we gain by counting the punctuation in Chaucer's romance, a drama by Dickens, and a novel by Rowling, and comparing them? The fact of the matter is that modern editions of Chaucer's works have recognizable punctuation only in translation. You're going to see strange things like interpuncts, which do not clearly correspond to modern punctuation, require interpretation, and aren't going to be consistent between versions. Gothic punctuation is a fun rabbit hole, if you are interested in that.
I am unsure that the data is showing sentences are getting shorter. It is including eras in which the present norms or conventions for punctuation had not yet been established. There is great sense in studying how and why these conventions are developed, but I do not think these statistics are themselves a puzzle which needs solving.
comment by DirectedEvolution (AllAmericanBreakfast) · 2025-04-04T01:38:49.286Z · LW(p) · GW(p)
Interestingly, breaking up long sentences into shorter ones by replacing a transitional word with a period does not quite capture the same nuance as the original. Here's a translation of Boccaccio, and a version where I add a period in the middle.
Wherefore, as it falls to me to lead the way in this your enterprise of storytelling, I intend to begin with one of His wondrous works, that, by hearing thereof, our hopes in Him, in whom is no change, may be established, and His name be by us forever lauded.
Wherefore, as it falls to me to lead the way in this your enterprise of storytelling, I intend to begin with one of His wondrous works. By hearing thereof, our hopes in Him, in whom is no change, may be established, and His name be by us forever lauded.
By replacing ", that," with a period, my revision completely changes our relationship with the narrator. In the original translation, the narrator is both announcing his goal and describing what he plans to do to achieve it.
In the revised version, he's describing his plan of action and a potential effect of that plan. We might assume that he's choosing that plan in order to bring about that effect, but it's no longer explicit in the text. Each sentence stands on its own. It's up to the reader to perceive the narrator's intention.
I wonder if inserting periods systematically tends to disrupt explicit links between intention and action. If so, perhaps the shortening of sentences reflects the anomie of the modern era, the gradual decay of an explicit moral framework in the stories we tell.
comment by kave · 2025-04-05T02:09:28.751Z · LW(p) · GW(p)
I quite like the article The Rise and Fall of the English Sentence, which partially attributes reduced structural complexity to increase in noun compounds (like "state hate crime victim numbers" rather than "the numbers of victims who have experienced crimes that were motivated by hatred directed at their ethnic or racial identity, and who have reported these crimes to the state")
comment by David Gross (David_Gross) · 2025-04-04T00:34:33.207Z · LW(p) · GW(p)
There is a relatively new, practical reason to write short sentences: they are less likely to be mangled by automated translation software. Sentences often become long via multiple clauses. Automated translators can mangle such sentences by (for example) mistakenly applying words to the incorrect clause. If you split such sentences, you make such translations more reliable. Most of our writing now potentially has global reach. So you can be understood by more people if you meet translation software half-way.
↑ comment by Guive (GAA) · 2025-04-05T00:43:50.326Z · LW(p) · GW(p)
I agree this would be a good argument for short sentences in 2019, but does it still apply with modern LLMs?
comment by Judd Rosenblatt (judd) · 2025-04-06T10:28:40.212Z · LW(p) · GW(p)
Reading this post, my immediate hunch is that the decline in sentence lengths has a lot to do with the historical role of Latin grammar and how deeply it influenced educated English writers. Latin inherently facilitates longer, complex sentences due to its use of grammatical inflections, declensions, and verb conjugations, significantly reducing reliance on prepositions and conjunctions. This syntactic flexibility allowed authors to naturally craft extensive yet smooth-flowing sentences. Latin's liberating lack of fixed word order and its fun little rhetorical devices combine to support nuanced, flexible thinking. From my own experience studying Latin 7th-12th grade, I find this sort of stuff contributes significantly to freer, more expansive expression when writing or speaking in English, and I often can immediately tell when speaking with or reading something written by someone else who studied Latin. An easy "tell" is when they say "having done x."
Educated English writers historically learned Latin as a foundational part of their education, internalizing this syntactic complexity. As a result, English prose from authors like Chaucer, Samuel Johnson, and Henry James shows a clear preference for hypotaxis, complex sentences with nested subordinate clauses, rather than simpler paratactic structures consisting of shorter, sequential clauses.
The practical advantage of these complex sentence structures is the precise communication of nuanced and sophisticated ideas. Longer sentences enabled authors to maintain coherent, detailed arguments and descriptions within a single cohesive thought. I see this as reflecting "transcription fluency," where authors aim for fidelity in translating their complex internal thought processes directly into prose, trusting readers’ intelligence and attention span to engage deeply.
Here's a fun example from Thoreau’s "Walden," which makes it clear that such elaborate writing was intended to be understood even by poorer and less formally educated readers. Consider the following (just) two sentences:
“I have no doubt that some of you who read this book are unable to pay for all the dinners which you have actually eaten, or for the coats and shoes which are fast wearing or are already worn out, and have come to this page to spend borrowed or stolen time, robbing your creditors of an hour. It is very evident what mean and sneaking lives many of you live, for my sight has been whetted by experience; always on the limits, trying to get into business and trying to get out of debt, a very ancient slough, called by the Latins æs alienum, another’s brass, for some of their coins were made of brass; still living, and dying, and buried by this other’s brass; always promising to pay, promising to pay, tomorrow, and dying today, insolvent; seeking to curry favor, to get custom, by how many modes, only not state-prison offences; lying, flattering, voting, contracting yourselves into a nutshell of civility or dilating into an atmosphere of thin and vaporous generosity, that you may persuade your neighbor to let you make his shoes, or his hat, or his coat, or his carriage, or import his groceries for him; making yourselves sick, that you may lay up something against a sick day, something to be tucked away in an old chest, or in a stocking behind the plastering, or, more safely, in the brick bank; no matter where, no matter how much or how little.”
↑ comment by mruwnik · 2025-04-10T09:41:47.267Z · LW(p) · GW(p)
Having studied Latin, or other such classical training, seems to be but one method of imbuing oneself with the style of writing longer, more complicated sentences. Personally I acquired the taste for such eccentricities perusing sundry works from earlier times. Romances, novels and other such frivolities from, or set in, the 18th century being the main culprits.
I suppose this sort of proves your point, in that those authors learnt to create complicated sentences from learning Latin, and the later writers copied the style, thinking either that it's fun, correct, or wanting to seem more authentic.
↑ comment by Judd Rosenblatt (judd) · 2025-04-17T16:51:42.143Z · LW(p) · GW(p)
Cool point, yes, seems right!
comment by Wei Dai (Wei_Dai) · 2025-04-04T05:32:07.928Z · LW(p) · GW(p)
In China, there was a parallel but more abrupt change from Classical Chinese writing (very terse and literary) to vernacular writing (similar to spoken language and easier to understand). I attribute this to Classical Chinese being better for signaling intelligence, vernacular Chinese being better for practical communications, higher usefulness/demand for practical communications, and new alternative avenues for intelligence signaling (e.g., math, science). These shifts also seem to be an additional explanation for decreasing sentence lengths in English.
comment by habryka (habryka4) · 2025-04-19T19:27:43.307Z · LW(p) · GW(p)
Promoted to curated: I don't think this post is earth-shattering, but it's good, short, and answers an interesting question, and does so with a reasonable methodology and curiosity. And it's not about AI, for once, which is a nice change of pace from our curation schedule these days.
comment by Kaj_Sotala · 2025-04-03T19:51:02.263Z · LW(p) · GW(p)
At first, I thought this post would be about prison sentences.
I got curious and checked if DeepResearch would have anything to add. It agreed with your post and largely outlined the same categories (plus a few that you didn't cover because you were focused on an earlier time than the screen era): "Cognitive Load & Comprehension, Mass Literacy & Broad Audiences, Journalism & Telegraphic Brevity, Attention Span & Media Competition, Digital Communication & Screen Reading, Educational & Stylistic Norms".
The last one I thought was interesting and not obvious from your post:
- Widespread literacy also had an effect on social norms. It wasn't just that sentences got shorter to accommodate the average reader, but also that it became more socially expected that writers accommodate the reader rather than the reader being expected to live up to the elite demands. This was partially connected to the rise of compulsory schooling. Once you're demanding that everyone learn to read, you kind of have to accommodate the limits of their abilities rather than just telling them "get good or gtfo".
- DR: More people could read, but to reach this broader audience, authors were compelled to write in a plainer style than the ornate constructions of previous centuries. We can view this as a shift in the social contract of writing: instead of readers straining to meet the text, the text was adjusted to meet the readers. Shorter sentences were a key part of that adjustment. [...] By the early 20th century, the norm had shifted – long-winded sentences were increasingly seen as bad style or poor communication, out of step with a society that valued accessibility.
- (This claim seems like it matches common sense, though DR didn't give me a cite for this specific bit so I'm unsure what it's based on.)
- DR also claimed that there was a "Plain Language movement" in the 1960s and 1970s, that among other things pushed for simpler sentences. Its only cite was to a blog article on readability.com, though Wikipedia also talks about it. You mentioned e.g. the Flesch-Kincaid formula in a descriptive sense, but it's also prescriptive: once these kinds of formulas get popularized as respected measures of readability, it stands to reason that their existence would also drive sentence lengths down.
- E.g. Wikipedia mentions that Pennsylvania was the first U.S. state to require that automobile insurance policies be written at no higher than a ninth-grade level (14–15 years of age) of reading difficulty, as measured by the F–K formula. This is now a common requirement in many other states and for other legal documents such as insurance policies.
There were a few other claims that seemed interesting at first but then turned out to be hallucinated. Caveat deep researchor.
↑ comment by Richard_Kennaway · 2025-04-03T21:38:50.979Z · LW(p) · GW(p)
Re plain language movements, in the UK there were Gowers' "Plain Words" books from around that time (link provides links to full texts). I read these a very long time ago, but I don't recall if he spoke of sentence length, being mainly occupied with the choice of words.
comment by Mateusz Bagiński (mateusz-baginski) · 2025-04-03T18:33:52.967Z · LW(p) · GW(p)
Related: https://www.lesswrong.com/posts/Pweg9xpKknkNwN8Fx/have-attention-spans-been-declining
Another related thing is that the grammar of languages appears to be getting simpler with time. Compare the grammar of Latin to that of modern French or Spanish. Or maybe not quite simpler but more structured/regular/principled, as something like the latter has been reproduced experimentally https://royalsocietypublishing.org/doi/10.1098/rspb.2019.1262 (to the extent that this paper's findings generalize to natural language evolution).
↑ comment by simon · 2025-04-03T18:53:43.254Z · LW(p) · GW(p)
FWIW there is a theory that there is a cycle of language change, though it seems maybe there is not a lot of evidence for the isolating -> agglutinating step. IIRC the idea is something like that if you have a "simple" (isolating) language that uses helper words instead of morphology eventually those words can lose their independent meaning and get smushed together with the word they are modifying.
↑ comment by TAG · 2025-04-03T20:55:01.582Z · LW(p) · GW(p)
The idea that grammar is just inflection is misleading: languages that are mostly isolating can have complex ordering rules, like the notorious adjective ordering of English.
As for French... Moi, je ne me défile pas.
1st person. Sing.
1st person. Sing, again.
Negative.
1st person. Sing, reflexive.
Verb!!!
Negative, again.
Replies from: mateusz-baginski, raphael-roche↑ comment by Mateusz Bagiński (mateusz-baginski) · 2025-04-04T06:02:38.803Z · LW(p) · GW(p)
Fair enough. Modify my claim to "languages tend to move from fusional to analytic (or something like that) as their number of users expands".
↑ comment by Raphael Roche (raphael-roche) · 2025-04-20T21:59:17.687Z · LW(p) · GW(p)
Redundancy makes sure the information passes through. In French, the word 'aujourd'hui' ('today') etymologically means 'au jour de ce jour' ('on the day of this day'), but it is not uncommon to say 'au jour d'aujourd'hui' which would literally mean 'on the day of on the day of this day'. It is also common to say 'moi, je' ('me, I') and increasingly people even say 'moi, personnellement, je' ('me, personally, I'). This represents a kind of emphasis but also a kind of fashion, similar to what happens in the fashion industry, or a kind of drift, similar to what happens in the evolution of species.
comment by Mitchell_Porter · 2025-04-04T03:15:58.548Z · LW(p) · GW(p)
Humans didn't always speak in 50-word sentences. If you want to figure out how we came to be trending away from that, you should try to figure out how, when, and why that became normal in the first place.
comment by cdt (nc) · 2025-04-03T20:52:23.582Z · LW(p) · GW(p)
This may be because editing has become easier and faster to iterate.
It's comparatively easy to identify sentences that are too long. Is it easy to identify sentences that are too short? You can always add an additional sentence, but finding examples where sentences themselves should be longer is much harder. With more editing cycles, this leads to shorter and shorter sentences.
comment by DirectedEvolution (AllAmericanBreakfast) · 2025-04-04T00:49:26.182Z · LW(p) · GW(p)
Many short sentences can add up to a very long text. The cost of paper, ink, typesetting and distribution would incentivize using fewer letters, but not shorter sentences.
comment by Elizabeth (pktechgirl) · 2025-04-20T19:37:29.825Z · LW(p) · GW(p)
I write shorter sentences thanks to the editing work of LW editor @JustisMills [LW · GW] and the book Several Short Sentences About Writing.
comment by JenniferRM · 2025-04-06T17:21:09.421Z · LW(p) · GW(p)
I tend to follow the linguist, McWhorter, on historical trends in languages over time, in believing (controversially!) that undisrupted languages become weirder over time, and only gain learnability through pragmatic pressures, as in trading, slavery, conquest, etc., which can increase the number of a language's second language learners (who edit for ease of learning as they learn).
A huge number of phonemes? Probably it's some language in the mountains with little tourism, trade, or conquest for the last 8,000 years. Every verb conjugates irregularly? Likely to be found in the middle of a desert. And so on.
The normal, undisrupted, pattern is for every generation to make mistakes and play around, decorating the language with entropic silliness, and accidentally causing future children to "only really learn to speak fully properly" at older and older and older ages... until around 11 or 12 or 13 or 14 puberty strikes, and kids stop diligently learning any random bullshit the older people say based on trust. English competency arrives around age 8 because English is a toy language created by waves and waves and waves of trade, conquest, and cultural admixture. We have a lot of room to get much weirder and stay within traditional human bounds.
((That is, we have a lot of room for English, left alone, to mutate, IF this broader theory is correct. It might not be.
A way to test the larger theory would be to anthropologically construct a way of predicting from first principles when puberty tends to start in human subpopulations (because we have strong suggestions that diet and social patterns can change it), then reconstruct the predicted value of puberty onset over historical timescales, then correlate that to modern relatively easily measured "age until language mastery" for many modern languages.
That would confirm most of the theory. The other thing you'd need to track is the percentage of speakers who are speaking after learning any given language as a second language. High rates of this should simplify a tongue and cut against the other process that adds complexity by default.))
To show how weird English is: English is the only proto indo european language that doesn't think the moon is female ("la luna") and spoons are male (“der Löffel”). I mean... maybe not those genders specifically in every language. But some gender in each language.
I just looked up Gujarati, which is also descended from Proto-Indo-European and moon (chandri ("ચંદ્રા")) is feminine and ladle (chamcho ("ચમચો")) is masculine... but teaspoon (chamchi ("ચમચી")) is feminine(!)... so... yeah... that one retained gender and also has gender/semantic conflation! :-)
Except in English. The moon is a rock in English, not a girl. And a spoon is a tool, not a boy. Because English is a weird rare toy language (practically a creole, implying that it was a pidgin for many), that doesn't force people to memorize reams of playful historical bullshit, in order to "sound like they speak it properly" :-)
"English" traces all the way back to a language (with gendered declined nouns and verb conjugation) spoken by Eurasian Charioteers in 7000BC or whatever and at each step most of the changes were all just "part of the stream of invective".
...
Regarding word count specifically...
Something you find over and over and over again in language is agglutinating grammar where entire sentences are just. One. Word. But not like that... rather: Asinglebigwordcanbeusedtocommunicate oneideafromamongavastarray.
These languages are also often irregular! (6) Like the language was already agglutinative 1000 years ago, (9) and then people spent the next ten centuries making it more pronounceable, and punny, and fun??? (16)
Taakkua uqausiit taimaigajunngimmijuttauq! (3) Suurlu uqausiq katinngaliriilaurmat 1000 arraaguit aniguqsimaliqtut, (6) ammalu inuit atulaurmata qulinik arraagunik tukiqarniqsauliqtittinasuaqtutik, ammalu quvianarniqsauliqtuni??? (8)
> These words are not normal either! (6) Like language was already coherent 1000 years ago, (8) and people spent the last decade trying to make it more sensible, and cool??? (14)
The above paragraph round trips through "Google's understanding of Inuktut", which (I think?) is a simplified language arising from systematizing and averaging out dialects starting from relatively normally complex languages like Inuktitut... and basically all of those polar languages are agglutinative, and have been at least for centuries.
I brought that one paragraph back to English to suggest roughly how much was lost by Google's translation.
The parenthetic numbers show "words per clause" through the process:
6-->3-->6! 9-->6-->8, 16-->8-->14???
So here's my (half silly) proposal: maybe English experienced catastrophic simplifications between ~600AD and ~1500AD and then became preternaturally frozen once it was captured in text by the rise of printing, literacy, industrialization, and so on. The starting point itself was relatively unnatural, I think.
So then, in recent history, maybe what we're seeing is just a looooong and slooooow motion trend (that'll take a millennium or three to complete at this rate (unless we abandon literacy or something, and free the language from the strictures of printing and mass education?)) where English is still slowly trying to become an agglutinative language with irregular morphology?
Like (here's the deep crazy idea:) like maybe that's what every language wants to ultimately be after >200 generations of accumulated youthful ignorance, cryptogenic wordplay, lazy mouths, and no writing?
For example: I just made up the word "cryptogenic" to be "having a genesis in a desire to be hard to understand" (which I considered myself to have a right to do, since English has a productive morphology) but when I looked it up, other skilled speakers have deployed it in other ways... Oxford thinks it means "(of a disease) of obscure or uncertain origin" and most of the usages are for "diseases not yet subjectively diagnosed by the doctor during the course of treatment (rather than diseases whose etiology is a known mystery to standard medical science)". It gets used like "Knowing the cause of a cryptogenic stroke can help prevent recurrent stroke" (source is the metadata summary of this webpage).
Whereas I'm claiming that many words are cryptogenic in the sense that they started out, like "skibidi", within youth culture because kids liked that grownups didn't know what it means. If "skibidi" catches on, and gains an intergenerationally re-usable meaning (maybe related to being scared in a fun way? or yet-another-adjective like hep? or whatever?) then it will have been partly possible because kids liked having their own words that "parents just don't understand".
This is hard for English, because it is written. And because many second language speakers learn English every year.
But one thing that English can do (despite enormous pressures to be learnable and written in a stable way) is boil itself down to stock phrases for entire sentences. Later, these stock phrases could eventually agglutinate into single words, maybe, or at least they might if global civilization and travel and communication collapses in a way that leaves literally any humans alive, but trapped in tiny local regions with low literacy for many generations... which is a very specific and unlikely possible future. (Prolly we either get wildly richer and become transhuman or else just all end up dead to predatory posthumans.)
Replies from: mateusz-baginski, raphael-roche↑ comment by Mateusz Bagiński (mateusz-baginski) · 2025-04-06T17:59:54.404Z · LW(p) · GW(p)
> To show how weird English is: English is the only proto indo european language that doesn't think the moon is female ("la luna") and spoons are male (“der Löffel”). I mean... maybe not those genders specifically in every language. But some gender in each language.
Persian is ungendered too. They don't even have gendered pronouns.
https://en.wikipedia.org/wiki/Persian_grammar
Replies from: JenniferRM↑ comment by JenniferRM · 2025-04-06T18:20:49.503Z · LW(p) · GW(p)
Thank you for the correction! I didn't realize Persian descended from PIE too. Looking at the likely root cause of my ignorance, I learned that Kurdish and Pashto are also PIE descended. Pashto appears to have noun gender, but I'm getting hints that at least one dialect of Kurdish also might not?!
If Sorani doesn't have gendered nouns then I'm going to predict (1) maybe Kurdish is really old and weird and interesting (like branching off way way long ago with more time to drift) and/or (2) there was some big trade/empire/mixing simplification that happened "more recently" with divergence later?
If neither of those are true, then my larger heuristic about "why English is weird" might have a deep abstract counter example, and deserve lower credence.
Persian is a language of empire and social mixing, so its "similar simplification" doesn't actually function as a strong counter-example to the broader thesis, but it is still great to be surprised :-)
↑ comment by Raphael Roche (raphael-roche) · 2025-04-20T22:30:44.878Z · LW(p) · GW(p)
This is interesting. I think English concentrates its weirdness in pronunciation, which is very irregular. Although adult native speakers don't realize it, this presents a serious learning difficulty for non-native speakers and young English-speaking children. Studies show that English-speaking students need more years of learning to master their language (at least for reading) than French students do, who themselves need more years than young Italian, Spanish or Finnish students (Stanislas Dehaene, Reading in the brain).
Replies from: JenniferRM, roger-scott↑ comment by JenniferRM · 2025-04-21T03:43:01.007Z · LW(p) · GW(p)
I think most of that is actually a weirdness in our orthography. To linguists, languages are, fundamentally, a thing that happens in the mouth and not on the page. In the mouth, the hardest thing is basically rhoticism... the "tongue curling back" thing often rendered with "r". The Irish, Scottish, and American accents retain this weirdness, but a classic Boston, NYC, or southern British accent tends to drop it.
The Oxford English Dictionary gives two IPA transcriptions for "four": the American /fɔr/ makes sense to me and has an "r" in it, but the British /fɔː/ has just totally given up on curling the tongue or trying to pretend in the dictionary that this is happening in human mouths.
That tongue curl is quite hard. Quite a few five year olds in rural Idaho (and maybe regions where rhotic dialects are maintained) often struggle with it, and are corrected by teachers and parents (and maybe made fun of by peers) for not speaking properly... for spontaneously adopting "a New York Accent" due to a very common childhood "speech impediment". Many ESL speakers drop it, hence the city dialects dropping it, not just in practice in the mouth, but officially.
("J" is a runner up for weirdness in the mouth, but I think that's just because the voiced postalveolar affricate /dʒ/ is a pretty rare phoneme.)
English orthography is kind of a disaster, I agree. It attempts to shoehorn a German/Celtic/French/Norse pidgin-or-creole into the Latin letter system, and ... yeah. Tough task. It was never going to be clean.
If I was going to offer a defense of the status quo here, I'd say that there is no flat/simple orthography to switch to.
Every accent would need its own separate "spelling reform" and their texts would be less mutually intelligible, and it would hurt science and the letters quite a lot, and also probably lead to faster drift into a world where "English" denotes a language family rather than a language.
Interestingly, Interslavic is an attempt to "design by hand" a similar thing for Slavic speakers to what English still has basically for free: common words with stable spellings and meanings, and huge tolerance for how they are pronounced. Once you see the overarching vision for "a written language system" with these properties as a desirable end point... since English is already at that desirable end point, why change it? <3
Replies from: raphael-roche↑ comment by Raphael Roche (raphael-roche) · 2025-04-21T05:42:32.288Z · LW(p) · GW(p)
You're right. I said "pronunciation," but the problem is more exactly about the translation between graphemes and phonemes.
↑ comment by Roger Scott (roger-scott) · 2025-04-23T16:40:17.920Z · LW(p) · GW(p)
I don't think children have any more difficulty learning to speak English than other languages. The difficulty comes in learning to spell in writing and, to a lesser extent, learning to pronounce written words when reading. Btw, there's actually much more regularity in English spelling/pronunciation than may appear, and than is routinely taught. Much of the "weirdness" is the result of historical processes which are fairly regular in themselves, once you know the rules.
comment by jmh · 2025-04-04T14:01:43.806Z · LW(p) · GW(p)
Literacy seems to make sense to me but I might be missing something in the post. Writing is language and language is communication so at least two sides.
As more people learned to read, they also learned to write, and written communication increased. However, even with modest literacy one can read a long sentence. Or can do that when it is written by a good/skilled writer. But being able to read does not really lead to writing skills in most cases I suspect.
As more people started communicating via writing (think things like schools and education expansion) the skill level of the average writer likely declined. That probably led to training the next generation of writers to write in a simpler sentence structure.
comment by leogao · 2025-04-03T22:22:56.331Z · LW(p) · GW(p)
shorter sentences are better because they communicate more clearly. i used to speak in much longer and more abstract sentences, which made it harder to understand me. i think using shorter and clearer sentences has been obviously net positive for me. it even makes my thinking clearer, because you need to really deeply understand something to explain it simply.
Replies from: arjun-panickssery↑ comment by Arjun Panickssery (arjun-panickssery) · 2025-04-03T22:27:25.758Z · LW(p) · GW(p)
Shorter sentences are better. Why? Because they communicate clearly. I used to speak in long sentences. And they were abstract. Thus I was hard to understand. Now I use short sentences. Clear sentences.
It's been net-positive. It even makes my thinking clearer. Why? Because you need to deeply understand something to explain it simply.
Replies from: leogao↑ comment by leogao · 2025-04-03T22:43:48.769Z · LW(p) · GW(p)
goodhart
Replies from: TsviBT↑ comment by TsviBT · 2025-04-04T05:52:18.095Z · LW(p) · GW(p)
Yes, and this also applies to your version! For difficult or subtle thoughts, short sentences have to come strictly after the long sentences. If you're having enough such thoughts, it doesn't make sense to restrict long sentences out of communication channels; how else are you supposed to have the thoughts?
comment by Dmitry Vaintrob (dmitry-vaintrob) · 2025-04-06T17:01:07.780Z · LW(p) · GW(p)
Thanks for this post. I would argue that part of an explanation here could also be economic: modernity brings specialization and a move from the artisan economy of objects as uncommon, expensive, multipurpose, and with a narrow user base (illuminated manuscripts, decorative furniture) to a more utilitarian and targeted economy. Early artisans need to compete for a small number of rich clients by being the most impressive, artistic, etc., whereas more modern suppliers follow more traditional laws of supply and demand and track more costs (cost-effectiveness, readability and reader's time vs. beauty and remarkableness). And consumers similarly can decouple their needs: art as separate from furniture and architecture, poetry and drama as separate from information and literature. I think another aspect of this shift, that I'm sad we've lost, is the old multipurpose scientific/philosophical treatises with illustrations or poems (my favorite being de Rerum Natura, though you could argue that Nietzsche and Wagner tried to revive this with their attempts at Gesamtkunstwerke).
comment by Violet Hour · 2025-04-06T12:28:56.229Z · LW(p) · GW(p)
One small, anecdotal piece of support for your 'improved-readability' hypothesis: ime, contemporary French tends to use longer sentences than English, where I think (native Francophones feel free to correct me) there's much less cultural emphasis on writing 'accessibly'.
E.g., I'd say the (state-backed) style guidelines of Académie Française seem motivated by an ideal that's much closer to "beautiful writing" than "accessible writing". And a couple minutes Googling led me to footnote 5 of this paper, which implies that the concept of "reader-centred logic" is particular to Anglophone speakers. So if your hypothesis is right, I'd expect a weaker but analogous trend (suggestive evidence) showing a decline in French sentence length.[1]
[1] I have some (completely unbiased) quibbles with the idea that "short sentences reflect better writing", or the claim that short sentences are strictly "more readable" (e.g., I find the 'hypotaxic' excerpt much more pleasant to read than the 'parataxic'). But the substantive point about accessibility seems right to me.
↑ comment by Raphael Roche (raphael-roche) · 2025-04-20T23:24:32.798Z · LW(p) · GW(p)
You're right. The idea behind Académie française style guidelines is that language is not only about factual communication, but also an art, literature. Efficiency is one thing, aesthetics another. For instance, poetry conveys meaning or at least feeling, but in a strange way compared to prose. Poetry would not be very effective to describe an experimental protocol in physics, but it is usually more beautiful to read than the methodology section of a scientific publication. I also enjoy the 'hypotaxic' excerpt above much more than the 'parataxic' one. Rich sentences are not bad per se, they need more effort and commitment to read, but sometimes, if well written, give a greater reward, because complexity can hold more subtlety, more information. Short sentences are not systematically superior in all contexts; they can look as flat as a 2D picture compared to a 3D picture.
comment by eggsyntax · 2025-04-04T23:39:37.711Z · LW(p) · GW(p)
The average reader has gotten dumber and prefers shorter, simpler sentences.
I suspect that the average reader is now getting smarter, because there are increasingly ways to get the same information that require less literacy: videos, text-to-speech, Alexa and Siri, ten thousand news channels on youtube. You still need some literacy to find those resources, but it's fine if you find reading difficult and unpleasant, because you only need to exercise it briefly. And less is needed every year.
I also expect that the average reader of books is getting much smarter, because these days adults reading books are nearly always doing so because they like it.
It'll be fascinating to see whether sentence length, especially in books, starts to grow again over the coming years.
Replies from: Kenoubi↑ comment by Kenoubi · 2025-04-06T14:53:25.455Z · LW(p) · GW(p)
I agree that the average reader is probably smarter in a general sense, but they also have FAR more things competing for their attention. Thus the amount of intelligence available for reading and understanding any given sentence, specifically, may be lower in the modern environment.
Replies from: eggsyntax↑ comment by eggsyntax · 2025-04-07T16:11:33.067Z · LW(p) · GW(p)
Interesting point. I'm not sure increased reader intelligence and greater competition for attention are fully countervailing forces -- it seems true in some contexts (scrolling social media), but in others (in particular books) I expect that readers are still devoting substantial chunks of attention to reading.
Replies from: Kenoubi↑ comment by Kenoubi · 2025-04-08T12:50:58.916Z · LW(p) · GW(p)
That's possible, but what does the population distribution of [how much of their time people spend reading books] look like? I bet it hasn't changed nearly as much as overall reading minutes per capita has (even decline in book-reading seems possible, though of course greater leisure and wealth, larger quantity of cheaply and conveniently available books, etc. cut strongly the other way), and I bet the huge pile of written language over here has large effects on the much smaller (but older) pile of written language over there.
(How hard to understand was that sentence? Since that's what this article is about, anyway, and I'm genuinely curious. I could easily have rewritten it into multiple sentences, but that didn't appear to me to improve its comprehensibility.)
Edited to add: on review of the thread, you seem to have already made the same point about book-reading commanding attention because book-readers choose to read books, in fact to take it as ground truth. I'm not so confident in that (I'm not saying it's false, I really don't know), but the version of my argument that makes sense under that hypothesis would crux on books being an insufficiently distinct use of language to not be strongly influenced, either through [author preference and familiarity] or through [author's guesses or beliefs about [reader preference and familiarity]], by other uses of language.
Replies from: eggsyntax↑ comment by eggsyntax · 2025-04-08T20:45:34.106Z · LW(p) · GW(p)
> even decline in book-reading seems possible, though of course greater leisure and wealth, larger quantity of cheaply and conveniently available books, etc. cut strongly the other way
My focus on books is mainly from seeing statistics about the decline in book-reading over the years, at least in the US. Pulling up some statistics (without much double-checking) I see:
(from here.)
For 2023 the number of Americans who didn't read a book within the past year seems to be up to 46%, although the source is different and the numbers may not be directly comparable:
(chart based on data from here.)
That suggests to me that selection effects on who reads have gotten much stronger over the years.
> How hard to understand was that sentence?
I do think it would have been better split into multiple sentences.
> the version of my argument that makes sense under that hypothesis would crux on books being an insufficiently distinct use of language to not be strongly influenced...by other uses of language.
That could be; I haven't seen statistics on reading in other media. My intuition is that many people find reading aversive and avoid it to the extent they can, and I think it's gotten much more avoidable over the past decade.
comment by ProgramCrafter (programcrafter) · 2025-04-03T19:49:17.604Z · LW(p) · GW(p)
I suggest additional explanation.
The bigger the audience is, the more people there are who won't know a specific idea/concept/word (xkcd's comic #1053 "Ten Thousand" captures this quite succinctly), so you'll simply have to shorten.
I took the logarithm of sentence length and fitted it linearly against the logarithm of world population (that shouldn't really be precise since authors presumably mostly cared about their society, but that would be more time-expensive to check).
Relevant lines of Python REPL
>>> import math
>>> wps = [49, 50, 42, 20, 21, 14, 18, 12]
>>> pop = [600e6, 700e6, 1e9, 1.4e9, 1.5e9, 2.3e9, 3.5e9, 6e9]
>>> [math.log(w) for w in wps]
[3.8918202981106265, 3.912023005428146, 3.7376696182833684, 2.995732273553991, 3.044522437723423, 2.6390573296152584, 2.8903717578961645, 2.4849066497880004]
>>> [math.log(p) for p in pop]
[20.21244021318042, 20.36659089300768, 20.72326583694641, 21.059738073567623, 21.128730945054574, 21.556174959881517, 21.97602880544178, 22.515025306174465]
>>> 22.51-20.21
2.3000000000000007
>>> 3.89-2.48
1.4100000000000001
>>> 2.3/1.41
1.6312056737588652
>>> [round(math.exp(26.41 - math.log(w)*1.63)/1e9, 3) for w,p in zip(wps,pop)] # predicted population, billion
[0.518, 0.502, 0.667, 2.234, 2.063, 3.995, 2.652, 5.136]
>>> [round(math.exp(26.41 - math.log(w)*1.63)/1e9 - p/1e9, 3) for w,p in zip(wps,pop)] # prediction off by, billion
[-0.082, -0.198, -0.333, 0.834, 0.563, 1.695, -0.848, -0.864]
↑ comment by Kaj_Sotala · 2025-04-03T21:54:55.575Z · LW(p) · GW(p)
Wouldn't people not knowing specific words or ideas be equally compatible with "you can't refer to the concept with a single word so you have to explain it, leading to longer sentences"?
comment by ryan_b · 2025-04-22T14:21:32.210Z · LW(p) · GW(p)
- How do we count specialized language? By this I mean stuff like technical or scientific specialties, which are chock-full of jargon. The more specialized they are, the less they share with related topics. I would expect we do a lot more jargon generating now than before, and jargon words are mostly stand-ins for entire paragraphs (or longer) of explanation.
- Related to jargon: academic publishing styles. Among other things, academic writing style is notorious for being difficult for outsiders to penetrate, and making no accommodation for the reader at all (even the intended audience). I have the sense that papers in research journals have almost evolved in the opposite direction, although I note my perception is based on examples of older papers with an excellent reputation, which is a strong survivorship bias. Yet those papers were usually the papers that launched new fields of inquiry; it seems to me they require stylistic differences like explaining intuitions because the information is not there otherwise.
- Unrelated to the first two, it feels like we should circle back to the relationship between speaking and writing. How have sentences and wordcount fared when spoken? We have much less data for this because it requires recording devices, but I seem to recall this being important to settling the question of whether the Iliad could be a written-down version of oral tradition. The trick there was they recorded some bards in Macedonia in the early 20th century performing their stories, transcribed the recordings, and then found them to be of comparable length to Homer. Therefore, oral tradition was ruled in.
comment by ethanelasky (ethanelasky-1) · 2025-04-20T09:21:03.170Z · LW(p) · GW(p)
With regards to the State of the Union address, one contributing factor might be the method of delivery. The State of the Union is now intended for a television or radio audience, whose spoken format favors shorter, more simple sentence structures when compared with print.
comment by worse (Phib) · 2025-04-19T23:11:07.960Z · LW(p) · GW(p)
In school and out of it, I’d been told repeatedly my sentences were run-on, which, probably fair enough. I do think varying sentence length is nice, and trying to give your reader easier to consume media is nice. But sometimes you just wanna go on a big long ramble about an idea with all sorts of corollaries which seem like they should be a part of the main sentence, and it’s hard to know if they really should be a part of their own sentence. Probably, but maybe I defer too much to everyone who ever told me, this is a run-on.
Replies from: roger-scott↑ comment by Roger Scott (roger-scott) · 2025-04-23T16:51:09.152Z · LW(p) · GW(p)
Run-on refers primarily to topic drift, not word count. You can have a fairly short run-on sentence where the end of the sentence is talking about something different than the beginning. Frequent use of "and" between clauses is a clue.
comment by ChristianKl · 2025-04-05T15:41:35.220Z · LW(p) · GW(p)
It's worth noting that we observe other forms of simplification of language as well. English reduced the number of inflections of verbs. The distinction between singular and plural pronouns disappeared.
comment by JJ Lawrence (jj-lawrence) · 2025-04-23T09:54:17.663Z · LW(p) · GW(p)
Reading this article made me immediately think about Russian literature, particularly from the 19th century. Mostly because of my background and how I am still working on adapting to English in my speaking and writing. Russian authors of that time are famous for their elaborate, intricate, and syntactically rich sentences. This isn't unique to Tolstoy, Dostoevsky, and Turgenev; it extends even to Bulgakov later on. The Russian literary tradition is about constructing entire emotional and philosophical worlds within a single sentence where thoughts cascade into one another with modifiers piling up like layers of nested parentheses. Punctuation in Russian has its special role and many school children will describe feeling utterly tormented trying to use it correctly. However, beyond its rather complicated rules compared to English, punctuation in Russian functions more as emotional pacing than as strict grammatical marking.
This style encourages contemplation rather than efficiency. There is an obvious contrast with contemporary English, and American English in particular. In the modern technological age, journalism styles and contemporary design principles inspire authors to craft sentences that quickly capture attention and facilitate scanning rather than deep immersion.
Long sentences are often dismissed in contemporary writing because readers accustomed to quick, concise information can lose patience or attention, leading authors to favor brevity for maintaining engagement.
comment by drgunn · 2025-04-22T19:50:12.662Z · LW(p) · GW(p)
A useful direction for further research would be to analyze fiction and non-fiction as separate corpora. Longer sentences may reflect a tendency towards hypotaxis, and they may also be a deliberate stylistic choice for fiction writers, perhaps setting a more languid and relaxed tone not appropriate for much non-fiction.
comment by muggleschoolbus · 2025-04-22T02:31:31.625Z · LW(p) · GW(p)
I think the point about the Internet enabling a "wordier style" due to lower printing costs actually gets things backwards.
What actually matters is the competition for attention. Consider that as the barrier to entry to publishing has dropped, the number of suppliers has exploded, while the number of consumers has grown much more slowly. That means there's far more supply competing for limited attention. This creates enormous selection pressure for ideas to be consumable, instantly engaging, and spreadable.
I think eloquence and beautiful language rightfully thrived in the past when scarcity gave room for richer forms, but now lose out to clarity and punch.
You can observe potentially the same trend with other media in history: theatre -> film -> TikTok.
comment by Scott Burson (slburson) · 2025-04-20T00:10:11.636Z · LW(p) · GW(p)
Having had my writing criticized for decades on account of an allegedly overly hypotactic style, with parentheticals and qualifiers inserted freely, sometimes to multiple levels, I think it's safe to say I am attempting to reverse this trend single-handedly.
comment by Purplehermann · 2025-04-07T08:49:22.190Z · LW(p) · GW(p)
Pros: less to hold in your head at once, letting you focus on the content rather than keeping the words straight. (The longer the worse, and using different languages also makes this harder)
Cons: writers have less stylistic space in fewer words.
Sentences should be shorter rather than longer, except where there is good reason (keeping out the less intelligent or for stylistic reasons).
comment by Alex K. Chen (parrot) (alex-k-chen) · 2025-04-04T20:48:40.231Z · LW(p) · GW(p)
Related https://www.econlib.org/archives/2008/10/where_is_the_po.html?utm_source=chatgpt.com
comment by A_Raybould · 2025-04-20T19:13:12.710Z · LW(p) · GW(p)
This is an interesting question and you have made many pertinent points, but it remains unclear to me why a move from listening to silent reading creates selective pressure for styles that can be received and understood quickly. If that is an advantage in silent reading, why less so for the same words spoken? After all, listening seems to be burdened with a few additional barriers to comprehension, such as in disambiguating homophones and the inability to skip backwards and re-hear what was just said.
The preference for brevity in telegraphy and newspapers does not strike me as evidence for the above proposition (and might be regarded as examples of the phenomenon to be explained, rather than part of its explanation.) In particular, telegraphy is actually aurally-received communication! In the case of newspapers, an alternative hypothesis lies in there having been clear pressure to compress the message into few column-inches.
You have presented evidence that writers today tend to use longer sentences while speaking than when writing, which (if it holds generally) is consistent with the view that brevity is more valuable in silent reading, but it does not, by itself, establish that as a fact, and it is also consistent with alternative hypotheses, such as speech being produced in real time, without much time for optimization.
One could say much the same about the Flesch-Kincaid readability scores, unless there is evidence that this holds less strongly (if at all) for the spoken word. (The observation of writers being more loquacious when speaking is not sufficient to establish that: we would need evidence that long spoken sentences are easier to understand than the same thought spoken as one or more short sentences, and then we would want to understand why this is not the case when reading.)
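For reference, here is a minimal sketch of how a Flesch-Kincaid grade level is computed, which makes clear that the score is driven by words per sentence and syllables per word and says nothing about whether the text is read or heard. The formula coefficients are the standard published ones; the sentence splitter and the `count_syllables` heuristic are rough assumptions of mine, not part of the formula, and real tools use more careful tokenization.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not part of the formula):
    count vowel groups and drop a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    # Naive sentence split on ., !, ? (an assumption; abbreviations will fool it).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )
```

Because the first term is just 0.39 times the average sentence length, splitting one long sentence into two short ones lowers the score mechanically, whether or not comprehension actually improves.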