The possible shared Craft of deliberate Lexicogenesis
post by TsviBT · 2023-05-20T05:56:41.829Z · LW · GW · 5 comments
[Note: crossposted from https://tsvibt.blogspot.com/2023/05/the-possible-shared-craft-of-deliberate.html.]
Words are good. Making more good words is good. Being better and faster at making more good words would be more good. Maybe we can get better and faster at making more good words by working together.
Prefatory notes
Disclaimers
Wer fremde Sprachen nicht kennt, weiß nichts von seiner eigenen.
(Whoever doesn't know foreign languages, knows nothing of his own.)
——Johann Wolfgang von Goethe[1]
Since I only speak English, my perspective is English-centric and more generally Indo-European-centric, and this essay will fail to integrate huge regions of the possibilities of language. Since I'm not a linguist, there will be errors and incompletenesses in this essay. Since I work on AGI alignment, recent examples of language creation will be drawn from people working on alignment.
This essay is speculative, and emphasizes a vision that's exciting to me.
Acknowledgements
Thanks to Rafe Kennedy and to TJ for useful conversations about lexicogenesis. Thanks to Sam Eisenstat for spiritually related conversations. Thanks to Daniel Filan for comments on a draft.
Random access essay
Sections and some subsections of this essay can be read out of order without losing much. It's a long essay, so I'd encourage looking around for something interesting.
Synopsis
Lexicogenesis is the creation of new words. People do lexicogenesis when they have to talk about something new. When people have to think difficult new thoughts, they need new language. By working together, people could help each other make new language, and could develop a craft of lexicogenesis that people could use to come up with suitable new language. If you have ideas that might need new words to carry them, or if you want to help people come up with words, or if you want to make a shared craft of lexicogenesis, maybe say so in the comments or join this Zulip group.
Condensation
Extended table of contents:
- Rhapsody of words: Humans carry the world——outer and inner, object and thought——with them, in their Words.
- What is lexicogenesis?
- Creating words: Lexicogenesis is the creation of words.
- Language production in general: Lexicogenesis stands in for all forms of creating new language.
- Language makers: In all new periods and realms, people have created new language.
- What this essay is not about
- A sense that more is possible: Don't you remember, in the prehistory of your waking soul, when every name was new?
- Survivorship bias for expressibility: Ideas without words are lost, so it seems as though all useful ideas already have words.
- The feral botanist: To see what is even happening at all in the blooming, buzzing confusion, demands words.
- Unspeakable mind: Minds are especially big and murky and we don't have good words for minds and it would be nice if we did.
- Spark of thought: When thought is sparked, it wants to burn too fast and widely for the mind to keep up. Urgent fragile thinking needs scaffolding.
- Sparks: A spark of thought may come where words fail, when reality is glimpsed or grasped despite the clumsy language.
- Burning: Burning thought gives opportunities to forge ideas.
- Lost upsparks: Precious embers float away from a burning thought.
- Aside on withdrawal and the leap: Some realms of thinking withdraw, and can only be reached with a running leap.
- Scaffolding thought: Fuller language and adepter lexicogenesis might better ignite, feed, channel, and preserve the spark at a heartier burn.
- Kindling: Words push up to the edge of what's speakable.
- Firebreak: Words wrangle thought.
- Catching upsparks: Words put thoughts into time, letting them unfold across episodes of thinking.
- Palinsynopsis: A greater ability to make new words opens a greater ability to put new thoughts into words.
- More reasons for lexicogenesis
- Overview of reasons: Lexicogenesis is shown to enhance thinking by its history, by its living role in thinking, and by its possibilities.
- Theoretical reasons: Words bring reality into light.
- Examples
- Desiderata for words: A new word should, in its sound and structure, well-serve a needed role in a communal context of thinking.
- Seeds of the craft
- Try and say :)
- General motions
- Rooting in the criterion: A new word is needed because there's a new proposition to be spoken.
- Preexisting words: Is there already a word for the idea?
- Boiling it down: Can the idea be rendered in a short phrase?
- Word formation: The ways that words are formed can be used as processes to generate new words.
- Ask a ninja language model
- Semantic development: Can the word's relation to the idea be patterned off known ways that words relate to ideas?
- The shared craft: People could work together to refine and share methods that fluently create good new words.
- Deliberate lexicogenesis: People have consciously tried to create good new words, showing a want of a craft of lexicogenesis.
- Shared lexicogenesis: People working together might accumulate shareable skills for making words.
- Seeds of the shared craft
- Applying existing understanding: A lot of scientific work bears on lexicogenesis.
- Shared space: To grow a craft, have a shared (cyber)space for that craft.
- Expressivizing the morphemicon: A deeper store of meaningful elements combines to make a greater range of possible words.
- Building resources
- Objections (or, pitfalls)
- References
Rhapsody of words
Humans carry the world——outer and inner, object and thought——with them, in their Words.
Humans encounter novelty. A strange beast, a tasty plant, a glowing destroyer and warmth giver, an alien tribe; a glinting ore, an adamantine symmetry in a diagram; a stone that moves another stone without touching it; rage, terror, and ecstasy; perspectival vision frozen and flattened onto a canvas——an infinite self-transforming kaleidoscope.
Humans encounter novelty. Not just a mute, undirtied sightseeing, but interest (inter-esse, being-amongst)——a muddy, fighting, duck-your-head-to-climb-inside encounter.
When humans meet, ponder, taste, reckon, carry, resist, or play with a Thing, they do something no other animal does: they speak it. In speaking the Thing, a human takes the Thing with zer, even if the material object is left on the ground. The human sings about the thing around the fire. It lingers with zer; ze paints it on the rock face, bringing the Thing back more fully into zer mind's eye:
(Guthrie, page 4.[2])
Even with the Thing not there, the humans accumulate thoughts, ideas, intentions, and information about the Thing. Humans gather thought around Names. And humans think with Words, which don't have to stand for Things, but rather can in general gather and deploy thought into any shape.
The humans carve words in stone, bone, and clay, originarily inventing solidified speech in many times and places:
Uruk proto-cuneiform, Iraq, c. 3050 BCE
Sumerian cuneiform, c. 2600 BCE
Egyptian hieroglyphs, c. 2300 BCE
Indus Valley script, c. 2600–2000 BCE
Oracle bone inscriptions, China, c. 1050 BCE
Olmec Cascajal block inscription, c. 900 BCE
Maya script, Dresden Codex, c. 1100 CE
Even with the speaker not there, and the thing long gone, the word can be heard.
How do humans speak thought? How do humans put the world into words?
How did you get up there?
And yet you know tens of thousands of words and combine them to speak suitably in a vast range of possible situations.
Like a scaffold for builders, like a bush's branches holding up a delicate spider's web, like a crystal growing by knitting together molecules pulled from the froth, language borders the delicate frothy edge of thinking.
In our speech and thought, what desire paths want to form? If every thinker had a thousand lifetimes to craft thought, what words and word-making craft would be created?
What is lexicogenesis?
Creating words
Lexicogenesis is the creation of words.
A thousand years ago, the words we speak (in English) were either nonexistent (such as "laser"), waiting latent in the possibility-space implied by the material available (such as "electron"), or scattered across many lands in proto-form; ten thousand years ago, probably almost all the words we speak today were nowhere to be found on the face of the Earth; and a million years ago, there were fairly likely no words at all. These words came from us, somehow. As language is a human universal, the creation of words is a human universal. Lexicogenesis is found in every child.[3] It is found in such abundance that children can create whole new languages when growing in an environment lacking stable language (creoles emerging from eclectic, unstandardized pidgins[4]) or even almost entirely lacking accessible language (the creole-like Nicaraguan Sign Language that emerged almost de novo in the 1980s among children).
At some moment, a language can leave open a role in speaking and thinking that ought to be played by some word, but that no word is currently playing. How does a language come to have a word to play an unfilled role? The process can be called lexicogenesis. "Lexicogenesis" emphasizes word creation as a deliberate activity, done by speakers who have a language and need new words for that language.
Here is a definitely totally complete list of ways that words and roots are created:
- Compounding. (Wiki)
- lampshade ⟵ lamp + shade
- waterfall ⟵ water + fall
- Compounding (including bound roots). (Wiki)
- biology ⟵ bio- + -ology
- telepathy ⟵ tele- + -pathy
- Derivation. (Wiki)
- uncover ⟵ un- + cover
- backward ⟵ back + -ward
- truthful ⟵ truth + -ful
- gespielt ⟵ spielen + ge-><-t (German circumfix)
- vincō ⟵ victus (Latin nasal infix, inherited from PIE)
- salaam (peace), islam (submission) ⟵ s-l-m + [Arabic transfix patterns]
- ímport (noun) ⟵ impórt (verb) + [stress suprafix]
- the Whíte House ⟵ the white hóuse + [stress suprafix]
- Tmesis. (Wiki)
- absofreakinglutely ⟵ absolutely + freaking
- Apophony. (Apophony, ablaut)
- sing, sang, sung
- tooth, teeth
- think, thought
- Inflection. (Wiki)
- ladders ⟵ ladder + [plural]
- covered ⟵ cover + [past tense]
- David's ⟵ David + [possession]
- Backformation. (Wiki)
- burgle ⟵ burglar
- babysit ⟵ babysitter
- taxon ⟵ taxonomy ⟵ τάξις (táxis) + νόμος (nómos)
- Liberated affix. (Rebracketing, libfix, novel root extraction)
- {workaholic, chocoholic, ...} ⟵ -holic ⟵ alcoholic ⟵ اَلْكُحْل (al-kuḥl, Arabic)
- {morpheme, meme, sememe, ...} ⟵ -eme ⟵ phoneme ⟵ φώνημα ⟵ φωνέω + -μᾰ
- {cheeseburger, mushroomburger, nothingburger, ...} ⟵ burger ⟵ hamburger ⟵ Hamburger (native of Hamburg)
- {Waluigi, wawaluigi [LW(p) · GW(p)], WaDan} ⟵ wa- ("evil, inverted") ⟵ Wario ⟵ Mario + warui (Japanese 悪い, "bad")
- Productivization. (Wiki)
- {e-commerce, e-book, ...} ⟵ e- ⟵ e-mail ⟵ electronic mail
- {mindspace [? · GW], worldspace [? · GW], policyspace [? · GW], featurespace [? · GW], ...} ⟵ -space ⟵ thingspace [LW · GW] ⟵ space
- Blending (portmanteau). (Wiki)
- motel ⟵ motor + hotel
- smog ⟵ smoke + fog
- Clipping, truncation. (Wiki)
- fax ⟵ facsimile
- bot ⟵ robot
- Acronym. (Wiki)
- MIDI ⟵ Musical Instrument Digital Interface
- DNA ⟵ Deoxyribo-Nucleic Acid
- Borrowing (importing). (Wiki)
- behemoth (English, Latin) ⟵ בהמות (Hebrew)
- coach ⟵ kocsi (Hungarian)
- bamboo ⟵ bamboe (Dutch) ⟵ bambu (Portuguese) ⟵ bambu (Malay) ⟵ ಬಂಬು (Kannada)
- Revival (self-borrowing).
- Loan translation (calque). (Wiki)
- loanword ⟵ Lehnwort (German lehnen (to lend) + Wort (word))
- by heart ⟵ par cœur (Middle French)
- Bag End (in the Shire) ⟵ cul-de-sac (French)
- Reduplication (reshmuplication). (Wiki)
- chit-chat ⟵ chat
- fancy-shmancy ⟵ fancy
- like-like ⟵ like
- Onomatopoeia (motivated root-creation). (Wiki)
- meow ⟵ [the sound a cat makes]
- ding-dong ⟵ [the sound a bell makes]
- Ex nihilo root-creation. (Link)
- grok ⟵ ø
- Kodak ⟵ ø
- googol ⟵ ø
- Eponymization. (Wiki)
- quixotic ⟵ Don Quixote
- diesel ⟵ Rudolf Diesel
- boycott ⟵ Charles Boycott
- silhouette ⟵ Etienne de Silhouette
- Semantic development (change, progression). (Wiki)
- mouse (for a computer) ⟵ mouse (rodent)
- boojum (a pattern of topological defects in superfluid ³He-A) ⟵ boojum (a type of Snark)[6]
- awful (very bad) ⟵ awful (full of awe)
- liquor (alcoholic drink) ⟵ liquor (any liquid)
- grasp (understand) ⟵ grasp
- broadcast (mass transmission of signals) ⟵ broadcast (seeds in a field)
- press (reporters) ⟵ press (printing device)
- hands (people on a ship) ⟵ hands (appendage)
- corn (maize) ⟵ corn (grain)
- blade (a whole sword) ⟵ blade (cutting edge)
- Changes to the sound of a word. (Sound change, rebracketing) [This is somewhat of a token category, since sound changes alone don't create a new word function.]
- (an) apron ⟵ (a) napron (Middle English)
- speedometer (with the "-o-" added for ease of pronunciation, by analogy with words like "odometer") ⟵ speed + meter
- third ⟵ thridda (Old English)
- cinco "seenko" (Spanish) ⟵ quīnque "kweenkweh" (Classical Latin)
- England ⟵ Englaland (Old English)
- Boundary loss.
- today ⟵ to-day, todæg ⟵ tō dæġ (Old English "to day")
- gonna ⟵ going to
- alone ⟵ allone (Middle English) ⟵ all oon (Middle English, "all one")
- Grammaticalization. (Wiki)
- Spoken writing. (Link, Link)
- quote-unquote ⟵ ""
- dot dot dot ⟵ ...
- slash ⟵ /
- confetti ("congratulations") ⟵ 🎉
- "weak-star topology" ⟵ weak-* topology
- re (quotative) ⟵ Re: (from thread replies)[8]
Spoken writing goes far enough that Serge Lang had the chutzpah to title a textbook just SL₂(R):
- Morpheme upgrading (bound to free).
- omics ⟵ -omics
- ology ⟵ -ology
- ism ⟵ -ism
- ish (free adverb, as in A: "You know him?" B: "Ish.") ⟵ -ish (phrasal suffix, as in "three o’clockish") ⟵ -ish (suffix, as in "sheepish", "bluish", "elevenish")
- emic, etic ⟵ phonemic, phonetic
- Conversion (zero-derivation). (Wiki)
- a bite (noun) ⟵ to bite (verb)
- to clean (verb) ⟵ clean (adjective)
- up (verb) ⟵ up (adverb)
- ifs and buts (nouns) ⟵ if, but (conjunctions)
- where (noun, as in "the what and the where") ⟵ where (conjunction, question word, relative adverb)
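Several of the processes listed above are mechanical enough to sketch as string operations. The following is a minimal Python sketch, not a model of real morphology: the function names and the cut-point parameters are invented here for illustration, and real word formation depends on sound, stress, and meaning, all of which this ignores.

```python
import re


def compound(*roots: str) -> str:
    """Compounding: join roots end to end (lamp + shade)."""
    return "".join(roots)


def blend(first: str, second: str, keep_first: int, drop_second: int) -> str:
    """Blending: the start of one word plus the end of another.

    The cut points are chosen by hand; nothing here finds them automatically.
    """
    return first[:keep_first] + second[drop_second:]


def clip(word: str, keep: int, from_end: bool = False) -> str:
    """Clipping: keep only the first (or last) `keep` letters."""
    return word[-keep:] if from_end else word[:keep]


def acronym(phrase: str) -> str:
    """Acronym: the initial letter of each part, splitting on spaces and hyphens."""
    parts = [p for p in re.split(r"[\s-]+", phrase) if p]
    return "".join(p[0].upper() for p in parts)


def shm_reduplicate(word: str) -> str:
    """Shm-reduplication: repeat the word with its initial consonants swapped for 'shm'."""
    rest = re.sub(r"^[^aeiou]*", "", word.lower())
    return f"{word}-shm{rest}"


if __name__ == "__main__":
    print(compound("lamp", "shade"))
    print(blend("motor", "hotel", 2, 2))
    print(clip("robot", 3, from_end=True))
    print(acronym("Musical Instrument Digital Interface"))
    print(shm_reduplicate("fancy"))
```

Each function reproduces one of the examples from the list when given hand-picked inputs. The hand-picking is the point: the mechanical step is easy, and the judgment about which roots, cut points, and sounds make a good word is where the craft would lie.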
Language production in general
Lexicogenesis stands in for all forms of creating new language.
First of all, if lexicogenesis is the creation of words, what even is a word?
This is a difficult question which many people have thought hard about. For example, is "apartment building" a word? Of course not, it's a phrase... except that prosodically, it's one word: it has one major stress. If you say "There's a green building.", there is more stress on "building" than in "There's an apartment building." (unless you're saying, no it's not a red building, it's a green building), so "apartment" isn't just some sort of adjective.[9] And the words "apartment" and "building" intuitively seem to occur next to each other, in that order, far more frequently than one might have expected (they're a collocation). For an even weirder example: "big-plate" in Chinese is like a word (you can't say "very big-plate", because "big-plate" is a noun), but also rather like a phrase and not like a word (you can't say "white big-plate" because that would be the wrong order of adjectives, you'd have to say "big white plate").[10]
Is the "'s" at the end of a possessive, like "Alice's", a word? Is "ice cream" one word or two words? Some ways to newly use words don't clearly create new words: clipping, borrowing, and in general semantic development. Are these lexicogenesis? Would creating a new bound morpheme count as lexicogenesis?
Thankfully, these questions don't need to be answered before creating new language. Lexicogenesis is a synecdoche for the creation of new language in general. (What is a good word for that? "Glossopoesis"?) Glossopoesis can involve creating new:
- Sounds.
- Punctuation.
- For example, [brackets enclosing long phrases embedded in sentences where the end of the phrase wouldn't otherwise be clear] can make the sentence easier to parse (though often rewriting the sentence would be better).
- Daniel Filan points out that the Greek New Testament has no quotation marks because Greek had no quotation marks, leading to confusion between the speaker and narrator——for example, Daniel points to John 3:16, "For God so loved the world that he gave his one and only Son...", which is attributed to Jesus in the English Standard Version but to the narrator in the New International Version and the Revised Standard Version. The Hebrew Old Testament also lacks quotation marks.
- Morphemes, etymons.
- For example, "-space" has been productivized, forming mindspace, thingspace, policy-space, distribution-space, etc.
- The etymon "-fer" is currently opaque and unproductive in English (think of refer, defer, infer, confer), but it could be reproductivized.
- Words.
- New uses for existing words.
- Phrases.
- Grammatical structures.
Language makers
In all new periods and realms, people have created new language.
Wherever there's a creative froth——new phenomena, new ideas, new events, new self-transforming stories, new contexts——people come up with words and syntax to speak new thoughts. They've engaged in what could be called lexicogenesis, onomaturgy, logogenesis, glossopoesis, wordsmithing, neology, semiurgy, lexical innovation, morphological derivation, lexicalization, word formation, or simply: making new words and making new language.
- Ursprecher. The first people to speak created our world. וְכֹל אֲשֶׁר יִקְרָא-לוֹ הָאָדָם נֶפֶשׁ חַיָּה הוּא שְׁמוֹ Their utterances are lost to time. Some refracted flavor of the event is given by Proto-Indo-European——for example, the English word "wolf" comes from PIE "*wĺ̥kʷos", which ramifies all across the PIE family tree (into Romance, Greek, Germanic, Celtic, Slavic, Sanskrit, Hittite, Persian, etc.).
- We can't see the first languages, but we can see some originary languages——languages that radically (to the root) break from the lexicon and/or grammar of the creator's preexisting languages. (Autonomous languages spoken by young siblings may be a liminal example, perhaps construable as an idiosyncratic, conventionalized, incompletely-learned form of the language spoken by surrounding adults.)
- Pidgins. A pidgin is a sort of originary quasi-language. It's a way of communicating that develops between multiple groups of adults who lack a shared language, forced to communicate by the context of trade contact, colonies, or plantations. Pidgin speakers use whatever linguistic material they happen to share with an interlocutor, from any of the input languages, to combinatorially construct adequate messages on the fly. To get a taste of this way of speaking, play Person Do Thing with some friends. Bickerton[4:1] mentions that some poor fool made an entry for the word "big box you fight him he cry" in a dictionary for New Guinea pidgin (it's presumably not a standardized expression but a nonce form, used to indicate a piano).
- Creoles. Pidgins are unstable, with minimal grammar and a shifting lexicon. A creole language, though, is a full language. Creoles have expressive, regular grammar. More mysteriously, creoles have grammatical structures not present in the languages that gave input to the originating pidgin, and what's more, creoles from many times and places share perhaps more grammatical features than would be expected by chance.[11] The mountain of a creole emerges from the shifting sand dunes of a pidgin by the regularization pressure applied by children structuring their language[4:2] (though maybe creoles can also emerge among just adults[12]).
- Nicaraguan Sign Language. In the 1980s in Nicaragua, a new language formed among deaf children, dozens of whom were for the first time put together in schools, having previously only communicated with their families via scattered home sign systems (another kind of originary language). Lexemes were drawn from home signs, from gestures made by Spanish speakers, and later from other preexisting sign languages; and created via iconicity, Spanish initialisms, abstraction and distillation from holistic gestures, and other inventions. Younger children created novel grammatical conventions which were picked up by later cohorts——e.g. multiple signs that were co-spatialized (i.e. performed with the hand in one specific spatial sector, such as to the left of the speaker) came to be necessarily interpreted as referring to a single referent.[13] Fundamental properties of language visibly emerged: for example, while hand gestures made by Spanish speakers are analog and holistic, successive NSL cohorts increasingly use discrete, analyzed signs (see the figure below, from Senghas 2004).[14] In this way children create a language that gives up some of the in-context information density of bespoke iconic gesture, in exchange for an infinitely enriched ability to flexibly combine simple conventionalized signs in serial time.
- Children. Lacking a fleshed-out mental lexicon, children productivize language to fill in gaps on the fly. E.g. "light-man" (man who fixes lights), "the weighter" (the scale), "nose-beard" (whiskers), "good earsight" (good hearing), "tennis it" (hit it with a tennis racket).[15]
- Natural scientists. By resurrecting, reshaping, and combining words from Greek and Latin, scientists mint new words to give fixed names to newly identified phenomena.[16] "Atom" (un-cuttable), "photon" (light-thing), "syndrome" (together-course), "Lepidoptera" (scale-wings), "hydrogen" (water-creator). Mathematicians create fractal vocabularies, making names from notation, from mathematicians (eponyms), from metaphors, to discuss ever-finer distinctions and ever-arcaner alien objects; e.g. Wiki's glossary of topology. See the sprawling lexicons with which the philosophers, the linguists, the botanists, and any kind of natural scientist speak in their own language suitable to saying the secret things that they can see.
- Programmers. "There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors." (h/t) Programmers wield names, and each one of their tools for wielding names is a name. create_argparser, ternary_search, GroupNorm32, zero_module, start_interaction_loop, InputRange, improve_code, NotifyThreadWatchpoint. Programmers make the most names...
- Conlangers. ...Except maybe for conlangers. A conlang is a constructed language. Most conlangs appear in fiction (e.g. Tolkien's work) or as a personal or artistic experiment; on the other hand, there are hundreds of people who speak the constructed language Esperanto as a first language among other first languages, and many tens of thousands who speak it as a second language (Wiki). Conlangers create the most language overall, since they potentially choose not only words but also phonemes, morphemes, and orthography, and also the way that elements combine (phonological, morphophonological, morphological, and syntactic rules).
- Language vitalization. When people perceive that a language they want to use needs to be able to say more things, they find or make words to say those things through importing, semantic generalization, and making new words.
- In the 19th and 20th centuries, Hebrew was revived into a modern language by taking words from Biblical and Rabbinic Hebrew, adopting words from other languages, and inventing words (e.g. תפוז ("orange" (fruit)) ⟵ תפוח ("apple") + זהב ("gold")). In the past few hundred years, Russian writers have consciously experimented with ways to expand the Russian lexicon to fill gaps, adding now-common words such as "предмет" ("object") and "маятник" ("pendulum").[17]
- Cultural contact. As a special case of language vitalization, people brought into contact with novel material culture (e.g. by colonizing, by being colonized, or through trade and globalization) come up with words for artifacts novel to them. E.g. Hupa people native to California, faced with 1849 Gold Rush immigration, innovated their language (now very nearly extinct) through importing (e.g. whilba ⟵ "wheelbarrow") and semantic development (e.g. ts’iłting’ (rifle ⟵ bow) and dinday (bullet ⟵ arrowhead)), but mostly through making new words (e.g. te:lma:s ("something rolled up", cigarette), ’a:da:-nahł’its ("by itself it runs around", train, automobile), ’a:da:-yixine:wh ("by itself it talks", telegraph, radio)).[18] Trade brought "indigo" (literally "Indian") through Greek to English. The British Raj imported from India words such as "shampoo" and "jungle" (Link).
- Language migration. When people start speaking a language that hadn't already developed within their cultural context, they make the new language their own. For example, today in many places English is spoken only somewhat recently (e.g. brought by colonization, or adopted as a local shared language or gateway to the global economy).[19] In such contexts, speakers will use words at hand in novel ways (semantic development) for the meanings they need, e.g. "globe" (electric lightbulb) and "chop" (eat) in Nigerian English[20] and "concrete" (solid food) and "opponenting" (opposing) in Ghanaian English[21]. Words might be taken from the preexisting language into the new language, e.g. "akara" (a Nigerian bean dish)[20:1], or brought over by loan translation, e.g. (quoting Bamiro[21:1]) "big man" (translation of Akan "okesee", affluent member of the Ghanaian society).
- Writers. People who write about other worlds have need of other words like "telekinesis" and "hyperdrive" and weirder ones, and people who express subtle things have need of subtle language. Shakespeare created some hundreds of English words such as "immediacy" and "enmesh" (though mostly now-unused ones).
- Cryptolects. Groups that want to hide what they're saying from other people invent secret languages. Thieves in Great Britain would say "cully" (a victim) and "bung" (a purse). Jane the resident asks her colleague with a troublesome and unhopeworthy patient: "Can't you slow-code him?"——meaning "Can't you let his cardiac arrest play out as it will?".[22] Tsarist and Soviet censorship stimulated Russian writers to use "Aesopian language" to hiddenly communicate antiregime thoughts. Someone looking to avoid censorship from social media might offer help getting an abortion by offering help with "going camping".
- Subcultures in general. Communities centered around some domain make their own sub-lexicon. At Microsoft they speak Microspeak, skaters speak skater-speak, the military has its (partly cryptolectic) slang, LW!rationality has (or consists of) a rich lexicon [LW(p) · GW(p)].
What this essay is not about
Some activities not covered by this essay:
- Lexicogenesis for the sake of having more words. There are plenty of words, so there's no nonspecific need for there to be more words. So this essay isn't talking about creating words just to have more words, or creating other language structures for their own sake.
- Idea babble. Likewise, this essay isn't about creating permutations of ideas, without purpose or context, and then coming up with words for them. This essay is about needful wordcrafting——wordcrafting for ideas that take center stage or play a key supporting role, which should have clear and handy words.
- Words as art. E.g. Jabberwocky, Russian однословия ("odnoslovie", "univerbalness", one-word art)[17:1].
- Words as play; lexicogenesis for its own sake. This essay is very serious business. But on the other hand, it is fun to make up words, e.g. as a way to play with sounds and morphology or to imagine things or because the words sound nice or because they make the world a little your own. E.g. this adorable comic by Grant Snider of Incidental Comics:
- Non-semantic language-alikes. Glossolalia, asemic writing. (Unlike the other items on this list, these can be almost safely excluded from lexicogenesis.)
- Frivolous words; words that start as sounds. Beyond art and play, there are ugly words, made to grab attention, that have no wholesome purpose. E.g. the infamous Snickers ads ("satisfectellent"). There are also cutesy words, puns or portmanteaus, which... de gustibus non est disputandum. E.g. these ("hummucide") or these ("experimence"). This essay doesn't discuss words created by criteria that have little to do with the idea the word will carry——e.g. if someone looks for a meaning for a clever-sounding word.
- Fancy-shmancy words. Even if an element of language isn't easy to justify as being necessary beyond preexisting language, sometimes it is valuable anyway, e.g. for making subtle distinctions. Still, some words are fancy for the sake of fanciness.
- Constructing a language. Conlanging is the craft of creating whole languages. This essay's perspective is from within the preexisting language spoken fluently in everyday practice by a wordcrafter, aiming to make words that serve a purpose in thinking in that language and that fit within the preexisting structures (phonological, morphological, grammatical) of the language.
- Revising or restricting language. This essay discusses creating new language to carry ideas that couldn't be easily carried before, not removing meaning from existing language. E.g. an attempt, in the name of precision, to make a word "mean only one thing". Precision is useful, but for example trying to revise "un-" to only mean negating a property and to not mean undoing an action is both impossible and undesirable, and anyway isn't discussed here.
- Systematizing or engineering language. Some people have created languages that aim to have desirable or interesting systematic properties, such as syntactic or lexical unambiguity, including sometimes with a goal of enhancing thought. See engineered languages. These explorations might shed light on how natural language works, but since we don't already know how natural language works and natural language is how we think, this essay focuses on lexicogenesis as incremental (endosystemic) addition to a natural language, rather than a broad (diasystemic) radical ("to the root") shift. Natural language can't be precircumscribed.
A sense that more is possible
Don't you remember, in the prehistory of your waking soul, when every name was new?
"A Sense That More Is Possible [LW · GW]" argues that there's no formidable shared art of rationality because people don't have the sense that such an art could exist. This section tries to gesture at a sense that more is possible with language——that thinking that matters is thirsty for new ways of speaking.
Pointing at a sense that more is possible with lexicogenesis is a bit like going to a monastery. In the monastery there's a monk who has lived only there for his entire life, knowing only those mountain paths. Now, try to justify to him the utility of learning one's way around a new place. At one time he did learn his way around new places, but he's long forgotten that time.
Survivorship bias for expressibility
Ideas without words are lost, so it seems as though all useful ideas already have words.
In "The words in science fiction", Larry Niven writes:
The "Newspeak" of 1984 was a language so designed that certain thoughts would be unthinkable in it. One must wonder if certain thoughts, crucial thoughts, are unthinkable in English, or in any human language, including mathematics.
We can think of a bunch of ideas that we like, and then check whether there are adequate words to express each idea. We will almost always find that there are adequate words. To conclude from this that we have an adequate lexicon in general, would ignore a survivorship bias. We can think of the ideas that we have words for, much more easily than we can think of the ideas we don't have words for.
All those forgotten ideas, concepts, and mental motions, the ones that weren't rightly put into words——the ideas in the past, and the ones that will come up in the future——there is gold there. How many times have you heard or said or thought a phrase like "...which there isn't a good word for..."? The referents of those phrases were lost. There are more ideas lost to wordlessness than we know. Stefan George's poem "The Word" [23]:
Wonder from distant land or dream
I carried to my country's seam
And waited till the twilit norn
Had found the name within her bourne—
Then I could grasp it tight around
Now blooms and shines it, through the bound...
Once, I returned from happy sail,
with a prize so rich and frail,
She sought for long and tidings gave:
"No suchlike sleeps in this deep cave."
Thence escaped it from my hand—
The treasure never graced my land...
So I renounced and sadly see:
Where word breaks off no thing may be.
The feral botanist
To see what is even happening at all in the blooming, buzzing confusion, demands words.
Suppose you are a botanist, but a feral one. You'd like to describe plants carefully and in detail, so that you can distinguish different species and discern when a plant is growing healthily or not. But, you've not been enculturated into botanical vocabulary. What do you see here? How would you say what you see?
[Cropped from Steven Lucas, "Aroid (Araceae) and Tropical plant Botanical Terminology with Latin pronunciations", photo copyright 2010 Leland Miyano.]
What I (another feral botanist) see is a curled dark green leaf with red veins.
What an expressive and discerning botanist sees is a plant with, among other features, "supervolute vernation and leaf blades with scalariform secondary venation". To translate: When a new leaf blade of this species emerges, it is a single leaf blade, and one edge of the blade is curled inward while the other is curled around the first, so that the whole blade forms a spiral. The secondary veins (which come out of the primary central vein) of the leaf blade are parallel and spaced evenly, so that they are arranged uniformly like the rungs of a ladder.
You and I, feral botanists, see less, and bring back less to our country, than does the wordful botanist.
Unspeakable mind
Minds are especially big and murky and we don't have good words for minds and it would be nice if we did.
Imagine that we lacked words for anything mental. We can talk about non-person objects, and people's bodies, and we can describe low-level behavior, like "noise is coming out of his mouth" or "her hand is going upward". But we don't describe mental activity. We don't talk about thinking, knowing, belief, concepts, ideas, memory, understanding, bias, desire, attitude, personality, emotion, and so on.
In some ways this would be fine. We could still, for example, say "her body is traveling in a straightish line, so if I walk on this side of the sidewalk, we won't collide". But in a lot of ways we'd be very confused. Why are people buying a lot of toilet paper all of a sudden? (We can't say "they believe there will be a shortage and want to ensure their supply".) We can't pass on stories about people making decisions, learning things, or being in conflict, so we can't accumulate familiarity with and knowledge about those things.
We are still in this position with respect to minds (intelligence, the power of mind over the world, values, learning). We still lack the words and ideas to describe well what happens so that minds (and something within minds) come to determine the course of the world.
Spark of thought
When thought is sparked, it wants to burn too fast and widely for the mind to keep up. Urgent fragile thinking needs scaffolding.
In some forms of thinking, there comes a time, in the course of minutes or hours, when paydirt is glimpsed. Old questions are renewed, stagnant ideas are agitated and can be reforged, provisional concepts are connected to what they waited for, answers are nucleated and nourish new questions; and the unseen movements of the thinking thing are exerted, applied, and thereby intimated and adumbrated. The paydirt is prone to be mostly swallowed back up by the Earth. Lexicogenesis might better support the mineshafts and break the rock to keep the paydirt open.
Sparks
A spark of thought may come where words fail, when reality is glimpsed or grasped despite the clumsy language.
Some metaphorical situations where words fail, but still thinking may be called for:
- Reaching the edge. Towards the center of the island of knowledge, everything has names, the names suitably lay out separately what is suitable to be separately put together into propositions, and what's known can be put into spoken propositions. Towards the shoreline of knowledge, things lack suitable names, and the names that are there may only poorly separate out what should be separate. There are noumena there——those things which are poorly described by the names that are preliminarily there——noumena graspable as gestalt phenomena by ostensive definition and observation, waiting to be well described.
- Crosshatch. Unsuitable ideas form a crosshatch pattern with reality. They don't carve reality at its joints; their grid is skew to reality's grid. "Climbing through the skewed window frame", using preexisting ideas to tell the story of a real thing enough that its reality shines through, takes a step closer towards the real thing.
- Seeing behind things. Ideas present some aspects of reality and leave other aspects hidden, covered, marginal, unspoken, in the background. Behind or under the ideas, glimpses of reality can be caught.
- Prying things loose. An idea as it already is has its role, even if it isn't ultimately a suitable idea. In its role, the idea is asked to be as it already is. Every improvement is a change. To improve, an idea may have to be pried loose from its role as it is, so that it can be changed to be suitable for an evolved form of its role.
- Steady gaze. An alien thing can be stared at until its patterns are shown within the flux.
- Contradiction. Words don't fully represent reality, so they are vague. Words make logical deductions easily available. Separate ideas can be put into words separately. Logical deductions from provisionally accepted wordings of ideas can bring separate ideas into contact or even apparent contradiction. A word used different ways in different contexts can be caught overworked and then interrogated.
Burning
Burning thought gives opportunities to forge ideas.
- New questions. Seeing a new thing begs new questions.
- Plasticity. Having been pried loose, ideas can self-modify, search through the space of ways of being, and recruit other ideas, to better play the role that they occupy. With multiple related ideas pried loose, the role itself can change along with the idea playing the role.
- Visible exertions. Ideas and ways of thinking play themselves out in thought. These exertions and unfoldings may be big and complex enough that the thinker can barely support one instance of a pattern, one application of a heuristic, one investigation. The pattern of thought is just barely visible, even if it's already part of the thinker. Seeing the machine work, one wonders how it works, what is its core, what assumptions and context are and aren't needed, how far can it be pushed, what else it can be applied to.
- Connections. When an idea exerts itself, it is broadcast to the mind. Then ideas can be parathesized——brought alongside each other. When two ideas are parathesized in the right context, isomorphisms between substructures of the two ideas are lighted up. The mental code then begs to be rewritten so that the analogy is made flesh. If an idea, plucked from its own original context and placed in a new context, is by surprise easy to make quite useful in the new context, then there is a new notion to be birthed: what is shared between how the idea is useful in its different contexts.
- Desiderata. The mental environment created by the thinking provides a rich and rare ensemble of signals that could evoke new ideas. The processes searching for ideas are given detailed feedback from many dimensions of change and success. Apparent contradictions create an acute need to refactor ideas: what are suitable ideas to interpret these propositions, so that these propositions are not actually in conflict? The way that the word has been provisionally used points at how the wanted idea would be used. The sense of what a good idea in some context would be, can be made explicit; the criteria can be brought out and collated, sharpening the call for ideas.
Lost upsparks
Precious embers float away from a burning thought.
Some thinking is too much, and opportunities are left by the wayside. There's too much material, too many ideas and connections, too much to hold in memory, too many questions and free parameters; the thinking has gone too far down the path, too far past the edge, too many ideas have been pried loose, too many criteria are brought to bear. The solutions don't come easily enough to cope with the wild as it grows, and the thinking is lost in the wild. Connections and possibilities compete for attention, with too many losers. Without names for what is there, what is there can't be brought back from the thinking.
When external objects are available, thinking can be supported in external objects, which maintain themselves. But thinking that deals with diaphanous things is more urgent because more fragile. Urgent fragile thinking needs scaffolding.
Aside on withdrawal and the leap
Some realms of thinking withdraw, and can only be reached with a running leap.
A glimpsed thing withdraws, hides, runs away. The glimpse is cut off from the thing; the glimpse is assimilated to preexisting understanding. The thing is slippery, evades grasp; the pressure of the grasp pushes the thing away. The thing doesn't stay put, doesn't want to be enclosed. The thing slips through fingers like vapor, displaced by what grasps.
Reasons for withdrawal:
- Some mental elements are provisional or even essentially provisional; they're open to being revised. What an element will become when its context has ripened further, is only hinted at by what the element is like right now. The context is always incomplete, not fully bloomed, so what the element will become is something that withdraws.
- As a special case of provisionality, the element to be understood may change by virtue of being understood. The element therefore withdraws, always one step ahead of understanding. For example, in a revision theory of truth, the Liar sentence will dance away from its own truth value. More generally, self-prediction and self-understanding set up anti-inductivity: once a mind understands something about itself, the new understanding makes the mind richer, makes there be more to understand about the mind than before. Like trying to turn around so fast that you catch a glimpse of the back of your head. Douglas Adams: "There is a theory which states that if ever anyone discovers exactly what the Universe is for and why it is here, it will instantly disappear and be replaced by something even more bizarre and inexplicable. There is another theory which states that this has already happened."
- Some elements emanate other interesting elements. The emanations are interesting and draw attention away from the emanator. The emanator withdraws into its emanations, cloaked within them.
- Everything withdraws because of cognitive miserliness. Questions are substituted with related questions that are easier to answer. Things in themselves are cloaked in verbal shadows. Why think of the thing itself when its name seems enough to play pretend? The difficult thing withdraws behind easier things, and the easier things are closer and closer approximations of the difficult thing——or more and more (falsely, shallowly) satisfying analogs of the difficult thing——or more and more deceptive fakes of the difficult thing.
- Missing the bow for the arrow: Some elements work for the sake of invisible things. If an element works in invisibility and its results become visible, then its sake is found, and the element does not need to work anymore. The head-on approach to seeing the element, is to see its visible results. But if an element works in invisibility, then to try to see its visible results is also to try to put the element to rest and lose access to it. For example, how is a creative solution found to a problem? Whatever finds the solution is put to rest after the solution is found, or flies off to another work. Like Batman.
- Ensembles of elements that work together create roles for each other. The ensemble creates a network pressure on each element: the element works well as it is with the network as it is. Even if some elements are pried loose, the ensemble's center of mass may stay in the basin of attraction around its preexisting network shape——the preexisting roles and ways of working together. The other possibilities for the network, outside the basin, seem to withdraw behind the rim of the basin as you slide back down to the local minimum.
- Some ideas would be hard to cope with if true, would demand a large life change.
- Some things are too big, such as hyperobjects.
- Some ideas are, or seem to be, dangerous to possess, or dangerous to possess in a way that others can see. For example, if it seems like there is a coalition that punishes people for thinking that the coalition exists, then the idea that the coalition exists is self-withdrawing.
- Some things are difficult. They don't provide footholds. So they build up around them a cruft of boredom and learned helplessness.
A thing that withdraws might not be reachable without a running leap, or maybe an orthogonal approach.
Chasing a thing that withdraws is like navigating in a hyperbolic space. The mark is always missed. The attempts to lay out the thing clearly are incomplete, askew, and false. Another course correction is always needed, as if the thing is repulsive. Course correction requires detecting the error and the direction of improvement, which at least requires clarifying the pattern——with words, maybe——of what is to be turned away from.
Scaffolding thought
Fuller language and adepter lexicogenesis might better ignite, feed, channel, and preserve the spark at a heartier burn.
If a thinking is too big and many-threaded to complete, it has to be abandoned. If the unsettled platform of the thinking is backfilled and shored up, compressed, and made handy, then the thinking can be returned to with a better prospect of progress.
Kindling
Words push up to the edge of what's speakable.
- To reach the edge of knowledge more easily, describe it clearly and stake out room for new ideas. To see behind things, name them to highlight their boundary and complement.
- To pry something loose, name its aspects and instances. Aspects are dimensions of potential variation, and different instances highlight different underlying things.
- To support a leap into thinking, clarify ideas surrounding the murkiness, as a flanking maneuver.
Firebreak
Words wrangle thought.
- Ways of thinking (heuristics, patterns, methods, disciplines) that are verbally expressed (hence refineder and handier) help to cope with a multifarious burn of thinking. Widely applicable ideas are more handily applied if carryable with shortish words.
- In the midst of the thinking, new elements call for new words. New things are to be named, new ideas are to be put into words, new kinds of propositions might be best fitted by new grammar.
- The context created by the thinking gives a rare opportunity to make words that fit that context, words that name things in that context, and words that begin to bring out into the light those hidden ways of thinking that exert themselves precociously among the thinking.
- Having more words already in the arsenal, and making new words more quickly, bundles the many threads of thought that open up, making them less prone to become a big unmanageable tangled mess.
Catching upsparks
Words put thoughts into time, letting them unfold across episodes of thinking.
- Quicklier making new words catches more upsparks before they fly away.
- A wider net of possible words (a more expressive morphemicon) catches a wider variety of upsparks.
- Making preciser new words pulls down subtler delicater stuff from the rushing plasma river.
- More practicedly associating upsparks to rich, heartnear, or canonical metaphiers makes available more of the thinker's life to connect to upsparks, and carries them back in fuller detail.
- By putting words to the context, those words will be there the next time to make things clearer. The activity of the thinking isn't just for the sake of that context, but for the sake of all overlapping contexts. The envelope of the thinking is expanded, and the way is cleared to the next steps of the path, which was otherwise obstructed and only dimly in view.
- Striving to put thought into words sends tendrils of explicitness backwards in time, or generatorward, or murkward. (That is, the mind sets itself up to make hidden things more available to relate to; the mind asks itself the questions "What was that?" and "What had just been going on before this?"; it sends out the request, tunes the reinforcement signal to evoke explicitness, positions a listener to hear results.) Across episodes of thinking, the tendrils grow. The tendrils latch on to muffled distant hints of things, otherwise lost. The tendrils latch on to massive hidden diasystemic mental elements——generators, widely applied heuristics, and underlying ideas——otherwise operating invisibly.
Palinsynopsis
A greater ability to make new words opens a greater ability to put new thoughts into words.
Having more and suitabler concepts would make understanding expand further. To get more and suitabler concepts, look at unspeakable things and bring them back to speakability. Creating words quicklier, preciselier, and with a greater ambit, makes it easier to bring back unspeakable things. People at the edge of thinking need to have more and suitabler concepts and mental motions, which are unspeakable.
There is a certain kind of computation which has to happen: putting thoughts into words. Lexicogenesis isn't the same as that computation, but it is related and would support and enrich and accelerate that computation, at many steps along the computation and with compounding benefit. It's not that there should be more lexicogenesis for its own sake, but rather that lexicogenesis wants to happen more than it already does.
More reasons for lexicogenesis
Overview of reasons
Lexicogenesis is shown to enhance thinking by its history, by its living role in thinking, and by its possibilities.
The previous section "A sense that more is possible" argues that there are riches to be pulled from the burning edge of thinking into explicitly analyzable discourse. This section gives some more reasons for lexicogenesis, both as justifications for allocating attention to it and as desiderata for doing it well.
The next two subsections will give some theoretical reasons that new words are good, and examples of good, bad, and needed new words.
Some other sorts of reasons:
- Historical data. As discussed above, especially scientific language has benefited greatly from lexicogenesis. Integral to the Renaissance was New Latin, which involved widespread deliberate lexicogenesis. Data on which words are successful could be found by comparing New Latin and modern scientific lexicons.
- People are already doing it. Most of the people I'm aware of working in my field (AGI alignment) invent new words for concepts they're working with (in addition to learning words from existing understanding). My impression is that this is also the case in more standard scientific fields today. People have so far been mostly on their own in their lexicogenetic problems, but they put in the effort anyway. So lexicogenesis already wants to happen, and would probably happen beneficially more if it were easier and more effective.
- Later main sections give some gestures toward the individual craft, and the possible shared craft, of lexicogenesis. That might be self-evidently motivating, like how a hammer is self-evidently motivating to someone who's been using rocks to drive in nails.
Theoretical reasons
Words bring reality into light.
In his essay "Sapir-Whorf for Rationalists", Duncan Sabien lays out five claims, quoted here:
- New conceptual distinctions naturally beget new terminology.
- New terminology naturally begets new conceptual distinctions.
- These two dynamics can productively combine within a culture.
- That which is not tracked in language will be lost.
- The reification of new distinctions is one of the most productive frontiers of human rationality.
Some more overlapping powers of words:
- Sentences. A central reason to make new words is to be able to say new sentences that want to be said.
- Indexing. Words index ideas so that ideas come up when helpful.
- Syn-opsis. Words help to together-see complex things.
- Wielding. Words wield whole concepts.
- Signposting. As a special case of wielding, words can wield whole ideas; and whole ideas can serve as crucial signposts in places where advice is needed and thought is constrained. For example, a name for a cognitive bias points the way to more truth-tracking thought. A name such as "hangry" points at the possibility of transitioning out of a mental state.
- Combining. By making whole concepts wieldable, words make it easier to combine concepts into complexer ideas (noun phrases, propositions).
- Abstracting. By using preexisting material as a metaphier, new paraphrands are created——the substructural isomorphism that constitutes the metaphor points at a new idea, the shared substructure.
- Nucleation. Like planting a seed or nucleating a crystal, a word creates a site that begins to play a role and begins to collect material that helps to play that role.
- Tentpoles. By preliminarily setting up a site that begins to play a role in thinking, a word helps set up a context where other ideas can more fully be called to play their own role. By raising one tentpole, the other tentpoles take on more meaning: together they lift the tent fabric to create more space.
- Freedom. A new word, unlike an old word, doesn't have to be pressured into a role by your previous thinking and by your language community. So the new word can make the world your own and give a site for new thoughts to associate.
- Hermeneutical justice. When someone lacks the words to describe their experience to themselves and others and to relate their experience to the experience of others, they can't coordinate with themselves and others to decode and prevent harm. For example, the phrase "sexual harassment" helps along the way to preventing the hermeneutical injustice that relies on people not being able to describe sexual harassment.
- Verbal sunshine. Many times when I have an incomplete thought, putting the thought into words changes the shape of the thought. It shaves off subtleties, makes the thought more clumsy, limp, and trite, makes the thought less able to open the paths it wanted to open, and co-positions the elements of the thought less elegantly and suitably than they started out. The words I already have are not quite the right tools. This is related to verbal overshadowing. Verbal overshadowing might be prevented by learning to better put words to things.
- Programmable thinking. As a metonymic example of scaffolding thought with words, compare making named functions in computer programming. In many early programming languages the goto statement, which jumps from a point in the code to some other point specified by a label placed arbitrarily at some line of code, was used frequently to do things like inserting additional instructions a while after first coding a segment, making looping constructs, or calling subroutines. This led to "spaghetti code", a mess of instructions that execute in a confusing order with unclear effects and unclear boundaries between subroutines, lacking coordinates to understand what is happening. The structured programming discipline makes subroutines think-about-able and wieldable by protecting and containing them with names. Programmers come up with names for subroutines on the fly, and one could imagine being able to come up with suitable words for new thoughts on the fly. Also compare symbolic notation in math against e.g. rhetorical algebra.
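To make the "Programmable thinking" comparison concrete, here is a minimal sketch in Python. Python has no goto, so the first function only emulates jump-to-label control flow with a `label` variable; all the function names are hypothetical and invented for this illustration. It is meant to show the contrast, not to reproduce any historical program.

```python
# Hypothetical illustration: every name below is invented for this sketch.

def mean_by_jumps(numbers):
    """Goto style, emulated: one flat loop where a label says which step runs next."""
    label, i, total, result = "check", 0, 0.0, None
    while label != "done":
        if label == "check":
            # Conditional jump: either go add another number, or go finish.
            label = "add" if i < len(numbers) else "finish"
        elif label == "add":
            total += numbers[i]
            i += 1
            label = "check"  # backward jump: this is the "loop"
        elif label == "finish":
            result = total / len(numbers) if numbers else 0.0
            label = "done"
    return result


def mean(numbers):
    """Structured style: the same computation, contained and given a name."""
    return sum(numbers) / len(numbers) if numbers else 0.0


def spread(numbers):
    """Because `mean` has a name, it can be wielded as a unit inside a new idea."""
    center = mean(numbers)
    return mean([abs(x - center) for x in numbers])
```

Both `mean_by_jumps` and `mean` compute the same thing, but only the named version is easy to combine into something further, as `spread` does. That is the analogy: a name protects and contains a routine so that it can be picked up and used elsewhere.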
Some of the "37 Ways That Words Can Be Wrong" can be inverted or reframed to give ways that words can be right. For example:
The act of labeling something with a word, disguises a challengeable inductive inference you are making.
Also: The act of labeling something with a word concisely wields an inductive model.
You argue about a category membership even after screening off all questions that could possibly depend on a category-based inference.
You ask whether something "is" or "is not" a category member but can't name the question you really want answered.
Also: In practice, you don't know whether all possible relevant questions have really been screened off. Even if all the questions you've asked have been answered, you might suspect that there are distinct inductive nexi of reference to be discerned. So you might be asking which nexus applies to the object at hand, so that you can model what will happen when you expand the domain of discourse and ask new questions about the object.
You allow an argument to slide into being about definitions, even though it isn't what you originally wanted to argue about.
You argue over the meanings of a word, even after all sides understand perfectly well what the other sides are trying to say.
Also: We have senses of elegance and efficiency for concepts and words, like we have these senses about computer code. Those senses point toward well-engineered concepts and words.
Examples
General examples
- The section "Language makers" gives pointers to many instances of successful language creation, some recent. See especially the vocabularies of many scientific fields.
- Look in the wild for a [phrase enclosed in square brackets that is supposed to be a fairly discrete unit], or a-string-of-words-connected-by-hyphens-that-want-to-be-one-word.
- Look in the wild for people saying "I wish there was a word for..." or "I don't have a good word for this, but...".
- Notice when you hear or speak a phrase or set of very similar phrases (maybe with a free parameter). Or in general, notice when you're "saying the same thought" over and over in different ways, but the thought nevertheless feels like basically one thing. The repeated phrase or idea might want a word.
- See Duncan's list, which includes many rationality-related terms, and see the LessWrong tags page, which includes some novel coinages such as alief.
- See niplav's proposal for using subscripted numbers to denote probabilities of sentences.
Examples from AGI alignment
- There are many neologisms (and words given new meanings) used in the AGI alignment research community. For example, mesa- as in mesa-optimizer, mesa-objective; DT (decision theory) as in TDT, LDT, FDT, LIDT, UDT, ADT; the list on Arbital; infrabayesian; HCH, IDA, ELK; etc. There are also many protologisms floating around.
- Here are over 20 instances of Eliezer Yudkowsky asking for words (if you can't view the Facebook link, here's what looks like a partial scrape).
Personal examples
Language I want (some of these set a bad example in that they don't but should give real contexts where the term is wanted):
- Request for term: an abbreviation meaning "as a counterexample:", analogous to "e.g." meaning "for example:". Maybe "c.g." or "c.e.g." as in "contra-exempli gratia", after "e.g." = "exempli gratia".
- Request for term: a plural non-person pronoun. "Its" is the obvious candidate, but that's ambiguous with the possessive. "They" collides with the personal pronoun, and is often confusing. (Pronouns are probably especially difficult to do well.)
- Request for term: more flexible pronouns.
  - For example: sentences like "[noun phrase] [verb phrase]" can be hard to parse if the noun phrase is long and complicated. E.g.: "The crows that had alighted on the telephone wire and sat there conversing all morning while people left for work flew away when the wood chipper started up." It's readable, but there's a hiccup. Hypothetically, this could be made clear as a fluent modification of the sentence, for example like so: "The crows thar, that had alighted on the telephone wire and sat there conversing all morning while people left for work, thar flew away when the wood chipper started up."
  - For example: sometimes there are multiple antecedents that would like to be referred back to with pronouns, but they'd take the same pronoun. E.g.: "The crows thar, that had alighted on the telephone wire and sat there conversing all morning while yon citydwellers left for work, thar flew away when the wood chipper started up. But thar were back on the wire by the time yon were returning to yon's driveways, giving yon the impression that thar were there on the wire all day long."
  - I wager there's a language that does something like this. Topic markers might have overlapping function with this notion.
- Request for term (from Sam): possessives that take phrases. For example: "The scion of the house of Aquarius's nemesis slunk into town." and "The nemesis of the scion of the house of Aquarius slunk into town." are both clumsy. There might be something better, like for example "The scion of the house of Aquarius, his nemesis slunk into town.", or something.
- Request for term: a word or construction that says when two things X and Y are displaced from Z in somewhat the same direction. An example use case is if Alice says "Z" and then Bob says "Actually, X" and then Carol says to Bob "I don't agree with X, instead I think Y, but we've both moved in a somewhat similar direction away from Alice's Z". The expression would mean that the angle XZY is acute, or in other words that X-Z and Y-Z have positive dot product. (And an expression for the reverse, an obtuse angle, would be nice too.) Maybe one can just say "X and Y have acute angle with Z"? My first instinct is to instead highlight that something is shared between X and Y, like "X and Y have shared component (from Z)", or something. Or more succinctly, "X and Y codiverge (from Z)".
- Request for term: a word for the preimage $\pi^{-1}(S)$ of a set $S$ under the projection map $\pi$ of a fiber bundle. There's a word ("fiber") for $\pi^{-1}(x)$, the preimage of a single point $x$, but I don't know one for $\pi^{-1}(S)$. It could simply be called "the preimage of $S$", but this doesn't suggest the fiber-bundle-ness. It could be called "the subbundle over $S$", but this talks about the whole subbundle structure including the map, rather than being a name for just the total space, analogous to "fiber". If it's called "the rope over $S$", then we can express questions like "how many distinct ropes does $\pi$ induce for such-and-such class of sets $S$", or something. (Also wanted: a single word for the set of sections of a sheaf over a set $U$, ¿perhaps "sectionset".)
- Request for term: sometimes a person says in context meaning , and then says A in context meaning , and . What do you call , , and ? "Motte and bailey" describes a subclass of this pattern, but more generally there doesn't have to be one use that is more defensible than the other; just use of one expression that is non-uniform, or at least how it is uniform hasn't been made explicit. Maybe "equivocand" or "doubleword" for ? Maybe "halfword" or "semiconcept" for and ?
- Generally, words or combining forms for math ideas such as "derivative of", "integral of", "tangent space (at)", "function space (between X and Y)", "dual (of)", etc.
- Request for term: when a message appeals to a dynamic in the listener, requiring that dynamic for the message to be interpreted, what do you call the message and the dynamic? See "Created Already In Motion [LW · GW]" and "Gemini modeling".
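The "codiverge" request above has a concrete geometric reading that can be checked mechanically. Here is a minimal sketch, assuming the three positions can be idealized as numeric vectors; the function name `codiverge` is borrowed from the proposed term, and the example coordinates are made up for illustration.

```python
from typing import Sequence


def codiverge(x: Sequence[float], y: Sequence[float], z: Sequence[float]) -> bool:
    """True when X and Y are displaced from Z in a somewhat similar direction,
    i.e. when (X - Z) . (Y - Z) > 0, so that the angle XZY is acute."""
    if not (len(x) == len(y) == len(z)):
        raise ValueError("x, y, z must have the same dimension")
    dot = sum((xi - zi) * (yi - zi) for xi, yi, zi in zip(x, y, z))
    return dot > 0


# Hypothetical positions in a two-dimensional "opinion space".
alice_z = (0.0, 0.0)
bob_x = (2.0, 1.0)
carol_y = (1.0, 3.0)
dave_w = (-1.0, -2.0)

print(codiverge(bob_x, carol_y, alice_z))  # Bob and Carol: dot product is 5
print(codiverge(bob_x, dave_w, alice_z))   # Bob and Dave: dot product is -4
```

The reverse expression (an obtuse angle) would correspond to a negative dot product; a dot product of exactly zero counts as neither in this sketch.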
Language I flubbed:
(There are more examples of this, but I'm not easily recalling them...)
- "Actualizing" to contrast with "possibilizing" seems a little off. Partly the idea isn't very clear, and partly the word is overused in a way that feels like it taints the meaning. "Possibilizing" seems technically right but still feels a bit clumsy.
- I tried to make names for all the powers of ten with exponents from -50 to 50 in steps of 0.1, by crossing nine possible onsets with nine possible rhymes (crossed with four initial vowel-only syllables for multiples of 10), so that for example five billion would be called "sproyp" and two thousand would be called "plohg". See here. Somehow, mysteriously, I didn't find myself using these names in my thinking.
Language I found:
- I wanted a word for "the set of possible worlds", in the sense of counterfactuals. "Sample space" would be the term for the space of possible "outcomes" for a probability distribution. A day later I remembered that "-space" has been productivized (as mentioned above), so I can just say worldspace or possibilityspace (and it turns out worldspace is already attested [? · GW]).
- I wanted a word to fill in the analogy foothill : mountain :: ? : valley/basin. With a tiny bit of prodding, ChatGPT gave me "rimdale", which I like. Fish River Canyon in Namibia, including rimdales (source):
- I wanted a word for a certain shape of wave, and came up with "hookwave".
- In some of my essays I made up new words. As a non-word example of language creation, in this essay I wanted to ask a question that ended by quoting a declarative sentence. It seemed strange, a little disjointed, to put the question mark at the end. Instead I put the question mark at the beginning, as in: "¿So is the answer just: Anything within a mind can counterfactually determine the mind's effects.".
Other
- Elizabeth Garrett and/or friends came up with the word bumfle. IIUC, to bumfle a thing X is to pay a cost to get the benefits of having X, because the personal benefit just to you of having X is worth the whole cost; but then also saying to others who might be able to benefit from X, "Hey if it's worth it to you, you could go in on X, feel free to send me money.". E.g. buying a fan to use in a shared space would be bumfling the fan. "Bumfle" can also be a noun, meaning the X which was bumfled.
- Elizabeth Garrett and/or friends say "womp-womp" or "florp-blorp" or similar (the first syllable said with a high tone, the second with a low-rising tone) to thank someone for taking on administrative burden, such as ordering food for the group, without implying "thanks for paying for dinner".
- A while ago a beautiful programmer excitedly answered my questions about her company's biotech work. Throughout the 20ish minute conversation, she scrupulously marked a high proportion of her statements with an epistemic/evidential status. Her categories were "speculation at a bar over drinks" and "confirmed by experts". In general, compact markers of confidence and provenance would be nice to have. Many languages have compact evidentials——markers that indicate the type of evidence that the speaker's belief in the statement is based on. These might provide inspiration for compact evidentials in English.
- Daniel Filan, hearing people pronounce "semaglutide" like "se-MAG-lutide", in good Aussie spirit started calling it "Maggy", eventually extending to formulations like "He's a friend of margaret.".
Desiderata for words
A new word should, in its sound and structure, well-serve a needed role in a communal context of thinking.
Here's an incomplete list of overlapping, not equally important, mutually incompatible, overly demanding criteria (which needn't be met, but can point the way) that describe what makes a good word:
- Motivated. For the craft that this essay proposes, new language is created in order to say something that wants to be said. In other words, speak new words while hugging the query [LW · GW].
- Sentenceworthy. A good new word goes in a new good sentence that is better expressed using the word than using preexisting words. Yon word is suitable for the context of the sentence: yon doesn't say far too much or far too little; yon is specific enough to firmly anchor the sentence in new territory so it can say something new; yon is flexible enough so that the listener can unfold [the provisional concept gestured at by the word] into what it should be to be correct and contentful in the sentence.
- Communal. A word should have a use that's shared between minds (including between one mind across time). It should especially be contextworthy——suitable within a context of thinking——even if it's not clear to people without the shared context.
- Memorable. A word should be easy to remember. It shouldn't be generic, like "nice function".
- Distinguished. A word shouldn't be easily confusable with other words and other meanings. It shouldn't sound like another word, especially one that might come up in the same context. The word shouldn't suggest a wrong interpretation.
- Joint-carving. A word and its idea should carve reality at its joints. (It should be arthrodiatomic, if you like.) If a word names many things together, without naming what they share, or if a word creates a fiction that haphazardly cuts through things, it's time to find new words.
- Well-factored. A word and its idea should be well-factored, like a hammer rather than like a bottle-opener welded to gardening shears.
- Self-documenting. A word should, to someone already familiar with the preexisting surrounding context and lexicon, indicate its meaning by its form. For example, "neologism" is self-documenting to someone already familiar with the prefix "neo-" meaning "new" and the root "log" meaning "word". For example, a "lift" of a map from X to Y is a map from X to Z, where Z can be thought of as being "over" Y, e.g. as in a covering space of Y——so "lift" is a suitable metaphor that points to the idea, gives enough of a hint to reconstruct the idea given enough context.
- Reevocative. A word should reevoke the idea behind it, to someone who is already familiar with the idea——like a sazen [LW · GW] does. It should call up the associations and structures of the idea. If the idea is provisional, the word should reevoke the questions and the context that points to what the idea should be.
- Graceful obsolescence. A word should still be useful and not misleading, even after what it names is investigated and understood more deeply. "Hippocampus" (horse-sea-monster) is still useful because, even though we might now want to give it a name involving memory, the hippocampus does still look kinda like a seahorse. Proteins are often slightly misnamed, though forgivably so; for example lamin is so named because it was found in a layer under the nuclear membrane, but it turns out it's also found throughout the nucleus. "Vitamin" (vita- + -amine, "a vital substance with an amine group") is in most cases a misnomer because most vitamins (including vitamin A, the first one) don't have an amine group (and vitamin D is arguably extra-misnomery, because it can be synthesized endogenously). A word shouldn't posit or assign much more than is known.
- Euphonious. A word should sound good. It should roll off the tongue fluently, and be easy to recognize by reading or hearing. It should fit in with the sounds, the phonological system, of the surrounding language. If the word will serve as a stem for other words (e.g. through inflection or affixation), the word should easily accommodate those changes.
- Aesthetic. A word should altogether feel good. It shouldn't be clunky or annoying. It should be pretty, elegant, strong, handy, agile.
- Short. A word should be short, so it doesn't take too much work to read, write, say, hear, and think.
- Long. A word should be long, so it doesn't take up too much of the namespace by occupying one of the few short possible-words. A word shouldn't pretend to more generality and canonicity than it has.
- Productive. A word should be readily available for further lexicogenesis: producing inflections and derivations; participating in phrases, compounds, and sentences; extracting productive morphemes.
- Quilted. A word should fit in with the morphemicon of the surrounding language. It should make suitable use of existing morphemes, so that it has low complexity relative to the existing language. And, a word should enrich the meaning of its constituent morphemes through the new application of them, by making them stand for the analogy between applications, not just for one application. If there's a fully good way to say what wants to be said using the resources already given by the language, then say it that way. There doesn't need to be an ex nihilo invention like "rovenzine" to talk about a waterfall if "water" and "fall" are words.
- Anchored. A word should be immune to drift, appropriation, bleaching, and corruption. For example, "AI safety" lost its intended original meaning of "AI notkilleveryoneism".
- Morphemizable. A word should be usable as a morpheme to combine with other morphemes to make words. For example, a word should have a short, recognizable, and phonologically amiable form.
These criteria can be heard as describing forces bearing on a word. When a word balances the forces that bear on it, the word has the quality without a name.[24] A greater ability to create these little patterns might support, as the spoken substrate, a pattern language for thinking and living.[25]
Seeds of the craft
Here are some starting points for learning to come up with useful words.
Try and say :)
Since there's no systematic craft of deliberate lexicogenesis, you're not missing out on too much if you just do what comes naturally when there's something you want to say and you don't have the words to say it. You can just make up words by whatever means will work and see if you like the words you made up.
Children do it intuitively, and as described above in "Language makers", lots of people make up words with no systematic method. That's how most words get invented. There's no rule against making up words (despite what you may have been told). It's fun! It's like being God: let there be bootpuddles, let there be borogoves, let there be boojums and upsparks and endosystemic novelty. And the way to learn to play chess well isn't to ask "Which opening should I play?" or "What books should I read?" or "Will I be able to get good at chess?". The way to get good at chess is to play chess.
General motions
- Just try things. Make up words however comes to mind. Try to put the idea into words somehow. Let associations slide around and bubble up, let analogies suggest themselves. What does the idea remind you of?
- Brainstorm some candidate words. At least five——they can be bad or silly. Is there one you like? If they're bad, why? What are they missing about the idea? If they're all sort of cut from the same cloth, what is the pattern, and can you jump outside of it now that you see it?
- Give it some time. Once you've tried a bit to find a good word, your brain has set up the question in itself. So you can give your brain some time to do its thing in the background. Also you might encounter relevant words, or relevant things and ideas that supply metaphors you can use to make a good word for the idea.
- Consult your store of words and morphemes in whatever languages you're familiar with. Can you put together a couple of morphemes to say the idea?
- If you have some candidates, try a Focusing resonance check [LW · GW]. Get the idea steadily in your mind, and say the word. Does it feel right? Does the idea resonate with the word; does it "attach" or "respond" or "pour itself into" the word, or is there a misfit, a clash, or what? Does the misfit point to something that might make a word a better fit?
- Look at words you like for inspiration. What are some similar or related ideas that already have words? Are there any words for ideas that are directly parallel to your idea, so that you could tweak the word, e.g. swap out a morpheme, to get an analogous word for your idea?
- Ask for help. Ask a friend, post somewhere, start a thread on Zulip. Say something about the idea, what the word is for; say something about why some preexisting words or candidate neologisms don't quite get at the idea.
- Check the overall aesthetics of the candidate word. Do you like it? Do you stumble over pronouncing it? Is it a word you might use in a poem?
- Compare candidates against desiderata that you have for the word. Some possible desiderata are in the above section "Desiderata for words".
- Iterate. What's wrong with this word? Try to find candidates that at least don't have that flaw. What's good about this word? Try to find candidates that are even better on that dimension.
- Play Person Do Thing with some friends, to exercise your ursprecher instincts.
Rooting in the criterion
A new word is needed because there's a new proposition to be spoken.
An early step in finding a word for the idea is to clarify the idea by thinking the idea more thoroughly. Is there already a clear definition or synonymous phrase for the idea? Is there a central example of the idea, or one that evokes the need for the idea? Try expanding the domain of discourse: give examples, counterexamples, borderline cases, extreme cases, and other dimensions that flesh out and demarcate what the idea is and isn't about, what it does and doesn't say.
Is this idea clearly, convincingly a thing? Can I do without it, or say it perfectly well with expressions that already exist? Are there other factorings of the idea?
A central reason to make a new word is to be able to say a new sentence. A sentence that would use yon new word gives a criterion for yon: yon should make the sentence useful, make the sentence say what you wanted to say through it. Try just writing the sentence out using a placeholder, such as a candidate word for the idea, or a phrase in brackets that gives the idea. What makes the sentence useful, and what should the word say to support that use? To triangulate the idea, write more sentences that use it. Does the context suggest a handle for the idea, such as a distinguishing feature, an exemplar, or a metaphor?
Try making explicit criteria for the word. What should the word suggest and emphasize? What should it distinguish itself from or avoid suggesting?
Preexisting words
Is there already a word for the idea?
If there's a word that's sort of in the ballpark, try looking up synonyms for that word.
Maybe the idea is really just an instance of an idea that already has a word, plus some details that don't warrant a whole new word.
Is the idea something that some group of people have probably dealt with, and so probably have a word for? For example, most living things that most of us encounter already have names, even if we don't know them. Maybe you can find who has already discussed the idea and see what words they used.
(Aside: Note though that pickiness can be good. Although "neology" is a standard term for "the creation of new words", I just don't like it, perhaps only aesthetically. It's a bit dysphonious to my ears, and I don't much like "neologism" either, maybe because of the association with clunky forgettable initialisms, cutesy acronyms, and groanworthy pointless portmanteaus, or "pointlanteaus" as they are called. After reflection I can say that "neology" emphasizes newness, which maybe explains why the first dozen or so results on Google Scholar for "neology" are about social aspects of neologisms——the newness is about a language community. The project of this essay is the activity, the cognitive process, the craft of creating new words, not the social event called neology. Thus lexicogenesis overlaps with and draws on etymology and morphology, and focuses centrally on the problem posed to the wordcrafter. A term that's more general, to include creating phrases, grammatical structures, and notation, might be better——maybe "glossopoiesis". "Word formation" and "morphology" exclude, for example, ex nihilo root creation and semantic development.)
Boiling it down
Can the idea be rendered in a short phrase?
Try to distill the idea into a combination of a few, mostly short words. That phrase might already be a good term for the idea. The words in the phrase might suggest a good single word.
If the words express simple ideas, there might be a morpheme in some language that says that idea very succinctly. E.g. "together" in English is three syllables and eight letters, but Latin "con-" and Greek "syn-" are each one syllable and three letters. Try looking at lists of morphemes that you might be familiar with and see if you can make a suitable word from them. E.g. see Wiki's list of Greek and Latin roots that show up in English words, and this short list.
A fictional example of this procedure:
There's the idea: "when something makes something else get closer to it by pulling on it". How can this be boiled down to a short phrase? What about just "pull toward", like "something pulls something else towards it"? That's not bad, but it's a bit long——it's two-and-a-half syllables and ten-ish letters——and more importantly it's not very wordish, since the two pieces get separated. Can we render the phrase with short morphemes? Latin has "ad-" meaning "towards". That's promisingly short. What would "pull" be? "Tract" is about right, as in "contract" = "together-pull". So we get "ad-tract" = "toward-pull", or to be phonologically smooth, "attract".
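The last step of that example, smoothing "ad-tract" into "attract", follows a regular pattern of prefix assimilation, so it can be sketched as a small lookup. This is only an illustration under stated assumptions: the table covers a handful of common spellings for "ad-", "con-", and "syn-", the function name `attach_prefix` is made up here, and real Latin and Greek morphophonology has more rules and exceptions (e.g. "ascribe" rather than "asscribe").

```python
# Assimilated spellings of three prefixes before particular root-initial letters.
# Common cases only; this is an illustrative table, not a complete rule set.
ASSIMILATION = {
    "ad": {"c": "ac", "f": "af", "g": "ag", "l": "al", "n": "an",
           "p": "ap", "r": "ar", "s": "as", "t": "at"},
    "con": {"l": "col", "r": "cor", "m": "com", "b": "com", "p": "com"},
    "syn": {"l": "syl", "m": "sym", "b": "sym", "p": "sym"},
}


def attach_prefix(prefix: str, root: str) -> str:
    """Join prefix and root, assimilating the prefix's last consonant if a rule applies."""
    if not root:
        raise ValueError("root must be non-empty")
    # Fall back to the unchanged prefix when no rule matches the root's first letter.
    form = ASSIMILATION.get(prefix, {}).get(root[0], prefix)
    return form + root


print(attach_prefix("ad", "tract"))   # attract
print(attach_prefix("con", "tract"))  # contract
print(attach_prefix("syn", "pathy"))  # sympathy
```

A table like this is only a first pass; the resonance and aesthetics checks described earlier still have to decide whether the smoothed form is a word worth keeping.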
Word formation
The ways that words are formed can be used as processes to generate new words.
Here's the list from the above section "Creating words", with some comments:
- Compounding.
- Compounding (including bound roots).
- Derivation. Is there a transformation of an idea that can be indicated with an affix, and whose inverse might make the idea easier to put into a word? E.g. instead of thinking of a good word for "X", think of a good word for "not X" and then add "un-" to the word. E.g. instead of trying to think of a good word for "the quality of being Xish", first come up with a good word for "Xish" and then add "-ness".
- Tmesis.
- Apophony.
- Inflection.
- Backformation.
- Liberated affix. Is there some word that expresses an idea that is somewhat in yon same direction as your idea? Can you (perhaps fictitiously) impute a bracketing of the word, that exposes a morpheme which can take the meaning of yon shared direction? For example, you want to name an enzyme that breaks up lactose, and you know that diastase breaks up something, so you extract "-ase" meaning "breaker-upper", and derive "lactase".
- Productivization.
- Blending (portmanteau).
- Clipping, truncation. This can be used to create shorter, hence more handily productive morphemes for ideas that want to be carried by productive morphemes.
- Acronym.
- Borrowing (importing). Since different languages were created for [overlapping but with all regions of symmetric difference (that is, disoverlap, h/t Duncan [LW · GW]) inhabited] contexts, other languages have words for ideas that you don't have words for. They also may have handier morphemes.
- Revival (self-borrowing).
- Loan translation (calque).
- Reduplication.
- Onomatopoeia (motivated root-creation).
- Ex nihilo root-creation. It might be helpful to use tools, built by conlangers, for generating phonologically feasible words, though making up sounds on your own might be fun and successful.
- Eponymization.
- Semantic development (change, progression).
- Changes to the sound of a word.
- Boundary loss.
- Grammaticalization. E.g. instead of trying to think of a new pronoun, try using a determiner as if it were a pronoun (e.g. "yon"). E.g. as candidates for markers for a grammatical category such as evidentials, consider shortening words or phrases that could express the feature.
- Spoken writing.
- Morpheme upgrading (bound to free).
- Conversion (zero-derivation).
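For the ex nihilo root-creation item in the list above, here is a toy version of the sort of generator a conlanger might build: it crosses onsets, vowel nuclei, and codas, then drops forms that are too long or already taken. Everything in it is an assumption for illustration: the inventories are tiny and hypothetical (not a claim about English phonotactics), and `EXISTING_WORDS` is a stand-in for a real dictionary lookup.

```python
from itertools import product

# Hypothetical, deliberately tiny inventories; swap in ones that suit the target language.
ONSETS = ["pl", "spr", "gl", "th", "m"]
NUCLEI = ["a", "o", "oy", "ee"]
CODAS = ["p", "g", "nd", "sk", ""]

# Stand-in for a real dictionary lookup, so that existing words are not re-proposed.
EXISTING_WORDS = {"ma", "mag", "mask", "mop", "glee", "glop", "gland", "plop", "thee"}


def candidate_roots(max_len: int = 6) -> list[str]:
    """Cross every onset with every nucleus and coda, dropping known words and long forms."""
    roots = []
    for onset, nucleus, coda in product(ONSETS, NUCLEI, CODAS):
        word = onset + nucleus + coda
        if len(word) <= max_len and word not in EXISTING_WORDS:
            roots.append(word)
    return roots


roots = candidate_roots()
print(len(roots), roots[:8])
```

Enumerating rather than sampling keeps the output reproducible, which makes it easier to skim the whole candidate list and apply taste.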
Ask a ninja language model
Language models can serve as good indexes to language. One can ask for a single word, in English or Greek or German or in any language, for some idea; or one can ask for a made up word; or one can ask for roots meaning some component of the idea, and then combine the roots. For example, here I ask ChatGPT for a word meaning "people walking together":
Taste has to be exerted. See here for my full attempt to get a replacement word for "tools for thinking". Eventually ChatGPT gave "paratithemi":
I recognized παρατίθημι as having a root shared with συντίθημι from which comes "synthesis". So I settled on "parathesizers", meaning a thing that puts things beside each other——which is the sort of thing that automated tools can help with.
ChatGPT takes some wrangling. Asking the question a few different ways (full dialogue here) eventually gave:
I liked "upspark".
Semantic development
Can the word's relation to the idea be patterned off known ways that words relate to ideas?
If there's some thing that's closely related to the idea, then see if words about the thing can be used to say the idea. Relations worth trying: nearby, similar, overlapping, analogous, intuitively resonant, reminiscent, causally entangled, evidentially entangled, more specific, more general, a part of, containing, sharing structure, sharing features, exemplifying, exemplified by, acting on, acted on by, predicating, predicated by, characterizing, characterized by, doing, done by. Can a semantic development from another language be imitated, as in the French semantic loan "souris" (originally meaning "mouse", the animal, now also the computer equipment, after English "mouse")? See semantic change and "Metaphors We Live By" by George Lakoff and Mark Johnson (IPFS).
The shared craft
People could work together to refine and share methods that fluently create good new words.
Deliberate lexicogenesis
People have consciously tried to create good new words, showing a want of a craft of lexicogenesis.
All language creation is in some sense intentional. Whoever speaks in a new way does so in order to communicate something that they didn't know how to more conveniently communicate in another way. Most creation of language is spontaneous——distributed, haphazard, bottom-up, organic, improvised, ex tempore. Some communities have created language in a way that's conscious, designed, organized, systematic, explicit, regulated——in a word, deliberate. Examples (discussed above in "Language makers"):
- Conlangers very deliberately choose whole systems of words.
- Programmers impose naming disciplines on themselves and on shared projects, debate the merit of different conventions and of different styles (verbose, descriptive, type-indicating, part of speech), and make tools for navigating names. Programmers are given a perhaps almost unique experience of creating whole systems of names with their own hands, only to return months or years later and find that they no longer speak the language they created and are forced to learn the language again like a newcomer.
- Language vitalization. While not highly regular, language vitalization involves concerted, sometimes institutional effort. For example, recently the Hawaiian Lexicon Committee has been creating new words for Hawaiian by very deliberately, thoughtfully considering how a new word will fit with the language.[26]
- Natural scientists. See especially the highly regular chemical nomenclature (organic, inorganic, explainer, cheatsheet). Biological taxonomic nomenclature is much less regular, in that the name for a new species is chosen mostly freely by whoever publishes about it first, but the naming is still governed by extensive rules (botanical code, zoological code). On the less systematic side, but still deliberate, many natural scientists deliberately work out words and whole terminologies and notations so that their specialized language is clear, learnable, expressive, and feasible as a shared standard.
Although these are examples of deliberate lexicogenesis, including shared and systematized lexicogenesis, they don't demonstrate very much of a shared craft of lexicogenesis. Programmers mostly make their names out of symbols, preexisting words, or short strings of preexisting words; and when they step outside of that envelope, they are on their own, without guidelines. Conlangers for the most part do not have the all-important feedback of seeing how words they make will fare in the wild demanding flux of needful communication——though for example Esperanto has found substantial purchase in minds that have to speak. Scientists seem to have some craft; where is it written down?
Shared lexicogenesis
People working together might accumulate shareable skills for making words.
If lexicogenesis is an individual creative act, how can there be a shared craft? Maybe there can't be. I don't know what it would look like or how to grow it. But, I would like to see what happens if such a craft tries to grow. Speaking vaguely, a shared craft of deliberate lexicogenesis might grow in these ways:
- Just thinking about the craft together; sharing methods, sharing results.
- Pooling desires and efforts.
- Creating and indexing resources and skilled people.
Seeds of the shared craft
Much of what this essay wants is just to avoid pluralistic ignorance about doing lexicogenesis together. Maybe there are lots of people who'd want to make up words for each other's ideas, and they just haven't said so where each other can hear.
Besides that, here are some specific ways that a shared craft might grow (though if it wants to grow, it must grow unprecircumscribedly):
Applying existing understanding
A lot of scientific work bears on lexicogenesis.
For example, a morphophonologist might be able to improve a morpheme's suitability for combining with other morphemes. Some sources of understanding:
- Languages.
  - Someone who speaks multiple languages can import words and morphemes from one language to another. They could find words from another language that match a description of an idea.
  - English words come from older Germanic languages, Latin, and Ancient Greek. That means English speakers are familiar with many morphemes from those languages. Speakers of those languages might be able to fluently find morphemes that are appropriate to a task and that could easily become familiarly productive for English speakers.
  - Any language will have morphemes, words, and grammatical structures that are novel relative to another language. For adding language to English, expertise especially in non-Indo-European languages might help. For example, "Polysynthesis for Novices, parts 1 & 2" by Lichen the Fictioneer (YouTube) gives many possible sources of morphemes and grammatical structures novel to English. This material could be copied directly, or could be taken as a template——imagine for example having affixes that mean things like "origin", "equilibrium", "equilibrium punctuation", "derivative", ...
- Word creators. People who have been making new words have skills that could be applied and could be made copyable.
  - Conlangers. What are some good resources discussing how to make words for constructed languages, specifically to satisfy the criterion of serving to say something that wants to be said to someone? The guides I've seen tend to focus mostly on, for example, phonology, which is sort of relevant but not central for lexicogenesis from within an existing language. The Lojban community probably has useful experience and ideas. There are useful ideas about lexicogenesis here: http://www.zompist.com/kitlong.html#lexicon
  - Scientists. Are there good resources explaining how good scientific terms are created? There are e.g. Nybakken[27] and Hoffman[16:1].
- The World Atlas of Language Structures, https://wals.info/. (To see what sorts of grammatical structures can fit in language.)
- Morphology, word formation. E.g. Plag[9:1], Marchand[28], Dixon[29]. (To see how to put new words together from pieces.)
- Language change. Neology, lexical innovation, lexicalization, morphemization, grammaticalization. (For examples of language change, to see the possibilities.)
- Etymology. Wiktionary, Etymonline, cognate dictionaries (Mallory[30], McPherson[31]). (To see how successful lexicogenesis happened, in outline; and to revitalize fossilized morphemes.)
- Semantics. Lexical semantics, semantic change, semasiology, onomasiology. (To suggest ways of putting ideas into words.)
- Phonology. Phonotactics, prosody, morphophonology. (To understand how new morphemes should sound so they're combinable. For example, the words "suitabler" and "practicedly" appear above, but they pose phonological problems.)
- Syntax. (To understand what sort of new grammatical structures might be feasible and useful.)
Shared space
To grow a craft, have a shared (cyber)space for that craft.
There, people can:
- Make requests for terminology (for morphemes, words, grammatical structures, and notation).
- Notice when multiple people have overlapping requests, which increases motivation and gives more context to triangulate the ideas. This is especially useful where there's a morpheme that doesn't exist, but would be very useful, but would take significant design effort.
- Work together on making terminology.
- Midwife a word (help someone birth a new word for an idea they couldn't already say fluently).
- Share metaphors from experiences that others haven't had or wouldn't think of.
- Share morphemes that others don't know or wouldn't think to productivize.
- Share expertise as described above.
- Describe and learn from experiences with trying to create language.
I propose this Zulip group as a shared place for lexicogenesis: https://lexicogenesis.zulipchat.com/login/
Expressivizing the morphemicon
A deeper store of meaningful elements combines to make a greater range of possible words.
A morpheme is, roughly speaking, a minimal meaning-bearing element of a language. Some English morphemes: -ing, cat, un-, 's, -ness, -ed, the, cardio-, so, re-, snap, ex-, -ology.
A morphemicon is a morpheme inventory for a language: the set of morphemes that combine to form the words of the language. (Also called "morphicon".) Overloading the word a bit, "morphemicon" can also mean the total inventory of morphemes held by some group of people. Here's the morphemicon for all human languages together:
A morphemicon is more expressive when it more readily puts ideas to words, for a wider range of ideas. With a more expressive morphemicon, more upsparks of thinking can be caught more precisely. A morphemicon can be expressivized in two ways:
- Making morphemes more productive: more able to combine to make words.
  - Shortening a morpheme.
  - Just using a morpheme, so that its own separate interpretability, as separate from the interpretability of words it participates in, is amplified. As a special case, fossilized morphemes such as "-fer" and "se-" could be revived.
  - Stretching the use of a morpheme, e.g. by using it in a metaphorical way, so the range of its application is stretched.
- Getting new morphemes.
  - Importing a morpheme from other speakers, e.g. from another dialect, language, or contextual lexicon.
  - Creating a morpheme from nothing, or by extracting it from another morpheme, e.g. by rebracketing.
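As a toy illustration of how a deeper store of elements yields a greater range of possible words, here is a minimal sketch. It assumes a deliberately crude prefix + root + suffix template and ignores spelling, phonology, and meaning; the class `Morphemicon`, its fields, and the mini-inventory are hypothetical names chosen for the example, not an existing tool.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Morphemicon:
    """Toy morpheme inventory: each morpheme maps to whether it is productive."""
    prefixes: dict[str, bool] = field(default_factory=dict)
    roots: dict[str, bool] = field(default_factory=dict)
    suffixes: dict[str, bool] = field(default_factory=dict)

    def candidate_words(self) -> set[str]:
        """All prefix + root + suffix strings built from productive morphemes.

        The empty string stands for "no prefix" or "no suffix". Strings are
        simply concatenated, so spelling rules are ignored.
        """
        pre = [""] + [m for m, ok in self.prefixes.items() if ok]
        root = [m for m, ok in self.roots.items() if ok]
        suf = [""] + [m for m, ok in self.suffixes.items() if ok]
        return {p + r + s for p, r, s in product(pre, root, suf)}


# Hypothetical mini-inventory, for illustration only.
inventory = Morphemicon(
    prefixes={"un": True, "re": True},
    roots={"snap": True, "fer": False},  # "fer" is present but fossilized
    suffixes={"ness": True, "ing": True},
)
before = len(inventory.candidate_words())

# Way 1: make an existing morpheme more productive (revive "fer").
inventory.roots["fer"] = True
after_reviving = len(inventory.candidate_words())

# Way 2: get a new morpheme (import a root).
inventory.roots["cardio"] = True
after_importing = len(inventory.candidate_words())

print(before, after_reviving, after_importing)  # 9 18 27
```

Most of the generated strings are not good words; the count only bounds the space that a speaker can then search for a form that fits the idea.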
Building resources
I don't know what, if any, shared resources might be useful for lexicogenesis. Some possible ones:
- Guides, written by people who have created useful terminology, about what worked for them.
- Lists of examples of experiences with trying to make terminology: descriptions of the idea to be put into words, criteria for the words, candidates, reasons the candidates were good or bad, and what worked.
- A morphemicon, written down, that tries for semantic completeness——if there's a morpheme with a meaning that no morpheme in the list already has, that morpheme should be added to the list, whatever language it comes from.
- A morphemicon (and a lexicon, and a list of grammatical structures) that's semantically searchable (e.g. using a large language model).
- A guide for phonologically converting morphemes between related languages according to historical sound changes, so that morphemes from different languages can be combined euphoniously.
- A way to explore cognates, as a way to productivize morphemes. I have a partial prototype of a system called radix that traverses Wiktionary to grow cognate trees. Let me know if you're interested in improving it. Example output:
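As a rough illustration of the cognate-tree idea, here is a minimal sketch. It is not radix and does not touch Wiktionary: it assumes a handful of hand-entered etymologies for one Proto-Indo-European root, and the names `Node`, `render`, and `forms_in` are hypothetical. An edge means "derived from", whether by inheritance or by borrowing.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One form in a cognate tree: a language, a form, and its descendants."""
    lang: str
    form: str
    children: list["Node"] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> list[str]:
    """Depth-first walk that returns one indented line per form."""
    lines = [f"{'  ' * depth}{node.lang}: {node.form}"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


def forms_in(node: Node, lang: str) -> list[str]:
    """Collect every form in the given language at or below `node`."""
    found = [node.form] if node.lang == lang else []
    for child in node.children:
        found.extend(forms_in(child, lang))
    return found


# Hand-entered toy data (a real system would fetch and parse etymology entries).
tree = Node("Proto-Indo-European", "*bʰer-", [
    Node("Latin", "ferre", [
        Node("English", "transfer"),
        Node("English", "conifer"),
    ]),
    Node("Ancient Greek", "phérein", [
        Node("English", "metaphor"),
        Node("English", "semaphore"),
    ]),
    Node("Old English", "beran", [
        Node("English", "bear"),
    ]),
])

print("\n".join(render(tree)))
print(forms_in(tree, "English"))
# ['transfer', 'conifer', 'metaphor', 'semaphore', 'bear']
```

Seeing "-fer", "-phor(e)", and "bear" side by side is the kind of view that could suggest which fossilized morphemes are candidates for revival.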
Objections (or, pitfalls)
Some reasons not to work on lexicogenesis, with responses:
Lexicogenesis just seems irrelevant to stuff that matters. Having more words doesn't help with thinking. You're noticing that good science comes along with new words, and then Goodha... uh, you're cargo-culting good science. The hard part of doing stuff that matters is doing stuff, doing experiments, observing, making hypotheses, making predictions, developing skills, implementing ideas; not... words. Lexicogenesis is a distraction.
This is clearly somewhat true. In a lot of areas, stuff like pipetting and looking through microscopes is going to accomplish far more than armchair reasoning. But still, all of those activities rely on concepts——concepts structure perception, attention, design, and hypothesis. For many arenas, the concepts don't need words, or the words already exist, or the words are a minor inconvenience compared to other major obstacles. But some of the most important stuff is new under the sun, and relies on new concepts. For new concepts, there has to be new thinking, which I think would be helped by better lexicogenesis. In other words, lexicogenesis already wants to happen, and I'm proposing to make the lexicogenesis that already wants to happen, happen faster and better. Lexicogenesis is one among many bottlenecks to difficult thinking.
Well, then the hard part is wrestling with ideas, not making up words.
This might be right. I think they're related——that's the hypothesis put forward in the section "A sense that more is possible" above.
There's already plenty of words. It's too many.
The question isn't how many words there are, it's whether we have the right words for the speaking we want to do. You can retire words suitable for alien contexts but not for your contexts. People can be overexuberant, but unneeded words can just go unused. Each word has to prove itself to speakers.
There's already plenty of words. It's too hard to learn even the relevant ones.
This is a problem, but it argues for better words that better compress what's necessary to think about.
There's already plenty of words. There's already words for whatever you'd want to make up words for.
This is a crux for me. I think it's not true in general. I do think it's partly true, and to that extent it implies a want of some better way of finding words that people have already crafted.
It's better to just rewrite what you're writing using existing words.
This is reasonable advice in many contexts. But it doesn't apply to a science studying some novel things.
Lexicogenesis is cringe. You're just making up words for no reason because you think it's cool to make up words.
See the section "What this essay is not about". I'm talking about the sort of lexicogenesis that you do when you're trying to describe something that you want to describe, but don't have the words to describe. I do think it's absolutely key to hug the query, stay close to the need——treat as very valuable the data of what words are actually in real life needed, and the specific criteria provided by those contexts of need. Having a need, having a sentence that you want to say but that's clumsy without the new word, is the gold standard for when lexicogenesis is actually wanted. It's not cringe or crankish to say "electron" or "methylation" or "phylogenetic" or "diffeomorphic", if you're talking about those things.
There's no "missing craft" to be developed. It comes naturally enough when you actually need a word. You just make a nonce-formation like "good manifold" or "strong agency" and then keep thinking, and figure out a better word along the way if you need one.
This might be right, but I'd wonder how you know that. I would like to know how scientists talk when they've seen something but don't know what it is. This story matches only some of my experience; I often want a word and then have to either go without a good word, or else do a bunch of work to find or make a good word. Shoddy nonces don't work that well——they don't resonate with the idea, they're confusing to a listener, they aren't self-documenting, they aren't memorable, they don't strongly evoke the ideas and questions.
Maybe lexicogenesis is useful, but people are mostly too busy.
There are always opportunity costs. But I think the time savings are sometimes deceptive. An analogy is technical debt: writing hacky code means that you'll write more code that relies on the hacky code, and you'll write other code that does work that should have been done by the elegant, correct, general version of your hacky code. With more and more code piled on top of wrong code, the cost of rewriting the code correctly goes up and up. Some people, though far from everyone, are too busy to not do good lexicogenesis.
There's no "missing craft" to be developed. Lexicogenesis is just an ad hoc hodgepodge of putting together morphemes or thinking of metaphors or examples.
This might be right, though again I'd wonder how you know, and I'd like to see what happens when people try to develop a craft. My experience of trying to make words suggests that there's lots of room for shared efforts (because people know a lot of words and examples and metaphiers that I don't know) and room for a shared craft (because there's skills I feel I'm doing a beginner version of, and because there's lots of scientific knowledge and knowledge of languages that I'm aware of without myself knowing).
That's a motte and bailey. Sure, there could be benefit from shared efforts, but that's not the same as a shared craft to be developed.
Fair enough. There's two separate points there, and the point about shared effort is more solid than the point about shared craft.
Lexicogenesis is complex and unpredictable, and if you try to deliberately construct words, you'll miss the constraints of the organic language.
This is partly true. Feedback from speaking is pretty necessary for making words that have a good chance of being suitable. There are many failed attempts at making suitable words. But there are also many successful attempts, many of which were deliberate. William Whewell on purpose came up with words such as "scientist", "linguistics", "ion", "anode", and "cathode".
There's not much to be gained from lexicogenesis. Thinking is already fully general, and is adapted to the regime where it's not super easy to make up good new words. Something an expert wordcrafter could do, someone else could do about as well without lexicogenesis.
This might be right. I'd like to find out though. I suspect there's a kind of utility being left on the table here. Because of cognitive miserliness, the effort of creating new concepts is put off as long as preexisting concepts can do the trick. So new concept formation is underinvested in. And lexicogenesis helps with new concept formation.
Lexicogenesis seems qualitatively different from using the words you have. It's analogous to having the ability to easily, fluently introduce new named subroutines in programming, compared to just using the subroutines already named (with an occasional laborious undertaking of rewriting the compiler to add another named subroutine, or something).
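A small sketch of the programming side of that analogy, using a made-up example: the first function spells out the same unnamed idea three times with preexisting operations, and the second introduces a name for it (`clamp`, a hypothetical helper) and then just uses the name.

```python
def brightness_inline(r: float, g: float, b: float) -> float:
    """Only preexisting 'words': the same idea is spelled out three times."""
    r = min(max(r, 0.0), 1.0)
    g = min(max(g, 0.0), 1.0)
    b = min(max(b, 0.0), 1.0)
    return (r + g + b) / 3


def clamp(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    """The new 'word': a name for 'force x into the range [lo, hi]'."""
    return min(max(x, lo), hi)


def brightness_named(r: float, g: float, b: float) -> float:
    """With the new word available, the sentence is shorter and says what it means."""
    return sum(clamp(c) for c in (r, g, b)) / 3


print(brightness_inline(2.0, -1.0, 0.5), brightness_named(2.0, -1.0, 0.5))  # 0.5 0.5
```

Both versions compute the same thing; the difference is that the named version makes the idea available for reuse and for thinking about on its own.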
Words cause verbal overshadowing. They could just as well make it harder to think, not easier.
That definitely seems like something that happens. I think lexicogenesis actually helps avoid verbal overshadowing. When you make a new word, it isn't burdened by the history and role of preexisting words, so it at least doesn't claim as strongly to give you what you need to know. And, if you can come up with new words quicklier and preciselier, you can "punch through" the muffling effect of verbal overshadowing on the real thing behind the words.
This does create an issue where people have their own idiolectic word for X, even though they really are talking about the same X. They resist treating their words as though they refer to the same thing, because they don't want to use the communal understanding of X——instead they want to make their own understanding. I don't know what if anything to do about this.
You propose lexicogenesis as especially helpful at the edge of thinking. But isn't the edge of thinking especially prone to verbal overshadowing?
I recommend against being satisfied with making words for muddled ideas. Instead try to be really clear, look at lots of concrete examples, cling to the thing itself, and only make new words out of necessity——only when there are sentences you want to say and thoughts you want to have that want a word. It's like in programming: first just make the hacky version that works, and only when you find yourself repeating yourself do you abstract substructures.
When there's jargon in a community, that amplifies verbal overshadowing. A newcomer is pressured to pick up the jargon, and so may adopt the word without the meaning. The newcomer won't know that ze has missed the meaning because ze can say the sentences that others say using the word. Ze doesn't make the word zer own.
(H/t Yulia Ponomarenko for that point.) This does seem worrisome. I'd hope that newcomers would feel licensed to refuse to pretend to understand a word. Caching words out into concrete examples is good.
Also, the jargon just makes things harder to understand, and pushes newcomers away.
It's a tradeoff, and pushes for treating word-slots as a resource. But if someone is actually a newcomer, if they are actually trying to come into a domain of discourse, then they will learn or make the words that are actually needed to discuss that domain.
If expressivizing the morphemicon is supposed to make you better at thinking, why aren't speakers of languages with more productive morphemicons (e.g. polysynthetic languages) much better at difficult thinking?
I don't know that they aren't, but I don't predict that they are. Their morphemicons are not (I imagine) expressivized for abstract domains much more than other languages, and the speakers aren't (I imagine) skilled at creating new morphemes on the fly much more than speakers of other languages. (If those generalizations aren't true, then I would predict that such speakers would be better at difficult thinking, all else equal.)
References
"Maximen und Reflexionen" by Johann Wolfgang von Goethe. IPFS, German, page 508. Link, English, search "foreign languages".
"The Nature of Paleolithic Art" by Dale Guthrie, 2006. (IPFS)
"Lexical Innovations" by Judith Becker Bryant in "Encyclopedia of language development", Patricia J. Brooks, Vera Kempe, 2014. (IPFS)
"Gone but not forgotten: persistence and revival in the history of English word loss" by Elizabeth Grace Wang, 2004, chapter 11. (PDF)
"Boojums All the Way Through: Communicating Science in a Prosaic Age" by N. David Mermin, pages 3-5. (IPFS)
"Grammaticalization in English: a diachronic and synchronic analysis of the 'ass' intensifier" by Wilson Joseph Miller, 2017. (PDF)
"How Medium Shapes Language Development: The Emergence of Quotative Re Online" by Stefanie Kuzmack, 2010. Page 293 in "Studies in the History of the English Language V", Elizabeth Closs Traugott, Bernd Kortmann. (IPFS)
See "Word-formation in English" by Ingo Plag, 2003. (IPFS)
See "Chinese: A Language of Compound Words?" by Giorgio Francesco Arcodia, 2007.
"Pidgin and creole languages" by Salikoko Mufwene, 2002.
"The emergence of Nicaraguan Sign Language: Questions of development, acquisition, and evolution" by Richard Senghas, Ann Senghas, and Jennie Pyers. (PDF) In "Biology and Knowledge Revisited: From Neurogenesis to Psychogenesis", 2014.
"Children Creating Core Properties of Language: Evidence from an Emerging Sign Language in Nicaragua" by Ann Senghas, Sotaro Kita, and Aslı Özyürek, 2004. (Sci-hub)
"'Sneak-shoes', 'sworders' and 'nose-beards': a case study of lexical innovation" by Judith Becker, 1994. (Sci-hub)
"Everyday Greek, Greek Words in English, Including Scientific Terms" by H.A. Hoffman, 1919. (IPFS)
"101. Individual initiatives and concepts for expanding the lexicon in Russian" by Wolfgang Eismann, in Word-Formation: An International Handbook of the Languages of Europe, Volume 3, eds. Peter O. Müller, Ingeborg Ohnheiser, Susan Olsen, Franz Rainer, 2015, page 1744 (196). (IPFS)
"Lexical innovation and variation in Hupa (Athabaskan)" by Justin Spence, 2016. (PDF)
"The Last Lingua Franca: English Until the Return of Babel" by Nicholas Ostler, 2010. (IPFS)
"Lexical Innovation in World Englishes: Cross-fertilization and Evolving Paradigms" by Patrizia Anesa, 2019. (IPFS)
"Lexical Innovation in Ghanaian English: Some Examples from Recent Fiction" by Edmund O. Bamiro, 1997. (Sci-hub)
"Derogatory Slang in the Hospital Setting", Brian Goldman, 2015.
"On the way to language" by Martin Heidegger, 1971. Slightly modified from the translation by Peter Hertz. (IPFS) German original here.
"The Timeless Way of Building" by Christopher Alexander, 1979. (IPFS)
"Rationality techniques as patterns" by Jessica Taylor, 2017. (Link)
See the section "An example of new word creation" in "Indigenous New Words Creation Perspectives from Alaska and Hawai'i" by Larry Kimura and Isiik April G.L. Counceller. In "Indigenous Language Revitalization Encouragement, Guidance & Lessons Learned" edited by Jon Reyhner and Louise Lockard, page 126 (PDF).
"Greek and Latin in Scientific Terminology" by Oscar E. Nybakken, 1959. (Libgen (djvu))
"The Categories and Types of Present-day English Word-formation: A Synchronic-diachronic Approach" by Hans Marchand, 1960. (IPFS)
"Making New Words: Morphological Derivation in English" by R.M.W. Dixon, 2014. (IPFS)
"The Oxford Introduction to Proto-Indo-European and the Proto-Indo-European World" by J.P. Mallory and D.Q. Adams, 2006. (IPFS)
"Indo-European Cognate Dictionary" by Fiona McPherson, 2018. (Libgen)
"Roget's Thesaurus of English Words and Phrases" by Peter Mark Roget, 1852. (IPFS)
"A Dictionary of Selected Synonyms in the Principal Indo-European Languages" by Carl Darling Buck, 1949. (IPFS)
5 comments
comment by romeostevensit · 2023-05-20T17:59:57.085Z · LW(p) · GW(p)
Here are Gendlin's videos on Thinking at the Edge (three parts, around 20 minutes total)
https://www.youtube.com/watch?v=Wv7rXHHBXDU
And inspired by the post I decided to try to come up with a better word for a thing I've been trying and repeatedly failing to communicate. I'll try this by using oobleck as a hyphenation for concepts that are able to be soft and flexible but firm up the more force you apply to them. So oobleck-boundaries is being soft enough to be open for anything but firm up if you get pushed too hard.
Replies from: TsviBT↑ comment by TsviBT · 2023-05-22T14:13:28.833Z · LW(p) · GW(p)
Oh, I ended up (through "non-Newtonian") with the same word for a similar idea! (I can't find any substantial notes, just a message to myself saying "mind as oobleck"; I think I was thinking about something around how when you push against an idea, test it, examine it, the idea or [what the idea was supposed to be] is evoked more strongly and precisely.)
comment by Mateusz Bagiński (mateusz-baginski) · 2023-09-04T13:08:44.458Z · LW(p) · GW(p)
think with Words
There are people who (report/claim that they) don't think in words. It looks like having internal monologue is a spectrum, perhaps related to spectra of aphantasia. (or maybe I'm misunderstanding what you mean here)
natural language is how we think
Again, are you gesturing towards something like Language of Thought?
Mathematicians create fractal vocabularies, making names from notation, from mathematicians (eponyms),
AFAIK, eponyms (naming inventions after their inventors) are ~unique to the West/WEIRD culture. (source: The WEIRDest People in the World; cites Wootton's The Invention of Science)
A wider net of possible words (a more expressive morphemicon) catches a wider variety of upsparks.
Maybe completely unrelated but reminds me of some observation that the set of 20-ish basic aminoacids used by terran life seems optimized for covering a sufficiently diverse range of parameters of the aminoacid-space.
Request for term: a plural non-person pronoun
Trivium/datapoint: Polish has three grammatical genders in the singular form (standardly: masculine, feminine, and neuter) but two in the plural form (plural-personal-masculine and plural-everything-else). Closely related Czech also has the same three grammatical genders in the singular but they don't change with pluralization, e.g., there are separate "they" for "plural-he", "plural-she", "plural-it".
Request for term: more flexible pronouns.
See: https://en.wikipedia.org/wiki/Grammatical_person#Additional_persons Also, my impression is that Lojban has some of the features you're thinking about (?)
Request for term: sometimes a person says A in context C1 meaning X1, and then says A in context C2 meaning X2, and X1 ≠ X2. What do you call A, X1, and X2?
(Partially) parametrized concepts?
Overall, I'm slightly surprised by no mention of dath ilan, as they seem to have invested quite a lot of labor-hours into optimizing language, including in some of the directions you sketch out in this post.
Replies from: TsviBT↑ comment by TsviBT · 2023-09-07T23:30:18.369Z · LW(p) · GW(p)
It looks like having internal monologue is a spectrum, perhaps related to spectra of aphantasia
IDK about people who claim this. I'd want to look at what kinds of tasks / what kinds of thinking they are doing. For example, it makes sense to me for someone to "think with their body", e.g. figuring out how to climb up some object by sort of letting the motor coping skill play itself out. It's harder to imagine, say, doing physics without doing something that's very bound up with words. For reference, solving a geometric problem by visualizing things would probably still qualify, because the visualization and the candidate-solution-generator are probably structured by concepts that you only had because you had words.
optimized for covering a sufficiently diverse range of parameters of the aminoacid-space.
Interesting. Didn't know about that. That reminds me of phonemes.
Additional persons
Oh cool. Yeah, lojban might.
(Partially) parametrized concepts?
Neh. I mean to ask for a word for [a word that one person has used in two different ways--not because they are using the word totally inconsistently, using it in two different ways in the same context, but because they are using the word differently in different contexts--but in some sense they "ought" to either use the word in "the same way" in both contexts, or else use two different words; they are confusing themselves, acting as though they think that they are using the word in the same way across different contexts]. (This requires some analogy / relation between the two contexts, or else there's no way to say when someone uses a word "the same way".)
Overall, I'm slightly surprised by no mention of dath ilan, as they seem to have invested quite a lot of labor-hours into optimizing language, including in some of the directions you sketch out in this post.
All I've read about dath ilan is the thing about moving houses around on wires. Where is it described what they do with language?
Replies from: mateusz-baginski↑ comment by Mateusz Bagiński (mateusz-baginski) · 2023-09-08T07:59:36.202Z · LW(p) · GW(p)
I'd want to look at what kinds of tasks / what kinds of thinking they are doing.
I don't have specific examples in the literature of people without internal monologue but here's a case of a person that apparently can do music without doing something very bound up with auditory imagination.
A case study of subject WD (male, 55) with sensory agnosia (auditory and visual) is reported. He describes his experiences with playing music to be similar to the experiences of people suffering from blindsight, maneuvering blindly in the auditory space, without the ability to imagine results of next move (hitting piano key). Yet after a long period of learning WD is able to improvise, surprising himself with correct cadencies, with no conscious influence on what he is playing. For him the only way to know what goes on in his brain is to act it out.
Anecdotal case: I worked with a person who claimed to have absolutely no inner monologue and "thinking in one's head" seemed very weird to her. She's one of the most elaborate arguers I know. A large part of her job at the time was argument mapping.
All I've read about dath ilan is the thing about moving houses around on wires. Where is it described what they do with language?
Mostly smeared across ProjectLawful (at least that's where I read about all of it). Usually, it's brought up when Keltham (the protagonist from dath ilan) gets irritated that Taldane (the language of the D&D world he was magically transported into) doesn't have a short word (or doesn't have a word at all) for an important concept that obviously should have a short word. Some excerpts (not necessarily very representative ones, just what I was able to find with a quick search):
Occasionally Keltham thinks single-syllable or two-syllable words in Baseline that refer to mathematical concepts built on top of much larger bases, fluidly integrated into his everyday experience. link
The Baseline phrase for this trope is a polysyllabic monstrosity that would literally translate as Intrinsic-Characteristic Boundary-Edge. A translation that literal would be misleading; the second word-pair of Boundary-Edge is glued together in the particular way that indicates a tuple of words has taken on a meaning that isn't a direct sum of the original components. A slight lilt or click of spoken Baseline; a common punctuation-marker in written Baseline. link
"We've pretty much got a proverb in nearly those exact words, yeah." He utters it in Baseline: an eight-syllable couplet, which rhymes and scans because Baseline was designed in part to make that proverb be a rhyming couplet. link