The possible shared Craft of deliberate Lexicogenesis

post by TsviBT · 2023-05-20T05:56:41.829Z · LW · GW · 5 comments

Contents

  Prefatory notes
    Disclaimers
    Acknowledgements
    Random access essay
    Synopsis
    Condensation
  Rhapsody of words
  What is lexicogenesis?
    Creating words
    Language production in general
    Language makers
    What this essay is not about
  A sense that more is possible
    Survivorship bias for expressibility
    The feral botanist
      Unspeakable mind
    Spark of thought
      Sparks
      Burning
      Lost upsparks
      Aside on withdrawal and the leap
    Scaffolding thought
      Kindling
      Firebreak
      Catching upsparks
    Palinsynopsis
  More reasons for lexicogenesis
    Overview of reasons
    Theoretical reasons
    Examples
      General examples
      Examples from AGI alignment
      Personal examples
      Other
  Desiderata for words
  Seeds of the craft
    Try and say :)
    General motions
    Rooting in the criterion
    Preexisting words
    Boiling it down
    Word formation
    Ask a ninja language model
    Semantic development
  The shared craft
    Deliberate lexicogenesis
    Shared lexicogenesis
    Seeds of the shared craft
      Applying existing understanding
      Shared space
      Expressivizing the morphemicon
      Building resources
  Objections (or, pitfalls)
  References

[Note: crossposted from https://tsvibt.blogspot.com/2023/05/the-possible-shared-craft-of-deliberate.html.]

Words are good. Making more good words is good. Being better and faster at making more good words would be more good. Maybe we can get better and faster at making more good words by working together.

Prefatory notes

Disclaimers

Wer fremde Sprachen nicht kennt, weiß nichts von seiner eigenen.

(Whoever doesn't know foreign languages, knows nothing of his own.)

——Johann Wolfgang von Goethe[1]

Since I only speak English, my perspective is English-centric and more generally Indo-European-centric, and this essay will fail to integrate huge regions of the possibilities of language. Since I'm not a linguist, there will be errors and incompletenesses in this essay. Since I work on AGI alignment, recent examples of language creation will be drawn from people working on alignment.

This essay is speculative, and emphasizes a vision that's exciting to me.

Acknowledgements

Thanks to Rafe Kennedy and to TJ for useful conversations about lexicogenesis. Thanks to Sam Eisenstat for spiritually related conversations. Thanks to Daniel Filan for comments on a draft.

Random access essay

Sections and some subsections of this essay can be read out of order without losing much. It's a long essay, so I'd encourage looking around for something interesting.

Synopsis

Lexicogenesis is the creation of new words. People do lexicogenesis when they have to talk about something new. When people have to think difficult new thoughts, they need new language. By working together, people could help each other make new language, and could develop a craft of lexicogenesis that people could use to come up with suitable new language. If you have ideas that might need new words to carry them, or if you want to help people come up with words, or if you want to make a shared craft of lexicogenesis, maybe say so in the comments or join this Zulip group.

Condensation

Extended table of contents:

Rhapsody of words

Humans carry the world——outer and inner, object and thought——with them, in their Words.

Humans encounter novelty. A strange beast, a tasty plant, a glowing destroyer and warmth giver, an alien tribe; a glinting ore, an adamantine symmetry in a diagram; a stone that moves another stone without touching it; rage, terror, and ecstasy; perspectival vision frozen and flattened onto a canvas——an infinite self-transforming kaleidoscope.

Humans encounter novelty. Not just a mute, undirtied sightseeing, but interest (inter-esse, being-amongst)——a muddy, fighting, duck-your-head-to-climb-inside encounter.

When humans meet, ponder, taste, reckon, carry, resist, or play with a Thing, they do something no other animal does: they speak it. In speaking the Thing, a human takes the Thing with zer, even if the material object is left on the ground. The human sings about the thing around the fire. It lingers with zer; ze paints it on the rock face, bringing the Thing back more fully into zer mind's eye:

(Guthrie, page 4.[2])

Even with the Thing not there, the humans accumulate thoughts, ideas, intentions, and information about the Thing. Humans gather thought around Names. And humans think with Words, which don't have to stand for Things, but rather can in general gather and deploy thought into any shape.

The humans carve words in stone, bone, and clay, originarily inventing solidified speech in many times and places:

Uruk proto-cuneiform, Iraq, c. 3050 BCE

Sumerian cuneiform, c. 2600 BCE

Egyptian hieroglyphs, c. 2300 BCE

Indus Valley script, c. 2600–2000 BCE

Oracle bone inscriptions, China, c. 1050 BCE

Olmec Cascajal block inscription, c. 900 BCE

Maya script, Dresden Codex, c. 1100 CE

Even with the speaker not there, and the thing long gone, the word can be heard.

How do humans speak thought? How do humans put the world into words?

How did you get up there?

And yet you know tens of thousands of words and combine them to speak suitably in a vast range of possible situations.

Like a scaffold for builders, like a bush's branches holding up a delicate spider's web, like a crystal growing by knitting together molecules pulled from the froth, language borders the delicate frothy edge of thinking.

In our speech and thought, what desire paths want to form? If every thinker had a thousand lifetimes to craft thought, what words and word-making craft would be created?

What is lexicogenesis?

Creating words

Lexicogenesis is the creation of words.

A thousand years ago, the words we speak (in English) were either nonexistent (such as "laser"), waiting latent in the possibility-space implied by the material available (such as "electron"), or scattered across many lands in proto-form; ten thousand years ago, probably almost all the words we speak today were nowhere to be found on the face of the Earth; and a million years ago, there were fairly likely no words at all. These words came from us, somehow. As language is a human universal, the creation of words is a human universal. Lexicogenesis is found in every child.[3] It is found in such abundance that children can create whole new languages when growing in an environment lacking stable language (creoles emerging from eclectic, unstandardized pidgins[4]) or even almost entirely lacking accessible language (the creole-like Nicaraguan Sign Language that emerged almost de novo in the 1980s among children).

At some moment, a language can leave open a role in speaking and thinking that ought to be played by some word, but that no word is currently playing. How does a language come to have a word to play an unfilled role? The process can be called lexicogenesis. "Lexicogenesis" emphasizes word creation as a deliberate activity, done by speakers who have a language and need new words for that language.

Here is a definitely totally complete list of ways that words and roots are created:

Spoken writing goes far enough that Serge Lang had the chutzpah to title a textbook just SL₂(R):

Language production in general

Lexicogenesis stands in for all forms of creating new language.

First of all, if lexicogenesis is the creation of words, what even is a word?

This is a difficult question which many people have thought hard about. For example, is "apartment building" a word? Of course not, it's a phrase... except that prosodically, it's one word: it has one major stress. If you say "There's a green building.", there is more stress on "building" than in "There's an apartment building." (unless you're saying, no it's not a red building, it's a green building), so "apartment" isn't just some sort of adjective.[9] And the words "apartment" and "building" intuitively seem to occur next to each other, in that order, far more frequently than one might have expected (they're a collocation). For an even weirder example: "big-plate" in Chinese is like a word (you can't say "very big-plate", because "big-plate" is a noun), but also rather like a phrase and not like a word (you can't say "white big-plate" because that would be the wrong order of adjectives, you'd have to say "big white plate").[10]
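One way to make "more frequently than one might have expected" concrete is pointwise mutual information (PMI), a standard collocation measure that compares how often a pair occurs with how often it would occur if the two words were independent. The sketch below is an illustration only: the measure is swapped in here rather than taken from the essay, and the toy corpus and the function name `pmi` are made up for the example.

```python
import math
from collections import Counter

def pmi(tokens, first, second):
    """Pointwise mutual information (in bits) of the adjacent pair (first, second)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    pair_count = bigrams[(first, second)]
    if pair_count == 0:
        return float("-inf")  # the pair never occurs side by side
    p_pair = pair_count / (len(tokens) - 1)   # observed rate of the pair
    p_first = unigrams[first] / len(tokens)   # rate of each word on its own
    p_second = unigrams[second] / len(tokens)
    return math.log2(p_pair / (p_first * p_second))

# Hypothetical toy corpus: "apartment" is always followed by "building",
# while "green" is followed by several different nouns.
toy = ("the apartment building has a green door and a green roof "
       "near the apartment building and a green building").split()

print(pmi(toy, "apartment", "building"))
print(pmi(toy, "green", "building"))
```

On this toy corpus "apartment building" scores higher than "green building", which matches the intuition that the former behaves more like a single unit. A real test would need a large corpus, and PMI is known to overrate rare pairs.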

Is the "'s" at the end of a possessive, like "Alice's", a word? Is "ice cream" one word or two words? Some ways to newly use words don't clearly create new words: clipping, borrowing, and in general semantic development. Are these lexicogenesis? Would creating a new bound morpheme count as lexicogenesis?

Thankfully, these questions don't need to be answered before creating new language. Lexicogenesis is a synecdoche for the creation of new language in general. (What is a good word for that? "Glossopoesis"?) Glossopoesis can involve creating new:

Language makers

In all new periods and realms, people have created new language.

Wherever there's a creative froth——new phenomena, new ideas, new events, new self-transforming stories, new contexts——people come up with words and syntax to speak new thoughts. They've engaged in what could be called lexicogenesis, onomaturgy, logogenesis, glossopoeisis, wordsmithing, neology, semiurgy, lexical innovation, morphological derivation, lexicalization, word formation, or simply: making new words and making new language.

What this essay is not about

Some activities not covered by this essay:

A sense that more is possible

Don't you remember, in the prehistory of your waking soul, when every name was new?

"A Sense That More Is Possible" argues that there's no formidable shared art of rationality because people don't have the sense that such an art could exist. This section tries to gesture at a sense that more is possible with language——that thinking that matters is thirsty for new ways of speaking.

Pointing at a sense that more is possible with lexicogenesis is a bit like going to a monastery. In the monastery there's a monk who has lived only there for his entire life, knowing only those mountain paths. Now, try to justify to him the utility of learning one's way around a new place. At one time he did learn his way around new places, but he's long forgotten that time.

Survivorship bias for expressibility

Ideas without words are lost, so it seems as though all useful ideas already have words.

In "The words in science fiction", Larry Niven writes:

The "Newspeak" of 1984 was a language so designed that certain thoughts would be unthinkable in it. One must wonder if certain thoughts, crucial thoughts, are unthinkable in English, or in any human language, including mathematics.

We can think of a bunch of ideas that we like, and then check whether there are adequate words to express each idea. We will almost always find that there are adequate words. To conclude from this that we have an adequate lexicon in general, would ignore a survivorship bias. We can think of the ideas that we have words for, much more easily than we can think of the ideas we don't have words for.

All those forgotten ideas, concepts, and mental motions, the ones that weren't rightly put into words——the ideas in the past, and the ones that will come up in the future——there is gold there. How many times have you heard or said or thought a phrase like "...which there isn't a good word for..."? The referents of those phrases were lost. There are more ideas lost to wordlessness than we know. Stefan George's poem "The Word" [23]:

Wonder from distant land or dream
I carried to my country's seam

And waited till the twilit norn
Had found the name within her bourne—

Then I could grasp it tight around
Now blooms and shines it, through the bound...

Once, I returned from happy sail,
with a prize so rich and frail,

She sought for long and tidings gave:
"No suchlike sleeps in this deep cave."

Thence escaped it from my hand—
The treasure never graced my land...

So I renounced and sadly see:
Where word breaks off no thing may be.

The feral botanist

To see what is even happening at all in the blooming, buzzing confusion, demands words.

Suppose you are a botanist, but a feral one. You'd like to describe plants carefully and in detail, so that you can distinguish different species and discern when a plant is growing healthily or not. But, you've not been enculturated into botanical vocabulary. What do you see here? How would you say what you see?

[Cropped from Steven Lucas, "Aroid (Araceae) and Tropical plant Botanical Terminology with Latin pronunciations", photo copyright 2010 Leland Miyano.]

What I (another feral botanist) see is a curled dark green leaf with red veins.

What an expressive and discerning botanist sees is a plant with, among other features, "supervolute vernation and leaf blades with scalariform secondary venation". To translate: When a new leaf blade of this species emerges, it is a single leaf blade, and one edge of the blade is curled inward while the other is curled around the first, so that the whole blade forms a spiral. The secondary veins (which come out of the primary central vein) of the leaf blade are parallel and spaced evenly, so that they are arranged uniformly like the rungs of a ladder.

You and I, feral botanists, see less, and bring back less to our country, than does the wordful botanist.

Unspeakable mind

Minds are especially big and murky and we don't have good words for minds and it would be nice if we did.

Imagine that we lacked words for anything mental. We can talk about non-person objects, and people's bodies, and we can describe low-level behavior, like "noise is coming out of his mouth" or "her hand is going upward". But we don't describe mental activity. We don't talk about thinking, knowing, belief, concepts, ideas, memory, understanding, bias, desire, attitude, personality, emotion, and so on.

In some ways this would be fine. We could still, for example, say "her body is traveling in a straightish line, so if I walk on this side of the sidewalk, we won't collide". But in a lot of ways we'd be very confused. Why are people buying a lot of toilet paper all of a sudden? (We can't say "they believe there will be a shortage and want to ensure their supply".) We can't pass on stories about people making decisions, learning things, or being in conflict, so we can't accumulate familiarity with and knowledge about those things.

We are still in this position with respect to minds (intelligence, the power of mind over the world, values, learning). We still lack the words and ideas to describe well what happens so that minds (and something within minds) come to determine the course of the world.

Spark of thought

When thought is sparked, it wants to burn too fast and widely for the mind to keep up. Urgent fragile thinking needs scaffolding.

In some forms of thinking, there comes a time, in the course of minutes or hours, when paydirt is glimpsed. Old questions are renewed, stagnant ideas are agitated and can be reforged, provisional concepts are connected to what they waited for, answers are nucleated and nourish new questions; and the unseen movements of the thinking thing are exerted, applied, and thereby intimated and adumbrated. The paydirt is prone to be mostly swallowed back up by the Earth. Lexicogenesis might better support the mineshafts and break the rock to keep the paydirt open.

Sparks

A spark of thought may come where words fail, when reality is glimpsed or grasped despite the clumsy language.

Some metaphorical situations where words fail, but still thinking may be called for:

Burning

Burning thought gives opportunities to forge ideas.

Lost upsparks

Precious embers float away from a burning thought.

Some thinking is too much, and opportunities are left by the wayside. There's too much material, too many ideas and connections, too much to hold in memory, too many questions and free parameters; the thinking has gone too far down the path, too far past the edge, too many ideas have been pried loose, too many criteria are brought to bear. The solutions don't come easily enough to cope with the wild as it grows, and the thinking is lost in the wild. Connections and possibilities compete for attention, with too many losers. Without names for what is there, what is there can't be brought back from the thinking.

When external objects are available, thinking can be supported in external objects, which maintain themselves. But thinking that deals with diaphanous things is more urgent because more fragile. Urgent fragile thinking needs scaffolding.

Aside on withdrawal and the leap

Some realms of thinking withdraw, and can only be reached with a running leap.

A glimpsed thing withdraws, hides, runs away. The glimpse is cut off from the thing; the glimpse is assimilated to preexisting understanding. The thing is slippery, evades grasp; the pressure of the grasp pushes the thing away. The thing doesn't stay put, doesn't want to be enclosed. The thing slips through fingers like vapor, displaced by what grasps.

Reasons for withdrawal:

A thing that withdraws might not be reachable without a running leap, or maybe an orthogonal approach.

Chasing a thing that withdraws is like navigating in a hyperbolic space. The mark is always missed. The attempts to lay out the thing clearly are incomplete, askew, and false. Another course correction is always needed, as if the thing is repulsive. Course correction requires detecting the error and the direction of improvement, which at least requires clarifying the pattern——with words, maybe——of what is to be turned away from.

Scaffolding thought

Fuller language and adepter lexicogenesis might better ignite, feed, channel, and preserve the spark at a heartier burn.

If a thinking is too big and many-threaded to complete, it has to be abandoned. If the unsettled platform of the thinking is backfilled and shored up, compressed, and made handy, then the thinking can be returned to with a better prospect of progress.

Kindling

Words push up to the edge of what's speakable.

Firebreak

Words wrangle thought.

Catching upsparks

Words put thoughts into time, letting them unfold across episodes of thinking.

Palinsynopsis

A greater ability to make new words opens a greater ability to put new thoughts into words.

Having more and suitabler concepts would make understanding expand further. To get more and suitabler concepts, look at unspeakable things and bring them back to speakability. Creating words quicklier, preciselier, and with a greater ambit, makes it easier to bring back unspeakable things. People at the edge of thinking need to have more and suitabler concepts and mental motions, which are unspeakable.

There is a certain kind of computation which has to happen: putting thoughts into words. Lexicogenesis isn't the same as that computation, but it is related and would support and enrich and accelerate that computation, at many steps along the computation and with compounding benefit. It's not that there should be more lexicogenesis for its own sake, but rather that lexicogenesis wants to happen more than it already does.

More reasons for lexicogenesis

Overview of reasons

Lexicogenesis is shown to enhance thinking by its history, by its living role in thinking, and by its possibilities.

The previous section "A sense that more is possible" argues that there are riches to be pulled from the burning edge of thinking into explicitly analyzable discourse. This section gives some more reasons for lexicogenesis, both as justifications for allocating attention to it and as desiderata for doing it well.

The next two subsections will give some theoretical reasons that new words are good, and examples of good, bad, and needed new words.

Some other sorts of reasons:

Theoretical reasons

Words bring reality into light.

In his essay "Sapir-Whorf for Rationalists", Duncan Sabien lays out five claims, quoted here:

  1. New conceptual distinctions naturally beget new terminology.
  2. New terminology naturally begets new conceptual distinctions.
  3. These two dynamics can productively combine within a culture.
  4. That which is not tracked in language will be lost.
  5. The reification of new distinctions is one of the most productive frontiers of human rationality.

Some more overlapping powers of words:

Some of the "37 Ways That Words Can Be Wrong" can be inverted or reframed to give ways that words can be right. For example:

The act of labeling something with a word, disguises a challengeable inductive inference you are making.

Also: The act of labeling something with a word concisely wields an inductive model.

You argue about a category membership even after screening off all questions that could possibly depend on a category-based inference.

You ask whether something "is" or "is not" a category member but can't name the question you really want answered.

Also: In practice, you don't know whether all possible relevant questions have really been screened off. Even if all the questions you've asked have been answered, you might suspect that there are distinct inductive nexi of reference to be discerned. So you might be asking which nexus applies to the object at hand, so that you can model what will happen when you expand the domain of discourse and ask new questions about the object.

You allow an argument to slide into being about definitions, even though it isn't what you originally wanted to argue about.

You argue over the meanings of a word, even after all sides understand perfectly well what the other sides are trying to say.

Also: We have senses of elegance and efficiency for concepts and words, like we have these senses about computer code. Those senses point toward well-engineered concepts and words.

Examples

General examples

Examples from AGI alignment

Personal examples

Language I want (some of these set a bad example in that they don't but should give real contexts where the term is wanted):

Language I flubbed:

(There are more examples of this, but I'm not easily recalling them...)

Language I found:

Other

Desiderata for words

A new word should, in its sound and structure, well-serve a needed role in a communal context of thinking.

Here's an incomplete list of overlapping, not equally important, mutually incompatible, overly demanding criteria (which needn't be met, but can point the way) that describe what makes a good word:

These criteria can be heard as describing forces bearing on a word. When a word balances the forces that bear on it, the word has the quality without a name.[24] A greater ability to create these little patterns might support, as the spoken substrate, a pattern language for thinking and living.[25]

Seeds of the craft

Here are some starting points for learning to come up with useful words.

Try and say :)

Since there's no systematic craft of deliberate lexicogenesis, you're not missing out on too much if you just do what comes natural when there's something you want to say and you don't have the words to say it. You can just make up words by whatever means will work and see if you like the words you made up.

Children do it intuitively, and as described above in "Language makers", lots of people make up words with no systematic method. That's how most words get invented. There's no rule against making up words (despite what you may have been told). It's fun! It's like being God: let there be bootpuddles, let there be borogoves, let there be boojums and upsparks and endosystemic novelty. And the way to learn to play chess well isn't to ask "Which opening should I play?" or "What books should I read?" or "Will I be able to get good at chess?". The way to get good at chess is to play chess.

General motions

Rooting in the criterion

A new word is needed because there's a new proposition to be spoken.

An early step in finding a word for the idea is to clarify the idea by thinking the idea more thoroughly. Is there already a clear definition or synonymous phrase for the idea? Is there a central example of the idea or that evokes the need for the idea? Try expanding the domain of discourse: give examples, counterexamples, borderline cases, extreme cases, and other dimensions that flesh out and demarcate what the idea is and isn't about, what it does and doesn't say.

Is this idea clearly, convincingly a thing? Can I do without it, or say it perfectly well with expressions that already exist? Are there other factorings of the idea?

A central reason to make a new word is to be able to say a new sentence. A sentence that would use yon new word gives a criterion for yon: yon should make the sentence useful, make the sentence say what you wanted to say through it. Try just writing the sentence out using a placeholder, such as a candidate word for the idea, or a phrase in brackets that gives the idea. What makes the sentence useful, and what should the word say to support that use? To triangulate the idea, write more sentences that use it. Does the context suggest a handle for the idea, such as a distinguishing feature, an exemplar, or a metaphor?
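The placeholder exercise can be done by hand, but a tiny script makes it easy to read several candidate words in several sentences at once. This is only a sketch: the `[WORD]` marker, the function name `try_candidates`, and the two test sentences are hypothetical choices for illustration, with "upspark" and "ember" borrowed from elsewhere in this essay.

```python
def try_candidates(sentences, candidates, placeholder="[WORD]"):
    """Return {candidate: [each sentence with the placeholder filled in]}."""
    return {
        word: [s.replace(placeholder, word) for s in sentences]
        for word in candidates
    }

# Hypothetical sentences that the new word is supposed to make sayable.
test_sentences = [
    "I had a good [WORD] on the walk, but lost it before I could write it down.",
    "Good notes are a way of catching each [WORD].",
]
candidates = ["upspark", "ember"]

filled = try_candidates(test_sentences, candidates)
for word, versions in filled.items():
    print(word)
    for sentence in versions:
        print("   ", sentence)
```

Reading the filled-in sentences side by side is the point. The script does no judging; whether a candidate makes the sentence say what you wanted is still a matter of taste.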

Try making explicit criteria for the word. What should the word suggest and emphasize? What should it distinguish itself from or avoid suggesting?

Preexisting words

Is there already a word for the idea?

If there's a word that's sort of in the ballpark, try looking up synonyms for that word.

Maybe the idea is really just an instance of an idea that already has a word, plus some details that don't warrant a whole new word.

Is the idea something that some group of people have probably dealt with, and so probably have a word for? For example, most living things that most of us encounter already have names, even if we don't know them. Maybe you can find who has already discussed the idea and see what words they used.

(Aside: Note though that pickiness can be good. Although "neology" is a standard term for "the creation of new words", I just don't like it, perhaps only aesthetically. It's a bit dysphonious to my ears, and I don't much like "neologism" either, maybe because of the association with clunky forgettable initialisms, cutesy acronyms, and groanworthy pointless portmanteaus, or "pointlanteaus" as they are called. After reflection I can say that "neology" emphasizes newness, which maybe explains why the first dozen or so results on Google Scholar for "neology" are about social aspects of neologisms——the newness is about a language community. The project of this essay is the activity, the cognitive process, the craft of creating new words, not the social event called neology. Thus lexicogenesis overlaps and draws on etymology and morphology, and focuses centrally on the problem posed to the wordcrafter. A term that's more general, to include creating phrases, grammatical structures, and notation, might be better——maybe "glossopoeisis". "Word formation" and "morphology" exclude, for example, ex nihilo root creation and semantic development.)

Boiling it down

Can the idea be rendered in a short phrase?

Try to distill the idea into a combination of a few, mostly short words. That phrase might already be a good term for the idea. The words in the phrase might suggest a good single word.

If the words express simple ideas, there might be a morpheme in some language that says that idea very succinctly. E.g. "together" in English is three syllables and eight letters, but Latin "con-" and Greek "syn-" are each one syllable and three letters. Try looking at lists of morphemes that you might be familiar with and see if you can make a suitable word from them. E.g. see Wiki's list of Greek and Latin roots that show up in English words, and this short list.

A fictional example of this procedure:

There's the idea: "when something makes something else get closer to it by pulling on it". How can this be boiled down to a short phrase? What about just "pull toward", like "something pulls something else towards it"? That's not bad, but it's a bit long——it's two-and-a-half syllables and ten-ish letters——and more importantly it's not very wordish, since the two pieces get separated. Can we render the phrase with short morphemes? Latin has "ad-" meaning "towards". That's promisingly short. What would "pull" be? "Tract" is about right, as in "contract" = "together-pull". So we get "ad-tract" = "toward-pull", or to be phonologically smooth, "attract".
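The fictional procedure above can be sketched mechanically: look up a short morpheme for each word of the boiled-down phrase, then join them with a smoothing rule. Everything here is an assumption made for illustration: the three-entry gloss table, the function names, and the single rule, which is a simplified version of how Latin "ad-" assimilates to a following consonant (as in "accept", "affect", "attract").

```python
# Tiny hand-picked gloss table (hypothetical); a real morpheme list would be far larger.
GLOSSES = {
    "toward": "ad",
    "together": "con",
    "pull": "tract",
}

def join_morphemes(prefix, root):
    """Join a prefix and a root, assimilating 'ad-' before some consonants (simplified)."""
    if prefix == "ad" and root[0] in "cfglnprst":
        prefix = "a" + root[0]  # ad + tract -> at + tract
    return prefix + root

def coin(phrase):
    """Turn a two-word gloss phrase such as 'toward pull' into a candidate word."""
    first, second = phrase.split()
    return join_morphemes(GLOSSES[first], GLOSSES[second])

print(coin("toward pull"))    # attract
print(coin("together pull"))  # contract
```

The interesting work is in choosing the glosses and in judging the result, neither of which the code does. It only shows that the combining step is regular enough to be written down.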

Word formation

The ways that words are formed can be used as processes to generate new words.

Here's the list from the above section "Creating words", with some comments:

Ask a ninja language model

Language models can serve as good indexes to language. One can ask for a single word, in English or Greek or German or in any language, for some idea; or one can ask for a made up word; or one can ask for roots meaning some component of the idea, and then combine the roots. For example, here I ask ChatGPT for a word meaning "people walking together":

Taste has to be exerted. See here for my full attempt to get a replacement word for "tools for thinking". Eventually ChatGPT gave "paratithemi":

I recognized παρατίθημι as having a root shared with συντίθημι from which comes "synthesis". So I settled on "parathesizers", meaning a thing that puts things beside each other——which is the sort of thing that automated tools can help with.

ChatGPT takes some wrangling. Asking the question a few different ways (full dialogue here) eventually gave:

I liked "upspark".

Semantic development

Can the word's relation to the idea be patterned off known ways that words relate to ideas?

If there's some thing that's closely related to the idea, then see if words about the thing can be used to say the idea. Nearby, similar, overlapping, analogous, intuitively resonant, reminiscent, causally entangled, evidentially entangled, more specific, more general, a part of, containing, sharing structure, sharing features, exemplifying, exemplified by, acting on, acted on by, predicating, predicated by, characterizing, characterized by, doing, done by. Can a semantic development from another language be imitated, as in the French semantic loan "souris" (originally meaning "mouse", the animal, now also the computer equipment, after English "mouse")? See semantic change and "Metaphors we live by" by George Lakoff (IPFS).

The shared craft

People could work together to refine and share methods that fluently create good new words.

Deliberate lexicogenesis

People have consciously tried to create good new words, showing a want of a craft of lexicogenesis.

All language creation is in some sense intentional. Whoever speaks in a new way does so in order to communicate something that they didn't know how to more conveniently communicate in another way. Most creation of language is spontaneous——distributed, haphazard, bottom-up, organic, improvised, ex tempore. Some communities have created language in a way that's conscious, designed, organized, systematic, explicit, regulated——in a word, deliberate. Examples (discussed above in "Language makers"):

Although these are examples of deliberate lexicogenesis, including shared and systematized lexicogenesis, they don't demonstrate very much of a shared craft of lexicogenesis. Programmers mostly make their names out of symbols, preexisting words, or short strings of preexisting words; and when they step outside of that envelope, they are on their own, without guidelines. Conlangers for the most part do not have the all-important feedback of seeing how words they make will fare in the wild demanding flux of needful communication——though for example Esperanto has found substantial purchase in minds that have to speak. Scientists seem to have some craft; where is it written down?

Shared lexicogenesis

People working together might accumulate shareable skills for making words.

If lexicogenesis is an individual creative act, how can there be a shared craft? Maybe there can't be. I don't know what it would look like or how to grow it. But I would like to see what happens if such a craft tries to grow. Speaking vaguely, a shared craft of deliberate lexicogenesis might grow in these ways:

Seeds of the shared craft

Much of what this essay wants is just to avoid pluralistic ignorance about doing lexicogenesis together. Maybe there are lots of people who'd want to make up words for each other's ideas, and they just haven't said so where each other can hear.

Besides that, here are some specific ways that a shared craft might grow (though if it wanted to grow, it must grow unprecircumscribedly):

Applying existing understanding

A lot of scientific work bears on lexicogenesis.

For example, a morphophonologist might be able to improve a morpheme's suitability for combining with other morphemes. Some sources of understanding:

Shared space

To grow a craft, have a shared (cyber)space for that craft.

There, people can:

I propose this Zulip group as a shared place for lexicogenesis: https://lexicogenesis.zulipchat.com/login/

Expressivizing the morphemicon

A deeper store of meaningful elements combines to make a greater range of possible words.

A morpheme is, roughly speaking, a minimal meaning-bearing element of a language. Some English morphemes: -ing, cat, un-, 's, -ness, -ed, the, cardio-, so, re-, snap, ex-, -ology.
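As a toy illustration of how a deeper inventory of morphemes widens the range of possible words, here is a minimal sketch that counts candidate forms of the shape [prefix] + root + [suffix]. The inventory below is hypothetical: it borrows a few of the example morphemes above and, as a simplifying assumption, treats "cardio" as a root. It is purely combinatorial and ignores phonology, meaning, and whether any speaker would accept the result.

```python
from itertools import product

# Hypothetical toy inventory, loosely drawn from the example morphemes above.
PREFIXES = ["un", "re", "ex"]
ROOTS = ["cat", "snap"]
SUFFIXES = ["ing", "ness", "ed"]


def candidate_forms(prefixes, roots, suffixes):
    """Return every string of the shape [prefix] + root + [suffix].

    The empty string stands for "no affix", so bare roots are included.
    """
    return {
        p + r + s
        for p, r, s in product([""] + prefixes, roots, [""] + suffixes)
    }


small = candidate_forms(PREFIXES, ROOTS, SUFFIXES)

# Adding just one root and one suffix grows the space multiplicatively.
larger = candidate_forms(PREFIXES, ROOTS + ["cardio"], SUFFIXES + ["ology"])

print(len(small), len(larger))
```

With this inventory the counts are 4 × 2 × 4 = 32 forms before and 4 × 3 × 5 = 60 after. Most of those strings are not good words; the point is only that the space of candidates to choose from grows quickly as elements are added.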

A morphemicon is a morpheme inventory for a language: the set of morphemes that combine to form the words of the language. (Also called "morphicon".) Overloading the word a bit, "morphemicon" can also mean the total inventory of morphemes held by some group of people. Here's the morphemicon for all human languages together:

A morphemicon is more expressive when it more readily puts ideas to words, for a wider range of ideas. With a more expressive morphemicon, more upsparks of thinking can be caught more precisely. A morphemicon can be expressivized in two ways:

Building resources

I don't know what, if any, shared resources might be useful for lexicogenesis. Some possible ones:

Objections (or, pitfalls)

Some reasons not to work on lexicogenesis, with responses:

Lexicogenesis just seems irrelevant to stuff that matters. Having more words doesn't help with thinking. You're noticing that good science comes along with new words, and then Goodha... uh, you're cargo-culting good science. The hard part of doing stuff that matters is doing stuff, doing experiments, observing, making hypotheses, making predictions, developing skills, implementing ideas; not... words. Lexicogenesis is a distraction.

This is clearly somewhat true. In a lot of areas, stuff like pipetting and looking through microscopes is going to accomplish far more than armchair reasoning. But still, all of those activities rely on concepts——concepts structure perception, attention, design, and hypothesis. For many arenas, the concepts don't need words, or the words already exist, or the words are a minor inconvenience compared to other major obstacles. But some of the most important stuff is new under the sun, and relies on new concepts. For new concepts, there has to be new thinking, which I think would be helped by better lexicogenesis. In other words, lexicogenesis already wants to happen, and I'm proposing to make the lexicogenesis that already wants to happen, happen faster and better. Lexicogenesis is one among many bottlenecks to difficult thinking.

Well, then the hard part is wrestling with ideas, not making up words.

This might be right. I think they're related——that's the hypothesis put forward in the section "A sense that more is possible" above.

There's already plenty of words. It's too many.

The question isn't how many words there are, it's whether we have the right words for the speaking we want to do. You can retire words suitable for alien contexts but not for your contexts. People can be overexuberant, but unneeded words can just go unused. Each word has to prove itself to speakers.

There's already plenty of words. It's too hard to learn even the relevant ones.

This is a problem, but it argues for better words that better compress what's necessary to think about.

There's already plenty of words. There's already words for whatever you'd want to make up words for.

This is a crux for me. I think it's mostly not true. It is partly true, though, and to that extent it implies a want of some better way of finding words that people have already crafted.

It's better to just rewrite what you're writing using existing words.

This is reasonable advice in many contexts. But it doesn't apply to a science studying some novel things.

Lexicogenesis is cringe. You're just making up words for no reason because you think it's cool to make up words.

See the section "What this essay is not about". I'm talking about the sort of lexicogenesis that you do when you're trying to describe something that you want to describe, but don't have the words to describe. I do think it's absolutely key to hug the query, stay close to the need——treat as very valuable the data of what words are actually in real life needed, and the specific criteria provided by those contexts of need. Having a need, having a sentence that you want to say but that's clumsy without the new word, is the gold standard for when lexicogenesis is actually wanted. It's not cringe or crankish to say "electron" or "methylation" or "phylogenetic" or "diffeomorphic", if you're talking about those things.

There's no "missing craft" to be developed. It comes naturally enough when you actually need a word. You just make a nonce-formation like "good manifold" or "strong agency" and then keep thinking, and figure out a better word along the way if you need one.

This might be right, but I'd wonder how you know that. I would like to know how scientists talk when they've seen something but don't know what it is. This story matches only some of my experience; I often want a word and then have to either go without a good word, or else do a bunch of work to find or make a good word. Shoddy nonces don't work that well——they don't resonate with the idea, they're confusing to a listener, they aren't self-documenting, they aren't memorable, they don't strongly evoke the ideas and questions.

Maybe lexicogenesis is useful, but people are mostly too busy.

There are always opportunity costs. But I think the time savings are sometimes deceptive. An analogy is technical debt: writing hacky code means that you'll write more code that relies on the hacky code, and you'll write other code that does work that should have been done by the elegant, correct, general version of your hacky code. With more and more code piled on top of wrong code, the cost of rewriting the code correctly goes up and up. Some people, though far from everyone, are too busy to not do good lexicogenesis.

There's no "missing craft" to be developed. Lexicogenesis is just an ad hoc hodgepodge of putting together morphemes or thinking of metaphors or examples.

This might be right, though again I'd wonder how you know, and I'd like to see what happens when people try to develop a craft. My experience of trying to make words suggests that there's lots of room for shared efforts (because people know a lot of words and examples and metaphiers that I don't know) and room for a shared craft (because there's skills I feel I'm doing a beginner version of, and because there's lots of scientific knowledge and knowledge of languages that I'm aware of without myself knowing).

That's a motte and bailey. Sure, there could be benefit from shared efforts, but that's not the same as a shared craft to be developed.

Fair enough. There's two separate points there, and the point about shared effort is more solid than the point about shared craft.

Lexicogenesis is complex and unpredictable, and if you try to deliberately construct words, you'll miss the constraints of the organic language.

This is partly true. Feedback from speaking is pretty necessary for making words that have a good chance of being suitable. There are many failed attempts at making suitable words. But there are also many successful attempts, many of which were deliberate. William Whewell on purpose came up with words such as "scientist", "linguistics", "ion", "anode", and "cathode".

There's not much to be gained from lexicogenesis. Thinking is already fully general, and is adapted to the regime where it's not super easy to make up good new words. Something an expert wordcrafter could do, someone else could do about as well without lexicogenesis.

This might be right. I'd like to find out though. I suspect there's a kind of utility being left on the table here. Because of cognitive miserliness, the effort of creating new concepts is put off as long as preexisting concepts can do the trick. So new concept formation is underinvested in. And lexicogenesis helps with new concept formation.

Lexicogenesis seems qualitatively different from using the words you have. It's analogous to having the ability to easily, fluently introduce new named subroutines in programming, compared to just using the subroutines already named (with an occasional laborious undertaking of rewriting the compiler to add another named subroutine, or something).
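To make the programming side of that analogy concrete, here is a minimal sketch. The data and the names `mean` and `variance` are chosen only for illustration. The first half says the same thing twice using only what's already named; the second half introduces names once and then speaks in terms of them.

```python
xs = [2.0, 4.0, 4.0, 5.0]
ys = [1.0, 3.0]

# Without new names: the whole idea is spelled out in full at every use.
spread_xs = sum((x - sum(xs) / len(xs)) ** 2 for x in xs) / len(xs)
spread_ys = sum((y - sum(ys) / len(ys)) ** 2 for y in ys) / len(ys)


# With new names: introduce each concept once, then reuse it.
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def variance(values):
    """Population variance: the mean squared deviation from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


# Both ways of speaking compute the same thing.
assert variance(xs) == spread_xs
assert variance(ys) == spread_ys
```

The unnamed version works, but every later thought that involves "spread" has to carry the whole expression along with it. Once `variance` exists, further definitions can be built on top of it cheaply.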

Words cause verbal overshadowing. They could just as well make it harder to think, not easier.

That definitely seems like something that happens. I think lexicogenesis actually helps avoid verbal overshadowing. When you make a new word, it isn't burdened by the history and role of preexisting words, so it at least doesn't claim as strongly to give you what you need to know. And, if you can come up with new words quicklier and preciselier, you can "punch through" the muffling effect of verbal overshadowing on the real thing behind the words.

This does create an issue where people have their own idiolectic word for X, even though they really are talking about the same X. They resist treating their words as though they refer to the same thing, because they don't want to use the communal understanding of X——instead they want to make their own understanding. I don't know what if anything to do about this.

You propose lexicogenesis as especially helpful at the edge of thinking. But isn't the edge of thinking especially prone to verbal overshadowing?

I recommend against being satisfied with making words for muddled ideas. Instead try to be really clear, look at lots of concrete examples, cling to the thing itself, and only make new words out of necessity——only when there are sentences you want to say and thoughts you want to have that want a word. It's like in programming: first just make the hacky version that works, and only when you find yourself repeating yourself do you abstract substructures.

When there's jargon in a community, that amplifies verbal overshadowing. A newcomer is pressured to pick up the jargon, and so may adopt the word without the meaning. The newcomer won't know that ze has missed the meaning because ze can say the sentences that others say using the word. Ze doesn't make the word zer own.

(H/t Yulia Ponomarenko for that point.) This does seem worrisome. I'd hope that newcomers would feel licensed to refuse to pretend to understand a word. Caching words out into concrete examples is good.

Also, the jargon just makes things harder to understand, and pushes newcomers away.

It's a tradeoff, and pushes for treating word-slots as a resource. But if someone is actually a newcomer, if they are actually trying to come into a domain of discourse, then they will learn or make the words that are actually needed to discuss that domain.

If expressivizing the morphemicon is supposed to make you better at thinking, why aren't speakers of languages with more productive morphemicons (e.g. polysynthetic languages) much better at difficult thinking?

I don't know that they aren't, but I don't predict that they are. Their morphemicons are not (I imagine) expressivized for abstract domains much more than other languages, and the speakers aren't (I imagine) skilled at creating new morphemes on the fly much more than speakers of other languages. (If those generalizations aren't true, then I would predict that such speakers would be better at difficult thinking, all else equal.)

References


  1. "Maximen und Reflexionen" by Johann Wolfgang von Goethe. IPFS, German, page 508. Link, English, search "foreign languages". ↩︎

  2. "The Nature of Paleolithic Art" by Dale Guthrie, 2006. (IPFS) ↩︎

  3. "Lexical Innovations" by Judith Becker Bryant in "Encyclopedia of language development", Patricia J. Brooks, Vera Kempe, 2014. (IPFS) ↩︎

  4. "Bastard Tongues" by Derek Bickerton, 2008. (IPFS) ↩︎ ↩︎ ↩︎

  5. "Gone but not forgotten: persistence and revival in the history of English word loss" by Elizabeth Grace Wang, 2004, chapter 11. (PDF) ↩︎

  6. "Boojums All the Way Through: Communicating Science in a Prosaic Age" by N. David Mermin, pages 3-5. (IPFS) ↩︎

  7. "Grammaticalization in English: a diachronic and synchronic analysis of the 'ass' intensifier" by Wilson Joseph Miller, 2017. (PDF) ↩︎

  8. "How Medium Shapes Language Development: The Emergence of Quotative Re Online" by Stefanie Kuzmack, 2010. Page 293 in "Studies in the History of the English Language V", Elizabeth Closs Traugott, Bernd Kortmann. (IPFS) ↩︎

  9. See "Word-formation in English" by Ingo Plag, 2003. (IPFS) ↩︎ ↩︎

  10. See "Chinese: A Language of Compound Words?" by Giorgio Francesco Arcodia, 2007. ↩︎

  11. "Roots of Language" by Derek Bickerton, 1981. (IPFS) ↩︎

  12. "Pidgin and creole languages" by Salikoko Mufwene, 2002. ↩︎

  13. "The emergence of Nicaraguan Sign Language: Questions of development, acquisition, and evolution" by Richard Senghas, Ann Senghas, and Jennie Pyers. (PDF) In "Biology and Knowledge Revisited: From Neurogenesis to Psychogenesis", 2014. ↩︎

  14. "Children Creating Core Properties of Language: Evidence from an Emerging Sign Language in Nicaragua" by Ann Senghas, Sotaro Kita, and Aslı Özyürek, 2004. (Sci-hub) ↩︎

  15. "'Sneak-shoes', 'sworders' and 'nose-beards': a case study of lexical innovation" by Judith Becker, 1994. (Sci-hub) ↩︎

  16. "Everyday Greek, Greek Words in English, Including Scientific Terms" by H.A. Hoffman, 1919. (IPFS) ↩︎ ↩︎

  17. "101. Individual initiatives and concepts for expanding the lexicon in Russian" by Wolfgang Eismann, in Word-Formation: An International Handbook of the Languages of Europe, Volume 3, eds. Peter O. Müller, Ingeborg Ohnheiser, Susan Olsen, Franz Rainer, 2015, page 1744 (196). (IPFS) ↩︎ ↩︎

  18. "Lexical innovation and variation in Hupa (Athabaskan)" by Justin Spence, 2016. (PDF) ↩︎

  19. "The Last Lingua Franca: English Until the Return of Babel" by Nicholas Ostler, 2010. (IPFS) ↩︎

  20. "Lexical Innovation in World Englishes: Cross-fertilization and Evolving Paradigms" by Patrizia Anesa, 2019. (IPFS) ↩︎ ↩︎

  21. "Lexical Innovation in Ghanaian English: Some Examples from Recent Fiction" by Edmund O. Bamiro, 1997. (Sci-hub) ↩︎ ↩︎

  22. "Derogatory Slang in the Hospital Setting", Brian Goldman, 2015. ↩︎

  23. "On the way to language" by Martin Heidegger, 1971. Slightly modified from the translation by Peter Hertz. (IPFS) German original here. ↩︎

  24. "The Timeless Way of Building" by Christopher Alexander, 1979. (IPFS) ↩︎

  25. "Rationality techniques as patterns" by Jessica Taylor, 2017. (Link) ↩︎

  26. See the section "An example of new word creation" in "Indigenous New Words Creation Perspectives from Alaska and Hawai'i" by Larry Kimura and Isiik April G.L. Counceller. In "Indigenous Language Revitalization Encouragement, Guidance & Lessons Learned" edited by Jon Reyhner and Louise Lockard, page 126 (PDF). ↩︎

  27. "Greek and Latin in Scientific Terminology" by Oscar E. Nybakken, 1959. (Libgen (djvu)) ↩︎

  28. "The Categories and Types of Present-day English Word-formation: A Synchronic-diachronic Approach" by Hans Marchand, 1960. (IPFS) ↩︎

  29. "Making New Words: Morphological Derivation in English" by R.M.W. Dixon, 2014. (IPFS) ↩︎

  30. "The Oxford Introduction to Proto-Indo-European and the Proto-Indo-European World" by J.P. Mallory and D.Q. Adams, 2006. (IPFS) ↩︎

  31. "Indo-European Cognate Dictionary" by Fiona McPherson, 2018. (Libgen) ↩︎

  32. "Roget's Thesaurus of English Words and Phrases" by Peter Mark Roget, 1852. (IPFS) ↩︎

  33. "A Dictionary of Selected Synonyms in the Principal Indo-European Languages" by Carl Darling Buck, 1949. (IPFS) ↩︎

5 comments


comment by romeostevensit · 2023-05-20T17:59:57.085Z · LW(p) · GW(p)

Here are Gendlin's videos on Thinking at the Edge (three parts, around 20 minutes total)

https://www.youtube.com/watch?v=Wv7rXHHBXDU

And inspired by the post I decided to try to come up with a better word for a thing I've been trying and repeatedly failing to communicate. I'll try this by using oobleck as a hyphenation for concepts that are able to be soft and flexible but firm up the more force you apply to them. So oobleck-boundaries is being soft enough to be open for anything but firm up if you get pushed too hard.

Replies from: TsviBT
comment by TsviBT · 2023-05-22T14:13:28.833Z · LW(p) · GW(p)

Oh, I ended up (through "non-Newtonian") with the same word for a similar idea! (I can't find any substantial notes, just a message to myself saying "mind as oobleck"; I think I was thinking about something around how when you push against an idea, test it, examine it, the idea or [what the idea was supposed to be] is evoked more strongly and precisely.)

comment by Mateusz Bagiński (mateusz-baginski) · 2023-09-04T13:08:44.458Z · LW(p) · GW(p)

think with Words

There are people who (report/claim that they) don't think in words. It looks like having internal monologue is a spectrum, perhaps related to spectra of aphantasia. (or maybe I'm misunderstanding what you mean here)

natural language is how we think

Again, are you gesturing towards something like Language of Thought?

Mathematicians create fractal vocabularies, making names from notation, from mathematicians (eponyms),

AFAIK, eponyms (naming inventions after their inventors) are ~unique to the West/WEIRD culture. (source: The WEIRDest People in the World; cites Wootton's The Invention of Science)

A wider net of possible words (a more expressive morphemicon) catches a wider variety of upsparks.

Maybe completely unrelated but reminds me of some observation that the set of 20-ish basic amino acids used by terran life seems optimized for covering a sufficiently diverse range of parameters of the aminoacid-space.

Request for term: a plural non-person pronoun

Trivium/datapoint: Polish has three grammatical genders in the singular form (standardly: masculine, feminine, and neuter) but two in the plural form (plural-personal-masculine and plural-everything-else). Closely related Czech also has the same three grammatical genders in the singular but they don't change with pluralization, e.g., there are separate "they" for "plural-he", "plural-she", "plural-it".

Request for term: more flexible pronouns.

See: https://en.wikipedia.org/wiki/Grammatical_person#Additional_persons Also, my impression is that Lojban has some of the features you're thinking about (?)

Request for term: sometimes a person says  in context  meaning , and then says A in context  meaning , and . What do you call , and ?

(Partially) parametrized concepts?


Overall, I'm slightly surprised by no mention of dath ilan, as they seem to have invested quite a lot of labor-hours into optimizing language, including in some of the directions you sketch out in this post.

Replies from: TsviBT
comment by TsviBT · 2023-09-07T23:30:18.369Z · LW(p) · GW(p)

It looks like having internal monologue is a spectrum, perhaps related to spectra of aphantasia

IDK about people who claim this. I'd want to look at what kinds of tasks / what kinds of thinking they are doing. For example, it makes sense to me for someone to "think with their body", e.g. figuring out how to climb up some object by sort of letting the motor coping skill play itself out. It's harder to imagine, say, doing physics without doing something that's very bound up with words. For reference, solving a geometric problem by visualizing things would probably still qualify, because the visualization and the candidate-solution-generator are probably structured by concepts that you only had because you had words.

optimized for covering a sufficiently diverse range of parameters of the aminoacid-space.

Interesting. Didn't know about that. That reminds me of phonemes.

Additional persons

Oh cool. Yeah, lojban might.

(Partially) parametrized concepts?

Neh. I mean to ask for a word for [a word that one person has used in two different ways--not because they are using the word totally inconsistently, using it in two different ways in the same context, but because they are using the word differently in different contexts--but in some sense they "ought" to either use the word in "the same way" in both contexts, or else use two different words; they are confusing themselves, acting as though they think that they are using the word in the same way across different contexts]. (This requires some analogy / relation between the two contexts, or else there's no way to say when someone uses a word "the same way".)

Overall, I'm slightly surprised by no mention of dath ilan, as they seem to have invested quite a lot of labor-hours into optimizing language, including in some of the directions you sketch out in this post.

All I've read about dath ilan is the thing about moving houses around on wires. Where is it described what they do with language?

Replies from: mateusz-baginski
comment by Mateusz Bagiński (mateusz-baginski) · 2023-09-08T07:59:36.202Z · LW(p) · GW(p)

I'd want to look at what kinds of tasks / what kinds of thinking they are doing.

I don't have specific examples in the literature of people without internal monologue but here's a case of a person that apparently can do music without doing something very bound up with auditory imagination.

A case study of subject WD (male, 55) with sensory agnosia (auditory and visual) is reported. He describes his experiences with playing music to be similar to the experiences of people suffering from blindsight, maneuvering blindly in the auditory space, without the ability to imagine results of next move (hitting piano key). Yet after a long period of learning WD is able to improvise, surprising himself with correct cadencies, with no conscious influence on what he is playing. For him the only way to know what goes on in his brain is to act it out.

Anecdotal case: I worked with a person who claimed to have absolutely no inner monologue and "thinking in one's head" seemed very weird to her. She's one of the most elaborate arguers I know. A large part of her job at the time was argument mapping.

All I've read about dath ilan is the thing about moving houses around on wires. Where is it described what they do with language?

Mostly smeared across ProjectLawful (at least that's where I read about all of it). Usually, it's brought up when Keltham (the protagonist from dath ilan) gets irritated that Taldane (the language of the D&D world he was magically transported into) doesn't have a short word (or doesn't have a word at all) for an important concept that obviously should have a short word. Some excerpts (not necessarily very representative ones, just what I was able to find with quick search):

Occasionally Keltham thinks single-syllable or two-syllable words in Baseline that refer to mathematical concepts built on top of much larger bases, fluidly integrated into his everyday experience. link

The Baseline phrase for this trope is a polysyllabic monstrosity that would literally translate as Intrinsic-Characteristic Boundary-Edge. A translation that literal would be misleading; the second word-pair of Boundary-Edge is glued together in the particular way that indicates a tuple of words has taken on a meaning that isn't a direct sum of the original components. A slight lilt or click of spoken Baseline; a common punctuation-marker in written Baseline. link

"We've pretty much got a proverb in nearly those exact words, yeah." He utters it in Baseline: an eight-syllable couplet, which rhymes and scans because Baseline was designed in part to make that proverb be a rhyming couplet. link