eliminating bias through language?

post by KvmanThinking (avery-liu) · 2025-02-04T01:52:01.508Z · LW · GW · 12 comments

Given linguistic relativity, might it be possible to modify or create a system of communication that makes its users more aware of their biases?

If so, do any projects to actually do this exist?

12 comments

comment by Viliam · 2025-02-06T09:10:48.570Z · LW(p) · GW(p)

What do you mean by bias? Noticing statistical differences between different parts of population?

What does it mean to be more aware of the biases? Noticing that "I have made a mathematically valid statistical conclusion, but..." -- what is the second part of the sentence? Something like "but this is all statistics, which only makes probabilistic statements about individuals, and maybe I should be looking for additional evidence (or is that introducing even more bias)"?

(One possible solution is to get rid of math itself and communicate in Toki Pona. Probably not what you wanted.)

I don't know any such language, but in principle I can imagine a language that doesn't allow one to make a sentence like "X are Y" without explicitly specifying whether you mean "all X are Y" or "some X are Y", and where saying "all" when you meant "some" would be perceived as an error (as grammatical incorrectness, not political). But I don't know whether such a language exists.

The Sapir-Whorf hypothesis is exaggerated a lot, I think. I speak a few languages, and they... are mostly different ways to say the same things. There are very small exceptions: different languages may split the color spectrum differently, so where one group sees "blue" as a natural category, another group may see "light-blue" and "dark-blue" as natural categories, and yet another may see "blue, green, or gray" as a natural category. Or English is obsessed with the difference between definite and indefinite articles, and Slavs mostly just don't care... but when they do, they can simply say "some/any" instead of "a/an", and "that" instead of "the". Etc.

In my opinion, what actually makes the difference is jargon. But that's kinda the opposite of the Sapir-Whorf hypothesis, because it means that when people find interest in a topic, the language is flexible enough, and the community of snow-logists (is there a Latin word for that?) will find as many words for snow as they need.

Replies from: ChristianKl, avery-liu
comment by ChristianKl · 2025-02-06T21:42:10.243Z · LW(p) · GW(p)

English can distinguish between hear/listen/overhear/eavesdrop to mark different ways in which people perceive sound.

An English speaker, however, can't easily do the same with smell perception.

A language like Esperanto however has the ability to express the concept because you can combine syllables to make words in Esperanto.

A friend who's deeply into Esperanto said that reasoning in Esperanto let him understand things about meditation that can be expressed in Esperanto but not directly in English without making up new jargon, and that would have been harder to understand otherwise.

Making up new words for a concept is always possible, but grammar that makes it possible to make up a term to express a concept that the listener hasn't heard before exists in some languages but not in others.

Take math: being able to express 42 with existing building blocks, instead of having to make up a new word for it, is very valuable. If you had a language that needed a new word for 42, you would have a problem operating in modern society that you couldn't fix just by adding a lot of jargon for specific numbers.
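
As a rough illustration of the building-blocks point (a sketch added for clarity, not something from the original comment), English number names for 0-99 can be composed from a fixed vocabulary of 28 words. The names `ONES`, `TEENS`, `TENS`, and `number_name` are hypothetical and chosen only for this example.

```python
# Hypothetical sketch: 100 number names composed from only 28 vocabulary words.
ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
        6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"}


def number_name(n: int) -> str:
    """Return the English name of n (0-99), built from existing words."""
    if not 0 <= n < 100:
        raise ValueError("this sketch only covers 0-99")
    if n < 10:
        return ONES[n]
    if n < 20:
        return TEENS[n - 10]
    tens, ones = divmod(n, 10)
    # Round tens need one word; everything else combines two existing words.
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


if __name__ == "__main__":
    print(number_name(42))  # forty-two
```

A language without this kind of composition would need a separately learned word for each of the 100 values.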

Not easily being able to express the intentionality difference of hear/listen does make some conversations about meditation harder in English than in Esperanto.

If you were to design a language for maximum intellectual utility, you could look into systematizing fields of knowledge so that you can express concepts without needing to make up jargon that has to be learned separately.

Replies from: Viliam, avery-liu
comment by Viliam · 2025-02-07T09:06:28.165Z · LW(p) · GW(p)

Could you please ask about the specific examples of the Esperanto words? (I speak Esperanto.)

I think a similar example would be the adjective "Russian" in English, which translates to Russian as two different words: "русский" (related to Russian ethnicity or language) or "российский" (related to Russia as a country, i.e. including the minorities who live there).

(That would be "rus-a" vs "rus-land-a / rus-i-a" in Esperanto.)

I noticed this in a video where a guy explained that "I am Rus-land-ian, not Rus-ethnic-ian", which could be expressed in English as "I am a citizen of the Russian Federation, but I am not ethnically Russian". On one hand, it can be translated without any loss of information; on the other hand, four words in Russian expanded to over a dozen words in English. More importantly, in 99% of situations the English speaker would not bother making the distinction, while a Russian speaker would be making it all the time.

Still seems to me that these things are rare, and more importantly, they don't seem to have the impact one might naively predict based on the Sapir-Whorf hypothesis. For example, one could naively predict that such language nuance would lead to less nationalism (because the country is less linguistically conflated with the dominant ethnicity), and yet, ethnic Russians don't seem less nationalistic.

Similarly, English-speaking feminists spent a lot of effort changing the default "he", through "he or she", to the singular "they" (and some of them go even further). But there are languages, such as Hungarian, which never even had "he" and "she", and have always used a gender-neutral pronoun. And yet, I don't think that Hungarians are less sexist than their neighbors.

Replies from: ChristianKl
comment by ChristianKl · 2025-02-07T13:40:23.939Z · LW(p) · GW(p)

I don't speak Esperanto myself, but took that meditation example from someone who speaks it. I don't know how that actually boils down to Esperanto words.

"Still seems to me that these things are rare, and more importantly, they don't seem to have the impact one might naively predict based on the Sapir-Whorf hypothesis."

Naive predictions often seem wrong in many domains. 

"For example, one could naively predict that such language nuance would lead to less nationalism (because the country is less linguistically conflated with the dominant ethnicity), and yet, ethnic Russians don't seem less nationalistic."

The English are unlikely to say that the Irish are really English after all, in the way that some Russians say that Ukrainians are really Russian. The Russian idea that everyone descended from a culture that held Mass in Old Church Slavonic is Russian is quite different from how other people in Europe think about the relevant concepts of identity.

The idea that Ukrainians are really Russians seems to make a lot more sense to Russian speakers than it does to most Europeans. 

A Russian friend told me that when he speaks with other Russians, this involves a lot of references to Russian literature in a way that you wouldn't do in English or German conversation. Reasoning by literature analogy is quite different from a lot of the way reasoning happens in English or German.

comment by KvmanThinking (avery-liu) · 2025-02-07T00:45:59.819Z · LW(p) · GW(p)

Something like TNIL or Real Character might be used for maximum intellectual utility. But I cannot see how simply minimizing the number of words that need to exist for compact yet precise communication would help correct the corrupted machinery our minds run on.

Replies from: ChristianKl
comment by ChristianKl · 2025-02-07T14:45:41.311Z · LW(p) · GW(p)

I don't think the mental model of "corrupted machinery" is a very useful one. Humans reason by using heuristics. Many heuristics have advantages and disadvantages instead of being perfect. Sometimes that's because they are making tradeoffs, other times it's because they have random quirks. 

Real Character was a failed experiment. I don't know how capable Ithkuil IV happens to be. 

comment by KvmanThinking (avery-liu) · 2025-02-07T00:41:36.312Z · LW(p) · GW(p)

By "make its users more aware of their biases" I mean, for example, a language where it's really obvious when you say something illogical, or have a flaw in your reasoning.

Some ideas I had for this:

  • Explicitly defined semantic spaces for every word, to dissolve questions [LW · GW] and help people agree on the locations of phenomena in thingspace. Mechanisms for searching thingspace: while you can, for example, say "red chair" to narrow the space of all chairs down to the space of all chairs which reflect red light, it would be nice to be able to express things like "thing such that there is no meaningful answer to the question 'is it a chair'".
  • A fourth grammatical category which, alongside tense, aspect, and mood, indicates your relationship to the statement. For example, "Apples taste good" would have a relationship of "speaker's observation-derived opinion", and "Mars is a planet" would have a relationship of "property of speaker's consensus-derived world model". This would make the statement "Socrates is a man" feel more like "My brain perceptually classifies Socrates as a match against the 'human' concept"
  • Distinctions between "logically true" (true in all possible worlds, e.g. 1+1=2), "Occam's-razor true" (~100% chance of being true in all but a negligible fraction of the worlds that could be ours, e.g. there is no God), "empirically true" (~100% chance of being true in THIS world specifically, e.g. the sky is blue), and "illogically infinitely true" (true in this world but not in all possible worlds. This is impossible IRL and would only be used in thought experiments.)
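
To make the second and third ideas a bit more concrete, here is a minimal sketch of what "grammatically mandatory" markers could look like as a data structure. Everything here is an assumption for illustration: the names `Stance`, `TruthKind`, and `Statement` are made up, only the first three truth categories are modeled, and the pairing of a stance with a truth kind in the example sentence is a guess rather than something specified above.

```python
from dataclasses import dataclass
from enum import Enum


class Stance(Enum):
    """Speaker's relationship to the statement (wording taken from the list above)."""
    OBSERVATION_OPINION = "speaker's observation-derived opinion"
    CONSENSUS_MODEL = "property of speaker's consensus-derived world model"


class TruthKind(Enum):
    """Which sense of 'true' is being claimed (hypothetical labels)."""
    LOGICAL = "true in all possible worlds"
    OCCAM = "almost certainly true in the worlds that could be ours"
    EMPIRICAL = "almost certainly true in this world"


@dataclass(frozen=True)
class Statement:
    # All three fields are required, so an unmarked statement
    # cannot be constructed, much like an ungrammatical sentence.
    text: str
    stance: Stance
    truth: TruthKind

    def render(self) -> str:
        return f"{self.text} [{self.stance.value}; {self.truth.value}]"


if __name__ == "__main__":
    s = Statement("Mars is a planet", Stance.CONSENSUS_MODEL, TruthKind.EMPIRICAL)
    print(s.render())
```

Leaving out the stance or the truth kind raises a `TypeError` at construction time, which is the programming analogue of the markers being required by the grammar rather than optional.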

     

comment by Milan W (weibac) · 2025-02-07T07:11:08.367Z · LW(p) · GW(p)

Not exactly what you were asking for, but maybe food for thought: what if we (somehow) mapped an LLM's latent semantic space into phonemes?

What if we then composed tokenization with phonemization such that we had a function that could translate English to Latentese?
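
A toy sketch of just the composition step, under heavy assumptions: the "tokenizer" below is a whitespace splitter that assigns integer ids, and the "phonemizer" simply spells each id in syllables. Neither touches an actual LLM latent space, and all names (`tokenize`, `phonemize`, `to_latentese`, the syllable inventory) are invented for this example.

```python
# Hypothetical syllable inventory: 7 consonants x 5 vowels = 35 syllables.
CONSONANTS = "ptkmnsl"
VOWELS = "aeiou"
SYLLABLES = [c + v for c in CONSONANTS for v in VOWELS]


def tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Toy whitespace tokenizer: unseen words get the next free integer id."""
    ids = []
    for word in text.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab)
        ids.append(vocab[word])
    return ids


def phonemize(token_id: int) -> str:
    """Spell a token id in base len(SYLLABLES), one syllable per digit."""
    digits = []
    while True:
        token_id, digit = divmod(token_id, len(SYLLABLES))
        digits.append(SYLLABLES[digit])
        if token_id == 0:
            break
    return "".join(reversed(digits))


def to_latentese(text: str, vocab: dict[str, int]) -> str:
    """Compose the two steps: text -> token ids -> pronounceable strings."""
    return " ".join(phonemize(t) for t in tokenize(text, vocab))


if __name__ == "__main__":
    print(to_latentese("the cat saw the cat", {}))  # pa pe pi pa pe
```

The hard part of the proposal is replacing the arbitrary id-to-syllable mapping with one that reflects distances in a model's latent space; this sketch only shows that the plumbing around it is simple.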

Replies from: avery-liu
comment by KvmanThinking (avery-liu) · 2025-02-08T18:42:37.632Z · LW(p) · GW(p)

That's a pretty big "somehow".

Replies from: weibac
comment by Milan W (weibac) · 2025-02-08T19:07:15.209Z · LW(p) · GW(p)

Oh I know! That is why I added "somehow". But I am also very unsure about exactly how hard it is. Seems like a thing worth whiteboarding over for an hour and then maybe doing a weekend-project-sized test about.

comment by dirk (abandon) · 2025-02-07T02:30:30.212Z · LW(p) · GW(p)

Neither of them is exactly what you're looking for, but you might be interested in Lojban, which aims to be syntactically unambiguous, and Ithkuil, which aims to be extremely information-dense as well as to reduce ambiguity. With regard to logical languages (ones which, like Lojban, aim for each statement to have a single possible interpretation), I also found Toaq and Eberban just now while looking up Lojban, though these have fewer speakers.

comment by KvmanThinking (avery-liu) · 2025-02-05T00:21:58.651Z · LW(p) · GW(p)

why are people downvoting?