Starting point for calculating inferential distance?

post by JenniferRM · 2010-12-03T20:20:03.484Z · LW · GW · Legacy · 9 comments

One of the shiniest ideas I picked up from LW is inferential distance.  I say "shiny" because the term, so far as I'm aware, has no clear mathematical or pragmatic definition and no substantive use in peer-reviewed science, yet it was novel to me and appeared to make a lot of stuff about the world suddenly make sense.  In my head it is marked as "super neat... but possibly a convenient falsehood".  Yesterday I ran across something that struck me as beautifully succinct and helpful for resolving the epistemic status of the concept of "inferential distance".

While surfing the Language Log archives I ran across a mailbox response to correspondence about comparative communication efficiency.  The author, Mark Liberman, was interested in calculating the amount of information in text, and was surprised to find that something about the texts, the subjects, or his calculation led to different estimates of the amount of information in different translations of the same text (with English requiring 20%-40% more bits than Chinese to say the things in his example text).
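For a sense of how such a measurement might go, here is a minimal sketch (not Liberman's actual method; the sample sentences and repetition counts are invented) that uses an off-the-shelf compressor as a crude upper bound on the bits in a text:

```python
import zlib

def estimated_bits(text: str) -> int:
    # Compressed size gives a crude upper bound on the information
    # content of a text, in the spirit of Shannon.
    return 8 * len(zlib.compress(text.encode("utf-8"), 9))

# Invented stand-ins for parallel translations of one text.
english = "The cat sat quietly on the old woven mat by the door. " * 40
chinese = "那只猫静静地坐在门边的旧编织垫子上。" * 40

print(estimated_bits(english))
print(estimated_bits(chinese))
```

A real comparison would need matched translations of a substantial text and a better-calibrated compressor, but the quantity being estimated is the same kind of thing.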

Mr. Liberman was helped by Bob Moore who, among other things, noted:

...why should we expect two languages to use the same number of bits to convey the same thoughts? I believe that when we speak or write we always simplify the complexity of what is actually in our heads, and different languages might implicitly do this more than others. Applying Shannon's source/channel model, suppose that when we have a thought T that we want to convey with an utterance U, we act as if our hearer has a prior P(T) over the possible thoughts we may be conveying and estimates a probability P(U|T) that we will have used U to express T. As you well know, according to Shannon, the hearer should find the T that maximizes P(U|T)*P(T) in order to decide what we meant. But the amount of effort that the speaker puts into U will determine the probability that the hearer will get the message T correctly. If the speaker thinks the prior on T is high, then he may choose a shorter U that has a less peaked probability of only coming from T. If I say to my wife "I got it," I can get by with this short cryptic message, if I think there is a very high probability that she will know what "it" is, but I am taking a risk.

My conjecture is that the acceptable trade-off between linguistic effort and risk of being misunderstood is socially determined over time by each language community and embodied in the language itself. If the probability of being misunderstood varies smoothly with linguistic effort (i.e., bits) without any sharp discontinuities, then there is no reason to suppose that different linguistic communities would end up at exactly the same place on this curve.
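To make the decoding rule in the quote concrete, here is a minimal sketch with invented probabilities, built around the "I got it" example (the candidate thoughts and all numbers are made up for illustration):

```python
# Toy version of the quoted rule: the hearer finds the thought T that
# maximizes P(U|T) * P(T). All probabilities below are invented.
prior = {                      # P(T): hearer's prior over thoughts
    "got the groceries": 0.70,
    "got the promotion": 0.25,
    "got the parking ticket": 0.05,
}
likelihood = {                 # P(U = "I got it" | T)
    "got the groceries": 0.9,
    "got the promotion": 0.4,
    "got the parking ticket": 0.2,
}

def decode(likelihood, prior):
    # Shannon-style MAP decoding: argmax over T of P(U|T) * P(T).
    return max(prior, key=lambda t: likelihood[t] * prior[t])

print(decode(likelihood, prior))  # -> "got the groceries"
```

The speaker's gamble is visible here: the short, cryptic U only works because the prior on the intended T is high.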

Application to inferential distance is left as an exercise for the reader :-)

9 comments

Comments sorted by top scores.

comment by prase · 2010-12-03T21:11:20.384Z · LW(p) · GW(p)

One of the things I hate in mathematical textbooks is proofs left as exercises for the reader.

I would be really interested to know what conclusion you have made about inferential distance.

Replies from: Tyrrell_McAllister, arundelo, JenniferRM
comment by Tyrrell_McAllister · 2010-12-04T16:26:26.098Z · LW(p) · GW(p)

I would be really interested to know what conclusion you have made about inferential distance.

Jennifer is suggesting that these ideas could be used to quantify inferential distances. A first attempt might be to say that a Speaker and a Listener are separated by a large inferential distance when the Speaker has a much larger value for P(U|T) than the Listener does.

There seems to me to be something important left out, though. I take inferential distances to be about differences in the plausibility of a conclusion to different people. Even if you understand my claim perfectly (i.e., you've mapped my U to the proper T) you might still consider T to be almost certainly wrong, while I consider T to be an inevitable conclusion of self-evident premises, even if it takes a long chain of inferences to get to the conclusion from the premises.
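One way to put a number on that gap (a sketch with made-up distributions, not anything the post commits to): even with the U-to-T mapping agreed on, the speaker's and listener's priors P(T) can diverge, and the size of that divergence is one candidate measure of inferential distance:

```python
import math

def kl_bits(p, q):
    # D_KL(p || q) in bits, over a shared discrete support.
    return sum(p[t] * math.log2(p[t] / q[t]) for t in p if p[t] > 0)

# Hypothetical priors P(T) over the same three candidate conclusions.
speaker  = {"T1": 0.80, "T2": 0.15, "T3": 0.05}
listener = {"T1": 0.05, "T2": 0.15, "T3": 0.80}

# Candidate "inferential distance": the listener's average surprise at
# conclusions the speaker takes for granted.
print(kl_bits(speaker, listener))  # 3.0 bits for these numbers
```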

Replies from: prase
comment by prase · 2010-12-04T22:43:50.376Z · LW(p) · GW(p)

Speaking from my personal experience: when I, as a listener, had problems accepting a conclusion the speaker considered natural and perhaps obvious, it was rarely because I misinterpreted the meaning (I am speaking here about conclusions I later accepted as obvious myself, so I can judge whether I understood what was said at the time). The reason was rather that I lacked some background knowledge or thinking habits, which made my P(T) low, not my P(U|T).

comment by JenniferRM · 2010-12-04T17:22:16.832Z · LW(p) · GW(p)

It seems that "exercises left for the reader" are not generally well received. Arundelo and Kaj are correct in that I meant it partly as a joke and partly as an invitation to conversation with a capable audience. This is LW, right? So the audience should be capable :-)

I liked this link and quote because they seem to point toward mechanizing and experimenting with concrete issues around the relatively hand-wavey concept of inferential distance. I'm not sure how such a research program would turn out, but the quote makes it more plausible to me that researchers could get traction if they dug in.

The link suggests that rudimentary software already exists (and might be calibrated better for this specific issue) that operates on digital text and characterizes the bits of information it contains. Those bit counts are measurements mathematically related to hypothesized Bayesian distributions over human communicative intent and sense of model plausibility.

It doesn't seem that hard to imagine a program of study around the issue: refine text compression software until it gives "the right answer" when applied to text generated by experimental human subjects communicating about toy problems. A problem would be explained to one subject, communicated to a second subject in the course of the experiment, and then validated as successfully transmitted by a comprehension test given to the second subject.

Perhaps geometrical diagrams could be serialized and measured this way somehow? It would be interesting to use simple pictures as the "T", in part because the bit about "a prior P(T) over the possible thoughts we may be conveying" is obviously important but spelling out the details might be hard. I think some people would be tempted to use "language in one's head" as the model for thoughts (so that U and elements of T are directly comparable via trivial methods), but starting with "probability distributions over possible visual representations" seems likely to avoid a local research optimum where only "language-focused people" think the results are very general.

You could also just run the existing software on existing text to try to predict inferential distance between existing domain experts (religious texts and priests from different religions? science texts and academics from different departments?), then see if the software's numbers predict something measurable about their interactions that would reveal "false inferential distance assumptions" if those really existed. If they do, you'd have tools and details for picking apart the concept.
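One off-the-shelf candidate for that "existing software" (my suggestion, not something named in the post) is normalized compression distance (Cilibrasi and Vitanyi), which turns compressor output into a rough dissimilarity score between two texts. A sketch with toy stand-in corpora; real use would want much longer texts:

```python
import zlib

def c(s: str) -> int:
    return len(zlib.compress(s.encode("utf-8"), 9))

def ncd(x: str, y: str) -> float:
    # Normalized compression distance: near 0 for very similar texts,
    # near 1 (sometimes slightly above) for unrelated ones.
    return (c(x + y) - min(c(x), c(y))) / max(c(x), c(y))

# Toy stand-ins for two traditions' texts.
text_a = "In the beginning the heavens and the earth were made. " * 30
text_b = "The free energy of the system decreases monotonically. " * 30

print(ncd(text_a, text_a))  # same text: close to 0
print(ncd(text_a, text_b))  # unrelated texts: closer to 1
```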

Assuming the expectation of short inferential distances exists, what if it wasn't caused by the fact that we evolved in small tribes with mostly common knowledge, but instead grew out of simple planning fallacies about how easy it is to teach something? Or maybe it varies with one's experience of cultural homogeneity, and some people are wrong in the other direction after many experiences with people of radically different beliefs and no inclination to spend the time that would be required to update with them?

Those are just off the top of my head, but they are the kinds of things that came to mind when I thought about LW while reading Language Log. I was hoping for some responses along the lines of "oh hey, that's helpful, it makes me think of X" :-)

Replies from: prase
comment by prase · 2010-12-04T22:50:03.489Z · LW(p) · GW(p)

This is LW, right? So the audience should be capable

Maybe. You shouldn't underestimate the inferential distances :)

comment by kalloyd · 2013-10-09T17:45:16.151Z · LW(p) · GW(p)

Re: the claim that the term inferential distance "has no clear mathematical or pragmatic definition, no substantive use in peer reviewed science": the idea seems consistently coherent with the mathematical concept of a "homology of chain complexes". More specifically, these represent "natural transformations"; see page 3 of http://wattsys.com/Content/Domain%20Concepts%20Overview.pdf