A conceptual precursor to today's language machines [Shannon]
post by Bill Benzon (bill-benzon) · 2023-11-15T13:50:51.226Z · LW · GW · 6 comments
Cross-posted from New Savanna.
I'm in the process of reading a fascinating article by Richard Hughes Gibson, “Language Machinery: Who will attend to the machines’ writing?” It seems that Claude Shannon ran what amounts to a simulated training session for a large language model (LLM) long before such things were a gleam in anyone's eye:
The game begins when Claude pulls a book down from the shelf, concealing the title in the process. After selecting a passage at random, he challenges [his wife] Mary to guess its contents letter by letter. Since the text consists of modern printed English, the space between words will count as a twenty-seventh symbol in the set. If Mary fails to guess a letter correctly, Claude promises to supply the right one so that the game can continue. Her first guess, “T,” is spot-on, and she translates it into the full word “The” followed by a space. She misses the next two letters (“ro”), however, before filling in the ensuing eight slots (“oom_was_”). That rhythm of stumbles and runs will persist throughout the game. In some cases, a corrected mistake allows her to fill in the remainder of the word; elsewhere a few letters unlock a phrase. All in all, she guesses 89 of 129 possible letters correctly—69 percent accuracy.
In his 1951 paper “Prediction and Entropy of Printed English,”[1] Claude Shannon reported the results as follows, listing the target passage—clipped from Raymond Chandler’s 1936 detective story “Pickup on Noon Street”—above his wife’s guesses, indicating a correct guess with a bespoke system of dashes, underlining, and ellipses (which I’ve simplified here):
(1) THE ROOM WAS NOT VERY LIGHT A SMALL OBLONG
(2) ----ROO------NOT-V-----I------SM----OBL----
(1) READING LAMP ON THE DESK SHED GLOW ON
(2) REA----------O------D----SHED-GLO--O--
(1) POLISHED WOOD BUT LESS ON THE SHABBY RED CARPET
(2) P-L-S------O--BU--L-S--O------SH-----RE--C-----
What does this prove? The game may seem a perverse exercise in misreading (or even nonreading), but Shannon argued that the exercise was in fact not so outlandish. It illustrated, in the first place, that a proficient speaker of a language possesses an “enormous” but implicit knowledge of the statistics of that language. Shannon would have us see that we make similar calculations regularly in everyday life, such as when we “fill in missing or incorrect letters in proof-reading” or “complete an unfinished phrase in conversation.” As we speak, read, and write, we are regularly engaged in prediction games.
But the game works, Shannon further observed, only because English itself is predictable—and so amenable to statistical modeling.
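The tally in the quoted passage can be reproduced mechanically. The sketch below assumes the simplified convention of the display above (a dash for each correct guess, the letter itself wherever it had to be supplied). The function name `score_reduced` is hypothetical, and only the first row of the display is scored.

```python
def score_reduced(reduced: str) -> tuple[int, int, float]:
    """Return (correct, total, accuracy) for one row of reduced text.

    Assumption: a dash marks a correct guess; any other character is a
    letter (or space) that had to be supplied after a miss.
    """
    total = len(reduced)
    correct = reduced.count("-")
    return correct, total, correct / total


# First row of the display, including the slot for the space after OBLONG.
row = "----ROO------NOT-V-----I------SM----OBL----"
correct, total, accuracy = score_reduced(row)
print(f"{correct} of {total} correct ({accuracy:.0%})")
```

Summing the same count over all three rows should land close to the 89-of-129 figure quoted above, though the exact total depends on how the trailing spaces are transcribed.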
After some elaboration and discussion:
Shannon then proposes an illuminating thought experiment: Imagine that Mary has a truly identical twin (call her “Martha”). If we supply Martha with the “reduced text,” she should be able to recreate the entirety of Chandler’s passage, since she possesses the same statistical knowledge of English as Mary. Martha would make Mary’s guesses in reverse. Of course, Shannon admitted, there are no “mathematically identical twins” to be found, “but we do have mathematically identical computing machines.” Those machines could be given a model for making informed predictions about letters, words, maybe larger phrases and messages. In one fell swoop, Shannon had demonstrated that language use has a statistical side, that languages are, in turn, predictable, and that computers too can play the prediction game.
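The twin experiment can be sketched in a few lines. This is a toy illustration, not Shannon's procedure: the "knowledge of English" here is an assumed one-character-context model that always guesses the most frequent successor seen in a tiny training string, and the names `train`, `guess`, `reduce_text`, and `restore_text` are hypothetical. It also assumes the text uses only letters and spaces, so a dash is never ambiguous.

```python
from collections import Counter, defaultdict


def train(corpus: str) -> dict[str, str]:
    """Map each character to its most frequent successor in the corpus."""
    followers = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        followers[prev][nxt] += 1
    return {prev: counts.most_common(1)[0][0] for prev, counts in followers.items()}


def guess(model: dict[str, str], prev: str) -> str:
    # Fall back to a space for unseen contexts (an arbitrary choice).
    return model.get(prev, " ")


def reduce_text(model: dict[str, str], text: str) -> str:
    """Mary's side: write a dash wherever the predictor guesses correctly."""
    out, prev = [], " "
    for ch in text:
        out.append("-" if guess(model, prev) == ch else ch)
        prev = ch
    return "".join(out)


def restore_text(model: dict[str, str], reduced: str) -> str:
    """Martha's side: rerun the identical predictor to fill in the dashes."""
    out, prev = [], " "
    for ch in reduced:
        actual = guess(model, prev) if ch == "-" else ch
        out.append(actual)
        prev = actual
    return "".join(out)


corpus = "THE ROOM WAS NOT VERY LIGHT A SMALL OBLONG READING LAMP ON THE DESK "
model = train(corpus)
text = "THE LAMP ON THE DESK"
reduced = reduce_text(model, text)
print(reduced)
print(restore_text(model, reduced))
```

Because both sides consult the same deterministic model, every dash can be undone exactly, which is the point of the "mathematically identical" machines. A better predictor would simply leave more dashes in the reduced text.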
Next thing you know, someone will demonstrate that the idea was there in Plato, and that he got it from watching some monkeys gesticulating wildly in the agora.
[1] Claude Shannon, “Prediction and Entropy of Printed English,” Bell System Technical Journal 30, no. 1 (January 1951), 54.
6 comments
Comments sorted by top scores.
comment by gwern · 2023-11-15T21:04:06.507Z · LW(p) · GW(p)
- https://gwern.net/doc/fiction/science-fiction/1943-jones.pdf#page=28
- https://gwern.net/doc/ai/1949-coupling.pdf
- https://gwern.net/doc/psychology/linguistics/1954-harris.pdf
↑ comment by Bill Benzon (bill-benzon) · 2023-11-15T21:34:37.569Z · LW(p) · GW(p)
Thanks for the links.
comment by Carlos Javier Gil Bellosta (carlos-javier-gil-bellosta) · 2023-11-15T14:19:36.978Z · LW(p) · GW(p)
Note that Shannon, 3 years before, had already trained what was possibly the first-ever LLM. It could generate text such as
THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED.
See [_A Mathematical Theory of Communication_](https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf).
↑ comment by Bill Benzon (bill-benzon) · 2023-11-15T14:49:04.018Z · LW(p) · GW(p)
Yes, Gibson discusses that in his article.
↑ comment by quetzal_rainbow · 2023-11-15T14:35:55.905Z · LW(p) · GW(p)
I should note that it's an LM, not an LLM.
↑ comment by Bill Benzon (bill-benzon) · 2023-11-15T14:47:23.376Z · LW(p) · GW(p)
LOL! Details. How about LMM: Little Manual Model?