## Posts

Developmental Stages of GPTs 2020-07-26T22:03:19.588Z · score: 119 (53 votes)
Map Errors: The Good, The Bad, and The Territory 2020-06-27T05:22:58.674Z · score: 25 (10 votes)
Negotiating With Yourself 2020-06-26T23:55:15.638Z · score: 23 (5 votes)
Choosing the Zero Point 2020-04-06T23:44:02.083Z · score: 136 (58 votes)
The Real Standard 2020-03-30T03:09:02.607Z · score: 16 (10 votes)
Does the 14-month vaccine safety test make sense for COVID-19? 2020-03-18T18:41:24.582Z · score: 59 (22 votes)
AlphaStar: Impressive for RL progress, not for AGI progress 2019-11-02T01:50:27.208Z · score: 118 (57 votes)
orthonormal's Shortform 2019-10-31T05:24:47.692Z · score: 9 (1 votes)
Fuzzy Boundaries, Real Concepts 2018-05-07T03:39:33.033Z · score: 64 (17 votes)
Roleplaying As Yourself 2018-01-06T06:48:03.510Z · score: 92 (35 votes)
The Loudest Alarm Is Probably False 2018-01-02T16:38:05.748Z · score: 184 (76 votes)
Value Learning for Irrational Toy Models 2017-05-15T20:55:05.000Z · score: 0 (0 votes)
HCH as a measure of manipulation 2017-03-11T03:02:53.000Z · score: 1 (1 votes)
Censoring out-of-domain representations 2017-02-01T04:09:51.000Z · score: 2 (2 votes)
Vector-Valued Reinforcement Learning 2016-11-01T00:21:55.000Z · score: 2 (2 votes)
Cooperative Inverse Reinforcement Learning vs. Irrational Human Preferences 2016-06-18T00:55:10.000Z · score: 2 (2 votes)
Proof Length and Logical Counterfactuals Revisited 2016-02-10T18:56:38.000Z · score: 3 (3 votes)
Obstacle to modal optimality when you're being modalized 2015-08-29T20:41:59.000Z · score: 3 (3 votes)
A simple model of the Löbstacle 2015-06-11T16:23:22.000Z · score: 2 (2 votes)
Agent Simulates Predictor using Second-Level Oracles 2015-06-06T22:08:37.000Z · score: 2 (2 votes)
Agents that can predict their Newcomb predictor 2015-05-19T10:17:08.000Z · score: 1 (1 votes)
Modal Bargaining Agents 2015-04-16T22:19:03.000Z · score: 3 (3 votes)
[Clearing out my Drafts folder] Rationality and Decision Theory Curriculum Idea 2015-03-23T22:54:51.241Z · score: 6 (7 votes)
An Introduction to Löb's Theorem in MIRI Research 2015-03-23T22:22:26.908Z · score: 16 (17 votes)
Welcome, new contributors! 2015-03-23T21:53:20.000Z · score: 4 (4 votes)
A toy model of a corrigibility problem 2015-03-22T19:33:02.000Z · score: 4 (4 votes)
New forum for MIRI research: Intelligent Agent Foundations Forum 2015-03-20T00:35:07.071Z · score: 36 (37 votes)
Forum Digest: Updateless Decision Theory 2015-03-20T00:22:06.000Z · score: 5 (5 votes)
Meta- the goals of this forum 2015-03-10T20:16:47.000Z · score: 3 (3 votes)
Proposal: Modeling goal stability in machine learning 2015-03-03T01:31:36.000Z · score: 1 (1 votes)
An Introduction to Löb's Theorem in MIRI Research 2015-01-22T20:35:50.000Z · score: 2 (2 votes)
Robust Cooperation in the Prisoner's Dilemma 2013-06-07T08:30:25.557Z · score: 73 (71 votes)
Compromise: Send Meta Discussions to the Unofficial LessWrong Subreddit 2013-04-23T01:37:31.762Z · score: -2 (18 votes)
Welcome to Less Wrong! (5th thread, March 2013) 2013-04-01T16:19:17.933Z · score: 27 (28 votes)
Robin Hanson's Cryonics Hour 2013-03-29T17:20:23.897Z · score: 29 (34 votes)
Does My Vote Matter? 2012-11-05T01:23:52.009Z · score: 21 (38 votes)
Decision Theories, Part 3.75: Hang On, I Think This Works After All 2012-09-06T16:23:37.670Z · score: 23 (24 votes)
Decision Theories, Part 3.5: Halt, Melt and Catch Fire 2012-08-26T22:40:20.388Z · score: 31 (32 votes)
Posts I'd Like To Write (Includes Poll) 2012-05-26T21:25:31.019Z · score: 14 (15 votes)
Timeless physics breaks T-Rex's mind [LINK] 2012-04-23T19:16:07.064Z · score: 22 (29 votes)
Decision Theories: A Semi-Formal Analysis, Part III 2012-04-14T19:34:38.716Z · score: 23 (28 votes)
Decision Theories: A Semi-Formal Analysis, Part II 2012-04-06T18:59:35.787Z · score: 16 (19 votes)
Decision Theories: A Semi-Formal Analysis, Part I 2012-03-24T16:01:33.295Z · score: 24 (26 votes)
Suggestions for naming a class of decision theories 2012-03-17T17:22:54.160Z · score: 5 (8 votes)
Decision Theories: A Less Wrong Primer 2012-03-13T23:31:51.795Z · score: 73 (77 votes)
Baconmas: The holiday for the sciences 2012-01-05T18:51:10.606Z · score: 5 (5 votes)

## Comments

Comment by orthonormal on Decoherence is Falsifiable and Testable · 2020-09-16T22:51:12.583Z · score: 2 (1 votes) · LW · GW

There's certainly a tradeoff involved in using a disputed example as your first illustration of a general concept (here, Bayesian reasoning vs the Traditional Scientific Method).

Comment by orthonormal on A Technical Explanation of Technical Explanation · 2020-09-16T22:49:14.503Z · score: 6 (3 votes) · LW · GW

I can't help but think of Scott Alexander's long posts, where usually there's a division of topics between roman-numeraled sections, but sometimes it seems like it's just "oh, it's been too long since the last one, got to break it up somehow". I do think this really helps with readability; it reminds the reader to take a breath, in some sense.

Or like, taking something that works together as a self-contained thought but is too long to serve the function of a paragraph, and just splitting it by adding a superficially segue-like sentence at the start of the second part.

Comment by orthonormal on A Technical Explanation of Technical Explanation · 2020-09-15T06:31:16.211Z · score: 4 (2 votes) · LW · GW

It may not be possible to cleanly divide the Technical Explanation into multiple posts that each stand on their own, but even separating it awkwardly into several chapters would make it less intimidating and invite more comments.

(I think this may be the longest post in the Sequences.)

Comment by orthonormal on My Childhood Role Model · 2020-09-15T06:06:45.448Z · score: 6 (3 votes) · LW · GW

I forget if I've said this elsewhere, but we should expect human intelligence to be just a bit above the bare minimum required to result in technological advancement. Otherwise, our ancestors would have been where we are now.

(Just a bit above, because there was the nice little overhang of cultural transmission: once the hardware got good enough, the software could be transmitted way more effectively between people and across generations. So we're quite a bit more intelligent than our basically anatomically equivalent ancestors of 500,000 years ago. But not as big a gap as the gap from that ancestor to our last common ancestor with chimps, 6-7 million years ago.)

Comment by orthonormal on Why haven't we celebrated any major achievements lately? · 2020-09-12T17:34:49.935Z · score: 7 (4 votes) · LW · GW

Additional hypothesis: everything is becoming more political than it has been since the Civil War, to the extent that any celebration of a new piece of construction/infrastructure/technology would also be protested. (I would even agree with the protesters in many cases! Adding more automobile infrastructure to cities is really bad!)

The only things today [where there's common knowledge that the demonstration will swamp any counter-demonstration] are major local sports achievements.

(I notice that my model is confused in the case of John Glenn's final spaceflight. NASA achievements would normally be nonpartisan, but Glenn was a sitting Democratic Senator at the time of the mission! I guess they figured that in heavily Democratic NYC, not enough Republicans would dare to make a stink.)

Comment by orthonormal on Decoherence is Falsifiable and Testable · 2020-09-11T23:31:12.667Z · score: 10 (5 votes) · LW · GW

Eliezer's mistake here was that he didn't, before the QM sequence, write a general post to the effect that you don't have an additional Bayesian burden of proof if your theory was proposed chronologically later. Given such a reference, it would have been a lot simpler to refer to that concept without it seeming like special pleading here.

Comment by orthonormal on 2020's Prediction Thread · 2020-09-09T00:42:35.134Z · score: 2 (1 votes) · LW · GW

It's not explicit. Like I said, the terms are highly dependent in reality, but for intuition you can think of a series of variables X_k for k from 1 to N, where X_k equals e^k with probability e^(-k) (and zero otherwise). And think of N as pretty large.

So most of the time, the sum of these is dominated by a lot of terms with small contributions. But every now and then, a big one hits and there's a huge spike.

(I haven't thought very much about what functions of k and N I'd actually use if I were making a principled model; e^k and e^(-k) are just there for illustrative purposes, such that the sum is expected to have many small terms most of the time and some very large terms occasionally.)
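This kind of heavy-tailed mixture is easy to sketch numerically. A toy instantiation under assumed forms (each variable pays out e^k with probability e^(-k), chosen only so that the sum usually consists of many small terms, with occasional huge spikes):

```python
import math
import random

def sample_total(n_terms=30, rng=None):
    """Sum of independent toy variables: the k-th pays e^k with prob e^(-k), else 0."""
    rng = rng or random.Random()
    total = 0.0
    for k in range(1, n_terms + 1):
        if rng.random() < math.exp(-k):
            total += math.exp(k)
    return total

rng = random.Random(0)
totals = [sample_total(rng=rng) for _ in range(10_000)]
median = sorted(totals)[len(totals) // 2]

# The median draw is modest, but the maximum is orders of magnitude larger:
print(median, max(totals))
```

Each term has expected value 1, so the mean is stable while the distribution is dominated by rare large events, matching the "background rate plus spikes" picture.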

Comment by orthonormal on 2020's Prediction Thread · 2020-09-08T15:50:18.938Z · score: 5 (3 votes) · LW · GW

No. My model is the sum of a bunch of random variables for possible conflicts (these variables are not independent of each other), where there are a few potential global wars that would cause millions or billions of deaths, and lots and lots of tiny wars each of which would add a few thousand deaths.

This model predicts a background rate of the sum of the smaller ones, and large spikes to the rate whenever a larger conflict happens. Accordingly, over the last three decades (with the tragic exception of the Rwandan genocide) total war deaths per year (combatants + civilians) have been between 18k and 132k (wow, the Syrian Civil War has been way worse than the Iraq War, I didn't realize that).

So my median is something like 1M people dying over the decade, because I view a major conflict as under 50% likely, and we could easily have a decade as peaceful (no, really) as the 2000s.

Comment by orthonormal on Frequently Asked Questions for Central Banks Undershooting Their Inflation Target · 2020-08-28T19:12:27.874Z · score: 10 (2 votes) · LW · GW

An improvement in this direction: the Fed has just acknowledged, at least, that it is possible for inflation to be too low as well as too high, that inflation targeting needs to acknowledge that the US has been consistently undershooting its goal, and that this leads to the further feedback of the market expecting the US to continue undershooting its goal. And then it explains and commits to average inflation targeting:

We have also made important changes with regard to the price-stability side of our mandate. Our longer-run goal continues to be an inflation rate of 2 percent. Our statement emphasizes that our actions to achieve both sides of our dual mandate will be most effective if longer-term inflation expectations remain well anchored at 2 percent. However, if inflation runs below 2 percent following economic downturns but never moves above 2 percent even when the economy is strong, then, over time, inflation will average less than 2 percent. Households and businesses will come to expect this result, meaning that inflation expectations would tend to move below our inflation goal and pull realized inflation down. To prevent this outcome and the adverse dynamics that could ensue, our new statement indicates that we will seek to achieve inflation that averages 2 percent over time. Therefore, following periods when inflation has been running below 2 percent, appropriate monetary policy will likely aim to achieve inflation moderately above 2 percent for some time.

Of course, this says nothing about how they intend to achieve this—seigniorage has its downsides—but I expect Eliezer would see it as good news.

Comment by orthonormal on Matt Botvinick on the spontaneous emergence of learning algorithms · 2020-08-14T02:59:02.142Z · score: 7 (4 votes) · LW · GW

The claim that came to my mind is that the conscious mind is the mesa-optimizer here, the original outer optimizer being a riderless elephant.

Comment by orthonormal on Scarcity · 2020-08-13T21:23:59.959Z · score: 4 (2 votes) · LW · GW

When University of North Carolina students learned that a speech opposing coed dorms had been banned, they became more opposed to coed dorms (without even hearing the speech).  (Probably in Ashmore et al. 1971.)

De-platforming may be effective in a different direction than intended.

Comment by orthonormal on Savanna Poets · 2020-08-10T17:17:43.385Z · score: 2 (1 votes) · LW · GW

That link is now broken, unfortunately. Here's a working one.

It's a great story of an anthropologist who, one night, tells the story of Hamlet to the Tiv tribe in order to see how they react to it. They get invested in the story, but tell her that she must be telling it wrong, as the details are things that wouldn't be permissible in their culture. At the end they explain what really must have happened in that story (involving Hamlet being actually mad, due to witchcraft) and ask her to tell them more stories.

Comment by orthonormal on Qualitatively Confused · 2020-08-09T18:49:25.753Z · score: 2 (1 votes) · LW · GW

In addition to the other thread on this, some of the usage of "I'm not sure what I think about that" matches "I notice that I am confused". Namely, that your observations don't fit your current model, and your model needs to be updated, but you don't know where.

And this is much trickier to get a handle on, from the inside, than estimating the probability of something within your model.

Comment by orthonormal on Mind Projection Fallacy · 2020-08-09T18:28:10.427Z · score: 6 (4 votes) · LW · GW

As always, there's the difference between "we're all doomed to be biased, so I might as well carry on with whatever I was already doing" and "we're all doomed to be somewhat biased, but less biased is better than more biased, so let's try and mitigate them as we go".

Someone really ought to name a website along those lines.

Comment by orthonormal on Artificial Addition · 2020-08-07T18:48:44.284Z · score: 4 (2 votes) · LW · GW

"I don't think we have to wait to scan a whole brain.  Neural networks are just like the human brain, and you can train them to do things without knowing how they do them.  We'll create programs that will do arithmetic without we, our creators, ever understanding how they do arithmetic."

This sort of anti-predicts the deep learning boom, but only sort of.

Fully connected networks didn't scale effectively; researchers had to find (mostly principled, but some ad-hoc) network structures that were capable of more efficiently learning complex patterns.

Also, we've genuinely learned more about vision by realizing the effectiveness of convolutional neural nets.

And yet, the state of the art is to take a generalizable architecture and to scale it massively, not needing to know anything new about the domain, nor learning much new about it. So I do think Eliezer loses some Bayes points for his analogy here, as it applies to games and to language.

Comment by orthonormal on An Alien God · 2020-08-06T04:27:42.430Z · score: 2 (1 votes) · LW · GW

When I design a toaster oven, I don't design one part that tries to get electricity to the coils and a second part that tries to prevent electricity from getting to the coils.

On the other hand, there was a fleeting time (after this post) when generative adversarial networks were the king of some domains. And more fairly as counterpoints go, the body is subject to a single selective pressure (as opposed to the pressures for two rival species), and yet our brains and immune systems are riddled with systems whose whole purpose is to selectively suppress each other.

Of course there are features of the ecosystem that don't match any plausible goal of a humanized creator, but the analogy is on wobblier ground than Eliezer seems to have thought.

Comment by orthonormal on Crisis of Faith · 2020-08-05T23:07:09.636Z · score: 6 (3 votes) · LW · GW

For me, I'd already absorbed all the right arguments against my religion, as well as several years' worth of assiduously devouring the counterarguments (which were weak, but good enough to push back my doubts each time). What pushed me over the edge, the bit of this that I reinvented for myself, was:

Once I asked myself those words, it was clear where I was headed. I've done my best to remember them since.

Comment by orthonormal on Uncritical Supercriticality · 2020-08-05T22:37:05.800Z · score: 7 (4 votes) · LW · GW

(looks around at 2020)

Comment by orthonormal on The Halo Effect · 2020-08-05T22:28:15.642Z · score: 2 (1 votes) · LW · GW

Interesting case of an evolved heuristic gone wrong in the modern world.

Mutational load correlates negatively with facial symmetry, height, strength, and IQ. Some of these are important in assessing (desirability or inevitability of) leadership, and others are easier to externally verify. So in a tribe, you could be forgiven for assuming that the more attractive people are going to end up powerful, and strategizing accordingly by currying favor with them. (Bit of a Keynesian beauty contest there, but there is a signal at the root which keeps the equilibrium stable.)

However, in modern society, we're not sampling randomly from the population; the candidates for office, or for a job, have already been screened for some level of ability. And in fact, now the opposite pattern should hold, because you're conditioning on the collider: X is a candidate either because they're very capable or because they're somewhat capable and also attractive!

Since all tech interviews are being conducted online these days, I wonder if any company has been wise enough to snap up some undervalued talent by doing their interviews entirely without cameras...
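The collider claim can be illustrated with a quick simulation (all numbers assumed purely for illustration: independent standard-normal "ability" and "attractiveness", with candidacy requiring their sum to clear a threshold):

```python
import random

rng = random.Random(42)

# Ability and attractiveness are independent in the general population...
population = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100_000)]

# ...but candidates are screened: they get through either on high ability
# alone, or on moderate ability plus attractiveness.
candidates = [(a, b) for a, b in population if a + b > 2.0]

def correlation(pairs):
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    cov = sum((a - ma) * (b - mb) for a, b in pairs) / n
    va = sum((a - ma) ** 2 for a, _ in pairs) / n
    vb = sum((b - mb) ** 2 for _, b in pairs) / n
    return cov / (va * vb) ** 0.5

# Near zero in the population, clearly negative among candidates:
print(correlation(population), correlation(candidates))
```

Conditioning on the sum exceeding a threshold induces a strong negative correlation between two traits that were independent to begin with, which is exactly the screening effect described above.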

Comment by orthonormal on No, Really, I've Deceived Myself · 2020-08-05T18:23:47.514Z · score: 5 (3 votes) · LW · GW

Gah, not to persist with the Simulacra discussion, but most religious people (and most people, most of the time, on most topics) are on Simulacra Level 3: beliefs are membership badges. Wingnuts, conspiracy theorists, and rationalists are out on Level 1, taking beliefs at face value.

I'm now thinking the woman mentioned here is on Level 4: she no longer really cares that she's admitting things that her tribe wouldn't say, she's declaring that she's one of them despite clearly being cynical about the tribal signs.

Comment by orthonormal on No, Really, I've Deceived Myself · 2020-08-05T18:16:58.371Z · score: 9 (3 votes) · LW · GW

I can't help but think of Simulacra Levels. She Wants To Be A Theist (aspiring to Level 3), but this is different from Actually Being A Theist (Level 3), let alone Actually Thinking That God Exists (Level 1). She's on Level 4, where she talks the way nobody on Level 3 would talk - Level 3's assert they are Level 1's; Level 4's assert they are Level 3's.

Comment by orthonormal on Dark Side Epistemology · 2020-08-05T17:59:26.876Z · score: 9 (3 votes) · LW · GW

Exactly - it's not epistemics, it's a peace treaty.

Comment by orthonormal on Stop Voting For Nincompoops · 2020-08-04T02:24:32.963Z · score: 6 (3 votes) · LW · GW

This felt a bit too much like naive purity ethics back in 2008, and it looks even worse in the light of the current situation in the USA.

As for a substantive criticism:

Consider these two clever-sounding game-theoretical arguments side by side:

1. You should vote for the less evil of the top mainstream candidates, because your vote is unlikely to make a critical difference if you vote for a candidate that most people don't vote for.
2. You should stay home, because your vote is unlikely to make a critical difference.

It's hard to see who should accept argument #1 but refuse to accept argument #2.

The reason this is wrong is that your vote non-negligibly changes the probability of each candidate winning if the election is close, but not otherwise. In particular, if the candidates are within the margin of error (that is, if the confidence interval of their margin of victory includes zero), then an additional vote for one candidate has about a 1/(2N) chance of breaking a tie, where N is the width of the confidence interval*. So, as I explained in that link, you should vote if you'd bother voting between those two candidates under a system where the winner was chosen by selecting a single ballot at random.

But if they're very much not within the margin of error, then an additional vote has only an exponentially small effect on the candidate's chances. That is the difference between #1 and #2.

*If this seems counterintuitive, consider that adding N votes to either candidate would probably assure their victory, so the average chance of swinging the election is nearly 1/(2N) if you add a random number between 0 and N votes to one side or the other.
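As a sanity check on the 1/(2N) figure, here's a toy simulation (assumptions: the realized margin is uniform over an N-vote-wide window around zero, and an exact tie is settled by a coin flip, so your vote changes the winner only when it converts a lost coin flip into an outright win):

```python
import random

rng = random.Random(1)
N = 1000            # width of the confidence interval on the margin, in votes
trials = 1_000_000

decisive = 0
for _ in range(trials):
    # Margin of victory drawn uniformly from an N-wide window around zero.
    margin = rng.randrange(N) - N // 2
    if margin == 0 and rng.random() < 0.5:
        # Exact tie, and the coin flip would have gone against your side:
        # your single vote changes the winner.
        decisive += 1

empirical = decisive / trials
print(empirical, 1 / (2 * N))   # empirical rate vs. the 1/(2N) estimate
```

With N = 1000, the decisive-vote probability comes out near 0.0005, in line with 1/(2N).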

Comment by orthonormal on Thiel on Progress and Stagnation · 2020-07-30T01:06:37.002Z · score: 14 (11 votes) · LW · GW

Name me one science fiction film that Hollywood produced in the last 25 years in which technology is portrayed in a positive light, in which it’s not dystopian, it doesn’t kill people, it doesn’t destroy the world, it doesn’t not work, etc., etc.

Contact, Interstellar, The Martian, Hidden Figures.

Technology does play the villain in a lot of movies, but you don't need a sinister reason for that: if you're writing a dramatic story that prominently features a nonhuman entity/force/environment, the most narratively convenient place to fit it in is as the antagonist. Most movies where people are in the wilderness end up being Man vs Nature, for the same reason.

Comment by orthonormal on Billionaire Economics · 2020-07-29T18:34:59.840Z · score: 15 (7 votes) · LW · GW

The author of the meme isn't a utilitarian, but I am. "A simple, cost-neutral way to reduce homelessness by 90%" is an obvious policy win, even if it's not literally "ending homelessness". How to help the 10% who are completely unhousable (due to their sanity or morality or behavior, etc) is a hard problem, but for goodness' sake, we can at least fix the easier problem!

Comment by orthonormal on Developmental Stages of GPTs · 2020-07-28T21:22:07.529Z · score: 4 (3 votes) · LW · GW

No, the closest analogue of comparing text snippets is staring at image completions, which is not nearly as informative as being able to go neuron-by-neuron or layer-by-layer and get a sense of the concepts at each level.

Comment by orthonormal on Developmental Stages of GPTs · 2020-07-28T17:48:00.638Z · score: 7 (4 votes) · LW · GW

I... oops. You're completely right, and I'm embarrassed. I didn't check the original, because I thought Gwern would have noted it if so. I'm going to delete that example.

What's really shocking is that I looked at what was the original poetry, and thought to myself, "Yeah, that could plausibly have been generated by GPT-3." I'm sorry, Emily.

Comment by orthonormal on What specific dangers arise when asking GPT-N to write an Alignment Forum post? · 2020-07-28T17:42:34.713Z · score: 5 (3 votes) · LW · GW

This was literally the first output, with no rerolls in the middle! (Although after posting it, I did some other trials which weren't as good, so I did get lucky on the first one. Randomness parameter was set to 0.5.)

I cut it off there because the next paragraph just restated the previous one.

Comment by orthonormal on What specific dangers arise when asking GPT-N to write an Alignment Forum post? · 2020-07-28T04:41:51.146Z · score: 32 (15 votes) · LW · GW

(sorry, couldn't resist)

This is the first post in an Alignment Forum sequence explaining the approaches both MIRI and OpenAI staff believe are the most promising means of auditing the cognition of very complex machine learning models. We will be discussing each approach in turn, with a focus on how they differ from one another.

The goal of this series is to provide a more complete picture of the various options for auditing AI systems than has been provided so far by any single person or organization. The hope is that it will help people make better-informed decisions about which approach to pursue.

We have tried to keep our discussion as objective as possible, but we recognize that there may well be disagreements among us on some points. If you think we've made an error, please let us know!

If you're interested in reading more about the history of AI research and development, see:

1. What Is Artificial Intelligence? (Wikipedia) 2. How Does Machine Learning Work? 3. How Can We Create Trustworthy AI?

The first question we need to answer is: what do we mean by "artificial intelligence"?

The term "artificial intelligence" has been used to refer to a surprisingly broad range of things. The three most common uses are:

The study of how to create machines that can perceive, think, and act in ways that are typically only possible for humans. The study of how to create machines that can learn, using data, in ways that are typically only possible for humans. The study of how to create machines that can reason and solve problems in ways that are typically only possible for humans.

In this sequence, we will focus on the third definition. We believe that the first two are much less important for the purpose of AI safety research, and that they are also much less tractable.

Why is it so important to focus on the third definition?

The third definition is important because, as we will discuss in later posts, it is the one that creates the most risk. It is also the one that is most difficult to research, and so it requires the most attention.

Comment by orthonormal on Are we in an AI overhang? · 2020-07-27T22:58:55.230Z · score: 8 (5 votes) · LW · GW

[EDIT: oops, I thought you were talking about the direct power consumption of the computation, not the extra hardware weight. My bad.]

It's not about the power consumption.

The air conditioner in your car uses 3 kW, and GPT-3 takes 0.4 kWh for 100 pages of output - thus a dedicated computer on AC power could produce roughly 750 pages per hour, going substantially faster than AI Dungeon (literally and metaphorically). So a model as large as GPT-3 could run on the electricity of a car.

The hardware would be more expensive, of course. But that's different.
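For reference, the throughput arithmetic with those figures (3 kW air-conditioner draw, 0.4 kWh per 100 pages of output):

```python
ac_power_kw = 3.0               # car air conditioner draw, in kW
energy_per_100_pages_kwh = 0.4  # GPT-3 inference estimate from the comment

# Pages producible per hour on the air conditioner's power budget.
pages_per_hour = ac_power_kw / energy_per_100_pages_kwh * 100
print(pages_per_hour)  # roughly 750 pages per hour
```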

Comment by orthonormal on Developmental Stages of GPTs · 2020-07-27T19:51:53.903Z · score: 16 (6 votes) · LW · GW

Funny thing about BPEs: GPT-3 has to know about the individual letters, because I taught it how to spell both real words and nonsense words. (Prompts in bold, including two where I edited GPT-3's responses.)

The students were ready to test their spelling.
The teacher stood at the front of the class. "Adam, please spell PARTY."
The teacher said, "Correct. Beatrice, please spell THROUGH."
Beatrice replied, "Through. T-H-O-R-O-U-G-H."
The teacher said, "Incorrect. Through is spelled T-H-R-O-U-G-H. Carlos, please spell SPELL."
Carlos replied, "Spell. S-P-E-L-L."
The teacher said, "Correct. Daphne, please spell NUCLEAR."
Daphne replied, "Nuclear. N-U-C-L-E-A-R."
The teacher replied, "Incorrect. Infinite is spelled I-N-F-I-N-I-T-E. Beatrice, please spell BALLOON."
Beatrice replied, "Balloon. B-A-L-L-O-O-N."
The teacher replied, "Correct. Carlos, please spell ENCLOSURE."
Carlos replied, "Enclosure. I-N-C-L-O-S-U-R-E."
The teacher replied, "Incorrect. Enclosure is spelled E-N-C-L-O-S-U-R-E. Daphne, please spell ELECTRON."
Daphne replied, "Electron. E-L-E-C-T-R-O-N."
The teacher said, "Correct. Carlos, please spell MACHINE."
Carlos replied, "Machine. M-A-C."
The teacher said, "Incorrect. Machine is spelled M-A-C-H-I-N-E. Beatrice, please spell SCIENTIST."
Beatrice replied, "Scientist. S-C-I-E-N-T-I-S-T."
The teacher said, "Correct. Daphne, please spell ASTRONOMER.
Daphne replied, "Astronomer. A-S-T-R-O-N-O-M-E-R."
The teacher said, "Correct. Carlos, please spell FLARBLE.
Carlos replied, "Flarble. F-L-A-R-B-L-E.

----------------

You've done much more advanced stuff, of course.

Comment by orthonormal on Are we in an AI overhang? · 2020-07-27T19:28:24.715Z · score: 42 (20 votes) · LW · GW

I'm imagining a tiny AI Safety organization, circa 2010, that focused on how to achieve probable alignment for scaled-up versions of that year's state-of-the-art AI designs. It's interesting to ask whether that organization would have achieved more or less than MIRI has, in terms of generalizable work and in terms of field-building.

Certainly it would have resulted in a lot of work that was initially successful but ultimately dead-end. But maybe early concrete results would have attracted more talent/attention/respect/funding, and the org could have thrown that at DL once it began to win the race.

On the other hand, maybe committing to 2010's AI paradigm would have made them a laughingstock by 2015, and killed the field. Maybe the org would have too much inertia to pivot, and it would have taken away the oxygen for anyone else to do DL-compatible AI safety work. Maybe it would have stated its problems less clearly, inviting more philosophical confusion and even more hangers-on answering the wrong questions.

Or, worst, maybe it would have made a juicy target for a hostile takeover. Compare what happened to nanotechnology research (and nanotech safety research) when too much money got in too early - savvy academics and industry representatives exiled Drexler from the field he founded so that they could spend the federal dollars on regular materials science and call it nanotechnology.

Comment by orthonormal on Developmental Stages of GPTs · 2020-07-27T17:34:29.162Z · score: 7 (5 votes) · LW · GW

That's not what I mean by planning. I mean "outputting a particular word now because most alternatives would get you stuck later".

An example is rhyming poetry. GPT-3 has learned to maintain the rhythm and the topic, and to end lines with rhyme-able words. But then, as it approaches the end of the next line, it's painted itself into a corner: there very rarely exists a word that completes the meter of the line, makes sense conceptually and grammatically, and rhymes exactly or approximately with the relevant previous line.

When people are writing rhyming metered poetry, we do it by having some idea where we're going - setting ourselves up for the rhyme in advance. It seems that GPT-3 isn't doing this.

...but then again, if it's rewarded only for predictions one word at a time, why should it learn to do this? And could it learn the right pattern if given a cost function on the right kind of time horizon?

As for why your example isn't what I'm talking about, there's no point at which it needs to think about later words in order to write the earlier words.

Comment by orthonormal on Developmental Stages of GPTs · 2020-07-27T01:36:15.002Z · score: 16 (7 votes) · LW · GW

That's not a nitpick at all!

Upon reflection, the structured sentences, thematically resolved paragraphs, and even JSX code can be done without a lot of real lookahead. And there's some evidence it's not doing lookahead - its difficulty completing rhymes when writing poetry, for instance.

(Hmm, what's the simplest game that requires lookahead that we could try to teach to GPT-3, such that it couldn't just memorize moves?)

Thinking about this more, I think that since planning depends on causal modeling, I'd expect the latter to get good before the former. But I probably overstated the case for its current planning capabilities, and I'll edit accordingly. Thanks!

Comment by orthonormal on Developmental Stages of GPTs · 2020-07-27T01:09:12.937Z · score: 5 (3 votes) · LW · GW

The outer optimizer is the more obvious thing: it's straightforward to say there's a big difference in dealing with a superhuman Oracle AI with only the goal of answering each question accurately, versus one whose goals are only slightly different from that in some way. Inner optimizers are an illustration of another failure mode.

Comment by orthonormal on Eliezer Yudkowsky Facts · 2020-07-22T19:36:09.527Z · score: 2 (1 votes) · LW · GW

Since someone just upvoted this long-ago comment, I'd just like to point out that I made this joke years before the HPMoR "Final Exam".

Comment by orthonormal on $1000 bounty for OpenAI to show whether GPT3 was "deliberately" pretending to be stupider than it is · 2020-07-22T01:35:38.378Z · score: 10 (7 votes) · LW · GW The simpler explanation is that GPT-3 didn't understand the pattern, got the first few mostly wrong, and more easily grasped the pattern "Eugene tells John he is incorrect" than the pattern of balanced/imbalanced parentheses. Comment by orthonormal on What Would I Do? Self-prediction in Simple Algorithms · 2020-07-21T17:32:29.300Z · score: 6 (3 votes) · LW · GW Re: Ben's question but much lower-level, there's some extent to which a logical inductor keeps having to accept actual failures sometimes as the price of preventing spurious counterfactuals, where in our world we model the consequences of certain actions without ever doing them. It's a free lunch type of thing; we assume our world has way more structure than just "the universe is a computable environment", so that we can extrapolate reliably in many cases. Strictly logically, we can't assume that- the universe could be set up to violate physics and reward you personally if you dive into a sewer, but you'll never find that out because that's an action you won't voluntarily take. So if the universe appears to run by simple rules, you can often do without exploration; but if you can't assume it is and always will be run by those rules, then you need to accept failures as the price of knowledge. Comment by orthonormal on Tolerate Tolerance · 2020-07-13T03:04:36.342Z · score: 4 (2 votes) · LW · GW The application of this principle to [outrage over the comments and commenters which a blogger declines to delete/ban] is left as an exercise for the reader. Comment by orthonormal on Was a PhD necessary to solve outstanding math problems? · 2020-07-11T17:08:33.681Z · score: 3 (2 votes) · LW · GW Brain plasticity. I wondered whether I should put "given that the two are the same intelligence and age" into the last paragraph. 
Comment by orthonormal on Was a PhD necessary to solve outstanding math problems? · 2020-07-11T00:10:02.482Z · score: 4 (2 votes) · LW · GW

One way that mathematics is different from the other sciences is that, since the last time it had to repair its foundations around 1900, progress within it doesn't get obviated by new technology. Biologists who've spent a career using one tool can be surpassed quickly by anyone who's mastered a new, better tool; not so for mathematicians. (Maybe computer-verified and computer-generated proofs will change that, but they really haven't yet in most domains.) That means that someone who's put years of work into a mathematical field has a strong advantage over someone who hasn't; and if you're going to put years of full-time effort into mathematics anyway, why not get a PhD for it?

Comment by orthonormal on Simulacra Levels and their Interactions · 2020-07-09T19:56:22.978Z · score: 6 (3 votes) · LW · GW

I understood this better when I made it more concrete, though not with the same phrase at all levels.

1. W says "There's not a pandemic headed our way from China" simply because W believes there's not a pandemic headed our way from China. W is making a claim, turns out to be wrong, and is surprised.
2. X says "Masks aren't effective against COVID-19" because X is a government agency that wants to reserve masks for frontline workers and hasn't thought up any strategy besides lying to the nation. X knows they were probably lying.
3. Y reflexively shares a meme saying "The anti-mask folks are definitely getting sick from their protests" because Y's friends are all pro-mask. Y's not actually trying to reach anyone who might be dissuaded from going to those protests. Y will switch on a dime once Y's friends identify with the next set of protests, and share stories about how outdoor protests are actually safe; Y never remembers that they were wrong before.
4. Journalist Z writes an article mocking the tech weirdos who stopped shaking hands in February out of fear that coronavirus was coming. Z isn't following a crowd; Z's making a move to shift the crowd, betting that Z, and tech journalism in general, will look good to the crowd for dunking on the tech nerds. Z will be very surprised, later, to find out that there were actual real-world events at stake.

Comment by orthonormal on A Sense That More Is Possible · 2020-07-08T19:01:07.128Z · score: 10 (4 votes) · LW · GW

Unfortunately, this hasn't aged very impressively. Despite the attempts to build the promised dojo (CFAR, Leverage/Paradigm, the EA Hotel, Dragon Army, probably several more that I'm missing), rationalists aren't winning in this way. The most impressive result so far is that a lot of mid-tier powerful people read Slate Star Codex, but I think most of that isn't about carrying on the values Eliezer is trying to construct in this sequence; Scott is a good writer on many topics, most of which are at best rationality-adjacent. The second most impressive result is the power of the effective altruism movement, but that's also not the same thing Eliezer was pointing at here. The remaining positive results of the 2009 rationality community are a batch of happy group houses, and MIRI chugging along its climb (thanks to hard-to-replicate personalities like Eliezer and Nate).

I think the "all you need is to try harder" stance is inferior to the "make a general postmortem of 'rationalist dojo' projects" stance, and I'd like to see a systematic attempt at the latter: assembling public information, interviewing people in all of these groups, and integrating all the data on why they failed to live up to their promises.

Comment by orthonormal on Map Errors: The Good, The Bad, and The Territory · 2020-06-29T03:27:58.976Z · score: 2 (1 votes) · LW · GW

#1 is just inevitable in all but a few perfectly specified domains. The map can't contain the entire territory.

#2 is what I'm discussing in this post; it's the one we rationalists try hardest to notice and combat. (Beliefs paying rent, and all.)

#3 is fine; I'm not as worried about [maps that admit they don't know what's beyond the mountain] as I am about [maps that fabricate the territory beyond the mountain].

#4: for sufficiently perfect predictive power, the difference between map and territory becomes an epiphenomenon, so I don't worry about this either.

Comment by orthonormal on Self-sacrifice is a scarce resource · 2020-06-28T17:29:19.364Z · score: 12 (6 votes) · LW · GW

This is super important, and I'm curious what your process of change was like. (I'm working on an analogous change: I've been terrified of letting people down for my whole adult life.)

Comment by orthonormal on A reply to Agnes Callard · 2020-06-28T17:21:14.546Z · score: 16 (10 votes) · LW · GW

I forget who responded with the kernel of this argument, but it wasn't mine: saying the incentives should be different doesn't mean pretending they are different.

In an ideal world, news organizations would have good standards and would not give in to external pressure on those standards. In our world, news organizations have some bad standards and give in to external pressure from time to time. Reaching a better world has to come from making de-escalation treaties or changing the overall incentives. Unilaterally disarming (by refusing even to sign petitions) has the completely predictable consequence that the NYT will compromise its standards in directions we dislike, because the pressure would be high in every direction but ours.

Comment by orthonormal on DontDoxScottAlexander.com - A Petition · 2020-06-28T17:12:10.300Z · score: 10 (2 votes) · LW · GW

What they say is that they don't respect pseudonyms in stories unless there's a compelling reason to do so in that particular case. There appears to be a political bias to the exceptions, but good luck getting an editor to admit that even to themselves, let alone to others.

Comment by orthonormal on Map Errors: The Good, The Bad, and The Territory · 2020-06-27T19:03:14.761Z · score: 2 (1 votes) · LW · GW

That more or less covers the advice at the end, but the rest of my post feels very valuable to my model of rationality.

Comment by orthonormal on Covid 6/25: The Dam Breaks · 2020-06-26T00:55:40.640Z · score: 27 (14 votes) · LW · GW

When you say that "our civilization was inadequate [to the task of suppressing COVID-19]", I just want to emphasize that "our civilization" here means only the USA, not Western civilization in general. The EU got hit harder at first and has performed well since then; you can blame them for not taking it seriously early enough, but you certainly can't accuse them of the level of dysfunction you see here.

In general, I like the framing that the United States is running on the worst legacy code of any Western democracy; the UK's is older but was more amenable to modern patches. Never underestimate the degree to which the US government is just the least efficient government of any developed nation.

Comment by orthonormal on The EMH Aten't Dead · 2020-05-19T16:23:58.206Z · score: 2 (1 votes) · LW · GW

> The catch though, from a couple of times I've tried placing big bets on unlikely events, is that (most) bookmakers don't seem to accept them. They might accept a $100 bet but not a $1000 one on such odds. They suspect you have inside information. (The same happens, I've heard, if you repeatedly win at roulette in some casinos. Goons appear and instruct you firmly that you may only bet on the low-stakes tables from now on.)

Right, the EMH doesn't fully apply when sharks can't swoop in with bets large enough to overwhelm the confederacy of Georges. The odds bookies offer are a hybrid between a market and a democracy.