Comments
If this post is selected, I'd like to see the followup made into an addendum—I think it adds a very important piece, and it should have been nominated itself.
I think this post (and similarly, Evan's summary of Chris Olah's views) are essential both in their own right and as mutual foils to MIRI's research agenda. We see related concepts (mesa-optimization originally came out of Paul's talk of daemons in Solomonoff induction, if I remember right) but very different strategies for achieving both inner and outer alignment. (The crux of the disagreement seems to be the probability of success from adapting current methods.)
Strongly recommended for inclusion.
It's hard to know how to judge a post that deems itself superseded by a post from a later year, but I lean toward taking Daniel at his word and hoping we survive until the 2021 Review comes around.
I can't think of a question on which this post narrows my probability distribution.
Not recommended.
The content here is very valuable, even if the genre of "I talked a lot with X and here's my articulation of X's model" comes across to me as a weird sort of intellectual ghostwriting. I can't think of a way around that, though.
That being said, I'm not very confident this piece (or any piece on the current state of AI) will still be timely a year from now, so maybe I shouldn't recommend it for inclusion after all.
Ironically enough for Zack's preferred modality, you're asserting that even though this post is reasonable when decoupled from the rest of the sequence, it's worrisome when contextualized.
I agree about the effects of deep learning hype on deep learning funding, though I think very little of it has been AGI hype; people at the top level had been heavily conditioned to believe we were/are still in the AI winter of specialized ML algorithms to solve individual tasks. (The MIRI-sphere had to work very hard, before OpenAI and DeepMind started doing externally impressive things, to get serious discussion on within-lifetime timelines from anyone besides the Kurzweil camp.)
Maybe Demis was strategically overselling DeepMind, but I expect most people were genuinely over-optimistic (and funding-seeking) in the way everyone in ML always is.
This is a retroactively obvious concept that I'd never seen so clearly stated before, which makes it a fantastic contribution to our repertoire of ideas. I've even used it to sanity-check my statements on social media. Well, I've tried.
Recommended, obviously.
This reminds me of That Alien Message, but as a parable about mesa-alignment rather than outer alignment. It reads well, and helps make the concepts more salient. Recommended.
This makes a simple and valuable point. As discussed in and below Anna's comment, it's very different when applied to a person who can interact with you directly versus a person whose works you read. But the usefulness in the latter context, and the way I expect new readers to assume that context, leads me to recommend it.
I liked the comments on this post more than I liked the post itself. As Paul commented, there's as much criticism of short AGI timelines as there is of long AGI timelines; and as Scott pointed out, this was an uncharitable take on AI proponents' motives.
Without the context of those comments, I don't recommend this post for inclusion.
I've referred and linked to this post in discussions outside the rationalist community; that's how important the principle is. (Many people understand the idea in the domain of consent, but have never thought about it in the domain of epistemology.)
Recommended.
As mentioned in my comment, this book review overcame some skepticism from me and explained a new mental model about how inner conflict works. Plus, it was written with Kaj's usual clarity and humility. Recommended.
I stand by this piece, and I now think it makes a nice complement to discussions of GPT-3. In both cases, we have significant improvements in chunking of concepts into latent spaces, but we don't appear to have anything like a causal model in either. And I've believed for several years that causal reasoning is the thing that puts us in the endgame.
(That's not to say either system would still be safe if scaled up massively; mesa-optimization would be a reason to worry.)
I never found a Coherence Therapy practitioner, but I found a really excellent IFS practitioner who's helped me break down many of my perpetual hangups in ways compatible with this post.
In particular, one difference from the self-IFS I'd attempted before is that I'd previously tried to destroy some parts as irrational or hypocritical, whereas the therapist was very good at being non-judgmental towards them. That approach paid better dividends.
Can't update on #4. Of course a rapidly growing new strain will have a negligible impact on total numbers early on; it's a question of whether it will dominate the total numbers in a few months.
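For intuition, here's a toy calculation with made-up numbers (say the new strain starts at 1% of cases and has a 50% per-week growth advantage; neither figure is from the post):

```python
# Illustrative numbers only: a new strain at 1% of cases
# with a 50% per-week growth advantage over the old strain.
share = 0.01
week = 0
while share < 0.5:
    odds = share / (1 - share)
    odds *= 1.5          # the relative advantage compounds in the odds
    share = odds / (1 + odds)
    week += 1
print(week)  # ~12 weeks from "negligible" to "dominant"
```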
Remind me which bookies count and which don't, in the context of the proofs of properties?
If any computable bookie is allowed, a non-Bayesian is in trouble against a much larger bookie who can just (maybe through its own logical induction) discover who the bettor is and how to exploit them.
[EDIT: First version of this comment included "why do convergence bettors count if they don't know the bettor will oscillate", but then I realized the answer while Abram was composing his response, so I edited that part out. Editing it back in so that Abram's reply has context.]
Not the most important thing, but Adler and Colbert's situations feel rather different to me.
Colbert is bubbled with a small team in order to provide mass entertainment to the nation... just like sports teams, which you endorse.
Adler is partying for his own benefit.
To steelman the odds' consistency (though I agree with you that the market isn't really reflecting careful thinking from enough people), Biden is farther ahead in the 538 projection now than he was before, but on the other hand, Trump has completely gotten away with refusing to commit to a peaceful transfer of power. Even if that's not the most surprising thing in the world (how far indeed we have fallen), it wasn't at 100% two months ago.
There's certainly a tradeoff involved in using a disputed example as your first illustration of a general concept (here, Bayesian reasoning vs the Traditional Scientific Method).
I can't help but think of Scott Alexander's long posts, where usually there's a division of topics between roman-numeraled sections, but sometimes it seems like it's just "oh, it's been too long since the last one, got to break it up somehow". I do think this really helps with readability; it reminds the reader to take a breath, in some sense.
Or like, taking something that works together as a self-contained thought but is too long to serve the function of a paragraph, and just splitting it by adding a superficially segue-like sentence at the start of the second part.
It may not be possible to cleanly divide the Technical Explanation into multiple posts that each stand on their own, but even separating it awkwardly into several chapters would make it less intimidating and invite more comments.
(I think this may be the longest post in the Sequences.)
I forget if I've said this elsewhere, but we should expect human intelligence to be just a bit above the bare minimum required to result in technological advancement. Otherwise, our ancestors would have been where we are now.
(Just a bit above, because there was the nice little overhang of cultural transmission: once the hardware got good enough, the software could be transmitted way more effectively between people and across generations. So we're quite a bit more intelligent than our basically anatomically equivalent ancestors of 500,000 years ago. But not as big a gap as the gap from that ancestor to our last common ancestor with chimps, 6-7 million years ago.)
Additional hypothesis: everything is becoming more political than it has been at any time since the Civil War, to the extent that any celebration of a new piece of construction/infrastructure/technology would also be protested. (I would even agree with the protesters in many cases! Adding more automobile infrastructure to cities is really bad!)
The only things today [where there's common knowledge that the demonstration will swamp any counter-demonstration] are major local sports achievements.
(I notice that my model is confused in the case of John Glenn's final spaceflight. NASA achievements would normally be nonpartisan, but Glenn was a sitting Democratic Senator at the time of the mission! I guess they figured that in heavily Democratic NYC, not enough Republicans would dare to make a stink.)
Eliezer's mistake here was that he didn't, before the QM sequence, write a general post to the effect that you don't have an additional Bayesian burden of proof if your theory was proposed chronologically later. Given such a reference, it would have been a lot simpler to refer to that concept without it seeming like special pleading here.
It's not explicit. Like I said, the terms are highly dependent in reality, but for intuition you can think of a series of variables $X_i$ for $i$ from $1$ to $N$, where $X_i$ equals $2^i$ with probability $2^{-i}$. And think of $N$ as pretty large.
So most of the time, the sum of these is dominated by a lot of terms with small contributions. But every now and then, a big one hits and there's a huge spike.
(I haven't thought very much about what functions of $i$ and $N$ I'd actually use if I were making a principled model; $2^i$ and $2^{-i}$ are just there for illustrative purposes, such that the sum is expected to have many small terms most of the time and some very large terms occasionally.)
No. My model is the sum of a bunch of random variables for possible conflicts (these variables are not independent of each other), where there are a few potential global wars that would cause millions or billions of deaths, and lots and lots of tiny wars each of which would add a few thousand deaths.
This model predicts a background rate of the sum of the smaller ones, and large spikes to the rate whenever a larger conflict happens. Accordingly, over the last three decades (with the tragic exception of the Rwandan genocide) total war deaths per year (combatants + civilians) have been between 18k and 132k (wow, the Syrian Civil War has been way worse than the Iraq War, I didn't realize that).
So my median is something like 1M people dying over the decade, because I view a major conflict as under 50% likely, and we could easily have a decade as peaceful (no, really) as the 2000s.
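For concreteness, here's a minimal simulation sketch of that kind of model. None of the conflict sizes or probabilities below are calibrated; they're just picked so the qualitative shape (a steady background plus rare huge spikes) comes through:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_decade_deaths(n_years=10):
    """Toy version of the model above: many small conflicts each year,
    plus a few rare potential large wars. Illustrative numbers only."""
    total = 0.0
    for _ in range(n_years):
        # Background: lots of tiny wars, each adding a few thousand deaths.
        small_conflicts = rng.poisson(30)
        total += rng.uniform(1e3, 5e3, size=small_conflicts).sum()
        # Rare spikes: a handful of potential large conflicts, each unlikely in any given year.
        for p, deaths in [(0.01, 5e7), (0.03, 5e6), (0.05, 5e5)]:
            if rng.random() < p:
                total += deaths
    return total

samples = np.array([simulate_decade_deaths() for _ in range(10_000)])
print(f"median deaths/decade: {np.median(samples):,.0f}")
print(f"mean deaths/decade:   {samples.mean():,.0f}")  # mean >> median: dominated by rare spikes
```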
An improvement in this direction: the Fed has just acknowledged, at least, that it is possible for inflation to be too low as well as too high, that inflation targeting needs to acknowledge that the US has been consistently undershooting its goal, and that this leads to the further feedback of the market expecting the US to continue undershooting its goal. And then it explains and commits to average inflation targeting:
We have also made important changes with regard to the price-stability side of our mandate. Our longer-run goal continues to be an inflation rate of 2 percent. Our statement emphasizes that our actions to achieve both sides of our dual mandate will be most effective if longer-term inflation expectations remain well anchored at 2 percent. However, if inflation runs below 2 percent following economic downturns but never moves above 2 percent even when the economy is strong, then, over time, inflation will average less than 2 percent. Households and businesses will come to expect this result, meaning that inflation expectations would tend to move below our inflation goal and pull realized inflation down. To prevent this outcome and the adverse dynamics that could ensue, our new statement indicates that we will seek to achieve inflation that averages 2 percent over time. Therefore, following periods when inflation has been running below 2 percent, appropriate monetary policy will likely aim to achieve inflation moderately above 2 percent for some time.
Of course, this says nothing about how they intend to achieve this (seigniorage has its downsides), but I expect Eliezer would see it as good news.
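To make "averages 2 percent over time" concrete, here's a toy calculation. The Fed's statement doesn't commit to any particular averaging window or formula; the symmetric makeup window below is my own assumption, purely for illustration:

```python
def makeup_target(past_inflation, long_run_target=2.0, future_years=None):
    """Toy illustration of average inflation targeting: pick the future inflation
    rate that brings the average over the whole window back to the long-run target.
    (The Fed has not committed to any specific window or formula like this.)"""
    future_years = future_years or len(past_inflation)
    total_needed = long_run_target * (len(past_inflation) + future_years)
    return (total_needed - sum(past_inflation)) / future_years

# Three years of undershooting at 1.5% implies aiming for ~2.5% over the next three years.
print(makeup_target([1.5, 1.5, 1.5]))  # 2.5
```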
The claim that came to my mind is that the conscious mind is the mesa-optimizer here, the original outer optimizer being a riderless elephant.
When University of North Carolina students learned that a speech opposing coed dorms had been banned, they became more opposed to coed dorms (without even hearing the speech). (Probably in Ashmore et al. 1971.)
De-platforming may be effective in a different direction than intended.
That link is now broken, unfortunately. Here's a working one.
It's a great story of an anthropologist who, one night, tells the story of Hamlet to the Tiv tribe in order to see how they react to it. They get invested in the story, but tell her that she must be telling it wrong, as the details are things that wouldn't be permissible in their culture. At the end they explain what really must have happened in that story (involving Hamlet being actually mad, due to witchcraft) and ask her to tell them more stories.
In addition to the other thread on this, some of the usage of "I'm not sure what I think about that" matches "I notice that I am confused". Namely, that your observations don't fit your current model, and your model needs to be updated, but you don't know where.
And this is much trickier to get a handle on, from the inside, than estimating the probability of something within your model.
As always, there's the difference between "we're all doomed to be biased, so I might as well carry on with whatever I was already doing" and "we're all doomed to be somewhat biased, but less biased is better than more biased, so let's try and mitigate them as we go".
Someone really ought to name a website along those lines.
"I don't think we have to wait to scan a whole brain. Neural networks are just like the human brain, and you can train them to do things without knowing how they do them. We'll create programs that will do arithmetic without we, our creators, ever understanding how they do arithmetic."
This sort of anti-predicts the deep learning boom, but only sort of.
Fully connected networks didn't scale effectively; researchers had to find (mostly principled, but some ad-hoc) network structures that were capable of more efficiently learning complex patterns.
Also, we've genuinely learned more about vision by realizing the effectiveness of convolutional neural nets.
And yet, the state of the art is to take a generalizable architecture and to scale it massively, not needing to know anything new about the domain, nor learning much new about it. So I do think Eliezer loses some Bayes points for his analogy here, as it applies to games and to language.
When I design a toaster oven, I don't design one part that tries to get electricity to the coils and a second part that tries to prevent electricity from getting to the coils.
On the other hand, there was a fleeting time (after this post) when generative adversarial networks were the king of some domains. And more fairly as counterpoints go, the body is subject to a single selective pressure (as opposed to the pressures for two rival species), and yet our brains and immune systems are riddled with systems whose whole purpose is to selectively suppress each other.
Of course there are features of the ecosystem that don't match any plausible goal of a humanized creator, but the analogy is on wobblier ground than Eliezer seems to have thought.
For me, I'd already absorbed all the right arguments against my religion, as well as several years' worth of assiduously devouring the counterarguments (which were weak, but good enough to push back my doubts each time). What pushed me over the edge, the bit of this that I reinvented for myself, was:
"What would I think about these arguments if I hadn't already committed myself to faith?"
Once I asked myself those words, it was clear where I was headed. I've done my best to remember them since.
(looks around at 2020)
Interesting case of an evolved heuristic gone wrong in the modern world.
Mutational load correlates negatively with facial symmetry, height, strength, and IQ. Some of these are important in assessing (desirability or inevitability of) leadership, and others are easier to externally verify. So in a tribe, you could be forgiven for assuming that the more attractive people are going to end up powerful, and strategizing accordingly by currying favor with them. (Bit of a Keynesian beauty contest there, but there is a signal at the root which keeps the equilibrium stable.)
However, in modern society, we're not sampling randomly from the population; the candidates for office, or for a job, have already been screened for some level of ability. And in fact, now the opposite pattern should hold, because you're conditioning on the collider: X is a candidate either because they're very capable or because they're somewhat capable and also attractive!
Since all tech interviews are being conducted online these days, I wonder if any company has been wise enough to snap up some undervalued talent by doing their interviews entirely without cameras...
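Here's a quick sketch of the collider effect with made-up numbers; the weights and threshold are arbitrary, just to show the sign flip among candidates:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Assume ability and attractiveness are independent in the general population.
ability = rng.normal(size=n)
attractiveness = rng.normal(size=n)

# Toy selection rule: you become a candidate if some mix of the two clears a bar.
candidate = (0.7 * ability + 0.3 * attractiveness) > 1.0

print(np.corrcoef(ability, attractiveness)[0, 1])                        # ~0 in the population
print(np.corrcoef(ability[candidate], attractiveness[candidate])[0, 1])  # negative among candidates
```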
Gah, not to persist with the Simulacra discussion, but most religious people (and most people, most of the time, on most topics) are on Simulacra Level 3: beliefs are membership badges. Wingnuts, conspiracy theorists, and rationalists are out on Level 1, taking beliefs at face value.
I'm now thinking the woman mentioned here is on Level 4: she no longer really cares that she's admitting things that her tribe wouldn't say, she's declaring that she's one of them despite clearly being cynical about the tribal signs.
I can't help but think of Simulacra Levels. She Wants To Be A Theist (aspiring to Level 3), but this is different from Actually Being A Theist (Level 3), let alone Actually Thinking That God Exists (Level 1). She's on Level 4, where she talks the way nobody on Level 3 would talk - Level 3's assert they are Level 1's; Level 4's assert they are Level 3's.
Exactly - it's not epistemics, it's a peace treaty.
This felt a bit too much like naive purity ethics back in 2008, and it looks even worse in the light of the current situation in the USA.
As for a substantive criticism:
Consider these two clever-sounding game-theoretical arguments side by side:
- You should vote for the less evil of the top mainstream candidates, because your vote is unlikely to make a critical difference if you vote for a candidate that most people don't vote for.
- You should stay home, because your vote is unlikely to make a critical difference.
It's hard to see who should accept argument #1 but refuse to accept argument #2.
The reason this is wrong is that your vote non-negligibly changes the probability of each candidate winning if the election is close, but not otherwise. In particular, if the candidates are within the margin of error (that is, the confidence interval of their margin of victory includes zero), then an additional vote for one candidate has about a 1/(2N) chance of breaking a tie, where N is the width (in votes) of the confidence interval*. So as I explained in that link, you should vote if you'd bother voting between those two candidates under a system where the winner was chosen by selecting a single ballot at random.
But if they're very much not within the margin of error, then an additional vote has only an exponentially small effect on the candidate's chances. That is the difference between #1 and #2.
*If this seems counterintuitive, consider that adding N votes to either candidate would probably assure their victory; so if you add a random number of votes between 0 and N to one side or the other, the average chance that any single one of those votes swings the election is nearly 1/(2N).
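To make the contrast concrete, here's a rough back-of-the-envelope sketch. The numbers and the normal approximation for the final margin are my own toy assumptions, not anything from the post:

```python
from scipy.stats import norm

def pivot_probability(expected_margin_votes, margin_std_votes):
    """Probability that one extra vote flips the outcome, assuming the final
    margin of victory is roughly normally distributed. Toy model only."""
    # Your vote matters only if the election would otherwise be within a vote of tied.
    return norm.pdf(0, loc=expected_margin_votes, scale=margin_std_votes)

print(pivot_probability(0, 50_000))        # close race: ~8e-6
print(pivot_probability(500_000, 50_000))  # safe race: ~1.5e-27, "exponentially small"
```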
Name me one science fiction film that Hollywood produced in the last 25 years in which technology is portrayed in a positive light, in which it’s not dystopian, it doesn’t kill people, it doesn’t destroy the world, it doesn’t not work, etc., etc.
Contact, Interstellar, The Martian, Hidden Figures.
Technology does play the villain in a lot of movies, but you don't need a sinister reason for that: if you're writing a dramatic story that prominently features a nonhuman entity/force/environment, the most narratively convenient place to fit it in is as the antagonist. Most movies where people are in the wilderness end up being Man vs Nature, for the same reason.
The author of the meme isn't a utilitarian, but I am. "A simple, cost-neutral way to reduce homelessness by 90%" is an obvious policy win, even if it's not literally "ending homelessness". How to help the 10% who are completely unhousable (due to issues of sanity or morality or behavior, etc.) is a hard problem, but for goodness' sake, we can at least fix the easier problem!
No, the closest analogue of comparing text snippets is staring at image completions, which is not nearly as informative as being able to go neuron-by-neuron or layer-by-layer and get a sense of the concepts at each level.
I... oops. You're completely right, and I'm embarrassed. I didn't check the original, because I thought Gwern would have noted it if so. I'm going to delete that example.
What's really shocking is that I looked at what was the original poetry, and thought to myself, "Yeah, that could plausibly have been generated by GPT-3." I'm sorry, Emily.
This was literally the first output, with no rerolls in the middle! (Although after posting it, I did some other trials which weren't as good, so I did get lucky on the first one. Randomness parameter was set to 0.5.)
I cut it off there because the next paragraph just restated the previous one.
(sorry, couldn't resist)
This is the first post in an Alignment Forum sequence explaining the approaches both MIRI and OpenAI staff believe are the most promising means of auditing the cognition of very complex machine learning models. We will be discussing each approach in turn, with a focus on how they differ from one another.
The goal of this series is to provide a more complete picture of the various options for auditing AI systems than has been provided so far by any single person or organization. The hope is that it will help people make better-informed decisions about which approach to pursue.
We have tried to keep our discussion as objective as possible, but we recognize that there may well be disagreements among us on some points. If you think we've made an error, please let us know!
If you're interested in reading more about the history of AI research and development, see:
1. What Is Artificial Intelligence? (Wikipedia)
2. How Does Machine Learning Work?
3. How Can We Create Trustworthy AI?
The first question we need to answer is: what do we mean by "artificial intelligence"?
The term "artificial intelligence" has been used to refer to a surprisingly broad range of things. The three most common uses are:
1. The study of how to create machines that can perceive, think, and act in ways that are typically only possible for humans.
2. The study of how to create machines that can learn, using data, in ways that are typically only possible for humans.
3. The study of how to create machines that can reason and solve problems in ways that are typically only possible for humans.
In this sequence, we will focus on the third definition. We believe that the first two are much less important for the purpose of AI safety research, and that they are also much less tractable.
Why is it so important to focus on the third definition?
The third definition is important because, as we will discuss in later posts, it is the one that creates the most risk. It is also the one that is most difficult to research, and so it requires the most attention.
[EDIT: oops, I thought you were talking about the direct power consumption of the computation, not the extra hardware weight. My bad.]
It's not about the power consumption.
The air conditioner in your car uses 3 kW, and GPT-3 takes 0.4 kWh for 100 pages of output - thus a dedicated computer on AC power could produce roughly 750 pages per hour, going substantially faster than AI Dungeon (literally and metaphorically). So a model as large as GPT-3 could run on the electricity of a car.
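Spelling out that arithmetic with the figures above:

```python
car_ac_power_kw = 3.0          # typical car air conditioner draw, as cited above
gpt3_kwh_per_100_pages = 0.4   # GPT-3 energy figure cited above
pages_per_hour = car_ac_power_kw / gpt3_kwh_per_100_pages * 100
print(pages_per_hour)  # 750.0
```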
The hardware would be more expensive, of course. But that's different.