Explaining why false ideas spread is more fun than why true ones do 2019-11-24T20:21:50.906Z · score: 30 (12 votes)
Will transparency help catch deception? Perhaps not 2019-11-04T20:52:52.681Z · score: 46 (13 votes)
Two explanations for variation in human abilities 2019-10-25T22:06:26.329Z · score: 74 (31 votes)
Misconceptions about continuous takeoff 2019-10-08T21:31:37.876Z · score: 70 (29 votes)
A simple environment for showing mesa misalignment 2019-09-26T04:44:59.220Z · score: 64 (26 votes)
One Way to Think About ML Transparency 2019-09-02T23:27:44.088Z · score: 20 (8 votes)
Has Moore's Law actually slowed down? 2019-08-20T19:18:41.488Z · score: 13 (9 votes)
How can you use music to boost learning? 2019-08-17T06:59:32.582Z · score: 10 (5 votes)
A Primer on Matrix Calculus, Part 3: The Chain Rule 2019-08-17T01:50:29.439Z · score: 10 (4 votes)
A Primer on Matrix Calculus, Part 2: Jacobians and other fun 2019-08-15T01:13:16.070Z · score: 22 (10 votes)
A Primer on Matrix Calculus, Part 1: Basic review 2019-08-12T23:44:37.068Z · score: 23 (10 votes)
Matthew Barnett's Shortform 2019-08-09T05:17:47.768Z · score: 7 (5 votes)
Why Gradients Vanish and Explode 2019-08-09T02:54:44.199Z · score: 27 (14 votes)
Four Ways An Impact Measure Could Help Alignment 2019-08-08T00:10:14.304Z · score: 21 (25 votes)
Understanding Recent Impact Measures 2019-08-07T04:57:04.352Z · score: 17 (6 votes)
What are the best resources for examining the evidence for anthropogenic climate change? 2019-08-06T02:53:06.133Z · score: 11 (8 votes)
A Survey of Early Impact Measures 2019-08-06T01:22:27.421Z · score: 22 (7 votes)
Rethinking Batch Normalization 2019-08-02T20:21:16.124Z · score: 21 (7 votes)
Understanding Batch Normalization 2019-08-01T17:56:12.660Z · score: 19 (7 votes)
Walkthrough: The Transformer Architecture [Part 2/2] 2019-07-31T13:54:44.805Z · score: 9 (9 votes)
Walkthrough: The Transformer Architecture [Part 1/2] 2019-07-30T13:54:14.406Z · score: 30 (13 votes)


Comment by matthew-barnett on [deleted post] 2019-12-10T10:25:35.865Z

After thinking about your reply, I've come to the conclusion that my thoughts are currently too confused to continue explaining. I've edited the main post to add that detail.

Comment by matthew-barnett on [deleted post] 2019-12-10T08:32:21.503Z
Isn't this true of every computer program? This sounds like an argument that AI can never be robust in functionality, which seems to prove too much. (If you actually mean this, I think your use of the word "robust" has diverged from the property I care about.)

When we describe the behavior of a system, we typically operate at varying levels of abstraction. I'm not making an argument about the fragility of the substrate that the system is on, but rather the fragility of the parts that we typically use to describe the system at an appropriate level of abstraction.

When we describe the functionality of an artificial neural network, we tend to speak about model weights and computational graphs, which do tolerate slight modifications. On the other hand, when we describe the functionality of A* search, we tend to speak about single lines of code that do stuff, which generally don't tolerate slight modifications.

I would like to see an example of being unable to adapt to sudden, unpredictable changes; that doesn't match my explanation of why logical systems fail: I would say that they make assumptions that turn out to be false, with a particularly common case being the failure of the designer to consider some particular edge case, and there are enough edge cases that these failures become too common.

I'm not sure I understand the difference between a logical agent encountering a sudden, unpredictable change to its environment and a logical agent entering a regime where its operating assumptions turned out to be false. The reason why anyone would be in an unpredictable situation is because they assumed they would be in a different environment.

In any case, I'll re-write that part to be more clear.

In the previous paragraph, I thought you argued that logical / "old" AI has robust specification but not robust functionality. But the worry with mesa optimizers is the exact opposite: that they will have robust functionality but not robust specification.

Consider the particular mode of failure in logical systems that you highlighted above: the system makes a false assumption about the world. Since my operating assumption was that a mesa optimizer will use some form of logical reasoning, they are therefore liable to make a false assumption about their environment that causes them to make an incorrect decision.

Note that I'm not saying that the primary issue with mesa optimizers is that they'll make some logical mistake, only that they could. In truth, the primary thesis of my post was that efforts to solve one type of robustness may not automatically carry over, because they don't respect the decomposition.

What does "brittle" mean here? You can say the same of humans; are humans "brittle"?

Consider the scalar definition of robustness: how well do you perform off some training distribution? By that measure, many humans are brittle, since they are not doing well according to inclusive fitness. Even within their own lives, humans don't pursue the goals they set for themselves 10 years ago. There are many ways in which humans are brittle in this sense.

Overall, I think your confusions are warranted. I wish I had shared the post with others before sending it out, and I may re-write some sections to make my main point clearer for any future readers.

Comment by matthew-barnett on What are some non-purely-sampling ways to do deep RL? · 2019-12-08T22:57:54.307Z · score: 9 (3 votes) · LW · GW

For the Alignment Newsletter:

Summary: A deep reinforcement learning agent trained by reward samples alone may predictably lead to a proxy alignment issue: the learner could fail to develop a full understanding of what behavior it is being rewarded for, and thus behave unacceptably when it is taken off its training distribution. Since we often use explicit specifications to define our reward functions, Evan Hubinger asks how we can incorporate this information into our deep learning models so that they remain aligned off the training distribution. He names several possibilities for doing so, such as giving the deep learning model access to a differentiable copy of the reward function during training, and fine-tuning a language model so that it can map natural language descriptions of a reward function into optimal actions.

Opinion: I'm unsure, though leaning skeptical, whether incorporating a copy of the reward function into a deep learning model would help it learn. My guess is that if someone did that with a current model it would make the model harder to train, rather than making anything easier. I will be excited if someone can demonstrate at least one feasible approach to addressing proxy alignment that does more than sample the reward function.
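To make the "differentiable copy of the reward function" idea concrete, here is a minimal sketch of my own (not from Evan's post; the linear policy and the reward function are both hypothetical): instead of learning from sampled reward values alone, the learner does gradient ascent directly *through* the reward function's known closed form.

```python
import random

# Toy sketch (my construction, not from the post): a linear "policy" a = w * s
# is trained by differentiating through a known reward function, rather than
# by fitting reward samples alone.
def reward(state, action):
    # Hypothetical reward: act as close to 2 * state as possible.
    return -(action - 2.0 * state) ** 2

rng = random.Random(0)
w, lr = 0.0, 0.1
for _ in range(500):
    s = rng.uniform(-1.0, 1.0)
    a = w * s
    # Because reward() is available in closed form, its gradient is too:
    # dR/dw = dR/da * da/dw = -2 * (a - 2s) * s.
    grad_w = -2.0 * (a - 2.0 * s) * s
    w += lr * grad_w

print(round(w, 2))  # converges toward 2.0, the reward-optimal policy
```

A sample-only learner would instead have to estimate this gradient from observed scalar rewards, which is exactly the information bottleneck the post is trying to route around.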

Comment by matthew-barnett on Arguments Against Speciesism · 2019-12-03T04:50:49.385Z · score: 1 (1 votes) · LW · GW
women and people from all "races" fought for their rights, when pigs and cattle will stand and let us know they've had enough, it'll be time to consider the question.

To generalize the principle you have described, should we never give a group moral consideration unless they can advocate for themselves? If we adopted this principle, then young children would immediately lose moral consideration, as would profoundly mentally disabled people.

Comment by matthew-barnett on Strategic implications of AIs' ability to coordinate at low cost, for example by merging · 2019-12-03T03:48:49.925Z · score: 1 (1 votes) · LW · GW

Thanks for the elaboration. You quoted Robin Hanson as saying

There are many other factors that influence coordination, after all; even perfect value matching is consistent with quite poor coordination.

My model says that this is about right. It generally takes a few more things for people to cooperate, such as common knowledge of perfect value matching, common knowledge of willingness to cooperate, and an understanding of the benefits of cooperation.

By assumption, AIs will become smarter than humans, which makes me think they will understand the benefits of cooperation better than we do. But this understanding won't be gained "all at once"; it will instead be continuous with the past. This is essentially why I think cooperation will be easier in the future, though it will more or less follow a gradual transition from our current trends (I think cooperation has been increasing globally over the last few centuries anyway, for similar reasons).

Abstracting away from the specific mechanism, as a more general argument, AI designers or evolution will (sooner or later) be able to explore a much larger region of mind design space than biological evolution could. Within this region there are bound to be minds much better at coordination than humans, and we should certainly expect coordination ability to be one objective that AI designers or evolution will optimize for since it offers a significant competitive advantage.

I agree that we will be able to search over a larger space of mind-design, and I also agree that this implies that it will be easier to find minds that cooperate.

I don't agree that cooperation necessarily allows you to have a greater competitive advantage. It's worth seeing why this is true in the case of evolution, as I think it carries over to the AI case. Naively, organisms that cooperate would always enjoy some advantages, since they would never have to fight for resources. However, this naive model ignores the fact that genes are selfish: if there is a way to reap the benefits of cooperation without having to pay the price of giving up resources, then organisms will pursue this strategy instead.

This is essentially the same argument that evolutionary game theorists have used to explain the evolution of aggression, as I understand it. Of course, there are some simplifying assumptions which could be worth disputing.
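The hawk-dove game is the standard evolutionary-game-theory model of this point. The sketch below (my own illustration, with conventional textbook payoff values) runs simple replicator dynamics and shows that pure cooperation (all doves) is not stable: aggressors invade until the population settles at a mixed equilibrium.

```python
# Hawk-Dove game under replicator dynamics (illustrative; V and C are
# conventional textbook choices, not values from the comment above).
V = 2.0  # value of the contested resource
C = 4.0  # cost of an escalated fight (C > V, so all-out aggression is costly)

def payoffs(p_hawk):
    """Expected payoffs to a hawk and a dove when a fraction p_hawk
    of the population plays hawk."""
    hawk = p_hawk * (V - C) / 2 + (1 - p_hawk) * V
    dove = (1 - p_hawk) * V / 2
    return hawk, dove

# Start with almost-pure cooperation and let the hawk share evolve.
p = 0.01
for _ in range(10000):
    h, d = payoffs(p)
    avg = p * h + (1 - p) * d
    p += 0.01 * p * (h - avg)  # hawks grow when they beat the average

# The population converges to the mixed equilibrium p* = V / C = 0.5:
# a rare aggressor can always exploit a population of pure cooperators.
print(round(p, 3))
```

The equilibrium fraction of aggressors, V / C, is exactly the "selfish gene" logic above: cooperation persists only up to the point where defecting stops paying.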

Comment by matthew-barnett on Strategic implications of AIs' ability to coordinate at low cost, for example by merging · 2019-12-03T01:52:27.671Z · score: 1 (1 votes) · LW · GW

For the Alignment Newsletter:

Summary: There are a number of differences between how humans cooperate and how hypothetical AI agents could cooperate, and these differences have important strategic implications for AI forecasting and safety. The first big implication is that AIs with explicit utility functions will be able to merge their values. This merging may have the effect of rendering laws and norms obsolete, since large conflicts would no longer occur. The second big implication is that our approaches to AI safety should preserve the ability for AIs to cooperate. This is because if AIs *don't* have the ability to cooperate, they might not be as effective, as they will be outcompeted by factions who can cooperate better.

Opinion: My usual starting point for future forecasting is to assume that AI won't alter any long term trends, and then update from there on the evidence. Most technologies haven't disrupted centuries-long trends in conflict resolution, which makes me hesitant to accept the first implication. Here, I think the biggest weakness in the argument is the assumption that powerful AIs should be described as having explicit utility functions. I still think that cooperation will be easier in the future, but it probably won't be a radical departure from past trends.

Comment by matthew-barnett on Misconceptions about continuous takeoff · 2019-12-03T01:50:08.189Z · score: 1 (1 votes) · LW · GW
I'm not sure this is an accurate characterization of the point; my understanding is that the concern largely comes from the possibility that the growth will be faster than exponential

Sure, if someone was arguing that, then they have a valid understanding of the difference between continuous vs. discontinuous takeoff. I would just question why we should expect growth to be faster than exponential for any sustained period of time.

Comment by matthew-barnett on Explaining why false ideas spread is more fun than why true ones do · 2019-11-26T18:22:40.036Z · score: 2 (2 votes) · LW · GW

I have the impression that when people write about the history of science, they tend to write about who discovered what, and how, but very little about what happened to the idea afterwards. I don't expect to find many history-of-science chapters titled "How scientific theories spread through the world," but it's trivial to find an equivalent chapter on religion or, say, communism.

Comment by matthew-barnett on Do you get value out of contentless comments? · 2019-11-21T22:51:54.306Z · score: 39 (17 votes) · LW · GW

Great question!

Comment by matthew-barnett on Matthew Barnett's Shortform · 2019-11-18T17:02:30.032Z · score: 6 (4 votes) · LW · GW

Bertrand Russell's advice to future generations, from 1959

Interviewer: Suppose, Lord Russell, this film would be looked at by our descendants, like a Dead Sea scroll in a thousand years’ time. What would you think it’s worth telling that generation about the life you’ve lived and the lessons you’ve learned from it?
Russell: I should like to say two things, one intellectual and one moral. The intellectual thing I should want to say to them is this: When you are studying any matter or considering any philosophy, ask yourself only what are the facts and what is the truth that the facts bear out. Never let yourself be diverted either by what you wish to believe, or by what you think would have beneficent social effects if it were believed, but look only — and solely — at what are the facts. That is the intellectual thing that I should wish to say. The moral thing I should wish to say to them is very simple: I should say love is wise, hatred is foolish. In this world which is getting more and more closely interconnected, we have to learn to tolerate each other; we have to learn to put up with the fact that some people say things we don’t like. We can only live together in that way and if we are to live together and not die together, we must learn a kind of charity and a kind of tolerance, which is absolutely vital to the continuation of human life on this planet.
Comment by matthew-barnett on Creationism and Many-Worlds · 2019-11-15T06:01:44.568Z · score: 6 (4 votes) · LW · GW
It is generally accepted that there is no experimental evidence that could determine between macroscopic decoherence and the collapse postulate.

From the Stanford Encyclopedia of Philosophy,

It has frequently been claimed, e.g. by De Witt 1970, that the MWI is in principle indistinguishable from the ideal collapse theory. This is not so. The collapse leads to effects that do not exist if the MWI is the correct theory. To observe the collapse we would need a super technology which allows for the “undoing” of a quantum experiment, including a reversal of the detection process by macroscopic devices. See Lockwood 1989 (p. 223), Vaidman 1998 (p. 257), and other proposals in Deutsch 1986. These proposals are all for gedanken experiments that cannot be performed with current or any foreseeable future technology. Indeed, in these experiments an interference of different worlds has to be observed. Worlds are different when at least one macroscopic object is in macroscopically distinguishable states. Thus, what is needed is an interference experiment with a macroscopic body. Today there are interference experiments with larger and larger objects (e.g., fullerene molecules C70, see Brezger et al. 2002 ), but these objects are still not large enough to be considered “macroscopic”. Such experiments can only refine the constraints on the boundary where the collapse might take place. A decisive experiment should involve the interference of states which differ in a macroscopic number of degrees of freedom: an impossible task for today's technology. It can be argued, however, that the burden of an experimental proof lies with the opponents of the MWI, because it is they who claim that there is a new physics beyond the well tested Schrödinger equation.
Comment by matthew-barnett on Robin Hanson on the futurist focus on AI · 2019-11-15T03:28:02.118Z · score: 3 (2 votes) · LW · GW

Other than, say, looking at our computers and comparing them to insects, what other signposts should we look for if we want to calibrate progress toward domain-general artificial intelligence?

Comment by matthew-barnett on AI Alignment Open Thread October 2019 · 2019-11-05T04:29:15.284Z · score: 3 (2 votes) · LW · GW

I thought about your objection longer and realized that there are circumstances where we can expect the model to adversarially optimize against us. I think I've less changed my mind, and more clarified when I think these tools are useful. In the process, I also discovered that Chris Olah and Evan Hubinger seem to agree: naively using transparency tools can break down in the deception case.

Comment by matthew-barnett on Chris Olah’s views on AGI safety · 2019-11-05T00:27:41.524Z · score: 3 (2 votes) · LW · GW
I expect it to be more useful / accurate than the capabilities / alignment worldview.

To note, I sort of interpreted the capabilities/alignment tradeoff as more related to things that enhance capabilities while providing essentially no greater understanding. Increasing compute is the primary example I can think of.

Comment by matthew-barnett on Will transparency help catch deception? Perhaps not · 2019-11-05T00:10:09.472Z · score: 9 (3 votes) · LW · GW
Rather, the idea is that by using your transparency tools + overseer to guide your training process, you can prevent your training process from ever entering the regime where your model is trying to trick you. This is especially important in the context of gradient hacking (as I mention in that post)

Indeed. I re-read the post and I noticed that I hadn't realized how much of your reasoning applied directly as a reply to my argument.

Put another way: once you're playing the game where I can hand you any model and then you have to figure out whether it's deceptive or not, you've already lost.

For the record, I think this is probably false (unless the agent is much smarter than you). The analogy I had with GANs was an example of how this isn't necessarily true. If you just allowed an overseer to inspect the model, and trained that overseer end-to-end to detect deception, it would be very good at doing so. The main point of this post was not to give general pessimism about detecting deception, but to offer the idea that having a "tool" (as I defined it) for the overseer to use provides little more than a crutch, decreasing its effectiveness.

Now it is probably also true that monitoring the model's path through model-space during training would greatly enhance the safety guarantee. But I wouldn't go as far as saying that the alternative is doomed.

Comment by matthew-barnett on Two explanations for variation in human abilities · 2019-11-04T01:31:02.759Z · score: 0 (2 votes) · LW · GW

I should point out that the comment I made about being optimistic was a minor part of my post -- a footnote, in fact. The reason why many people have trouble learning is that learning disorders are common and, more generally, people have numerous difficulties grasping material (lack of focus, direction, or motivation; getting distracted easily; aversion to the material; low curiosity; innumeracy; etc.).

I suppose you can cast them as having 'broken parts', but I don't think that helps. They are people.

Having broken parts is not a moral judgement. In the next sentences I showed how I have similar difficulties.

Regardless, the main point of my first explanation was less of saying that everyone has a similar ability to learn, and more of saying that previous analysis didn't take into account that differences in abilities can be explained by accounting for differences in training. This is a very important point to make. Measuring performance on Go, Chess and other games like that misses the point -- and for the very reason I outlined.

ETA: Maybe it helps if we restrict ourselves to the group of people who are college educated and have no mental difficulties focusing on hard intellectual work. That way we can see more clearly the claim that "There don't really seem to be humans who tower above us in terms of their ability to soak up new information and process it."

Comment by matthew-barnett on AI Alignment Open Thread October 2019 · 2019-11-03T21:59:18.201Z · score: 3 (2 votes) · LW · GW

I was one of those people who you asked the skeptical question to, and I feel like I have a better reply now than I did at the time. In particular, your objection was

To generalize my question, what if something goes wrong, we peek inside and find out that it's one of the 10-15% of times when the model doesn't agree with the known-algorithm which is used to generate the penalty term?

I agree this is an issue, but at worst it puts a bound on how well we can inspect the neural network's behavior. In other words, it means something like, "Our model of what this neural network is doing is wrong X% of the time." This sounds bad, but X can also be quite low. Perhaps more importantly though, we shouldn't expect by default that in the X% of times where our guess is bad, that the neural network is adversarially optimizing against us.

The errors that we make are potentially neutral errors, meaning that the AI could be doing something either bad or good in those intervals, but probably nothing purposely catastrophic. We can strengthen this condition by using adversarial training to purposely search for interpretations that would prioritize exposing catastrophic planning.

ETA: This is essentially why engineers don't need to employ quantum mechanics to argue that their designs are safe. The normal models that are less computationally demanding might be less accurate, but by default engineers don't think that their bridge is going to adversarially optimize for the (small) X% where predictions disagree. There is of course a lot of stuff to be said about when this assumption does not apply to AI designs.

Comment by matthew-barnett on Rohin Shah on reasons for AI optimism · 2019-11-01T18:35:21.461Z · score: 5 (3 votes) · LW · GW
However, as far as I can tell, there's no evidence provided for the second claim.

I don't know how much this will mean, but the eighth chapter of Superintelligence is titled, "Is the default outcome doom?" which is highly suggestive of a higher than 10% chance of catastrophe. Of course, that was 2014, so the field has moved on...

Comment by matthew-barnett on What's your big idea? · 2019-10-31T22:02:19.038Z · score: 1 (1 votes) · LW · GW

The world population is larger than it used to be, and far more capable people are able to go to college and grad school than before. I would assume that there are many von Neumanns running around, and in fact there are probably people running around who are even better.

Comment by matthew-barnett on What's your big idea? · 2019-10-31T03:41:02.667Z · score: 2 (2 votes) · LW · GW

I like this idea, but I'm still pretty negative about the entire idea of college as a job-training experience, and I'm worried that this proposal doesn't really address what I see as a key concern with that framework.

I agree with Bryan Caplan that the reason why people go to college is mainly to signal their abilities. However, it's an expensive signal -- one that could be better served by just getting a job and using the job to signal to future employers instead. Plus, then there would be fewer costs on the individual if they did that, and less 'exploitation' via lock-in (which is what this proposed system kind of does).

The reason why people don't just get a job out of high school is still unclear to me, but this is perhaps the best explanation: employers are just very skeptical of anyone without a college degree being high quality, which makes getting one nearly a requirement even for entry level roles. Hypothetically however, if you can pass the initial barrier and get 4 years worth of working experience right out of high school, that would be better than going to college (from a job-training and perhaps even signaling perspective).

Unfortunately, this proposal, while much better than the status quo, would perpetuate the notion that colleges are job-training grounds. In my opinion, this notion wastes nearly everyone's time. I think the right thing to do might be to combine this proposal with Caplan's by just eliminating subsidies to college.

Comment by matthew-barnett on Prediction markets for internet points? · 2019-10-28T00:00:02.159Z · score: 1 (1 votes) · LW · GW
($1k+ per question)

I think realistically, the expected value is much lower than $1k per question (on Predictit, for instance), unless you can beat the market by a very substantial margin.

Comment by matthew-barnett on An Increasingly Manipulative Newsfeed · 2019-10-27T20:49:27.458Z · score: 1 (1 votes) · LW · GW

For the Alignment Newsletter:


An early argument for specialized AI safety work is that misaligned systems will be incentivized to lie about their intentions while weak, so that they aren't modified. Then, when the misaligned AIs are safe from modification, they will become dangerous. Ben Goertzel found the argument unlikely, pointing out that weak systems won't be good at deception. This post asserts that weak systems can still be manipulative, and gives a concrete example. The argument is based on a machine learning system trained to maximize the number of articles that users label "unbiased" in their newsfeed. One way it can start being deceptive is by seeding users with a few very biased articles. Pursuing this strategy may cause users to label everything else unbiased, as it has altered their reference for evaluation. The system is therefore incentivized to be dishonest without necessarily being capable of pure deception.


While I appreciate and agree with the thesis of this post -- that machine learning models don't have to be extremely competent to be manipulative -- I would still prefer a different example to convince skeptical researchers. I suspect many people would reply that we could easily patch the issue without doing dedicated safety work. In particular, it is difficult to see how this strategy arises if we train the system via supervised learning rather than training it to maximize the number of articles users label unbiased (which requires RL).

Comment by matthew-barnett on Why so much variance in human intelligence? · 2019-10-26T05:21:39.653Z · score: 13 (3 votes) · LW · GW

I have my own (unoriginal) answer, outlined in this post.

In short, I think it's important to distinguish learning ability and competence. The reason why Magnus Carlsen towers above me in chess is because he has more practice, not because he's superintelligent.

However, I also think that even putting this distinction aside, we shouldn't be surprised by large variation. If we think of the brain like a machine, and cognitive deficits as broken parts in the machine, then it is easy to see how distributed cognitive deficits can lead to wide variation.

Comment by matthew-barnett on An Increasingly Manipulative Newsfeed · 2019-10-26T03:55:46.978Z · score: 1 (1 votes) · LW · GW
Mostly I would expect such a system to overfit on the training data, and perform no better than chance when tested.
Reinforcement learning, meanwhile, will indeed become manipulative (in my expectation).

I'm confused why reinforcement learning would be well suited for the task, if it doesn't work at all in the supervised learning case.

Comment by matthew-barnett on Two explanations for variation in human abilities · 2019-10-26T01:50:39.919Z · score: 3 (3 votes) · LW · GW

Thanks for the information. My understanding was just based on my own experience, which is probably biased. I assume that most people can read headlines and parse text. I find it hard to believe that only 20 to 30 percent of people can read, given that more than 30 percent of people are on social media, which requires reading things (and usually responding to them in a coherent way).

I'm not sure why you are so optimistic about people learning calculus.

My own experience is that learning calculus isn't that much more difficult than learning a lot of other skills, including literacy. We start learning how to read while young and learn calculus later, which makes us think reading is easier. However, I think hypothetically, we could push the number of people who learn calculus to similar levels as reading (which, as you note, might still be limited to a basic form for most people). :)

Comment by matthew-barnett on What are some unpopular (non-normative) opinions that you hold? · 2019-10-25T00:55:20.289Z · score: 2 (2 votes) · LW · GW

I haven't read the paper, you're right. I didn't mean my own comment as a counterargument.

Comment by matthew-barnett on The Best Textbooks on Every Subject · 2019-10-24T23:27:51.863Z · score: 1 (1 votes) · LW · GW
[The Deep Learning Book] takes a purely frequentist perspective

Are you sure? I haven't read too much of it (though I read some from time to time), but it seems solidly agnostic about the debate. What do you think the book lacks that would be found in an equivalent Bayesian textbook?

Comment by matthew-barnett on What are some unpopular (non-normative) opinions that you hold? · 2019-10-24T22:45:02.004Z · score: 1 (1 votes) · LW · GW
BTW, if you can refute it and get published there is a $100k prize waiting!

Just so you know, whenever I hear that there's prize money for refuting a conspiracy theory, I immediately lower my probability that the theory is true. I've encountered numerous such prizes from conspiracy theorists in the past, and the general pattern I have seen is that the prize is offered disingenuously, since the person offering it will never concede. I've (perhaps unconsciously) labeled anyone unaware of this pattern as either (1) purposely disingenuous, or (2) not very good at convincing people of true things. Both (1) and (2) are evidence of a failure in their reasoning (but obviously this argument isn't airtight).

Comment by matthew-barnett on Artificial general intelligence is here, and it's useless · 2019-10-23T22:30:39.126Z · score: 6 (4 votes) · LW · GW

One reason why artificial intelligence might be more useful than a human for some service is that artificial intelligence is software, and therefore you can copy-paste it for every service that we might want in an industry.

Recruiting and training humans takes time, whereas if you already have an ML model that performs well on a given task, you only need to acquire the relevant hardware to run the model. If hardware is cheap enough, I can see how using artificial intelligence could be much cheaper than spending money on {training + recruiting + wages} for a human. Automation in jobs such as audio transcription exemplifies this trend -- although I think the curve for automation is smooth, as the software services require continuously less supervision over time as they improve.

Comment by matthew-barnett on What are some unpopular (non-normative) opinions that you hold? · 2019-10-23T19:11:23.862Z · score: 8 (2 votes) · LW · GW

Normative beliefs are ambiguous unless we have a shared, concrete understanding of what constitutes "good", or "progress" in this case. I suspect my understanding of progress diverges from Stuart's to a large extent.

Stability might be less ambiguous, but I wish it were operationalized. I agree with Hanson that value talk is usually purposely ambiguous because it's not about expressing probability distributions, but rather about implying a stance and signaling an attitude.

Comment by matthew-barnett on What are some unpopular (non-normative) opinions that you hold? · 2019-10-23T16:32:21.945Z · score: 8 (3 votes) · LW · GW

I'm just confused because the post specifically said non-normative, and this is clearly normative.

Comment by matthew-barnett on What are some unpopular (non-normative) opinions that you hold? · 2019-10-23T16:28:06.421Z · score: 1 (1 votes) · LW · GW
Democracy - good or bad?

Downvoted because the question specified non-normative opinions.

Comment by matthew-barnett on Open & Welcome Thread - October 2019 · 2019-10-21T23:47:49.057Z · score: 3 (2 votes) · LW · GW

This post has a discussion of every major alignment organization, and summarizes their mission to some extent.

Comment by matthew-barnett on Relaxed adversarial training for inner alignment · 2019-10-19T21:35:46.731Z · score: 1 (1 votes) · LW · GW

I'm not Rohin, but I think there's a tendency to reply to things you disagree with rather than things you agree with. That would explain my emphasis anyway.

Comment by matthew-barnett on Relaxed adversarial training for inner alignment · 2019-10-18T02:14:40.907Z · score: 1 (1 votes) · LW · GW

For the Alignment Newsletter:


Previously, Paul Christiano proposed creating an adversary to search for inputs that would make a powerful model behave "unacceptably" and then penalizing the model accordingly. To make the adversary's job easier, Paul relaxed the problem so that it only needed to find a pseudo-input, which can be thought of as a predicate that constrains possible inputs. This post expands on Paul's proposal by first defining a formal unacceptability penalty and then analyzing a number of scenarios in light of this framework. The penalty relies on the idea of an amplified model inspecting an unamplified version of itself. For this procedure to work, amplified overseers must be able to correctly deduce whether potential inputs will yield unacceptable behavior in their unamplified selves, which seems plausible since the amplified model should know everything the unamplified version does. The post concludes by arguing that progress in model transparency is key to these acceptability guarantees. In particular, Evan emphasizes the need to decompose models into the parts involved in their internal optimization processes, such as their world models, optimization procedures, and objectives.


I agree that transparency is an important condition for the adversary, since it would be hard to search for catastrophe-inducing inputs without details of how the model operated. I'm less certain that this particular decomposition of machine learning models is necessary. More generally, I am excited to see how adversarial training can help with inner alignment.

Comment by matthew-barnett on Misconceptions about continuous takeoff · 2019-10-15T03:57:24.556Z · score: 2 (2 votes) · LW · GW
AlphaGo seems much closer to "one project leaps forward by a huge margin."

I don't have the data on hand, but my impression was that AlphaGo indeed represented a discontinuity in the domain of Go. It's difficult to say why this happened, but my best guess is that DeepMind invested a lot more money into solving Go than any competing actor at the time. Therefore, the discontinuity may have followed straightforwardly from a background discontinuity in attention paid to the task.

If this hypothesis is true, I don't find AlphaGo compelling as evidence for a discontinuity in AGI development, since such funding gaps are likely to be much smaller for economically useful systems.

Comment by matthew-barnett on Misconceptions about continuous takeoff · 2019-10-14T06:30:13.411Z · score: 1 (1 votes) · LW · GW
My intuition is that understanding human values is a hard problem, but taking over the world is a harder problem.

Especially because taking over the world requires you to be much better than other agents who want to stop you from taking over the world, which could very well include other AIs.

ETA: That said, upon reflection, there have been instances of people taking over large parts of the world without being superhuman. All world leaders qualify, and it isn't that unusual. However, what would be unusual is if someone took over the world while everyone else opposed it, and yet it still happened.

Comment by matthew-barnett on Misconceptions about continuous takeoff · 2019-10-12T21:54:30.160Z · score: 1 (1 votes) · LW · GW

I agree with this, but when I said deployment I meant deployment of a single system, not several.

Comment by matthew-barnett on Misconceptions about continuous takeoff · 2019-10-12T21:53:05.748Z · score: 1 (1 votes) · LW · GW

I'm confused about why (1) and (2) are separate scenarios then. Perhaps because in (2) there's a lot of different types of AIs?

Comment by matthew-barnett on AI Safety "Success Stories" · 2019-10-11T19:07:14.067Z · score: 1 (1 votes) · LW · GW

For the alignment newsletter:

Planned summary: It is difficult to measure the usefulness of various alignment approaches without clearly understanding what type of future they end up being useful for. This post collects "Success Stories" for AI -- disjunctive scenarios in which alignment approaches are leveraged to ensure a positive future. Whether these scenarios come to pass will depend critically on background assumptions, such as whether we can achieve global coordination, or solve the most ambitious safety issues. Mapping these success stories can help us prioritize research.

Planned opinion: This post does not exhaust the possible success stories, but it gets us a lot closer to being able to look at a particular approach and ask, "Where exactly does this help us?" My guess is that most research ends up being only minimally helpful for the long run, and so I consider inquiry like this to be very useful for cause prioritization.

Comment by matthew-barnett on Misconceptions about continuous takeoff · 2019-10-11T01:49:09.853Z · score: 1 (1 votes) · LW · GW

I'm finding it hard to see how we could get (1) without some discontinuity?

When I think about why (1) would be true, the argument that comes to mind is that single AI systems will be extremely expensive to deploy, which means that only a few very rich entities could own them. However, this would contradict the general trend of ML being hard to train and easy to deploy. Unlike, say, nukes, once you've trained your AI you can create a lot of copies and distribute them widely.

Comment by matthew-barnett on Misconceptions about continuous takeoff · 2019-10-11T01:39:13.249Z · score: 1 (1 votes) · LW · GW

It's worth noting that I wasn't using it as evidence "for" continuous takeoff. It was instead an example of something which experienced a continuous takeoff that nonetheless was quick relative to the lifespan of a human.

It's hard to argue that it wasn't continuous under my definition, since the papers got gradually and predictably better. Perhaps there was an initial discontinuity in 2014 when it first became a target? Regardless, I'm not arguing that this is a good model for AGI development.

Comment by matthew-barnett on Misconceptions about continuous takeoff · 2019-10-09T19:21:46.006Z · score: 2 (2 votes) · LW · GW
It might just join the ranks of "projects from before", and subtly try to alter future systems to be similarly defective, waiting for a future opportunity to strike.

Admittedly, I did not explain this point well enough. What I meant to say was that before we have the first successful defection, we'll see some failed defections. If the system could indefinitely hide its own private intentions to later defect, then I would already consider that to be a 'successful defection.'

Knowing about a failed defection, we'll learn from our mistake and patch that for future systems. To be clear, I'm definitely not endorsing this as a normative standard for safety.

I agree with the rest of your comment.

Comment by matthew-barnett on Who lacks the qualia of consciousness? · 2019-10-06T22:25:35.987Z · score: 2 (2 votes) · LW · GW

When I think about what I find morally valuable about consciousness, I tend to think about rich experiences which are negative or positive, according to the way that I rate them internally. An example of a negatively valued conscious experience is the sharp pain of being pricked by a pin. An example of a positively valued conscious experience is the sensation of warmth from sitting near a fire on a cold winter day, together with the way that my brain processes the situation and enjoys it.

These things appear to me subtly distinct from the feeling of inner awareness called 'consciousness' in this post.

Comment by matthew-barnett on Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · 2019-10-06T05:45:15.412Z · score: 1 (1 votes) · LW · GW

once CO2 gets high enough, it starts impacting human cognition.

Do you have a citation for this being a big deal? I'm really curious whether this is a major harm over reasonable timescales (such as 100 years), as I don't recall ever hearing about it in an EA analysis of climate change. That said, I haven't looked very hard.

Comment by matthew-barnett on Concrete experiments in inner alignment · 2019-10-04T05:58:29.037Z · score: 2 (2 votes) · LW · GW

Planned summary:

This post lays out several experiments that could clarify the inner alignment problem: the problem of how to get an ML model to be robustly aligned with the objective function it was trained with. One example experiment is giving an RL-trained agent direct access to its reward as part of its observation. During testing, we could try putting the model in a confusing situation by altering its observed reward so that it doesn't match the real one. The hope is that we could gain insight into when RL-trained agents internally represent 'goals' and how they relate to the environment, if they do at all. You'll have to read the post to see all the experiments.
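As a rough illustration of the observed-reward experiment above, here is a minimal, hypothetical sketch of an environment wrapper. The class name, the gym-style `reset`/`step` interface, and the `fake_reward_fn` hook are all my own assumptions, not details from the post:

```python
import numpy as np

class RewardInObsWrapper:
    """Hypothetical wrapper: appends the previous step's *observed* reward
    to the observation. At test time, fake_reward_fn can distort the reward
    the agent sees, while the true reward (used for evaluation) is unchanged.
    This lets us probe whether the agent pursues the observed signal or the
    real one."""

    def __init__(self, env, fake_reward_fn=None):
        self.env = env
        self.fake_reward_fn = fake_reward_fn  # e.g. lambda r: -r

    def reset(self):
        obs = self.env.reset()
        # No reward has been received yet, so show 0.
        return np.append(obs, 0.0)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Distort the reward shown to the agent, if a distortion is set.
        shown = reward if self.fake_reward_fn is None else self.fake_reward_fn(reward)
        return np.append(obs, shown), reward, done, info
```

Training would use the wrapper with `fake_reward_fn=None`; at test time one would swap in a distortion and check whether behavior tracks the shown reward or the true one.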

Planned opinion:

I'm currently convinced that doing empirical work right now will help us understand mesa optimization, and this was one of the posts that led me to that conclusion. I'm still a bit skeptical that current techniques are sufficient to demonstrate the type of powerful learned search algorithms which could characterize the worst outcomes for failures in inner alignment. Regardless, I think at this point classifying failure modes is quite beneficial, and conducting tests like the ones in this post will make that a lot easier.

Comment by matthew-barnett on Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · 2019-10-04T05:30:41.865Z · score: 14 (8 votes) · LW · GW

It seems to me that the intention is that solar radiation management is a solution that sounds good without actually being good. That is, it's an easy sell for fossil fuel corporations who have an interest in providing simple solutions to the problem rather than actually removing the root cause and thus solving the issue completely. I have little idea if this argument is actually true.

Comment by matthew-barnett on FB/Discord Style Reacts · 2019-10-03T20:37:02.746Z · score: 10 (3 votes) · LW · GW
If they leave an "unclear" react, I can't ignore that nearly as easily -- wait, which point was unclear? What are other people potentially missing that I meant to convey? Come back, anon!

Maybe there should be an option that allows you to highlight a part of the comment and react to that part in particular.

Comment by matthew-barnett on World State is the Wrong Level of Abstraction for Impact · 2019-10-02T03:51:42.669Z · score: 5 (3 votes) · LW · GW
So my feeling is that in order to actually implement an AI that does not cause bad kinds of high impact, we would need to make progress on value learning

Optimizing for a 'slightly off' utility function might be catastrophic, and therefore the margin for error for value learning could be narrow. However, it seems plausible that if your impact measurement used slightly incorrect utility functions to define the auxiliary set, this would not cause a similar error. Thus, it seems intuitive to me that you would need less progress on value learning than a full solution for impact measures to work.

From the AUP paper,

one of our key findings is that AUP tends to preserve the ability to optimize the correct reward function even when the correct reward function is not included in the auxiliary set.

Comment by matthew-barnett on A simple environment for showing mesa misalignment · 2019-09-28T21:22:36.315Z · score: 1 (1 votes) · LW · GW

Thanks. :)