The "Backchaining to Local Search" Technique in AI Alignment 2020-09-18T15:05:02.944Z · score: 20 (6 votes)
Universality Unwrapped 2020-08-21T18:53:25.876Z · score: 24 (8 votes)
Goal-Directedness: What Success Looks Like 2020-08-16T18:33:28.714Z · score: 9 (3 votes)
Mapping Out Alignment 2020-08-15T01:02:31.489Z · score: 42 (11 votes)
Will OpenAI's work unintentionally increase existential risks related to AI? 2020-08-11T18:16:56.414Z · score: 56 (25 votes)
Analyzing the Problem GPT-3 is Trying to Solve 2020-08-06T21:58:56.163Z · score: 16 (7 votes)
What are the most important papers/post/resources to read to understand more of GPT-3? 2020-08-02T20:53:30.913Z · score: 25 (12 votes)
What are you looking for in a Less Wrong post? 2020-08-01T18:00:04.738Z · score: 27 (13 votes)
Dealing with Curiosity-Stoppers 2020-07-30T22:05:02.668Z · score: 48 (16 votes)
adamShimi's Shortform 2020-07-22T19:19:27.622Z · score: 4 (1 votes)
The 8 Techniques to Tolerify the Dark World 2020-07-20T00:58:04.621Z · score: 2 (12 votes)
Locality of goals 2020-06-22T21:56:01.428Z · score: 15 (6 votes)
Goal-directedness is behavioral, not structural 2020-06-08T23:05:30.422Z · score: 7 (4 votes)
Focus: you are allowed to be bad at accomplishing your goals 2020-06-03T21:04:29.151Z · score: 20 (10 votes)
Lessons from Isaac: Pitfalls of Reason 2020-05-08T20:44:35.902Z · score: 10 (4 votes)
My Functor is Rich! 2020-03-18T18:58:39.002Z · score: 10 (5 votes)
Welcome to the Haskell Jungle 2020-03-18T18:58:18.083Z · score: 14 (8 votes)
Lessons from Isaac: Poor Little Robbie 2020-03-14T17:14:56.438Z · score: 1 (6 votes)
Where's the Turing Machine? A step towards Ontology Identification 2020-02-26T17:10:53.054Z · score: 18 (5 votes)
Goal-directed = Model-based RL? 2020-02-20T19:13:51.342Z · score: 21 (8 votes)


Comment by adamshimi on The Solomonoff Prior is Malign · 2020-10-22T11:46:07.380Z · score: 2 (2 votes) · LW · GW

Okay, it's probably subtler than that.

I think you're hinting at things like the expanding moral circle. And according to that, there's no reason that I should care more about people in my universe than people in other universes. I think this makes sense when asking whether I should care. But the analogy with "caring about people in a third-world country on the other side of the world" breaks down when we consider our means to influence these other universes. Being able to influence the Solomonoff prior seems like a very indirect way to alter another universe, about which I have very little information. That's different from buying malaria nets.

So even if you're altruistic, I doubt that "other universes" would be high on your priority list.

The best argument I can find for why you would want to influence the prior is if it is a way to influence the simulation of your own universe, à la gradient hacking.

Comment by adamshimi on AGI safety from first principles: Goals and Agency · 2020-10-22T11:35:53.687Z · score: 1 (1 votes) · LW · GW

Thanks for the answers!

We should categorise things as goal-directed agents if it scores highly on most of these criteria, not just if it scores perfectly on all of them. So I agree that you don't need one goal forever, but you do need it for more than a few minutes. And internal unification also means that the whole system is working towards this.

If coherence is about having the same goal for a "long enough" period of time, then it makes sense to me.

By "sensitive" I merely mean that differences in expected long-term or large-scale outcomes sometimes lead to differences in current choices.

So the thing that judges outcomes in the goal-directed agent is "not always privileging short-term outcomes"? Then I guess it's also a scale, because there's a big difference between a system that privileges long-term outcomes over short-term ones in a single case, and a system that focuses on long-term outcomes.

Yeah, I think there's still much more to be done to make this clearer. I guess my criticism of mesa-optimisers was that they talked about explicit representation of the objective function (whatever that means). Whereas I think my definition relies more on the values of choices being represented. Idk how much of an improvement this is.

I agree that the explicit representation of the objective is weird. But on the other hand, it's an explicit and obvious weirdness that either calls for clarification or changes. Whereas in your criteria, I feel that essentially the same idea is made implicit/less weird, without actually bringing a better solution. Your approach might be better in the long run, possibly because rephrasing the question in these terms lets us find a non-weird way to define this objective.

I just wanted to point out that in our current state of knowledge, I feel like there are drawbacks in "hiding" the weirdness like you do.

I don't really know what it means for something to be a utility function. I assume you could interpret it that way, but my definition of goals also includes deontological goals, which would make that interpretation harder. I like the "equivalence classes" thing more, but I'm not confident enough about the space of all possible internal concepts to claim that it's always a good fit.

One idea I had for defining goals is as a temporal logic property (for example in LTL) on states. That lets you express things like "I want to reach one of these states" or "I never want to reach this state"; the latter looks like a deontological property to me. Thinking some more about this led me to see two issues:

  • First, it doesn't let you encode preferences of one state over another. That might be solvable by adding a partial order with nice properties, like Stuart Armstrong's partial preferences.
  • Second, the system doesn't have access to the states of the world; it has access to its abstractions of those states. Here we go back to the equivalence classes idea. Maybe a way to cash out your internal abstractions and Paul's ascriptions of beliefs is through an equivalence relation on the states of the world, such that the goal of the system is defined on the equivalence classes for this relation.
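To make the two bullet points concrete, here is a minimal, hypothetical Python sketch (all names are mine, not from any existing library): finite-trace versions of the two LTL-style goals ("eventually reach one of these states" and "never reach this state"), plus a way to lift a goal through an abstraction map, so that it is defined on equivalence classes of world states rather than on the states themselves.

```python
# Goals as temporal properties over finite state traces. States are plain
# hashable values; a trace is a list of states. The "abstract" function
# stands in for the system's internal abstraction: it maps world states to
# equivalence classes. Everything here is a toy illustration.

def eventually(goal_states):
    """Reachability goal: 'I want to reach one of these states'."""
    def check(trace):
        return any(s in goal_states for s in trace)
    return check

def never(bad_states):
    """Safety / deontological goal: 'I never want to reach this state'."""
    def check(trace):
        return all(s not in bad_states for s in trace)
    return check

def on_abstraction(prop, abstract):
    """Lift a property on equivalence classes to a property on world states."""
    def check(trace):
        return prop([abstract(s) for s in trace])
    return check

# Hypothetical example: world states are integers, the abstraction keeps
# only their parity, and both goals are defined on the parity classes.
parity = lambda s: s % 2
goal = on_abstraction(eventually({0}), parity)   # "reach any even state"
taboo = on_abstraction(never({1}), parity)       # "never visit an odd state"

print(goal([1, 3, 4]))   # True: an even state (4) is eventually reached
print(taboo([2, 4, 6]))  # True: no odd state is ever visited
```

Note how the deontological goal ("never") and the reachability goal ("eventually") fit the same type, a predicate on traces, which is one reason the temporal-logic framing seems natural here.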

I expect that asking "what properties do these utility functions have" will be generally more misleading than asking "what properties do these goals have", because the former gives you an illusion of mathematical transparency. My tentative answer to the latter question is that, due to Moravec's paradox, they will have the properties of high-level human thought more than they have the properties of low-level human thought. But I'm still pretty confused about this.

Agreed that the first step should be the properties of goals. I just also believe that if you get some nice properties of goals, you might know what constraints to add to utility functions to make them more "goal-like".

Your last sentence seems to contradict what you wrote about Dennett. I understand it as you saying "goals would be like high-level human goals", while your criticism of Dennett was that the intentional stance doesn't necessarily work on NNs, because they don't have to have the same kinds of goals as us. Am I wrong about one of those opinions?

Comment by adamshimi on Things are allowed to be good and bad at the same time · 2020-10-20T20:30:17.568Z · score: 3 (2 votes) · LW · GW

The specific examples in your quote remind me of my idea of curiosity stoppers from fear of psychological pain and guilt. Maybe wanting things to be either good or bad (me being curious about them or them being horrible) is an underlying cause of this specific curiosity stopper.

Comment by adamshimi on The Solomonoff Prior is Malign · 2020-10-20T20:26:47.502Z · score: 3 (3 votes) · LW · GW

I like this post, which summarizes other posts I wanted to read for a long time.

Yet I'm still confused by a fairly basic point: why would the agents inside the prior care about our universe? Like, I have preferences, and I don't really care about other universes. Is it because we're running their universe, and thus they can influence their own universe through ours? Or is there another reason why they are incentivized to care about universes which are not causally related to theirs?

Comment by adamshimi on Problems Involving Abstraction? · 2020-10-20T19:07:39.945Z · score: 5 (3 votes) · LW · GW

One we already talked about together is the problem of defining the locality of goals. From an abstraction point of view, local goals (goals about inputs) and non-local goals (goals about properties of the world) are both abstractions: they throw away information. But with completely different results!

Comment by adamshimi on AGI safety from first principles: Goals and Agency · 2020-10-20T14:39:22.679Z · score: 2 (2 votes) · LW · GW

1. AI systems which pursue goals are also known as mesa-optimisers, as coined in Hubinger et al's paper Risks from Learned Optimisation in Advanced Machine Learning Systems.

Nitpicky, but I think it would be nice to write explicitly that here the AI systems are learned, because the standard definition of mesa-optimizers is optimizers that are themselves produced by optimization. Also, I think it would be better to explicitly say that mesa-optimizers are optimizers. Given your criteria of goal-directed agency, that's implicit, but at this point the criteria have not yet been stated.

Meanwhile, Dennett argues that taking the intentional stance towards systems can be useful for making predictions about them - but this only works given prior knowledge about what goals they’re most likely to have. Predicting the behaviour of a trillion-parameter neural network is very different from applying the intentional stance to existing artifacts. And while we do have an intuitive understanding of complex human goals and how they translate to behaviour, the extent to which it’s reasonable to extend those beliefs about goal-directed cognition to artificial intelligences is the very question we need a theory of agency to answer. So while Dennett’s framework provides some valuable insights - in particular, that assigning agency to a system is a modelling choice which only applies at certain levels of abstraction - I think it fails to reduce agency to simpler and more tractable concepts.

I agree with you that the intentional stance requires some assumption about the goals of the system you're applying it to. But I disagree that this makes it very hard to apply the intentional stance to, say, neural networks. That's because I think that goals have some special structure (being compressed, for example), which means that there aren't that many different goals. So the intentional stance does reduce goal-directedness to simpler concepts like goals, and gives additional intuitions about them.

That being said, I also have issues with the intentional stance. Most problematic is the fact that it doesn’t give you a way to compute the goal-directedness of a system.

About your criteria, I have a couple of questions/observations.

  • Combining 1, 2 and 3 seems to yield an optimizer in disguise: something that plans according to some utility/objective, in an embedded way. The change from mesa-optimizers (or simply optimizers) is that you treat the ingredients of optimization separately, but it still has the same problem of needing an objective it can use (for point 3).
  • About 4, I think I see what you're aiming at (having long-term goals), but I'm confused by the way it is written. It depends on the objective/utility from 3, but it's not clear what "sensitive" means for an objective. Do you mean that the objective values more long-term plans? That it doesn't discount with the length of plans? Or instead something more like the expanding moral circle, where the AI has an objective that treats equally near-future and far-future, and near and far things?
  • Also about 5, coherent goals (in the sense of goals that don't change) are a very dangerous case, but I'm not convinced that goal-directed agents must have one goal forever.
  • I agree completely about 6. It’s very close to the distinction between habitual behavior and goal-directed behavior in psychology.

On the examples of lacking 2, I feel like the ones you're giving could still be goal-directed. For example, limiting the actions or context doesn't necessarily ensure a lack of goal-directedness; it's more about making a deceptive plan harder to pull off.

Your definition of goals looks like a more constrained utility function, defined on equivalence classes of states/outcomes as abstracted by the agent's internal concepts. Is that correct? If so, do you have an idea of what specific properties such utility functions could have as a consequence? I'm interested in that, because I would really like a way to define a goal as a behavioral objective satisfying some structural constraints.

Comment by adamshimi on Knowledge, manipulation, and free will · 2020-10-19T07:23:01.183Z · score: 1 (1 votes) · LW · GW

When do you plan on posting this? I'm interested in reading it.

Comment by adamshimi on Knowledge, manipulation, and free will · 2020-10-18T11:48:47.322Z · score: 3 (2 votes) · LW · GW

I was slightly confused by the beginning of the post, but by the end I was on board with the questions asked and the problems posed.

On impact measures, there are already some discussions in this comment thread, but I'll put some more thoughts here. My first reaction to reading the last section was to think of attainable utility: non-manipulation as preservation of attainable utility. Sitting on this idea, I'm not sure it works as a non-manipulation condition, since it lets the AI manipulate us into having what we want. There should be no risk of it changing our utility, since that would be a big change in attainable utility; but still, we might not want to be manipulated even for our own good (like some people's reactions to nudges).

Maybe there can be an alternative version of attainable utility, something like "attainable choice", which ensures that other agents (us included) are still able to make choices. Or to put it in terms of free will: that these agents' choices are still primarily determined by internal causes (so by them), instead of primarily determined by external causes like the AI.

We can even imagine integrating attainable utility and attainable choice together (by weighting them, for example), so that manipulation is avoided in a lot of cases, but the AI still manipulates Petrov into not reporting if not reporting saves the world (because it maintains attainable utility). That would solve the issue mentioned in this comment thread.
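To make the weighting idea slightly more concrete, here is a very schematic, hypothetical Python sketch. All names and measures here are assumptions of mine, not the actual attainable-utility formalism: "attainable utility" is summarised as the best value the other agent can still reach, "attainable choice" as how many distinct options remain open to them, and the penalty is a weighted sum of the losses in both.

```python
# Toy sketch of combining "attainable utility" and "attainable choice"
# penalties. Everything is deliberately simplistic: real attainable-utility
# measures are defined over auxiliary reward functions, not a single list.

def attainable_utility(values_reachable):
    """Best value the other agent can still achieve."""
    return max(values_reachable)

def attainable_choice(options_reachable):
    """How many distinct options remain open to the other agent."""
    return len(set(options_reachable))

def penalty(before_vals, after_vals, before_opts, after_opts, weight=0.5):
    """Penalise actions that reduce either measure for another agent."""
    au_loss = max(0.0, attainable_utility(before_vals) - attainable_utility(after_vals))
    ac_loss = max(0.0, attainable_choice(before_opts) - attainable_choice(after_opts))
    return weight * au_loss + (1 - weight) * ac_loss

# Manipulating someone into their preferred outcome: their attainable
# utility is preserved (best value 3.0 before and after), but their live
# options shrink from 3 to 1, so the choice term still penalises it.
print(penalty([1.0, 3.0], [3.0], ["a", "b", "c"], ["a"]))  # 1.0
```

The point of the toy example is that a pure attainable-utility penalty would give 0 here, while the choice term catches the "manipulated for our own good" case discussed above.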

Comment by adamshimi on Toy Problem: Detective Story Alignment · 2020-10-18T11:29:39.668Z · score: 3 (2 votes) · LW · GW

That looks like a fun problem.

It also makes me think of getting from maps (in your sense of abstraction) back to the original variable. The simple model has a basic abstraction, a map built by throwing away a lot of information, and you want to build a better map. So you're asking how to decide what to add to the map to improve it, without having access to the initial variable. Or put differently: you have a bunch of simple abstractions (from the simple models) and a bunch of more complex abstractions (from the complex model), and given a simple abstraction, you want to find the best complex abstraction that "improves" the simple abstraction. Is that correct?

For a solution, here's a possible idea if the complex model is also a cluster model and both models are trained on the same data: look for the learned cluster in the complex model with the biggest intersection with the detective-story cluster in the simple model. Obvious difficulties are "how do you compute that intersection?", and for more general complex models (like GPT-N), "how do you even find the clusters?" Still, I would say that this idea satisfies your requirement of improving the concept if the initial one is good enough (most of the "true" cluster lies in the simple model's cluster/most of the simple model's cluster lies in the "true" cluster).
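As a rough illustration of the intersection idea (assuming both models are cluster models trained on the same corpus, so every document gets a label from each model; all names and data below are toy assumptions of mine):

```python
# Find the complex-model cluster that best covers the documents the simple
# model put in a target cluster, scored by Jaccard overlap of document sets.

def best_matching_cluster(simple_labels, complex_labels, target):
    """simple_labels[i] and complex_labels[i] are the two models' cluster
    labels for document i; return the complex cluster closest to `target`."""
    target_docs = {i for i, c in enumerate(simple_labels) if c == target}
    scores = {}
    for cluster in set(complex_labels):
        docs = {i for i, c in enumerate(complex_labels) if c == cluster}
        scores[cluster] = len(docs & target_docs) / len(docs | target_docs)
    return max(scores, key=scores.get)

# Toy corpus of 6 documents: the simple model lumps docs 0-2 into cluster
# "A" (say, detective stories), while the complex model splits more finely.
simple = ["A", "A", "A", "B", "B", "B"]
fine = ["x", "x", "y", "y", "z", "z"]
print(best_matching_cluster(simple, fine, "A"))  # → "x"
```

This only makes the "how do you compute the intersection?" difficulty explicit: here it is trivial because both models label the same documents, which is exactly what fails for more general models like GPT-N.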

Comment by adamshimi on Babble & Prune Thoughts · 2020-10-16T11:53:47.357Z · score: 12 (3 votes) · LW · GW

Nice post!

As someone slightly annoyed by epistemic status, I felt that your argument in favor of them was pretty convincing.

For the discussion of replacing guilt and standards, the "Confidence all the way up" post also seems relevant.

My main point here is that improving babble doesn't mean reducing prune. Alkjash sometimes speaks as if it's just a matter of opening the floodgates. Sometimes people do need to just relax, turn off their prune, and open the floodgates. But if you try to do this in general, you might have initial success but then experience backlash, since you may have failed to address the underlying reasons why you had closed the gates to begin with.

I think that depends on your personality and where you're at in your life. By default, I'm very good at babble (intuition, you might say), but my prune was initially weak. Every maths teacher I had before the age of 20 basically told me "you have good intuition, but you need to stop following the first idea that comes to your mind". So I needed to improve my prune. But I know others who have trouble babbling. Maybe that was the case for Alkjash.

Last thing: every time I read about babble and prune, I think about this quote from Goro Shimura, on his friend the mathematician Yutaka Taniyama:

Taniyama was not a very careful person as a mathematician. He made a lot of mistakes, but he made mistakes in a good direction, and so eventually, he got right answers, and I tried to imitate him, but I found out that it is very difficult to make good mistakes.

Comment by adamshimi on Conditions for Mesa-Optimization · 2020-10-15T15:26:55.770Z · score: 5 (3 votes) · LW · GW

I think there is a typo in your formula, because the number of bits you get is negative. Going back to Yudkowsky's post, I think the correct formula (using your approximations of sizes) is , or  to be closer to the entropy notation.

Comment by adamshimi on AGI safety from first principles: Introduction · 2020-10-03T12:49:34.798Z · score: 6 (4 votes) · LW · GW

I'm excited about this sequence!

Just a question: what audience do you have in mind? Is it a sequence for newcomers to AI Safety, or more a reframing of AI Safety arguments for researchers?

Comment by adamshimi on AGI safety from first principles: Superintelligence · 2020-10-03T12:48:04.519Z · score: 1 (1 votes) · LW · GW

Nice post!

In order to understand superintelligence, we should first characterise what we mean by intelligence. Legg’s well-known definition identifies intelligence as the ability to do well on a broad range of cognitive tasks.[1] However, this combines two attributes which I want to keep separate for the purposes of this report: the ability to understand how to perform a task, and the motivation to actually apply that ability to do well at the task. So I’ll define intelligence as the former, which is more in line with common usage, and discuss the latter in the next section.

I like this split into two components, mostly because it fits with my intuition that goal-directedness (what I assume the second component is) is separated from competence in principle. Looking only at the behavior, there's probably a minimal level of competence necessary to detect goal-directedness. But if I remember correctly, you defend a definition of goal-directedness that also depends on the internal structure, so that might not be an issue here.

Because of the ease and usefulness of duplicating an AGI, I think that collective AGIs should be our default expectation for how superintelligence will be deployed.

I am okay with assuming collective AGIs instead of single AGIs, but what does it change in terms of technical AI Safety?

Even a superintelligent AGI would have a hard time significantly improving its cognition by modifying its neural weights directly; it seems analogous to making a human more intelligent via brain surgery (albeit with much more precise tools than we have today)

Although I agree with your general point that self-modification will probably come from self-retraining, I don't think I agree with the quoted paragraph. The main difference I see is that an AI built from, let's say, neural networks has access to exactly every neuron making it up. It might not be able to study all of them at once, but that's still a big difference from measurements in a functioning brain, which are far less precise AFAIK. I think this entails that before AGI, ML researchers will go further in understanding how NNs work than neuroscientists will for brains, and after AGI arrives the AI can take the lead.

So it’s probably more accurate to think about self-modification as the process of an AGI modifying its high-level architecture or training regime, then putting itself through significantly more training. This is very similar to how we create new AIs today, except with humans playing a much smaller role.

It also feels very similar to how humans systematically improve at something: make a study or practice plan, and then train according to it.

Comment by adamshimi on Draft report on AI timelines · 2020-09-19T00:20:03.832Z · score: 2 (2 votes) · LW · GW
I'd prefer if people didn't share it widely in a low-bandwidth way (e.g., just posting key graphics on Facebook or Twitter) since the conclusions don't reflect Open Phil's "institutional view" yet, and there may well be some errors in the report.

Isn't that in contradiction with posting it to LW (by crossposting)? I mean, it's freely accessible to everyone, so anyone who wants to share it can find it.

Comment by adamshimi on Matt Botvinick on the spontaneous emergence of learning algorithms · 2020-08-24T18:12:37.637Z · score: 3 (2 votes) · LW · GW
In practice I find that anything I say tends to lose its nuance as it spreads, so I've moved towards saying fewer things that require nuance. If I said "X might be a good resource to learn from but I don't really know", I would only be a little surprised to hear a complaint in the future of the form "I deeply read X for two months because Rohin recommended it, but I still can't understand this deep RL paper".

Hmm, I did not think about that. It makes more sense to me now why you don't want to point people towards specific things. I still believe the result will be net positive if the right caveats are in place (then it's the other's fault for misinterpreting your comment), but that's indeed assuming that the resource/concept is good/important and you're confident in that.

Comment by adamshimi on Matt Botvinick on the spontaneous emergence of learning algorithms · 2020-08-24T12:31:42.432Z · score: 3 (2 votes) · LW · GW

The solution is clear: someone needs to create an Evan bot that will comment on every post of the AF related to mesa-optimization, by providing the right pointers to the paper.

Comment by adamshimi on Matt Botvinick on the spontaneous emergence of learning algorithms · 2020-08-24T12:29:42.280Z · score: 3 (2 votes) · LW · GW

Thanks for the in-depth answer!

I do share your opinion on the Sutton and Barto, which is the only book I've read from your list (except a bit of the Russell and Norvig, but not the RL chapter). Notably, I took a lot of time to study action-value methods, only to realise later that a lot of recent work focuses instead on policy-gradient methods (even if actor-critics do use action values).

From your answer and Rohin's, I gather that we lack a good resource on deep RL, at least of the kind useful for AI Safety researchers. It makes me even more curious about the kind of knowledge that would be covered in such a resource.

Comment by adamshimi on Matt Botvinick on the spontaneous emergence of learning algorithms · 2020-08-23T20:49:51.363Z · score: 3 (2 votes) · LW · GW
Here's an obvious next step for people: google for resources on RL, ask others for recommendations on RL, try out some of the resources and see which one works best for you, and then choose one resource and dive deep into it, potentially repeat until you understand new RL papers by reading.

Agreed. Which is exactly why I asked you for recommendations. I don't think you're the only one someone interested in RL should ask (I already asked other people, and knew some resources before all this), but as one of the (apparently few) members of the AF with the relevant skills in RL, it seemed that you might offer good advice on the topic.

About self-learning, I'm pretty sure people around here are good on this count. But knowing how to self-learn doesn't mean knowing what to self-learn. Hence the pointers.

I also don't buy that pointing out a problem is only effective if you have a concrete solution in mind. MIRI argues that it is a problem that we don't know how to align powerful AI systems, but doesn't seem to have any concrete solutions. Do you think this disqualifies MIRI from talking about AI risk and asking people to work on solving it?

No, I don't think you should only point to a problem with a concrete solution in hand. But solving a research problem (what MIRI's case is about) is not the same as learning a well-established field of computer science (what this discussion is about). In the latter case, you ask people to learn things that already exist, not to invent them. And I do believe that showing some concrete things that might be relevant (as I repeated in each comment, not an exhaustive list) would make the injunction more effective.

That being said, it's perfectly okay if you don't want to propose anything. I'm just confused because it seems low effort for you, net positive, and the kind of "ask people for recommendation" that you preach in the previous comment. Maybe we disagree on one of these points?

Comment by adamshimi on Matt Botvinick on the spontaneous emergence of learning algorithms · 2020-08-22T20:46:27.572Z · score: 3 (2 votes) · LW · GW

The handyman might not give basic advice, but if he didn't have any advice, I would assume that he doesn't want to help.

I'm really confused by your answers. You have a long comment criticizing the lack of basic RL knowledge of the AF community, and when I ask you for pointers, you say that you don't want to give any, and that people should just learn the background knowledge. So should every member of the AF stop what they're doing right now to spend 5 years doing a PhD in RL before being able to post here?

If the goal of your comment was to push people to learn things you think they should know, pointing towards some stuff (not an exhaustive list) is the bare minimum for that to be effective. If you don't, I can't see many people investing the time to learn enough RL so that by osmosis they can understand a point you're making.

Comment by adamshimi on Matt Botvinick on the spontaneous emergence of learning algorithms · 2020-08-22T11:30:02.913Z · score: 3 (2 votes) · LW · GW

If you don't have a resource, then do you have a list of pointers to what people should learn? For example the policy gradient theorem and the REINFORCE trick. It will probably not be exhaustive, I'm just trying to make your call to learn more RL theory more actionable to people here.

Comment by adamshimi on Will OpenAI's work unintentionally increase existential risks related to AI? · 2020-08-21T16:53:14.432Z · score: 13 (4 votes) · LW · GW

Well, if we take this comment by gwern at face value, it clearly seems that no one with the actual resources has any interest in doing it for now. Based on these premises, scaling towards incredibly larger models would probably not have happened for years.

So I do think that if you believe this is wrong, you should be able to show where gwern's comment is wrong.

Comment by adamshimi on Search versus design · 2020-08-20T22:13:48.840Z · score: 3 (2 votes) · LW · GW

If there was a vote for the best comment thread of 2020, that would probably be it for me.

Comment by adamshimi on Matt Botvinick on the spontaneous emergence of learning algorithms · 2020-08-20T21:02:47.844Z · score: 7 (4 votes) · LW · GW

What would be a good resource to level up on RL theory? Is the Sutton and Barto good enough, or do you have something else in mind?

Comment by adamshimi on Will OpenAI's work unintentionally increase existential risks related to AI? · 2020-08-17T20:03:12.745Z · score: 9 (2 votes) · LW · GW

But what if they reach AGI during their speed up? The smoothing at a later time assumes that we'll end up with diminishing returns before AGI, which is not what happens for the moment.

Comment by adamshimi on Tagging Open Call / Discussion Thread · 2020-08-16T16:49:36.629Z · score: 3 (2 votes) · LW · GW

Done. It's a stub, but I tagged all posts from the Abstraction sequence.

Comment by adamshimi on Tagging Open Call / Discussion Thread · 2020-08-16T16:23:46.189Z · score: 1 (1 votes) · LW · GW

Just realized when I tried to tag this post that there isn't an Abstraction tag. Should there be one, or is there an equivalent tag?

Comment by adamshimi on Alignment By Default · 2020-08-16T16:12:28.377Z · score: 3 (2 votes) · LW · GW

Great post!

That might have been discussed in the comments, but my gut reaction to the tree example was not "it's not really understanding trees" but "it's understanding trees visually". That is, I think the examples point to trees being a natural abstraction with respect to images made of pixels. In that sense, dogs and cats and other distinct visual objects might fit your proposal of natural abstractions. Yet this doesn't entail that trees are a natural abstraction when given the positions of atoms, or sounds (to be more abstract). I thus think that natural abstractions should be defined with respect to the sort of data that is used.

For human values, I might accept that they are a natural abstraction, but I don't know for which kind of data. Is audiovisual data (as in YouTube videos) enough? Do we also need textual data? Neuroimagery? I don't know, and that makes me slightly more pessimistic about an unsupervised model learning human values by default.

Comment by adamshimi on Alignment By Default · 2020-08-12T20:17:46.530Z · score: 3 (2 votes) · LW · GW

Isn't remaining aligned an example of robust delegation? If so, there have been both discussions and technical work on this problem before.

Comment by adamshimi on Will OpenAI's work unintentionally increase existential risks related to AI? · 2020-08-12T18:45:43.311Z · score: 3 (2 votes) · LW · GW

Hmm, my perspective is that in the example you describe, OpenAI isn't intentionally increasing the risks, in that they think it improves things overall. My line for "intentionally increasing xrisks" would be literally deciding to act while thinking/knowing that your actions are making things worse in general for xrisks, which doesn't sound like your example.

Comment by adamshimi on Will OpenAI's work unintentionally increase existential risks related to AI? · 2020-08-12T18:40:51.677Z · score: 5 (2 votes) · LW · GW

Do you think you (or someone else) could summarize this discussion here? I have to admit that the ideas being spread out between multiple posts doesn't help.

Comment by adamshimi on Will OpenAI's work unintentionally increase existential risks related to AI? · 2020-08-12T18:39:50.173Z · score: 6 (4 votes) · LW · GW

Thanks for your answer! Trying to make your examples of what might change your opinion substantially more concrete, I got these:

  • Does senior decision-making at OpenAI always consider safety issues before greenlighting new capability research?
  • Do senior researchers at OpenAI believe that their current research directly leads to AGI in the short term?
  • Would the Scaling Hypothesis (and thus GPT-N) have been vindicated as soon in a world without OpenAI?

Do you agree with these? Do you have other ideas of concrete questions?

Comment by adamshimi on Will OpenAI's work unintentionally increase existential risks related to AI? · 2020-08-12T18:18:22.997Z · score: 4 (3 votes) · LW · GW

So if I understand your main point, you argue that OpenAI LP incentivized new investments without endangering safety, thanks to the capped returns. And that this tradeoff looks like one of the best possible, compared to becoming a for-profit or getting bought by a big for-profit company. Is that right?

I also think for most of the things I'm concerned about, psychological pressure to think the thing isn't dangerous is more important; like, I don't think we're in the cigarette case where it's mostly other people who get cancer while the company profits; I think we're in the case where either the bomb ignites the atmosphere or it doesn't, and even in wartime the evidence was that people would abandon plans that posed a serious chance of destroying humanity.

I agree with you that we're in the second case, but that doesn't necessarily mean that there's a fire alarm. And economic incentives might push you to go slightly further, to a point where it looks like everything is still okay, but we reach transformative AI in a terrible way. [I don't think this is actually the case for OpenAI right now, just answering your point.]

Note also that economic incentives quite possibly push away from AGI towards providing narrow services (see Drexler's various arguments that AGI isn't economically useful, and so people won't make it by default). If you are more worried about companies that want to build AGIs and then ask it what to do than you are about companies that want to build AIs to accomplish specific tasks, increased short-term profit motive makes OpenAI more likely to move in the second direction.

Good point, I need to think more about that. A counterargument that springs to mind is that AGI research might push forward other kinds of AI, and thus bring transformative AI sooner even if it isn't an AGI.

Comment by adamshimi on Will OpenAI's work unintentionally increase existential risks related to AI? · 2020-08-12T18:04:46.726Z · score: 1 (1 votes) · LW · GW

Just so you know, I got the reference. ;)

Comment by adamshimi on Will OpenAI's work unintentionally increase existential risks related to AI? · 2020-08-12T18:04:20.602Z · score: 3 (2 votes) · LW · GW

Done! I used your first proposal, as it is more in line with my original question.

Comment by adamshimi on Will OpenAI's work unintentionally increase existential risks related to AI? · 2020-08-12T11:38:07.836Z · score: 4 (3 votes) · LW · GW

Thanks a lot for this great answer!

First, I should have written it, but my baseline (or my counterfactual) is a world where OpenAI doesn't exist but the people working there still exist. This might be an improvement if you think that pushing the scaling hypothesis is dangerous and that most of the safety team would find money to keep working, or an issue if you think someone else, probably less aligned, would have pushed the scaling hypothesis, and that the structure given by OpenAI to its safety team is really special and important.

As for your obstacles, I agree that they pose problems. It's the reason why I don't expect a full answer to this question. On the other hand, as you show yourself with the end of your post, I still believe we can have a fruitful discussion and debate on some of the issues. This might result in a different stance toward OpenAI, or arguments for defending it, or something completely different. So I don't think there is nothing to be gained by having this discussion.

On resources, imagine that there's Dr. Light, whose research interests point in a positive direction, and Dr. Wily, whose research interests point in a negative direction, and the more money you give to Dr. Light the better things get, and the more money you give to Dr. Wily, the worse things get. [But actually what we care about is counterfactuals; if you don't give Dr. Wily access to any of your compute, he might go elsewhere and get similar amounts of compute, or possibly even more.]

This aims at one criticism of OpenAI I often see: the amount of resources they give to capability research. Your other arguments (particularly osmosis) might influence this, but there's an intuitive reason why you might want to only give resources to the Dr. Lights out there.

On the other hand, your counterfactual world hints that maybe redirecting Dr. Wily, or putting him in an environment where the issues of safety are mentioned a lot, might help steer his research in a positive direction.

On direction-shifting, imagine someone has a good idea for how to make machine learning better, and they don't really care what the underlying problem is. You might be able to dramatically change their impact by pointing them at cancer-detection instead of missile guidance, for example. Similarly, they might have a default preference for releasing models, but not actually care much if management says the release should be delayed.

Here too, I can see this point cutting both ways: it might mean that OpenAI has a positive impact or that it has a negative one. On the positive side, the constraints on released models, and the fact of even having a safety team and discussing safety, might push new researchers to go into safety or to consider more safety-related issues in their work. But on the negative side, GPT-3 (as an example) is really cool. If you're a young student, you might be convinced by it to go work on AI capabilities, without much thought about safety.

On osmosis, imagine there are lots of machine learning researchers who are mostly focused on technical problems, and mostly get their 'political' opinions for social reasons instead of philosophical reasons. Then the main determinant of whether they think that, say, the benefits of AI should be dispersed or concentrated might be whether they hang out at lunch with people who think the former or the latter.

This is probably the most clearly positive point for OpenAI. Still, I'm curious how much safety plays a role in the culture of OpenAI. For example, are all researchers and engineers made aware of safety issues? If that's the case, then the culture would seem to lessen the risks significantly.

Comment by adamshimi on Will OpenAI's work unintentionally increase existential risks related to AI? · 2020-08-12T11:22:06.395Z · score: 6 (4 votes) · LW · GW

That's an interesting point. Why do you think that the new organizational transition is not compromising safety? (I have no settled opinion on this, but it seems that adding economic incentives is dangerous by default.)

Comment by adamshimi on Will OpenAI's work unintentionally increase existential risks related to AI? · 2020-08-12T11:18:17.817Z · score: 7 (4 votes) · LW · GW

I see what you mean. Although my question is definitely pointed at OpenAI, I don't want to accuse them of anything. One thing I wanted to write in the question but that I forgot was that the question asks about the consequences of OpenAI's work, not the intentions. So there might be negative consequences that were not intentional (or no negative consequences of course).

Is "Are the consequences of OpenAI's work positive or negative for xrisks?" better?

Comment by adamshimi on Will OpenAI's work unintentionally increase existential risks related to AI? · 2020-08-11T21:23:45.660Z · score: 10 (6 votes) · LW · GW

Thanks for explaining your downvote! I agree that the question is targeted. I tried to also give arguments against this idea of OpenAI increasing xrisks, but it probably still reads as biased.

That being said, I disagree about not targeting OpenAI. Everything that I've seen discussed by friends is centered completely on OpenAI. I think it would be great to have an answer showing that OpenAI is only the most visible group acting that way, and that others follow the same template. It's still true that the question is raised way more about OpenAI than about any other research group.

Comment by adamshimi on Dealing with Curiosity-Stoppers · 2020-08-10T15:06:34.624Z · score: 1 (1 votes) · LW · GW

Hum, I would say it depends on why you want to learn it. If it's professional, then you might be right in not following through if it is not that useful. If it's for your own pleasure, then I must admit I rarely feel that specific curiosity-stopper. I tend to be pretty sure that I can do things; my issues are much more about following through.

Comment by adamshimi on What are the most important papers/post/resources to read to understand more of GPT-3? · 2020-08-10T15:03:22.134Z · score: 1 (1 votes) · LW · GW

Thanks! I'll try to read that.

Comment by adamshimi on Open & Welcome Thread - August 2020 · 2020-08-07T20:25:37.211Z · score: 3 (2 votes) · LW · GW

I think (although I cannot be 100% sure) that the vote count that appears for a post on the Alignment Forum is the vote count of its Less Wrong version. The two vote counts are the same for the last 4 posts on the Alignment Forum, which seems weird. Is it a feature I was not aware of?

Comment by adamshimi on Open & Welcome Thread - August 2020 · 2020-08-06T20:55:05.722Z · score: 1 (1 votes) · LW · GW

I didn't know that you were working on a new editor! In that case, it makes sense to wait.

Comment by adamshimi on Open & Welcome Thread - August 2020 · 2020-08-06T19:33:56.522Z · score: 1 (1 votes) · LW · GW

Would it be possible to have a page with all editor shortcuts and commands (maybe a cheatsheet) easily accessible? It's a bit annoying to have to look up either this post or the right part of the FAQ to find out how to do something in the editor.

Comment by adamshimi on The 8 Techniques to Tolerify the Dark World · 2020-08-03T18:40:18.882Z · score: 1 (1 votes) · LW · GW

Yes, I was more thinking about doing them by default than continuously thinking about them. If you actually do the latter, they might indeed stop working.

Comment by adamshimi on What are the most important papers/post/resources to read to understand more of GPT-3? · 2020-08-03T18:39:01.546Z · score: 3 (2 votes) · LW · GW

Thanks for the answer! I knew about the "transformer explained" post, but I was not aware of its author's position on GPT-3.

Comment by adamshimi on Tagging Open Call / Discussion Thread · 2020-08-03T18:37:15.685Z · score: 8 (2 votes) · LW · GW

Karma is nice. Maybe simply an appreciation post at some point, which could still not name people. Just let them know that they are appreciated.

I don't know if that's possible, but another option might be some sort of "rank" or "badge" for top taggers. That being said, one might ask why have ranks only for this specific case, and not in general.

Comment by adamshimi on What are you looking for in a Less Wrong post? · 2020-08-03T18:34:55.899Z · score: 1 (1 votes) · LW · GW

Thanks for the answer! I didn't think of it that way, but I actually agree that I prefer when the post crystallizes both sides of the disagreement, for example in a double crux.

Comment by adamshimi on What are you looking for in a Less Wrong post? · 2020-08-03T18:32:39.220Z · score: 1 (1 votes) · LW · GW

I didn't know that one. It's great!

Comment by adamshimi on What Failure Looks Like: Distilling the Discussion · 2020-08-02T20:38:07.428Z · score: 2 (2 votes) · LW · GW

Thanks for this post! I think the initiative is great, and I'm glad to be able to read a summary of the discussion.

Two points, the first more serious and important than the second:

  • Concerning your summary, I think that having only the Eleven Paragraph Summary (and maybe the one-paragraph one, if you really want a short version) is good enough. Notably, I feel like you end up throwing out too many important details in the three-paragraph summary. And nine paragraphs is short enough that anyone can read it.
  • Imagine that I want to respond to a specific point that was discussed. Should I do that here or in the comments of the original post? The first option might make my comment easier to see, but it will split the discussion. (Also, it might cause an infinite recursion of distilling the discussions of the distillation of the discussion of the distillation of ... the discussion.)

Comment by adamshimi on What are you looking for in a Less Wrong post? · 2020-08-02T19:46:46.039Z · score: 2 (2 votes) · LW · GW

Thanks for the answer!

I see we basically vote in the same way, and it's nice to know that I'm not the only one who is lost sometimes.