Is MIRI's reading list up to date? 2021-09-11T18:56:39.779Z
[External Event] 2022 IEEE Conference on Assured Autonomy (ICAA) 2021-09-08T16:31:29.265Z
Improving rationality / scout mindset on a debate group 2021-09-04T17:02:24.557Z
NIST AI Risk Management Framework request for information (RFI) 2021-09-01T00:15:32.549Z
Modelling Transformative AI Risks (MTAIR) Project: Introduction 2021-08-16T07:12:22.277Z
How would the Scaling Hypothesis change things? 2021-08-13T15:42:03.449Z
Pros and cons of working on near-term technical AI safety and assurance 2021-06-17T20:17:13.451Z
List of good AI safety project ideas? 2021-05-26T22:36:07.910Z
[Link] Whittlestone et al., The Societal Implications of Deep Reinforcement Learning 2021-03-10T18:13:25.520Z
Institute for Assured Autonomy (IAA) newsletter 2021-02-11T21:48:52.420Z
Some recent survey papers on (mostly near-term) AI safety, security, and assurance 2021-01-13T21:50:40.089Z
Blog post: A tale of two research communities 2020-08-12T20:41:30.448Z
Talk: Key Issues In Near-Term AI Safety Research 2020-07-10T18:36:12.462Z
More on disambiguating "discontinuity" 2020-06-09T15:16:34.432Z
New article from Oren Etzioni 2020-02-25T15:25:22.244Z
Slide deck: Introduction to AI Safety 2020-01-29T15:57:04.979Z
Workshop on Assured Autonomous Systems (WAAS) 2020-01-20T16:21:14.806Z
alenglander's Shortform 2019-12-02T15:35:02.996Z
APL AI Safety Conference 2019-07-22T18:18:07.717Z


Comment by Aryeh Englander (alenglander) on Paths To High-Level Machine Intelligence · 2021-09-26T11:52:33.783Z · LW · GW

Thanks Daniel for that strong vote of confidence!

The full graph is in fact expandable / collapsible, and it does have the ability to display the relevant paragraphs when you hover over a node (although the descriptions are not all filled in yet). It also allows people to enter their own numbers and spit out updated calculations, exactly as you described. We actually built a nice dashboard for that - we haven't shown it yet in this sequence because this sequence is mostly focused on phase 1, and the dashboard is for phase 2.

Analytica does have a web version, but it's a bit clunky and buggy so we haven't used it so far. However, I was just informed that they are coming out with a major update soon that will include a significantly better web version, so hopefully we can do all this then.

I certainly don't think we'd say no to additional funding or interns! We could certainly use them - there are quite a few areas that we have not looked into sufficiently because all of our team members were focused on other parts of the model. And we haven't yet gotten to much of the quantitative part (phase 2 as you called it), or the formal elicitation part.

Comment by Aryeh Englander (alenglander) on The Best Software For Every Need · 2021-09-20T03:09:17.818Z · LW · GW

My wife specializes in this, and she says that's like asking, "What clothing should I buy?" It depends on a lot of factors plus an element of taste. If you want, you can message me - my wife says she's happy to help you work through the options a bit for free.

Comment by Aryeh Englander (alenglander) on Is MIRI's reading list up to date? · 2021-09-12T19:38:45.219Z · LW · GW


Comment by Aryeh Englander (alenglander) on alenglander's Shortform · 2021-09-08T00:37:46.012Z · LW · GW

Yes, this sounds like a reasonable interpretation.

Comment by Aryeh Englander (alenglander) on alenglander's Shortform · 2021-09-07T02:54:14.069Z · LW · GW

I am reluctant to mention specific examples, partly because maybe I've misunderstood and partly because I hate being at all confrontational. But regardless, I have definitely seen this outside the rationalist community, and I have definitely noticed myself doing this. Usually I only do it in my head, though: I feel upset when the critique comes from outside my group, but if someone inside the group says the same thing I'll mentally nod along.

Comment by Aryeh Englander (alenglander) on alenglander's Shortform · 2021-09-06T20:25:03.426Z · LW · GW

Failure mode I think I've noticed, including among rationalists (and certainly myself!): If someone in your in-group criticizes something about the group, then people often consider that critique to be reasonable. If someone outside the group levels the exact same criticism, then that feels like an attack on the group, and your tribal defensiveness kicks into gear, potentially making you more susceptible to confirmation / disconfirmation bias or the like. I've noticed myself and I'm pretty sure others in the rationalist community doing this, and even reacting in clearly different ways to the exact same critique when we hear it from an in-group member or someone outside the group.

Do you think this is correct or off the mark? Also, is there a name for this and have there been studies about it?

Comment by Aryeh Englander (alenglander) on [AN #156]: The scaling hypothesis: a plan for building AGI · 2021-07-16T20:41:25.406Z · LW · GW

I'd like to hear more thoughts, from Rohin or anybody else, about how the scaling hypothesis might affect safety work.

Comment by Aryeh Englander (alenglander) on List of good AI safety project ideas? · 2021-06-03T12:53:47.084Z · LW · GW

New post on the EA Forum: Some AI Governance Research Ideas

Comment by Aryeh Englander (alenglander) on List of good AI safety project ideas? · 2021-05-28T19:25:34.379Z · LW · GW

Just came across this: Research ideas to study humans with AI Safety in mind

Comment by Aryeh Englander (alenglander) on [Event] Weekly Alignment Research Coffee Time (05/10) · 2021-05-10T21:09:47.985Z · LW · GW

Thanks Adam for setting this up! I have no idea if my experience is representative, but that was definitely one of the highest-quality discussion sessions I've had at events of this type.

Comment by Aryeh Englander (alenglander) on [Linkpost] Treacherous turns in the wild · 2021-04-27T17:27:47.938Z · LW · GW

I don't think this is quite an example of a treacherous turn, but this still looks relevant:

Lewis et al., Deal or no deal? end-to-end learning for negotiation dialogues (2017):

Analysing the performance of our agents, we find evidence of sophisticated negotiation strategies. For example, we find instances of the model feigning interest in a valueless issue, so that it can later ‘compromise’ by conceding it. Deceit is a complex skill that requires hypothesising the other agent’s beliefs, and is learnt relatively late in child development (Talwar and Lee, 2002). Our agents have learnt to deceive without any explicit human design, simply by trying to achieve their goals.

(I found this reference cited in Kenton et al., Alignment of Language Agents (2021).)

Comment by Aryeh Englander (alenglander) on Timeline of AI safety · 2021-02-08T14:40:46.951Z · LW · GW

That's later in the linked wiki page.

Comment by Aryeh Englander (alenglander) on Timeline of AI safety · 2021-02-08T13:43:11.929Z · LW · GW

Excellent, thanks! Now I just need a similar timeline for near-term safety engineering / assured autonomy as they relate to AI, and then a good part of a paper I'm working on will have just written itself.

Comment by Aryeh Englander (alenglander) on The ethics of AI for the Routledge Encyclopedia of Philosophy · 2020-11-18T18:43:09.325Z · LW · GW

Also - particular papers that you think are important, especially if you think they might be harder to find in a quick literature search. I'm part of an AI Ethics team at work, and I would like to find out about these as well.

Comment by Aryeh Englander (alenglander) on The ground of optimization · 2020-07-02T17:44:15.914Z · LW · GW

This was actually part of a conversation I was having with this colleague regarding whether or not evolution can be viewed as an optimization process. Here are some follow-up comments to what she wrote above related to the evolution angle:

We could define the natural selection system as:

All configurations = all arrangements of matter on a planet (both arrangements that are living and those that are non-living)

Basin of attraction = all arrangements of matter on a planet that meet the definition of a living thing

Target configuration set = all arrangements of living things where the type and number of living things remains approximately stable.

I think that this system meets the definition of an optimizing system given in the Ground of Optimization essay. For example, predator and prey co-evolve to be about “equal” in survival ability. If a predator becomes so much better than its prey that it eats them all, the predator will die out along with its prey; the remaining animals will be in balance. I think this works for climate perturbations, etc. too.
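This balancing tendency is easy to see in a toy simulation (the model and all parameters below are my own illustrative stand-ins, not anything from the essay): prey with logistic growth plus a predator with linear mortality spiral back to the same equilibrium point even after the populations are perturbed.

```python
def predator_prey(x, y, t_end, dt=0.01, K=10.0, a=0.5, b=0.2, d=1.0):
    """Euler-integrate a simple predator-prey model:
       dx/dt = x(1 - x/K) - a*x*y   (prey with logistic growth)
       dy/dt = b*x*y - d*y          (predator with linear mortality)
    For these dynamics the interior equilibrium is x* = d/b, y* = (1 - x*/K)/a."""
    for _ in range(int(t_end / dt)):
        dx = x * (1 - x / K) - a * x * y
        dy = b * x * y - d * y
        x, y = x + dx * dt, y + dy * dt
    return x, y

if __name__ == "__main__":
    x, y = predator_prey(2.0, 2.0, t_end=50)   # settle toward the balance point
    x, y = predator_prey(x, y / 2, t_end=50)   # perturb: halve the predators
    print(f"after perturbation the system returns to x={x:.2f}, y={y:.2f}")
    # for these parameters the equilibrium is x* = 5, y* = 1
```

The point of the sketch is only that many different starting configurations, including perturbed ones, flow into the same small target set - the "balance" described above.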

HOWEVER, it should be clear that there are numerous ways in which this can happen – like the ball on a bumpy surface with a lot of convex “valleys” (local minima), there is not just one way that living things can be in balance. So, to say that “natural selection optimized for intelligence” is not quite right – it just fell into a “valley” where intelligence happened. FURTHER, it’s not clear that we have reached the local minimum! Humans may be that predator that is going to fall “prey” to its own success. If that happened (and any intelligent animals remain at all), I guess we could say that natural selection optimized for less-than-human intelligence!

Further, this definition of optimization has no connotation of “best” or even better – just equal to a defined set. The word “optimize” is loaded. And its use in connection with natural selection has led to a lot of trouble in terms of human races, and humans v. animal rights.

Finally, in the essay’s definition, there is no imperative that the target set be reached. As long as the set of living things is “tending” toward intelligence, then the system is optimizing. So even if natural selection was optimizing for intelligence there is no guarantee that it will be achieved (in its highest manifestation). Like a billiards system where the table is slick (but not frictionless) and the collisions are close to elastic, the balls may come to rest with some of the balls outside the pockets. The reason I think this is important for AI research, especially AGI and ASI, is perhaps we should be looking for those perturbations to prevent us from ever reaching what we may think of as the target configuration, despite our best efforts.

Comment by Aryeh Englander (alenglander) on The ground of optimization · 2020-07-01T15:46:14.897Z · LW · GW

I shared this essay with a colleague where I work (Johns Hopkins University Applied Physics Lab). Here are her comments, which she asked me to share:

This essay proposes a very interesting definition of optimization as the manifestation of a particular behavior of a closed, physical system. I haven’t finished thinking this over, but I suspect it will be (as is suggested in the essay) a useful construct. The reasoning leading to the definition is clearly laid out (thank you!), with examples that are very useful in understanding the concept. The downside of being clearly laid out, however, is that it makes critique easier. I have a few thoughts about the reasoning in the essay.

The first thing I will note is that the essay gives three definitions for an optimizing system. These definitions are close, but not exactly equivalent. The nuances can be important. For example, that the target configuration set and the basin of attraction cannot be equal is obvious; that is made explicit in definition 3, but only implied in definitions 1 and 2. A bigger issue is that there are no criteria or rationale for their extent and relative size.

For example, the essay offers two reasons why the posterchild of non-optimizers - the bottle with a cap - is not an optimizing system; they both arise from the rather arbitrary definition of the basin of attraction as equal to the target configuration set. I see no necessary reason why the basin of attraction couldn’t be defined as the set of all configurations of water molecules both inside and outside the bottle. That way, the definitional requirement of a target configuration set smaller than the basin of attraction is met. The important point is: will water molecules in this new, larger basin of attraction tend to the target configuration set?

Let’s suppose that the capped bottle is in a sealed room (not necessary but easier to think about), and that the cap is made of a special material that allows water molecules to pass through it in only one direction: from outside the bottle to inside. The water molecules inside the bottle stay inside the bottle, as for any cap. The water molecules inside the room, but outside the bottle, are zooming about (thermodynamic energy), bouncing off the walls, each other, and the bottle. Although it will take some time, sooner or later all the molecules outside the bottle will hit the bottle cap, go through, and be trapped in the bottle. Voila!
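The one-way-cap thought experiment can be checked numerically. Here is a minimal sketch (the random-walk dynamics, room size, and step counts are my own stand-ins for molecules zooming about): position 0 represents the bottle interior, and crossing into it is absorbing.

```python
import random

def simulate_one_way_cap(n_molecules=50, room_size=20, max_steps=200_000, seed=0):
    """Random-walk 'molecules' in a sealed 1-D room; position 0 is the bottle.
    Entering position 0 is absorbing (the one-way cap): a molecule that
    passes through never comes back out."""
    rng = random.Random(seed)
    positions = [rng.randrange(1, room_size) for _ in range(n_molecules)]
    trapped = [False] * n_molecules
    for _ in range(max_steps):
        for i, pos in enumerate(positions):
            if trapped[i]:
                continue
            new = min(room_size - 1, pos + rng.choice((-1, 1)))  # far wall reflects
            if new <= 0:
                trapped[i] = True  # passed through the cap; stays inside
                positions[i] = 0
            else:
                positions[i] = new
        if all(trapped):
            break
    return sum(trapped), n_molecules

if __name__ == "__main__":
    inside, total = simulate_one_way_cap()
    print(f"{inside}/{total} molecules ended up inside the bottle")
```

Because the walk is recurrent and the bottle is absorbing, every molecule is eventually trapped - the whole basin flows into the target set.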

Originally, the bottle-with-a-cap system was a non-optimizing system by definition; the bottle cap type was irrelevant and could have been the rather special one I described. Simply by changing the definition of the basin of attraction, we could turn it into an optimizing system. Further, the original, “non-optimizing” system (with the original definitions of the basin of attraction and target set) would have behaved exactly the same as my optimizing system. On the other hand, changing the bottle cap from our special one to a regular cap will change the system into a non-optimizing system, regardless of the definitions of the basin of attraction and the target configuration set. Perhaps, we should insist that a properly formed system description has a basin of attraction that is larger than the target set, and count on the system behavior to make the optimizing/non-optimizing distinction.

Definitions 1 and 2 both contain the phrase “a small set of target configurations,” which implies that the target set is much smaller (<<) than the basin of attraction. This is a problem for the notion of the universe as a system with maximum entropy as the target configuration set, because the target set is most of the possible configurations. For this reason, the essay’s author concludes that the universe-with-entropy system is not an optimizing system, or at best, a weak one. Stars, galaxies, black holes – there are strong forces that pull matter into these structures. I would say that any system that has succeeded in getting nearly everything within the basin of attraction into the target configuration is a strong optimizer!

Regardless of the way we choose to think about strong or weak, the universe is a system that tends to a set of configurations smaller than the set of possible configurations despite perturbations (the occasional house-building project for example!). Personally, I see no value in a definitional limitation. The behavior of the system (tending toward a smaller set of configurations out of a larger set) should govern the definition of an optimizing system, regardless of relative sizes of the sets.

Between the universe-with-entropy and bottle-with-a-cap systems, I question the utility of the “all configurations >= basin of attraction >> target set configuration” structure in the definition of optimizing systems. I believe it is worth thinking about what the necessary relationships among these configurations are, and how they are chosen.

The example of the billiards system raised another (to me) interesting question. The essay did not offer a system description but says “Consider a billiard table with some billiard balls that are currently bouncing around in motion. Left alone, the balls will eventually come to rest in some configuration…. If we reach in while the billiard balls are bouncing around and move one of the balls that is in motion, the system will now come to rest in a different configuration. Therefore this is not an optimizing system, because there is no set of target configurations towards which the system evolves despite perturbations.”

This example has some odd features. Friction between the balls and the table surface, along with the loss of energy during non-elastic collisions, cause the balls to slow down and stop. The minutia of their travels determines where they stop. The final arrangement is unpredictable (ok, it could be modeled given complete information, but let’s skip that as beside the point), and any arrangement is as likely as another. This suggests that the billiards system is a non-optimizing system even without the proposed perturbation of moving the balls around while the balls are in motion.

Looked at another way, the billiards system does tend to a certain target configuration set, while friction and the non-elasticity of the collisions are perturbations. If we make the surface frictionless and the collisions perfectly elastic, the balls will bounce around the table without stopping. Much like the water molecules in the bottle-with-a-cap example, each will eventually fall into one pocket or another during its travels. Once in the pocket, the ball cannot get out, and thus eventually all will end up in the pockets. So, this system tends to a target configuration set of all balls in pockets.

Adding back in the perturbing friction and energy loss does not mean that this system is not tending to the target configuration set. Reaching in and moving a ball to a different point, or even redirecting any ball heading for a pocket, will not keep this system from tending towards the target configuration. It seems as though the billiards system was an optimizing system all along! The larger point is that it seems, by definition, an optimizing system is an optimizing system even if there are a set of perturbations that prevent it from ever reaching the target configuration! “Tending toward”, not “reaching”, a target configuration set is in all three definitions. It is worth thinking about an optimizing system that never actually optimizes. This may have some bearing on the AGI question.

[And for you readers who, like me, would say, whoa - it is possible that the balls will enter some repeating pattern of motion where some do not enter pockets. Maybe we need a robot to move the balls around randomly if they seem stuck, just like the ball-in-valley+robot system where the robot moves the ball over barriers. I maintain that the point is the same.]

The satellite system illustrates (perhaps an obvious point) that the definition of the target configuration set can change a single system from optimizing to non-optimizing. What is a little more subtle is that the definition of the system boundaries is essential to the characterization of the system as optimizing or non-optimizing, even if the behavior of the system is the same under both definitions. In particular, what we consider to be part of the system and what is considered to be a perturbation can flip a system between characterizations. [This latter point is illustrated by the billiards system as well, as I will explain below.]

The essay says that a satellite in orbit is a non-optimizing system because if its position or velocity is perturbed, it has no tendency to return to its original orbit; that is, the author defines the target configuration as a particular orbit. With respect to another target configuration that may be described as “a scorched pile of junk on the surface of the Earth”, a satellite in orbit is an optimizing system exactly like a ball in a valley. As soon as the launch rocket stops firing, a satellite starts falling to the center of the earth because atmospheric drag and solar radiation pressure continuously decrease the component of the satellite’s velocity perpendicular to the force of gravity. So, unless a perturbation is big enough to send it out of orbit altogether, a satellite tends towards a target configuration of junk located on Earth’s surface.

Since a particular orbit is usually the desired target configuration (!), many satellites incorporate a rocket system to force them to stay in a chosen orbit. If a rocket system is included in the system definition, then the satellite is an optimizing system relative to the desired orbit. What is a little more interesting, with respect to the junk-on-the-Earth target set, drag and solar pressure are part of the optimizing system; an orbit correction system is a perturbation. If the target set is the particular orbit the satellite started in, these definitions swap.

This observation has bearing on the billiards system example. If we include drag and non-elastic collisions as part of the billiards system, then the system is non-optimizing. If we see them as perturbations outside the system, then the billiards system is optimizing. I find this flexibility a little curious, although I haven’t completely thought through the implications.

A completely different sort of question is suggested by the section on Drexler. There the essay sets out a hierarchy of all AI systems, optimizing systems, and goal-directed agent systems. This makes sense with respect to AI systems, but I do not see how optimizing systems, as defined, can be wholly contained within the category of AI systems, unless you define AI systems pretty broadly. For example, I think that pretty much any control system is an optimizing system by the definition in the essay. If we accept this definition of optimizing system, and hold that all optimizing systems are a subset of AI systems, do we have to accept our thermostats as AI systems? What about the program that determined the square root of 2? Is that AI? Is this an issue for this definition, or does its broadness matter in an AI context?

And a nitpick: The first example of an optimizing system offered in the essay is a program calculating the square root of 2. It meets the definition of an optimizing system, but it seems to contradict the earlier assertion that “… optimizing systems are not something that are designed but are discovered.” The algorithm and the program were both designed. I’m not sure why this point is necessary. Either I do not understand something fundamental, or the only purpose of the statement of discovery is to give people like me something to argue about!

In summary, the definition in the essay suggests a few questions that could have a bearing on its application:

  • How do we choose the basin of attraction relative to the target configuration set, if our choice can change the status of the system from optimizing to non-optimizing and vice versa?
  • Is it an issue that an optimizing system may never actually optimize?
  • How do we choose what is part of the system versus a perturbation outside the system when our choice changes the status of the system as optimizing or non-optimizing?
  • All control systems are optimizing systems by the definition, but are all control systems AI systems? Does it matter? If it does matter, how do we tell the difference?
  • For any of these, how do they affect our thinking for AI?

Finally, it might be better to have one, consistent definition that covers all the possibilities, including (in my opinion) that perturbations may be confined to certain dimensions.

Comment by Aryeh Englander (alenglander) on Discontinuous progress in history: an update · 2020-05-19T21:54:14.661Z · LW · GW

One thing that jumped out at me when reading this is that you were counting something as a discontinuity (a relative rate of change) by looking at how many years it jumped ahead (an absolute rate of change). This effectively rules out most recent technologies because the rate of technological progress is already quite high, so you'd have a much harder time jumping 100 years ahead of schedule now than you would have in the past.

I would think that a better metric would be to use some measure of general technological progress as a base (the x-axis) instead of absolute number of years. I strongly suspect that you would find quite a few more discontinuities this way which were otherwise ruled out because they didn't "jump far enough ahead". For example, I suspect that AlexNet would be a discontinuity on this metric.
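A sketch of what this alternative metric might look like (the function names and toy numbers are mine, not AI Impacts'): fit a trend to the historical series, then express a new jump in units of that trend's rate. If the x-axis is calendar years, this reproduces the "years ahead of schedule" measure; if the x-axis is instead some cumulative index of general technological progress, the same code gives the relative measure suggested above.

```python
def fit_rate(xs, ys):
    """Least-squares slope of ys against xs: average progress per unit of x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )

def discontinuity(xs, ys, x_new, y_new):
    """How far ahead of the extrapolated trend the new point lands,
    measured in units of the x-axis (calendar years, or a progress index)."""
    rate = fit_rate(xs, ys)
    expected = ys[-1] + rate * (x_new - xs[-1])
    return (y_new - expected) / rate

if __name__ == "__main__":
    # Toy series: one unit of progress per year for a decade...
    xs, ys = list(range(10)), list(range(10))
    # ...then a development that lands 5 "years" ahead of schedule.
    print(discontinuity(xs, ys, 10, 15.0))  # -> 5.0
```

The substitution of x-axes is the whole point: on a fast-moving progress index, a jump that looks small in calendar years can still be many trend-units ahead.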

Comment by Aryeh Englander (alenglander) on Resources for AI Alignment Cartography · 2020-04-06T14:36:55.056Z · LW · GW

I am running a large-scale version of this, with contributors from multiple organizations. We should definitely discuss. Can you message me or email me? Aryeh.Englander at Thanks!

Comment by Aryeh Englander (alenglander) on New article from Oren Etzioni · 2020-02-25T18:03:42.519Z · LW · GW

It would likely depend on whether or not self-driving cars and AI doctors need some form of reinforcement learning to work. If they do, and especially if they need to use online learning, then presumably they will need to at least partially solve issues like safe exploration, distributional shift, avoiding side effects, verification and validation of RL policies, etc. It also seems likely that they would need to solve versions of specification gaming to ensure that the RL agent doesn't do weird things in edge cases because the reward function wasn't perfectly specified. Whether or not such partial solutions would scale up to AGI is a different discussion, as I mentioned.

Comment by Aryeh Englander (alenglander) on Link: Does the following seem like a reasonable brief summary of the key disagreements regarding AI risk? · 2019-12-27T13:49:05.900Z · LW · GW

Valid. I was primarily summarizing the risk part though, rather than the solutions.

Comment by Aryeh Englander (alenglander) on alenglander's Shortform · 2019-12-02T15:35:03.228Z · LW · GW

Something I've been thinking about recently. I've been reading several discussions surrounding potential risks from AI, especially the essays and interviews on AI Impacts. A lot of these discussions seem to me to center on trying to extrapolate from known data, or to analyze whether AI is or is not analogous to various historical transitions.

But it seems to me that trying to reason based on historical precedent or extrapolated data is only one way of looking at these issues. The other way seems to be more like what Bostrom did in Superintelligence, which seems more like reasoning based on theoretical models of how AI works, what could go wrong, how the world would likely react, etc.

It seems to me that the more you go with the historical analogies / extrapolated data approach, the more skeptical you'll be of claims from people claiming that AI risk is a huge problem. And conversely, the more you go with the reasoning from theoretical models approach, the more concerned you'll be. I'd probably put Robin Hanson somewhere close to the extreme end of the extrapolated data approach, and I'd put Eliezer Yudkowsky and Nick Bostrom close to the extreme end of the theoretical models approach. AI Impacts seems to fall closer to Hanson on this spectrum.

Of course, there's no real hard line between the two approaches. Reasoning from historical precedent and extrapolated data necessarily requires some theoretical modeling, and vice versa. But I still think the basic distinction holds value.

If this is right, then the question is how much weight should we put on each type of reasoning, and why?


Comment by Aryeh Englander (alenglander) on APL AI Safety Conference · 2019-07-22T19:26:16.038Z · LW · GW

I'm having trouble editing the correct location. It's supposed to be 7701 Montpelier Road, Laurel, MD 20723.