Idea: OpenAI Gym environments where the AI is a part of the environment

post by philip_b (crabman) · 2018-04-12T22:28:20.758Z · LW · GW · 5 comments

Contents

5 comments

AIXI is a mathematical construct, the perfect agent that maximizes its utility function in a discrete world. Unfortunately there is no algorithm implementing it, therefore it's impossible to create in our world. It has another problem - the agent in AIXI model exists outside of the world, it's impossible for the agent to drop an anvil on its own circuits and make itself more stupid.

Modern reinforcement learning algorithms, which are the closest thing to general AI that we have, operate in similar fashion, they aren't part of the environment either. If a bot is learning how to balance on one leg, or play pong, or super mario, it can't modify itself or break its own brain.

My idea is to create environments where the bot can modify itself and break itself, so that people who want to research creation of strong AI can test solutions for the anvil problem. Here are examples of such environments:

Feel free to post your thoughts and critique.

5 comments

Comments sorted by top scores.

comment by gwern · 2018-04-12T22:49:28.450Z · LW(p) · GW(p)

Have you seen "AI Safety Gridworlds", Leike et al 2017?

Replies from: crabman
comment by philip_b (crabman) · 2018-04-12T23:03:25.120Z · LW(p) · GW(p)

I haven't, thanks.

Btw was your goal to show me the link or to learn whether I have seen it before? If the former, then I don't need to respond. If the latter, then you want my response I guess.

Replies from: robert-miles, gwern
comment by Robert Miles (robert-miles) · 2018-04-14T13:12:57.452Z · LW(p) · GW(p)

The "Whisky and Gold" environment is particularly relevant

comment by gwern · 2018-04-13T02:07:18.866Z · LW(p) · GW(p)

It was partially to point out that you can get self-modification hazards with a substantially less complex setup than your proposal with a little hand-engineering of the agents; since none of the AI safety gridworld problems could be said to be rigorously solved, there's no need for more realistic self-modification environments.

comment by Caspar Oesterheld (Caspar42) · 2018-04-14T18:27:26.760Z · LW(p) · GW(p)

I list some relevant discussions of the "anvil problem" etc. here. In particular, Soares and Fallenstein (2014) seem to have implemented an environment in which such problems can be modeled.