Posts

One possible approach to develop the best possible general learning algorithm 2022-03-14T19:24:00.150Z
How to develop safe superintelligence 2022-03-01T21:57:22.811Z

Comments

Comment by martillopart on How to develop safe superintelligence · 2022-03-15T10:52:45.175Z

Hey! Thanks for your comment. 

This algorithm won't get caught in a loop like the one you mentioned, because it uses the same process as the one described in the AutoML-Zero paper. In that paper, the authors 'found a better algorithm and iterated' without any problem whatsoever, using the processes described in Figures 1 and 2. Please check the paper for details.
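For readers who haven't looked at the paper: AutoML-Zero's search is based on regularized evolution, in which the fittest member of a random sample is mutated and the oldest member of the population (not the worst) is retired each cycle. The sketch below shows only that loop. It is not the paper's code: candidates here are a hypothetical toy integer vector scored against a made-up `TARGET`, whereas AutoML-Zero evolves small programs built from basic math operations, and all names and parameter values are illustrative assumptions.

```python
import collections
import random

# Hypothetical toy target. In AutoML-Zero, candidates are small programs
# built from basic math operations; a fixed-length integer vector stands in
# for a program here so the sketch stays self-contained.
TARGET = [3, 1, 4, 1, 5, 9, 2, 6]


def fitness(candidate):
    """Higher is better: negative L1 distance to the hidden target."""
    return -sum(abs(c - t) for c, t in zip(candidate, TARGET))


def mutate(candidate, rng):
    """Copy the parent and nudge one randomly chosen position by +/-1."""
    child = list(candidate)
    i = rng.randrange(len(child))
    child[i] += rng.choice([-1, 1])
    return child


def regularized_evolution(cycles=2000, population_size=50,
                          tournament_size=10, seed=0):
    """Return the best (candidate, fitness) pair seen during the search."""
    rng = random.Random(seed)
    population = collections.deque()
    for _ in range(population_size):
        c = [rng.randint(0, 9) for _ in TARGET]
        population.append((c, fitness(c)))
    best = max(population, key=lambda p: p[1])

    for _ in range(cycles):
        # Tournament selection: the fittest of a random sample is the parent.
        sample = rng.sample(list(population), tournament_size)
        parent = max(sample, key=lambda p: p[1])[0]
        child = mutate(parent, rng)
        entry = (child, fitness(child))
        # Add the child and retire the OLDEST member (not the worst one);
        # this aging step is what makes the evolution "regularized".
        population.append(entry)
        population.popleft()
        if entry[1] > best[1]:
            best = entry
    return best


if __name__ == "__main__":
    candidate, score = regularized_evolution()
    print(candidate, score)
```

Because the best-so-far entry is tracked separately from the population, it can only improve over cycles, even though individual members are retired by age.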

About your second point: that's exactly the aim of the experiment, to find out whether a strictly better agent can be found by an automatic process. If we don't get there using substantial computation within an acceptable amount of time, then the experiment will have failed. But, as with all experiments, there are good reasons to try.

Third: how do you find unseen games? Unseen games are simply games that the algorithm hasn't been trained to perform well on. In this experiment, these would be games that are not in the DeepMind MuZero benchmark. Obviously, the unseen games will be changed in every cycle (each time point 5 is reached).
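A minimal sketch of drawing a fresh set of unseen games each cycle, assuming a pool of game names and a set of games the learner was trained on. The game names, `sample_unseen_games`, `held_out_sets`, and the set size of 5 are all hypothetical; the sketch covers only the sampling step, not how a learner would be scored on the drawn games. One motivation for rotating the held-out set is to keep the search from overfitting to a fixed evaluation set.

```python
import random

# Hypothetical game names. The training set mimics games from MuZero's
# benchmark (Go, chess, shogi, and Atari titles such as Breakout); the
# "new_game_*" entries stand in for games outside that benchmark.
TRAINING_GAMES = {"go", "chess", "shogi", "breakout"}
GAME_POOL = sorted(TRAINING_GAMES | {f"new_game_{i:02d}" for i in range(20)})


def sample_unseen_games(game_pool, training_games, n_unseen, rng):
    """Draw n_unseen distinct games the learner was never trained on."""
    candidates = [g for g in game_pool if g not in training_games]
    if n_unseen > len(candidates):
        raise ValueError("not enough unseen games in the pool")
    return rng.sample(candidates, n_unseen)


def held_out_sets(n_cycles, n_unseen=5, seed=0):
    """Return one freshly drawn held-out set per search cycle."""
    rng = random.Random(seed)
    return [sample_unseen_games(GAME_POOL, TRAINING_GAMES, n_unseen, rng)
            for _ in range(n_cycles)]


if __name__ == "__main__":
    for cycle, games in enumerate(held_out_sets(n_cycles=3)):
        print(cycle, games)
```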

Fourth: yes, of course there's no guarantee, because that's also the point of the experiment: to find out whether this will happen. And again, there's good reason to think it will. Here's the explanation again: you use a machine learning technique to find a general learning program that performs better than MuZero. But MuZero itself is a deep reinforcement learning program designed to quickly learn many different games. And what is a game? An objective-based activity governed by certain rules. Hence, if the new program performs better than MuZero as a GLA, then it would be logical to assume that it will also perform better at the task of finding better GLAs, because finding GLAs is a "game" too. This is also explained in points 5 and 6.

About your 'in general' statement: at no point am I arguing that these new algorithms will perform better than, or as well as, Levin Search. What I propose with this experiment is to automatically find improved versions of the general learning algorithms we currently have. The ideal endpoint of the experiment would be to automatically find algorithms that are CLOSE to the best possible (such as Levin Search) but are practically feasible.

Kind regards