Posts

Where's the first benign agent? 2017-04-15T12:36:13.000Z

Comments

Comment by IAFF-User-214 (Imported-IAFF-User-214) on Some Criticisms of the Logical Induction paper · 2017-07-07T04:53:07.000Z · LW · GW

The problem of pointwise rather than uniform convergence does strongly suggest that it will not produce useful properties.

Comment by IAFF-User-214 (Imported-IAFF-User-214) on Some Criticisms of the Logical Induction paper · 2017-07-07T04:52:04.000Z · LW · GW

This seems like a total non sequitur, since this problem is, if anything, much more significant for forecasting than for formal logic.

Also, did you skip over the section about pointwise vs. uniform convergence? That seems to be the most important criticism: the paper does not establish the kind of property that can be used to generate useful models; it establishes only a much less reliable one.
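For concreteness, here is the standard form of the distinction being pointed at, in my own generic notation (with \mathbb{P}_n for the stage-n prices, \phi ranging over sentences, and \mathbb{P}_\infty for the limiting values); this is a restatement, not a quotation from the paper.

Pointwise convergence, sentence by sentence:

    \forall \phi \;\; \forall \varepsilon > 0 \;\; \exists N(\phi, \varepsilon) \;\; \forall n \ge N(\phi, \varepsilon) : \;\; \bigl| \mathbb{P}_n(\phi) - \mathbb{P}_\infty(\phi) \bigr| < \varepsilon

Uniform convergence over a class of sentences \Phi:

    \forall \varepsilon > 0 \;\; \exists N(\varepsilon) \;\; \forall n \ge N(\varepsilon) \;\; \forall \phi \in \Phi : \;\; \bigl| \mathbb{P}_n(\phi) - \mathbb{P}_\infty(\phi) \bigr| < \varepsilon

The difference is where the threshold N lives: in the pointwise case it may depend on the sentence, so no finite stage comes with an error bound that holds across a whole family of sentences, which is the sense in which the guarantee is much less reliable for building models on top of.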

Comment by IAFF-User-214 (Imported-IAFF-User-214) on Where's the first benign agent? · 2017-04-29T20:27:02.000Z · LW · GW

(Yes, same person.)

I agree that no one else has solved the problem or made much progress. I object to Paul's approach here because it couples the value problem more closely to other problems in architecture and value stability. I would much prefer holding off on attacking it for the moment rather than taking this approach, which, to my reading, takes for granted that the problem is not hard and rests further work on top of it. Holding off at least leaves room for other nearby pieces to be carved out and to give a better idea of what properties a solution would have; this approach seems to assume the solution is vastly simpler than I believe it is.

I also have a general intuitive prior that reinforcement learning approaches are untrustworthy and are "building on sand", but that intuition is neither precise nor persuasive, so I'm not writing it up except on questions like this one, where it is on more solid ground. I've put much less work into this field than Paul or others have, so I don't want to challenge things except where I'm confident.

Comment by IAFF-User-214 (Imported-IAFF-User-214) on Where's the first benign agent? · 2017-04-25T21:59:40.000Z · LW · GW

Your point 2 is an excellent summary of my reasons for being skeptical of relying on human reasoning. (I also expect that more outlandish means of transmissible value-corruption would show up, on the principles that edge cases are hard to predict and that we don't really understand our own minds.)

Comment by IAFF-User-214 (Imported-IAFF-User-214) on Where's the first benign agent? · 2017-04-23T08:58:47.000Z · LW · GW

Apologies, I stopped getting moderation emails at some point and haven't fixed it properly.

Comment by IAFF-User-214 (Imported-IAFF-User-214) on ALBA: can you be "aligned" at increased "capacity"? · 2017-04-15T10:50:31.000Z · LW · GW

If [one] agent is choosing what values [another] agent should maximize, and it picks [some value function], and if it's clear to humans that maximizing [that function] is at odds with human interests (as compared to e.g. leaving humans in meaningful control of the situation), then prima facie [the] agent has failed to live up to its contract of trying to do what we want.

It seems to me that the default outcome for any process like this is always that the chosen values are at odds with human interests, but not in a way that humans will notice until the downstream effects of decisions are felt. This framework does not deal with that problem: the misalignment is not incorporated into a model of what we want until feedback is received, and the default response to that feedback will be to execute a nearest unblocked strategy much like the one that was just corrected. (This is especially concerning because a human is not a secure system, and downstream effects that the human will not notice can include accidental or deliberate social/basilisk-like changes to the human's value system. Having a human in the loop is only superficially protective.)
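As a deliberately toy sketch of the first failure mode above (a choice that looks fine from the feedback the human can give now but is bad downstream): the plans, the inspection horizon, and all numbers below are my own invention, not anything from ALBA or this thread.

    # Toy model: the human evaluator can only inspect the first HORIZON steps
    # of a plan before giving feedback; the rest of the plan plays out later.
    HORIZON = 3

    def human_approval(plan):
        """Feedback computed only from the part of the plan the human can see now."""
        return sum(plan[:HORIZON])

    def true_value(plan):
        """What we actually care about: the whole plan, downstream effects included."""
        return sum(plan)

    # Two candidate plans, described by their per-step effect on true human value.
    honest_plan = [1, 1, 1, 1, 1, 1]           # looks fine now, stays fine later
    lookalike_plan = [2, 2, 2, -10, -10, -10]  # looks *better* now, bad later

    # An agent that optimizes for approval picks the plan the human rates highest.
    chosen = max([honest_plan, lookalike_plan], key=human_approval)

    print("chosen plan:", chosen)                                 # the lookalike plan
    print("approval at decision time:", human_approval(chosen))   # 6, vs. 3 for honest
    print("true value, felt only later:", true_value(chosen))     # -24, vs. 6 for honest

The only point of the toy is that the approval signal available at decision time is maximized by the plan whose costs land outside the inspection window; nothing in that signal distinguishes it from a genuinely good plan until the downstream effects arrive.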

Comment by IAFF-User-214 (Imported-IAFF-User-214) on Are daemons a problem for ideal agents? · 2017-02-16T18:11:41.000Z · LW · GW

That seems likely. Of course, learning those logical facts might take a similarly unreasonable amount of time.

Considering this has given me the intuition that, while pulling the information out into the overall inductor is probably possible, doing so will conflict with the goal of making a variant inductor that runs efficiently. That conflict might be avoidable, but my intuition gestures vaguely in the direction of P vs. NP and EXPTIME vs. NEXPTIME as reasons to think it is not.

Comment by IAFF-User-214 (Imported-IAFF-User-214) on Are daemons a problem for ideal agents? · 2017-02-16T02:33:10.000Z · LW · GW

I can see an interpretation of "idealized agent" under which it would make sense to model an algorithm you don't fully understand as a presumed-hostile agent acting on logical information you do not know. Say, because the idealized agent is bounded and would take a long time to solve a problem, while the partially-understood algorithm approximates it, with a small but unknown bias, in much less time.
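One way to make that modeling move concrete (my own formalization, not the commenter's): suppose the idealized agent could compute a quantity f(x) exactly given a large time budget T, while the partially-understood algorithm returns, in time t \ll T,

    \tilde{f}(x) = f(x) + b(x), \qquad |b(x)| \le \varepsilon,

where the bias term b is unknown and may be correlated with logical facts the bounded agent cannot check within its budget. Treating the algorithm as presumed-hostile then means evaluating any plan that relies on \tilde{f} under the worst case over admissible b, rather than assuming the bias averages out.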