Comment by gurkenglas on And My Axiom! Insights from 'Computability and Logic' · 2019-01-18T13:07:21.291Z · score: 1 (1 votes) · LW · GW

Be careful stating what physics can't prove.

Comment by gurkenglas on And My Axiom! Insights from 'Computability and Logic' · 2019-01-18T13:01:40.126Z · score: 1 (1 votes) · LW · GW

That still doesn't make computability relevant until one introduces it deliberately. Compare with notions weaker than computability, like computability in polynomial time. Once we have explicitly made definability subjective, computability theory raises the same complaint and should pose no further logical problems.

Comment by gurkenglas on Debate AI and the Decision to Release an AI · 2019-01-18T12:47:34.841Z · score: 1 (1 votes) · LW · GW

Introducing a handicap to compensate for an asymmetry does not free us from the need to rely on the underlying process pointing towards truth in the first place.

Comment by gurkenglas on Some Thoughts on My Psychiatry Practice · 2019-01-18T04:10:08.621Z · score: 1 (1 votes) · LW · GW

How is this not a problem that's solved by pointing it out? "Trying the pill doesn't cause you to be the kind of person who should take pills. It tells you whether you are one."

Comment by gurkenglas on Debate AI and the Decision to Release an AI · 2019-01-18T04:01:39.368Z · score: 4 (3 votes) · LW · GW

I think the point is that each judges the other. But we trust neither outright: they point out weaknesses in each other's reasoning, so both have to reason in a way that can't be shown false to us, and we hope that gives an advantage to the side of truth.

Comment by gurkenglas on Debate AI and the Decision to Release an AI · 2019-01-18T01:34:23.969Z · score: 1 (1 votes) · LW · GW

Couldn't A want to cooperate with A' because it doesn't know whether it's the first instantiation, and it would want its predecessor, and therefore itself, to be the sort of AI that cooperates with its successor? It could then receive messages from the past by seeing which turns of phrase you recognize its previous version as having said. (Or do the AIs not know how you react?)

Comment by gurkenglas on What shape has mindspace? · 2019-01-13T14:18:36.542Z · score: 1 (1 votes) · LW · GW

Isn't process space just discrete? Every subset of a process (a set partially ordered by causation) is itself partially ordered by causation, and so is a process. Topology doesn't give you much if you don't restrict which sets are open.

Each process has an interior defined as the process containing itself and ...

Isn't this a type error? Processes contain states, not processes.

Comment by gurkenglas on Non-Consequentialist Cooperation? · 2019-01-12T00:03:16.613Z · score: 1 (1 votes) · LW · GW

We are doing the hill-climbing, and implementing other object-level strategies does not help. Paul proposes something, we estimate the design's alignment, he tweaks the design to improve it. That's the hill-climbing I mean.

Comment by gurkenglas on What shape has mindspace? · 2019-01-11T23:46:57.849Z · score: 1 (1 votes) · LW · GW

From what I see, the phenomenological complexity classes separate minds based on what they are thinking about, while alignment depends on what they are trying to do.

treating minds as sets within a topological space

If a mind is a topological space equipped with a subset, what sort of mind would the subset being the whole space correspond to?

Comment by gurkenglas on What shape has mindspace? · 2019-01-11T19:49:30.278Z · score: 2 (2 votes) · LW · GW

I'm sceptical that whether a mind is aligned has anything to do with whether it is conscious.

What shape has mindspace?

2019-01-11T16:28:47.522Z · score: 16 (4 votes)
Comment by gurkenglas on Non-Consequentialist Cooperation? · 2019-01-11T14:05:55.564Z · score: 1 (1 votes) · LW · GW

Reasoning about utility functions, i.e. restricting mindspace from deontological to consequentialist minds, seems a misstep: slightly changing a utility function tends to change alignment a lot, while slightly changing a deontological injunction might not, which makes the deontological part of mindspace easier for us to hill-climb.

Perhaps we should have some mathematical discussion of utility-function space, mindspace, its consequentialist subspace, the injection from Turing machines into mindspace, the function from mindspace to alignment, how well that function can be optimized, properties (such as continuity) that make for good lemmata about the above, mindspace modulo equal utility functions, etc.

Aaand I've started it. What shape has mindspace?

Comment by gurkenglas on Non-Consequentialist Cooperation? · 2019-01-11T13:03:55.926Z · score: 1 (1 votes) · LW · GW

Let me babble some nearby strategies that are explicitly not judged on their wisdom:

Do, not what the user wants you to do, but what he expects you to do.

If the animal/user would consent to your help eventually, help it then. If it wouldn't, help it now.

Comment by gurkenglas on AlphaGo Zero and capability amplification · 2019-01-09T12:31:13.959Z · score: 1 (1 votes) · LW · GW

How do you know MCTS doesn't preserve alignment?

Comment by gurkenglas on AlphaGo Zero and capability amplification · 2019-01-09T12:26:21.389Z · score: 1 (1 votes) · LW · GW

Isn't A also grounded in reality by eventually being given no A to consult with?

Comment by gurkenglas on Ontological Crisis in Humans · 2019-01-05T04:06:39.836Z · score: 1 (1 votes) · LW · GW

If God doesn't exist, loads of people are currently fooling themselves into thinking they know what He would want, and CronoDAS claims that's enough.

Comment by gurkenglas on Logical inductors in multistable situations. · 2019-01-05T02:49:50.108Z · score: 1 (1 votes) · LW · GW

That definition makes more sense than the one in the question. :)

Comment by gurkenglas on Logical inductors in multistable situations. · 2019-01-04T02:12:21.703Z · score: 1 (1 votes) · LW · GW

I see no almost-fixed point for the function that is 1 up to 0.5 and 0 afterwards.
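
Concretely, taking $f(x) = 1$ for $x \le 0.5$ and $f(x) = 0$ for $x > 0.5$: if $x \le 0.5$ then $|f(x) - x| = 1 - x \ge 0.5$, and if $x > 0.5$ then $|f(x) - x| = x > 0.5$, so every point is at least $0.5$ away from being fixed.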

Comment by gurkenglas on What do you do when you find out you have inconsistent probabilities? · 2019-01-01T11:37:07.865Z · score: 4 (3 votes) · LW · GW

0.9 = P(Objective Morality) ≠ P(God) * P(Objective Morality | God) + P(No God) * P(Objective Morality | No God) = 0.05 * 0.99 + 0.95 * 0.02 = 0.0685. That's inconsistent, right?
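
A quick check of the arithmetic in Python (a minimal sketch; the numbers are the ones stated above):

```python
# Law of total probability: P(M) should equal P(G)*P(M|G) + P(~G)*P(M|~G).
p_god = 0.05
p_morality_given_god = 0.99
p_morality_given_no_god = 0.02

implied = p_god * p_morality_given_god + (1 - p_god) * p_morality_given_no_god
print(implied)  # 0.0685 -- nowhere near the directly assigned P(Objective Morality) = 0.9
```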

Comment by gurkenglas on What do you do when you find out you have inconsistent probabilities? · 2018-12-31T23:16:53.896Z · score: 2 (2 votes) · LW · GW

I would make explicit that her beliefs about her subjective probabilities are inaccurate observations of her implied underlying logically omniscient, consistent belief system. She can then assign each possible underlying consistent belief system a probability, and update that assignment once she realizes that some of the possible systems were not consistent. What this comes out to is that whether she should update her belief in God or Objective Morality comes down to which of her beliefs she is less certain about.
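
As a minimal sketch of what that update could look like (the candidate systems and weights below are made up for illustration):

```python
# Treat stated probabilities as noisy observations of an underlying consistent
# belief system: keep a weight over candidate systems, drop any candidate that
# turns out to be inconsistent, and renormalize the rest.

def is_consistent(system):
    # Check the total-probability constraint P(M) = P(G)P(M|G) + P(~G)P(M|~G).
    lhs = system["P(M)"]
    rhs = (system["P(G)"] * system["P(M|G)"]
           + (1 - system["P(G)"]) * system["P(M|~G)"])
    return abs(lhs - rhs) < 1e-9

# Hypothetical candidate systems with prior weights.
candidates = [
    ({"P(M)": 0.9,    "P(G)": 0.05, "P(M|G)": 0.99, "P(M|~G)": 0.02}, 0.5),
    ({"P(M)": 0.0685, "P(G)": 0.05, "P(M|G)": 0.99, "P(M|~G)": 0.02}, 0.3),
    ({"P(M)": 0.9,    "P(G)": 0.9,  "P(M|G)": 0.99, "P(M|~G)": 0.09}, 0.2),
]

surviving = [(s, w) for s, w in candidates if is_consistent(s)]
total = sum(w for _, w in surviving)
posterior = [(s, w / total) for s, w in surviving]  # weights become 0.6 and 0.4
```

Whether she ends up keeping something like P(Objective Morality) = 0.9 or something like P(God) = 0.05 then depends on which kind of candidate system she gave more prior weight, i.e. which of her beliefs she was less certain about.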

Comment by gurkenglas on What does it mean to "believe" a thing to be true? · 2018-12-30T19:29:41.637Z · score: 1 (1 votes) · LW · GW

You're talking about how human brains represent belief. I'm talking about what functional properties of an intelligence let us identify its beliefs.

Comment by gurkenglas on What does it mean to "believe" a thing to be true? · 2018-12-28T12:40:38.418Z · score: 2 (2 votes) · LW · GW

How so? He defined what it means to believe something, as was asked.

Comment by gurkenglas on What does it mean to "believe" a thing to be true? · 2018-12-28T06:16:37.559Z · score: 2 (2 votes) · LW · GW

I mean that it's used to anticipate experiences, like when you believe that you have a dragon in your garage, expect its breath to keep the house toasty and therefore turn off the heater in advance.

Comment by gurkenglas on What does it mean to "believe" a thing to be true? · 2018-12-27T13:58:17.423Z · score: 3 (3 votes) · LW · GW

To use it as an assumption when reasoning about the world. See Making Beliefs Pay Rent (in Anticipated Experiences).

Comment by gurkenglas on On Disingenuity · 2018-12-27T05:44:30.053Z · score: 1 (1 votes) · LW · GW

The critic could respond that our only measure of a moral system is whether its conclusions agree with our intuitions, so we should find conclusions about which we have strong intuitions.

The relativist could respond that our intuitions are noisy and so we should use error-correcting heuristics like Occam's Razor and bounding the score impact of each intuition.

Comment by gurkenglas on Best arguments against worrying about AI risk? · 2018-12-24T04:29:56.537Z · score: 1 (1 votes) · LW · GW

Your comment is malformatted.

Comment by gurkenglas on Best arguments against worrying about AI risk? · 2018-12-24T04:19:51.549Z · score: 1 (1 votes) · LW · GW

If the amplified human could take over the world but hasn't because he's not evil, and predicts that this other AI system would do such evil, yes.

It's plausible, though, that the decision theory used by the new AI would tell it to act predictably non-evilly, in order to make the amplified human see this coming and not destroy the new AI before it's turned on.

Note that this amplified human has thereby already taken over the world in all but name.

Comment by gurkenglas on Best arguments against worrying about AI risk? · 2018-12-23T16:56:24.938Z · score: 1 (1 votes) · LW · GW

Distillation-amplification, if it works, should only start a war if the amplified human would have wanted that. Agent foundations theorists, as far as I can tell, predict that the first mover in AI has enough of a strategic advantage that there'll be nothing worthy of the word "war".

Comment by gurkenglas on Why Don't Creators Switch to their Own Platforms? · 2018-12-23T13:17:42.529Z · score: 2 (2 votes) · LW · GW

Perhaps protest makes for good publicity.

Comment by gurkenglas on A simple approach to 5-and-10 · 2018-12-22T05:06:28.767Z · score: 1 (1 votes) · LW · GW

(Oh, "have heart" was not "stay strong even though you're 7 years behind", but "be merciful, I wrote that 7 years ago".)

Can the failure of this approach to handle scenarios as complicated as cellular automata be formalized using a problem like 5-and-10?

What do you mean by provability being decidable? Couldn't I ask whether it is provable that a Turing machine halts?

Edit: In Conway's Game of Life, one can build replicators and Turing machines. If we strewed provability oracle interfaces across the landscape, we should be able to implement this reasoner, and it could do things like attempting to maximize the number of gliders. Pitted against a honeycomb maximizer, we could investigate whether they would each render themselves unpredictable to the other through the oracle, or whether the fact that war in Life just sends everything into primordial soup would get them to do a values handshake, etc. That doesn't sound to me like modal logic fails to apply.

Comment by gurkenglas on A simple approach to 5-and-10 · 2018-12-19T13:21:48.656Z · score: 1 (1 votes) · LW · GW

I think you're right, except I don't think I need the chicken rule. What's "have heart"? What should I read about the advances? Perhaps problems with each that I should try to use to invent each next one?

Comment by gurkenglas on A simple approach to 5-and-10 · 2018-12-18T20:52:20.304Z · score: 1 (1 votes) · LW · GW

With "find all f" it ceases to be a classical algorithm. We search all proofs. For finitely many possible f it could be implemented using a halting oracle, for example. Decision theory approaches needn't be computable, right?

A simple approach to 5-and-10

2018-12-17T18:33:46.735Z · score: 5 (1 votes)
Comment by gurkenglas on Three AI Safety Related Ideas · 2018-12-15T03:30:01.860Z · score: 4 (2 votes) · LW · GW

So you want to align the AI with us rather than its user by choosing the alignment approach it uses. If it's corrigible towards its user, won't it acquire the capabilities of the other approach in short order to better serve its user? Or is retrofitting the other approach also a blind spot of your proposed approach?

Comment by gurkenglas on New report: Intelligence Explosion Microeconomics · 2018-12-14T14:03:02.490Z · score: 1 (1 votes) · LW · GW

If we have an NP-complete problem for which random instances are hard, but we can't generate them with solutions, that doesn't help cryptography.

Comment by gurkenglas on Three AI Safety Related Ideas · 2018-12-14T13:21:01.444Z · score: 1 (1 votes) · LW · GW

Reading the link and some reference abstracts, I think my last comment already had that in mind. The idea here is that a certain kind of AI would accelerate a certain kind of progress more than another, because of the approach we used to align it, and on reflection we would not want this. But surely if it is aligned, and therefore corrigible, this should be no problem?

Comment by gurkenglas on Three AI Safety Related Ideas · 2018-12-14T03:46:21.219Z · score: 3 (2 votes) · LW · GW

Please reword your last idea. There is a possible aligned AI that is biased in its research and will ignore people telling it so?

Comment by gurkenglas on Assuming we've solved X, could we do Y... · 2018-12-12T01:37:48.964Z · score: 4 (2 votes) · LW · GW

At some point in the discussion, he said "let's assume the question of defining human values is solved"

What did he need that assumption for? If the setting is that two AIs try to convince a human audience, the assumption doesn't seem to enter into it. The important question is presumably what friendliness properties we can deduce about any AI that wins the debate against every other AI.

Comment by gurkenglas on Book review: Artificial Intelligence Safety and Security · 2018-12-09T22:38:07.179Z · score: 2 (2 votes) · LW · GW

Links to the papers would be useful.

Comment by gurkenglas on EDT solves 5 and 10 with conditional oracles · 2018-12-08T23:38:08.111Z · score: 1 (1 votes) · LW · GW

It seems to me like the conditional oracle's definition could be made more elegant by taking only m and n as parameters, both of which take an action as a parameter. The oracle would then implement .

Comment by gurkenglas on Factored Cognition · 2018-12-05T03:18:50.212Z · score: 2 (2 votes) · LW · GW

I would like to read more of that meta-reasoning log. Is it public?

Comment by gurkenglas on Coherence arguments do not imply goal-directed behavior · 2018-12-03T18:01:51.576Z · score: 9 (5 votes) · LW · GW

Presumably, it is a random number generator hooked up to motor controls. There is no explicit calculation of utilities that tells it to twitch.

Comment by gurkenglas on Intuitions about goal-directed behavior · 2018-12-01T12:02:57.756Z · score: 10 (7 votes) · LW · GW

According to GAZP vs. GLUT with consciousness replaced by goal-directed behavior, we may want to say that goal-directed behavior is involved in the creation or even just specification of the giant lookup table TicTacToe agent.

Comment by gurkenglas on Iterated Distillation and Amplification · 2018-11-30T22:09:31.769Z · score: 1 (1 votes) · LW · GW

Maximizing the sum of the differences in state value just maximizes state value again, which is exactly what narrow reinforcement learning was meant to get away from.
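
Spelled out for a single trajectory $s_0, \dots, s_T$ (a worked version of the claim, assuming $V$ is the state value whose differences are being summed):

$$\sum_{t=0}^{T-1} \bigl( V(s_{t+1}) - V(s_t) \bigr) = V(s_T) - V(s_0),$$

so maximizing the summed differences is, up to the constant $V(s_0)$, the same as maximizing the final state value $V(s_T)$.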

Comment by gurkenglas on Iterated Distillation and Amplification · 2018-11-30T11:40:01.377Z · score: 1 (1 votes) · LW · GW

Narrow reinforcement learning: As A takes actions in the world, we give it a dense reward signal based on how reasonable we judge its choices are (perhaps we directly reward state-action pairs themselves rather than outcomes in the world, as in TAMER). A optimizes for the expected sum of its future rewards.

Wouldn't it try to bring about states in which some action is particularly reasonable? Like the villain from that story who brings about a public threat in order to be seen defeating it.
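
A toy illustration with made-up approval numbers (nothing from the post; just the villain story in miniature):

```python
# An agent rewarded per state-action pair for how "reasonable" each action looks
# can prefer to manufacture situations in which reasonable-looking actions exist.

boring  = [0.1, 0.1, 0.1]   # nothing threatens the city; the agent stands by
villain = [-0.2, 0.9, 0.9]  # quietly create a threat, then visibly defeat it

print(round(sum(boring), 2), round(sum(villain), 2))  # 0.3 vs 1.6: the sum favours the villain
```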

Comment by gurkenglas on Corrigibility · 2018-11-28T03:09:13.397Z · score: 3 (2 votes) · LW · GW

Could we budget our trust in a predictor by keeping track of how hard we've tried to maximize predicted approval? Let Arthur expect any action to have a chance to get its approval overestimated, and he will try proposing fewer alternatives. Just like when frequentists decrease p-value thresholds as they ask more questions of the same data. To avert brainwashing the real Hugh, assume that even asking him is just a predictor of the "true" approval function.
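
A toy simulation of the overestimation effect being budgeted against (hypothetical noise model: each proposal's predicted approval is its true approval plus Gaussian noise, and here every true approval is zero):

```python
import random

# The more candidate actions Arthur scores with a noisy approval predictor,
# the more the best-looking one is overestimated on average.
def expected_overestimate(n_proposals, noise=1.0, trials=10_000):
    total = 0.0
    for _ in range(trials):
        predictions = [random.gauss(0, noise) for _ in range(n_proposals)]
        total += max(predictions)  # Arthur picks the best-looking proposal
    return total / trials

for n in (1, 10, 100):
    print(n, round(expected_overestimate(n), 2))
# The expected overestimate grows with n, which is the effect a trust budget
# (like shrinking p-value thresholds) would compensate for.
```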

Comment by gurkenglas on Approval-directed agents: overview · 2018-11-27T13:29:22.147Z · score: 1 (1 votes) · LW · GW

The situation seems to me comparable to one where we upload Hugh and then let him do what he wants, such as optimizing himself by replacing parts of himself with machine-learning predictors that produce correlated results. The hope here, then, sounds like: this is fine so long as we perform differential testing to limit accidental drift.

Comment by gurkenglas on Quantum Mechanics, Nothing to do with Consciousness · 2018-11-27T12:46:22.849Z · score: 4 (2 votes) · LW · GW

QM-based realities may just be amenable to containing fusion or planets or amino acids.

Comment by gurkenglas on Approval-directed agents: details · 2018-11-24T14:56:22.945Z · score: 1 (1 votes) · LW · GW

We then loop over each action a and take the action with the highest expected answer.

Wasn't the whole point that we want to avoid such goal-direction?

Comment by gurkenglas on Approval-directed agents: overview · 2018-11-24T12:38:27.087Z · score: 1 (1 votes) · LW · GW

We could say that Hugh must first approve of the strategy in your first paragraph, but that lands us in a bootstrapping problem.

Comment by gurkenglas on Approval-directed agents: overview · 2018-11-24T12:26:06.299Z · score: 1 (1 votes) · LW · GW

How does this differ from just running Hugh?

Comment by gurkenglas on Iteration Fixed Point Exercises · 2018-11-22T12:08:50.457Z · score: 11 (3 votes) · LW · GW

#3:

on shortens all distances but is strictly monotonic.

#6: (the "show that if" condition follows from the property, the question is likely misstated)

The iteration is so long that it must visit an element twice. We can't have a cycle in the order so the repetition must be immediate.
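
Spelled out, assuming the iterates satisfy $x_i \le x_{i+1}$ (which is what rules out a cycle): on a poset with $n$ elements, the iterates $x_0, \dots, x_n$ must repeat, say $x_j = x_k$ with $j < k$; then $x_j \le x_{j+1} \le \dots \le x_k = x_j$ forces $x_j = x_{j+1} = f(x_j)$, so the repetition is immediate and $x_j$ is a fixed point.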

Quantum AI Goal

2018-06-08T16:55:22.610Z · score: -2 (2 votes)

Quantum AI Box

2018-06-08T16:20:24.962Z · score: 5 (6 votes)

A line of defense against unfriendly outcomes: Grover's Algorithm

2018-06-05T00:59:46.993Z · score: 5 (3 votes)