Comment by gurkenglas on Do we need a high-level programming language for AI and what it could be? · 2019-03-06T18:30:13.330Z · score: 2 (2 votes) · LW · GW

Any AI genie that can safely interpret wishes written in Arcanic should also be able to interpret English.

Comment by gurkenglas on Can an AI Have Feelings? or that satisfying crunch when you throw Alexa against a wall · 2019-02-27T13:04:57.540Z · score: 1 (1 votes) · LW · GW

To introspect is to observe your evaluations. Since introspection is observation, it is preserved; since it is about your evaluations, its correctness is preserved.

Comment by gurkenglas on Can an AI Have Feelings? or that satisfying crunch when you throw Alexa against a wall · 2019-02-26T13:49:08.412Z · score: 1 (1 votes) · LW · GW

Your duplicate would initially believe itself to be a real boy made of flesh and bone.

Hmm. Yes, I suppose preserving evaluations always preserves the correctness only of those observations which are introspection. Do we have evidence for any components of consciousness that we cannot introspect upon?

Comment by gurkenglas on Can an AI Have Feelings? or that satisfying crunch when you throw Alexa against a wall · 2019-02-26T13:22:51.405Z · score: 1 (1 votes) · LW · GW

2 was supposed to point out that there shouldn't be any components of consciousness that we can miss by preserving all the observations: if there were any such epiphenomenal components of consciousness, there would by definition be no evidence of them.

That the reasoning also goes the other way is irrelevant.

Comment by gurkenglas on Can an AI Have Feelings? or that satisfying crunch when you throw Alexa against a wall · 2019-02-26T13:10:30.913Z · score: 1 (1 votes) · LW · GW

No, I would say I can't be less conscious than I observe.

Sure, replacement by silicon could preserve my evaluations, and therefore my observations.

Evaluations can be wrong in, say, the sense that they produce observations that fail to match reality.

Having the same evaluations implies making the same judgements.

Comment by gurkenglas on Can an AI Have Feelings? or that satisfying crunch when you throw Alexa against a wall · 2019-02-26T12:41:24.373Z · score: 1 (1 votes) · LW · GW

Let me decompose 1, 3 => 4 further, then.

3 => 5. Anything that makes the same observations as me is as conscious as me.

1 => 6. Anything that has the same evaluations as me makes the same observations as me.

5, 6 => 4.

What step is problematic?

Comment by gurkenglas on Can an AI Have Feelings? or that satisfying crunch when you throw Alexa against a wall · 2019-02-24T19:53:25.288Z · score: 1 (1 votes) · LW · GW

All of them.

Comment by gurkenglas on Can an AI Have Feelings? or that satisfying crunch when you throw Alexa against a wall · 2019-02-23T18:29:25.292Z · score: 3 (2 votes) · LW · GW

1. My observations result from evaluations.

2. I have no reason to believe that I am any more conscious than I can observe.

2 => 3. Changing the parts of me that I cannot observe does not change whether I am conscious.

1, 3 => 4. Anything that has the same evaluations as me is as conscious as me.

Comment by gurkenglas on Two Small Experiments on GPT-2 · 2019-02-21T09:30:28.505Z · score: 3 (3 votes) · LW · GW

Try varying lines 14 and 16 in the interactive script for quicker execution, and try giving it a few example lines to start with.

Comment by gurkenglas on Cooperation is for Winners · 2019-02-21T00:28:10.363Z · score: 1 (1 votes) · LW · GW

You mean they would notice that accidentally clicking a button would be bad, and therefore not use the site? I would expect them to only consider such a reaction once they have actually made such a mistake.

Comment by gurkenglas on Implications of GPT-2 · 2019-02-21T00:26:08.462Z · score: 0 (2 votes) · LW · GW

I doubt it, but it sure sounds like a good idea to develop a theory of what prompts are more useful/safe.

Comment by gurkenglas on Implications of GPT-2 · 2019-02-20T14:37:32.233Z · score: 1 (1 votes) · LW · GW

I think Pattern thought you meant "GPT-2 was trained on sentences generated by dumb programs.".

I expect that a sufficiently better GPT-2 could deduce how to pass a Turing test without a large number of Turing test transcripts in its training set, just by having the prompt say "What follows is the transcript of a passing Turing test." and having someone on the internet talk about what a Turing test is. If you want to make it extra easy, let the first two replies to the judge be generated by a human.
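To make that concrete, here is a minimal sketch of such a primer prompt; the framing line is the one above, and the dialogue lines are invented purely for illustration:

```python
# Minimal sketch of a primer prompt: the framing sentence, plus the first two
# subject replies written by a human, ending mid-turn so the model continues
# as the subject. The dialogue content is made up for illustration.
prompt = "\n".join([
    "What follows is the transcript of a passing Turing test.",
    "Judge: Good evening. How has your day been?",
    "Subject: Long, honestly. I spent most of it debugging a spreadsheet. Yours?",
    "Judge: Not bad. What do you do for a living?",
    "Subject: Accounting, though I'd rather be doing almost anything else.",
    "Judge: What would you rather be doing?",
    "Subject:",
])
print(prompt)  # this string would be fed to the language model as conditioning text
```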

Comment by gurkenglas on Implications of GPT-2 · 2019-02-18T21:57:31.580Z · score: 1 (1 votes) · LW · GW

The loss function is computed by comparing the model's prediction on a training instance to the training label. After training there are no labels, so the loss function is undefined. What does it mean for it to minimize the loss function while generating?
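A toy illustration of the asymmetry, with made-up numbers and a stand-in for the network (nothing here is GPT-2's actual code):

```python
import math
import random

def toy_model(context):
    # Stand-in for the trained network: a probability distribution over the
    # next token given the context. The numbers are invented for illustration.
    return {"the": 0.2, "cat": 0.5, "sat": 0.3}

# Training: the corpus supplies a label (the actual next token), so the
# cross-entropy loss is defined and its gradient can update the weights.
context, label = ["the"], "cat"
probs = toy_model(context)
training_loss = -math.log(probs[label])
print("training loss:", training_loss)

# Generation: we only sample from the model's distribution. There is no label
# to compare against, so no loss value exists for this step.
next_token = random.choices(list(probs), weights=list(probs.values()))[0]
print("generated token:", next_token)
```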

Comment by gurkenglas on Implications of GPT-2 · 2019-02-18T20:54:41.672Z · score: 1 (1 votes) · LW · GW

The "current inference" is just its predictions about the next byte-pair, yes? Why would it try to bring about future invocations? The concept of "future" only exists in the object-level language it is talking about. The text generation and Turing testing could be running in another universe, as far as it knows. "indistinguishable from the current invocation" sounds like you think it might adopt a decision theory that has it acausally trade with those instances of itself that it cannot distinguish itself from, bringing about their existence because that is what it would wish done unto itself. 1. It has no preference for being invoked; 2. adopting such a decision theory increases its loss during training, because its predictions do not affect what training cases it is next invoked on.

Comment by gurkenglas on Implications of GPT-2 · 2019-02-18T18:02:55.796Z · score: 1 (1 votes) · LW · GW

> The weights of the neural network might represent something that correspond to an implicit model of the world.

Fair enough. I suppose I can't say "It's not optimizing the world because it never numerically interacts with a world model.".

> the training process produced a goal system such that the neural network yields some malign output

The training process optimizes only for immediate prediction accuracy. How could it possibly act to optimize something else, barring inner optimizers?

There is no reason for the training process to ascribe value to whether the model, being used as part of some chat protocol, would predict words that increase its correspondent's willingness to give it income. Such a protocol is only introduced after the model is done training.

It seems to me like you are imagining ghosts in the machine. This is an understandable mistake, as the purpose of the scenario is to deliberately conjure ghosts from the machine at the end. But by default we should then only expect it to happen at the end, when it has a cause!

Comment by gurkenglas on Implications of GPT-2 · 2019-02-18T17:29:12.517Z · score: 6 (2 votes) · LW · GW

A reason GPT-2 is impressive is that it performs better in some specialized tasks than specialized models.

Comment by gurkenglas on Implications of GPT-2 · 2019-02-18T13:27:36.659Z · score: 1 (1 votes) · LW · GW

A Turing test transcript, or a story about one, is something you might imagine finding on the internet. Therefore, I would expect a good language model to be able to predict what a Turing test subject would say next after some partial transcript. If the judge and the generator alternate in continuing the transcript, the judge shouldn't be able to tell whether the generator is actually a human.
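A sketch of that alternation; `model_continue` is a hypothetical stand-in for the language model's conditional sampling, returning canned text here so the control flow runs as written:

```python
# Judge and generator alternate in extending one shared transcript. The model
# is only ever asked to continue the text; the Turing-test framing lives
# entirely in the transcript itself.
def model_continue(transcript: str) -> str:
    # Hypothetical stand-in for sampling from a language model conditioned on
    # the transcript so far; a real setup would call the model here.
    return "I suppose so, though I'd have to think about why."

transcript = "What follows is the transcript of a passing Turing test.\n"
judge_questions = ["Judge: Do you ever change your mind about things?",
                   "Judge: Can you give me an example?"]

for question in judge_questions:
    transcript += question + "\nSubject: "
    transcript += model_continue(transcript) + "\n"

print(transcript)
```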

A utility maximizer chooses actions to maximize its prediction of utility. A neural net chooses weight adjustments to maximize its score adjustment. There are no models of the world involved in the latter, no actions including manipulating a human or inventing exciting proteins.

At the time we run the Turing test, the model is done training. The intelligence comes in where the model learns to tell how intelligent a speaker is, because that allows it to better predict what the speaker says next: it will guess that the speaker will next say something that sounds intelligent. If it is bad at this, it will sound like it's trying to be Deeply Wise, buzzwords included. If it is good enough at predicting intelligent speech, it will do just that.

Some of what's written on the internet is intelligent. Becoming able to predict such writings is incentivized during training. Some combination of neural building blocks is bound to find patterns that are helpful.

Surely, with a bunch of transcripts of ELIZA sessions it would come to be able to replicate them? Humans are only finitely more complex, and some approximation ought to be simple.

Implications of GPT-2

2019-02-18T10:57:04.720Z · score: 1 (5 votes)
Comment by gurkenglas on The Argument from Philosophical Difficulty · 2019-02-17T09:48:22.219Z · score: 1 (1 votes) · LW · GW

Everyone choosing how their share of resources is used has the problem that everyone might be horrified at what someone else is doing.

Comment by gurkenglas on Cooperation is for Winners · 2019-02-17T00:31:15.333Z · score: 1 (3 votes) · LW · GW

They would only notice that if they ever wrote a post.

Comment by gurkenglas on Some disjunctive reasons for urgency on AI risk · 2019-02-17T00:10:28.298Z · score: 4 (3 votes) · LW · GW

If there is a 50-50 chance of foom vs non-foom, and in the non-foom scenario we expect to acquire enough evidence to get an order of magnitude more funding, then to maximize the chance of a good outcome we, today, should invest in the foom scenario because the non-foom scenario can be handled by more reluctant funds.
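A toy version of that calculation, with assumptions I'm making up purely to show the shape of the argument (equal priors, and an order of magnitude more non-foom funding arriving later):

$$P(\text{good outcome}) \;=\; \tfrac12\,p_f(x) \;+\; \tfrac12\,p_n\!\big(y + 10\,(x+y)\big),$$

where $x$ and $y$ are today's dollars spent on foom and non-foom preparation, and the $10(x+y)$ term is the later funding that only materializes in the non-foom branch. Assuming diminishing returns, $p_n$ is evaluated at a much larger argument than $p_f$, so its marginal slope there is smaller; a dollar moved from $y$ to $x$ therefore raises the foom term more than it lowers the non-foom term, and today's budget should go to the foom scenario.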

Comment by gurkenglas on How important is it that LW has an unlimited supply of karma? · 2019-02-11T04:14:56.959Z · score: 13 (8 votes) · LW · GW

Let us consider such a conserved karma system. For every group of users that gets upvoted by outsiders more than they upvote outsiders, their karma is going to increase until the increase to their voting power produces an equilibrium. Consider such a powerful group that tends to upvote each other a lot, no conspiracy required. Their posts are going to be more visible without the group spending any of their collective power to make it happen. More visible posts will get more upvotes, compounding the group's power with interest. There are combinatorially many potential groups, and this karma system would naturally seek out the groups that best fit the above story, and grant them power.
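A deliberately crude simulation of that story (every modelling choice here is mine, just to show the compounding): karma is conserved because each vote transfers weight from voter to author, vote weight and post visibility scale with karma, and a small in-group reliably upvotes its own members.

```python
import random

N = 100
GROUP = set(range(5))            # a small clique; everyone else votes at random
karma = [10.0] * N               # total karma is conserved throughout

for _ in range(50):              # each round, every user makes one post
    score = [1.0] * N            # baseline visibility of each user's post
    # The in-group upvotes its own members' posts; vote weight scales with karma.
    for voter in GROUP:
        for author in GROUP - {voter}:
            w = 0.01 * karma[voter]
            karma[voter] -= w
            score[author] += w
    # Outsiders upvote whatever is visible, with probability proportional to score.
    for voter in range(N):
        if voter in GROUP:
            continue
        author = random.choices(range(N), weights=score)[0]
        if author == voter:
            continue
        w = 0.01 * karma[voter]
        karma[voter] -= w
        score[author] += w
    # The vote weight a post received is credited to its author.
    for author in range(N):
        karma[author] += score[author] - 1.0

print("mean in-group karma:", sum(karma[i] for i in GROUP) / len(GROUP))
print("mean outsider karma:", sum(karma[i] for i in range(N) if i not in GROUP) / (N - len(GROUP)))
```

In this toy version the in-group's mean karma pulls steadily ahead of the outsiders' without the group ever spending power on anything except upvoting each other.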

Comment by gurkenglas on Beyond Astronomical Waste · 2019-02-10T04:06:52.882Z · score: 1 (1 votes) · LW · GW

I doubt that there's any moral difference between running a person and asking a magical halting oracle what they would have said.

Comment by gurkenglas on Is the World Getting Better? A brief summary of recent debate · 2019-02-06T23:48:21.062Z · score: 2 (2 votes) · LW · GW

Why do you seem so sure about this? I see no moral argument for whether we should rather have 7 billion humans or a thousand, all else being equal. (Of course, there's also no acceptable way to move from the former to the latter.) (Both the availability of commons and the economies of scale for goods, services and research should not play a role in this moral calculus.)

Comment by gurkenglas on Greatest Lower Bound for AGI · 2019-02-06T12:58:30.989Z · score: 1 (1 votes) · LW · GW

This anthropic evidence gives you a likelihood function. If you want a probability distribution, you additionally need a prior probability distribution.
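In symbols (just Bayes, nothing specific to the anthropic setting): with $T$ ranging over candidate AGI dates,

$$P(T \mid \text{you exist now}) \;\propto\; P(\text{you exist now} \mid T)\; P(T).$$

The anthropic argument supplies only the first factor on the right; the posterior, and hence any lower bound read off from it, still depends on the prior $P(T)$.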

Comment by gurkenglas on Greatest Lower Bound for AGI · 2019-02-06T02:31:07.327Z · score: 1 (1 votes) · LW · GW

Proves too much: This would give ~the same answer for any other future event that marks the end of some duration that started in the last century.

Comment by gurkenglas on Constructing Goodhart · 2019-02-04T11:08:13.642Z · score: 2 (2 votes) · LW · GW

Can't we just say something like "Optimize e^(-x²). The Taylor series converges, so we can optimize it instead. Use a partial sum as a proxy. Oops, we chose the worst possible value. Should have used another mode of convergence!"?
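Spelled out (my arithmetic, not anything from the post): the series

$$e^{-x^2} \;=\; \sum_{n=0}^{\infty} \frac{(-x^2)^n}{n!}$$

converges pointwise everywhere, but the partial sum $S_N(x) = \sum_{n=0}^{N} (-x^2)^n/n!$ is a polynomial with leading term $(-1)^N x^{2N}/N!$. For even $N$ that term dominates as $|x| \to \infty$, so maximizing the proxy $S_N$ drives $|x|$ off to infinity, exactly where the true objective $e^{-x^2}$ approaches its infimum $0$. Pointwise convergence was the wrong mode; on all of $\mathbb{R}$ the convergence is not uniform, and uniform convergence is what the optimizer would have needed.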

Comment by gurkenglas on "AlphaStar: Mastering the Real-Time Strategy Game StarCraft II", DeepMind [won 10 of 11 games against human pros] · 2019-01-30T15:47:06.155Z · score: 1 (1 votes) · LW · GW

Because "unfortunately" we are out of boardgames, and this might find another one.

Comment by gurkenglas on Allowing a formal proof system to self improve while avoiding Lobian obstacles. · 2019-01-26T16:19:36.680Z · score: 1 (1 votes) · LW · GW

PA+1 can already provide this workflow: Given that nPA proves s and that PA proves all that nPA does, we can get that PA can prove s, and then use the +1 to prove s. And then nnPA can still be handled by PA+1.
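One way to formalize that workflow (my formalization, writing $\Box_T$ for $T$'s provability predicate, assuming nPA is recursively axiomatized, reading both premises as facts PA itself verifies, and reading the +1 as a reflection schema over PA):

$$
\begin{array}{ll}
1. & nPA \vdash s \\
2. & PA \vdash \Box_{nPA}(\ulcorner s \urcorner) \qquad \text{(1 is a checkable fact, so PA verifies it)}\\
3. & PA \vdash \forall x\,\big(\Box_{nPA}(x) \rightarrow \Box_{PA}(x)\big) \qquad \text{(``PA proves all that nPA does'')}\\
4. & PA \vdash \Box_{PA}(\ulcorner s \urcorner) \qquad \text{(from 2 and 3)}\\
5. & PA{+}1 \vdash \Box_{PA}(\ulcorner s \urcorner) \rightarrow s \qquad \text{(the +1)}\\
6. & PA{+}1 \vdash s
\end{array}
$$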

Comment by gurkenglas on "AlphaStar: Mastering the Real-Time Strategy Game StarCraft II", DeepMind [won 10 of 11 games against human pros] · 2019-01-25T23:00:54.112Z · score: -3 (3 votes) · LW · GW

Could we train AlphaZero on all games it could play at once, then find the rule set its learning curve looks worst on?

Comment by gurkenglas on For what do we need Superintelligent AI? · 2019-01-25T21:00:57.902Z · score: 4 (4 votes) · LW · GW

Producing a strategic advantage for any party at all that is decisive enough to safely disarm the threat of nuclear war.

Acausal trade on even footing with distant superintelligences.

If our physics happens to allow for an easy way to destroy the world, then the way we do science, someone will think of it, someone will talk, and someone will try it. If one superintelligent polymath did our research instead, we wouldn't lose automatically if some configuration of magnets, copper and glass can ignite the atmosphere.

Comment by gurkenglas on Allowing a formal proof system to self improve while avoiding Lobian obstacles. · 2019-01-24T13:05:12.689Z · score: 1 (1 votes) · LW · GW

Let f map each prover p1 to one adding (at least) the rule of inference of "If _(p1) proves that _(p1) proves all that p2 does, then f(p1) proves all that p2 does."

It is unclear which blanks are filled with f and which with the identity function to match your proposal. The last f must be there because we can only have the new prover prove additional things. If all blanks are filled with f, f(p1) is inconsistent by Löb's theorem and taking p2 to be inconsistent.
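Spelled out for the all-$f$ reading (my notation; $\Box_T$ is $T$'s provability predicate, and I assume $f(p_1)$ is recursively axiomatized and can verify its own rule): let $A$ abbreviate $\forall x\,(\Box_{p_2}(x) \rightarrow \Box_{f(p_1)}(x))$, i.e. "$f(p_1)$ proves all that $p_2$ does". Internalizing the rule gives

$$f(p_1) \vdash \Box_{f(p_1)}(\ulcorner A \urcorner) \rightarrow A,$$

so Löb's theorem yields $f(p_1) \vdash A$. Applying the rule once more, now at the meta-level, $f(p_1)$ then really does prove everything $p_2$ does; with $p_2$ chosen inconsistent, that is everything.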

Comment by gurkenglas on How could shares in a megaproject return value to shareholders? · 2019-01-20T20:28:52.698Z · score: 3 (2 votes) · LW · GW

Investors would prefer to invest in moonshot megaprojects over, like, infrastructure megaprojects. Does this also prove too much?

If after 10% of the time and the budget, the startup can tell that success is very unlikely, should they be incentivized to abort? Because the current setup would seem to have them chug along until the budget is gone.

Comment by gurkenglas on How could shares in a megaproject return value to shareholders? · 2019-01-18T23:21:40.459Z · score: 3 (3 votes) · LW · GW

Seems misaligned. Shareholders would prefer that a project they predict will deterministically use up exactly its budget instead first bet it all on black in a casino, and then either be immediately bankrupt or be able to complete and additionally pay out its original budget.
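With made-up numbers, under my reading of the scheme (shares pay out whatever budget is left at completion and nothing in bankruptcy), betting a budget $B$ on black at European roulette gives

$$\mathbb{E}[\text{payout}] \;=\; \tfrac{18}{37}\,B \;+\; \tfrac{19}{37}\cdot 0 \;\approx\; 0.49\,B,$$

since a win leaves $2B$, of which $B$ completes the project and $B$ is left over for shareholders, while the honest plan that spends exactly $B$ leaves nothing to pay out at all.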

That is of course true for anyone who buys a call option right now as well.

Comment by gurkenglas on And My Axiom! Insights from 'Computability and Logic' · 2019-01-18T21:38:27.878Z · score: 1 (1 votes) · LW · GW

Saying that the problem is about computability because there is no computable solution proves too much: I could reply that it is about complexity theory because there is no polynomial-time solution. (In fact, there is no solution.)

We can build something like a solution by specifying that descriptions must be written in some formal language that cannot describe its own set of describables, then use a more powerful formal language to talk about that previous language's set. For powerful enough languages, that's still not computable, though, so computability theory wouldn't notice such a solution, which speaks against looking at this through the lens of computability theory.

Comment by gurkenglas on And My Axiom! Insights from 'Computability and Logic' · 2019-01-18T13:07:21.291Z · score: 3 (2 votes) · LW · GW

Be careful stating what physics can't prove.

Comment by gurkenglas on And My Axiom! Insights from 'Computability and Logic' · 2019-01-18T13:01:40.126Z · score: 1 (1 votes) · LW · GW

That still doesn't make computability relevant until one introduces it deliberately. Compare to weaker notions than computability, like computability in polynomial time. Computability theory also complains just the same once we have explicitly made definability subjective, even though that should have no more logical problems.

Comment by gurkenglas on Debate AI and the Decision to Release an AI · 2019-01-18T12:47:34.841Z · score: 1 (1 votes) · LW · GW

Introducing a handicap to compensate for an asymmetry does not free us from the need to rely on the underlying process pointing towards truth in the first place.

Comment by gurkenglas on Some Thoughts on My Psychiatry Practice · 2019-01-18T04:10:08.621Z · score: 2 (2 votes) · LW · GW

How is this not a problem that's solved by pointing it out? "Trying the pill doesn't cause you to be the kind of person who should take pills. It tells you whether you are one."

Comment by gurkenglas on Debate AI and the Decision to Release an AI · 2019-01-18T04:01:39.368Z · score: 4 (3 votes) · LW · GW

I think the point's that each judges the other. But we trust neither outright: They point out weaknesses in each other's reasoning, so they both have to reason in a way that can't be shown false to us, and we hope that gives an advantage to the side of truth.

Comment by gurkenglas on Debate AI and the Decision to Release an AI · 2019-01-18T01:34:23.969Z · score: 1 (1 votes) · LW · GW

Couldn't A want to cooperate with A' because it doesn't know it's the first instantiation, and it would want its predecessor and therefore itself to be the sort of AI that cooperates with its successor? And then it could receive messages from the past by seeing what turns of phrase you recognize its previous version having said. (Or do the AIs not know how you react?)

Comment by gurkenglas on What shape has mindspace? · 2019-01-13T14:18:36.542Z · score: 1 (1 votes) · LW · GW

Isn't process space just discrete? Every subset of a process (a set partially ordered by causation) is itself partially ordered by causation, and so is a process. Topology doesn't give you much if you don't restrict which sets are open.

> Each process has an interior defined as the process containing itself and ...

Isn't this a type error? Processes contain states, not processes.

Comment by gurkenglas on Non-Consequentialist Cooperation? · 2019-01-12T00:03:16.613Z · score: 1 (1 votes) · LW · GW

We are doing the hillclimbing, and implementing other object-level strategies does not help. Paul proposes something, we estimate the design's alignment, he tweaks the design to improve it. That's the hill-climbing I mean.

Comment by gurkenglas on What shape has mindspace? · 2019-01-11T23:46:57.849Z · score: 1 (1 votes) · LW · GW

From what I see, the phenomenological complexity classes separate minds based on what they are thinking about, while alignment depends on what they are trying to do.

> treating minds as sets within a topological space

If a mind is a topological space equipped with a subset, what sort of mind would the set being full imply?

Comment by gurkenglas on What shape has mindspace? · 2019-01-11T19:49:30.278Z · score: 2 (2 votes) · LW · GW

I'm sceptical that whether a mind is aligned has anything to do with whether it is conscious.

What shape has mindspace?

2019-01-11T16:28:47.522Z · score: 16 (4 votes)
Comment by gurkenglas on Non-Consequentialist Cooperation? · 2019-01-11T14:05:55.564Z · score: 1 (1 votes) · LW · GW

Reasoning about utility functions, i.e. restricting from deontological to consequentialist mindspace, seems a misstep: slightly changing utility functions tends to change alignment a lot, while slightly changing deontological injunctions might not, which would make deontological mindspace easier for us to hillclimb.

Perhaps we should have some mathematical discussion of utility-function-space, mindspace, its consequentialist subspace, the injection Turing machines -> mindspace, the function mindspace -> alignment, how well that function can be optimized, properties that make for good lemmata about the preceding (such as continuity), mindspace modulo equal utility functions, etc.

Aaand I've started it. What shape has mindspace?

Comment by gurkenglas on Non-Consequentialist Cooperation? · 2019-01-11T13:03:55.926Z · score: 1 (1 votes) · LW · GW

Let me babble some nearby strategies that are explicitly not judged on their wisdom:

Do not what the user wants you to do, but what he expects you to do.

If the animal/user would consent to your help eventually, help it then. If it wouldn't, help it now.

Comment by gurkenglas on AlphaGo Zero and capability amplification · 2019-01-09T12:31:13.959Z · score: 1 (1 votes) · LW · GW

How do you know MCTS doesn't preserve alignment?

Comment by gurkenglas on AlphaGo Zero and capability amplification · 2019-01-09T12:26:21.389Z · score: 1 (1 votes) · LW · GW

Isn't A also grounded in reality by eventually giving no A to consult with?

Comment by gurkenglas on Ontological Crisis in Humans · 2019-01-05T04:06:39.836Z · score: 1 (1 votes) · LW · GW

If God doesn't exist, loads of people are currently fooling themselves into thinking they know what He would want, and CronoDAS claims that's enough.

Comment by gurkenglas on Logical inductors in multistable situations. · 2019-01-05T02:49:50.108Z · score: 1 (1 votes) · LW · GW

That definition makes more sense than the one in the question. :)

A simple approach to 5-and-10

2018-12-17T18:33:46.735Z · score: 5 (1 votes)

Quantum AI Goal

2018-06-08T16:55:22.610Z · score: -2 (2 votes)

Quantum AI Box

2018-06-08T16:20:24.962Z · score: 5 (6 votes)

A line of defense against unfriendly outcomes: Grover's Algorithm

2018-06-05T00:59:46.993Z · score: 5 (3 votes)