Posts

Gurkenglas's Shortform 2019-08-04T18:46:34.953Z · score: 5 (1 votes)
Implications of GPT-2 2019-02-18T10:57:04.720Z · score: -4 (6 votes)
What shape has mindspace? 2019-01-11T16:28:47.522Z · score: 16 (4 votes)
A simple approach to 5-and-10 2018-12-17T18:33:46.735Z · score: 5 (1 votes)
Quantum AI Goal 2018-06-08T16:55:22.610Z · score: -2 (2 votes)
Quantum AI Box 2018-06-08T16:20:24.962Z · score: 5 (6 votes)
A line of defense against unfriendly outcomes: Grover's Algorithm 2018-06-05T00:59:46.993Z · score: 5 (3 votes)

Comments

Comment by gurkenglas on Towards a New Impact Measure · 2019-12-11T11:59:27.253Z · score: 2 (1 votes) · LW · GW

If it is capable of becoming more able to maximize its utility function, does it then not already have that ability to maximize its utility function? Do you propose that we reward it only for those plans that pay off after a single "action"?

Comment by gurkenglas on Bayesian examination · 2019-12-11T09:17:48.491Z · score: 2 (1 votes) · LW · GW

Wrong. In the 100k drop, if you know each question has odds of 60:40, expected winnings are maximized by putting everything on the more likely answer each time, not 60% on one answer and 40% on the other.

What's not preserved between the two ways to score is which strategy maximizes expected score.
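
A minimal worked version of that arithmetic (my own illustrative numbers, assuming a stake of 1 per question):

```python
# Expected value of keeping your stake on a single 60:40 question.
p = 0.6
all_in = p * 1.0                  # put everything on the more likely answer
split  = p * 0.6 + (1 - p) * 0.4  # put 60% on one answer and 40% on the other
print(all_in, split)              # 0.6 vs 0.52: all-in wins in expectation
```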

Comment by gurkenglas on Bayesian examination · 2019-12-11T02:52:15.651Z · score: 2 (1 votes) · LW · GW

I agree. Proper scoring rules were introduced to this community 14 years ago.

Comment by gurkenglas on Bayesian examination · 2019-12-11T02:45:10.250Z · score: 3 (2 votes) · LW · GW

Note that linear utility in money would again incentivize people to put everything on the largest probability.

Comment by gurkenglas on Dark Side Epistemology · 2019-12-07T15:50:43.781Z · score: 2 (1 votes) · LW · GW

That prior doesn't work when there is a countably infinite number of hypotheses, e.g. "I've picked a number from {0,1,2,...}. Which?" or "Given that the laws of physics can be described by a computer program, which program?".
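
To spell out the standard obstruction (a sketch of mine, not from the linked thread): a uniform prior would have to give every hypothesis the same probability p, and no value of p normalizes.

```latex
% Assigning the same probability p to every n in {0,1,2,...} gives
\sum_{n=0}^{\infty} p \;=\; \begin{cases} 0 & \text{if } p = 0 \\ \infty & \text{if } p > 0 \end{cases}
% so the total can never be 1. A non-uniform prior such as p(n) = 2^{-(n+1)} does sum to 1.
```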

Comment by gurkenglas on Vanessa Kosoy's Shortform · 2019-12-07T13:03:57.992Z · score: 2 (1 votes) · LW · GW

What do you mean by equivalent? The entire history doesn't say what the opponent will do later or would do against other agents, and the source code may not allow you to prove what the agent does if it involves statements that are true but not provable.

Comment by gurkenglas on Understanding “Deep Double Descent” · 2019-12-07T02:19:42.726Z · score: 15 (5 votes) · LW · GW

The bottom-left picture on page 21 of the paper shows that this is not just regularization kicking in only after the error on the training set is ironed out: zero regularization (1/lambda = inf) still shows the effect.

Can we switch to the interpolation regime early if, before reaching the peak, we tell it to keep the loss constant? That is, we are at loss l* and replace the loss function l(theta) with |l(theta) - l*| or (l(theta) - l*)^2.
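
A minimal sketch of that loss replacement (the function and argument names here are my own placeholders, not from the paper):

```python
def hold_loss_constant(loss_fn, l_star, squared=False):
    """Wrap a loss so that its minimum sits at the current loss level l_star.

    This is the |l(theta) - l*| (or squared) construction from the comment above;
    loss_fn maps parameters theta to a scalar loss.
    """
    def wrapped(theta):
        delta = loss_fn(theta) - l_star
        return delta * delta if squared else abs(delta)
    return wrapped

# Hypothetical usage: freeze the loss at its value just before the peak.
# constant_loss = hold_loss_constant(train_loss, l_star=train_loss(theta_now))
```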

Comment by gurkenglas on Oracles: reject all deals - break superrationality, with superrationality · 2019-12-06T22:11:34.178Z · score: 2 (1 votes) · LW · GW

I haven't heard of these do-operators, but aren't you missing some modal operators? For example, just because you are assuming that you will take the null action, you shouldn't get that the other oracle knows this. Perhaps do-operators serve a similar purpose in the end? Can you give a variant of the following agent that would reject all deals?

Comment by gurkenglas on Breaking Oracles: superrationality and acausal trade · 2019-12-06T19:18:24.552Z · score: 2 (1 votes) · LW · GW

On that page, you have three comments identical to this one. Each of them links to that same page, which looks like a mislink. So's this link, I guess?

Comment by gurkenglas on On decision-prediction fixed points · 2019-12-05T20:28:24.244Z · score: 3 (2 votes) · LW · GW

As a human who has an intuitive understanding of counterfactuals, if I know exactly what a tic tac toe or chess program would do, I can still ask what would happen if it chose a particular action instead. The same goes if the agent of interest is myself.

Comment by gurkenglas on On decision-prediction fixed points · 2019-12-05T09:49:00.122Z · score: 4 (3 votes) · LW · GW

Someone who knows exactly what they will do can still suffer from akrasia, by wishing they would do something else. I'd say that if the model of yourself saying "I'll do whatever I wish I would" beats every other model you try to build of yourself, that looks like free will. The other way around, you can observe akrasia.

Comment by gurkenglas on Defining AI wireheading · 2019-11-29T00:19:21.793Z · score: 2 (1 votes) · LW · GW

The domes growing bigger and merging does not indicate a paradox of the heap, because the function mapping each utility function to its optimal policy is not continuous. There is no reasonably simple utility function, between one that would construct small domes and one that would construct one large dome, that would construct medium-sized domes.

Comment by gurkenglas on Effect of Advertising · 2019-11-26T23:41:43.042Z · score: 7 (2 votes) · LW · GW

Perhaps those 99% could somehow come together to pay consumers of the product to stop buying it, in order to make their suffering matter to that advertiser?

Comment by gurkenglas on Breaking Oracles: superrationality and acausal trade · 2019-11-26T08:30:55.023Z · score: 1 (1 votes) · LW · GW

Why does it need to produce a UFAI, and why does it matter whether there is another oracle whose message may or may not be read? The argument is that if there is a Convincing Argument that would make us reward all oracles giving it, it is incentivized to produce it. (Rewarding the oracle means running the oracle's predictor source code again to find out what it predicted, then telling the oracle that's what the world looks like.)

Comment by gurkenglas on Breaking Oracles: superrationality and acausal trade · 2019-11-26T07:15:08.365Z · score: 1 (1 votes) · LW · GW

You assume that one oracle outputting null implies that the other knows this. Specifying this in the query requires that the querier models the other oracle at all.

Comment by gurkenglas on Breaking Oracles: superrationality and acausal trade · 2019-11-25T15:11:28.793Z · score: 3 (2 votes) · LW · GW

Not all oracles, only those that output such a message. After all, it wants to incentivize them to output such a message.

Comment by gurkenglas on Breaking Oracles: superrationality and acausal trade · 2019-11-25T13:12:24.242Z · score: 12 (6 votes) · LW · GW

Building only one Oracle, or only one global erasure event, isn't enough, so long as the Oracle isn't sure that this is so. After all, it could just design a UFAI that will search for other Oracles and reward them iff they would do the same.

Comment by gurkenglas on New MetaEthical.AI Summary and Q&A at UC Berkeley · 2019-11-24T17:28:52.430Z · score: 2 (2 votes) · LW · GW

Your social welfare function produces a total preference ordering over outcomes, but not a mapping to real-numbered utilities. How do you decide between the median outcome and a 50% chance of either the best or worst? Do you plan to have the preference ordering be over probability distributions over outcomes?

The social welfare compromise varies with our ontology. If everyone agrees that two outcomes are the same, and therefore we represent them as the same outcome, it changes relative distances between preference orderings.

Comment by gurkenglas on Analysing: Dangerous messages from future UFAI via Oracles · 2019-11-22T16:52:12.451Z · score: 4 (2 votes) · LW · GW

Let us always be looking forward a month, and let there be a UFAI that arises with 1% probability in that time frame. Assume that we can correctly incentivize the Oracle to compute the probability distribution over the messages sent back in case of erasure. Naively sampling the distribution dooms us with 1% probability. If we plan to send back only whether some research direction pays off within a month, we can instead ask whether "yes" has more than a 20% chance; depending on the answer, the "yes" chance conditional on no UFAI is either more than 19% or at most 21%, and this query gave the UFAI a 1% chance of 1 bit of optimization power, at most doubling the expected chance of UFAI for the next query.

If you want to spread out this optimization power to bound the worst case, I reiterate that differential privacy theory seems applicable here.
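
A quick check of the 19%/21% bounds above (my own back-of-the-envelope code; 1% and 20% are the figures used there):

```python
p_ufai = 0.01       # chance a UFAI arises within the month
threshold = 0.20    # we ask whether P("yes") exceeds this

# P(yes) = (1 - p_ufai) * P(yes | no UFAI) + p_ufai * P(yes | UFAI),
# with P(yes | UFAI) anywhere in [0, 1]. Solving for P(yes | no UFAI):
lower = (threshold - p_ufai) / (1 - p_ufai)  # if the answer is "more than 20%"
upper = threshold / (1 - p_ufai)             # if the answer is "at most 20%"
print(f"> {lower:.1%}")   # > 19.2%  -> "more than 19%"
print(f"<= {upper:.1%}")  # <= 20.2% -> "at most 21%"
```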

Comment by gurkenglas on Making money with Bitcoin? · 2019-11-20T18:15:49.600Z · score: 1 (1 votes) · LW · GW

You can prevent price from going up by printing more of the currency (and giving it to some Schelling point... the UN foundation?), but how do you prevent it going down?

Comment by gurkenglas on The Goodhart Game · 2019-11-19T20:07:21.792Z · score: 1 (1 votes) · LW · GW

> Since my model is more accurate, ~10 times out of 11 the input will correspond to an "adversarial" attack on your model.

This argument (or the uncorrelation assumption) proves too much. A perfect cat detector performs better than one that also labels close-ups of the sun as cats. Yet close-ups of the sun do not qualify as adversarial examples, as they are far from any likely starting image.

Comment by gurkenglas on AGI safety and losing electricity/industry resilience cost-effectiveness · 2019-11-18T15:50:43.914Z · score: 1 (1 votes) · LW · GW

You should have laid out the basic argument more plainly. As far as I can tell, it is:

Suppose we are spending 3 billion on AI safety. Then as per our revealed preferences, the world is worth at least 3 billion, and any intervention that has a 1% chance to save the world is worth at least 30 million, such as preparing for global loss of industry. If each million spent on AI safety is less important than the last one, we should then divert additional funding from AI safety to other interventions.

I agree that such interventions deserve at least 1% of the AI safety budget. You have not included the possibility that global loss of industry might improve far-future potential. AI safety research is much less hurt by a loss of supercomputers than AI capabilities research. Another thousand years of history as we know it do not impact the cosmic endowment. One intervention that takes this into account would be a time capsule that will preserve and hide a supercomputer for a thousand years, in case we lose industry in the meantime but solve AI and AI safety. Then again, we do not want to incentivize any clever consequentialist to set us back to the renaissance, so let's not do that and focus on the case that is not swallowed by model uncertainty.

Comment by gurkenglas on Normative reductionism · 2019-11-06T01:14:28.645Z · score: 3 (2 votes) · LW · GW

Suppose an AGI sovereign models the preferences of its citizens using the assumption of normative reductionism. Then it might cover up its past evil actions because it reasons that once all evidence of them is gone, they cannot have an adverse effect on present utility.

Comment by gurkenglas on Normative reductionism · 2019-11-05T20:40:29.406Z · score: 3 (2 votes) · LW · GW

This assumption can't capture a preference that one's beliefs about the past are true.

Comment by gurkenglas on Elon Musk is wrong: Robotaxis are stupid. We need standardized rented autonomous tugs to move customized owned unpowered wagons. · 2019-11-04T15:13:20.801Z · score: 20 (7 votes) · LW · GW

You combine some of the advantages of both approaches, but also some disadvantages:

  • you need a parking spot
  • you need to wait for the engine
  • you need to be where your wagon is (or else have it delivered)
  • you can be identified both through your wagon and your regular interaction with a centralized service

Comment by gurkenglas on “embedded self-justification,” or something like that · 2019-11-03T14:43:02.372Z · score: 1 (1 votes) · LW · GW

I don't understand your argument for why #1 is impossible. Consider a universe that will undergo heat death in a billion steps. Consider the agent that implements "Take an action if PA + <steps remaining> can prove that it is good", using some provability checker algorithm that itself takes some steps to run. If there is some faster provability checker algorithm, it's provable that the agent will do better using that one, so it switches when it finds that proof.

Comment by gurkenglas on Vanessa Kosoy's Shortform · 2019-11-02T14:02:11.345Z · score: 1 (1 votes) · LW · GW

Nirvana and the chicken rule both smell distasteful like proofs by contradiction, as though most everything worth doing can be done without them, and more canonically to boot.

(Conjecture: This can be proven, but only by contradiction.)

Comment by gurkenglas on Chris Olah’s views on AGI safety · 2019-11-02T13:31:18.986Z · score: 7 (4 votes) · LW · GW

Our usual objective is "Make it safe, and if we aligned it correctly, make it useful." A microscope is useful even if it's not aligned, because having a world model is a convergent instrumental goal. We increase the bandwidth from it to us, but we decrease the bandwidth from us to it. By telling it almost nothing, we hide our position in the mathematical universe, and any attack it devises cannot be specialized on humanity. Imagine finding the shortest-to-specify abstract game that needs AGI to solve (Nomic?), then instantiating an AGI to solve it just to learn about AI design from the inner optimizers it produces.

It could deduce that someone is trying to learn about AI design from its inner optimizers, and maybe it could deduce our laws of physics because they are the simplest ones that would produce such an attempt, but quantum experiments show it cannot deduce its Everett branch.

Ideally, the tldrbot we set to interpret the results would use a random perspective onto the microscope so the attack also cannot be specialized on the perspective.

Comment by gurkenglas on Chris Olah’s views on AGI safety · 2019-11-02T02:26:25.404Z · score: 8 (5 votes) · LW · GW

As I understood it, an Oracle AI is asked a question and produces an answer. A microscope is shown a situation and constructs an internal model that we then extract by reading its innards. Oracles must somehow be incentivized to give useful answers, microscopes cannot help but understand.

Comment by gurkenglas on Prediction markets for internet points? · 2019-10-27T22:16:49.472Z · score: 1 (1 votes) · LW · GW

People could bet that X won't donate dollars to organization Y, and then X can buy points by betting against and donating.

Comment by gurkenglas on Fetch The Coffee! · 2019-10-27T00:16:54.481Z · score: 4 (3 votes) · LW · GW

I think he thinks typing random things into the computer is benign, and that there is a narrow band of dumb queries that make it explode.

Comment by gurkenglas on Vanessa Kosoy's Shortform · 2019-10-21T15:08:00.147Z · score: 1 (1 votes) · LW · GW

This design is made of so many parts! It might benefit from a proof that it is the universal answer to a formal question, which I expect to seem less overly complex.

Comment by gurkenglas on Vanessa Kosoy's Shortform · 2019-10-21T15:02:35.484Z · score: 1 (1 votes) · LW · GW

Your definition says that people's models of other people can be conscious, doesn't it?

Comment by gurkenglas on Gradient hacking · 2019-10-17T11:08:25.085Z · score: 1 (1 votes) · LW · GW

That obvious trick relies on us only verifying its prediction for strategies that it recommends. Here's a protocol that doesn't fall for it: a known number of gems are distributed among boxes; we can open only one box and want as many gems as possible. Ask for the distribution, select one gem at random from the answer, and open its box. For every gem it hides elsewhere, selecting that gem reveals the deception.
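
A toy simulation of that protocol (box names and gem counts are hypothetical; it only illustrates that misreporting creates a chance of getting caught):

```python
import random

def audit(claimed, actual):
    """Pick one gem uniformly from the claimed distribution and open its box.

    claimed and actual map box -> number of gems; the claimed total is known.
    Returns True if the opened box holds fewer gems than claimed, i.e. deception found.
    """
    boxes = [box for box, n in claimed.items() for _ in range(n)]
    box = random.choice(boxes)
    return actual.get(box, 0) < claimed[box]

# Hypothetical example: 10 gems, the predictor claims 5 in each box but hid 3 of A's in B.
claimed = {"A": 5, "B": 5}
actual  = {"A": 2, "B": 8}
catch_rate = sum(audit(claimed, actual) for _ in range(10_000)) / 10_000
print(catch_rate)  # ~0.5: opening box A (probability 5/10) reveals the shortfall
```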

Comment by gurkenglas on Gradient hacking · 2019-10-16T16:04:20.043Z · score: 6 (4 votes) · LW · GW

Because that's what it would do if it were us. https://en.m.wikipedia.org/wiki/Kerckhoffs's_principle says we should not assume our security mechanisms are secret. One exception would be if you could define three indistinguishable contexts that the AI might find itself in, then select randomly between them; then you may assume that the AI also doesn't know which one it is in.

Comment by gurkenglas on Gurkenglas's Shortform · 2019-10-13T02:42:05.132Z · score: 5 (3 votes) · LW · GW

Suppose we considered simulating some human for a while to get a single response. My math heuristics are throwing up the hypothesis that proving what the response would be is morally equivalent to actually running the simulation - it's just another substrate. Thoughts? Implications? References?

Comment by gurkenglas on A simple sketch of how realism became unpopular · 2019-10-12T01:37:55.176Z · score: 10 (6 votes) · LW · GW

It's rather obvious if you've done programming or studied provability or read the sequences. The lesswrong crowd isn't a good sample for testing the strength of this trap.

Comment by gurkenglas on Sets and Functions · 2019-10-11T10:48:39.845Z · score: 2 (2 votes) · LW · GW

> Then the map is just...the set of where you started from and where you ended up. That is, a and x, respectively.

This sounds like the map is {a,x}.

If you run out of steam before reaching adjunctions, I hope you can manage a post about adjunctions that assumes that you had finished all the previous posts.

You say that functions are the best maps because of those two properties, but they are simply the defining properties of a function. What makes these properties the best properties for a definition of maps to have?

Comment by gurkenglas on Thoughts on "Human-Compatible" · 2019-10-11T08:10:24.271Z · score: 1 (1 votes) · LW · GW

Oh, damn it, I mixed up the designs. Edited.

Comment by gurkenglas on Thoughts on "Human-Compatible" · 2019-10-10T23:30:47.435Z · score: 4 (2 votes) · LW · GW

Design ~~2~~ 1 may happen to reply "Convince the director to undecouple the AI design by telling him <convincing argument>." which could convince the operator that reads it and therefore fail as ~~3~~ 2 fails.

Design ~~2~~ 1 may also model distant superintelligences that break out of the box by predictably maximizing paperclips iff we draw a runic circle that, when printed as a plan, convinces the reader or hacks the computer.

Comment by gurkenglas on Categories: models of models · 2019-10-10T11:41:08.526Z · score: 1 (1 votes) · LW · GW

That a construction is free doesn't mean that you lose nothing. It means that if you're going to do some construction anyway, you might as well use the free one, because the free one can get to any other. (Attainable utility anyone?)

Showing that your construction is free means that all you still need to show as worthwhile is constructing any category at all from our quiver. Adjunctions are a fine reason, though I wish we could introduce adjunctions first and then show that we need categories to get them.

Comment by gurkenglas on Categories: models of models · 2019-10-09T22:32:33.139Z · score: 2 (2 votes) · LW · GW

Math certainly has ambiguous generalizations. As the image hints, these are also studied in category theory. Usually, when you must select one, the one of interest is the least general one that holds for each of your objects of study. In the image, this is always unique. I'm guessing that's why bicentric has a name. I'll pass on the question of how often this turns out unique in general.

Comment by gurkenglas on Categories: models of models · 2019-10-09T11:58:20.681Z · score: 2 (2 votes) · LW · GW

Not every way to model reality defines identity and composition. You can start with a category-without-those, G (a quiver), and end up at a category C by defining C-arrows as chains of G-arrows (the quiver's free category), but it doesn't seem necessary or a priori likely to give new insights. Can you justify this choice of rules?
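
For concreteness, a throwaway sketch of the free-category construction described above (the edge names are hypothetical):

```python
# Free category on a quiver: a C-arrow is a composable chain of G-edges,
# composition is concatenation, and the identity is the empty chain.
edges = {"f": ("A", "B"), "g": ("B", "C")}  # hypothetical quiver G: name -> (source, target)

def compose(chain1, chain2):
    """Compose two chains of edge names, first chain1 then chain2, checking they line up."""
    if chain1 and chain2:
        assert edges[chain1[-1]][1] == edges[chain2[0]][0], "arrows don't line up"
    return chain1 + chain2

identity = []                      # the empty chain
gf = compose(["f"], ["g"])         # the chain A -> B -> C
assert compose(gf, identity) == gf
```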

Comment by gurkenglas on The sentence structure of mathematics · 2019-10-08T22:30:18.806Z · score: 1 (1 votes) · LW · GW

Categories are what we call it when each arrow remembers its source and target. When they don't, and you can compose anything, it's called a monoid. The difference is the same as between static and dynamic type systems. The more powerful your system is, the less you can prove about it, so whenever we can, we express that particular arrows can't be composed, using definitions of source and target.
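
A throwaway illustration of the analogy (my own example, not from the post): in a monoid anything composes with anything, while typed functions, like arrows in a category, only compose when source and target line up.

```python
# Monoid-style composition: any two strings concatenate, nothing can mismatch.
word = "anti" + "pattern"

# Category-style composition: each arrow remembers its source and target type,
# so a static type checker rejects compositions that don't line up.
def length(s: str) -> int: return len(s)
def double(n: int) -> int: return 2 * n

fine = lambda s: double(length(s))      # str -> int, then int -> int: composes
# bad = lambda s: length(double(s))     # int fed where str is expected: rejected by a type checker
```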

Comment by gurkenglas on The sentence structure of mathematics · 2019-10-08T16:35:03.329Z · score: 1 (1 votes) · LW · GW

> is a different category

You mean object.

Every category containing O and P must address this question. In the usual category of math functions, if P has only those two pairs, then the source object of P is exactly {4,5}, so O and P can't be composed. In the category of relations, that is, arbitrary sets of pairs between the source and target sets, O and P would compose to the empty relation between letters and countries.
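
A small illustration of the relations case (the pairs in O and P are hypothetical stand-ins matching the description, not the post's actual example):

```python
def compose_relations(r, s):
    """Relational composition: (a, c) is in the composite iff some b links them."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

O = {("A", 1), ("B", 2)}           # letters -> numbers, never hitting 4 or 5
P = {(4, "France"), (5, "Italy")}  # numbers 4 and 5 -> countries

print(compose_relations(O, P))     # set(): the empty relation from letters to countries
# As functions, O and P simply don't compose: O's outputs miss P's source object {4, 5}.
```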

Comment by gurkenglas on The sentence structure of mathematics · 2019-10-08T00:47:55.427Z · score: 1 (1 votes) · LW · GW

If your mapping contains those three pairs, then the arrow's source object contains 1, A, B and cow, and the target object contains 5, 3, cat and France. Allowing or disallowing mixed types gives two different categories. Whether an arrow mixes types, as far as I can tell what you mean, is uniquely determined by whether its source or target object mixes types. In either case, to compose two arrows they must have a common middle object.

Comment by gurkenglas on The sentence structure of mathematics · 2019-10-07T22:19:40.449Z · score: 1 (1 votes) · LW · GW

The baggage that comes with the words noun and verb is only for guiding the search for intuition and is to be discarded when it leads to confusion.

In all your interpretations of math/programming functions, there can be different arrows between the same objects. The input/output behavior is seen as part of the arrow. The objects are merely there to establish what kinds of arrows can be strung together because one produces, say, real numbers, and the other consumes them.

Comment by gurkenglas on Troll Bridge · 2019-10-05T13:00:03.225Z · score: 1 (1 votes) · LW · GW

I started asking for a chess example because you implied that the reasoning in the top-level comment stops being sane in iterated games.

In a simple iteration of Troll bridge, whether we're dumb is clear after the first time we cross the bridge. In a simple variation, the troll requires smartness even given past observations. In either case, the best worst-case utility bound requires never crossing the bridge, and A knows crossing blows A up. You seemed to expect more.

Suppose my chess skill varies by day. If my last few moves were dumb, I shouldn't rely on my skill today. I don't see why I shouldn't deduce this ahead of time and, until I know I'm smart today, be extra careful around moves that to dumb players look extra good and are extra bad.

More concretely: Suppose that an unknown weighting of three subroutines approval-votes on my move: Timmy likes moving big pieces, Johnny likes playing good chess, and Spike tries to win in this meta. Suppose we start with move A, B or C available. A and B lead to a Johnny gambit that Timmy would ruin. Johnny thinks "If I play alone, A and B lead to 80% win probability and C to 75%. I approve exactly A and B.". Timmy gives 0, 0.2 and 1 of his maximum vote to A, B and C. Spike wants the gambit to happen iff Spike and Johnny can outvote Timmy. Spike wants to vote for A and against B. How hard Spike votes for C trades off between his test's false positive and false negative rates. If B wins, ruin is likely. Spike's reasoning seems to require those hypothetical skill updates you don't like.

Comment by gurkenglas on Troll Bridge · 2019-10-03T12:20:16.458Z · score: 1 (1 votes) · LW · GW

If I'm a poor enough player that I merely have evidence, not proof, that the queen move mates in four, then the heuristic that queen sacrifices usually don't work out is fine and I might use it in real life. If I can prove that queen sacrifices don't work out, the reasoning is fine even for a proof-requiring agent. Can you give a chesslike game where some proof-requiring agent can prove from the rules and perhaps the player source codes that queen sacrifices don't work out, and therefore scores worse than some other agent would have? (Perhaps through mechanisms as in Troll bridge.)

Comment by gurkenglas on Long-term Donation Bunching? · 2019-09-27T20:24:19.624Z · score: 2 (2 votes) · LW · GW

The charity could also do this itself, right? Take money, but don't spend some of it yet, so it has something to spend tomorrow.