Gurkenglas's Shortform 2019-08-04T18:46:34.953Z · score: 5 (1 votes)
Implications of GPT-2 2019-02-18T10:57:04.720Z · score: -4 (6 votes)
What shape has mindspace? 2019-01-11T16:28:47.522Z · score: 16 (4 votes)
A simple approach to 5-and-10 2018-12-17T18:33:46.735Z · score: 5 (1 votes)
Quantum AI Goal 2018-06-08T16:55:22.610Z · score: -2 (2 votes)
Quantum AI Box 2018-06-08T16:20:24.962Z · score: 5 (6 votes)
A line of defense against unfriendly outcomes: Grover's Algorithm 2018-06-05T00:59:46.993Z · score: 5 (3 votes)


Comment by gurkenglas on Gurkenglas's Shortform · 2019-10-13T02:42:05.132Z · score: 3 (2 votes) · LW · GW

Suppose we considered simulating some human for a while to get a single response. My math heuristics are throwing up the hypothesis that proving what the response would be is morally equivalent to actually running the simulation - it's just another substrate. Thoughts? Implications? References?

Comment by gurkenglas on A simple sketch of how realism became unpopular · 2019-10-12T01:37:55.176Z · score: 7 (4 votes) · LW · GW

It's rather obvious if you've done programming or studied provability or read the sequences. The lesswrong crowd isn't a good sample for testing the strength of this trap.

Comment by gurkenglas on Sets and Functions · 2019-10-11T10:48:39.845Z · score: 2 (2 votes) · LW · GW

Then the map is just...the set of where you started from and where you ended up. That is, a and x, respectively.

This sounds like the map is {a,x}.

If you run out of steam before reaching adjunctions, I hope you can manage a post about adjunctions that assumes that you had finished all the previous posts.

You say that functions are the best maps because of those two properties, but they are simply the defining properties of a function. What makes these properties the best properties for a definition of maps to have?

Comment by gurkenglas on Thoughts on "Human-Compatible" · 2019-10-11T08:10:24.271Z · score: 1 (1 votes) · LW · GW

Oh, damn it, I mixed up the designs. Edited.

Comment by gurkenglas on Thoughts on "Human-Compatible" · 2019-10-10T23:30:47.435Z · score: 4 (2 votes) · LW · GW

Design 2̴ 1 may happen to reply "Convince the director to undecouple the AI design by telling him <convincing argument>." which could convince the operator that reads it and therefore fail as 3̴ 2 fails.

Design 2̴ 1 may also model distant superintelligences that break out of the box by predictably maximizing paperclips iff we draw a runic circle that, when printed as a plan, convinces the reader or hacks the computer.

Comment by gurkenglas on Categories: models of models · 2019-10-10T11:41:08.526Z · score: 1 (1 votes) · LW · GW

That a construction is free doesn't mean that you lose nothing. It means that if you're going to do some construction anyway, you might as well use the free one, because the free one can get to any other. (Attainable utility anyone?)

Showing that your construction is free means that all you need to show as worthwhile is constructing any category from our quiver. Adjunctions are a fine reason, though I wish we could introduce adjunctions first and then show that we need categories to get them.

Comment by gurkenglas on Categories: models of models · 2019-10-09T22:32:33.139Z · score: 2 (2 votes) · LW · GW

Math certainly has ambiguous generalizations. As the image hints, these are also studied in category theory. Usually, when you must select one, the one of interest is the least general one that holds for each of your objects of study. In the image, this is always unique. I'm guessing that's why bicentric has a name. I'll pass on the question of how often this turns out unique in general.

Comment by gurkenglas on Categories: models of models · 2019-10-09T11:58:20.681Z · score: 2 (2 votes) · LW · GW

Not every way to model reality defines identity and composition. You can start with a category-without-those G (a quiver) and end up at a category C by defining C-arrows as chains of G-arrows (the quiver's free category), but it doesn't seem necessary or a priori likely to give new insights. Can you justify this rules choice?
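A minimal sketch of the free-category construction mentioned above, assuming a small made-up quiver (the objects X, Y, Z and the arrows f, g, h are hypothetical). Arrows of the free category are chains of quiver arrows, the empty chain is the identity, and composition is concatenation, which is only defined when the chains meet at a common object.

```python
# Hypothetical quiver G: each arrow name maps to (source, target).
QUIVER = {"f": ("X", "Y"), "g": ("Y", "Z"), "h": ("Z", "X")}

def identity(obj):
    """The empty chain at obj serves as the identity arrow."""
    return (obj, (), obj)

def generator(name):
    """Lift one quiver arrow to a chain of length one."""
    src, dst = QUIVER[name]
    return (src, (name,), dst)

def compose(first, second):
    """Follow `first`, then `second`; defined only if they meet at an object."""
    src1, chain1, dst1 = first
    src2, chain2, dst2 = second
    if dst1 != src2:
        raise ValueError(f"cannot compose: {dst1} != {src2}")
    return (src1, chain1 + chain2, dst2)

f, g, h = generator("f"), generator("g"), generator("h")
round_trip = compose(compose(f, g), h)  # a chain from X back to X
```

Identity and associativity hold here only because tuple concatenation already has them; the sketch adds no structure beyond what the quiver supplies.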

Comment by gurkenglas on The sentence structure of mathematics · 2019-10-08T22:30:18.806Z · score: 1 (1 votes) · LW · GW

Categories are what we call it when each arrow remembers its source and target. When they don't, and you can compose anything, it's called a monoid. The difference is the same as between static and dynamic type systems. The more powerful your system is, the less you can prove about it, so whenever we can, we express that particular arrows can't be composed, using definitions of source and target.

Comment by gurkenglas on The sentence structure of mathematics · 2019-10-08T16:35:03.329Z · score: 1 (1 votes) · LW · GW

is a different category

You mean object.

Every category containing O and P must address this question. In the usual category of math functions, if P has only those two pairs then the source object of P is exactly {4,5}, so O and P can't be composed. In the category of relations, that is arbitrary sets of pairs between the source and target sets, O and P would compose to the empty relation between letters and countries.
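A small sketch of composition in the category of relations. The contents of O and P below are hypothetical stand-ins, since the parent comment's definitions are not reproduced here; the only property carried over is that P's inputs are exactly 4 and 5 and that O never outputs either.

```python
def compose_relations(first, second):
    """All (a, c) such that some b has (a, b) in first and (b, c) in second."""
    return {(a, c) for (a, b) in first for (b2, c) in second if b == b2}

# Hypothetical stand-ins: O relates letters to numbers, P numbers to countries.
O = {("A", 1), ("B", 3)}
P = {(4, "France"), (5, "Japan")}

# No output of O is an input of P, so the composite is the empty relation.
composite = compose_relations(O, P)

# For contrast, a hypothetical relation Q that does share a middle element.
Q = {(1, "France")}
linked = compose_relations(O, Q)
```

In the category of functions the same pair would simply not be composable, because O's target object and P's source object differ.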

Comment by gurkenglas on The sentence structure of mathematics · 2019-10-08T00:47:55.427Z · score: 1 (1 votes) · LW · GW

If your mapping contains those three pairs, then the arrow's source object contains 1, A, B and cow, and the target object contains 5, 3, cat and france. Allowing or disallowing mixed types gives two different categories. Whether an arrow mixes types is, as far as I can tell from what you mean, uniquely determined by whether its source or target object mixes types. In either case, to compose two arrows they must have a common middle object.

Comment by gurkenglas on The sentence structure of mathematics · 2019-10-07T22:19:40.449Z · score: 1 (1 votes) · LW · GW

The baggage that comes with the words noun and verb is only for guiding the search for intuition and is to be discarded when it leads to confusion.

In all your interpretations of math/programming functions, there can be different arrows between the same objects. The input/output behavior is seen as part of the arrow. The objects are merely there to establish what kinds of arrows can be strung together because one produces, say, real numbers, and the other consumes them.

Comment by gurkenglas on Troll Bridge · 2019-10-05T13:00:03.225Z · score: 1 (1 votes) · LW · GW

I started asking for a chess example because you implied that the reasoning in the top-level comment stops being sane in iterated games.

In a simple iteration of Troll bridge, whether we're dumb is clear after the first time we cross the bridge. In a simple variation, the troll requires smartness even given past observations. In either case, the best worst-case utility bound requires never to cross the bridge, and A knows crossing blows A up. You seemed to expect more.

Suppose my chess skill varies by day. If my last few moves were dumb, I shouldn't rely on my skill today. I don't see why I shouldn't deduce this ahead of time and, until I know I'm smart today, be extra careful around moves that to dumb players look extra good and are extra bad.

More concretely: Suppose that an unknown weighting of three subroutines approval-votes on my move: Timmy likes moving big pieces, Johnny likes playing good chess, and Spike tries to win in this meta. Suppose we start with move A, B or C available. A and B lead to a Johnny gambit that Timmy would ruin. Johnny thinks "If I play alone, A and B lead to 80% win probability and C to 75%. I approve exactly A and B.". Timmy gives 0, 0.2 and 1 of his maximum vote to A, B and C. Spike wants the gambit to happen iff Spike and Johnny can outvote Timmy. Spike wants to vote for A and against B. How hard Spike votes for C trades off between his test's false positive and false negative rates. If B wins, ruin is likely. Spike's reasoning seems to require those hypothetical skill updates you don't like.

Comment by gurkenglas on Troll Bridge · 2019-10-03T12:20:16.458Z · score: 1 (1 votes) · LW · GW

If I'm a poor enough player that I merely have evidence, not proof, that the queen move mates in four, then the heuristic that queen sacrifices usually don't work out is fine and I might use it in real life. If I can prove that queen sacrifices don't work out, the reasoning is fine even for a proof-requiring agent. Can you give a chesslike game where some proof-requiring agent can prove from the rules and perhaps the player source codes that queen sacrifices don't work out, and therefore scores worse than some other agent would have? (Perhaps through mechanisms as in Troll bridge.)

Comment by gurkenglas on Long-term Donation Bunching? · 2019-09-27T20:24:19.624Z · score: 2 (2 votes) · LW · GW

The charity could also do this itself, right? Take money, don't spend some of it yet so it has something to spend tomorrow.

Comment by gurkenglas on Long-term Donation Bunching? · 2019-09-27T20:22:51.411Z · score: 3 (2 votes) · LW · GW

The same reasoning also says to take out loans to bunch the donation now rather than later, to align your future self with your present self because paying off your loans is in your own best interest.

Comment by gurkenglas on Honoring Petrov Day on LessWrong, in 2019 · 2019-09-27T13:21:28.784Z · score: 1 (1 votes) · LW · GW

Participants were selected based on whether they seem unlikely to press the button, so whoever would have cared about future extortions being possible CDT-doesn't need to, because they won't be a part of it.

Comment by gurkenglas on Finding Cruxes · 2019-09-27T06:16:58.431Z · score: 2 (2 votes) · LW · GW

Having in mind that we are measuring bits of evidence tells us that to give percentages, we must establish a baseline prior probability that we would assign without reasons.

Mostly you should be fine, just have heuristics for the anomalies near 0 and 1 - if one belief pushes the probability to .5 and another to .6, then the prior was noticeably far from zero or getting only the second reason won't be noticeable either.

Comment by gurkenglas on Bíos brakhús · 2019-09-24T16:51:07.952Z · score: 1 (1 votes) · LW · GW

Have you played Factorio?

Comment by gurkenglas on Gurkenglas's Shortform · 2019-09-24T16:50:43.194Z · score: 1 (1 votes) · LW · GW

Suppose all futures end in FAI or UFAI. Suppose there were a magic button that rules out the UFAI futures if FAI was likely enough, and the FAI futures otherwise. The cutoff happens to be chosen to conserve your subjective probability of FAI. I see the button as transforming our game for the world's fate from one of luck into one of skill. Would you press it?

Comment by gurkenglas on Finding Cruxes · 2019-09-24T01:59:41.796Z · score: 2 (2 votes) · LW · GW

Yes to both. Suppose a coin has heads probability 33% and another 66%. We take a random coin and throw it three times. Afterwards, if we have seen 0, 1, 2 or 3 heads, the subjective probability of us having taken the 66% coin is 1/9, 1/3, 2/3 or 8/9. The absolute probability reduction is not the same each time we remove a reason to believe. On a log-odds scale, it is.
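The numbers above can be checked with a short sketch. It assumes the two coins have heads probabilities of exactly 1/3 and 2/3 (the comment rounds them to 33% and 66%) and that either coin is equally likely to be picked.

```python
from fractions import Fraction
from math import comb, log2

# Assumption: exact thirds rather than the rounded 33% and 66%.
P_LOW, P_HIGH = Fraction(1, 3), Fraction(2, 3)

def likelihood(p_heads, heads, throws=3):
    """Binomial probability of seeing `heads` heads in `throws` throws."""
    return comb(throws, heads) * p_heads**heads * (1 - p_heads) ** (throws - heads)

def posterior_high(heads, throws=3):
    """Probability that we hold the 2/3 coin; the equal priors cancel out."""
    high = likelihood(P_HIGH, heads, throws)
    low = likelihood(P_LOW, heads, throws)
    return high / (high + low)

posteriors = [posterior_high(k) for k in range(4)]
# Log-odds in bits for 0, 1, 2 and 3 heads.
log_odds = [log2(float(p / (1 - p))) for p in posteriors]
steps = [b - a for a, b in zip(log_odds, log_odds[1:])]
```

Here `posteriors` comes out as 1/9, 1/3, 2/3, 8/9, while every entry of `steps` is the same two bits: the probability gaps are unequal, the log-odds gaps are not.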

Comment by gurkenglas on The Power to Teach Concepts Better · 2019-09-23T17:43:42.940Z · score: 1 (1 votes) · LW · GW

I think this depends on your audience. If you want to explain to one person, look for Mind-Hangers. There's a chance they don't get it, but then you go to a replacement. This is faster than explaining from the ground up. If you use text to explain to many people, each Mind-Hanger is going to lose you that portion of your audience which isn't familiar with it. You'd have to design a conversation graph for people to follow, browsing through Mind-Hangers until they find one that fits them. This is faster for the reader than if you'd explained from the ground up, but slower for you. Reconciling the models so you can continue telling everyone the same stuff is an extra step that wouldn't be necessary in a one-on-one conversation. Keeping the models separate multiplies your work and hinders the economies of scale Web 2.0 grants us.

Comment by gurkenglas on Finding Cruxes · 2019-09-23T17:13:03.963Z · score: 5 (3 votes) · LW · GW

99% is much further from 98% than 51% from 50%. As an example, getting from a one in a million confidence that Alice killed Bob (because Alice is one of a million citizens) to ten suspects requires much more evidence than eliminating five of them. Probability differences are measured on the log-odds scale, in order to make seeing reason A, then B have the same effect as seeing B, then A. On that scale, you could in fact take two statistically independent reasons and say how many times more evidence one gives than the other.
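As a rough illustration of the log-odds scale, the sketch below measures the distance between two probabilities in bits. The function names are made up for this example, and the suspect numbers assume each remaining suspect is equally likely.

```python
from math import log2

def log_odds_bits(p):
    """Log-odds of probability p, in bits."""
    return log2(p / (1 - p))

def evidence_bits(before, after):
    """Bits of evidence needed to move a belief from `before` to `after`."""
    return log_odds_bits(after) - log_odds_bits(before)

near_certainty = evidence_bits(0.98, 0.99)   # roughly 1 bit
near_coin_flip = evidence_bits(0.50, 0.51)   # roughly 0.06 bits

# Alice as one of a million citizens, then one of ten suspects, then one of five.
million_to_ten = evidence_bits(1e-6, 1 / 10)  # roughly 17 bits
ten_to_five = evidence_bits(1 / 10, 1 / 5)    # roughly 1.2 bits
```

Because the updates are differences of logarithms, they add up the same way in any order, which is the order-independence property the comment refers to.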

Comment by gurkenglas on Finding Cruxes · 2019-09-22T13:28:14.949Z · score: 2 (2 votes) · LW · GW

If you hadn't experienced those ghost girls, what would be your confidence?

What does that 60% mean? What changes when we replace it by 50%? Can you unpack the definition of "How much of the belief is due to this reason?"?

Comment by gurkenglas on Finding Cruxes · 2019-09-21T17:55:59.256Z · score: 2 (2 votes) · LW · GW

{} is the subjective probability estimate given that reasons A, B and C are not present.

You may not want to ask for this information, but it is in fact exactly all the relevant information. If you want to extract just one percentage per reason, you should define how you do this, just so it is clear what exactly you are asking. Those percentages may then again be available through more direct questions.

Comment by gurkenglas on Finding Cruxes · 2019-09-21T10:48:38.568Z · score: 5 (4 votes) · LW · GW

I don't see an immediately obvious way to give percentages for which reasons are how responsible for a belief. What we can do is ask for each subset of reasons how likely they would find their belief if they only had that subset. Did you have some way in mind to get percentages from the following state of affairs?

A B C - reasons

{A, B, C} - 99%

{A, C} - 98%

{B, C} - 98%

{C} - 50%

{A, B} - 89%

{A} - 88%

{B} - 88%

{} - 40%
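One way to see why a single percentage per reason is not obvious from this table: the sketch below converts each entry to log-odds and lists how many bits a reason adds, depending on which other reasons are already held. This is only an illustration of the table, not a proposed answer to the question.

```python
from math import log2

# The table above, keyed by the set of reasons held.
BELIEF = {
    frozenset("ABC"): 0.99,
    frozenset("AC"): 0.98,
    frozenset("BC"): 0.98,
    frozenset("C"): 0.50,
    frozenset("AB"): 0.89,
    frozenset("A"): 0.88,
    frozenset("B"): 0.88,
    frozenset(): 0.40,
}

def bits(p):
    """Log-odds of probability p, in bits."""
    return log2(p / (1 - p))

def marginal_bits(reason):
    """Bits added by `reason`, keyed by the reasons already held ('-' for none)."""
    return {
        "".join(sorted(held)) or "-": bits(BELIEF[held | {reason}]) - bits(BELIEF[held])
        for held in BELIEF
        if reason not in held
    }

contribution_of_a = marginal_bits("A")
contribution_of_c = marginal_bits("C")
```

In this table, reason A is worth well over five bits on top of C alone but only a small fraction of a bit on top of B alone, so any single number for A has to come with a stated rule for combining these cases.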

Comment by gurkenglas on Reframing Impact · 2019-09-21T10:32:39.015Z · score: 3 (3 votes) · LW · GW

I propose to measure impact by counting bits of optimization power, as in my Oracle question contest submission. Find some distribution over plans we might use if we didn't have an AI, such as stock market trading policies. Have the AI output a program that outputs plans according to some distribution. Measure impact by computing a divergence between the two distributions, such as the maximum pointwise quotient - if no plan becomes more than twice as likely, that's no more than one bit of optimization power. Note that the AI is incentivized to prove its output's impact bound to some dumb proof checker. If the AI cuts away the unprofitable half of policies, that is more than enough to get stupid rich.
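A minimal sketch of the maximum pointwise quotient described above, assuming both distributions are given as finite dictionaries from plan to probability. The plan names and the four-policy baseline are hypothetical.

```python
from math import log2

def optimization_bits(baseline, proposed):
    """log2 of the largest factor by which any plan becomes more likely."""
    worst = 0.0
    for plan, q in proposed.items():
        if q == 0:
            continue
        p = baseline.get(plan, 0.0)
        if p == 0:
            # A plan the baseline never produces counts as unbounded impact.
            return float("inf")
        worst = max(worst, q / p)
    return log2(worst)

# Hypothetical baseline: four trading policies, equally likely without an AI.
baseline = {"p1": 0.25, "p2": 0.25, "p3": 0.25, "p4": 0.25}
# Cutting away the unprofitable half doubles the weight of the rest.
pruned = {"p1": 0.5, "p2": 0.5}

impact = optimization_bits(baseline, pruned)
```

With these inputs `impact` is one bit, matching the "no plan becomes more than twice as likely" bound in the comment.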

Comment by gurkenglas on Proving Too Much (w/ exercises) · 2019-09-15T11:12:19.300Z · score: 11 (6 votes) · LW · GW

Your Proving Too Much disproves too much: If we only allow reasoning steps that always work, we never get real-world knowledge beyond "I think, therefore I am.". Some of these reasons for belief make their belief more likely to be true, and qualitatively that's the best we can get.

Comment by gurkenglas on The Power to Understand "God" · 2019-09-13T15:55:52.243Z · score: 3 (2 votes) · LW · GW

Isn't the map/territory distinction implied by minds not being fundamental to the universe, which follows from the heavily experimentally demonstrated hypothesis that the universe runs on math?

Comment by gurkenglas on The Power to Understand "God" · 2019-09-13T09:03:36.451Z · score: 1 (1 votes) · LW · GW

If you ask her how the universe's higher purpose shapes her expectations, she might say that she expects God to think this universe to yet have an interesting story to tell, because otherwise God wouldn't bother to keep it instantiated. Therefore, she might see it as less likely that some nerds in a basement accidentally turn the world into paperclips, because that would be a stupid story.

Comment by gurkenglas on Relaxed adversarial training for inner alignment · 2019-09-11T09:11:59.208Z · score: 2 (2 votes) · LW · GW

I read up to "of this post.". Took me way too long to realize pseudo-inputs are input sets/distributions, not particular inputs. I'm guessing the argmax is supposed to be a max. Why do you split P(α(x) and C(M,x)) into P(α(x))*P(C(M,x)|α(x))?

Comment by gurkenglas on "AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence", Clune 2019 · 2019-09-10T22:58:53.733Z · score: 1 (1 votes) · LW · GW

Whatever is created by this paradigm may turn out just weak enough that repurposing the training hardware will only scale it up to competitive profitability.

Comment by gurkenglas on Is my result wrong? Maths vs intuition vs evolution in learning human preferences · 2019-09-10T18:52:47.396Z · score: 1 (1 votes) · LW · GW

Chessmasters didn't easily program chess programs; and those chess programs didn't generalise to games in general.

I'd say a more relevant analogy is whether some ML algorithm could learn to play Go teaching games against a master, by example of a master playing teaching games against a student, without knowing what Go is.

Comment by gurkenglas on Is my result wrong? Maths vs intuition vs evolution in learning human preferences · 2019-09-10T10:29:30.604Z · score: 3 (3 votes) · LW · GW

Your title seems clickbaity, since its question is answered no in the post, and this article would have been more surprising had you answered yes. (And my expectation was that if you ask that question in the title, you don't know the answer anymore.)

having implicit access to categorisation modules that themselves are valid only in typical situations... is not a way to generalise well

How do you know this? Should we turn this into one of those concrete ML experiments?

Comment by gurkenglas on Gurkenglas's Shortform · 2019-09-04T16:10:30.928Z · score: 5 (2 votes) · LW · GW

I've found a category-theoretical model of BCI-powered reddit!

Fix a set of posts. Its subsets form a category whose morphisms are inclusions that map every element to itself. Call its forgetful functor to Set f. Each BCI can measure its user, such as by producing a vector of neuron activations. Its possible measurements form a space, and these spaces form a category. (Its morphisms would translate between brains, and each morphism would keep track of how well it preserves meaning.) Call its forgetful functor to Set g.

The comma category f/g has as its objects users (each a Set-function from some set of posts they've seen to their measured reactions), and each morphism would relate the user to another brain that saw more posts and reacted similarly on what the first user saw.

The product on f/g tells you how to translate between a set of brains. A user could telepathically tell another what headspace they're in, so long as the other has ever demonstrated a corresponding experience. Note that a republican sending his love for republican posts might lead to a democrat receiving his hatred for republican posts.

The coproduct on f/g tells you how to extrapolate expected reactions between a set of brains. A user could simply put himself into a headspace and get handed a list of posts he hasn't seen for which it is expected that they would have put him into that headspace.

Comment by gurkenglas on Troll Bridge · 2019-09-04T12:05:30.123Z · score: 1 (1 votes) · LW · GW

(Maybe this doesn't answer your question?)

Correct. I am trying to pin down exactly what you mean by an agent controlling a logical statement. To that end, I ask whether an agent that takes an action iff a statement is true controls the statement through choosing whether to take the action. ("The Killing Curse doesn't crack your soul. It just takes a cracked soul to cast.")

Perhaps we could equip logic with a "causation" preorder such that all tautologies are equivalent, causation implies implication, and whenever we define an agent, we equip its control circuits with causation. Then we could say that A doesn't cross the bridge because it's not insane. (I perhaps contentiously assume that insanity and proving sanity are causally equivalent.)

If we really wanted to, we could investigate the agent that only accepts utility proofs that don't go causally backwards. (Or rather, it requires that its action provably causes the utility.)

You claimed this reasoning is unwise in chess. Can you give a simple example illustrating this?

Comment by gurkenglas on Best utility normalisation method to date? · 2019-09-03T16:51:19.343Z · score: 2 (2 votes) · LW · GW

I was aware, but addressing his objection as though it were justified, which it would be if this were the only place where the agent's preferences matter. This counterfactual is supported by my fondness for linear logic.

Comment by gurkenglas on The Transparent Society: A radical transformation that we should probably undergo · 2019-09-03T16:11:41.690Z · score: 1 (1 votes) · LW · GW

FAI is more plausible than magic to the point that we don't have to desperately try to make society transparent.

Comment by gurkenglas on The Transparent Society: A radical transformation that we should probably undergo · 2019-09-03T10:46:40.970Z · score: 6 (3 votes) · LW · GW

Your irregularly scheduled reminder that FAI solves these problems just fine.

Comment by gurkenglas on Best utility normalisation method to date? · 2019-09-02T21:34:27.491Z · score: 2 (2 votes) · LW · GW

there is zero probability of an option being chosen which is all the second choice of all parties

We might get around this by letting each agent submit not only a utility, but also the probability distribution over actions it would choose if it were dictator. If he's a maximizer, this doesn't get around that. If he's a quantilizer, this should. A desirable property would be that an agent wants to not lie about this.

Comment by gurkenglas on Best utility normalisation method to date? · 2019-09-02T20:49:24.832Z · score: 5 (3 votes) · LW · GW

Desirable properties that this may or may not have:

  • Partitioning the utilities, aggregating each component, then aggregating the results ought to not depend on the partition.
  • Any agent ought to want to submit its true utility function.

Taking the limit of introducing many copies of an indifferent utility into the mix recovers mean-max.

What happens when we use the resulting aggregated action as the new normalization pivot, and take a fixed point? The double-counting problem gets worse, but fixing it should also make this work.

If each agent can choose which action to submit to the random dictator policy, they might want to sacrifice a bit of their own utility (which they only currently want to improve their normalization position) in order to ruin other utilities (to worsen their normalization position). Two agents might cooperate by agreeing on an action they both submit.

In addition to the pivot each utility submits, we could take into account pivots selected by an aggregate of a subset of utilities. The full aggregate's pivot would agree with what the others submit (due to the convergent instrumental goal of reflective consistency). This construction might be easy to make invariant under partitioning.

Comment by gurkenglas on The Power to Demolish Bad Arguments · 2019-09-02T14:26:10.468Z · score: 7 (3 votes) · LW · GW

You saw coming that his position would be temporarily incoherent; that's why you went there. I expect Steve to be aware of this at some level, and update on how hostile the debate is. Minimize the number of times you have to prove him wrong.

Comment by gurkenglas on The Power to Demolish Bad Arguments · 2019-09-02T13:47:10.018Z · score: 8 (12 votes) · LW · GW

By telling Steve to be specific, you are trying to trick him into adopting an incoherent position. You should be trying to argue against your opponent at his strongest, in this case in the general case that he has thought the most about. If you lay out your strategy before going specific, he can choose an example that is more resilient to it. In your example, if Uber didn't exist, that job may have been a taxi driver job instead, which pays more, because there's less rent seeking stacked against you.

Comment by gurkenglas on August 2019 newsletter (popups.js demo) · 2019-09-01T19:57:23.133Z · score: 3 (2 votes) · LW · GW

Popups can become larger than the screen on mobile. Add scrolling in such cases?

Comment by gurkenglas on Decision Theory · 2019-08-31T19:57:02.004Z · score: 1 (1 votes) · LW · GW

The agent has been constructed such that Provable("5 is the best possible action") implies that 5 is the best (only!) possible action. Then by Löb's theorem, 5 is the only possible action. It cannot also be simultaneously constructed such that Provable("10 is the best possible action") implies that 10 is the only possible action, because then it would also follow that 10 is the only possible action. That's not just our proof system being inconsistent, that's false!

Comment by gurkenglas on Slider's Shortform · 2019-08-31T19:36:51.682Z · score: 1 (1 votes) · LW · GW

I mean that it's gonna be inconvenient to consciously write down all the tags that apply, as opposed to the BCI giving a cloud of 2000 relevant tags/Discord reactions. It also feels like giving names would reduce the usefulness of this from telepathy to language.

Comment by gurkenglas on Burdens · 2019-08-31T16:19:51.719Z · score: 1 (1 votes) · LW · GW

Human psychology as it was optimized for the ancestral environment has been around longer than modern society.

Comment by gurkenglas on The Very Repugnant Conclusion · 2019-08-31T14:38:43.226Z · score: 1 (1 votes) · LW · GW

The utility of the universe should not depend on the order that we assign to the population. We could say that there is a space of lives one could live, and each person covers some portion of that space, and identical people are either completely redundant or only reinforce coverage of their region, and our aim should be to cover some swath of this space.

Comment by gurkenglas on Matthew Barnett's Shortform · 2019-08-31T02:32:44.511Z · score: 5 (6 votes) · LW · GW

Mathematically, it seems like you should just give your heuristic the better data you already consciously have: If your untrustworthy senses say you aren't on the mainline, the correct move isn't necessarily to believe them, but rather to decide to put effort into figuring it out, because it's important.

It's clear how your heuristic would evolve. To embrace it correctly, you should make sure that your entire life lives in the mainline. If there's a game with negative expected value, where the worst outcome has chance 10%, and you play it 20 times, that's stupid. Budget the probability you are willing to throw away for the rest of your life now.
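The arithmetic behind the 20-round example, as a sketch that assumes the rounds are independent. The 25% lifetime budget is a hypothetical figure chosen only to show the calculation.

```python
def chance_of_worst_outcome(per_round, rounds):
    """Probability of hitting the worst outcome at least once in `rounds` plays."""
    return 1 - (1 - per_round) ** rounds

def rounds_within_budget(per_round, budget):
    """Largest number of plays that keeps the cumulative risk within `budget`."""
    if per_round <= 0:
        raise ValueError("per_round must be positive, or the count is unbounded")
    rounds = 0
    while chance_of_worst_outcome(per_round, rounds + 1) <= budget:
        rounds += 1
    return rounds

twenty_plays = chance_of_worst_outcome(0.10, 20)  # roughly 0.88
allowed = rounds_within_budget(0.10, 0.25)        # hypothetical 25% budget
```

A 10% tail that looks safely off the mainline in a single round becomes the likely outcome over 20 rounds, which is why the budget has to be set over the whole sequence.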

If you don't think you can stick to your budget, if you know that tomorrow you will always play another round of that game by the same reasoning as today, then realize that today's reasoning decides both today and tomorrow. Realize that the mainline of giving in to the heuristic is losing eventually, and let the heuristic destroy itself immediately.

Comment by gurkenglas on Matthew Barnett's Shortform · 2019-08-31T02:20:50.267Z · score: 1 (1 votes) · LW · GW

The default argument that such a development would lead to a foom is that an insight-based regular doubling of speed mathematically reaches a singularity in finite time when the speed increases pay insight dividends. You can't reach that singularity with a fleshbag in the loop (though it may be unlikely to matter if with him in the loop, you merely double every day).
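A toy model of the finite-time claim, under the assumption that each speed doubling halves the time needed to find the next one. The one-day first interval and the `shrink` parameter are illustrative, not taken from the comment.

```python
def time_for_doublings(n, first_interval=1.0, shrink=0.5):
    """Total time for n doublings when each interval is `shrink` times the last."""
    return sum(first_interval * shrink**k for k in range(n))

# Speed gains feed back into insight: the intervals form a geometric series.
self_accelerating = time_for_doublings(50)

# Human in the loop: every doubling still takes a full day.
with_overseer = time_for_doublings(50, shrink=1.0)
```

With `shrink=0.5` the total stays below two days however many doublings are counted, so unbounded speed is reached in finite time; with `shrink=1.0` the total grows without bound and there is only ordinary exponential growth.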

For certain shapes of how speed increases depend on insight and oversight, there may be a perverse incentive to cut yourself out of your loop before the other guy cuts himself out.