## Posts

Would you like me to debug your math? 2021-06-11T10:54:58.018Z
Domain Theory and the Prisoner's Dilemma: FairBot 2021-05-07T07:33:41.784Z
Changing the AI race payoff matrix 2020-11-22T22:25:18.355Z
Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda 2020-09-03T18:27:05.860Z
Mapping Out Alignment 2020-08-15T01:02:31.489Z
What are some good public contribution opportunities? (100$ bounty) 2020-06-18T14:47:51.661Z
Gurkenglas's Shortform 2019-08-04T18:46:34.953Z
Implications of GPT-2 2019-02-18T10:57:04.720Z
What shape has mindspace? 2019-01-11T16:28:47.522Z
A simple approach to 5-and-10 2018-12-17T18:33:46.735Z
Quantum AI Goal 2018-06-08T16:55:22.610Z
Quantum AI Box 2018-06-08T16:20:24.962Z
A line of defense against unfriendly outcomes: Grover's Algorithm 2018-06-05T00:59:46.993Z

## Comments

Comment by Gurkenglas on Economic AI Safety · 2021-09-17T14:16:49.083Z · LW · GW

You could have a meta-recommender system that aggregates recommendations from multiple algorithms, and shows which algorithm each recommendation came from. By default, when the user reinforces a recommendation's algorithm, the meta-recommender system's algorithm would also be shifted towards the reinforced approach.

Comment by Gurkenglas on I wanted to interview Eliezer Yudkowsky but he's busy so I simulated him instead · 2021-09-16T16:31:26.678Z · LW · GW

I've previously told a GPT-3 blogger that the proper way to measure the impressiveness of GPT-3's outputs is by the KL divergence from the outputs that GPT-3 would generate on its own to the sorts of outputs that make it into blog posts. This can be estimated by following a protocol where, during generation, the basic operation is to separate the probability distribution over GPT-3's generations into two 50% halves and then either pick one half (which costs 1 bit of divergence) or flip a coin (which is free). Thus, you could pay 2 bits to generate 3 possible paragraphs and then either pick one or move back to the previous position.

Comment by Gurkenglas on MIRI/OP exchange about decision theory · 2021-08-26T08:54:05.664Z · LW · GW

Is there/Should there be a boolean table of undominated decision theories vs. enough problems to disprove any domination?
Comment by Gurkenglas on A Qualitative and Intuitive Explanation of Expected Value · 2021-08-10T10:08:12.833Z · LW · GW

Local netiquette when disagreeing is to give reasoning that can be attacked.

Comment by Gurkenglas on A Qualitative and Intuitive Explanation of Expected Value · 2021-08-10T05:54:21.910Z · LW · GW

It's misleading to prescribe maximizing expected large amounts of money: Your first $100k in the bank is a lot more important than your tenth.
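A toy calculation of that point (log utility is just one illustrative choice of concave utility, and the numbers are made up): a gamble with higher expected money can still have lower expected utility.

```python
import math

def log_utility(wealth):
    # Concave utility: each extra dollar matters less than the last.
    return math.log(wealth)

wealth = 10_000  # hypothetical starting wealth

sure_thing = log_utility(wealth + 100_000)  # guaranteed $100k
gamble = 0.5 * log_utility(wealth + 250_000) + 0.5 * log_utility(wealth)

# The gamble has higher expected money ($125k vs $100k) but lower
# expected utility: the first $100k matters more than the next $150k.
assert sure_thing > gamble
```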

Comment by Gurkenglas on #2: Neurocryopreservation vs whole-body preservation · 2021-07-28T10:15:39.041Z · LW · GW

Being trapped in an old body sucks. Extrapolating contemporary medicine forward until we can unfreeze the cryopreserved elderly and keep them from dying is not a great prospect.

Comment by Gurkenglas on Troll Bridge · 2021-07-13T07:38:44.785Z · LW · GW

Suppose the bridge is safe iff A() would decide to cross?

Comment by Gurkenglas on Troll Bridge · 2021-07-09T18:25:04.358Z · LW · GW

Suppose the bridge is safe iff there's a proof that the bridge is safe. Then you would forbid the reasoning "Suppose I cross. I must have proven it's safe. Then it's safe, and I get 10. Let's cross.", which seems sane enough in the face of Löb.

Comment by Gurkenglas on Confusions re: Higher-Level Game Theory · 2021-07-02T16:42:03.970Z · LW · GW

Why not take the bilimit of these types? , I'm guessing. The MirrorBot mirror diverges and that's fine. If you use a constructivist approach where every  comes with the expression that defined it, you can define such strategies as "Cooperate iff PA proves that the opponent cooperates against me.".

Comment by Gurkenglas on Nuclear Strategy in a Semi-Vulnerable World · 2021-06-27T10:07:29.646Z · LW · GW

Ignoring cooperation problems, variance of approaches doesn't necessarily increase variance of outcomes, like if everyone's playing minesweeper in parallel and only the first win or loss matters.

Comment by Gurkenglas on Visualizing in 5 dimensions · 2021-06-20T16:41:09.186Z · LW · GW

One trick I thought of for thinking about high-dimensional spaces is to put multiple dimensions on the same axis: Consider the vectors in R² from the origin onto the unit circle. Lengthen each into an axis, each going infinitely forward and backward, each sharing all its points of R² with one other, all of them intersecting at 0. Embed this in R³, then continuously rotate the tip of each axis into the new dimension, forming a double cone centered at 0. Rotate them further until all tips touch, forming a single axis that contains the information of two dimensions.

You can now have an axis contain the information of any R-vector space, and visualize up to 3 at a time. Of course, not all mental operations that worked in R³ still work.

Comment by Gurkenglas on The Apprentice Thread · 2021-06-17T13:57:02.782Z · LW · GW

[APPRENTICE] Advanced math or AI alignment. I'm bad at getting homework done and good at grokking things quickly, so the day-to-day should look like pair programming or tutoring.

[MENTOR] See https://www.lesswrong.com/posts/MHqwi8kzwaWD8wEQc/would-you-like-me-to-debug-your-math. The first session has the highest leverage, but if my calendar doesn't end up booked (there is one slot in the next two weeks booked out of like 50), more time per person makes sense. My specialization is pattern-matching to correctly predict where a piece of math is going if it's good. When you science that art you get applied category theory.

Comment by Gurkenglas on Experiments with a random clock · 2021-06-14T06:32:48.096Z · LW · GW

Break the minute hand off your wristwatch. Maybe some of the hour hand too.

Comment by Gurkenglas on Would you like me to debug your math? · 2021-06-12T08:31:53.820Z · LW · GW

Seems like less of a market niche, but link it!

Comment by Gurkenglas on Would you like me to debug your math? · 2021-06-11T18:16:08.048Z · LW · GW

It's not my one trick, of course, but it illustrates my usefulness. It's more maintainable not just because it is shorter but also because it has decades of theory behind it. Drawing the connection unlocks inspiration from entire branches of math. And the speedups from standing on the shoulders of giants go far beyond the constant factors from vectorized instructions.

Comment by Gurkenglas on Finite Factored Sets: Orthogonality and Time · 2021-06-11T13:41:28.517Z · LW · GW

> subpartitions

So you're doing category theory after all! :)

Comment by Gurkenglas on Would you like me to debug your math? · 2021-06-11T11:47:08.889Z · LW · GW

Sure, I'll try it. I don't expect to be an order-of-magnitude power multiplier in that case, though.

Comment by Gurkenglas on The reverse Goodhart problem · 2021-06-08T21:41:15.184Z · LW · GW

I wouldn't relate it to humans. In just about any basic machine learning setting, (train, test) has aspect 2. In fact, what you describe speaks in favor of modeling preferences using something other than utility functions, where aspect 3 is ruled out.

Re: your natural example, I would expect that as one shifts from 90% doing the best for the worst off + 10% attainable utility preservation to 100% the former, average welfare goes down.

Comment by Gurkenglas on Speculations against GPT-n writing alignment papers · 2021-06-08T12:59:00.867Z · LW · GW

Take into account that the AI that interprets need not be the same as the network being interpreted.

Why do you think that a mere autocomplete engine could not do interpretability work? It has been demonstrated to write comments for code and code for specs.

Comment by Gurkenglas on Speculations against GPT-n writing alignment papers · 2021-06-08T12:55:45.465Z · LW · GW

The error correction needs to be present in the original network because I also do some of the converting of the network into English. The only reason I don't do everything myself is that it takes too long. The proportion can be higher at the topmost levels because there are fewer tasks there. The error correction doesn't let it completely ignore what I would do at the low levels because on the 1% I can compare its outputs to mine, so they need to at least superficially look similar.

If we find that there's a bunch of redundancy, we can check whether there is any way to cut it down that would suddenly identify a bunch of mesa optimization. So the mesaoptimizer would have to take over the network entirely or trick its brethren as well as us.

Comment by Gurkenglas on Speculations against GPT-n writing alignment papers · 2021-06-07T22:57:17.811Z · LW · GW

Translating to a tree of natural language descriptions and back lets you

1. use different instances of GPT at each step, so that any coded messages would have to rely on the common training set rather than shared internals.
2. do a random subset of the tasks yourself, so that any subversion of the target system would have to be based on subtle biases, perhaps with a small number of outright lies that it hopes we don't implement manually, rather than the description being fabricated from whole cloth.

Having the thing write papers is merely an existence proof of embedded agency being irrelevant except for deconfusion.

> Intelligent agents causally responsible for your existence.

What do you mean you can think of this, I told this to you :D

Comment by Gurkenglas on [deleted post] 2021-06-07T09:27:59.796Z

Was the conversation in chapter one after this or is Tracey going back and forth between the two, as a Tracey does?

You, me, the Three Broomsticks, a bottle of fizzix and a nuclear missile?

Channelling Luna, I see.

Comment by Gurkenglas on SIA is basically just Bayesian updating on existence · 2021-06-04T17:32:39.677Z · LW · GW

If you try to model yourself as a uniformly randomly selected observer somewhere in the Solomonoff prior, that doesn't work because there isn't a uniform distribution on the infinite naturals. When you weight them by each universe's probability but allow a universe to specify the number of observers in it, you still diverge because the number goes up uncomputably fast while the probability goes only exponentially down. In the end, probabilities are what decision theories use to weight their utility expectations. Therefore I suggest we start from a definition of how much we care about what happens to each of the forever infinite copies of us throughout the multiverse. It is consistent to have 80% of the utility functions in one's aggregate say that whatever happens to the overwhelming majority of selves in universes of description length >1000, it can only vary utility by at most 0.2.
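The divergence can be illustrated with a computable stand-in (2^(2^n) observers per universe in place of an uncomputably fast count):

```python
# Weight universe n by probability 2^-n but let it host 2^(2^n) observers.
# The expected observer count contributed by universe n is
# 2^-n * 2^(2^n) = 2^(2^n - n), which itself explodes with n, so the
# total diverges and no normalized distribution over observers exists.

def term(n):
    return 2 ** (2 ** n - n)

partial_sums = [sum(term(m) for m in range(1, k + 1)) for k in range(1, 6)]
# The partial sums keep exploding instead of converging.
```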

Comment by Gurkenglas on Finite Factored Sets · 2021-06-04T16:27:12.205Z · LW · GW

Call the index set of X IX. Call the partition into empty parts indexed by S 0S. We have 0 ⊣ I ⊣ D ⊣ ⊔ ⊣ T.

None of our three adjunction strings can be extended further. Let's apply the construction that gave us histories at the other 5 ends. Niceness is implicit.
- The right construction of TS->X is the terminal S->S' with a TS'->X: The image of ⊔(TS->X).
- The left construction of X->0S is the initial S'->S with a X->0S': The image of I(X->0S).
- The left construction of B->FX is the initial X'->X with a B->FX': The image of ∨(B->FX).
- The right construction of Δ1•->S is the terminal •->• with a Δ1•->S: The image of Δ•(Δ1•->S).
- The left construction of S->Δ∅• is absurd, but can still be written as the image of Δ•(S->Δ∅•).
- The history of ∨B->X is the terminal B->B' with a ∨B'->X: Breaks the pattern! F(∨B->X) does not have the information to determine the history.

In fact, ⊔T, I0, ∨F, Δ•Δ1 and Δ•Δ∅ are all identity, only F∨ isn't.

Comment by Gurkenglas on [deleted post] 2021-06-04T10:47:42.265Z

She left her wand on the bar and she didn't immediately lose? Moody must have expected her to have another wand, given that she left her wand on the bar theatrically.

Comment by Gurkenglas on Finite Factored Sets · 2021-06-03T20:44:11.192Z · LW · GW

Let 1 be the category with one object • and one morphism. Let Δx be the constant functor to x.

A set is a family of • called elements. A set morphism S->S' has a 1-morphism between each element of S and some element of S'. The 1-morphisms •->Δ•S correspond to the set morphisms Δ∅•->S. The 1-morphisms Δ•S->• correspond to the set morphisms S->Δ1•. We have Δ∅ ⊣ Δ• ⊣ Δ1.

Let 0 be the empty category. • is the empty family. A 1-morphism has nothing to prove. There's no forgetful functor 1->0 so the buck stops here.

Comment by Gurkenglas on Finite Factored Sets · 2021-06-03T09:02:21.391Z · LW · GW

Let's try category theory.

A partition is a family of sets called parts. A partition morphism X->X' has a function from each part of X to some part of X'. It witnesses that X is finer than X'¹.

The underlying set of a partition is its disjoint union. Call the discrete partition of S DS. The functions S->⊔X correspond to the partition morphisms DS->X. Call the trivial partition of S TS. The functions ⊔X->S correspond to the partition morphisms X->TS. In terser notation, we have D ⊣ ⊔ ⊣ T.

A factorization is a family of partitions called factors. A factorization morphism B->B' has a partition morphism to each factor of B' from some factor of B.²

The underlying partition of a factorization is its common refinement. Call the trivial factorization of X FX.³ The partition morphisms X->∨B correspond to the factorization morphisms FX->B: We have F ⊣ ∨. The absence of "discrete factorizations" as a right adjoint to ∨ is where histories come from.

A history of ∨B->X is a nice⁴ B->H with a ∨H->X. The history of ∨B->X is its terminal history. Note that this also attempts to coarsen each factor. ∨B->X being weakly after ∨B->X' is witnessed by a nice ∨H->X' or equivalently H->H'. ∨B->X and ∨B->X' are orthogonal iff the pushout of B->H and B->H' is the empty factorization.

Translating "2b. Conditional Orthogonality" is taking a while (I think it's something with pushouts) so let's post this now. I'm also planning to generalize "family" to "diagram". Everyone's allowed to ask stupid questions, including basic category theory.

¹: Which includes that X might rule out some worlds.
²: Trying to avert the analogy break cost me ~60% of the time behind this comment.
⁴: Nice means that everything in sight commutes.

Comment by Gurkenglas on If You Want to Find Truth You Need to Step Into Cringe · 2021-06-01T19:59:27.207Z · LW · GW

> Unattractiveness => Cringe

Or Cringe => Unattractiveness, or they have a common cause. The people in the video may be unattractive because the author wanted to convince the viewer that weeabooism correlates with low social status. When you listen to the linked song, the reasoning it brings forward for why the viewer should consider weeaboos cringe is that they believe themselves to have higher social status than they do.

Comment by Gurkenglas on If individual performance is Pareto distributed, how should we reform education? · 2021-05-25T09:23:23.150Z · LW · GW

How do you measure performance, then? If you can only rank it, distributions mean nothing.

Comment by Gurkenglas on Don't feel bad about not knowing basic things · 2021-05-24T18:49:43.859Z · LW · GW

You can map across any monad, but not everything you can map across is a monad. Applicatives are in between.
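One way to see the gap, sketched with hypothetical `Ok`/`Err` classes modeled on Haskell's `Validation`: `fmap` needs only a functor, while an error-accumulating `ap` is a lawful applicative that no lawful monad could provide, since bind must short-circuit on the first `Err`.

```python
class Ok:
    def __init__(self, value): self.value = value

class Err:
    def __init__(self, errors): self.errors = errors

def fmap(f, v):  # functor: just map over the success case
    return Ok(f(v.value)) if isinstance(v, Ok) else v

def ap(vf, vx):  # applicative: sees both sides, so it can merge error lists
    if isinstance(vf, Err) and isinstance(vx, Err):
        return Err(vf.errors + vx.errors)
    if isinstance(vf, Err):
        return vf
    if isinstance(vx, Err):
        return vx
    return Ok(vf.value(vx.value))
```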

Comment by Gurkenglas on Don't feel bad about not knowing basic things · 2021-05-24T08:13:34.030Z · LW · GW

In the middle are stuff in between.

What an Applicative is? :)

Comment by Gurkenglas on Are PS5 scalpers actually bad? · 2021-05-18T18:13:56.353Z · LW · GW

Couldn't producers just hold an auction and have the proceeds beyond the price they're allowed by public opinion to charge go to charity?

Comment by Gurkenglas on European Soylent alternatives · 2021-05-16T10:56:45.005Z · LW · GW

Is this up to date?

Comment by Gurkenglas on How to compute the probability you are flipping a trick coin · 2021-05-15T15:23:05.486Z · LW · GW

The exponential is because updates happen on a logarithmic scale. Do you have a simple variant of the problem in mind where we don't get exponentials? When I try to construct one, I have to start from "we don't get exponentials" and calculate how the probabilities of different hypotheses would have to converge over time.
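For concreteness, a sketch of the double-headed-vs-fair case (assuming 1:1 prior odds):

```python
# Each observed head multiplies the odds by P(H|trick)/P(H|fair) = 1/0.5 = 2,
# so evidence accumulates linearly in log-odds -- which is exactly the
# exponential behavior in probability space.

def posterior_trick(n_heads, prior_odds=1.0):
    odds = prior_odds * 2 ** n_heads
    return odds / (1 + odds)

# e.g. after 10 straight heads, the fair hypothesis retains odds of only 1:1024
```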

Comment by Gurkenglas on How to compute the probability you are flipping a trick coin · 2021-05-15T06:47:43.237Z · LW · GW

2^-n is in fact the probability of a coin showing n heads. Where is the choice?

Comment by Gurkenglas on Agency in Conway’s Game of Life · 2021-05-13T06:55:43.694Z · LW · GW

I would say it all depends on whether there is a wall gadget which protects everything on one side from anything on the other side. (And don't forget the corner gadget.)

If so, cover the edges of the controlled portion in it, except for a "gate" gadget which is supposed to be a wall except openable and closable. (This is relatively easier since a width of 100 ought to be enough, and since we can stack 10000 of these in case one is broken through - rarely should chaos be able to reach through a 100x10000 rectangle.)

Wait 10^40 steps for the chaos to lose entropy. The structures that remain should be highly compressible in a CS sense, and made of a small number of natural gadget types. Send out ships that cover everything in gliders. This should resurrect the chaos temporarily, but decrease entropy further in the long run. Repeat 10^10 times, waiting 10^40 steps in between.

The rest should be a simple matter of carefully distinguishing the remaining natural gadgets with ship sensors to dismantle each. Programming a smiley face deployer that starts from an empty slate is a trivial matter.

If walls are constructible, there's no need for gates, and also one could allow a margin for error in the sensory ships: One could advance the walls after claiming some area, in case a rare encounter summons another era of chaos.

All this is less AGI than ordinary game AI - a bundle of programmed responses.

Comment by Gurkenglas on Is driving worth the risk? · 2021-05-11T07:44:03.246Z · LW · GW

I think making utility linear in years is a mistake. The remote possibility of finding a physics hack to control infinite matter in finite time does not curbstomp all other considerations, therefore the utility of that outcome is finite. I prefer 66% of BB(1000) years to 33% of BB(10000) years. I am uncertain about my preferences, but utility functions are not aggregated by taking the expectation.

The only authority on your preferences is yourself; but reasonable agents, when a hypothetical proves them dutch-bookable/incoherent, will become less certain about their preferences.

What this cashes out to is that you should calculate the value not of a year but of (an extra 1% chance of) making it to takeoff. (From what you would do in (perhaps physically impossible) hypotheticals.)
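A toy version of that preference, with made-up stand-ins for the Busy Beaver lifespans and one arbitrary choice of bounded utility:

```python
from fractions import Fraction

def bounded_utility(years):
    # One arbitrary bounded utility: saturates toward 1 for huge lifespans.
    return Fraction(years, years + 100)

small, large = 10 ** 100, 10 ** 1000  # stand-ins for BB(1000), BB(10000) years

# 66% of the smaller astronomical lifespan beats 33% of the larger one,
# even though expected *years* overwhelmingly favor the gamble.
assert Fraction(66, 100) * bounded_utility(small) > Fraction(33, 100) * bounded_utility(large)
```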

Comment by Gurkenglas on MikkW's Shortform · 2021-05-10T18:46:50.110Z · LW · GW

So a forager animal with no predators isn't free because it has to look for food?

Comment by Gurkenglas on Domain Theory and the Prisoner's Dilemma: FairBot · 2021-05-10T07:09:50.340Z · LW · GW

Yeah, enacting comes in at a higher level of interpretation than is yet considered here. The increasing levels of interpretation here are: Set theory or other math foundations; we consider sets of queries and beliefs and players with functions between them; we add partial orders and monotonicity; we specify proof engines and their properties like consistency; we define utilities, decision theories, and what makes some players better than others. (Category theory is good at keeping these separate.) I'd start talking about "enacting" when we define a decision theory like "Make the decision such that I can prove the best lower bound on utility.". What do you mean by deciding on a belief state? "Decision" is defined before I establish any causation from decisions to beliefs.

Oh, I thought you meant you didn't see why any two beliefs had an upper bound. My choice to make players monotonic comes from intuition that that's how the math is supposed to look. I'd define Query=P(Decision) as Decision->2 as well but that plainly makes no sense so I'm looking for the true posetty definition of Query, and "logical formulas" looks good so far. Switching back and forth sounds more like you want to do multiple decisions, one after the other. There's also a more grounded case to be made that your policy should become more certain as your knowledge does, do you see it?

Comment by Gurkenglas on Domain Theory and the Prisoner's Dilemma: FairBot · 2021-05-09T22:34:35.925Z · LW · GW

The belief state diagram is upward closed because I included the inconsistent belief states. We could say that a one-query player "decides to defect" if his query is proven false. Then he will only decide on both decisions when his beliefs are inconsistent. Alternatively we could have a query for each primitive decision, inducing a monotone map from P({C,D}) to queries; or we could identify players with these monotone maps.

I didn't follow the bit about being modeled as oneself. Every definition of the belief space gives us a player space, yes? And once we specify some beliefs we have a tournament to examine, an interesting one if we happen to pipe the player's outputs into their inputs through some proof engines. Define enacted.

Comment by Gurkenglas on Domain Theory and the Prisoner's Dilemma: FairBot · 2021-05-09T16:46:30.487Z · LW · GW

In intuitionistic logic, "Cooperate iff either of C or ⊥ is provable." is equivalent to "Cooperate iff (C or ⊥) is provable.". So should we only consider players of form "Cooperate iff _ is provable.", where _ is some intuitionistic formula? Well...

Comment by Gurkenglas on Why are the websites of major companies so bad at core functionality? · 2021-05-09T14:34:21.274Z · LW · GW

I suppose for messy real-world tasks, you can't define distances objectively ahead of time. You could simply check a random 10 (x,f(x)) and choose how much to pay. In an ideal world, if they think you're being unfair they can stop working for you. In this world where giving someone a job is a favor, they could go to a judge to have your judgement checked.

Though if we're talking about AIs: You could have the AI output a probability distribution g(x) over possible f(x) for each of the 100 x. Then for a random 10 x, you generate an f(x) and reward the AI according to how much probability it assigned to what you generated.
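A sketch of that payment scheme (function and variable names are illustrative, not from any particular system). Rewarding the log probability the AI assigned to your own answer on an audited subset is a proper scoring rule, so honest reporting maximizes the AI's expected reward.

```python
import math
import random

def pay_ai(g, true_f, xs, n_audited=10, rng=random):
    # Audit a random subset: generate f(x) yourself and reward the log
    # probability the AI's reported distribution g[x] assigned to it.
    audited = rng.sample(xs, min(n_audited, len(xs)))
    return sum(math.log(g[x][true_f(x)]) for x in audited)
```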

Comment by Gurkenglas on Why are the websites of major companies so bad at core functionality? · 2021-05-08T19:24:44.816Z · LW · GW

How would you goodhart this metric? To be clear, you want to map x to f(x), but this takes a second of your time. You pay them to map x to f(x), but they map x to g(x). After they're done mapping 100 x to g(x), you select a random 10 of those 100, spend 10 seconds to calculate the corresponding g(x)-f(x), and pay them more the smaller the absolute difference.

Comment by Gurkenglas on Why are the websites of major companies so bad at core functionality? · 2021-05-08T16:09:23.682Z · LW · GW

Couldn't you make them care by making their pay dependent on how well they predict what you would decide, as measured by you redoing the decision for a representative sample of tasks?

Comment by Gurkenglas on interpreting GPT: the logit lens · 2021-05-01T08:53:58.823Z · LW · GW

> floating point underflow to simulate relu

Oh that's not good. Looks like we'd need a version of float that keeps track of an interval of possible floats (by the two floats at the ends of the interval). Then we could simulate the behavior of infinite-precision floats so long as the network keeps the bounds tight, and we could train the network to keep the simulation in working order. Then we could see whether, in a network thus linear at small numbers, every visibly large effect has a visibly large cause.
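A minimal sketch of such interval tracking (hypothetical, and ignoring the outward rounding a sound implementation would need): each value carries [lo, hi] bounds, so precision loss near zero shows up as a widening interval rather than silently acting like a relu.

```python
class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # Product bounds come from the four corner products.
        corners = [self.lo * other.lo, self.lo * other.hi,
                   self.hi * other.lo, self.hi * other.hi]
        return Interval(min(corners), max(corners))

    def width(self):
        return self.hi - self.lo
```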

By the way - have you seen what happens when you finetune GPT to reinforce this pattern that you're observing, that every entry of the table, not just the top right one, predicts an input token?

Comment by Gurkenglas on interpreting GPT: the logit lens · 2021-04-29T17:16:27.870Z · LW · GW

> gelu has the same property

Actually, gelu is differentiable at 0, so it is approximately linear on close-to-zero values.

Comment by Gurkenglas on When Should the Fire Alarm Go Off: A model for optimal thresholds · 2021-04-29T06:29:19.045Z · LW · GW

> not having to pay  is effectively the same as gaining

No! If you're going to add/multiply something to your utility function for convenience, you have to do it for every action. When the building is on fire, deciding whether to turn on the sprinklers is a decision on whether to spend T and gain D, so V(TP)-V(FN) needs to be D-T.
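In toy numbers (T and D are invented), the accounting this insists on looks like:

```python
# When the building is on fire, turning on the sprinklers spends T to avert
# damage D. Credit and cost must be booked consistently across actions, so
# the utility gap between a true positive and a false negative is D - T,
# not D.
T, D = 1_000, 50_000

V_TP = -T   # fire + alarm: pay the sprinkler cost, damage averted
V_FN = -D   # fire + no alarm: eat the full damage

assert V_TP - V_FN == D - T
```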

Comment by Gurkenglas on What topics are on Dath Ilan's civics exam? · 2021-04-27T07:39:27.888Z · LW · GW

This is how you get Latin courses.

Comment by Gurkenglas on NTK/GP Models of Neural Nets Can't Learn Features · 2021-04-23T09:42:09.172Z · LW · GW

Can we therefore model fine-tuning as moving around in the parameter tangent space around the pre-trained network?

Comment by Gurkenglas on Löb's theorem simply shows that Peano arithmetic cannot prove its own soundness · 2021-04-22T12:17:37.579Z · LW · GW

We are trying to prove some statement p. When we're proving it for all pairs of real numbers x and y, the spell "Without loss of generality!" gives us the lemma "x<=y". When we're proving it for all natural numbers, the spell "Complete induction!" gives us the lemma "p holds for all smaller numbers". When we're working in PA, "Löb's Theorem!" gives us the lemma "Provable(p)".

Edit: And in general, math concepts are useful because of how they can be used. Memorize not things that are true, but things you can do. See this comment.