## Posts

Would you like me to debug your math? 2021-06-11T10:54:58.018Z
Domain Theory and the Prisoner's Dilemma: FairBot 2021-05-07T07:33:41.784Z
Changing the AI race payoff matrix 2020-11-22T22:25:18.355Z
Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda 2020-09-03T18:27:05.860Z
Mapping Out Alignment 2020-08-15T01:02:31.489Z
What are some good public contribution opportunities? (100\$ bounty) 2020-06-18T14:47:51.661Z
Gurkenglas's Shortform 2019-08-04T18:46:34.953Z
Implications of GPT-2 2019-02-18T10:57:04.720Z
What shape has mindspace? 2019-01-11T16:28:47.522Z
A simple approach to 5-and-10 2018-12-17T18:33:46.735Z
Quantum AI Goal 2018-06-08T16:55:22.610Z
Quantum AI Box 2018-06-08T16:20:24.962Z
A line of defense against unfriendly outcomes: Grover's Algorithm 2018-06-05T00:59:46.993Z

Comment by Gurkenglas on The Apprentice Thread · 2021-06-17T13:57:02.782Z · LW · GW

[APPRENTICE] Advanced math or AI alignment. I'm bad at getting homework done and good at grokking things quickly, so the day-to-day should look like pair programming or tutoring.

[MENTOR] See https://www.lesswrong.com/posts/MHqwi8kzwaWD8wEQc/would-you-like-me-to-debug-your-math. The first session has the highest leverage, but if my calendar doesn't end up booked (there is one slot in the next two weeks booked out of like 50), more time per person makes sense. My specialization is pattern-matching to correctly predict where a piece of math is going if it's good. When you science that art you get applied category theory.

Comment by Gurkenglas on Experiments with a random clock · 2021-06-14T06:32:48.096Z · LW · GW

Break the minute hand off your wristwatch. Maybe some of the hour hand too.

Comment by Gurkenglas on Would you like me to debug your math? · 2021-06-12T08:31:53.820Z · LW · GW

Seems like less of a market niche, but link it!

Comment by Gurkenglas on Would you like me to debug your math? · 2021-06-11T18:16:08.048Z · LW · GW

It's not my one trick, of course, but it illustrates my usefulness. It's more maintainable not just because it is shorter but also because it has decades of theory behind it. Drawing the connection unlocks inspiration from entire branches of math. And the speedups from standing on the shoulders of giants go far beyond the constant factors from vectorized instructions.

Comment by Gurkenglas on Finite Factored Sets: Orthogonality and Time · 2021-06-11T13:41:28.517Z · LW · GW

subpartitions

So you're doing category theory after all! :)

Comment by Gurkenglas on Would you like me to debug your math? · 2021-06-11T11:47:08.889Z · LW · GW

Sure, I'll try it. I don't expect to be an order-of-magnitude power multiplier in that case, though.

Comment by Gurkenglas on The reverse Goodhart problem · 2021-06-08T21:41:15.184Z · LW · GW

I wouldn't relate it to humans. In just about any basic machine learning setting, (train, test) has aspect 2. In fact, what you describe speaks in favor of modeling preferences using something other than utility functions, where aspect 3 is ruled out.

re your natural example, I would expect that as one shifts from 90% doing the best for the worst off + 10% attainable utility preservation to 100% the former, average welfare goes down.

Comment by Gurkenglas on Speculations against GPT-n writing alignment papers · 2021-06-08T12:59:00.867Z · LW · GW

Take into account that the AI that interprets need not be the same as the network being interpreted.

Why do you think that a mere autocomplete engine could not do interpretability work? It has been demonstrated to write comments for code and code for specs.

Comment by Gurkenglas on Speculations against GPT-n writing alignment papers · 2021-06-08T12:55:45.465Z · LW · GW

The error correction needs to be present in the original network because I also do some of the converting the network into English. The only reason I don't do everything myself is that it takes too long. The proportion can be higher at the topmost levels because there are fewer tasks there. The error correction doesn't let it completely ignore what I would do at the low levels because on the 1% I can compare its outputs to mine, so they need to at least superficially look similar.

If we find that there's a bunch of redundancy, we can check whether there is any way to cut it down that would suddenly identify a bunch of mesa optimization. So the mesaoptimizer would have to take over the network entirely or trick its brethren as well as us.

Comment by Gurkenglas on Speculations against GPT-n writing alignment papers · 2021-06-07T22:57:17.811Z · LW · GW

Translating to a tree of natural language descriptions and back lets you

1. use different instances of GPT at each step, so that any coded messages would have to rely on the common training set rather than shared internals.
2. do a random subset of the tasks yourself, so that any subversion of the target system would have to be based on subtle biases, perhaps with a small number of outright lies that it hopes we don't implement manually, rather than the description being fabricated from whole cloth.

Having the thing write papers is merely an existence proof of embedded agency being irrelevant except for deconfusion.

Intelligent agents causally responsible for your existence.

What do you mean you can think of this, I told this to you :D

Comment by Gurkenglas on Tracey Davis and the Prisoner of Azkaban - Part 2 · 2021-06-07T09:27:59.796Z · LW · GW

Was the conversation in chapter one after this or is Tracey going back and forth between the two, as a Tracey does?

You, me, the Three Broomsticks, a bottle of fizzix and a nuclear missile?

Channelling Luna, I see.

Comment by Gurkenglas on SIA is basically just Bayesian updating on existence · 2021-06-04T17:32:39.677Z · LW · GW

If you try to model yourself as a uniformly randomly selected observer somewhere in the Solomonoff prior, that doesn't work because there isn't a uniform distribution on the infinite naturals. When you weight them by each universe's probability but allow a universe to specify the number of observers in it, you still diverge because the number goes up uncomputably fast while the probability goes only exponentially down. In the end, probabilities are what decision theories use to weight their utility expectations. Therefore I suggest we start from a definition of how much we care about what happens to each of the forever infinite copies of us throughout the multiverse. It is consistent to have 80% of the utility functions in one's aggregate say that whatever happens to the overwhelming majority of selves in universes of description length >1000, it can only vary utility by at most 0.2.

Comment by Gurkenglas on Finite Factored Sets · 2021-06-04T16:27:12.205Z · LW · GW

Call the index set of X IX. Call the partition into empty parts indexed by S 0S. We have 0 ⊣ I ⊣ D ⊣ ⊔ ⊣ T.

None of our three adjunction strings can be extended further. Let's apply the construction that gave us histories at the other 5 ends. Niceness is implicit.
- The right construction of TS->X is the terminal S->S' with a TS'->X: The image of ⊔(TS->X).
- The left construction of X->0S is the initial S'->S with a X->0S': The image of I(X->0S).
- The left construction of B->FX is the initial X'->X with a B->FX': The image of ∨(B->FX).
- The right construction of Δ1•->S is the terminal •->• with a Δ1•->S: The image of Δ•(Δ1•->S).
- The left construction of S->Δ∅• is absurd, but can still be written as the image of Δ•(S->Δ∅•).
- The history of ∨B->X is the terminal B->B' with a ∨B'->X: Breaks the pattern! F(∨B->X) does not have the information to determine the history.

In fact, ⊔T, I0, ∨F, Δ•Δ1 and Δ•Δ∅ are all identity, only F∨ isn't.

Comment by Gurkenglas on Tracey Davis and the Prisoner of Azkaban - Part 1 · 2021-06-04T10:47:42.265Z · LW · GW

She left her wand on the bar and she didn't immediately lose? Moody must have expected her to have another wand, given that she left her wand on the bar theatrically.

Comment by Gurkenglas on Finite Factored Sets · 2021-06-03T20:44:11.192Z · LW · GW

Let 1 be the category with one object • and one morphism. Let Δx be the constant functor to x.

A set is a family of • called elements. A set morphism S->S' has a 1-morphism between each element of S and some element of S'. The 1-morphisms •->Δ•S correspond to the set morphisms Δ∅•->S. The 1-morphisms Δ•S->• correspond to the set morphisms S->Δ1•. We have Δ∅ ⊣ Δ• ⊣ Δ1.

Let 0 be the empty category. • is the empty family. A 1-morphism has nothing to prove. There's no forgetful functor 1->0 so the buck stops here.

Comment by Gurkenglas on Finite Factored Sets · 2021-06-03T09:02:21.391Z · LW · GW

Let's try category theory.

A partition is a family of sets called parts. A partition morphism X->X' has a function from each part of X to some part of X'. It witnesses that X is finer than X'¹.

The underlying set of a partition is its disjoint union. Call the discrete partition of S DS. The functions S->⊔X correspond to the partition morphisms DS->X. Call the trivial partition of S TS. The functions ⊔X->S correspond to the partition morphisms X->TS. In terser notation, we have D ⊣ ⊔ ⊣ T.

A factorization is a family of partitions called factors. A factorization morphism B->B' has a partition morphism to each factor of B' from some factor of B.²

The underlying partition of a factorization is its common refinement. Call the trivial factorization of X FX.³ The partition morphisms X->∨B correspond to the factorization morphisms FX->B: We have F ⊣ ∨. The absence of "discrete factorizations" as a right adjoint to ∨ is where histories come from.

A history of ∨B->X is a nice⁴ B->H with a ∨H->X. The history of ∨B->X is its terminal history. Note that this also attempts to coarsen each factor. ∨B->X being weakly after ∨B->X' is witnessed by a nice ∨H->X' or equivalently H->H'. ∨B->X and ∨B->X' are orthogonal iff the pushout of B->H and B->H' is the empty factorization.

Translating "2b. Conditional Orthogonality" is taking a while (I think it's something with pushouts) so let's post this now. I'm also planning to generalize "family" to "diagram". Everyone's allowed to ask stupid questions, including basic category theory.

¹: Which includes that X might rule out some worlds.
²: Trying to avert the analogy break cost me ~60% of the time behind this comment.
⁴: Nice means that everything in sight commutes.

Comment by Gurkenglas on If You Want to Find Truth You Need to Step Into Cringe · 2021-06-01T19:59:27.207Z · LW · GW

Unattractiveness => Cringe

Or Cringe => Unattractiveness, or they have a common cause. The people in the video may be unattractive because the author wanted to convince the viewer that weeabooism correlates with low social status. When you listen to the linked song, the reasoning it brings forward for why the viewer should consider weeaboos cringe is that they believe they have higher social status than they do.

Comment by Gurkenglas on If individual performance is Pareto distributed, how should we reform education? · 2021-05-25T09:23:23.150Z · LW · GW

How do you measure performance, then? If you can only rank it, distributions mean nothing.
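A minimal sketch of why ranks alone cannot pin down a distribution, using made-up scores (the numbers and the helper names `ranks` and `top_share` are illustrative assumptions, not from the thread): any strictly increasing rescaling keeps every rank but can change the shape arbitrarily.

```python
import math

def ranks(values):
    """Return the rank (0 = lowest) of each entry in values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result

def top_share(values):
    """Fraction of the total held by the single largest entry."""
    return max(values) / sum(values)

# Hypothetical performance scores on some arbitrary scale.
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
# A strictly increasing rescaling of the same scores.
stretched = [math.exp(2.0 * s) for s in scores]

print(ranks(scores) == ranks(stretched))        # the ordering is unchanged
print(top_share(scores), top_share(stretched))  # the "shape" is not
```

Both lists describe the same ranking, so a claim like "performance is Pareto distributed" only has content once a cardinal measurement is fixed.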

Comment by Gurkenglas on Don't feel bad about not knowing basic things · 2021-05-24T18:49:43.859Z · LW · GW

You can map across any monad, but not everything you can map across is a monad. Applicatives are in between.

Comment by Gurkenglas on Don't feel bad about not knowing basic things · 2021-05-24T08:13:34.030Z · LW · GW

In the middle are stuff in between.

What an Applicative is? :)

Comment by Gurkenglas on Are PS5 scalpers actually bad? · 2021-05-18T18:13:56.353Z · LW · GW

Couldn't producers just hold an auction and have the proceeds beyond the price they're allowed by public opinion to charge go to charity?

Comment by Gurkenglas on European Soylent alternatives · 2021-05-16T10:56:45.005Z · LW · GW

Is this up to date?

Comment by Gurkenglas on How to compute the probability you are flipping a trick coin · 2021-05-15T15:23:05.486Z · LW · GW

The exponential is because updates happen on a logarithmic scale. Do you have a simple variant of the problem in mind where we don't get exponentials? When I try to construct one, I have to start from "we don't get exponentials" and calculate how the probabilities of different hypotheses would have to converge over time.
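A minimal sketch of the logarithmic-scale point, under an assumed setup that is not spelled out in this comment: the trick coin always lands heads, the alternative is a fair coin, and `posterior_trick` is a hypothetical helper name. Each observed head adds the same constant to the log-odds, so the odds themselves grow like 2^n.

```python
import math

def posterior_trick(prior_trick: float, n_heads: int) -> float:
    """Posterior probability of the trick coin after n_heads heads in a row.

    Assumes the trick coin always shows heads and the other coin is fair.
    """
    # Prior log-odds of "trick" versus "fair".
    log_odds = math.log(prior_trick / (1.0 - prior_trick))
    # Each head has likelihood ratio 1 / 0.5 = 2, which is a constant
    # additive step of log(2) on the log-odds scale.
    log_odds += n_heads * math.log(2.0)
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)

for n in (0, 5, 10):
    print(n, posterior_trick(0.01, n))
```

The update is linear in log-odds space; the exponential only appears when converting back to odds or probabilities.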

Comment by Gurkenglas on How to compute the probability you are flipping a trick coin · 2021-05-15T06:47:43.237Z · LW · GW

2^-n is in fact the probability of a coin showing n heads. Where is the choice?

Comment by Gurkenglas on Agency in Conway’s Game of Life · 2021-05-13T06:55:43.694Z · LW · GW

I would say it all depends on whether there is a wall gadget which protects everything on one side from anything on the other side. (And don't forget the corner gadget.)

If so, cover the edges of the controlled portion in it, except for a "gate" gadget which is supposed to be a wall except openable and closable. (This is relatively easier since a width of 100 ought to be enough, and since we can stack 10000 of these in case one is broken through - rarely should chaos be able to reach through a 100x10000 rectangle.)

Wait 10^40 steps for the chaos to lose entropy. The structures that remain should be highly compressible in a CS sense, and made of a small number of natural gadget types. Send out ships that cover everything in gliders. This should resurrect the chaos temporarily, but decrease entropy further in the long run. Repeat 10^10 times, waiting 10^40 steps in between.

The rest should be a simple matter of carefully distinguishing the remaining natural gadgets with ship sensors to dismantle each. Programming a smiley face deployer that starts from an empty slate is a trivial matter.

If walls are constructible, there's no need for gates, and also one could allow a margin for error in the sensory ships: One could advance the walls after claiming some area, in case a rare encounter summons another era of chaos.

All this is less AGI than ordinary game AI - a bundle of programmed responses.

Comment by Gurkenglas on Is driving worth the risk? · 2021-05-11T07:44:03.246Z · LW · GW

I think making utility linear in years is a mistake. The remote possibility of finding a physics hack to control infinite matter in finite time does not curbstomp all other considerations, therefore the utility of that outcome is finite. I prefer 66% of BB(1000) years to 33% of BB(10000) years. I am uncertain about my preferences, but utility functions are not aggregated by taking the expectation.

The only authority on your preferences is yourself; but reasonable agents, when a hypothetical proves them dutch-bookable/incoherent, will become less certain about their preferences.

What this cashes out to is that you should calculate the value not of a year but of (an extra 1% chance of) making it to takeoff. (From what you would do in (perhaps physically impossible) hypotheticals.)

Comment by Gurkenglas on MikkW's Shortform · 2021-05-10T18:46:50.110Z · LW · GW

So a forager animal with no predators isn't free because it has to look for food?

Comment by Gurkenglas on Domain Theory and the Prisoner's Dilemma: FairBot · 2021-05-10T07:09:50.340Z · LW · GW

Yeah, enacting comes in at a higher level of interpretation than is yet considered here. The increasing levels of interpretation here are: Set theory or other math foundations; we consider sets of queries and beliefs and players with functions between them; we add porder and monotonicity; we specify proof engines and their properties like consistency; we define utilities, decision theories, and what makes some players better than others. (Category theory is good at keeping these separate.) I'd start talking about "enacting" when we define a decision theory like "Make the decision such that I can prove the best lower bound on utility.". What do you mean by deciding on a belief state? "Decision" is defined before I establish any causation from decisions to beliefs.

Oh, I thought you meant you didn't see why any two beliefs had an upper bound. My choice to make players monotonic comes from intuition that that's how the math is supposed to look. I'd define Query=P(Decision) as Decision->2 as well but that plainly makes no sense so I'm looking for the true posetty definition of Query, and "logical formulas" looks good so far. Switching back and forth sounds more like you want to do multiple decisions, one after the other. There's also a more grounded case to be made that your policy should become more certain as your knowledge does, do you see it?

Comment by Gurkenglas on Domain Theory and the Prisoner's Dilemma: FairBot · 2021-05-09T22:34:35.925Z · LW · GW

The belief state diagram is upward closed because I included the inconsistent belief states. We could say that a one-query player "decides to defect" if his query is proven false. Then he will only decide on both decisions when his beliefs are inconsistent. Alternatively we could have a query for each primitive decision, inducing a monotone map from P({C,D}) to queries; or we could identify players with these monotone maps.

I didn't follow the bit about being modeled as oneself. Every definition of the belief space gives us a player space, yes? And once we specify some beliefs we have a tournament to examine, an interesting one if we happen to pipe the player's outputs into their inputs through some proof engines. Define enacted.

Comment by Gurkenglas on Domain Theory and the Prisoner's Dilemma: FairBot · 2021-05-09T16:46:30.487Z · LW · GW

In intuitionistic logic, "Cooperate iff either of C or ⊥ is provable." is equivalent to "Cooperate iff (C or ⊥) is provable.". So should we only consider players of form "Cooperate iff _ is provable.", where _ is some intuitionistic formula? Well...

Comment by Gurkenglas on Why are the websites of major companies so bad at core functionality? · 2021-05-09T14:34:21.274Z · LW · GW

I suppose for messy real-world tasks, you can't define distances objectively ahead of time. You could simply check a random 10 (x,f(x)) and choose how much to pay. In an ideal world, if they think you're being unfair they can stop working for you. In this world where giving someone a job is a favor, they could go to a judge to have your judgement checked.

Though if we're talking about AIs: You could have the AI output a probability distribution g(x) over possible f(x) for each of the 100 x. Then for a random 10 x, you generate an f(x) and reward the AI according to how much probability it assigned to what you generated.
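A rough sketch of the probabilistic variant above. Everything named here is hypothetical (`audit_reward`, the dictionary layout of the predictions), and scoring by log probability rather than raw probability is my own substitution: the log score is a proper scoring rule, so reporting honest probabilities maximizes the expected reward.

```python
import math
import random

def audit_reward(predictions, f, sample_size=10, rng=None):
    """Average log probability the predictor gave to the audited answers.

    predictions: dict mapping each task x to a dict {outcome: probability}.
    f: the expensive ground-truth function, called only on the audited tasks.
    """
    rng = rng or random.Random()
    audited = rng.sample(sorted(predictions), sample_size)
    total = 0.0
    for x in audited:
        p = predictions[x].get(f(x), 0.0)
        # Probability 0 on the true answer earns the worst possible score.
        total += math.log(p) if p > 0.0 else float("-inf")
    return total / sample_size

truth = lambda x: x % 3
confident = {x: {x % 3: 1.0} for x in range(100)}
hedging = {x: {0: 1 / 3, 1: 1 / 3, 2: 1 / 3} for x in range(100)}
print(audit_reward(confident, truth), audit_reward(hedging, truth))
```

Only `sample_size` calls to `f` are needed, which is the point of the scheme: the evaluator's time is spent on a random tenth of the work.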

Comment by Gurkenglas on Why are the websites of major companies so bad at core functionality? · 2021-05-08T19:24:44.816Z · LW · GW

How would you goodhart this metric? To be clear, you want to map x to f(x), but this takes a second of your time. You pay them to map x to f(x), but they map x to g(x). After they're done mapping 100 x to g(x), you select a random 10 of those 100, spend 10 seconds to calculate the corresponding g(x)-f(x), and pay them more the smaller the absolute difference.
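The audit described above could be sketched like this. The function name `audit_payment` and the particular pay schedule (a base pay minus a penalty proportional to the mean absolute error) are assumptions for illustration; the comment only says to pay more the smaller the difference.

```python
import random

def audit_payment(xs, g, f, base_pay=100.0, penalty_per_unit=1.0,
                  sample_size=10, rng=None):
    """Pay for work g, judged on a random sample checked against f.

    xs: the tasks that were worked on.
    g: the worker's mapping; f: your own (expensive) mapping.
    """
    rng = rng or random.Random()
    # The worker cannot know in advance which tasks get checked.
    audited = rng.sample(list(xs), sample_size)
    mean_abs_error = sum(abs(g(x) - f(x)) for x in audited) / sample_size
    return max(0.0, base_pay - penalty_per_unit * mean_abs_error)

f = lambda x: x * x           # what you would have computed yourself
honest = lambda x: x * x      # a worker who matches you exactly
sloppy = lambda x: x * x + 5  # a worker who is always off by 5
print(audit_payment(range(100), honest, f), audit_payment(range(100), sloppy, f))
```

Because the sample is drawn after the work is done, the only way to raise expected pay is to lower the error on all 100 tasks.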

Comment by Gurkenglas on Why are the websites of major companies so bad at core functionality? · 2021-05-08T16:09:23.682Z · LW · GW

Couldn't you make them care by making their pay dependent on how well they predict what you would decide, as measured by you redoing the decision for a representative sample of tasks?

Comment by Gurkenglas on interpreting GPT: the logit lens · 2021-05-01T08:53:58.823Z · LW · GW

floating point underflow to simulate relu

Oh that's not good. Looks like we'd need a version of float that keeps track of an interval of possible floats (by the two floats at the end of the interval). Then we could simulate the behavior of infinite-precision floats so long as the network keeps the bounds tight, and we could train the network to keep the simulation in working order. Then we could see whether, in a network thus linear at small numbers, every visibly large effect has a visibly large cause.
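A toy sketch of the interval idea, assuming Python 3.9+ for `math.nextafter`. The `Interval` class is hypothetical and only covers addition and multiplication; a real version would need every operation the network uses. Each result is widened by one representable float on both sides, so the exact real-number answer stays inside the bounds.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """A closed interval [lo, hi] of floats meant to contain the exact value."""
    lo: float
    hi: float

    @staticmethod
    def exact(x: float) -> "Interval":
        return Interval(x, x)

    @staticmethod
    def _outward(lo: float, hi: float) -> "Interval":
        # Step each bound one representable float outward, so the error of
        # round-to-nearest arithmetic stays inside the interval.
        return Interval(math.nextafter(lo, -math.inf),
                        math.nextafter(hi, math.inf))

    def __add__(self, other: "Interval") -> "Interval":
        return Interval._outward(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other: "Interval") -> "Interval":
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval._outward(min(products), max(products))

    def width(self) -> float:
        return self.hi - self.lo

tiny = Interval.exact(1e-200)
squared = tiny * tiny  # plain floats underflow to exactly 0.0 here
print(squared, squared.width() > 0.0)
```

With plain floats the product silently becomes 0.0; with intervals the underflow shows up as bounds that straddle zero, which is the signal a training penalty could be attached to.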

By the way - have you seen what happens when you finetune GPT to reinforce this pattern that you're observing, that every entry of the table, not just the top right one, predicts an input token?

Comment by Gurkenglas on interpreting GPT: the logit lens · 2021-04-29T17:16:27.870Z · LW · GW

gelu has the same property

Actually, gelu is differentiable at 0, so it is linear on close-to-zero values.
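A quick numerical check of this, as a sketch using the exact erf form of GELU (implementations often use a tanh approximation instead, which is not what is computed here). Since gelu(x) = x·Φ(x), its slope at 0 is Φ(0) = 0.5 from both sides, whereas relu has slope 1 on the right and 0 on the left.

```python
import math

def gelu(x: float) -> float:
    """Exact GELU: x times the standard normal CDF at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def relu(x: float) -> float:
    return max(0.0, x)

eps = 1e-6
for name, fn in (("gelu", gelu), ("relu", relu)):
    right_slope = fn(eps) / eps
    left_slope = fn(-eps) / -eps
    print(name, right_slope, left_slope)
```

Matching one-sided slopes are what make gelu behave like the linear map x ↦ x/2 on close-to-zero values.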

Comment by Gurkenglas on When Should the Fire Alarm Go Off: A model for optimal thresholds · 2021-04-29T06:29:19.045Z · LW · GW

not having to pay  is effectively the same as gaining

No! If you're going to add/multiply something to your utility function for convenience, you have to do it for every action. When the building is on fire, deciding whether to turn on the sprinklers is a decision on whether to spend T and gain D, so V(TP)-V(FN) needs to be D-T.

Comment by Gurkenglas on What topics are on Dath Ilan's civics exam? · 2021-04-27T07:39:27.888Z · LW · GW

This is how you get Latin courses.

Comment by Gurkenglas on NTK/GP Models of Neural Nets Can't Learn Features · 2021-04-23T09:42:09.172Z · LW · GW

Can we therefore model fine-tuning as moving around in the parameter tangent space around the pre-trained network?

Comment by Gurkenglas on Löb's theorem simply shows that Peano arithmetic cannot prove its own soundness · 2021-04-22T12:17:37.579Z · LW · GW

We are trying to prove some statement p. When we're proving it for all pairs of real numbers x and y, the spell "Without loss of generality!" gives us the lemma "x<=y". When we're proving it for all natural numbers, the spell "Complete induction!" gives us the lemma "p holds for all smaller numbers". When we're working in PA, "Löb's Theorem!" gives us the lemma "Provable(p)".

Edit: And in general, math concepts are useful because of how they can be used. Memorize not things that are true, but things you can do. See this comment.

Comment by Gurkenglas on Löb's theorem simply shows that Peano arithmetic cannot prove its own soundness · 2021-04-22T12:08:16.021Z · LW · GW

Then apparently "PA can't prove its own soundness." is an even weaker true statement among the ones one might choose to remember :).

Comment by Gurkenglas on Löb's theorem simply shows that Peano arithmetic cannot prove its own soundness · 2021-04-22T11:26:35.128Z · LW · GW

Misunderstanding. I'm saying "In order to prove p, it suffices to prove □p→p.". Compare to complete induction.
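Written out side by side, as a sketch in standard notation (with □p standing for Provable(p), and P an arbitrary property of natural numbers introduced here for the comparison):

```latex
\[
\text{L\"ob's theorem:}\quad
\mathrm{PA} \vdash (\Box p \to p)
\;\Longrightarrow\;
\mathrm{PA} \vdash p
\]
\[
\text{Complete induction:}\quad
\forall n\,\bigl((\forall m < n\; P(m)) \to P(n)\bigr)
\;\Longrightarrow\;
\forall n\; P(n)
\]
```

In both cases the hypothesis lets you assume something extra while proving the goal: □p in the first, P for all smaller numbers in the second.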

Comment by Gurkenglas on Löb's theorem simply shows that Peano arithmetic cannot prove its own soundness · 2021-04-22T10:11:03.171Z · LW · GW

Yes, but 3 is a one-way street. You should remember the theorem, not the corollary.

Comment by Gurkenglas on Löb's theorem simply shows that Peano arithmetic cannot prove its own soundness · 2021-04-22T09:32:49.023Z · LW · GW

If you told me to write down "PA can't prove its own soundness.", I would not write down Löb's theorem, I would write down "¬(PA ⊢ ∀p: Provable(p)→p)".

I would translate Löb's theorem as "In PA, when proving something, you get the lemma that it is provable for free.". Compare to "When proving a property for all natural numbers, you get the lemma that it holds for all smaller natural numbers for free."

Comment by Gurkenglas on Are there opportunities for small investors unavailable to big ones? · 2021-04-20T00:44:21.846Z · LW · GW

How does price discrimination serve the public? I got the impression it's the sort of drawback of markets that gets regulated away when it gets pronounced enough, like network effects.

Comment by Gurkenglas on Superrational Agents Kelly Bet Influence! · 2021-04-17T10:47:35.994Z · LW · GW

Suppose instead of a timeline with probabilistic events, the coalition experiences the full tree of all possible futures - but we translate everything to preserve behavior. Then beliefs encode which timelines each member cares about, and bets trade influence (governance tokens) between timelines.

Comment by Gurkenglas on Why has nuclear power been a flop? · 2021-04-16T19:57:18.432Z · LW · GW

a tax on each kilowatt-hour

Wouldn't this almost precisely incentivize approving anything immediately?

Comment by Gurkenglas on Computing Natural Abstractions: Linear Approximation · 2021-04-16T09:27:23.814Z · LW · GW

Y can consist of multiple variables, and then there would always be multiple ways, right? I thought by indirect you meant that the path between X and Y was longer than 1. If some third cause is directly upstream from both, then I suppose it wouldn't be uniquely defined whether changing X changes Y, since there could be directions in which to change the cause that change some subset of X and Y.

Comment by Gurkenglas on Computing Natural Abstractions: Linear Approximation · 2021-04-15T22:48:18.757Z · LW · GW

If matrix A maps each input vector of X to a vector of which the first entry corresponds to Y, subtracting multiples of the first row from every other row to make them orthogonal to the first row, then deleting the first row, would leave a matrix whose row space is the input vectors that keep Y at 0, and whose column space is the outputs thus still reachable. If you fix some distribution on the inputs of X (such as the normal distribution with a given covariance matrix), whether this is losslessly possible should be more interesting.

Comment by Gurkenglas on Computing Natural Abstractions: Linear Approximation · 2021-04-15T21:34:05.754Z · LW · GW

With differential geometry, there's probably a way to translate properties between points. And a way to analyze the geometry of the training distribution: Train the generator to be locally injective and give it an input space uniformly distributed on the unit circle, and whether it successfully trains tells you whether the training distribution has a cycle. Try different input topologies to nail down the distribution's topology. But just like J's rank tells you the dimension of the input distribution if you just give the generator enough numbers to work with, a powerful generator ought to tell you the entire topology in one training run...

If the generator's input distribution is uniform, Σ is diagonal, and the left SVD component of J is also the left (and transposed right) SVD component of JΣJᵀ. Is that useful?

Comment by Gurkenglas on Computing Natural Abstractions: Linear Approximation · 2021-04-15T19:52:37.748Z · LW · GW

What a small world - I have been thinking up a very similar transparency tool for the past two weeks. The function f from inputs to activations-of-every-neuron isn't linear but it is differentiable, aka linear near (input space, not pixel coordinates!) an input. The Jacobian J at an input x is exactly the cross-covariance matrix between a normal distribution 𝓝(x,Σ) and its image 𝓝(f(x),JΣJᵀ), right? Then if you can permute a submatrix of JΣJᵀ into a block-diagonal matrix, you've found two modules that work with different properties of x. If the user gives you two modules, you could find an input where they work with different properties, and then vary that input in ways that change activations in one module but not the other to show the user what each module does. And by something like counting the near-zero entries in the matrix, you could differentiably measure the network's modularity, then train it to be more modular.
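A worked version of the linearization, as a sketch: it assumes f is well approximated by its first-order Taylor expansion over the bulk of the Gaussian, which is an assumption rather than something established here.

```latex
\[
X \sim \mathcal{N}(x, \Sigma), \qquad f(X) \approx f(x) + J\,(X - x)
\]
\[
\operatorname{Cov}\bigl(f(X)\bigr) \approx J \Sigma J^{\top}, \qquad
\operatorname{Cov}\bigl(f(X),\, X\bigr) \approx J \Sigma
\]
```

Under that assumption the cross-covariance comes out as JΣ, which coincides with J itself when Σ is the identity.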

Train a (GAN-)generator on the training inputs and attach it to the front of the network - now you know the input distribution is uniform, the (reciprocals of) singular values say the density of the output distribution in the direction of their singular vector, and the inputs you show the user are all in-distribution.

And I've thought this up shortly before learning terms like cross-covariance matrix, so please point out terms that describe parts of this. Or expand on it. Or run away with it, would be good to get scooped.