The Variational Characterization of KL-Divergence, Error Catastrophes, and Generalization 2021-05-20T20:57:20.118Z
The Variational Characterization of Expectation 2021-04-16T00:12:09.743Z
What is the Difference Between Cheerful Price and Shadow Price? 2021-03-28T19:08:56.681Z
Minimal Map Constraints 2021-02-21T17:49:46.651Z
Time to Count or Count to Time? 2021-02-08T16:27:23.418Z
A Toy-Model of Instrumental Abstraction 2021-01-12T17:50:29.948Z
Minimal Maps, Semi-Decisions, and Neural Representations 2020-12-06T15:15:08.052Z
How to Catalyze Cooperation in Media Sharing 2020-10-24T16:14:33.724Z
A Toy Model for Media Sharing 2020-10-17T15:59:45.652Z
KL Divergence as Code Patching Efficiency 2020-09-27T16:06:26.186Z
Sufficiently Advanced Language Models Can Do Reinforcement Learning 2020-08-02T15:32:47.894Z
Structured Tasks for Language Models 2020-07-29T14:17:59.478Z
You Can Probably Amplify GPT3 Directly 2020-07-26T21:58:53.962Z
An Old Way to Visualize Biases 2020-07-24T00:10:17.970Z
Idea: Imitation/Value Learning AIXI 2020-07-03T17:10:16.775Z
Replication Dynamics Bridge to RL in Thermodynamic Limit 2020-05-18T01:02:53.417Z
Zachary Robertson's Shortform 2020-05-06T00:42:10.113Z
What Resources on Journal Analysis are Available? 2019-12-28T20:00:11.512Z
The Planning Problem 2019-08-04T18:58:55.186Z
Is there a user's manual to using the internet more efficiently? 2019-08-04T18:51:38.818Z


Comment by Zachary Robertson (zachary-robertson) on What are some triggers that prompt you to do a Fermi estimate, or to pull up a spreadsheet and make a simple/rough quantitative model? · 2021-07-26T20:14:09.355Z · LW · GW

Whenever I want to 'optimize' something I stop and do the following 'calculation':

  1. How long does it take to do the optimization? (including this calculation)
  2. What is the effect size?
  3. Subtract one from two

I find this helps curb over-analysis, procrastination, and masturbatory optimization. Technical explanation here. There are also many relevant XKCD comics.
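The three-step checklist above can be sketched in a few lines of code (the time values below are hypothetical placeholders, not from the comment):

```python
def worth_optimizing(optimization_cost_minutes, time_saved_per_use_minutes, expected_uses):
    """Net benefit of an optimization, in minutes; positive means do it.

    Step 3 of the checklist: subtract (1) the cost of optimizing,
    including this very calculation, from (2) the effect size.
    """
    return time_saved_per_use_minutes * expected_uses - optimization_cost_minutes

# Hypothetical example: 30 minutes to script a task that saves
# 2 minutes on each of the ~10 times it will come up.
print(worth_optimizing(30, 2, 10))  # -> -10, so skip the optimization
```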

Comment by Zachary Robertson (zachary-robertson) on The Mountaineer's Fallacy · 2021-07-18T20:30:49.656Z · LW · GW

I'm sure this has a name, but I can't remember it. So I have given it a new name. The Mountaineer's Fallacy.

The Einstellung effect seems to be relevant. This refers to a person's predisposition to solve a given problem in a specific manner even though better or more appropriate methods of solving the problem exist. In particular, you can characterize the effect as approaching a problem with the wrong working hypothesis.

Question: What's a reasonable approach to get to the moon? Fallacy: We can climb things. Therefore a good start is to climb the tallest thing. Thus, working on finding or building a tall thing will eventually take us to the moon. Accordingly, a good feasibility test would be to climb Mount Everest.

Comment by Zachary Robertson (zachary-robertson) on MDP models are determined by the agent architecture and the environmental dynamics · 2021-05-31T20:59:11.912Z · LW · GW

I don't understand your point in this exchange.

Play or exercise.

I explicitly said I was going to be pedantic. It seems like a useful/necessary role to play if you, a domain expert, were confused and then switched your viewpoint. This is usually where being formal becomes useful. First, it uncovers potentially subtle hidden assumptions. Second, it may offer a general result. Third, it protects the reader (me) from 'catching' your confusion by constraining communication to just things that can be independently verified.

Having said that,

You used the word 'model' in both of your prior comments, and so the search-replace yields "state-abstraction-irrelevant abstractions." Presumably not what you meant?

This does not come off as friendly. I asked you to search for 'model-irrelevant' which is distinct from 'model'. It's just a type of state-abstraction.

That's not a "concrete difference."

I claim there is an additional alternative. Two does not equal three. Just because you don't understand something doesn't mean it's not concrete.

I suppose those comments are part of the natural breakdown of civility at the end of an internet exchange, and I'm probably no better myself. Anyway, I certainly hope you figure out your confusion, although I see it's a far stretch that my commentary is going to help you :)

Comment by Zachary Robertson (zachary-robertson) on MDP models are determined by the agent architecture and the environmental dynamics · 2021-05-30T13:21:16.905Z · LW · GW

I don’t think it’s a good use of time to get into this if you weren’t being specific about your usage of ‘model’ or the claim you made previously, because I already pointed out a concrete difference: I claim it’s reasonable to say there are three alternatives, while you claim there are two.

(If it helps you, you can search-replace model-irrelevant to state-abstraction because I don’t use the term model in my previous reply anyway.)

Comment by Zachary Robertson (zachary-robertson) on MDP models are determined by the agent architecture and the environmental dynamics · 2021-05-29T18:02:59.714Z · LW · GW

This was why I gave a precise definition of model-irrelevance. I'll step through your points using the definition:

  1. Consider the underlying environment (assumed Markovian)
  2. Consider different state/action encodings (model-irrelevant abstractions) we might supply the agent.
  3. For each, fix a reward function distribution
  4. See what the theory predicts

The problem I'm trying to highlight lies in point three. Each task is a reward function you could have the agent attempt to optimize. Every abstraction/encoding fixes a set of rewards under which the abstraction is model-irrelevant. This means the agent can successfully optimize these rewards.

[I]f you say "the MDP has a different model", you're either disagreeing with (1) the actual dynamics, or claiming that we will physically supply the agent with a different state/action encoding (2).

My claim is that there is a third alternative: you may claim that the reward function given to the agent does not satisfy model-irrelevance. This can be the case even if the underlying dynamics are Markovian and the abstraction of the transitions satisfies model-irrelevance.

I don't follow. Can you give a concrete example?

That may take a while. The argument above is a reasonable candidate for a lemma. A useful example would show that the third alternative exists. Do you agree this is the crux of your disagreement with my objection? If so, I might try to formalize it.

Comment by Zachary Robertson (zachary-robertson) on MDP models are determined by the agent architecture and the environmental dynamics · 2021-05-29T00:12:33.511Z · LW · GW

I still see room for reasonable objection.

An MDP model (technically, a rewardless MDP) is a tuple

I need to be pedantic. The equivocation here is where I think the problem is. To assign a reward function we need a map from the state-action space to the reals. It's not enough to just consider a 'rewardless MDP'.

When we define state and action encodings, this implicitly defines an "interface" between the agent and the environment.

As you note, the choice of state-action encoding is an implicit modeling assumption. It could be wrong, but to even discuss that we do have to be technical. To be concrete, perhaps we agree that there’s some underlying dynamics that is Markovian. The moment we give the agent sensors we create our state abstraction for the MDP. Moreover, say we agree that our state abstraction needs to be model-irrelevant: given a 'true' MDP and a state abstraction that operates on it, the abstraction is model-irrelevant if states it maps together have matching rewards and matching abstract transition probabilities. Strictly speaking, model-irrelevance is at least as hard to satisfy for a collection of MDPs as for a single MDP. In other words, we may be able to properly model a single task with an MDP, but a priori there should be skepticism that all tasks can be modeled with a specific state abstraction. Later on you seem to agree with this conclusion:

That's also a claim that we can, in theory, specify reward functions which distinguish between 5 googolplex variants of red-ghost-game-over. If that were true, then yes - optimal policies really would tend to "die" immediately, since they'd have so many choices.

Specifically, the agent architecture is an implicit constraint on the available reward functions. I'd suspect this does generalize into a fragility/impossibility result any time the reward is given to the agent in a way that's decoupled from the agent's sensors, which is really going to be the prominent case in practice. In conclusion, you can try to work with a variable/rewardless MDP, but then this argument will apply and severely limit the usefulness of the generic theoretical analysis.
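For reference, a standard statement of model-irrelevance from the state-abstraction literature (the notation is mine; this is the textbook definition, which may differ in detail from the one given in the comment above):

```latex
% Standard definition: an abstraction is model-irrelevant when states it
% merges are indistinguishable in reward and abstract transition behavior.
\textbf{Definition.} Let $M = (S, A, T, R)$ be an MDP and
$\phi : S \to \bar{S}$ a state abstraction. $\phi$ is
\emph{model-irrelevant} if for all $s_1, s_2 \in S$ with
$\phi(s_1) = \phi(s_2)$, all $a \in A$, and all $\bar{s}' \in \bar{S}$,
\[
  R(s_1, a) = R(s_2, a)
  \quad\text{and}\quad
  \sum_{s' \in \phi^{-1}(\bar{s}')} T(s' \mid s_1, a)
  = \sum_{s' \in \phi^{-1}(\bar{s}')} T(s' \mid s_2, a).
\]
```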

Comment by Zachary Robertson (zachary-robertson) on The Variational Characterization of KL-Divergence, Error Catastrophes, and Generalization · 2021-05-22T23:15:52.850Z · LW · GW

Because . They are the same. Does that help?

Comment by Zachary Robertson (zachary-robertson) on SGD's Bias · 2021-05-22T14:03:25.754Z · LW · GW

I’m assuming we can indeed box the bias as “drift from high noise to low noise”. I wonder if flat minima necessarily have lower noise, under empirical approximation, than sharp minima. If that were the case then you could use this to conclude that SGD does bias towards generalizable minima.

I’d look at this, but I figure you understand the SGD framework better and may have an idea about this?

Comment by Zachary Robertson (zachary-robertson) on The Variational Characterization of KL-Divergence, Error Catastrophes, and Generalization · 2021-05-21T14:23:42.924Z · LW · GW

The term is meant to be a posterior distribution after seeing data. If you have a good prior you could take . However, note could be high. You want to trade off the cost of updating the prior against the loss reduction.

For example, say we have a neural network. Then our prior would be the initialization and the posterior would be the distribution of outputs from SGD.
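The trade-off described here is the usual PAC-Bayes one; a representative form of the bound (the standard statement, not quoted from the post) makes the two competing terms explicit:

```latex
% With probability at least 1 - \delta over an i.i.d. sample of size n,
% simultaneously for all posteriors Q over hypotheses h:
\[
  \mathbb{E}_{h \sim Q}\!\left[ L(h) \right]
  \;\le\;
  \mathbb{E}_{h \sim Q}\!\left[ \hat{L}(h) \right]
  + \sqrt{ \frac{ \mathrm{KL}(Q \,\|\, P) + \ln\!\left( 2\sqrt{n}/\delta \right) }{ 2n } }.
\]
% Moving Q far from the prior P to reduce empirical loss \hat{L} is paid
% for by the KL term: the cost of updating the prior.
```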

(Btw thanks for the correction)

Comment by Zachary Robertson (zachary-robertson) on Open and Welcome Thread - May 2021 · 2021-05-06T14:11:58.548Z · LW · GW

They don't speak about having a PhD but the ability to get into a top 5 graduate program.

Yes they do. On the same page,

The first step on this path is usually to pursue a PhD in machine learning at a good school. It’s possible to enter without a PhD, but it’s close to a requirement in research roles at the academic centres and DeepMind, which represent a large fraction of the best positions.

Certainly there’s a bottleneck on ‘good’ schools also, but then we can strengthen the claim using what they say later about ‘top’ schools being a proxy for success.

Comment by Zachary Robertson (zachary-robertson) on Open and Welcome Thread - May 2021 · 2021-05-06T13:46:09.388Z · LW · GW

They do say that a PhD from a top 5 program is a reasonable proxy for an AI research center. These are supply limited. Therefore, they are implying that top PhDs are a bottleneck. This is far upstream of everything else so it does seem that a top PhD is a reasonable proxy for the bottleneck.

Comment by Zachary Robertson (zachary-robertson) on NTK/GP Models of Neural Nets Can't Learn Features · 2021-04-24T15:48:20.571Z · LW · GW

I think that 'universal function approximation' and 'feature learning' are basically unrelated dimensions along which a learning algorithm can vary.

We may have reached the crux here. Say you take a time series and extract the Fourier features. By universal approximation, these features will be sufficient for any downstream learning task. So the two are related. I agree that there is no learning taking place and that such a method may be inefficient. However, that goes beyond my original objection.
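As a toy illustration of the Fourier-features point (the data and feature counts are made up for illustration, not taken from the referenced paper), a fixed, non-learned Fourier embedding with only a linear readout can fit a nonlinear target exactly:

```python
import numpy as np

# A 1-D "time series" regression task with a nonlinear target.
x = np.linspace(0, 1, 200)
y = np.sin(4 * np.pi * x) + 0.5 * np.cos(2 * np.pi * x)

# Fixed Fourier features: no feature learning happens here.
K = 10
features = np.column_stack(
    [np.sin(2 * np.pi * k * x) for k in range(1, K + 1)]
    + [np.cos(2 * np.pi * k * x) for k in range(0, K + 1)]  # k=0 gives a bias column
)

# Only the linear readout is fit (ordinary least squares).
coef, *_ = np.linalg.lstsq(features, y, rcond=None)
mse = np.mean((features @ coef - y) ** 2)
print(f"MSE with fixed Fourier features: {mse:.2e}")
```

Since the target lies in the span of the fixed features, the downstream fit is exact up to numerical precision, with no features learned at all.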

This issue of 'embedding efficiency' seems only loosely related to the universal approximation property.

This is not a trivial question. In the paper I referenced, the authors show that the approximation efficiency of the NTK is equivalent for deep and shallow networks. However, infinitely differentiable activations can only approximate smooth functions. On the other hand, ReLU seems capable of approximating a larger class of potentially non-smooth functions.

Comment by Zachary Robertson (zachary-robertson) on NTK/GP Models of Neural Nets Can't Learn Features · 2021-04-24T01:28:22.324Z · LW · GW

There's a big difference between 'universal learner' and 'fits any smooth function on a fixed input space'.

Note I never said 'universal learner'. What I actually said was,

It's clear enough that every finite embedding is a subspace of this embedding which sort of hints at the fact an infinite-width network is a universal function approximator.

In the context of ML, universal approximation, or more specifically universal function approximation, is the argument showing that NTK functions are dense in a certain sense. This was meant to address your request:

You'll also need an argument showing that their density in the NTK embedding is bounded above zero.

This shows that the NTK functions in the associated reproducing kernel space are dense in the class of smooth functions. I suspect I'm still not addressing your objection. If you could be more precise about your objection, maybe we could get closer.

Comment by Zachary Robertson (zachary-robertson) on NTK/GP Models of Neural Nets Can't Learn Features · 2021-04-23T20:39:29.614Z · LW · GW

The argument about sub-embeddings is meant to be a hand-wave. More precisely, the NTK kernel can be used to fit any smooth function. You can see precisely which class in this paper. I don’t have a proof on hand that the map is injective into the NTK space.

Comment by Zachary Robertson (zachary-robertson) on NTK/GP Models of Neural Nets Can't Learn Features · 2021-04-23T16:36:35.235Z · LW · GW

This post argues that NTK/GP models of neural nets can't learn features. Feature learning here is explained as,

By 'feature learning' I mean the typical practice whereby a neural net is trained on one task, then 'fine-tuned' with a lower learning rate to fit another task, usually with less data than the first.

It's worth pointing out that the NTK can be seen as an embedding. This embedding does not vary during training; instead, features are weighted differently. It's clear enough that every finite embedding is a subspace of this embedding, which sort of hints at the fact that an infinite-width network is a universal function approximator. Given this, I agree that it shouldn't be surprising that the NTK doesn't learn features, but I disagree on the reason. Namely, the NTK doesn't learn features because the feature class at initialization is a universal class and already has a good representation at initialization.

Comment by Zachary Robertson (zachary-robertson) on What is the Difference Between Cheerful Price and Shadow Price? · 2021-03-28T22:00:20.665Z · LW · GW

Your example is interesting and clarifies exchange rates. However,

The shadow price quantifies the opportunity cost, so if I'm paid my shadow price, then that's just barely enough to cover my opportunity cost.

This is an interpretive point I'd like to focus on. When you move a constraint, in this case with price, the underlying equilibrium of the optimization shifts. From this perspective, your usage of the word 'barely' stops making sense to me. If you were to 'overshoot' you wouldn't be optimal in the new optimization problem.

At this point I understand that the cheerful price will be equal to or greater than the shadow price. You want to be able to shift the equilibrium point and have slack left over. It just seems obvious, to me, that the shadow price isn't an exactly measurable thing in this context, and so you'd naturally be led to make a confidence interval (belief) for it. The cheerful price is just the upper estimate on that. Hence, I'm surprised that this is being treated as a new/distinct concept.
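To make the comparison concrete, here is a toy sketch (the utility function and numbers are invented for illustration) in which the shadow price is the marginal value of relaxing a budget constraint, estimated by finite differences:

```python
import math

def optimal_value(budget):
    """Max of log(x) + log(budget - x) over 0 < x < budget.

    By symmetry the optimum is x = budget / 2.
    """
    x = budget / 2
    return math.log(x) + math.log(budget - x)

def shadow_price(budget, eps=1e-6):
    """Marginal value of one more unit of budget (finite difference)."""
    return (optimal_value(budget + eps) - optimal_value(budget)) / eps

b = 10.0
p = shadow_price(b)
print(f"shadow price at budget {b}: {p:.4f}")  # analytically 2 / budget = 0.2

# A "cheerful price" would sit strictly above this point estimate,
# e.g. the upper end of a confidence interval around it, leaving slack.
```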

Comment by Zachary Robertson (zachary-robertson) on What is the Difference Between Cheerful Price and Shadow Price? · 2021-03-28T21:50:25.374Z · LW · GW

I suppose this is the most correct answer. I'm not really updating very much though. From my perspective I'll continue to see cheerful price as a psychological/subjective reinvention of shadow price.

Edit: It seems clear in this context, shadow price isn't exactly measurable. Cheerful price is just the upper estimate on the shadow price.

Comment by Zachary Robertson (zachary-robertson) on Why Neural Networks Generalise, and Why They Are (Kind of) Bayesian · 2021-03-04T05:48:33.717Z · LW · GW

You seem to have updated your opinion: overtraining does make a difference, but it's not 'huge'. Have you run a significance test for your lines of best fit? The plots as presented suggest the effect is significant.

Figure C.1.a indicates the tilting phenomenon. Probabilities only go up to one, so tilting down means that the most likely candidates from overtrained SGD are less likely with random sampling. Thus, unlikely random-sampling candidates are more likely under SGD. At the tail, the opposite happens. Functions more likely with random sampling become less likely under SGD.

While the optimizer has a larger effect, I think the subtle question is whether the overtraining tilts in the same way each time. Figure 16 indicates yes again. This phenomenon, which you consider minor, is what I found most interesting about the paper.

Comment by Zachary Robertson (zachary-robertson) on Why Neural Networks Generalise, and Why They Are (Kind of) Bayesian · 2021-02-28T18:22:55.507Z · LW · GW

The main point, as I see it, is essentially that functions with good generalisation correspond to large volumes in parameter-space, and that SGD finds functions with a probability roughly proportional to their volume.

What I'm suggesting is that volume in high-dimensions can concentrate on the boundary. To be clear, when I say SGD only typically reaches the boundary, I'm talking about early stopping and the main experimental setup in your paper where training is stopped upon reaching zero train error.

We have done overtraining, which should allow SGD to penetrate into the region. This doesn’t seem to make much difference for the probabilities we get.

This does seem to invalidate the model. However, something tells me that the difference here is more about degree. Since you use the word 'should' I'll use the wiggle room to propose an argument for what 'should' happen.

If SGD is run with early stopping, as described above, then my argument is that this is roughly equivalent to random sampling via an appeal to concentration of measure in high-dimensions.

If SGD is not run with early stopping, it's enclosed by the boundary of zero-train-error functions. Because these are most likely in the interior, these functions are unlikely to be produced by random sampling. Thus, on a log-log plot I'd expect overtraining to 'tilt' the correspondence between SGD and random-sampling likelihoods downward.

Falsifiable Hypothesis: Compare SGD with overtraining to the random sampling algorithm. You will see that functions that are unlikely to be generated by random sampling will be more likely under SGD with overtraining. Moreover, functions that are more likely with random sampling will become less likely under SGD with overtraining.

Comment by Zachary Robertson (zachary-robertson) on Recognizing Numbers · 2021-01-26T15:56:17.336Z · LW · GW

A problem I'm finding with this formulation is that it moves the problem to something that is arguably harder. We've replaced the problem of recognizing numbers with the problem of recognizing sets. The main post does this as well. There's nothing technically wrong with this, but then the immediate question is: how do we know when sets are useful? If a similar logic applies: how do we create an abstraction of a set from observation(s)? George Cantor, one of the founders of set theory, writes:

A set is a gathering together into a whole of definite, distinct objects of our perception [Anschauung] or of our thought—which are called elements of the set.

To gather distinct perceptions together requires unity of apperception or a single 'I think' to be attached to each perception so that they may be brought under a category/set/etc.

Comment by Zachary Robertson (zachary-robertson) on What is going on in the world? · 2021-01-18T22:22:18.967Z · LW · GW

I like this one because you can generate more using GPT3 (that doesn’t imply they make sense)

Comment by Zachary Robertson (zachary-robertson) on The Good Try Rule · 2020-12-28T23:55:33.335Z · LW · GW

I think this post does a good job of motivating a definition for “good try”. It also seems possible to think of habit changes as examples of goals. I personally find the SMART goal system to be useful and related to the discussion. SMART goals should be Specific, Measurable, Attainable, Reasonable, Timely. The approach is to specify why the habit change goal meets each of the SMART criteria.

I’d think that giving something a “good try” is similar enough to trying habit change with a SMART goal that I mention this. This makes it clearer (at least for me) that what we’re talking about is creating some sort of prediction about how a successful habit change will proceed and then testing the prediction by attempting the habit change according to the plan. I think this also opens up the opportunity for giving something multiple “good tries” before evaluating success/failure.

Comment by Zachary Robertson (zachary-robertson) on Minimal Maps, Semi-Decisions, and Neural Representations · 2020-12-08T16:06:09.644Z · LW · GW

I'm going to have to spend some time unpacking the very compact notation in the post, but here are my initial reactions.

I should apologize a bit for that. To a degree I wasn't really thinking about any of the concepts in the title and only saw the connection later.

First, very clean proof of the lemma, well done there.


Second... if I'm understanding this correctly, each neuron activation (or set of neuron activations?) would contain all the information from some-part-of-data relevant to some-other-part-of-data and the output.

To be honest, I haven't thought about interpreting the monad beyond the equivalence with neural networks. One thing I noticed early on is that you can create sequences of activations that delete information in the limit. For example, the ReLU activation is the limit of the softplus (change the log base). I think something like this could be seen as abstracting away unnecessary data.

Better yet, it looks like the OP gives a recipe for unpacking those natural abstractions?

I'm not sure. I do think the method can justify the reuse of components (queries) and I wouldn't be surprised if this is a prerequisite for interpreting network outputs. Most of my interest comes from trying to formalize the (perhaps obvious) idea that anything that can be reduced to a sequence of classifications can be used to systematically translate high-level reasoning about these processes into neural networks.

I guess it's best to give an example of how I currently think about abstraction. Say we take the position that every object is completely determined by the information contained in a set of queries such that . For a picture, consider designing a game-avatar (Mii character) by fiddling around with some knobs. The formalism lets us package observations as queries using return. Thus, we're hypothesizing that we can take a large collection of queries and make them equivalent to a small set of queries. Said another way, we can answer a large collection of queries by answering a much smaller set of 'principal' queries. In fact, if our activation were linear we'd be doing PCA. How we decide to measure success determines what abstraction is learned. If we only use the to answer a few queries then we're basically doing classification. However, if the have to be able to answer every query about then we're doing auto-encoding.
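The linear case can be checked numerically; a minimal sketch with made-up data (the names and dimensions are illustrative, not from the post) shows that answering a large set of 'queries' from a handful of principal components is exactly PCA reconstruction:

```python
import numpy as np

rng = np.random.default_rng(0)

# 50 objects, each "determined" by 100 queries, but with only 3
# underlying degrees of freedom (the knobs on the avatar).
knobs = rng.normal(size=(50, 3))
mixing = rng.normal(size=(3, 100))
answers = knobs @ mixing  # full table of query answers

# PCA: keep 3 principal components as the 'principal' queries.
answers_centered = answers - answers.mean(axis=0)
U, S, Vt = np.linalg.svd(answers_centered, full_matrices=False)
reconstruction = U[:, :3] * S[:3] @ Vt[:3] + answers.mean(axis=0)

err = np.max(np.abs(reconstruction - answers))
print(f"max reconstruction error from 3 components: {err:.2e}")
```

Because the data really has only three degrees of freedom, three components answer all one hundred queries essentially exactly.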

Comment by Zachary Robertson (zachary-robertson) on Doing discourse better: Stuff I wish I knew · 2020-10-01T12:48:18.789Z · LW · GW

Yes, but StackExchange has community posts that are editable, and I think this is nice. I believe edits for normal posts work like you say.

Comment by Zachary Robertson (zachary-robertson) on Doing discourse better: Stuff I wish I knew · 2020-09-30T13:51:20.322Z · LW · GW

It could still be useful to see different ‘versions’ of an article and then just vote on the ones that are best.

Comment by Zachary Robertson (zachary-robertson) on Richard Ngo's Shortform · 2020-08-21T00:24:38.789Z · LW · GW

Ya, totally messed up that. I meant the AI Alignment Forum or AIAF. I think out of habit I used AN (Alignment Newsletter)

Comment by Zachary Robertson (zachary-robertson) on Richard Ngo's Shortform · 2020-08-20T23:13:56.178Z · LW · GW

On AI alone (which I am using in large part because there's vaguely more consensus around it than around rationality), I think you wouldn't have seen almost any of the public write-ups (like Embedded Agency and Zhukeepa's Paul FAQ) without LessWrong

I think a distinction should be made between intellectual progress (whatever that is) and distillation. I know lots of websites that do amazing distillation of AI-related concepts. I think most people would agree that sort of work is important in order to make intellectual progress, but I also think significantly fewer people would agree that distillation is intellectual progress. Having this distinction in mind, I think your examples from AI are not as convincing. Perhaps more so once you consider that Less Wrong is often being used more as a platform to share these distillations than to create them.

I think you're right that Less Wrong has some truly amazing content. However, once again, it seems a lot of these posts are not inherently from the ecosystem but are rather essentially cross-posted. If I say a lot of the content on LW is low-quality it's mostly an observation about what I expect to find from material that builds on itself. The quality of LW-style accumulated knowledge seems lower than it could be.

On a personal note, I've actively tried to explore using this site as a way to engage with research and have come to a similar opinion as Richard. The most obvious barrier is the separation between LW and AIAF. Effectively, if you're doing AI safety research, to second-order approximation you can block LW (noise) and only look at AIAF (signal). I say to second-order because anything from LW that is signal ends up being posted on AIAF anyway which means the method is somewhat error-tolerant.

This probably comes off as a bit pessimistic. Here's a concrete proposal I hope to try out soon enough. Pick a research question. Get a small group of people/friends together. Start talking about the problem and then posting on LW. Iterate until there's group consensus.

Comment by Zachary Robertson (zachary-robertson) on Richard Ngo's Shortform · 2020-08-20T22:44:48.091Z · LW · GW

Setting them higher (standards) probably wouldn't result in more good content.

I broadly agree here. However, I do see the short-forms as a consistent way to skirt around this. I'd say at least 30% of the Less Wrong value proposition is the conversations I get to have. Short-forms seem to be more adapted for continuing conversations, and they have a low bar for being made.

I could clarify a bit. My main problem with low quality content isn't exactly that it's 'wrong' or something like that. Mostly, the issues I'm finding most common for me are,

  1. Too many niche pre-requisites.
  2. No comments
  3. Nagging feeling post is reinventing the wheel

I think the first is a ridiculously bad problem. I'm literally getting a PhD in machine learning, write about AI safety, and still find a large number of those posts (yes, AN posts) glazed in internal jargon that makes it difficult to connect with current research. Things get even worse when I look at non-AI-related things.

The second is just a tragedy of the fact that the rich get richer. While I'm guilty of this also, I think that requiring authors to also post seed questions/discussion topics in the comments could go a long way toward alleviating this problem. I oftentimes read a post and want to leave a comment, but then don't, because I'm not even sure the author thought about the discussion their post might start.

The third is probably a bit mean. Yet, more than once I've discovered that a Less Wrong concept already had a large research literature devoted to it. I think this ties in with the first, since niche prerequisites often go hand-in-hand with insufficient literature review.

Comment by Zachary Robertson (zachary-robertson) on Richard Ngo's Shortform · 2020-08-20T19:10:39.275Z · LW · GW

One fairly strong belief of mine is that Less Wrong's epistemic standards are not high enough to make solid intellectual progress here.

I think this is literally true. There seems to be very little ability to build upon prior work.

Out of curiosity do you see Less Wrong as significantly useful or is it closer to entertainment/habit? I've found myself thinking along the same lines as I start thinking about starting my PhD program etc. The utility of Less Wrong seems to be a kind of double-edged sword. On the one hand, some of the content is really insightful and exposes me to ideas I wouldn't otherwise encounter. On the other hand, there is such an incredible amount of low-quality content that I worry that I'm learning bad practices.

Comment by Zachary Robertson (zachary-robertson) on Developmental Stages of GPTs · 2020-08-16T13:48:49.728Z · LW · GW

The paper doesn't draw the causal diagram "Power → instrumental convergence", it gives sufficient conditions for power-seeking being instrumentally convergent. Cycle reachability preservation is one of those conditions.

This definitely feels like the place where I'm missing something. What is the formal definition of 'power seeking'? My understanding is that power is the rescaled value function, is decreasing in the limit of farsightedness, and in the context of terminal-state reachability always goes to zero. The agent literally gives up power to achieve its goal.

Now, I realize this might just be naming convention confusion. I do, I think, understand the idea that preserving cycle reachability could be instrumental. However,

Cycle reachability preservation is one of those conditions.

this seems circular to me. My understanding of figure 7 of your paper indicates that cycle reachability cannot be a sufficient condition.

You can formalize a kind of "alignment capability" by introducing a joint distribution over the human's goals and the induced agent goals

This is very interesting to me. Thank you for sharing. I wonder what you mean by,

The point isn't that alignment is impossible, but that you have to hit a low-measure set of goals which will give you aligned or non-power-seeking behavior.

Given your definitions it's clear that the set of aligned goals must be low-measure. Also, by your reasoning, 'non-power-seeking behavior' is not instrumental. However, in a curriculum, power-seeking must be instrumental or else the agent is less likely to achieve its goals. It seems there's a two-out-of-three condition (aligned/general/non-power-seeking) here. My philosophy is that aligned/general is OK based on a shared (?) premise that,

If the rewards are ε-close in sup-norm, then you can get nice regret bounds, sure.

Comment by Zachary Robertson (zachary-robertson) on Developmental Stages of GPTs · 2020-08-15T15:11:20.004Z · LW · GW

Thanks for the comment! I think max-ent brings up a related point. In IRL we observe behavior and infer a reward function (using max-ent also?). Ultimately, there is a relationship between state/action frequency and reward. This would considerably constrain the distribution of reward functions to be considered in instrumental/power analysis.

I think I get confused about the usage of power the most. It seems like you can argue that, given a random reward to optimize, the agent will try to avoid getting turned off without invoking power. If there's a collection of 'turned-off' terminal states where the agent receives no further reward for all time, then every optimized policy will try to avoid such a state. It seems as though we could define an instrumentality measure for each state-action pair, and then we'd have,

It seems like this would extend out to a full definition. The advantage here is that you can say, “If one action in this state is more instrumental than another, then the return is likely to be greater as well”.
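Here's a toy sketch of what I have in mind, treating an action as "more instrumental" when it is optimal for a larger fraction of randomly drawn reward functions (the three-terminal MDP and the uniform reward distribution are my own illustrative assumptions, not from the paper):

```python
import random

def estimate_instrumentality(n_samples=100_000, seed=0):
    """Monte Carlo estimate of how often each first action is optimal
    when terminal rewards are drawn i.i.d. Uniform(0, 1).

    Toy MDP: from the start state, 'left' reaches one terminal state,
    'right' reaches two (the agent then picks the better of the two).
    """
    rng = random.Random(seed)
    left_wins = 0
    for _ in range(n_samples):
        r_left = rng.random()                       # reward of the single left terminal
        r_right = max(rng.random(), rng.random())   # best of the two right terminals
        if r_left > r_right:
            left_wins += 1
    p_left = left_wins / n_samples
    return {"left": p_left, "right": 1 - p_left}

probs = estimate_instrumentality()
# 'right' keeps more terminal states reachable, so it is optimal for a
# larger fraction of reward draws (2/3 analytically), i.e. more instrumental.
```

On this reading, "more instrumental" and "greater expected return" line up by construction, which is the correspondence I was gesturing at above.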

I imagine that this is sufficient for the catastrophic power-stealing incentives

I'm not confident analysis in the single-agent case extends to the multi-agent setting. If our goal is fixed and the agent's varies, then I might argue it's instrumental for us to align the agent's goal with ours and vice versa. In general, I'd suspect that there are goals we could give the agent that significantly reduce our gain. However, I'd also suspect the opposite.

Say we have the capability to introduce a second agent with a reward . Would we want to introduce the agent? It seems reasonable to argue that we would if we could guarantee . There might be a way to argue over randomness and say this would double our gain. More speculatively, what if ?

Comment by Zachary Robertson (zachary-robertson) on Developmental Stages of GPTs · 2020-08-12T01:07:40.240Z · LW · GW

I think this is a slight misunderstanding of the theory in the paper.

I disagree. What I'm trying to do is outline a reinterpretation of the 'power seeking' claim. I'm citing the pre-task section and theorem 17 to insist that power-seeking can only really happen in the pre-task because,

The way the theory does this is by saying that first a reward function is drawn from the distribution, then it is given to the agent, then the agent thinks really hard, and then the agent executes the optimal policy.

The agent is done optimizing before the main portion of the paper even begins. I do not see how the agent 'seeks' out powerful states because, as you say, the agent is fixed. Now, when you say,

If we do not know an agent's goal, but we know that the agent knows its goal and is optimal w.r.t it, then from our perspective the agent is more likely to go to higher-power states. (From the agent's perspective, there is no probability, it always executes the deterministic perfect policy for its reward function.)

My issue is that Figure 19 shows an example where the agent doesn't display this behavior. Tautologically, the agent tends to do what is instrumentally convergent. If power were tied to instrumental convergence, then we could also say the agent tends to do what is powerful. However, it seems as though a state can be arbitrarily powerful without having the instrumental property, which breaks the analogy.

From here I could launch a counter-argument: if power can be arbitrarily removed from the instrumental convergence phenomenon, then agent 'wireheading', while a powerful state, is sufficiently out of the way of most goals that the agent most likely won't pursue it. To be clear, I don't have any strong opinions; I'm just confused about these interpretive details.

Comment by Zachary Robertson (zachary-robertson) on Developmental Stages of GPTs · 2020-08-11T18:30:15.585Z · LW · GW

I appreciate the more concrete definition of IC presented here. However, I have an interpretation that is a bit different from yours. I'm following the formal presentation.

My base understanding is that a cycle with max average reward is optimal. This is essentially just a definition. In the case where the agent doesn't know the reward function, it seems clear that the agent ought to position itself in a state which gives it access to as many of these cycles as possible.

In your paper, theorem 19 suggests that, given a choice between two sets of 1-cycles, the agent is more likely to select the larger set. This makes sense. What doesn't make sense is the conclusion (theorem 17) that the agent selects states with more power. This is because at the very start of the paper it's mentioned that,

As an alternative motivation, consider an agent in a communicating MDP which is periodically assigned a task from a known distribution D. Between tasks, the agent has as much downtime as required. To maximize return over time, the optimal policy during downtime is to navigate to the state with maximal POWER.

According to theorem 17, losing access to states means that power goes down (or stays constant). This seems to indicate power (cycle access) is really some sort of Lyapunov function for the dynamics. So at the outset, it seems clear that the agent will prefer states that maximize power, but then as soon as a determination is made on what the actual reward function is, power goes down, not up.

What I'm trying to point out here is that I find the distinction between pre-task optimization and execution to be loose. This is to such a degree that I find myself drawing the exact opposite conclusion: agents optimizing a generic reward will tend to give up power.
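A minimal numerical illustration of the "power goes down, not up" point, using a simplified stand-in for POWER (expected optimal terminal reward over uniform reward draws; the toy MDP with three terminal cycles is my own assumption, not the paper's construction):

```python
import random

def power(num_reachable_terminals, n_samples=200_000, seed=1):
    """POWER of a state, here taken as the expected best terminal reward
    over i.i.d. Uniform(0,1) reward draws, given how many terminal states
    remain reachable. (A simplification of the paper's rescaled value.)"""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += max(rng.random() for _ in range(num_reachable_terminals))
    return total / n_samples

# Reachability shrinks as the agent commits to its goal:
p_start    = power(3)  # all three terminal cycles reachable
p_mid      = power(2)  # after the first committing move
p_terminal = power(1)  # locked into a single cycle
```

The estimates come out near 3/4, 2/3, and 1/2 (E[max of n uniforms] = n/(n+1)): the closer the agent gets to its chosen cycle, the less power it has, which is the give-up-power trajectory I'm describing.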

At the moment, I find myself agreeing with the idea that an agent unaware of its task will seek power, but also conclude that an agent aware of its task will give up power. My current opinion is that power-seeking behavior is concentrated in the pre-task step. Giving the AI unrestricted 'free-time' to optimize with should 'never' be allowed. Now, I could be misunderstanding parts of the paper, but hopefully I've made things clear enough!

Comment by Zachary Robertson (zachary-robertson) on How will internet forums like LW be able to defend against GPT-style spam? · 2020-08-09T00:24:12.430Z · LW · GW

The fact that you and Zachary can't see a talk about countries without pattern matching into race seems illustrative of how screwed up the discourse is.

Maybe? On further consideration, it seems you are the one pattern matching by making this generalization about the state of discourse.

It's not central but it helps people have models with gears to be able to visualize supply chains.

My point here was simply to get you to unpack your comment so that I could point out that what you said could've been said more clearly another way (i.e. the original was not gears-level).

This here is where the problem ultimately was. You think an explicit reference to 'poor Indian' specifically contributes to a gears-level understanding of some kind. Otherwise, you would've reworded it, given that I'm arguing that these criteria are irrelevant. I'm of the opinion you've taken a half-essay to respond to a single comment I left a week ago because you didn't provide that gears-level understanding in the first place. Instead you left a place-holder that I had to essentially prod you to unpack further.

Now that this has been done, wouldn't you agree it'd be wholly more accurate to replace your oversimplification with what you surely agree is the more accurate characterization determined in this comment section? If so, I think you're forced to admit that your original comment in fact did not provide the gears-level understanding you think it did, by virtue of the fact that I got you to unpack the comment into actual gears.

Comment by Zachary Robertson (zachary-robertson) on Analyzing the Problem GPT-3 is Trying to Solve · 2020-08-07T12:52:32.933Z · LW · GW

I've argued that GPT3 can do a form of boosting if you pair it with an output filter. In terms of the language introduced here, we have a composed model in which a filter passes the output with some probability. So if we have strong reason to believe that a good enough prompt exists, then as we iterate the prompt, GPT3 will get better at matching the output, which allows for improvement.
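As a sketch, the filter construction is just rejection sampling; the generator and acceptance test below are stand-ins I made up, not the real GPT3 API:

```python
import random

def filtered_generate(generate, accept, max_tries=1000):
    """Rejection-sampling sketch of 'GPT3 plus an output filter':
    keep sampling until the filter accepts, so the effective model is
    the generator conditioned on passing the filter."""
    for _ in range(max_tries):
        candidate = generate()
        if accept(candidate):
            return candidate
    raise RuntimeError("filter never accepted a sample")

# Hypothetical stand-ins for the language model and the filter:
rng = random.Random(0)
generate = lambda: rng.choice(["good answer", "bad answer", "noise"])
accept = lambda s: s.startswith("good")

out = filtered_generate(generate, accept)
```

Iterating the prompt then corresponds to improving `generate` so that fewer samples get rejected, which is the boosting-style loop I have in mind.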

Comment by Zachary Robertson (zachary-robertson) on Infinite Data/Compute Arguments in Alignment · 2020-08-05T01:39:37.309Z · LW · GW

Perhaps it's worth explicitly noting the relation to the canonical rendition of The Bitter Lesson (search and learning win in the long run).

Comment by Zachary Robertson (zachary-robertson) on is gpt-3 few-shot ready for real applications? · 2020-08-03T22:45:36.587Z · LW · GW

So storage no longer scales badly with the number of operations you define. However, latency still does, and latency per call is now much larger, so this might end up being as much of a constraint. The exact numbers – not well understood at this time – are crucial: in real life the difference between 0.001 seconds, 0.1 seconds, 1 second, and 10 seconds will make or break your project.

This does seem to be a big issue for practical applications. Yet, I'm very much of the opinion that the API is more about exploring areas where fine-tuning would be useful. As you note, OpenAI does both. I'd assume a common use pattern will end up being something like: use few-shot, release, collect data, fine-tune, rinse-repeat.

(-3) Unlike supervised learning, there’s no built-in mechanism where you continually improve as your application passively gathers data during usage.

I think it's worth remembering that OpenAI is getting incredibly valuable data right now from the API. Adding more data about how people interact with the model seems completely doable with the centralization setup OpenAI has.

Comment by Zachary Robertson (zachary-robertson) on Sufficiently Advanced Language Models Can Do Reinforcement Learning · 2020-08-03T13:32:47.244Z · LW · GW

This paper looks interesting. My understanding is that this paper implemented a form of fine-tuning. However, learning the reward function does not seem to be few-shot whereas GPT3 does few-shot pretty well. That’s the main difference here as I see it.

It seems like there's slow adaptation (this paper), which is useful for more complicated tasks, and fast adaptation (the method here), which is useful for disposable tasks. I'd think a combination of both approaches is needed. For example, a module that tracks repeatedly occurring tasks can start a larger buffer to perform slow adaptation.

Perhaps on a meta-level fine tuning GPT3 to few-shot inverse reinforcement learning would be an example of what could be possible with combining both approaches?

Comment by Zachary Robertson (zachary-robertson) on Sufficiently Advanced Language Models Can Do Reinforcement Learning · 2020-08-02T23:56:24.439Z · LW · GW

Correct. That's why the sections on classification and RL are separate. Classification tasks are a subclass of RL. A recurrent task need not be a classification task. In fact, I'd go further and say there's still a huge difference between having an agent that can do RL and having an AGI. That's why I put such speculation at the end.

Having said all that, it seems plausible to me that a language model might be able to reason about what modules it needs and then design them. I implicitly believe this to be the case, but perhaps I could’ve been more explicit. This is more of an empirical question, but if that were possible the difference between that model and AGI would become much smaller in my opinion.

Comment by Zachary Robertson (zachary-robertson) on Sufficiently Advanced Language Models Can Do Reinforcement Learning · 2020-08-02T18:54:19.726Z · LW · GW

Hopefully that's fixed! I wrote this as quickly as possible so there may be many tiny errors. Apologies. Let me know if anything else is wrong.

Comment by Zachary Robertson (zachary-robertson) on Power as Easily Exploitable Opportunities · 2020-08-02T12:16:41.157Z · LW · GW

Oh, if you read "Understand" from Stories of Your Life and Others by Ted Chiang, you end up with a scenario where a human ends up finding a way to hack biological feedback loops into other people. At least, that's what I immediately thought of when I read this.

Comment by Zachary Robertson (zachary-robertson) on Would AGIs parent young AGIs? · 2020-08-02T02:20:24.085Z · LW · GW

You might be interested in reading The Lifecycle of Software Objects by Ted Chiang.

Comment by Zachary Robertson (zachary-robertson) on How will internet forums like LW be able to defend against GPT-style spam? · 2020-07-29T13:55:23.915Z · LW · GW

It's stereotyping to assume X will copy-paste a lot of posts per hour for little money, where X is picked based on class/race status. Also, it's not central to your point, so it seems easy to just remove.

Comment by Zachary Robertson (zachary-robertson) on How will internet forums like LW be able to defend against GPT-style spam? · 2020-07-29T12:21:05.784Z · LW · GW

I think the stereotyping (‘poor Indian’) is unnecessary to your point.

Comment by Zachary Robertson (zachary-robertson) on You Can Probably Amplify GPT3 Directly · 2020-07-27T13:41:43.080Z · LW · GW

Thanks! I forgot to do this. Luckily I can go back through the run and put this in. There is ambiguity whenever it auto-completes, but I hope I did a decent job of noting where this is happening.

Comment by Zachary Robertson (zachary-robertson) on You Can Probably Amplify GPT3 Directly · 2020-07-27T13:09:49.102Z · LW · GW

You could prompt with “Q:” + (content) and then “A:”

I use the default settings on the temperature, but I do cut it off after it finishes an answer. However, you likely won't get my exact results unless you literally copy the instances. Moreover, if you gave up after the first response, I think you might've given up too quickly. You can respond to it and communicate more information, as I did. The above really was what I got on the first try. It's not perfect, but that's the point. You can teach it. It's not "it works" or "it doesn't work".

I don’t think there are tutorials, but perhaps in due time someone (maybe me) will get to that. I also feel like ‘trying’ to get it to do something might be a sub-optimal approach. This is a subtle difference, but my intent here was to get it to confirm it understood what I was asking by answering questions.
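To make the "Q:" + (content) + "A:" suggestion concrete, here is a minimal prompt-building sketch (the helper name and few-shot example are hypothetical):

```python
def build_qa_prompt(question, examples=()):
    """Assemble a 'Q: ... / A:' prompt; any (question, answer)
    few-shot examples are placed before the new question."""
    parts = []
    for q, a in examples:
        parts.append(f"Q: {q}\nA: {a}")
    parts.append(f"Q: {question}\nA:")  # model completes after the final 'A:'
    return "\n\n".join(parts)

prompt = build_qa_prompt(
    "What is 2 + 2?",
    examples=[("What is 1 + 1?", "2")],
)
```

Cutting the completion off at the next "Q:" (a stop sequence) is what I mean by stopping it after it finishes an answer.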

Comment by Zachary Robertson (zachary-robertson) on You Can Probably Amplify GPT3 Directly · 2020-07-27T01:33:09.293Z · LW · GW

I agree. Coming up with the right prompts was not trivial. I almost quit several times. Yet, there is a science to this and I think it'll become more important to turn our focus away from the spectacle aspects of GPT and more towards reproducibility. Even more so if the way forward is via interrelated instances of GPT.

As an aside, critique seems much easier than generation. I’m cautiously optimistic about prompting GPT instances to “check” output.

Comment by Zachary Robertson (zachary-robertson) on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-23T01:23:33.970Z · LW · GW

My problem is that this doesn't seem to scale. I like the idea of visual search, but I also realize you're essentially bit-rate limited in what you can communicate. For example, I'd about give up if I had to write my reply to you using a topic model. Other places in this thread mention semi-supervised learning. I do agree with the idea of taking a prompt and auto-generating the relevant large prompt that at the moment is manually being written in.

Comment by Zachary Robertson (zachary-robertson) on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-22T12:00:43.985Z · LW · GW

Thanks for the link! I'll partially accept the variations example. That seems to qualify as "show me what you learned". But I'm not sure if that counts as an interface, simply because of the lack of interactivity/programmability.

Comment by Zachary Robertson (zachary-robertson) on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-22T02:19:57.463Z · LW · GW

Have we ever figured out a way to interface with what something has learned that doesn't involve language prompts? I'm serious. What other options are you trying to hint at? I think manipulating hidden layers is a terrible approach, but I won't expound on that here.