The Variational Characterization of Expectation 2021-04-16T00:12:09.743Z
What is the Difference Between Cheerful Price and Shadow Price? 2021-03-28T19:08:56.681Z
Minimal Map Constraints 2021-02-21T17:49:46.651Z
Time to Count or Count to Time? 2021-02-08T16:27:23.418Z
A Toy-Model of Instrumental Abstraction 2021-01-12T17:50:29.948Z
Minimal Maps, Semi-Decisions, and Neural Representations 2020-12-06T15:15:08.052Z
How to Catalyze Cooperation in Media Sharing 2020-10-24T16:14:33.724Z
A Toy Model for Media Sharing 2020-10-17T15:59:45.652Z
KL Divergence as Code Patching Efficiency 2020-09-27T16:06:26.186Z
Sufficiently Advanced Language Models Can Do Reinforcement Learning 2020-08-02T15:32:47.894Z
Structured Tasks for Language Models 2020-07-29T14:17:59.478Z
You Can Probably Amplify GPT3 Directly 2020-07-26T21:58:53.962Z
An Old Way to Visualize Biases 2020-07-24T00:10:17.970Z
Idea: Imitation/Value Learning AIXI 2020-07-03T17:10:16.775Z
Replication Dynamics Bridge to RL in Thermodynamic Limit 2020-05-18T01:02:53.417Z
Zachary Robertson's Shortform 2020-05-06T00:42:10.113Z
What Resources on Journal Analysis are Available? 2019-12-28T20:00:11.512Z
The Planning Problem 2019-08-04T18:58:55.186Z
Is there a user's manual to using the internet more efficiently? 2019-08-04T18:51:38.818Z


Comment by Zachary Robertson (zachary-robertson) on What is the Difference Between Cheerful Price and Shadow Price? · 2021-03-28T22:00:20.665Z · LW · GW

Your example is interesting and clarifies exchange rates. However,

The shadow price quantifies the opportunity cost, so if I'm paid my shadow price, then that's just barely enough to cover my opportunity cost.

This is an interpretive point I'd like to focus on. When you move a constraint, in this case with price, the underlying equilibrium of the optimization shifts. From this perspective, your usage of the word 'barely' stops making sense to me. If you were to 'overshoot', you wouldn't be optimal in the new optimization problem.

At this point I understand that the cheerful price will be equivalent to or greater than the shadow price. You want to be able to shift the equilibrium point and have slack left over. It just seems obvious to me that the shadow price isn't an exactly measurable thing in this context, so you'd naturally be led to make a confidence interval (belief) for it. Cheerful price is just the upper estimate on that. Hence, I'm surprised that this is being treated as a new, distinct concept.
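To make the opportunity-cost reading concrete, here's a quick sketch of a shadow price as the marginal value of relaxing a constraint. The tiny LP, its numbers, and the brute-force grid solver are all my own toy construction, purely for illustration:

```python
# Toy LP: maximize 3x + 2y subject to x + y <= budget, 0 <= x <= 4, 0 <= y <= 3.
# The shadow price of the budget constraint is the marginal gain in the
# optimum per extra unit of budget, estimated here by a finite difference.

def best_value(budget: float) -> float:
    """Brute-force the optimum over a fine grid of x (y is then set greedily)."""
    best = 0.0
    for i in range(401):  # x in [0, 4] in steps of 0.01
        x = i / 100
        y = min(3.0, budget - x)
        if y >= 0:
            best = max(best, 3 * x + 2 * y)
    return best

shadow_price = (best_value(5.01) - best_value(5.0)) / 0.01
print(shadow_price)  # ~2.0: each extra unit of budget buys one more unit of y
```

Being paid exactly this shadow price per unit of relaxed constraint just barely covers what the optimizer gives up, which is the sense of 'barely' at issue above.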

Comment by Zachary Robertson (zachary-robertson) on What is the Difference Between Cheerful Price and Shadow Price? · 2021-03-28T21:50:25.374Z · LW · GW

I suppose this is the most correct answer. I'm not really updating very much though. From my perspective I'll continue to see cheerful price as a psychological/subjective reinvention of shadow price.

Edit: It seems clear that, in this context, shadow price isn't exactly measurable. Cheerful price is just the upper estimate on the shadow price.

Comment by Zachary Robertson (zachary-robertson) on Why Neural Networks Generalise, and Why They Are (Kind of) Bayesian · 2021-03-04T05:48:33.717Z · LW · GW

You seem to have updated your opinion: overtraining does make a difference, but it’s not ‘huge’. Have you run a significance test on your lines of best fit? The plots as presented suggest the effect is significant.

Figure C.1.a indicates the tilting phenomenon. Probabilities only go up to one, so tilting down means that the most likely candidates from overtrained SGD are less likely with random sampling. Thus, candidates that are unlikely under random sampling become more likely under SGD. At the tail, the opposite happens: functions more likely with random sampling become less likely under SGD.

While the optimizer has a larger effect, I think the subtle question is whether the overtraining tilts in the same way each time. Figure 16 indicates yes again. This phenomenon, which you consider minor, is what I found most interesting about the paper.

Comment by Zachary Robertson (zachary-robertson) on Why Neural Networks Generalise, and Why They Are (Kind of) Bayesian · 2021-02-28T18:22:55.507Z · LW · GW

The main point, as I see it, is essentially that functions with good generalisation correspond to large volumes in parameter-space, and that SGD finds functions with a probability roughly proportional to their volume.

What I'm suggesting is that volume in high-dimensions can concentrate on the boundary. To be clear, when I say SGD only typically reaches the boundary, I'm talking about early stopping and the main experimental setup in your paper where training is stopped upon reaching zero train error.

We have done overtraining, which should allow SGD to penetrate into the region. This doesn’t seem to make much difference for the probabilities we get.

This does seem to invalidate the model. However, something tells me that the difference here is more about degree. Since you use the word 'should' I'll use the wiggle room to propose an argument for what 'should' happen.

If SGD is run with early stopping, as described above, then my argument is that this is roughly equivalent to random sampling via an appeal to concentration of measure in high-dimensions.
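The concentration claim is easy to check directly. Here's a one-liner of my own (not from the paper): the fraction of a d-dimensional ball's volume lying within a thin shell of the boundary goes to one as d grows.

```python
# Fraction of a unit d-ball's volume within (relative) distance eps of the
# boundary: the inner ball of radius (1 - eps) holds (1 - eps)**d of the
# total volume, so the thin outer shell holds the rest.

def shell_volume_fraction(d: int, eps: float) -> float:
    return 1.0 - (1.0 - eps) ** d

for d in (2, 10, 100, 1000):
    print(d, round(shell_volume_fraction(d, 0.01), 4))
# the fraction climbs from ~0.02 at d=2 to ~1.0 at d=1000
```

So in parameter spaces with millions of dimensions, "stop at the boundary" and "sample the region uniformly" land in essentially the same place.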

If SGD is not run with early stopping, it's enclosed by the boundary of zero-train-error functions. Because these functions most likely lie in the interior, they are unlikely to be produced by random sampling. Thus, on a log-log plot I'd expect overtraining to 'tilt' the correspondence between SGD and random-sampling likelihoods downward.

Falsifiable Hypothesis: Compare SGD with overtraining to the random sampling algorithm. You will see that functions that are unlikely to be generated by random sampling will be more likely under SGD with overtraining. Moreover, functions that are more likely with random sampling will become less likely under SGD with overtraining.

Comment by Zachary Robertson (zachary-robertson) on Recognizing Numbers · 2021-01-26T15:56:17.336Z · LW · GW

A problem I'm finding with this formulation is that it moves the problem to something that is arguably harder. We've replaced the problem of recognizing numbers with the problem of recognizing sets. The main post does this as well. There's nothing technically wrong with this, but then the immediate question is this: how do we know when sets are useful? If a similar logic applies: how do we create an abstraction of a set from observation(s)? George Cantor, one of the founders of set theory writes,

A set is a gathering together into a whole of definite, distinct objects of our perception [Anschauung] or of our thought—which are called elements of the set.

To gather distinct perceptions together requires unity of apperception or a single 'I think' to be attached to each perception so that they may be brought under a category/set/etc.

Comment by Zachary Robertson (zachary-robertson) on What is going on in the world? · 2021-01-18T22:22:18.967Z · LW · GW

I like this one because you can generate more using GPT3 (that doesn’t imply they make sense)

Comment by Zachary Robertson (zachary-robertson) on The Good Try Rule · 2020-12-28T23:55:33.335Z · LW · GW

I think this post does a good job of motivating a definition for “good try”. It also seems possible to think of habit changes as examples of goals. I personally find the SMART goal system to be useful and related to the discussion. SMART goals should be Specific, Measurable, Attainable, Reasonable, Timely. The approach is to specify why the habit change goal meets each of the SMART criteria.

I’d think that giving something a “good try” is similar enough to trying habit change with a SMART goal that I mention this. This makes it clearer (at least for me) that what we’re talking about is creating some sort of prediction about how a successful habit change will proceed and then testing the prediction by attempting the habit change according to the plan. I think this also opens up the opportunity for giving something multiple “good tries” before evaluating success/failure.

Comment by Zachary Robertson (zachary-robertson) on Minimal Maps, Semi-Decisions, and Neural Representations · 2020-12-08T16:06:09.644Z · LW · GW

I'm going to have to spend some time unpacking the very compact notation in the post, but here are my initial reactions.

I should apologize a bit for that. To a degree I wasn't really thinking about any of the concepts in the title and only saw the connection later.

First, very clean proof of the lemma, well done there.


Second... if I'm understanding this correctly, each neuron activation (or set of neuron activations?) would contain all the information from some-part-of-data relevant to some-other-part-of-data and the output.

To be honest, I haven't thought about interpreting the monad beyond the equivalence with neural networks. One thing I noticed early on is that you can create sequences of activations that delete information in the limit. For example, the ReLU activation is the limit of the SoftMax (change log base). I think something like this could be seen as abstracting away unnecessary data.
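The 'change log base' remark can be checked numerically. A small sketch of my own: writing softplus with log base b, i.e. log_b(1 + b^x), recovers ReLU(x) = max(0, x) as b grows.

```python
import math

def softplus_base(x: float, b: float) -> float:
    """log_b(1 + b**x); approaches relu(x) = max(0, x) as the base b grows."""
    return math.log1p(b ** x) / math.log(b)

def relu(x: float) -> float:
    return max(0.0, x)

for b in (2.0, 10.0, 1e6):
    errs = [abs(softplus_base(x, b) - relu(x)) for x in (-2.0, -0.5, 0.5, 2.0)]
    print(b, max(errs))  # the gap shrinks as the base increases
```

The point in the comment is that information about x near 0 that softplus preserves is discarded in the limit, which is the 'abstracting away unnecessary data' intuition.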

Better yet, it looks like the OP gives a recipe for unpacking those natural abstractions?

I'm not sure. I do think the method can justify the reuse of components (queries), and I wouldn't be surprised if this is a pre-requisite for interpreting network outputs. Most of my interest comes from trying to formalize the (perhaps obvious) idea that anything that can be reduced to a sequence of classifications can be used to systematically translate high-level reasoning about these processes into neural networks.

I guess it's best to give an example of how I currently think about abstraction. Say we take the position that every object is completely determined by the information contained in a set of queries such that . For a picture, consider designing a game-avatar (mii character) by fiddling around with some knobs. The formalism lets us package observations as queries using return. Thus, we're hypothesizing that we can take a large collection of queries and make them equivalent to a small set of queries. Said another way, we can answer a large collection of queries by answering a much smaller set of 'principal' queries. In fact, if our activation were linear we'd be doing PCA. How we decide to measure success determines what abstraction is learned. If we only use the to answer a few queries then we're basically doing classification. However, if the have to be able to answer every query about then we're doing auto-encoding.
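Here's a minimal sketch of the PCA remark (entirely my own toy setup: the dimensions, noise level, and data are made up): a couple of 'principal' linear queries recover the answers to a much larger set of linear queries.

```python
import numpy as np

rng = np.random.default_rng(0)
# 200 observations of a 10-dim object that secretly varies along only 2 directions
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

# Each coordinate of X is one 'query' about the object; PCA compresses the
# 10 queries into k principal ones and reconstructs the rest linearly.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
Z = Xc @ Vt[:k].T      # answers to the k principal queries
X_hat = Z @ Vt[:k]     # reconstructed answers to all 10 original queries

relative_error = np.linalg.norm(Xc - X_hat) / np.linalg.norm(Xc)
print(relative_error)  # small: two principal queries answer nearly everything
```

Swapping the linear map for a nonlinear activation is where, on this view, PCA turns into the classification/auto-encoding spectrum described above.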

Comment by Zachary Robertson (zachary-robertson) on Doing discourse better: Stuff I wish I knew · 2020-10-01T12:48:18.789Z · LW · GW

Yes, but StackExchange has community posts that are editable, and I think this is nice. I believe edits for normal posts work like you say.

Comment by Zachary Robertson (zachary-robertson) on Doing discourse better: Stuff I wish I knew · 2020-09-30T13:51:20.322Z · LW · GW

It could still be useful to see different ‘versions’ of an article and then just vote on the ones that are best.

Comment by Zachary Robertson (zachary-robertson) on Richard Ngo's Shortform · 2020-08-21T00:24:38.789Z · LW · GW

Ya, totally messed up that. I meant the AI Alignment Forum or AIAF. I think out of habit I used AN (Alignment Newsletter)

Comment by Zachary Robertson (zachary-robertson) on Richard Ngo's Shortform · 2020-08-20T23:13:56.178Z · LW · GW

On AI alone (which I am using in large part because there's vaguely more consensus around it than around rationality), I think you wouldn't have seen almost any of the public write-ups (like Embedded Agency and Zhukeepa's Paul FAQ) without LessWrong

I think a distinction should be made between intellectual progress (whatever that is) and distillation. I know lots of websites that do amazing distillation of AI-related concepts (literally, I think most people would agree that sort of work is important in order to make intellectual progress), but I also think significantly fewer people would agree that distillation is intellectual progress. Having this distinction in mind, I think your examples from AI are not as convincing. Perhaps more so once you consider that Less Wrong is often being used more as a platform to share these distillations than to create them.

I think you're right that Less Wrong has some truly amazing content. However, once again, it seems a lot of these posts are not inherently from the ecosystem but are rather essentially cross-posted. If I say a lot of the content on LW is low-quality it's mostly an observation about what I expect to find from material that builds on itself. The quality of LW-style accumulated knowledge seems lower than it could be.

On a personal note, I've actively tried to explore using this site as a way to engage with research and have come to a similar opinion as Richard. The most obvious barrier is the separation between LW and AIAF. Effectively, if you're doing AI safety research, to second-order approximation you can block LW (noise) and only look at AIAF (signal). I say to second-order because anything from LW that is signal ends up being posted on AIAF anyway which means the method is somewhat error-tolerant.

This probably comes off as a bit pessimistic. Here's a concrete proposal I hope to try out soon enough. Pick a research question. Get a small group of people/friends together. Start talking about the problem and then posting on LW. Iterate until there's group consensus.

Comment by Zachary Robertson (zachary-robertson) on Richard Ngo's Shortform · 2020-08-20T22:44:48.091Z · LW · GW

Setting them higher (standards) probably wouldn't result in more good content.

I broadly agree here. However, I do see the short-forms as a consistent way to skirt around this. I'd say at least 30% of the Less Wrong value proposition is the conversations I get to have. Short-forms seem to be better adapted for continuing conversations, and they have a low bar for being made.

I could clarify a bit. My main problem with low quality content isn't exactly that it's 'wrong' or something like that. Mostly, the issues I'm finding most common for me are,

  1. Too many niche pre-requisites.
  2. No comments.
  3. A nagging feeling the post is reinventing the wheel.

I think one is a ridiculously bad problem. I'm literally getting a PhD in machine learning, write about AI Safety, and still find a large number of those posts (yes, AN posts) glazed in internal jargon that makes it difficult to connect with current research. Things get even worse when I look at non-AI related things.

Two is just a tragedy of the fact that the rich get richer. While I'm guilty of this also, I think that requiring authors to also post seed questions/discussion topics in the comments could go a long way toward alleviating this problem. I oftentimes read a post and want to leave a comment, but then don't because I'm not even sure the author thought about the discussion their post might start.

Three is probably a bit mean. Yet, more than once I've discovered a Less Wrong concept already had a large research literature devoted to it. I think this ties in with one due to the fact niche pre-reqs often go hand-in-hand with insufficient literature review.

Comment by Zachary Robertson (zachary-robertson) on Richard Ngo's Shortform · 2020-08-20T19:10:39.275Z · LW · GW

One fairly strong belief of mine is that Less Wrong's epistemic standards are not high enough to make solid intellectual progress here.

I think this is literally true. There seems to be very little ability to build upon prior work.

Out of curiosity do you see Less Wrong as significantly useful or is it closer to entertainment/habit? I've found myself thinking along the same lines as I start thinking about starting my PhD program etc. The utility of Less Wrong seems to be a kind of double-edged sword. On the one hand, some of the content is really insightful and exposes me to ideas I wouldn't otherwise encounter. On the other hand, there is such an incredible amount of low-quality content that I worry that I'm learning bad practices.

Comment by Zachary Robertson (zachary-robertson) on Developmental Stages of GPTs · 2020-08-16T13:48:49.728Z · LW · GW

The paper doesn't draw the causal diagram "Power → instrumental convergence", it gives sufficient conditions for power-seeking being instrumentally convergent. Cycle reachability preservation is one of those conditions.

This definitely feels like the place where I'm missing something. What is the formal definition of 'power seeking'? My understanding is that power is the rescaled value function, which is decreasing in the limit of farsightedness and, in the context of terminal-state reachability, always goes to zero. The agent literally gives up power to achieve its goal.

Now, I realize this might just be naming convention confusion. I do, I think, understand the idea that preserving cycle reachability could be instrumental. However,

Cycle reachability preservation is one of those conditions.

this seems circular to me. My understanding of figure 7 of your paper indicates that cycle reachability cannot be a sufficient condition.

You can formalize a kind of "alignment capability" by introducing a joint distribution over the human's goals and the induced agent goals

This is very interesting to me. Thank you for sharing. I wonder what you mean by,

The point isn't that alignment is impossible, but that you have to hit a low-measure set of goals which will give you aligned or non-power-seeking behavior.

Given your definitions it's clear that the set of aligned goals must be low-measure. Also, by your reasoning, 'non-power-seeking behavior' is not instrumental. However, in a curriculum, power-seeking must be instrumental or else the agent is less likely to achieve its goals. It seems there's a two-out-of-three condition (aligned/general/non-power-seeking) here. My philosophy is that aligned/general is OK based on a shared (?) premise that,

If the rewards are ε-close in sup-norm, then you can get nice regret bounds, sure.
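For reference, the standard argument behind that bound (my reconstruction, not from the thread; ε is the sup-norm tolerance the quote refers to):

```latex
% If two reward functions agree to within \epsilon at every state, a fixed
% policy's discounted returns under them differ by at most the per-step gap
% summed over the discounted horizon:
\|R_1 - R_2\|_\infty \le \epsilon
\quad\Longrightarrow\quad
\bigl|V^{\pi}_{R_1}(s) - V^{\pi}_{R_2}(s)\bigr|
\;\le\; \sum_{t=0}^{\infty} \gamma^{t}\,\epsilon
\;=\; \frac{\epsilon}{1-\gamma}
\qquad \text{for every policy } \pi \text{ and state } s.
```

Taking maxima over policies gives the same bound for optimal values, so executing the optimal policy of one reward loses at most 2ε/(1−γ) under the other.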

Comment by Zachary Robertson (zachary-robertson) on Developmental Stages of GPTs · 2020-08-15T15:11:20.004Z · LW · GW

Thanks for the comment! I think max-ent brings up a related point. In IRL we observed behavior and infer a reward function (using max-ent also?). Ultimately, there is a relationship between state/action frequency and reward. This would considerably constrain the distribution of reward functions to be considered in instrumental/power analysis.

I think I get confused about the usage of power the most. It seems like you can argue that given a random reward to optimize the agent will try to avoid getting turned off without invoking power. If there's a collection of 'turned-off' terminal states where the agent receives no further reward for all time then every optimized policy will try to avoid such a state. It seems as though we could define for each and then we'd have,

It seems like this would extend out to a full definition. The advantage here being that you can say, “If one action in this state is more instrumental than another then the return is likely to be greater as well”.
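The 'turned-off' claim is easy to sanity-check in the simplest possible MDP. This is my own toy construction, not from the paper: a live state that pays some sampled reward each step, versus an absorbing off state that pays nothing forever.

```python
import random

GAMMA = 0.9

def optimal_action(r_live: float) -> str:
    # From the live state: 'stay' collects r_live every step forever,
    # 'off' jumps to the absorbing state and collects 0 forever.
    v_stay = r_live / (1 - GAMMA)  # geometric series of discounted rewards
    v_off = 0.0
    return "stay" if v_stay > v_off else "off"

random.seed(0)
rewards = [random.uniform(0.01, 1.0) for _ in range(10_000)]
frac_avoiding_shutdown = sum(optimal_action(r) == "stay" for r in rewards) / len(rewards)
print(frac_avoiding_shutdown)  # 1.0: every sampled positive reward avoids 'off'
```

With reward distributions that allow negative rewards the fraction drops, which is the sense in which the claim depends on the distribution over reward functions.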

I imagine that this is sufficient for the catastrophic power-stealing incentives

I'm not confident analysis in the single-agent case extends to the multi-agent setting. If our goal is fixed as and the agent's varies then I might argue it's instrumental for us to align the agent's goal with ours and vice versa. In general, I'd suspect that there are goals we could give the agent that significantly reduce our gain. However, I'd also suspect the opposite.

Say we have the capability to introduce a second agent with a reward . Would we want to introduce the agent? It seems reasonable to argue that we would if we could guarantee . There might be a way to argue over randomness and say this would double our gain. More speculatively, what if ?

Comment by Zachary Robertson (zachary-robertson) on Developmental Stages of GPTs · 2020-08-12T01:07:40.240Z · LW · GW

I think this is a slight misunderstanding of the theory in the paper.

I disagree. What I'm trying to do is outline a reinterpretation of the 'power seeking' claim. I'm citing the pre-task section and theorem 17 to insist that power-seeking can only really happen in the pre-task because,

The way the theory does this is by saying that first a reward function is drawn from the distribution, then it is given to the agent, then the agent thinks really hard, and then the agent executes the optimal policy.

The agent is done optimizing before the main portion of the paper even begins. I do not see how the agent 'seeks' out powerful states because, as you say, the agent is fixed. Now, when you say,

If we do not know an agent's goal, but we know that the agent knows its goal and is optimal w.r.t it, then from our perspective the agent is more likely to go to higher-power states. (From the agent's perspective, there is no probability, it always executes the deterministic perfect policy for its reward function.)

My issue is that Figure 19 shows an example where the agent doesn't display this behavior. Tautologically, the agent tends to do what is instrumentally convergent. If power were tied to instrumental convergence then we could also say the agent tends to do what is powerful. However, it seems as though a state can be arbitrarily powerful without having the instrumental property, which breaks the analogy.

From here I could launch a counter-argument: if power can be arbitrarily removed from the instrumental convergence phenomenon, then agent 'wireheading', while a powerful state, is sufficiently out of the way of most goals that the agent most likely won't pursue it. To be clear, I don't have any strong opinions; I'm just confused about these interpretive details.

Comment by Zachary Robertson (zachary-robertson) on Developmental Stages of GPTs · 2020-08-11T18:30:15.585Z · LW · GW

I appreciate the more concrete definition of IC presented here. However, I have an interpretation that is a bit different from you. I'm following the formal presentation.

My base understanding is that a cycle with max average reward is optimal. This is essentially just a definition. In the case where the agent doesn't know the reward function, it seems clear that the agent ought to position itself in a state which gives it access to as many of these cycles as possible.

In your paper, theorem 19 suggests that given a choice between two sets of 1-cycles, the agent is more likely to select the larger set. This makes sense. What doesn't make sense is the conclusion (theorem 17) that the agent selects states with more power. This is because at the very start of the paper it's mentioned that,

As an alternative motivation, consider an agent in a communicating MDP which is periodically assigned a task from a known distribution . Between tasks, the agent has as much downtime as required. To maximize return over time, the optimal policy during downtime is to navigate to the state with maximal .

According to theorem 17, losing access to states means that power goes down (or stays constant). This seems to indicate power (cycle access) is really some sort of Lyapunov function for the dynamics. So at the outset, it seems clear that the agent will prefer states that maximize power, but then as soon as a determination is made on what the actual reward function is, power goes down, not up.

What I'm trying to point out here is that I find the distinction between pre-task optimization and execution to be loose. This is to such a degree that I find myself drawing the exact opposite conclusion: agents optimizing a generic reward will tend to give-up power.

At the moment, I find myself agreeing with the idea that an agent unaware of its task will seek power, but also concluding that an agent aware of its task will give up power. My current opinion is that power-seeking behavior is concentrated in the pre-task step. Giving the AI unrestricted 'free time' to optimize with should 'never' be allowed. Now, I could be misunderstanding parts of the paper, but hopefully I've made things clear enough!

Comment by Zachary Robertson (zachary-robertson) on How will internet forums like LW be able to defend against GPT-style spam? · 2020-08-09T00:24:12.430Z · LW · GW

The fact that you and Zachary can't see a talk about countries without pattern matching into race seems illustrative of how screwed up the discourse.

Maybe? On further consideration, it seems you are the one pattern matching by making this generalization about the state of discourse.

It's not central but it helps people have models with gears to be able to visualize supply chains.

My point here was simply to get you to unpack your comment so that I could point out that what you said could've been said more clearly another way (i.e. the original was not gears-level).

This here is where the problem ultimately was. You think an explicit reference to 'poor Indian' specifically contributes to a gears-level understanding of some kind. Otherwise, you would've reworded, given that I'm arguing that these criteria are irrelevant. I'm of the opinion you've taken a half-essay to respond to a single comment I left a week ago because you didn't provide that gears-level understanding in the first place. Instead you left a place-holder that I had to essentially prod you to unpack further.

Now that this has been done, wouldn't you agree it'd be wholly more accurate to replace your oversimplification with what you surely agree is the more accurate characterization determined in this comment section? If so, I think you're forced to admit that your original comment in fact did not provide the gears level understanding you think it did by virtue of the fact I got you to unpack the comment into actual gears.

Comment by Zachary Robertson (zachary-robertson) on Analyzing the Problem GPT-3 is Trying to Solve · 2020-08-07T12:52:32.933Z · LW · GW

I've argued that GPT3 can do a form of boosting if you pair it with an output filter. In terms of the language introduced here we have something like where filters the output with some probability according to . So if we have strong reason to believe that a good enough prompt exists then as we iterate the prompt GPT3 will get better at matching the output which allows for improvement.
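The generate-then-filter loop can be sketched like this; the generator and filter below are deterministic toy stand-ins I made up, not a real GPT-3 client:

```python
from typing import Callable, Optional

def filtered_generate(
    generate: Callable[[str], str],
    accept: Callable[[str], bool],
    prompt: str,
    max_tries: int = 20,
) -> Optional[str]:
    """Rejection-sample model outputs until one passes the filter."""
    for _ in range(max_tries):
        candidate = generate(prompt)
        if accept(candidate):
            return candidate
    return None  # filter never fired; caller decides what to do

# Toy stand-ins: a 'model' that emits the digits 0, 1, 2, ... in order, and a
# filter that only accepts digits of at least 4.
counter = iter(range(100))
result = filtered_generate(
    generate=lambda prompt: str(next(counter) % 10),
    accept=lambda s: int(s) >= 4,
    prompt="give me a digit of at least 4",
)
print(result)  # '4': the first four candidates 0..3 are rejected
```

The boosting intuition is that accepted outputs can be folded back into the prompt, so the conditional distribution drifts toward the filter over iterations.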

Comment by Zachary Robertson (zachary-robertson) on Infinite Data/Compute Arguments in Alignment · 2020-08-05T01:39:37.309Z · LW · GW

Perhaps it's worth explicitly noting the relatedness to the canonical rendition of The Bitter Lesson. (Searching and Learning win in the long run)

Comment by Zachary Robertson (zachary-robertson) on is gpt-3 few-shot ready for real applications? · 2020-08-03T22:45:36.587Z · LW · GW

So storage no longer scales badly with the number of operations you define. However, latency still does, and latency per call is now much larger, so this might end up being as much of a constraint. The exact numbers – not well understood at this time – are crucial: in real life the difference between 0.001 seconds, 0.1 seconds, 1 second, and 10 seconds will make or break your project.

This does seem to be a big issue for practical applications. Yet, I'm very much of the opinion that the API is more about exploring areas where fine-tuning would be useful. As you note, OpenAI does both. I'd assume a common use pattern will end up being something like: use few-shot, release, collect data, fine-tune, rinse-repeat.

(-3) Unlike supervised learning, there’s no built-in mechanism where you continually improve as your application passively gathers data during usage.

I think it's worth remembering that OpenAI is getting incredibly valuable data right now from the API. Adding more data about how people interact with the model seems completely doable with the centralization setup OpenAI has.

Comment by Zachary Robertson (zachary-robertson) on Sufficiently Advanced Language Models Can Do Reinforcement Learning · 2020-08-03T13:32:47.244Z · LW · GW

This paper looks interesting. My understanding is that this paper implemented a form of fine-tuning. However, learning the reward function does not seem to be few-shot whereas GPT3 does few-shot pretty well. That’s the main difference here as I see it.

It seems like there’s slow adaptation (this paper), which is useful for more complicated tasks, and fast adaptation (the method here), which is useful for disposable tasks. I’d think a combination of both approaches is needed. For example, a module that tracks repeatedly occurring tasks could start a larger buffer to perform slow adaptation.

Perhaps on a meta-level fine tuning GPT3 to few-shot inverse reinforcement learning would be an example of what could be possible with combining both approaches?

Comment by Zachary Robertson (zachary-robertson) on Sufficiently Advanced Language Models Can Do Reinforcement Learning · 2020-08-02T23:56:24.439Z · LW · GW

Correct. That’s why the sections on classification and RL are separate. Classification tasks are a subclass of RL. A recurrent task need not be a classification task. In fact, I’d go further and say there’s still a huge difference between having an agent that can do RL and having an AGI. That’s why I put such speculation at the end.

Having said all that, it seems plausible to me that a language model might be able to reason about what modules it needs and then design them. I implicitly believe this to be the case, but perhaps I could’ve been more explicit. This is more of an empirical question, but if that were possible the difference between that model and AGI would become much smaller in my opinion.

Comment by Zachary Robertson (zachary-robertson) on Sufficiently Advanced Language Models Can Do Reinforcement Learning · 2020-08-02T18:54:19.726Z · LW · GW

Hopefully that's fixed! I wrote this as quickly as possible so there may be many tiny errors. Apologies. Let me know if anything else is wrong.

Comment by Zachary Robertson (zachary-robertson) on Power as Easily Exploitable Opportunities · 2020-08-02T12:16:41.157Z · LW · GW

Oh, if you read Understand from Stories of Your Life and Others by Ted Chiang you end up with a scenario where a human ends up finding a way to hack biological feedback loops into other people. At least, that’s what I immediately thought of when I read this.

Comment by Zachary Robertson (zachary-robertson) on Would AGIs parent young AGIs? · 2020-08-02T02:20:24.085Z · LW · GW

You might be interested in reading The Lifecycle of Software Objects by Ted Chiang

Comment by Zachary Robertson (zachary-robertson) on How will internet forums like LW be able to defend against GPT-style spam? · 2020-07-29T13:55:23.915Z · LW · GW

It's stereotyping to assume X will copy-paste a lot of posts per hour for little money where X is actually based on class/race status. Also, it's not central to your point so it seems easy to just remove.

Comment by Zachary Robertson (zachary-robertson) on How will internet forums like LW be able to defend against GPT-style spam? · 2020-07-29T12:21:05.784Z · LW · GW

I think the stereotyping (‘poor Indian’) is unnecessary to your point.

Comment by Zachary Robertson (zachary-robertson) on You Can Probably Amplify GPT3 Directly · 2020-07-27T13:41:43.080Z · LW · GW

Thanks! I forgot to do this. Luckily I can go back through the run and put this is in. There is ambiguity whenever it auto-completes, but I hope I did a decent job of noting where this is happening.

Comment by Zachary Robertson (zachary-robertson) on You Can Probably Amplify GPT3 Directly · 2020-07-27T13:09:49.102Z · LW · GW

You could prompt with “Q:” + (content) and then “A:”

I use the default settings on the temperature, but I do cut it off after it finishes an answer. However, you likely won’t get my exact results unless you literally copy the instances. Moreover, if you gave up after the first response, I think you might’ve given up too quickly. You can respond to it and communicate more information, as I did. The above really was what I got on the first try. It’s not perfect, but that’s the point. You can teach it. It’s not “it works” or “it doesn’t work”.

I don’t think there are tutorials, but perhaps in due time someone (maybe me) will get to that. I also feel like ‘trying’ to get it to do something might be a sub-optimal approach. This is a subtle difference, but my intent here was to get it to confirm it understood what I was asking by answering questions.

Comment by Zachary Robertson (zachary-robertson) on You Can Probably Amplify GPT3 Directly · 2020-07-27T01:33:09.293Z · LW · GW

I agree. Coming up with the right prompts was not trivial. I almost quit several times. Yet, there is a science to this and I think it’ll become more important to turn our focus away from the spectacle aspects of GPT and more towards reproducibility. More so if the way forward is via interrelated instances of GPT.

As an aside, critique seems much easier than generation. I’m cautiously optimistic about prompting GPT instances to “check” output.

Comment by Zachary Robertson (zachary-robertson) on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-23T01:23:33.970Z · LW · GW

My problem is that this doesn't seem to scale. I like the idea of visual search, but I also realize you're essentially bit-rate limited in what you can communicate. For example, I'd about give up if I had to write my reply to you using a topic model. Other places in this thread mention semi-supervised learning. I do agree with the idea of taking a prompt and auto-generating the relevant large prompt that at the moment is manually being written in.

Comment by Zachary Robertson (zachary-robertson) on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-22T12:00:43.985Z · LW · GW

Thanks for the link! I’ll partially accept the variations example. That seems to qualify as “show me what you learned”. But I’m not sure if that counts as an interface simply because of the lack of interactivity/programmability.

Comment by Zachary Robertson (zachary-robertson) on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-22T02:19:57.463Z · LW · GW

Have we ever figured out a way to interface with what something has learned that doesn't involve language prompts? I'm serious. What other options are you trying to hint at? I think manipulating hidden layers is a terrible approach, but I won't expound on that here.

Comment by Zachary Robertson (zachary-robertson) on Idea: Imitation/Value Learning AIXI · 2020-07-04T21:58:34.540Z · LW · GW

I agree with what you’re saying. Perhaps, I’m being a bit strong. I’m mostly talking about ambitious value learning in an open-ended environment. The game of Go doesn’t have inherent computing capability so anything the agent does is rather constrained to begin with. I’d hope (guess) that alignment in similarly closed environments is achievable. I’d also like to point out that in such scenarios I’d expect it to be normally possible to give exact goal descriptions rendering value learning superfluous.

In theory, I’m actually onboard with a weakly superhuman AI. I’m mostly skeptical of the general case. I suppose that makes me sympathetic to approaches that iterate/collectivize things already known to work.

Comment by Zachary Robertson (zachary-robertson) on Idea: Imitation/Value Learning AIXI · 2020-07-04T02:19:16.342Z · LW · GW

However, why should you expect imitation learning to yield a "better" policy than value learning, according to human values?

I feel like this is sneaking in the assumption that we're going to partition the policy into an optimization step and a value learning step. Say we train a policy using data sampled from the human policy; then my point is that the imitation policy generalizes to the human policy optimally. Value learning doesn't do this. In the context of algorithmic complexity, value learning inserts a prior about how a policy ought to be structured.

On a philosophical front, I'm of the opinion that any error in defining "human values" will blow up arbitrarily if given to an optimizer with arbitrary capability. Thus, the only way to safely work with this inductive bias is to constrain the capability of the optimizer. If this is done correctly, I'd assume the agent will only be barely superhuman according to "human values". These are extra steps and regularizations that effectively remove the inductive bias with the promise that we can then control how "superhuman" the agent will be. My conclusion (tentatively) is that there is no way to arbitrarily extrapolate human values and doing so, even a little, introduces risk.

Comment by Zachary Robertson (zachary-robertson) on Idea: Imitation/Value Learning AIXI · 2020-07-04T00:37:45.464Z · LW · GW

I guess I'm confused by your third point. It seems clear that AIXI optimized on any learned reward function will have superhuman performance. However, AIXI is completely unaligned via wireheading. My point with the Kolmogorov argument is that AIXI-L is much more likely to behave reasonably than AIXI-VL. Almost by definition, AIXI-L will generalize most similarly to a human. Moreover, any value learning attempt will have worse generalization capability. I'm hesitant, but it seems I should conclude value alignment is a non-starter.

Comment by Zachary Robertson (zachary-robertson) on Superexponential Historic Growth, by David Roodman · 2020-06-19T22:43:57.320Z · LW · GW

It’d be nice if they would’ve plotted the projected singularity date using partial slices (filtrations) of the data. In other words, if we did the analysis in year X what is the projected singularity year Y?

Comment by Zachary Robertson (zachary-robertson) on Inaccessible information · 2020-06-04T04:50:57.739Z · LW · GW

As an aside, it really seems like the core issue rests with unease about using machine generated concepts to predict things about the world. Yet, the truth is that humans normally operate in the very way you don't want. We explain reasoning by heavily sanitizing our thought process. For example, I came up with several bad intuitions for why ML is a reasonable solution and that informed me that humans are also bad at the thing you think we're good at. See how convoluted that last sentence was? The point is that when I formally type up my thoughts I'm also sanitizing them so you don't see reasoning structure I don't want you to see.

The basic motivation to avoid sharing sensitive information leads me to maintain somewhat high differential privacy in communication. I'd like the result to be rather disentangled from the process. Well, we certainly agree that the process largely consists of inaccessible information. Trivially, you're only judging my writing based on what you can read...even if you make a latent model about me. Putting these two facts together, disagreement/debate only really exists if we have differing internal models.

Using your post as a start, I'd say the strategic advantage of inaccessible information incentivizes agents (instrumentally) to have internal models of the kind I just described. This means that differential privacy would have instrumental value as a mechanism to prevent agents from peering inside one another without explicit permission from one another.

Comment by Zachary Robertson (zachary-robertson) on Inaccessible information · 2020-06-04T04:26:11.526Z · LW · GW

I actually enjoyed the post, but I'm not convinced of the relevance of this topic. You seem to be concerned that certain queries to an intelligent oracle might result in returning false, but plausible answers. On the one hand, this seems relevant. On the other hand, I struggle to actually come up with an example where this can happen. In principle, everything a model does with data is observable. You write,

At that point “picking the model that matches the data best” starts to look a lot like doing ML, and it’s more plausible that we’re going to start getting hypotheses that we don’t understand or which behave badly.

This confuses me. Modern ML is designed to engage in automatic feature generation. It turns out that engineered features introduce more bias than they're worth. For example, it's fair to point out racial bias in facial recognition technology. However, it's also sensible to argue that creating a method to automatically engineer concepts to discriminate with is a major advance in removing human bias (inaccessible information) from the process.

But, but, but then you have groups that intentionally choose to use controversial concepts, such as facial features, to infer things such as income, propensity to violence, etc. You see this is where the bait-and-switch seems to come in here. It's not the machine's fault for being used to make spurious judgments. The real culprit is poor human reasoning. So then,

...or we need to figure out some way to access the inaccessible information that “A* leads to lots of human flourishing.”

sets off an alarm bell in my head. How is this any different than trying to use ML to 'catch' terrorists via facial recognition? While I'll readily admit ML models can learn good/bad concepts to use for a downstream task, the idea that these concepts also map onto human-readable concepts seems rather tenuous.

So I conclude that the idea of making all concepts used by a machine human-readable seems dubious. You really want good/high-dimensional data that makes no assumptions about the concepts it's going to be used to model with. ML concepts come with PAC guarantees; people don't.

Comment by zachary-robertson on [deleted post] 2020-06-02T05:16:39.401Z

This is out of context, perhaps even nitpicking. It’s clear they’re talking about universality. In that context, sustainable implies allowable.

Comment by Zachary Robertson (zachary-robertson) on GPT-3: a disappointing paper · 2020-05-29T19:32:06.851Z · LW · GW

Reading this, I get the impression you had mismatched expectations of what GPT-3 would do (i.e., that it should only be reserved for essentially pseudo-AGI)...but scaling GPT to the point of diminishing returns is going to take several more years. As everyone is stressing, they don’t even fit the training data at the moment.

Comment by Zachary Robertson (zachary-robertson) on OpenAI announces GPT-3 · 2020-05-29T19:03:42.118Z · LW · GW

GPT-2 was a hype fest, while this gets silently released on arXiv. I’m starting to think there’s something real here. I think before I’d laugh at anyone who suggested GPT-2 could reason. I still think that’s true with GPT-3, but I wouldn’t laugh anymore. It seems possible massive scaling could legitimately produce a different kind of AI than anything we’ve seen yet.

Comment by Zachary Robertson (zachary-robertson) on OpenAI announces GPT-3 · 2020-05-29T19:00:29.387Z · LW · GW

While I’m not sure how easy plugging into a DRL algorithm will be, this seems to be the obvious next step. On the other hand, I suspect DRL isn’t really mature enough to work as an integrating paradigm.

Comment by Zachary Robertson (zachary-robertson) on What is your internet search methodology ? · 2020-05-23T21:07:38.714Z · LW · GW

I asked a related question and got some answers about finding things on the internet. Didn’t completely satisfy me, but my question was significantly more vague so it might help you!

Comment by Zachary Robertson (zachary-robertson) on Orthogonality · 2020-05-21T03:51:25.210Z · LW · GW

I think that a hidden assumption here is that improving a weak skill always has a positive spillover effect on other skills. There might be a hidden truth within this. Namely, sometimes unlearning things will be the best way to make progress.

Comment by Zachary Robertson (zachary-robertson) on Orthogonality · 2020-05-21T03:41:12.386Z · LW · GW

Perhaps this can be connected with another recent post. It was pointed out in Subspace Optima that when we optimize, we do so under constraints, external or internal. It seems like you had an internal constraint stopping you from optimizing over the whole space. Instead you focused on what you thought was the most correlated trait. This almost reads like an insight following the realization you’ve been optimizing a skill along an artificial sub-space.

Comment by Zachary Robertson (zachary-robertson) on What are your greatest one-shot life improvements? · 2020-05-17T21:24:53.288Z · LW · GW

Do this at the end of the day as a way to review progress?

Comment by Zachary Robertson (zachary-robertson) on What are your greatest one-shot life improvements? · 2020-05-17T21:22:59.475Z · LW · GW

Can I get clarification on what sort of emotions were problematic and/or what reactions were problematic? I’m wondering if this was rumination or in the moment reactions.