A Toy Model for Media Sharing 2020-10-17T15:59:45.652Z · score: 11 (2 votes)
KL Divergence as Code Patching Efficiency 2020-09-27T16:06:26.186Z · score: 15 (5 votes)
Sufficiently Advanced Language Models Can Do Reinforcement Learning 2020-08-02T15:32:47.894Z · score: 23 (15 votes)
Structured Tasks for Language Models 2020-07-29T14:17:59.478Z · score: 5 (2 votes)
You Can Probably Amplify GPT3 Directly 2020-07-26T21:58:53.962Z · score: 35 (15 votes)
An Old Way to Visualize Biases 2020-07-24T00:10:17.970Z · score: 4 (5 votes)
Idea: Imitation/Value Learning AIXI 2020-07-03T17:10:16.775Z · score: 3 (1 votes)
Replication Dynamics Bridge to RL in Thermodynamic Limit 2020-05-18T01:02:53.417Z · score: 6 (3 votes)
Zachary Robertson's Shortform 2020-05-06T00:42:10.113Z · score: 2 (1 votes)
What Resources on Journal Analysis are Available? 2019-12-28T20:00:11.512Z · score: 15 (5 votes)
The Planning Problem 2019-08-04T18:58:55.186Z · score: 16 (8 votes)
Is there a user's manual to using the internet more efficiently? 2019-08-04T18:51:38.818Z · score: 19 (9 votes)


Comment by zachary-robertson on Doing discourse better: Stuff I wish I knew · 2020-10-01T12:48:18.789Z · score: 1 (1 votes) · LW · GW

Yes, but StackExchange has community posts that are editable, and I think this is nice. I believe edits for normal posts work like you say.

Comment by zachary-robertson on Doing discourse better: Stuff I wish I knew · 2020-09-30T13:51:20.322Z · score: 1 (1 votes) · LW · GW

It could still be useful to see different ‘versions’ of an article and then just vote on the ones that are best.

Comment by zachary-robertson on ricraz's Shortform · 2020-08-21T00:24:38.789Z · score: 1 (1 votes) · LW · GW

Ya, totally messed that up. I meant the AI Alignment Forum, or AIAF. I think out of habit I used AN (Alignment Newsletter).

Comment by zachary-robertson on ricraz's Shortform · 2020-08-20T23:13:56.178Z · score: 7 (4 votes) · LW · GW

On AI alone (which I am using in large part because there's vaguely more consensus around it than around rationality), I think you wouldn't have seen almost any of the public write-ups (like Embedded Agency and Zhukeepa's Paul FAQ) without LessWrong

I think a distinction should be made between intellectual progress (whatever that is) and distillation. I know lots of websites that do amazing distillation of AI-related concepts. I think most people would agree that sort of work is important in order to make intellectual progress, but I also think significantly fewer people would agree distillation is intellectual progress. Having this distinction in mind, I think your examples from AI are not as convincing. Perhaps even less so once you consider that Less Wrong is often used more as a platform to share these distillations than to create them.

I think you're right that Less Wrong has some truly amazing content. However, once again, it seems a lot of these posts are not inherently from the ecosystem but are rather essentially cross-posted. If I say a lot of the content on LW is low-quality it's mostly an observation about what I expect to find from material that builds on itself. The quality of LW-style accumulated knowledge seems lower than it could be.

On a personal note, I've actively tried to explore using this site as a way to engage with research and have come to a similar opinion as Richard. The most obvious barrier is the separation between LW and AIAF. Effectively, if you're doing AI safety research, to second-order approximation you can block LW (noise) and only look at AIAF (signal). I say to second-order because anything from LW that is signal ends up being posted on AIAF anyway which means the method is somewhat error-tolerant.

This probably comes off as a bit pessimistic. Here's a concrete proposal I hope to try out soon enough. Pick a research question. Get a small group of people/friends together. Start talking about the problem and then posting on LW. Iterate until there's group consensus.

Comment by zachary-robertson on ricraz's Shortform · 2020-08-20T22:44:48.091Z · score: 6 (3 votes) · LW · GW

Setting them higher (standards) probably wouldn't result in more good content.

I broadly agree here. However, I do see the short-forms as a consistent way to skirt around this. I'd say at least 30% of the Less Wrong value proposition is the conversations I get to have. Short-forms seem to be more adapted for continuing conversations and they have a low bar for being made.

I could clarify a bit. My main problem with low quality content isn't exactly that it's 'wrong' or something like that. Mostly, the issues I'm finding most common for me are,

  1. Too many niche pre-requisites.
  2. No comments
  3. Nagging feeling post is reinventing the wheel

I think one is a ridiculously bad problem. I'm literally getting a PhD in machine learning, write about AI Safety, and still find a large number of those posts (yes AN posts) glazed in internal-jargon that makes it difficult to connect with current research. Things get even worse when I look at non-AI related things.

Two is just a tragedy of the fact the rich get richer. While I'm guilty of this also, I think that requiring posts to also post seed questions/discussion topics in the comments could go a long way to alleviate this problem. I oftentimes read a post and want to leave a comment, but then don't because I'm not even sure the author thought about the discussion their post might start.

Three is probably a bit mean. Yet, more than once I've discovered a Less Wrong concept already had a large research literature devoted to it. I think this ties in with one due to the fact niche pre-reqs often go hand-in-hand with insufficient literature review.

Comment by zachary-robertson on ricraz's Shortform · 2020-08-20T19:10:39.275Z · score: 9 (5 votes) · LW · GW

One fairly strong belief of mine is that Less Wrong's epistemic standards are not high enough to make solid intellectual progress here.

I think this is literally true. There seems to be very little ability to build upon prior work.

Out of curiosity do you see Less Wrong as significantly useful or is it closer to entertainment/habit? I've found myself thinking along the same lines as I start thinking about starting my PhD program etc. The utility of Less Wrong seems to be a kind of double-edged sword. On the one hand, some of the content is really insightful and exposes me to ideas I wouldn't otherwise encounter. On the other hand, there is such an incredible amount of low-quality content that I worry that I'm learning bad practices.

Comment by zachary-robertson on Developmental Stages of GPTs · 2020-08-16T13:48:49.728Z · score: 3 (2 votes) · LW · GW

The paper doesn't draw the causal diagram "Power → instrumental convergence", it gives sufficient conditions for power-seeking being instrumentally convergent. Cycle reachability preservation is one of those conditions.

This definitely feels like the place where I'm missing something. What is the formal definition of 'power seeking'? My understanding is that power is the rescaled value function, that it is decreasing in the limit of farsightedness, and that in the context of terminal state reachability it always goes to zero. The agent literally gives up power to achieve its goal.

Now, I realize this might just be naming convention confusion. I do, I think, understand the idea that preserving cycle reachability could be instrumental. However,

Cycle reachability preservation is one of those conditions.

this seems circular to me. My understanding of figure 7 of your paper indicates that cycle reachability cannot be a sufficient condition.

You can formalize a kind of "alignment capability" by introducing a joint distribution over the human's goals and the induced agent goals

This is very interesting to me. Thank you for sharing. I wonder what you mean by,

The point isn't that alignment is impossible, but that you have to hit a low-measure set of goals which will give you aligned or non-power-seeking behavior.

Given your definitions it's clear that the set of aligned goals must be low-measure. Also by your reasoning 'non-power seeking behavior' is not instrumental. However, in a curriculum, power-seeking must be instrumental or else the agent is less likely to achieve its goals. It seems there's a two out of three condition (aligned/general/non-power-seeking) here. My philosophy is that aligned/general is OK based on a shared (?) premise that,

If the rewards are $\epsilon$-close in sup-norm, then you can get nice regret bounds, sure.
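
For reference, here is the standard form such a bound takes. This is a sketch under assumptions not stated in the thread: a discounted setting with discount $\gamma \in [0,1)$, a true reward $R$, a learned reward $R'$, and $\pi'$ optimal for $R'$. All of these symbols are introduced here for illustration.

$$
\|R - R'\|_\infty \le \epsilon
\;\Longrightarrow\;
\left| V^{\pi}_{R}(s) - V^{\pi}_{R'}(s) \right| \le \frac{\epsilon}{1-\gamma}
\quad \text{for every policy } \pi \text{ and state } s,
$$

$$
V^{*}_{R}(s) - V^{\pi'}_{R}(s)
= \underbrace{\left(V^{\pi^*}_{R} - V^{\pi^*}_{R'}\right)(s)}_{\le\, \epsilon/(1-\gamma)}
+ \underbrace{\left(V^{\pi^*}_{R'} - V^{\pi'}_{R'}\right)(s)}_{\le\, 0}
+ \underbrace{\left(V^{\pi'}_{R'} - V^{\pi'}_{R}\right)(s)}_{\le\, \epsilon/(1-\gamma)}
\le \frac{2\epsilon}{1-\gamma}.
$$

The middle term is non-positive because $\pi'$ is optimal for $R'$, so the regret of acting on the learned reward scales linearly in $\epsilon$.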

Comment by zachary-robertson on Developmental Stages of GPTs · 2020-08-15T15:11:20.004Z · score: 3 (2 votes) · LW · GW

Thanks for the comment! I think max-ent brings up a related point. In IRL we observe behavior and infer a reward function (using max-ent also?). Ultimately, there is a relationship between state/action frequency and reward. This would considerably constrain the distribution of reward functions to be considered in instrumental/power analysis.

I think I get confused about the usage of power the most. It seems like you can argue that given a random reward to optimize the agent will try to avoid getting turned off without invoking power. If there's a collection of 'turned-off' terminal states where the agent receives no further reward for all time then every optimized policy will try to avoid such a state. It seems as though we could define for each and then we'd have,

It seems like this would extend out to a full definition. The advantage here being that you can say, “If one action in this state is more instrumental than another then the return is likely to be greater as well”.
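
The shutdown-avoidance claim above can be checked on a toy example. This is a minimal sketch, not anything from the paper: the four-state deterministic MDP, the names `ALIVE`, `OFF`, and `GAMMA`, and the choice of strictly positive random rewards are all assumptions made for illustration. With rewards that can be negative, the conclusion need not hold.

```python
import random

ALIVE = [0, 1, 2]   # ordinary states (hypothetical toy MDP)
OFF = 3             # absorbing "turned-off" state: reward 0 forever
GAMMA = 0.9         # discount factor, chosen arbitrarily

def optimal_successors(reward, sweeps=200):
    """Value iteration where every alive state may move to any state, including OFF."""
    value = {s: 0.0 for s in ALIVE + [OFF]}
    for _ in range(sweeps):
        best_next = max(value.values())
        value = {s: (reward[s] + GAMMA * best_next if s != OFF else 0.0)
                 for s in value}
    # Greedy successor state chosen from each alive state.
    best_state = max(value, key=value.get)
    return {s: best_state for s in ALIVE}

def fraction_choosing_off(trials=1000, seed=0):
    """Fraction of random positive reward functions whose optimal policy enters OFF."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        reward = {s: rng.uniform(0.01, 1.0) for s in ALIVE}
        reward[OFF] = 0.0
        policy = optimal_successors(reward)
        hits += any(nxt == OFF for nxt in policy.values())
    return hits / trials

print(fraction_choosing_off())
```

Because every alive state has strictly positive value and the off state has value zero, the greedy policy never selects shutdown, and no notion of power is needed to see that.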

I imagine that this is sufficient for the catastrophic power-stealing incentives

I'm not confident analysis in the single-agent case extends to the multi-agent setting. If our goal is fixed as and the agent's varies then I might argue it's instrumental for us to align the agent's goal with ours and vice versa. In general, I'd suspect that there are goals we could give the agent that significantly reduce our gain. However, I'd also suspect the opposite.

Say we have the capability to introduce a second agent with a reward . Would we want to introduce the agent? It seems reasonable to argue that we would if we could guarantee . There might be a way to argue over randomness and say this would double our gain. More speculatively, what if ?

Comment by zachary-robertson on Developmental Stages of GPTs · 2020-08-12T01:07:40.240Z · score: 1 (1 votes) · LW · GW

I think this is a slight misunderstanding of the theory in the paper.

I disagree. What I'm trying to do is outline a reinterpretation of the 'power seeking' claim. I'm citing the pre-task section and theorem 17 to insist that power-seeking can only really happen in the pre-task because,

The way the theory does this is by saying that first a reward function is drawn from the distribution, then it is given to the agent, then the agent thinks really hard, and then the agent executes the optimal policy.

The agent is done optimizing before the main portion of the paper even begins. I do not see how the agent 'seeks' out powerful states because, as you say, the agent is fixed. Now, when you say,

If we do not know an agent's goal, but we know that the agent knows its goal and is optimal w.r.t it, then from our perspective the agent is more likely to go to higher-power states. (From the agent's perspective, there is no probability, it always executes the deterministic perfect policy for its reward function.)

My issue is that Figure 19 shows an example where the agent doesn't display this behavior. Tautologically, the agent tends to do what is instrumentally convergent. If power was tied to instrumental convergence then we could also say the agent tends to do what is powerful. However, it seems as though a state can be arbitrarily powerful without having the instrumental property, which breaks the analogy.

From here I could launch a counter-argument: if power can be arbitrarily removed from the instrumental convergence phenomenon then agent 'wireheading', while a powerful state, is sufficiently out of the way of most goals that the agent most likely won't pursue it. To be clear, I don't have any strong opinions, I'm just confused about these interpretive details.

Comment by zachary-robertson on Developmental Stages of GPTs · 2020-08-11T18:30:15.585Z · score: 3 (2 votes) · LW · GW

I appreciate the more concrete definition of IC presented here. However, I have an interpretation that is a bit different from yours. I'm following the formal presentation.

My base understanding is that a cycle with max average reward is optimal. This is essentially just a definition. In the case the agent doesn't know the reward function, it seems clear that the agent ought to position itself in a state which gives it access to as many of these cycles as possible.

In your paper, theorem 19 suggests that given a choice between two sets of 1-cycles and the agent is more likely to select the larger set. This makes sense. What doesn't make sense is the conclusion (theorem 17) that the agent selects states with more power. This is because at the very start of the paper it's mentioned that,

As an alternative motivation, consider an agent in a communicating MDP which is periodically assigned a task from a known distribution . Between tasks, the agent has as much downtime as required. To maximize return over time, the optimal policy during downtime is to navigate to the state with maximal .

According to theorem 17, losing access to states means that power goes down (or stays constant). This seems to indicate power (cycle access) is really some sort of Lyapunov function for the dynamics. So at the outset, it seems clear that the agent will prefer states that maximize power, but then as soon as a determination is made on what the actual reward function is, power goes down, not up.

What I'm trying to point out here is that I find the distinction between pre-task optimization and execution to be loose. This is to such a degree that I find myself drawing the exact opposite conclusion: agents optimizing a generic reward will tend to give up power.

At the moment, I find myself agreeing with the idea that an agent unaware of its task will seek power, but also conclude that an agent aware of its task will give up power. My current opinion is that power-seeking behavior is concentrated in the pre-task step. Giving the AI unrestricted 'free-time' to optimize with should 'never' be allowed. Now, I could be misunderstanding parts of the paper, but hopefully I've made things clear enough!

Comment by zachary-robertson on How will internet forums like LW be able to defend against GPT-style spam? · 2020-08-09T00:24:12.430Z · score: -6 (4 votes) · LW · GW

The fact that you and Zachary can't see a talk about countries without pattern matching into race seems illustrative of how screwed up the discourse.

Maybe? On further consideration, it seems you are the one pattern matching by making this generalization about the state of discourse.

It's not central but it helps people have models with gears to be able to visualize supply chains.

My point here was simply to get you to unpack your comment so that I could point out that what you said could've been said more clearly another way (i.e. the original was not gears-level).

This here is where the problem ultimately was. You think an explicit reference to 'poor Indian' specifically contributes to a gears-level understanding of some kind. Otherwise, you would've reworded given that I'm arguing that these criteria are irrelevant. I'm of the opinion you've taken a half-essay to respond to a single comment I left a week ago because you didn't provide that gears-level understanding in the first place. Instead you left a place-holder that I had to essentially prod you to unpack further.

Now that this has been done, wouldn't you agree it'd be wholly more accurate to replace your oversimplification with what you surely agree is the more accurate characterization determined in this comment section? If so, I think you're forced to admit that your original comment in fact did not provide the gears level understanding you think it did by virtue of the fact I got you to unpack the comment into actual gears.

Comment by zachary-robertson on Analyzing the Problem GPT-3 is Trying to Solve · 2020-08-07T12:52:32.933Z · score: 1 (1 votes) · LW · GW

I've argued that GPT3 can do a form of boosting if you pair it with an output filter. In terms of the language introduced here we have something like where filters the output with some probability according to . So if we have strong reason to believe that a good enough prompt exists then as we iterate the prompt GPT3 will get better at matching the output which allows for improvement.
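
The filter-then-iterate loop described above can be sketched with a toy stand-in for the model. Everything here is hypothetical and chosen only for illustration: `toy_model` is not GPT3 (it simply echoes prompt examples more often as the prompt grows), and `accept` is an arbitrary filter that keeps even digits.

```python
import random

def toy_model(prompt_examples, rng):
    """Hypothetical stand-in for a language model: it copies a prompt example
    with a probability that grows with the prompt size, otherwise it guesses."""
    p_copy = len(prompt_examples) / (len(prompt_examples) + 2)
    if prompt_examples and rng.random() < p_copy:
        return rng.choice(prompt_examples)
    return rng.randint(0, 9)

def accept(output):
    """Hypothetical output filter: keep only even digits."""
    return output % 2 == 0

def amplify(rounds, samples_per_round, rng):
    """Sample, filter, and feed accepted outputs back in as few-shot examples.
    Returns the acceptance rate observed in each round."""
    examples = []
    rates = []
    for _ in range(rounds):
        outputs = [toy_model(examples, rng) for _ in range(samples_per_round)]
        kept = [o for o in outputs if accept(o)]
        rates.append(len(kept) / samples_per_round)
        examples.extend(kept)  # accepted outputs become part of the next prompt
    return rates

rates = amplify(rounds=3, samples_per_round=200, rng=random.Random(0))
print(rates)
```

The acceptance rate starts near the filter's base rate and rises once accepted outputs are in the prompt. Whether a real language model behaves this way depends on whether a good enough prompt exists, which is the assumption in the comment above.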

Comment by zachary-robertson on Infinite Data/Compute Arguments in Alignment · 2020-08-05T01:39:37.309Z · score: 7 (4 votes) · LW · GW

Perhaps it's worth explicitly noting the relatedness to the canonical rendition of The Bitter Lesson. (Searching and Learning win in the long run)

Comment by zachary-robertson on is gpt-3 few-shot ready for real applications? · 2020-08-03T22:45:36.587Z · score: 3 (2 votes) · LW · GW

So storage no longer scales badly with the number of operations you define. However, latency still does, and latency per call is now much larger, so this might end up being as much of a constraint. The exact numbers – not well understood at this time – are crucial: in real life the difference between 0.001 seconds, 0.1 seconds, 1 second, and 10 seconds will make or break your project.

This does seem to be a big issue for practical applications. Yet, I'm very much of the opinion that the API is more about exploring areas where fine-tuning would be useful. As you note, OpenAI does both. I'd assume a common use pattern will end up being something like: use few-shot, release, collect data, fine-tune, rinse-repeat.

(-3) Unlike supervised learning, there’s no built-in mechanism where you continually improve as your application passively gathers data during usage.

I think it's worth remembering that OpenAI is getting incredibly valuable data right now from the API. Adding more data about how people interact with the model seems completely doable with the centralization setup OpenAI has.

Comment by zachary-robertson on Sufficiently Advanced Language Models Can Do Reinforcement Learning · 2020-08-03T13:32:47.244Z · score: 5 (2 votes) · LW · GW

This paper looks interesting. My understanding is that this paper implemented a form of fine-tuning. However, learning the reward function does not seem to be few-shot whereas GPT3 does few-shot pretty well. That’s the main difference here as I see it.

It seems like there’s slow adaption (this paper) which is useful for more complicated tasks and fast adaption (the method here) that is useful for disposable tasks. I’d think a combination of both approaches is needed. For example, a module that tracks repeatedly occurring tasks can start a larger buffer to perform slow adaption.

Perhaps on a meta-level fine tuning GPT3 to few-shot inverse reinforcement learning would be an example of what could be possible with combining both approaches?

Comment by zachary-robertson on Sufficiently Advanced Language Models Can Do Reinforcement Learning · 2020-08-02T23:56:24.439Z · score: 2 (2 votes) · LW · GW

Correct. That’s why the sections on classification and RL are separate. Classification tasks are a subclass of RL. A recurrent task need not be a classification task. In fact, I’d go further and say there’s still a huge difference between having an agent that can do RL and having an AGI. That’s why I put such speculation at the end.

Having said all that, it seems plausible to me that a language model might be able to reason about what modules it needs and then design them. I implicitly believe this to be the case, but perhaps I could’ve been more explicit. This is more of an empirical question, but if that were possible the difference between that model and AGI would become much smaller in my opinion.

Comment by zachary-robertson on Sufficiently Advanced Language Models Can Do Reinforcement Learning · 2020-08-02T18:54:19.726Z · score: 1 (1 votes) · LW · GW

Hopefully that's fixed! I wrote this as quickly as possible so there may be many tiny errors. Apologies. Let me know if anything else is wrong.

Comment by zachary-robertson on Power as Easily Exploitable Opportunities · 2020-08-02T12:16:41.157Z · score: 1 (1 votes) · LW · GW

Oh, if you read Understand from Stories of Your Life and Others by Ted Chiang you end up with a scenario where a human finds a way to hack biological feedback loops into other people. At least, that’s what I immediately thought of when I read this.

Comment by zachary-robertson on Would AGIs parent young AGIs? · 2020-08-02T02:20:24.085Z · score: 1 (1 votes) · LW · GW

You might be interested in reading The Lifecycle of Software Objects by Ted Chiang

Comment by zachary-robertson on How will internet forums like LW be able to defend against GPT-style spam? · 2020-07-29T13:55:23.915Z · score: 2 (2 votes) · LW · GW

It's stereotyping to assume X will copy-paste a lot of posts per hour for little money where X is actually based on class/race status. Also, it's not central to your point so it seems easy to just remove.

Comment by zachary-robertson on How will internet forums like LW be able to defend against GPT-style spam? · 2020-07-29T12:21:05.784Z · score: 1 (6 votes) · LW · GW

I think the stereotyping (‘poor Indian’) is unnecessary to your point.

Comment by zachary-robertson on You Can Probably Amplify GPT3 Directly · 2020-07-27T13:41:43.080Z · score: 2 (2 votes) · LW · GW

Thanks! I forgot to do this. Luckily I can go back through the run and put this in. There is ambiguity whenever it auto-completes, but I hope I did a decent job of noting where this is happening.

Comment by zachary-robertson on You Can Probably Amplify GPT3 Directly · 2020-07-27T13:09:49.102Z · score: 5 (4 votes) · LW · GW

You could prompt with “Q:” + (content) and then “A:”

I use the default settings on the temperature, but I do cut it off after it finishes an answer. However, you likely won’t get my exact results unless you literally copy the instances. Moreover, if you gave up after the first response I think you might’ve given up too quickly. You can respond to it and communicate more information, as I did. The above really was what I got on the first try. It’s not perfect, but that’s the point. You can teach it. It’s not “it works” or “it doesn’t work”.
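
The prompt format and the cut-off step can be sketched as follows. The helper names `build_prompt` and `first_answer` are hypothetical and invented for illustration; no real API call is shown, and the completion string in the usage lines is made up.

```python
def build_prompt(history, question):
    """Format earlier (question, answer) turns plus a new question as Q:/A: text."""
    lines = []
    for q, a in history:
        lines.append(f"Q: {q}")
        lines.append(f"A: {a}")
    lines.append(f"Q: {question}")
    lines.append("A:")  # left open so the model completes the answer
    return "\n".join(lines)

def first_answer(completion):
    """Cut a raw completion off once the model starts a new 'Q:' turn."""
    return completion.split("\nQ:", 1)[0].strip()

prompt = build_prompt([("2+2?", "4")], "3+3?")
print(prompt)
print(first_answer(" 6\nQ: 4+4?\nA: 8"))
```

Keeping the history as a list of turns makes it easy to respond to a weak first answer by appending a correction and prompting again, rather than starting over.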

I don’t think there are tutorials, but perhaps in due time someone (maybe me) will get to that. I also feel like ‘trying’ to get it to do something might be a sub-optimal approach. This is a subtle difference, but my intent here was to get it to confirm it understood what I was asking by answering questions.

Comment by zachary-robertson on You Can Probably Amplify GPT3 Directly · 2020-07-27T01:33:09.293Z · score: 6 (4 votes) · LW · GW

I agree. Coming up with the right prompts was not trivial. I almost quit several times. Yet, there is a science to this and I think it’ll become more important to turn our focus away from the spectacle aspects of GPT and more towards reproducibility. More so if the way forward is via interrelated instances of GPT.

As an aside, critique seems much easier than generation. I’m cautiously optimistic about prompting GPT instances to “check” output.

Comment by zachary-robertson on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-23T01:23:33.970Z · score: 3 (2 votes) · LW · GW

My problem is that this doesn't seem to scale. I like the idea of visual search, but I also realize you're essentially bit-rate limited in what you can communicate. For example, I'd about give up if I had to write my reply to you using a topic model. Other places in this thread mention semi-supervised learning. I do agree with the idea of taking a prompt and auto-generating the relevant large prompt that at the moment is manually being written in.

Comment by zachary-robertson on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-22T12:00:43.985Z · score: 3 (2 votes) · LW · GW

Thanks for the link! I’ll partially accept the variations example. That seems to qualify as “show me what you learned”. But I’m not sure if that counts as an interface simply because of the lack of interactivity/programmability.

Comment by zachary-robertson on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-22T02:19:57.463Z · score: 2 (2 votes) · LW · GW

Have we ever figured out a way to interface with what something has learned that doesn't involve language prompts? I'm serious. What other options are you trying to hint at? I think manipulating hidden layers is a terrible approach, but I won't expound on that here.

Comment by zachary-robertson on Idea: Imitation/Value Learning AIXI · 2020-07-04T21:58:34.540Z · score: 1 (1 votes) · LW · GW

I agree with what you’re saying. Perhaps, I’m being a bit strong. I’m mostly talking about ambitious value learning in an open-ended environment. The game of Go doesn’t have inherent computing capability so anything the agent does is rather constrained to begin with. I’d hope (guess) that alignment in similarly closed environments is achievable. I’d also like to point out that in such scenarios I’d expect it to be normally possible to give exact goal descriptions rendering value learning superfluous.

In theory, I’m actually onboard with a weakly superhuman AI. I’m mostly skeptical of the general case. I suppose that makes me sympathetic to approaches that iterate/collectivize things already known to work.

Comment by zachary-robertson on Idea: Imitation/Value Learning AIXI · 2020-07-04T02:19:16.342Z · score: 1 (1 votes) · LW · GW

However, why should you expect to be a "better" policy than according to human values?

I feel like this is sneaking in the assumption that we're going to partition the policy into an optimization step and a value learning step. Say we train using data sampled from , then my point is that generalizes to optimally. Value learning doesn't do this. In the context of algorithmic complexity, value learning inserts a prior about how a policy ought to be structured.

On a philosophical front, I'm of the opinion that any error in defining "human values" will blow up arbitrarily if given to an optimizer with arbitrary capability. Thus, the only way to safely work with this inductive bias is to constrain the capability of the optimizer. If this is done correctly, I'd assume the agent will only be barely superhuman according to "human values". These are extra steps and regularizations that effectively remove the inductive bias with the promise that we can then control how "superhuman" the agent will be. My conclusion (tentatively) is that there is no way to arbitrarily extrapolate human values and doing so, even a little, introduces risk.

Comment by zachary-robertson on Idea: Imitation/Value Learning AIXI · 2020-07-04T00:37:45.464Z · score: 1 (1 votes) · LW · GW

I guess I'm confused by your third point. It seems clear that AIXI optimized on any learned reward function will have superhuman performance. However, AIXI is completely unaligned via wireheading. My point with the Kolmogorov argument is that AIXIL is much more likely to behave reasonably than AIXVL. Almost by definition, AIXIL will generalize most similarly to a human. Moreover, any value learning attempt will have worse generalization capability. I'm hesitant, but it seems I should conclude value alignment is a non-starter.

Comment by zachary-robertson on Superexponential Historic Growth, by David Roodman · 2020-06-19T22:43:57.320Z · score: 3 (2 votes) · LW · GW

It’d be nice if they would’ve plotted the projected singularity date using partial slices (filtrations) of the data. In other words, if we did the analysis in year X what is the projected singularity year Y?
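
The analysis I have in mind can be sketched like this. It is a minimal illustration under loud assumptions: it uses a plain hyperbolic model y(t) = C / (T - t), which is not the stochastic model in the report, and it runs on synthetic data generated from a made-up singularity year `TRUE_T`. The function names are hypothetical.

```python
def fit_singularity_year(years, values):
    """Fit a least-squares line to (year, 1/value); its root is the blow-up year.
    Under y = C / (T - t), 1/y = (T - t) / C is linear in t and hits zero at t = T."""
    inv = [1.0 / v for v in values]
    n = len(years)
    mean_t = sum(years) / n
    mean_y = sum(inv) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(years, inv))
             / sum((t - mean_t) ** 2 for t in years))
    intercept = mean_y - slope * mean_t
    return -intercept / slope

def projections_by_cutoff(years, values, cutoffs):
    """For each cutoff year X, refit using only data up to X and report projected Y."""
    out = {}
    for cutoff in cutoffs:
        kept = [(t, v) for t, v in zip(years, values) if t <= cutoff]
        ts, vs = zip(*kept)
        out[cutoff] = fit_singularity_year(ts, vs)
    return out

# Synthetic series (an assumption, not real data) with a known blow-up year.
TRUE_T = 2050.0
years = list(range(1900, 2001, 10))
values = [100.0 / (TRUE_T - t) for t in years]
projections = projections_by_cutoff(years, values, [1950, 1975, 2000])
print(projections)
```

On noiseless synthetic data every slice recovers the same year. On real data the interesting output is how much the projected year Y drifts as the cutoff X moves, which is a direct check on the stability of the extrapolation.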

Comment by zachary-robertson on Inaccessible information · 2020-06-04T04:50:57.739Z · score: 1 (1 votes) · LW · GW

As an aside, it really seems like the core issue rests with unease about using machine generated concepts to predict things about the world. Yet, the truth is that humans normally operate in the very way you don't want. We explain reasoning by heavily sanitizing our thought process. For example, I came up with several bad intuitions for why ML is a reasonable solution and that informed me that humans are also bad at the thing you think we're good at. See how convoluted that last sentence was? The point is that when I formally type up my thoughts I'm also sanitizing them so you don't see reasoning structure I don't want you to see.

The basic motivation to avoid sharing sensitive information leads me to maintain somewhat high differential privacy in communication. I'd like the result to be rather disentangled from the process. Well, we certainly agree that the process largely consists of inaccessible information. Trivially, you're only judging my writing based on what you can read...even if you make a latent model about me. Putting these two facts together, disagreement/debate only really exists if we have differing internal models.

Using your post as a start, I'd say the strategic advantage of inaccessible information incentivizes agents (instrumentally) to have internal models of the kind I just described. This means that differential privacy would have instrumental value as a mechanism to prevent agents from peering inside one another without explicit permission.

Comment by zachary-robertson on Inaccessible information · 2020-06-04T04:26:11.526Z · score: 1 (1 votes) · LW · GW

I actually enjoyed the post, but I'm not convinced of the relevance of this topic. You seem to be concerned that certain queries to an intelligent oracle might result in returning false, but plausible answers. On the one hand, this seems relevant. On the other hand, I struggle to actually come up with an example where this can happen. In principle, everything a model does with data is observable. You write,

At that point “picking the model that matches the data best” starts to look a lot like doing ML, and it’s more plausible that we’re going to start getting hypotheses that we don’t understand or which behave badly.

This confuses me. Modern ML is designed to engage in automatic feature generation. It turns out that engineered features introduce more bias than they're worth. For example, it's fair to point out racial bias in facial recognition technology. However, it's also sensible to argue that creating a method to automatically engineer concepts to discriminate with is a major advance in removing human bias (inaccessible information) from the process.

But, but, but then you have groups that intentionally choose to use controversial concepts, such as facial features, to infer things such as income, propensity to violence, etc. You see this is where the bait-and-switch seems to come in here. It's not the machine's fault for being used to make spurious judgments. The real culprit is poor human reasoning. So then,

...or we need to figure out some way to access the inaccessible information that “A* leads to lots of human flourishing.”

sets off an alarm bell in my head. How is this any different than trying to use ML to 'catch' terrorists via facial recognition? While I'll readily admit ML models can learn good/bad concepts to use for a downstream task, the idea that these concepts also map onto human-readable concepts seems rather tenuous.

So I conclude that the idea of making all concepts used by a machine human-readable seems dubious. You really want good/high-dimensional data that makes no assumptions about the concepts it's going to be used to model. ML concepts come with PAC guarantees; people don't.

Comment by zachary-robertson on Reexamining The Dark Arts · 2020-06-02T05:16:39.401Z · score: 0 (3 votes) · LW · GW

This is out of context, perhaps even nitpicking. It’s clear they’re talking about universality. In that context, sustainable implies allowable.

Comment by zachary-robertson on GPT-3: a disappointing paper · 2020-05-29T19:32:06.851Z · score: 22 (12 votes) · LW · GW

Reading this, I get the impression you have mismanaged expectations of what you think GPT-3 would do (i.e., should only be reserved for essentially pseudo-AGI)... but scaling GPT to the point of diminishing returns is going to take several more years. As everyone is stressing, they don’t even fit the training data at the moment.

Comment by zachary-robertson on OpenAI announces GPT-3 · 2020-05-29T19:03:42.118Z · score: 5 (3 votes) · LW · GW

GPT-2 was a hype fest, while this gets silently released on ArXiv. I’m starting to think there’s something real here. I think before I’d laugh at anyone who suggested GPT-2 could reason. I still think that’s true with GPT-3, but I wouldn’t laugh anymore. It seems possible massive scaling could legitimately produce a different kind of AI than anything we’ve seen yet.

Comment by zachary-robertson on OpenAI announces GPT-3 · 2020-05-29T19:00:29.387Z · score: 5 (3 votes) · LW · GW

While I’m not sure how easy plugging into a DRL algorithm will be, this seems to be the obvious next step. On the other hand, I suspect DRL isn’t really mature enough to work as an integrating paradigm.

Comment by zachary-robertson on What is your internet search methodology ? · 2020-05-23T21:07:38.714Z · score: 4 (2 votes) · LW · GW

I asked a related question and got some answers about finding things on the internet. Didn’t completely satisfy me, but my question was significantly more vague so it might help you!

Comment by zachary-robertson on Orthogonality · 2020-05-21T03:51:25.210Z · score: 5 (3 votes) · LW · GW

I think that a hidden assumption here is that improving in a weak skill always has a positive spillover effect on other skills. There might be a hidden truth within this. Namely, sometimes unlearning things will be the best way to make progress.

Comment by zachary-robertson on Orthogonality · 2020-05-21T03:41:12.386Z · score: 3 (2 votes) · LW · GW

Perhaps this can be connected with another recent post. It was pointed out in Subspace Optima that when we optimize we do so under constraints, external or internal. It seems like you had an internal constraint stopping you from optimizing over the whole space. Instead you focused on what you thought was the most correlated trait. This almost reads like an insight following the realization you’ve been optimizing a skill along an artificial sub-space.

Comment by zachary-robertson on What are your greatest one-shot life improvements? · 2020-05-17T21:24:53.288Z · score: 1 (1 votes) · LW · GW

Do this at the end of the day as a way to review progress?

Comment by zachary-robertson on What are your greatest one-shot life improvements? · 2020-05-17T21:22:59.475Z · score: 11 (7 votes) · LW · GW

Can I get clarification on what sort of emotions were problematic and/or what reactions were problematic? I’m wondering if this was rumination or in-the-moment reactions.

Comment by zachary-robertson on What are your greatest one-shot life improvements? · 2020-05-17T13:55:48.660Z · score: 2 (2 votes) · LW · GW

This also helped me for getting up on time!

Comment by zachary-robertson on What newsletters are you subscribed to, and why? · 2020-05-14T18:56:58.487Z · score: 2 (2 votes) · LW · GW

Just a meta-comment. If you don’t give a description of the feed, I find myself very unlikely to look at the URL.

Comment by zachary-robertson on Zachary Robertson's Shortform · 2020-05-11T02:57:03.630Z · score: 1 (1 votes) · LW · GW

I think it's worth taking a look at what's out there:

  • SpanBERT
    • Uses random spans to do masked pre-training
    • Seems to indicate that using longer spans is essentially difficult
  • Distillation of BERT Models
    • BERT embeddings are hierarchical

Comment by zachary-robertson on Zachary Robertson's Shortform · 2020-05-11T02:29:48.606Z · score: 1 (1 votes) · LW · GW

I'm aware of this. I'm slowly piecing together what I'm looking for if you decide to follow this.

Comment by zachary-robertson on Zachary Robertson's Shortform · 2020-05-11T02:13:00.658Z · score: 1 (1 votes) · LW · GW

Markov and general next-token generators work well when conditioned with text. While some models, such as BERT, are able to predict masked tokens, I'm not aware of models that are able to generate the most likely sentence that would sit between a given start/end prompt.

It's worth working in the Markov setting to get a grounding for what we're looking for. The core of a Markov model is the transition matrix $T_{ij} = P(x_{t+1} = j \mid x_t = i)$, which tells us the conditional likelihood of the token $j$ following immediately after the token $i$. The rules of conditional probability allow us to write,

$$P(x_1 = j \mid x_0 = s, x_2 = e) = \frac{T_{sj} \, T_{je}}{(T^2)_{se}}$$

This gives us the probability of a token $j$ occurring immediately between the start/end prompts $s$ and $e$. In general we're interested in what happens if we 'travel' from the starting token to the ending token over $n$ time steps. Say we want to see the distribution of tokens at time step $m$. Then we'd write,

$$P(x_m = j \mid x_0 = s, x_n = e) = \frac{(T^m)_{sj} \, (T^{n-m})_{je}}{(T^n)_{se}}$$

This shows us that we can break up the conditional generation process into a calculation over transition probabilities. We could write this out for an arbitrary sequence of separated words. From this perspective we'd be training a model to perform a regression over the words being generated. This is the sense in which we already use outlines to effectively create regression data-sets to model arguments.
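
As a rough sketch of the Markov case, here is how the distribution over the token at an intermediate step could be computed when both the start and end tokens are fixed. It assumes a row-stochastic transition matrix `T` with `T[i][j]` the probability of token `j` following token `i`; the function names and the three-token example are made up for illustration and use plain Python lists to stay dependency-free.

```python
from typing import List

Matrix = List[List[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices stored as nested lists."""
    inner = len(b)
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def mat_pow(t: Matrix, power: int) -> Matrix:
    """Raise a square matrix to a non-negative integer power."""
    size = len(t)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(power):
        result = mat_mul(result, t)
    return result


def bridge_distribution(t: Matrix, start: int, end: int, m: int, n: int) -> List[float]:
    """Return P(x_m = j | x_0 = start, x_n = end) for every token j.

    Uses the Markov property: the probability factors into the m-step
    probability start -> j times the (n - m)-step probability j -> end,
    normalised by the n-step probability start -> end.
    """
    if not 0 <= m <= n:
        raise ValueError("need 0 <= m <= n")
    forward = mat_pow(t, m)[start]       # (T^m)[start][j]
    backward = mat_pow(t, n - m)         # (T^(n-m))[j][end]
    norm = mat_pow(t, n)[start][end]     # (T^n)[start][end]
    if norm == 0.0:
        raise ValueError("end token is unreachable from start in n steps")
    return [forward[j] * backward[j][end] / norm for j in range(len(t))]


# Hypothetical three-token chain: token 0 can never jump straight to token 2.
T = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]
middle = bridge_distribution(T, start=0, end=2, m=1, n=2)
```

In this toy chain the only two-step route from token 0 to token 2 passes through token 1, so all of the mass at the middle step lands on token 1.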

What would be ideal is to find a way to generalize this to a non-Markovian, preferably deep-learning, setting. This is where I'm stuck at the moment. I'd want to understand where the SOTA is on this. The only options that immediately come to mind seem to be tree-search over tokens or RL. From the regression point of view, it seems like you'd want to try fitting the 'training data' such that the likelihood for the result is as high as possible.
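
To make the tree-search option a bit more concrete, here is a minimal beam-search sketch for filling in tokens between a start and end prompt. This is only an illustration, not a claim about how any existing model does it: `next_logprobs` is a hypothetical interface that maps a token prefix to log-probabilities for the next token, and the toy bigram table stands in for a real language model.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical model interface: prefix of tokens -> {next token: log-probability}.
NextLogProbs = Callable[[Sequence[str]], Dict[str, float]]


def beam_infill(
    next_logprobs: NextLogProbs,
    start: str,
    end: str,
    length: int,
    beam_width: int = 3,
) -> Optional[Tuple[float, List[str]]]:
    """Search for `length` tokens that sit between `start` and `end`.

    Returns (log-probability of the whole path including `end`, middle tokens),
    or None if the model offers no continuation at some step.
    """
    beams: List[Tuple[float, List[str]]] = [(0.0, [start])]
    for _ in range(length):
        candidates = []
        for score, prefix in beams:
            for token, logp in next_logprobs(prefix).items():
                candidates.append((score + logp, prefix + [token]))
        # Keep only the highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]

    best: Optional[Tuple[float, List[str]]] = None
    for score, prefix in beams:
        # Score the forced transition into the end token.
        total = score + next_logprobs(prefix).get(end, -math.inf)
        if best is None or total > best[0]:
            best = (total, prefix[1:])
    return best


# Toy bigram model (made-up numbers) standing in for a real language model.
BIGRAMS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.9, "sat": 0.1},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}


def toy_next_logprobs(prefix: Sequence[str]) -> Dict[str, float]:
    return {tok: math.log(p) for tok, p in BIGRAMS.get(prefix[-1], {}).items()}


result = beam_infill(toy_next_logprobs, "the", "away", length=2)
```

With these numbers the greedy continuation "the cat sat" cannot reach "away", so a beam of width one fails, while a wider beam recovers "dog ran". That is the basic difficulty with conditioning on an end prompt: the best path forward is not necessarily the best path to the target.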

Comment by zachary-robertson on The Mind: Board Game Review · 2020-05-10T03:54:36.291Z · score: 3 (2 votes) · LW · GW

I'll independently support that this is an amazing game. It's a really good icebreaker or way to get a sense of a person because it's short, collaborative, and intense.

Comment by zachary-robertson on Zachary Robertson's Shortform · 2020-05-10T03:45:39.863Z · score: 1 (1 votes) · LW · GW

If we're taking the idea that arguments are paths in topological space seriously, I feel like conditioned language models are going to be really important. We already use outlines to effectively create regression data-sets to model arguments. It seems like modifying GPT-2 so that you can condition on start/end prompts would be incredibly helpful here. More speculatively, I think that GPT-2 is near the best we'll ever get at next word prediction. Humans use outline-like thinking much more often than is commonly supposed.

Comment by zachary-robertson on Zachary Robertson's Shortform · 2020-05-10T03:36:07.405Z · score: 1 (1 votes) · LW · GW

Not really sure. If I was really going for it, I could do about 15-25 posts. I'm going back and forth on which metrics to use. This seems highly tied to what I actually want feedback on. What do you mean by Q&A?