Book Club: Software Design for Flexibility 2021-03-18T15:42:59.376Z
Competition: Amplify Rohin’s Prediction on AGI researchers & Safety Concerns 2020-07-21T20:06:09.194Z
Machine Learning Projects on IDA 2019-06-24T18:38:18.873Z
Factored Cognition 2018-12-05T01:01:43.544Z


Comment by stuhlmueller on Forecasting Thread: AI Timelines · 2020-08-22T04:50:52.662Z · LW · GW

My quick take:

Comment by stuhlmueller on Competition: Amplify Rohin’s Prediction on AGI researchers & Safety Concerns · 2020-08-04T00:58:43.775Z · LW · GW

Rohin has created his posterior distribution! Key differences from his prior are at the bounds:

  • He now assigns 3% rather than 0.1% to the majority of AGI researchers already agreeing with safety concerns.
  • He now assigns 40% rather than 35% to the majority of AGI researchers agreeing with safety concerns after 2100 or never.

Overall, Rohin’s posterior is a bit more optimistic than his prior and more uncertain.

Ethan Perez’s snapshot wins the prize for the most accurate prediction of Rohin’s posterior. Ethan kept a similar distribution shape while decreasing the probability assigned to dates after 2100 less than the other submissions did.

The prize for a comment that updated Rohin’s thinking goes to Jacob Pfau! This was determined by a draw with comments weighted proportionally to how much they updated Rohin’s thinking.

Thanks to everyone who participated and congratulations to the winners! Feel free to continue making comments and distributions, and sharing any feedback you have on this competition.

Comment by stuhlmueller on Ought: why it matters and ways to help · 2019-08-09T00:02:28.748Z · LW · GW

Thanks for this post, Paul!

NOTE: Response to this post has been even greater than we expected. We received more applications for the experiment participant role than we currently have the capacity to manage, so we are temporarily taking the posting down. If you've applied and don't hear from us for a while, please excuse the delay! Thanks to everyone who has expressed interest; we're hoping to get back to you and work with you soon.

Comment by stuhlmueller on The Stack Overflow of Factored Cognition · 2019-04-21T16:22:56.497Z · LW · GW

It's correct that, so far, Ought has been running small-scale experiments with people who know the research background. (What is amplification? How does it work? What problem is it intended to solve?)

Over time, we also think it's necessary to run larger-scale experiments. We're planning to start by running more experiments, and longer ones, with contractors instead of volunteers, probably over the next month or two. Longer-term, it's plausible that we'll build a platform similar to what this post describes. (See here for related thoughts.)

The reason we've focused on small-scale experiments with a select audience is that it's easy to do busywork that doesn't tell you anything about the question of interest. The purpose of our experiments so far has been to get high-quality feedback on the setup, not to gather object-level data. As a consequence, the experiments have been changing a lot from week to week. The biggest recent change is the switch from task decomposition (analogous to amplification with imitation learning as the distillation step) to decomposition of evaluation (analogous to amplification with RL as the distillation step). Given how much these changes have improved the setup, I think it would have been a mistake to stop at any point so far and focus on scaling up instead of refining the setup.

Comment by stuhlmueller on Factored Cognition · 2018-12-05T03:38:08.194Z · LW · GW

The log is taken from this tree. There isn't much more to see than what's visible in the screenshot. Building out more complete versions of meta-reasoning trees like this is on our roadmap.

Comment by stuhlmueller on Factored Cognition · 2018-12-05T01:14:17.001Z · LW · GW

What I'd do differently now:

  • I'd talk about RL instead of imitation learning when I describe the distillation step. Imitation learning is easier to explain, but ultimately you probably need RL to be competitive.
  • I'd be more careful when I talk about internal supervision. The presentation mixes up three related ideas:
    • (1) Approval-directed agents: We train an ML agent to interact with an external, human-comprehensible workspace using steps that an (augmented) expert would approve.
    • (2) Distillation: We train an ML agent to implement a function from questions to answers based on demonstrations (or incentives) provided by a large tree of experts, each of which takes a small step. The trained agent is a big neural net that only replicates the tree's input-output behavior, not individual reasoning steps. Imitating the steps directly wouldn't be possible since the tree would likely be exponentially large and so has to remain implicit.
    • (3) Transparency: When we distill, we want to verify that the behavior of the distilled agent is a faithful instantiation of the behavior demonstrated (or incentivized) by the overseer. To do this, we might use approaches to neural net interpretability.
  • I'd be more precise about what the term "factored cognition" refers to. Factored cognition refers to the research question of whether (and how) complex cognitive tasks can be decomposed into relatively small, semantically meaningful pieces. This is relevant to alignment, but it's not an approach to alignment on its own. If factored cognition is possible, you'd still need a story for leveraging it to train aligned agents (such as the other ingredients of the iterated amplification program), and it's of interest outside of alignment as well (e.g. for building tools that let us delegate cognitive work to other people).
  • I'd hint at why you might not need an unreasonably large amount of curated training data for this approach to work. When human experts do decompositions, they are effectively specifying problem solving algorithms, which can then be applied to very large external data sets in order to generate subquestions and answers that the ML system can be trained on. (Additionally, we could pretrain on a different problem, e.g. natural language prediction.)
  • I'd highlight that there's a bit of a sleight of hand going on with the decomposition examples. I show relatively object-level problem decompositions (e.g. Fermi estimation), but in the long run, for scaling to problems that are far beyond what the human overseer could tackle on their own, you're effectively specifying general algorithms for learning and reasoning with concepts, which seems harder to get right.
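
To make the points about decomposition and training data concrete, here is a minimal sketch of an object-level Fermi decomposition. It is an illustration only, not Ought's code: the questions, the numbers, and the names `LEAF_ANSWERS`, `DECOMPOSITIONS`, and `answer` are all hypothetical. Each entry in `DECOMPOSITIONS` stands in for one small step by an expert, and every (question, answer) pair the tree produces is logged, which is roughly the kind of data a distilled agent could be trained on.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical leaf questions an expert answers directly (made-up numbers).
LEAF_ANSWERS: Dict[str, float] = {
    "How many people live in the city?": 1_000_000.0,
    "What fraction of people own a piano?": 0.01,
    "How many tunings does a piano need per year?": 1.0,
    "How many pianos can one tuner tune per year?": 500.0,
}

# Hypothetical decompositions: question -> (subquestions, how to combine answers).
Decomposition = Tuple[List[str], Callable[[List[float]], float]]
DECOMPOSITIONS: Dict[str, Decomposition] = {
    "How many piano tuners work in the city?": (
        [
            "How many piano tunings are needed per year?",
            "How many pianos can one tuner tune per year?",
        ],
        lambda a: a[0] / a[1],
    ),
    "How many piano tunings are needed per year?": (
        [
            "How many people live in the city?",
            "What fraction of people own a piano?",
            "How many tunings does a piano need per year?",
        ],
        lambda a: a[0] * a[1] * a[2],
    ),
}


def answer(question: str, log: List[Tuple[str, float]]) -> float:
    """Answer a question by recursive decomposition, logging every pair."""
    if question in LEAF_ANSWERS:
        result = LEAF_ANSWERS[question]
    else:
        subquestions, combine = DECOMPOSITIONS[question]
        result = combine([answer(q, log) for q in subquestions])
    # Each node contributes one (question, answer) training example.
    log.append((question, result))
    return result


training_pairs: List[Tuple[str, float]] = []
estimate = answer("How many piano tuners work in the city?", training_pairs)
print(f"estimate: {estimate:.0f} tuners, {len(training_pairs)} training pairs")
```

The sketch only covers the easy, object-level case. A fixed lookup table of decompositions sidesteps the harder problem raised in the last bullet, where the overseer would have to specify general algorithms for learning and reasoning rather than hand-written decompositions for specific questions.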