A parable of brightspots and blindspots 2021-03-21T18:18:51.531Z
Some blindspots in rationality and effective altruism 2021-03-19T11:40:05.618Z
Delegated agents in practice: How companies might end up selling AI services that act on behalf of consumers and coalitions, and what this implies for safety research 2020-11-26T11:17:18.558Z
The Values-to-Actions Decision Chain 2018-06-30T21:52:02.532Z
The first AI Safety Camp & onwards 2018-06-07T20:13:42.962Z


Comment by Remmelt Ellen (remmelt-ellen) on Some blindspots in rationality and effective altruism · 2021-06-14T17:04:53.853Z · LW · GW

I hadn't read about these specific cases yet, thanks! I appreciate your nuances here

Comment by Remmelt Ellen (remmelt-ellen) on Some blindspots in rationality and effective altruism · 2021-05-15T09:22:21.534Z · LW · GW

This interview with Jacqueline Novogratz from Acumen Fund covers some practical approaches to attain skin in the game.

Comment by Remmelt Ellen (remmelt-ellen) on Some blindspots in rationality and effective altruism · 2021-05-15T09:14:46.419Z · LW · GW

Two people asked me to clarify this claim:

Going by projects I've coordinated, EAs often push for removing paper conflicts of interest over attaining actual skin in the game.

Copying over my responses:

re: Conflicts of interest:

My impression has been that a few people appraising my project work looked for ways to e.g. reduce Goodharting, or the risk that I might pay myself too much from the project budget. Also, EA initiators sometimes post a fundraiser write-up for an official project with an official plan that somewhat hides that they're actually seeking funding for their own salaries to do that work (the official framing looks less like a personal conflict of interest *on paper*).

re: Skin in the game:

Bigger picture, the effects of our interventions aren't going to affect us in a visceral, directly noticeable way (silly example: we're not going to slip and fall because of some defect in the malaria nets we fund). That loose feedback from far-away interventions seems hard to overcome, but I think it's problematic that EAs also seem to underemphasise skin in the game for in-between steps where direct feedback is available. For example, EAs (me included) sometimes seem too ready to pontificate about how particular projects should be run or what a particular position involves, rather than rely on the opinions/directions of an experienced practitioner who would actually suffer the consequences of failing (or even be filtered out of their role) if they took actions that had negative practical effects for them. Or they might dissuade someone from initiating an EA project/service that seems risky to them in theory, rather than guide the initiator to test it out locally to constrain or cap the damage.

Comment by Remmelt Ellen (remmelt-ellen) on Some blindspots in rationality and effective altruism · 2021-05-01T13:42:55.229Z · LW · GW

Some further clarification and speculation:

Edits interlude: people asked for examples of focussing on (interpreting & forecasting) processes vs. structures. See here.

This links to more speculative brightspot-blindspot distinctions:
7. Trading off sensory groundedness vs. representational stability of 
believed aspects
- in learning structure: a direct observation's recurrence vs. sorted identity
- in learning process: a transition of observed presence vs. analogised relation 

8. Trading off updating your interpretations vs. forecasts:
A. Interpret (recognise + represent) aspects
     eg. classical archaeologists focus on differentiated recognition of artefacts, 
     linguistic anthropologists on representation of differentiated social contexts 
B. Forecast (sample + predict) aspects
     eg. development economists focus more on calibrated sampling of metrics, 
     global prio scholars on calibrating their predictions of distilled scenarios 

9. Gain vs. loss motivated focus 
    (attain more positive vs. remove more negative valence)

Comment by Remmelt Ellen (remmelt-ellen) on Some blindspots in rationality and effective altruism · 2021-04-15T23:06:54.367Z · LW · GW

This is clarifying, thank you!

Comment by Remmelt Ellen (remmelt-ellen) on Some blindspots in rationality and effective altruism · 2021-04-13T15:51:22.579Z · LW · GW

I also noticed I was confused. Feels like we're at least disentangling cases and making better distinctions here.
BTW, I just realised a problem with my triangular prism example: theoretically, no rectangular side can face up parallel to the floor; at most two can face partly up, each at a 60º angle.

But on the other hand, x is not sufficient to spot when we have a new type of die (see previous point), and if we knew more about the dice we could make better estimates, which makes me think that it is epistemic uncertainty.

This is interesting. It seems to ask the question: 'Is a change in a quality of x, like colour, actually causal to outcomes y?' The difficulty is that you can never be fully certain empirically; you can only get closer to [change in roll probability] = 0 in the limit as [number of rolls] -> infinity.
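As a minimal sketch of that limit argument (the six-sided dice and all numbers here are illustrative assumptions, not anything from the thread): simulate two dice that are identical except for a quality like colour, and watch the empirical difference in P(six) tend to shrink as the number of rolls grows, without ever proving the difference is exactly zero.

```python
import random

random.seed(0)

def estimate_p_six(n_rolls, sides=6):
    """Empirical frequency of rolling a six in n_rolls of a fair die."""
    rolls = [random.randint(1, sides) for _ in range(n_rolls)]
    return rolls.count(6) / n_rolls

# Compare a "red" and a "blue" die that are in fact identical:
# the estimated difference in P(six) tends to shrink roughly as
# 1/sqrt(n), but a finite sample never shows it is exactly zero.
for n in (100, 10_000, 1_000_000):
    diff = abs(estimate_p_six(n) - estimate_p_six(n))
    print(f"n={n:>9}: |p_red - p_blue| ~ {diff:.4f}")
```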

Comment by Remmelt Ellen (remmelt-ellen) on Some blindspots in rationality and effective altruism · 2021-04-13T11:04:29.334Z · LW · GW

Thank you! That was clarifying, especially the explanation of epistemic uncertainty for y.

1. I've been thinking about epistemic uncertainty more in terms of 'possible alternative qualities present', where 

  • you don't know the probability of a certain quality being present for x (e.g. what's the chance of the die having an extended three-sided base?).
  • or might not even be aware of some of the possible qualities that x might have (e.g. you don't know a triangular prism die can exist).

2. Your take on epistemic uncertainty for that figure seems to be

  • you know of x's possible quality dimensions (e.g. relative lengths and angles of sides at corners).
  • but given a set configuration of x (e.g. triangular prism with equilateral triangle sides = 1, rectangular lengths = 2 ), you don't know yet the probabilities of outcomes for y (what's the probability of landing face up for base1, base2, rect1, rect2, rect3?).

Both seem to fit the definition of epistemic uncertainty. Do correct me here!

Edit: Rough difference in focus: 
 1. Recognition and Representation
 2. Sampling and Prediction
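To make the two readings of epistemic uncertainty in this exchange concrete, here is a toy Bayesian sketch (the face labels and the prism's per-face probabilities are made-up assumptions for illustration): epistemic uncertainty lives in the prior over *which die we hold*, while aleatory uncertainty lives in each die's per-face probabilities.

```python
from fractions import Fraction

# Epistemic uncertainty: which die is it? (cube vs. triangular prism)
# Aleatory uncertainty: given the die, which face lands up?
# The prism's face probabilities below are hypothetical.
dice = {
    "cube":  {f: Fraction(1, 6) for f in ["1", "2", "3", "4", "5", "6"]},
    "prism": {"base1": Fraction(1, 10), "base2": Fraction(1, 10),
              "rect1": Fraction(4, 15), "rect2": Fraction(4, 15),
              "rect3": Fraction(4, 15)},
}

prior = {"cube": Fraction(1, 2), "prism": Fraction(1, 2)}

def update(prior, observed_face):
    """Bayes rule: shrink epistemic uncertainty about which die we hold."""
    likelihood = {d: dice[d].get(observed_face, Fraction(0)) for d in dice}
    evidence = sum(prior[d] * likelihood[d] for d in dice)
    return {d: prior[d] * likelihood[d] / evidence for d in dice}

# Observing a face only a prism can show collapses the hypothesis space;
# the per-face (aleatory) uncertainty of the remaining die stays put.
posterior = update(prior, "rect1")
print(posterior)  # {'cube': Fraction(0, 1), 'prism': Fraction(1, 1)}
```

Not being aware that a prism die could exist at all (point 1 above) corresponds to the "prism" hypothesis missing from the dictionary entirely, which no amount of updating can fix.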

Comment by Remmelt Ellen (remmelt-ellen) on Some blindspots in rationality and effective altruism · 2021-03-24T15:26:00.711Z · LW · GW

Well-written! Most of this definitely resonates for me.

Quick thoughts:

  • Some of the jargon I've heard sounded plain silly from a making-intellectual-progress-perspective (not just implicit aggrandising). Makes it harder to share our reasoning, even to each other, in a comprehensible, high-fidelity way. I like Rob Wiblin's guide on jargon.
  • Perhaps we put too much emphasis on making explicit communication comprehensible. Might be more fruitful to find ways to recognise how particular communities are set up to be good at understanding or making progress in particular problem niches, even if we struggle to comprehend what they're specifically saying or doing.

(I was skeptical about the claim 'majority of people are explicit utilitarians' – i.e. utilitarian, not just consequentialist or some pluralistic mix of moral views – but EA Survey responses seem to back it up: ~70% utilitarian)

Comment by Remmelt Ellen (remmelt-ellen) on Some blindspots in rationality and effective altruism · 2021-03-24T14:58:30.362Z · LW · GW

This is a good question, hmm. Now that I'm trying to come up with specific concrete cases, I actually feel less confident of this claim.

Examples that did come to mind:

  1. I recall reading somewhere about early LessWrong authors reinventing concepts that had already been worked out in philosophical disciplines (particularly decision theory?). Can't find any post on this though.
  2. More subtly, we use a lot of jargon. Some terms were basically imported from academic research (say, into cognitive biases) and given a shiny new nerdy name that appeals to our incrowd. In the case of CFAR, I think they were very deliberate about renaming some concepts, also to make them more intuitive for workshop participants (eg. implementation intentions -> trigger action plans/patterns, pre-mortem -> murphyjitsu).

    (After thinking about this, I had a call with someone who is doing academic research on Buddhist religion. They independently mentioned LW posts on 'noticing', which is basically a new name for a meditation technique that has been practiced for millennia.)

    Renaming is not reinventing of course, but the new terms do make it harder to refer back to sources from established research literature.  Further, some smart amateur blog authors like to synthesise and intellectually innovate upon existing research (eg. see Scott Alexander's speculative posts, or my post above ^^). 

    The lack of referencing while building up innovations can cause us to misinterpret and write material that poorly reflects previous specialist research. We're building up our own separate literature database.

    A particular example is Robin Hanson's 'near-far mode', which imported ideas from a concise and well-articulated review paper about psychological distance into the community, and spawned a lot of subsequent posts about implications for thinking (but with little referencing of other academic studies or analyses). 
    E.g. Hanson's idea that people are hypocritical when they signal high-construal values but are more honest when they think concretely – a psychology researcher who seems rigorously minded told me that he dug into Hanson's claim, but that conclusions from other studies don't support it.
  3. My impression from local/regional/national EA community building is that many organisers (including me) either tried to work out how to run their group from first principles, or consulted with other, more experienced organisers. We could also have checked for good practices from, and consulted with, other established youth movements. I have seen plenty of write-ups that go through the former, but little or none of the latter.

Definitely give me counter-examples!

Comment by Remmelt Ellen (remmelt-ellen) on Some blindspots in rationality and effective altruism · 2021-03-24T14:13:39.309Z · LW · GW

Looks cool, thanks! Checking if I understood it correctly:
- is x like the input data?
- could y correspond to something like the supervised (continuous) labels of a neural network, to which inputs are matched?
- does epistemic uncertainty here refer to the possibility that inputs for x could be much different from the current training dataset if sampled again (where new samples could turn out to be outside of the current distribution)?
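For the third question, one common ML reading (my gloss, not anything stated in the thread) is that epistemic uncertainty shows up as disagreement between models once inputs drift outside the training distribution. A rough numpy sketch under that assumption, with an ensemble of cubic fits standing in for a neural network:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy supervised setup: inputs x in [0, 1], noisy continuous labels y.
x_train = rng.uniform(0, 1, size=(5, 50))      # 5 bootstrapped samples
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.1, x_train.shape)

# A small "ensemble": one cubic fit per bootstrapped training set.
models = [np.polyfit(xs, ys, deg=3) for xs, ys in zip(x_train, y_train)]

def epistemic_spread(x):
    """Disagreement between ensemble members: tends to be high where x
    is far from the training distribution (out-of-distribution inputs)."""
    preds = [np.polyval(m, x) for m in models]
    return float(np.std(preds))

print(epistemic_spread(0.5))  # in-distribution: small spread
print(epistemic_spread(3.0))  # far outside [0, 1]: large spread
```

The label noise added to y_train plays the role of aleatory uncertainty: it stays even if we pin down the model exactly.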

Comment by Remmelt Ellen (remmelt-ellen) on Some blindspots in rationality and effective altruism · 2021-03-20T18:12:13.554Z · LW · GW

How about 'disputed'?

Seems good. Let me adjust!

My impression is that gradual takeoff has gone from a minority to a majority position on LessWrong, primarily due to Paul Christiano, but not an overwhelming majority

This roughly corresponds with my impression actually. 
I know of a group that has surveyed researchers with permission to post on the AI Alignment Forum, but they haven't posted an analysis of the survey's answers yet.

Comment by Remmelt Ellen (remmelt-ellen) on Some blindspots in rationality and effective altruism · 2021-03-20T17:57:56.868Z · LW · GW

Yeah, seems awesome for us to figure out where we fit within that global portfolio! Especially in policy efforts, that could enable us to build a more accurate and broadly reflective consensus to help centralised institutions improve on larger-scale decisions they make (see a general case for not channeling our current efforts towards making EA the dominant approach to decision-making). 

To clarify, I hope this post helps readers become more aware of their brightspots (vs. blindspots) that they might hold in common with like-minded collaborators – ie. areas they notice (vs. miss) that map to relevant aspects of the underlying territory. 

I'm trying to encourage myself and the friends I collaborate with to build up an understanding of alternative approaches that outside groups take up (ie. to map and navigate their surrounding environment), and where those approaches might complement ours. Not necessarily for us to take up more simultaneous mental styles or to widen our mental focus or areas of specialisation. But to be able to hold outside groups' views so we get roughly where they are coming from, can communicate from their perspective, and form mutually beneficial partnerships.

More fundamentally, as human apes, our senses are exposed to an environment that is much more complex than just us. So we don't have the capacity to process our surroundings fully, nor to perceive all the relevant underlying aspects at once. To map the environment we are embedded in, we need robust constraints for encoding moment-to-moment observations, through layers of inductive biases, into stable representations.

Different-minded groups end up with different maps. But in order to learn from outside critics of EA, we need to be able to line up our map better with theirs. 


Let me throw an excerpt from an intro draft on the tool I'm developing. Curious for your thoughts!

Take two principles for a collaborative conversation in LessWrong and Effective Altruism:

  1. Your map is not the territory: 
    Your interlocutor may have surveyed a part of the bigger environment that you haven’t seen yet. Selfishly ask for their map, line up the pieces of their map with your map, and combine them to more accurately reflect the underlying territory.
  2. Seek alignment:
    Rewards can be hacked. Find a collaborator whose values align with your values so you can rely on them to make progress on the problems you care about.

When your interlocutor happens to have a compatible map and aligned values, such principles will guide you to learn missing information and collaborate smoothly. 

On the flipside, you will hit a dead end in your new conversation when:

  1. you can’t line up their map with yours to form a shared understanding of the territory. 
    Eg. you find their arguments inscrutable.
  2. you don’t converge on shared overarching aims for navigating the territory.
    Eg. double cruxes tend to bottom out at value disagreements.

You can resolve that tension with a mental shortcut: 
When you get confused about what they mean and fundamentally disagree on what they find important, just get out of their way. Why sink more of your time into a conversation that doesn’t reveal any new insights to you? Why risk fuelling a conflict?

This makes sense, and also omits a deeper question: why can’t you grasp their perspective? 
Maybe they don’t think the same things through as rigorously as you, and you pick up on that. Maybe they dishonestly express their beliefs or preferences, and you pick up on that. Maybe they honestly shared insights that you failed to pick up on.

Underlying each word you exchange is your perception of the surrounding territory ...
A word’s common definition masks our perceptual divide. Say you and I both look at the same thing and agree which defined term describes it. Then, we can mention this term as a pointer to what we both saw.  Yet, the environment I perceive that I point the term to may be very different from the environment you perceive.

Different-minded people can illuminate our blindspots.  Across the areas they chart and the paths they navigate lie nuggets – aspects of reality we don’t even know yet that we will come to care about.

Comment by Remmelt Ellen (remmelt-ellen) on Some blindspots in rationality and effective altruism · 2021-03-20T09:47:20.635Z · LW · GW

To disentangle what I had in mind when I wrote ‘later overturned by some applied ML researchers’:

Some applied ML researchers in the AI x-safety research community like Paul Christiano, Andrew Critch, David Krueger, and Ben Garfinkel have made solid arguments towards the conclusion that Eliezer’s past portrayal of a single self-recursively improving AGI had serious flaws.

In the post though, I was sloppy in writing about this particular example, in a way that served to support the broader claims I was making.

Comment by Remmelt Ellen (remmelt-ellen) on Some blindspots in rationality and effective altruism · 2021-03-19T21:48:42.529Z · LW · GW

This resonates, based on my very limited grasp of statistics. 

My impression is that sensitivity analysis aims more at reliably uncovering epistemic uncertainty (whereas Guesstimate as a tool seems to be designed more for working out aleatory uncertainty). 

Quote from interesting data science article on Silver-Taleb debate:

Predictions have two types of uncertainty; aleatory and epistemic.
Aleatory uncertainty is concerned with the fundamental system (probability of rolling a six on a standard die). Epistemic uncertainty is concerned with the uncertainty of the system (how many sides does a die have? And what is the probability of rolling a six?). With the latter, you have to guess the game and the outcome; like an election!

Comment by Remmelt Ellen (remmelt-ellen) on Some blindspots in rationality and effective altruism · 2021-03-19T21:36:10.041Z · LW · GW

Interesting, I didn't know GiveDirectly ran unstructured focus groups, nor that JPAL does qualitative interviews at various stages of testing interventions.  Adds a bit more nuance to my thoughts, thanks! 

Comment by Remmelt Ellen (remmelt-ellen) on Some blindspots in rationality and effective altruism · 2021-03-19T21:29:54.952Z · LW · GW

Sorry, I get how the bullet point example gave that impression. I'm keeping the summary brief, so let me see what I can do. 

I think the culprit is 'overturned'. That makes it sound like their counterarguments were a done deal or something. I'll reword that to 'rebutted and reframed in finer detail'. 

Note though that 'some applied ML researchers' hardly sounds like consensus. I did not mean to convey that, but I'm glad you picked it up.

As far as I can tell, it's a reasonable summary of the fast takeoff position that many people still hold today.

Perhaps your impression from your circle is different from mine in terms of what proportion of AIS researchers prioritise work on the fast takeoff scenario?

Comment by Remmelt Ellen (remmelt-ellen) on Some blindspots in rationality and effective altruism · 2021-03-19T21:11:42.209Z · LW · GW

I appreciate your thoughtful comment too,  Dan.

You're right I think that I overstated EA's tendency to assume generalisability, particularly when it comes to testing interventions in global health and poverty (though much less so when it comes to research in other cause areas). Eva Vivalt's interview with 80K, and more recent EA Global sessions discussing the limitations of the randomista approach are examples. Some incubated charity interventions by GiveWell also seemed to take a targeted regional approach (e.g. No Lean Season). Also, Ben Kuhn's 'local context plus high standards theory' for Wave. So point taken!

I still worry about EA-driven field experiments relying too much, too quickly, on filtering experimental observations through quantitative metrics exported from Western academia. In their local implementation, these metrics may either fail to track the aspects we had in mind, or simply not reflect what actually exists and/or is relevant to people's local context. I haven't yet heard about EA founders who started out by doing open qualitative fieldwork on the ground (but happy to hear examples!). 

I assume generalisability of metrics would be less of a problem for medical interventions like anti-malaria nets and deworming tablets. But here's an interesting claim I just came across:  

One-size-fits-all doesn’t work and the ways medicine affects people varies dramatically. 

With schistosomiasis we found that fisherfolk, who are the most likely to be infected, were almost entirely absent from the disease programme and they’re the ones defecating and urinating in the water, spreading the disease.

Comment by Remmelt Ellen (remmelt-ellen) on Takeaways from the Intelligence Rising RPG · 2021-03-06T10:36:02.249Z · LW · GW

Do you mean the Game Master’s rules for world development? The basic gameplay rules for participants are outlined in the slides Ross posted above:

Comment by Remmelt Ellen (remmelt-ellen) on Takeaways from the Intelligence Rising RPG · 2021-03-05T11:22:51.301Z · LW · GW

Let me ask Ross.

Comment by Remmelt Ellen (remmelt-ellen) on Delegated agents in practice: How companies might end up selling AI services that act on behalf of consumers and coalitions, and what this implies for safety research · 2020-12-01T11:41:27.346Z · LW · GW

Sure! I'm curious to hear any purposes you thought of that delegated agents could assist with.

Comment by Remmelt Ellen (remmelt-ellen) on Delegated agents in practice: How companies might end up selling AI services that act on behalf of consumers and coalitions, and what this implies for safety research · 2020-11-26T16:36:09.450Z · LW · GW

I'm brainstorming ways this post may be off the mark. Curious if you have any :)

  • You can personalise an AI service across some dimensions that won’t make it more resemble an agent acting on a person’s behalf (or won't meet all criteria of 'agentiness')
    • not acting *over time* - more like a bespoke tool customised once to a customer’s preferred parameters, e.g. a website-builder like
    • an AI service personalising content according to a user’s likes/reads/don't show clicks isn't agent-like
    • efficient personalised services will be built on swappable modules and/or shared repositories of consumer preference components and contexts (meaning that the company never actually runs an independent instantiation of the service)
  • Personalisation of AI services will fall short of delegated agents except in a few niches because of lack of demand or supply
    • a handful of the largest software corporations (FAAMG, etc.) have locked in customers into networks and routines but are held back from personalising customer experiences because they tend to rely on third-party revenue streams
    • it's generally more profitable to specialise in and market a service that caters to either high-paying discerning customers, or a broad mass audience that's basically okay with anything you give them
    • too hard to manage mass customisation or not cost-effective compared to other forms of business innovation
    • humans are already well-adapted and trained for providing personalised services; AI can compete better in other areas
    • humans already have very similar preferences within the space of theoretical possibilities – making catering to individual differences less fruitful than you'd intuitively think
    • it’s easier to use AI to shape users to have more homogenous preferences than to cater to preference differences
    • eliciting human preferences takes up too much of the user's attention and/or runs up against too many possible interpretations (based on assumptions of user's rationality and prior knowledge, as well as relevant contextual cues) to work
    • you can make more commercial progress by designing and acclimatising users to a common interface that allows those users to meet their diverging preferences themselves (than by designing AI interfaces that elicit the users' preferences and act on their behalf)
    • software engineers need a rare mix of thing- and person-oriented skills to develop delegated agents
    • a series of bad publicity incidents impede further development (analogous to self-driving car crashes)
    • data protection or anonymisation laws in Europe and beyond limit personalisation efforts (or further down the line, restrictions on autonomous algorithms do)
    • doesn’t fit current zeitgeist somehow in high-income nations
  • Research directions aren't priorities
    • Advances in preference learning will be used for other unhelpful stuff (just read Andrew Critch's post)
    • Research on how much influence delegated agents might offer can, besides being really speculative, be misused or promote competitive dynamics
  • Context assumptions:
    • Delegated agents will be developed first inside, say, military labs (or other organisational structures in other places) that involve meaningfully different interactions than those at a Silicon Valley start-up.
    • Initial contexts in which delegated agents are produced and used really don’t matter for how AI designs are deployed in later decades (something like, it’s overdetermined)
  • Conceptual confusion:
    • Terms in this post are ambiguous or used to refer to different things (e.g. general AI 'tasks' vs. 'tasks' humans conceive and act on, 'service' infrastructure vs. online 'service' aimed at human users, 'virtual assistant' conventionally means a remote human assistant, 'model')
    • An ‘AI agent’ is a vague, leaky concept that should be replaced with more exacting dimensions and mechanisms
    • Carving out humans and algorithms into separate individuals with separate ‘preferences’ is a fundamentally impoverished notion. This post assumes that perspective and therefore fosters mistaken/unskillful reasoning.
Comment by Remmelt Ellen (remmelt-ellen) on Delegated agents in practice: How companies might end up selling AI services that act on behalf of consumers and coalitions, and what this implies for safety research · 2020-11-26T16:02:14.088Z · LW · GW

I brainstormed ways this post may be off the mark.

Comment by Remmelt Ellen (remmelt-ellen) on Delegated agents in practice: How companies might end up selling AI services that act on behalf of consumers and coalitions, and what this implies for safety research · 2020-11-26T16:01:34.091Z · LW · GW

I brainstormed ways this post may be off the mark:

Comment by Remmelt Ellen (remmelt-ellen) on The Values-to-Actions Decision Chain · 2018-07-05T12:17:57.020Z · LW · GW

Ah, I have the first diagram in your article as one of my desktop backgrounds. :-) It was a fascinating demonstration of how experiences can be built up into more complex frameworks (even though I feel I only half-understand it). It was one of several articles that inspired and moulded my thinking in this post.

I'd value having a half-an-hour Skype chat with you some time. If you're up for it, feel free to schedule one here.

Comment by Remmelt Ellen (remmelt-ellen) on The Values-to-Actions Decision Chain · 2018-07-03T09:47:53.222Z · LW · GW

So, I do find it fascinating to analyse how multi-layered networks of agents interact, and how those interactions can be improved to better reach goals together. My impression is also that it’s hard to make progress in this area (otherwise several simple coordination problems would already have been solved), and I lack expertise in network science, complexity science, multi-agent systems, and microeconomics. I haven’t set out a clear direction, but I do find your idea of making this into a larger project inspiring.

I’ll probably work on gathering more empirical data over time to overhaul any conclusions I came to in this article and gain a more fine-grained understanding of how people interact in the EA community. When I happen to make some creative connections between concepts again, I’ll start writing those up. :-)

I think I’ll also write a case study in the next months that examines one possible implication of this model (e.g. local group engagement) in a more detailed, balanced way (for the strategic implications I wrote about in this post, I leant towards being concise and activating people to think about them instead of examining a bunch of separate data sources dispassionately).

Comment by Remmelt Ellen (remmelt-ellen) on The Values-to-Actions Decision Chain · 2018-07-03T08:50:51.522Z · LW · GW

Thanks for mentioning this!

Let me think about your question for a while. Will come back on it later.

Comment by Remmelt Ellen (remmelt-ellen) on AI Safety Research Camp - Project Proposal · 2018-02-03T08:25:18.263Z · LW · GW

Thanks for mentioning it.

If later you happen to see a blind spot or a failure mode we should work on covering, we'd like to learn about it!

Comment by Remmelt Ellen (remmelt-ellen) on AI Safety Research Camp - Project Proposal · 2018-02-03T08:21:53.309Z · LW · GW

Do you mean for the Gran Canaria camp?

We're also working towards a camp 2.0 in late July in the UK. I assume that's during summer break for you.

Comment by Remmelt Ellen (remmelt-ellen) on "Taking AI Risk Seriously" (thoughts by Critch) · 2018-02-01T18:54:09.337Z · LW · GW

Great, let me throw together a reply to your questions in reverse order. I've had a long day and lack the energy to do the rigorous, concise write-up that I'd want to do. But please comment with specific questions/criticisms that I can look into later.

What is the thought process behind their approach?

RAISE (copy-paste from slightly-promotional-looking wiki):

AI safety is a small field. It has only about 50 researchers. The field is mostly talent-constrained. Given the dangers of an uncontrolled intelligence explosion, increasing the amount of AIS researchers is crucial for the long-term survival of humanity.

Within the LW community there are plenty of talented people that bear a sense of urgency about AI. They are willing to switch careers to doing research, but they are unable to get there. This is understandable: the path up to research-level understanding is lonely, arduous, long, and uncertain. It is like a pilgrimage. One has to study concepts from the papers in which they first appeared. This is not easy. Such papers are undistilled. Unless one is lucky, there is no one to provide guidance and answer questions. Then should one come out on top, there is no guarantee that the quality of their work will be sufficient for a paycheck or a useful contribution.

The field of AI safety is in an innovator phase. Innovators are highly risk-tolerant and have a large amount of agency, which allows them to survive an environment with little guidance or supporting infrastructure. Let community organisers not fall for the typical mind fallacy, expecting risk-averse people to move into AI safety all by themselves. Unless one is particularly risk-tolerant or has a perfect safety net, they will not be able to fully take the plunge. Plenty of measures can be made to make getting into AI safety more like an "It's a small world"-ride:

  • Let there be a tested path with signposts along the way to make progress clear and measurable.
  • Let there be social reinforcement so that we are not hindered but helped by our instinct for conformity.
  • Let there be high-quality explanations of the material to speed up and ease the learning process, so that it is cheap.

AI Safety Camp (copy-paste from our proposal, which will be posted on LW soon):

Aim: Efficiently launch aspiring AI safety and strategy researchers into concrete productivity by creating an ‘on-ramp’ for future researchers.


  1. Get people started on and immersed into concrete research work intended to lead to papers for publication.
  2. Address the bottleneck in AI safety/strategy of few experts being available to train or organize aspiring researchers by efficiently using expert time.
  3. Create a clear path from ‘interested/concerned’ to ‘active researcher’.
  4. Test a new method for bootstrapping talent-constrained research fields.

Method: Run an online research group culminating in a two week intensive in-person research camp.

(our plan is to test our approach in Gran Canaria on 12 April, for which we're taking applications right now, and, based on our refinements, to organise a July camp at the planned EA Hotel in the UK)

What material do these groups cover?

RAISE (from the top of my head)

The study group has finished writing video scripts on the first corrigibility unit for the online course. It has now split into two to work on the second unit:

  1. group A is learning about reinforcement learning using this book
  2. group B is writing video scripts on inverse reinforcement learning

Robert Miles is also starting to make the first video of the first corrigibility unit (we've allowed ourselves to get delayed too much in actually publishing and testing material IMO). Past videos we've experimented with include a lecture by Johannes Treutin from FRI and Rupert McCallum giving lectures on corrigibility.

AI Safety Camp (copy-paste from proposal)

Participants will work in groups on tightly-defined research projects on the following topics:

  • Agent foundations
  • Machine learning safety
  • Policy & strategy
  • Human values

Projects will be proposed by participants prior to the start of the program. Expert advisors from AI Safety/Strategy organisations will help refine them into proposals that are tractable, suitable for this research environment, and answer currently unsolved research questions. This allows for time-efficient use of advisors’ domain knowledge and research experience, and ensures that research is well-aligned with current priorities.

Participants will then split into groups to work on these research questions in online collaborative groups over a period of several months. This period will culminate in a two-week in-person research camp aimed at turning this exploratory research into first drafts of publishable research papers. This will also allow for cross-disciplinary conversations and community building. Following the two-week camp, advisors will give feedback on manuscripts, guiding first drafts towards completion and advising on next steps for researchers.

Who's running them and what's their background?

Our two core teams mostly consist of young European researchers/autodidacts who haven't published much on AI safety yet (which does carry the risk that we don't know enough about the outcomes we're trying to design for others).

RAISE (from the top of my head):

Toon Alfrink (founder, coordinator): AI bachelor student, also organises LessWrong meetups in Amsterdam.

Robert Miles (video maker): Runs a relatively well-known YouTube channel advocating carefully for AI safety.

Veerle de Goederen (oversees the prerequisites study group): Finished a Biology bachelor (and has been our most reliable team member).

Johannes Heidecke (oversees the advanced study group): Master student, researching inverse reinforcement learning in Spain.

Remmelt Ellen (planning coordinator): see below.

AI Safety Camp (copy-paste from proposal)

Remmelt Ellen Remmelt is the Operations Manager of Effective Altruism Netherlands, where he coordinates national events, supports organisers of new meetups and takes care of mundane admin work. He also oversees planning for the team at RAISE, an online AI Safety course. He is a Bachelor intern at the Intelligent & Autonomous Systems research group. In his spare time, he’s exploring how to improve the interactions within multi-layered networks of agents to reach shared goals – especially approaches to collaboration within the EA community and the representation of persons and interest groups by negotiation agents in sub-exponential takeoff scenarios.

Tom McGrath Tom is a maths PhD student in the Systems and Signals group at Imperial College, where he works on statistical models of animal behaviour and physical models of inference. He will be interning at the Future of Humanity Institute from Jan 2018, working with Owain Evans. His previous organisational experience includes co-running Imperial’s Maths Helpdesk and running a postgraduate deep learning study group.

Linda Linsefors Linda has a PhD in theoretical physics, which she obtained at Université Grenoble Alpes for work on loop quantum gravity. Since then she has studied AI and AI Safety online for about a year. Linda is currently working at Integrated Science Lab in Umeå, Sweden, developing tools for analysing information flow in networks. She hopes to be able to work full time on AI Safety in the near future.

Nandi Schoots Nandi did a research master in pure mathematics and a minor in psychology at Leiden University. Her master was focused on algebraic geometry and her thesis was in category theory. Since graduating she has been steering her career in the direction of AI safety. She is currently employed as a data scientist in the Netherlands. In parallel to her work she is part of a study group on AI safety and involved with the reinforcement learning section of RAISE.

David Kristoffersson David has a background as R&D Project Manager at Ericsson where he led a project of 30 experienced software engineers developing many-core software development tools. He liaised with five internal stakeholder organisations, worked out strategy, made high-level technical decisions and coordinated a disparate set of subprojects spread over seven cities on two different continents. He has a further background as a Software Engineer and has a BS in Computer Engineering. In the past year, he has contracted for the Future of Humanity Institute, and has explored research projects in ML and AI strategy with FHI researchers.

Chris Pasek After graduating from mathematics and theoretical computer science, Chris ended up touring the world in search of meaning and self-improvement, and finally settled on working as a freelance researcher focused on AI alignment. Currently also running a rationalist shared housing project on the tropical island of Gran Canaria and continuing to look for ways to gradually self-modify in the direction of a superhuman FDT-consequentialist.

Mistake: I now realise that not mentioning that I'm involved with both may resemble a conflict of interest – I had removed 'projects I'm involved with' from my earlier comment before posting it to keep it concise.

Comment by Remmelt Ellen (remmelt-ellen) on "Taking AI Risk Seriously" (thoughts by Critch) · 2018-01-30T19:40:43.273Z · LW · GW

If you're committed to studying AI safety but have little money, here are two projects you can join (do feel free to add other suggestions):

1) If you want to join a beginners or advanced study group on reinforcement learning, post here in the RAISE group.

2) If you want to write research in a group, apply for the AI Safety Camp in Gran Canaria on 12-22 April.