Research ideas (AI Interpretability & Neuroscience) for a 2-month project

post by flux (FoxYou) · 2023-01-08T15:36:12.984Z

This is a question post.

My university (EPFL) organizes a "Summer in the Lab" internship for interested Bachelor students. The idea is to send them to a lab for a ~2-month period so they can begin engaging with the research community and developing research skills. I was selected for the internship, and the Alexander Mathis Lab accepted me. They use a mix of neuroscience and AI/ML in their research (e.g., their best-known paper is DeepLabCut: markerless pose estimation of user-defined body parts with deep learning).

As I am interested in studying AI interpretability later in my career, I am now looking for project ideas that combine it with neuroscience. I have read a bit of the research in this area, such as the Intro to brain-like AGI post series and Shard theory.

My question is: which ideas (from the theories above or any others) do you think are worth diving into for this 2-month research internship? What could help advance us toward a safer future (even if it is, at best, a very small step)?

Answers

answer by Jon Garcia · 2023-01-08T20:55:05.336Z

Since two months is not a very long time to complete a research project, and I don't know what lab resources or datasets you have access to, it's a bit difficult to answer this.

It would be great if you could do something like build a model of human value formation based on the interactions between the hypothalamus, VTA, nucleus accumbens, vmPFC, etc. For example, how does the brain generalize its preferences from its gene-coded heuristic value functions? Could that inform how you might design RL systems that are more robust to reward misspecification?

Again, I doubt you can get beyond a toy model in two months, but maybe you can think of something tractable along those lines.
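
To make "toy model" concrete, here is a minimal, hypothetical sketch (not part of the original answer) of the reward-misspecification side of that question: a tabular Q-learning agent in a 5x5 gridworld is trained either on the true objective (reach GOAL) or on a hand-coded proxy reward that also fires near a distractor cell, standing in for a crude "gene-coded" heuristic. All names here (GOAL, SHINY, proxy_reward, etc.) are invented for this example.

```python
# Toy illustration of reward misspecification: an agent trained on a heuristic
# proxy reward can learn behavior that diverges from the true objective.
import numpy as np

rng = np.random.default_rng(0)

SIZE = 5                                       # 5x5 gridworld
GOAL = (4, 4)                                  # true objective: reach this cell
SHINY = (0, 4)                                 # distractor cell the proxy also likes
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

def step(state, action):
    """Move within the grid, clipping at the walls."""
    r = min(max(state[0] + action[0], 0), SIZE - 1)
    c = min(max(state[1] + action[1], 0), SIZE - 1)
    return (r, c)

def true_reward(state):
    """What we actually care about: reaching GOAL."""
    return 1.0 if state == GOAL else 0.0

def proxy_reward(state):
    """A crude hand-coded heuristic: reward proximity to anything salient.
    It also fires near SHINY, so optimizing it can diverge from GOAL."""
    d_goal = abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1])
    d_shiny = abs(state[0] - SHINY[0]) + abs(state[1] - SHINY[1])
    return 1.0 / (1.0 + min(d_goal, d_shiny))

def greedy_action(Q, state):
    """Greedy action with random tie-breaking."""
    q = Q[state[0], state[1]]
    return int(rng.choice(np.flatnonzero(q == q.max())))

def q_learning(reward_fn, episodes=3000, alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning against a given reward function."""
    Q = np.zeros((SIZE, SIZE, len(ACTIONS)))
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(50):
            if rng.random() < eps:
                a = int(rng.integers(len(ACTIONS)))
            else:
                a = greedy_action(Q, state)
            nxt = step(state, ACTIONS[a])
            r = reward_fn(nxt)
            Q[state[0], state[1], a] += alpha * (
                r + gamma * Q[nxt[0], nxt[1]].max() - Q[state[0], state[1], a]
            )
            state = nxt
            if state == GOAL:
                break
    return Q

def reaches_goal(Q, steps=50):
    """Greedy rollout; report whether the learned policy actually reaches GOAL."""
    state = (0, 0)
    for _ in range(steps):
        state = step(state, ACTIONS[greedy_action(Q, state)])
        if state == GOAL:
            return True
    return False

print("trained on true reward  -> reaches goal:", reaches_goal(q_learning(true_reward)))
print("trained on proxy reward -> reaches goal:", reaches_goal(q_learning(proxy_reward)))
```

In this setup the agent trained on the true reward learns to reach the goal, while the proxy-trained agent tends to park at the distractor and collect proxy reward forever; that gap is the kind of misspecification failure a more brain-inspired value-formation mechanism would need to avoid.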

No comments
