How to do conceptual research: Case study interview with Caspar Oesterheld 2024-05-14T15:09:30.390Z
(When) Should you work through the night when inspiration strikes you? 2024-04-23T21:07:06.858Z
Slim overview of work one could do to make AI go better (and a grab-bag of other career considerations) 2024-03-20T23:17:52.964Z
AI things that are perhaps as important as human-controlled AI (Chi version) 2024-03-03T18:07:24.291Z
Evidential Cooperation in Large Worlds: Potential Objections & FAQ 2024-02-28T18:58:25.688Z
Everett branches, inter-light cone trade and other alien matters: Appendix to “An ECL explainer” 2024-02-24T23:09:27.147Z
Cooperating with aliens and AGIs: An ECL explainer 2024-02-24T22:58:47.345Z
Complexity of value but not disvalue implies more focus on s-risk. Moral uncertainty and preference utilitarianism also do. 2024-02-23T06:10:05.881Z
List of how people have become more hard-working 2023-09-29T11:30:38.802Z
What's your standard for good work performance? 2023-09-27T16:58:16.114Z
How have you become more hard-working? 2023-09-25T12:37:39.860Z
Probably tell your friends when they make big mistakes 2023-06-01T14:30:35.579Z
[inactive] £2000 bounty - contraceptives (and UTI) literature review 2021-09-15T22:37:13.356Z
My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda 2020-08-15T20:02:00.205Z
English summaries of German Covid-19 expert podcast 2020-04-08T20:29:25.221Z


Comment by Chi Nguyen on OpenAI: Exodus · 2024-05-20T14:12:50.515Z · LW · GW

Greg Brockman and Sam Altman (cosigned):
First, we have raised awareness of the risks and opportunities of AGI so that the world can better prepare for it. We’ve repeatedly demonstrated the incredible possibilities from scaling up deep learning

chokes on coffee

Comment by Chi Nguyen on OpenAI: Exodus · 2024-05-20T13:50:10.605Z · LW · GW

From my point of view, of course profit maximizing companies will…maximize profit. It never was even imaginable that these kinds of entities could shoulder such a huge risk responsibly.

Correct me if I'm wrong, but isn't Conjecture legally a company? Maybe their profit model isn't actually foundation models? I'm not trying to imply anything; I just thought the wording was weird in that context and was wondering whether Conjecture has a different legal structure than I thought.

Comment by Chi Nguyen on OpenAI: Exodus · 2024-05-20T13:30:53.940Z · LW · GW

minus Cullen O’Keefe who worked on policy and legal (so was not a clear cut case of working on safety),


I think Cullen was on the same team as Daniel (might be misremembering), so if you count Daniel, I'd also count Cullen. (Unless you wanna count Daniel because he previously was more directly part of technical AI safety research at OAI.)

Comment by Chi Nguyen on How to do conceptual research: Case study interview with Caspar Oesterheld · 2024-05-16T11:37:04.653Z · LW · GW

Yes! Edited the main text to make it clear

Comment by Chi Nguyen on How to do conceptual research: Case study interview with Caspar Oesterheld · 2024-05-15T13:02:28.952Z · LW · GW

In practice, the "entity giving the payout" for ECL would just be the world states you end up in, and this requires you to care about the environment of the person you're playing the PD with.

So, defecting might be just optimising my local environment for my own values and cooperating would be optimising my local environment for some aggregate of my own values and the values of the person I'm playing with. So, it only works if there are positive-sum aggregates and if each player cares about what the other does to their local environment.
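A toy sketch of the condition described above, under assumptions that are mine and not part of ECL itself: two players each control one unit of resources in their own local environment, "defect" means spending all of it on your own values and "cooperate" means splitting it 50/50 with the other player's values, and value has diminishing returns (a square root). The names `payoff` and `SHARE_KEPT` are hypothetical. This only illustrates why the aggregate has to be positive-sum and why each player has to care about the other's environment; it says nothing about *why* you'd cooperate, which in ECL comes from correlation rather than from the payoff matrix.

```python
import math
from typing import Callable

# Hypothetical toy model: each player controls 1 unit of resources locally and
# values resources spent on their values in BOTH environments.
SHARE_KEPT = {"D": 1.0, "C": 0.5}  # "D": all for own values; "C": split 50/50


def payoff(my_move: str, their_move: str,
           value: Callable[[float], float] = math.sqrt,
           cares_remote: bool = True) -> float:
    """My utility given both moves, under the toy assumptions above."""
    mine_here = SHARE_KEPT[my_move]            # what I spend on my values locally
    mine_there = 1.0 - SHARE_KEPT[their_move]  # what they spend on my values remotely
    remote = value(mine_there) if cares_remote else 0.0
    return value(mine_here) + remote


if __name__ == "__main__":
    for me in "CD":
        for them in "CD":
            print(me, them, round(payoff(me, them), 3))
```

With the square-root value function, mutual cooperation (about 1.414) beats mutual defection (1.0) while defecting against a cooperator (about 1.707) is still individually tempting, i.e. a prisoner's dilemma. Passing `value=lambda x: x` (no diminishing returns) makes mutual cooperation and mutual defection tie, and `cares_remote=False` makes cooperating strictly worse.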

Comment by Chi Nguyen on Which skincare products are evidence-based? · 2024-05-03T18:34:26.728Z · LW · GW

I watched and read a ton of Lab Muffin Beauty Science when I got into skincare. Apart from sunscreen, I think a lot of it is trial and error with what has good short-term effects. I'm not sure about long-term effects at all, tbh. Lab Muffin Beauty Science is helpful for figuring out your skin type, getting leads on which products to try first, and learning how to use them. (There's a fair number of products you wanna ramp up slowly and, even by the end, only use on some days.)

Comment by Chi Nguyen on Please stop publishing ideas/insights/research about AI · 2024-05-02T18:13:19.988Z · LW · GW

Are there types of published alignment research that you think were (more likely to be) good to publish? If so, I'd be curious to see a list.

Comment by Chi Nguyen on (When) Should you work through the night when inspiration strikes you? · 2024-04-23T21:10:33.367Z · LW · GW

Agree-vote: I generally tend to choose work over sleep when I feel particularly inspired to work.

Disagree-vote: I generally tend to choose sleep over work even when I feel particularly inspired to work.

Any other reaction, new answer or comment, or no reaction of any kind: Neither of the two descriptions above fit.

I considered making four options to capture the dimension of whether you endorse your behaviour or not but decided against it. Feel free to supplement this information.

Comment by Chi Nguyen on Cooperating with aliens and AGIs: An ECL explainer · 2024-03-01T20:23:58.805Z · LW · GW

Interesting. The main thing that pops out for me is that it feels like your story is descriptive while we try to be normative? I.e. it's not clear to me from what you say whether you would recommend that humans act in this cooperative way towards distant aliens, but you seem to expect that they will do so or are doing so. Meanwhile, I would claim that we should act cooperatively in this way but make no claims about whether humans actually do so.

Does that seem right to you or am I misunderstanding your point?

Comment by Chi Nguyen on Cooperating with aliens and AGIs: An ECL explainer · 2024-03-01T00:26:51.148Z · LW · GW

Letting on-lookers know that I responded in this comment thread

Comment by Chi Nguyen on Evidential Cooperation in Large Worlds: Potential Objections & FAQ · 2024-03-01T00:14:47.102Z · LW · GW

I'm not sure I understand exactly what you're saying, so I'm just gonna write some vaguely related thoughts on classic acausal trade + ECL:


I'm actually really confused about the exact relationship between "classic" prediction-based acausal trade and ECL, and I think I tend to see them as less crisply distinct than others do. A few months ago, I spent a few hours trying to unconfuse myself about this and just ended up with a messy document. Some intuitive ways to differentiate them:

  • ECL leverages the correlation between you and the other agent "directly."
  • "Classic" prediction-based acausal trade leverages the correlation between you and the other agent's prediction of you. (Which, intuitively, they are less in control of than their decision-making.)

--> This doesn't look like a fundamental difference between the mechanisms (and maybe there are in-between cases? But I don't know of any such set-ups), but maybe it makes a difference in practice?


On the recursion question:

I agree that ECL has this whole "I cooperate if I think that makes it more likely that they cooperate" thing, so there's definitely also something prediction-flavoured going on. Often, the deliberation about whether they'll be more likely to cooperate when you do will include "they think that I'm more likely to cooperate if they cooperate". So it's kind of recursive.

Note that ECL doesn't strictly require that, though. You can in principle do ECL with rocks: "My world model says that conditioning on me taking action X, the likelihood of this rock falling down is higher than if I condition on taking action Y." To be clear, if action X isn't "throw the rock" or something similar, that's a pretty weird world model. You probably can't do "classic" acausal trade with rocks?


Some more unordered, not-thought-out, somewhat incoherent thinking-out-loud thoughts and intuitions:

More random and less coherent: Something something about how, when you think of an agent using some meta-policy to answer the question "What object-level policy should I follow?", there's some intuitive sense in which ECL is recursive in the meta-policy while "classic" acausal trade is recursive in the object-level policy. I'm highly skeptical that this meta-policy vs. object-level policy distinction makes sense, though, and I'm also not confident in what I said about which type of trade is recursive in what.

Another intuitive difference is that with classic acausal trade, you usually want to verify whether the other agent is cooperating. In ECL you don't. Also, something something about how it's great to learn a lot about your trade partner for classic acausal trade and it's bad for ECL? (I suspect that there's nothing actually weird going on here and that this is because it's about learning different kinds of things. But I haven't thought about it enough to articulate the difference confidently and clearly.)

The concept of commitment races doesn't seem to make much sense when thinking just about ECL, and maybe nailing down where the difference comes from is interesting?

Comment by Chi Nguyen on Evidential Cooperation in Large Worlds: Potential Objections & FAQ · 2024-02-29T23:59:27.884Z · LW · GW

Thanks! I actually agree with a lot of what you say. Lack of excitement about existing intervention ideas is part of the reason why I'm not all in on this agenda at the moment. Although in part I'm just bottlenecked by lack of technical expertise (and it's not like people had great ideas for how to align AIs at the beginning of the field...), so I don't want people to overupdate from "Chi doesn't have great ideas."

With that out of the way, here are some of my thoughts:

  • We can try to prevent silly path-dependencies in (controlled or uncontrolled, i.e. misaligned) AIs. As a start, we can use DT benchmarks to study how DT endorsements and behaviour change under different conditions and how DT competence scales with size compared to other capabilities. I think humanity is unlikely to care a ton about AIs' DT views, and there might be path-dependencies. So like, I guess I'm saying I agree with "let's try to make the AI philosophically competent."
    • This depends a lot on whether you think there are any path-dependencies conditional on ~solving alignment, or whether humanity will, over time, just be wise enough to figure everything out regardless of the starting point.
    • One source of silly path-dependencies is if AIs' native DT depends on the training process and we want to de-bias against that. (See for example this or this for some research on what different training processes should incentivise.) Honestly, I have no idea how much things like that matter. Humans aren't all CDT even though my very limited understanding of evolution is that it should, in the limit, incentivise CDT.
    • Depending on what you think about how, by default, AIs/AI-powered earth-originating civilisation will arrive at conclusions about ECL, you might think some nudging towards the DT views you favour is more or less justified. Maybe we can also find properties of DTs that we are more confident in (e.g. "does this or that in decision problem X") than whole specified DTs, which, yeah, I have no clue about, other than "probably not CDT."
  • If the AI is uncontrolled/misaligned, there are things we can do to make it more likely that it is interested in ECL, which I expect to be net good for the agents I try to acausally cooperate with. For example, maybe we can make misaligned AIs' utility functions more likely to have diminishing returns or do something else that would make their values more porous. (I'm using the term in a somewhat broader way than Bostrom.)
    • This depends a lot on whether you think we have any influence over AIs we don't fully control.
  • It might be important and mutable that future AIs don't take any actions that decorrelate them from other agents (i.e. do things that decrease the AI's acausal influence) before they discover and implement ECL. So, we might try to just make them aware of that early.
    • You might think that's just not how correlation or updatelessness work, such that there's no rush. Or that this is a potential source of value loss but a pretty negligible one.
  • Things that aren't about making AIs more likely to do ECL: Something not mentioned yet is that there might be some trades that we have to do now. For example, maybe ECL makes it super important to be nice to AIs we're training. (I mostly lean no on this question, at least for "super important", but it's confusing.) I also find it plausible that we want to do ECL with other pre-ASI civilisations who might or might not succeed at alignment and, if we succeed and they fail, part-optimise for their values. It's unclear to me whether this requires us to get people to spiritually commit to this now, before we know whether we'll succeed at alignment or not, or whether updatelessness somehow sorts this out because, if we (or the other civ) were to succeed at alignment, we would have seen that this is the right policy and done this retroactively.

Comment by Chi Nguyen on Cooperating with aliens and AGIs: An ECL explainer · 2024-02-27T23:27:43.072Z · LW · GW

Yeah, you're right that we assume that you care about what's going on outside the lightcone! If that's not the case (or only a little bit the case), that would limit the action-relevance of ECL.

(That said, there might be some weird simulation shenanigans or cooperation with future earth-AI that would still make you care about ECL to some extent, although my best guess is that they shouldn't move you too much. This is not really my focus, though, and I haven't properly thought through ECL for people with indexical values.)

Comment by Chi Nguyen on Complexity of value but not disvalue implies more focus on s-risk. Moral uncertainty and preference utilitarianism also do. · 2024-02-26T03:27:45.856Z · LW · GW

Whoa, I didn't know about this survey, pretty cool! Interesting results overall.

It's notable that 6% of people also report they'd prefer absolute certainty of hell over not existing, which seems totally insane from the point of view of my preferences. The 11% that prefer a trillion miserable sentient beings over a million happy sentient beings also seems wild to me. (Those two questions are also relatively more correlated than the other questions.)

Comment by Chi Nguyen on How have you become more hard-working? · 2023-12-02T15:40:28.044Z · LW · GW

Thanks, I hadn't actually heard of this one before!

edit: Any takes on addictiveness/other potential side effects so far?

Comment by Chi Nguyen on List of how people have become more hard-working · 2023-10-01T12:39:03.193Z · LW · GW

First of all: Thanks for asking. I was being lazy with this, and your questions forced me to come up with a response, which forced me to actually think about my plan.

Concrete changes

1) I'm currently doing weekday in-person Pomodoro co-working with a friend, but I had planned that before this post IIRC, and I've definitely known for a while that it's a huge boost for me.

In-person co-working and the type of work I do seem somewhat situational/hard to sustain/sometimes hard to quickly change. For some reason (perhaps because I feel a bit meh about virtual co-working), I've never tried Focusmate, and this post made me more likely to try it in the future if and when my in-person co-working fizzles out.

2) The things that were a high mix of resonating with me and new were "Identifying as hard-working" and "Finding ways of reframing work as non-work". (I was previously aware that things would often be fun if I didn't think of them as work and become "Ugh" as soon as they are work, but just knowing that there is another person who is successfully managing this property of theirs is really encouraging and helpful for thinking about solutions.)

Over the last few months, I've introduced the habit of checking in with myself at various times during the day and especially when I'm struggling with something (kind of spontaneous mini meditations). I'm hoping that I can piggy-back on that to try out the identity and reframing thing. (Although this comment just prompted me to actually go and write those down on post-its and hang them where I can see them so I don't forget, so thanks for asking!)

3) I am currently testing out having a productive hobby for my weekends. (This ties into reframing work things as "not work". Also, I am often strict with my weekends in a way that I wanna experiment with relaxing, given one of the responses I got. It was also prompted by the concept of doing something enjoyable and rewarding to regenerate instead of resting.) I'll monitor the effects on my mental health quite closely because I think it could end up quite badly, but it has been fun this weekend.

3.5) I often refrain from doing work things I feel energy and motivation for because it's too late in the day or otherwise "not work-time". I think this overall serves me well in various ways. But as a result of this post, I am more likely to try relaxing this a bit in the future. I am already tracking my work and sleep hours, so hopefully that will give me some basis for checking how it affects my productivity. (And 4 will hopefully also help.)

4) Not directly as a consequence of this post, but related: I started thinking about how to set work targets for different time intervals and about consistently setting and reviewing them. (It was kind of crazy to realise that I don't already do this! Plans ≠ Targets.) This is a priority for me at the moment, and I am interviewing people about it. I expect this to feed into this whole hard-working topic, and maybe some of the responses about working hard will influence how I go about it.

Other minor updates or things that I won't try immediately but that I'm more likely to try in the future now:

  • Decided not to prioritise improving diet, exercise, and sleep for the sake of becoming more hard-working.
  • Not being frustrated that there is no magical link: general growth as a person --> more hard-working
  • Maybe: Using the Freedom App (I've had good experiences with Cold Turkey, but it's not on my phone.)
  • Maybe: Doing more on paper
  • Maybe: Kanban boards
  • Maybe: Meetings with myself
  • Maybe: Experiment with stimulants (I can get them prescribed but dropped them for various reasons)

Some overall remarks

My biggest update was just learning that people do permanently become more hard-working, even well into their 20s, through means other than meds or changing roles. That means there is a point to me trying more non-med things that might increase how hard-working I am in the short term. Previously, I was really unsure to what degree hard-workingness might just be a very stable trait across a lifetime, at least if you don't drastically change the kind of work you do or your work environment in ways that are difficult to actually pull off. Tbf, I'm still not sure, but I'm more hopeful than before.

From that point of view, I found it really encouraging when people mentioned having a concrete, time-constrained period where, for some reason, they were much more hard-working than before, and then keeping this up going forward even when ~everything about their work situation changed.

For context: I tracked my work hours for roughly a year. My week-to-week hours tend to be very heterogeneous, and through the tracking, I realised that none of the things I tracked during that year seemed to have any relationship to how much I work week-to-week other than hard "real" deadlines. The overall trend was also very flat, which felt a bit discouraging.

Comment by Chi Nguyen on How have you become more hard-working? · 2023-09-27T12:59:33.866Z · LW · GW

Thank you. This answer was both insightful and felt like a warm hug somehow.

Comment by Chi Nguyen on The Seeker’s Game – Vignettes from the Bay · 2023-07-10T17:17:39.119Z · LW · GW

Thanks for posting this! I really enjoyed the read.


Feedback on the accompanying poll: I was going to fill it out. Then I saw that I have to look up and list the titles I can (not) relate to instead of just being able to click "(strongly) relate/don't relate" on a long list of titles. (I think the relevant function for this in forms is "Matrix" or something.) And my reaction was "ugh, work". I think I might still fill it in, but I'm much less likely to. If others feel the same, maybe you wanna change the poll?

Comment by Chi Nguyen on My Model Of EA Burnout · 2023-01-30T18:25:33.167Z · LW · GW

I find this comment super interesting because

a) before, I would have expected far more people to be scared of being eaten by piranhas on LessWrong than on the EA Forum, not vice versa. In fact, I didn't even consider that people could find the EA Forum scarier than LessWrong (well, before FTX anyway).

b) my current read of the EA Forum (and this has been the case for a while) is that forum people like it when you say something like "People should value things other than impact (more)", and that you're more likely to be eaten by piranhas for saying "People should value impact more" than vice versa.

Take this as a slight nudge towards posting on the EA Forum, perhaps, although I don't really have an opinion on whether 2) and 3) might still be true.

Comment by Chi Nguyen on Open & Welcome Thread October 2021 · 2021-10-16T20:32:51.310Z · LW · GW

edit: We're sorted :)


Hello, I'm Chi, the friend, in case you wanna check out my LessWrong, although my EA forum account probably says more. Also, £50 referral bonus if you refer a person we end up moving in with!

Also, we don't really know whether the Warren Street place will work out, but we are looking for flatmates either way. Potential other accommodation would likely be in N1, NW1, W1, or WC1.

Comment by Chi Nguyen on [inactive] £2000 bounty - contraceptives (and UTI) literature review · 2021-09-19T22:46:40.613Z · LW · GW

Hi, thanks for this comment and the links.

I agree that it's a pretty vast topic. I also agree that the questions are personalized in the sense that there are many different personal factors to this question, although the bullets I listed weren't actually really personalized to me. One hope I had with posting to LessWrong was that I trust people here to do some of the "what's most relevant to include" thinking (e.g. everything that affects ≥10% of women between 20 and 40 + everything that's of more interest on LessWrong than elsewhere, like irreversible contraception). I agree it's a tall order, though.

On talking to my doctor: I've found talking to doctors pretty frustrating, to be honest. I think I've learned much more about contraception (including about where my doctors were misinformed) via the internet or friends than via doctors. I don't doubt that there are excellent doctors out there, but it's difficult to find them. The advice about looking up people who wrote medical guidelines seems solid.

That being said, while I'm interested in the topic myself, I was mostly thinking that it would be good for the LessWrong/EA community to have a reliable source. (I'm mostly constrained to hormonal contraception and have already tried out a couple, so my remaining search space is relatively small.) I think it could save lots of women many hours of research into which contraception to take + productivity loss from trying out or permanently choosing suboptimal contraception.

You prompted me to try out the D-Mannose, thanks! I've had it lying around but was always too inert to research whether it actually works, so I never bothered to take it.

Comment by Chi Nguyen on [inactive] £2000 bounty - contraceptives (and UTI) literature review · 2021-09-19T22:34:17.743Z · LW · GW

Sorry for replying so late! I was quite busy this week.

  • I initially wanted to commission someone and expected that I'd have to pay four digits. Someone suggested I put down a bounty. I'm not familiar with putting bounties on things, and I wanted to avoid getting myself in a situation where I feel like I have to pay the full amount for
    • work that's poor
    • work that's decent but much less detailed than I had envisioned
    • multiple reports each
  • I think I'm happy to pay the full amount for a report that is
    • transparent in its reasoning, so I can trust it,
    • tells me how much to trust study results, e.g., describes potential flaws and caveats for the studies they looked at,
    • roughly on the level of detail that's indicated by what I wrote under "the type of content I would like to see included". Ideally, the person writing wouldn't treat my list as a shopping list, but use their common sense to include the things I'd be interested in
    • the only report of this type that claims the bounty
  • The first two are the most important ones. (And the last one is weird.) If it's much less detailed but fulfills the other criteria, I'd still be happy to pay a triple-digit amount.
  • As your later comment says, I think this is a pretty complex topic, and I can imagine that £2000 wouldn't actually cover the work needed to do such a report well.

I think before someone seriously puts time into this, they should probably just contact me, both to spare awkward double work and double submissions and to set expectations on the payment. I'll edit my post to be clearer on this.

Comment by Chi Nguyen on Open & Welcome Thread September 2021 · 2021-09-15T22:13:58.571Z · LW · GW

Thanks! I felt kind of sheepish about making a top-level post/question out of this but will do so now. Feel free to delete my comment here if you think that makes sense.

Comment by Chi Nguyen on Open & Welcome Thread September 2021 · 2021-09-15T15:46:23.801Z · LW · GW

I would like it if there were a well-researched LessWrong post on the pros and cons of different contraceptives. Same deal with a good post on how to treat or prevent urinary tract infections, although I'm less excited about that.

  • I'd be willing to pay some from my private money for this to get done. Maybe up to £1000? Open to considering higher amounts.
  • It would mostly be a public service as I'm kind of fine with my current contraception. So, I'm also looking for people to chip in (either to offer more money or just to take some of the monetary burden off me!)

Examples of content that I would like to see included:

  • Clarity on the contraception and depression question. e.g. apparently theory says that hormonal IUDs should give you less depression risk than pills, but in empirical studies it looks like it's the other way around? Can I trust the studies?
  • Some perspective on the trade-offs involved. E.g. maybe I can choose between a 5% increased chance of depression vs. a 100% increased chance of blood clots. But maybe basically no one gets blood clots anyway, and then I'd rather take the increased blood clot risk! But because the medical system cares more about death than I do, my doctor will never recommend the blood clot one to me, or something like that.
  • If there wasn't already a post on this (but I think there is), info on the fact that it's totally fine to *not* take 7-day pill breaks every month and that you can just take the pill continuously. (Although I think it might be recommended to take a short break every X months.)
  • Some realistic outlook on how much pain and effects on menstruation I should expect
  • Various potential benefits from contraceptives aside from contraception
  • On the UTI side: Is the cranberry stuff a myth or is it a myth that it's a myth or is it a myth that it's a myth that it's a myth?

Alternatively: If there actually already are really good resources on this topic out there, please let me know!

Comment by Chi Nguyen on My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda · 2020-09-29T08:43:00.126Z · LW · GW

Thanks! I already mention this in the post, but just wanted to clarify that Paul only read the first third/half (wherever his last comment is) in case people missed that and mistakenly take the second half at face value.

Edit: Just went back to the post and noticed I don't really clearly say that.

Comment by Chi Nguyen on My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda · 2020-09-03T13:57:48.937Z · LW · GW

Hey, thanks for the question! And I'm glad you liked the part about AGZ. (I also found this video by Robert Miles extremely helpful and accessible for understanding AGZ.)


This seems speculative. How do you know that a hypothetical infinite HCH tree does not depend the capabilities of the human?

Hm, I wouldn't say that it doesn't depend on the capabilities of the human. I think it does, but it depends on the type of reasoning they employ and not, e.g., their working memory (to the extent that the general hypothesis of factored cognition holds, i.e. that we can successfully solve tasks by breaking them down into smaller tasks).

HCH does not depend on the starting point (“What difficulty of task can Rosa solve on her own?”)

The best way to understand this is maybe to think in terms of computation/time to think. What kind of tasks Rosa can solve obviously depends a lot on how much computation/time she has to think about them. But for the final outcome of HCH, it shouldn't matter if we halve the computation/time the first node has (at least down to a certain level of computation/time), since the next lower node can just do the thinking that the first node would have done with more time. I guess this assumes that the first node would benefit from more time by making more quantitative progress as opposed to qualitative progress. (I think I tried to capture quality with 'type of reasoning process'.)
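A toy sketch of the point above, under assumptions that are mine and not from Paul's writing: the "task" is summing a list, a node's "computation/time" is a hypothetical `budget` (how many items it can handle directly), and the "type of reasoning" is fixed because every node runs the same decompose-and-combine procedure. The function name `hch_node` is hypothetical, and summing is chosen precisely because it decomposes cleanly, so this sidesteps the real question of whether factored cognition holds for hard tasks.

```python
from typing import List, Tuple


def hch_node(task: List[int], budget: int) -> Tuple[int, int]:
    """Answer `task` (sum a list) as one node of a toy HCH tree.

    Returns (answer, number_of_nodes_used).
    """
    if budget < 1:
        # Below this level a node can't do any work at all: the
        # "at least down to a certain level of computation/time" caveat.
        raise ValueError("budget must be at least 1")
    if len(task) <= budget:
        # Small enough: answer directly with this node's own limited capacity.
        return sum(task), 1
    # Too big: split into two smaller subtasks, delegate each to a copy of the
    # same node (same "type of reasoning"), and combine the sub-answers.
    mid = len(task) // 2
    left, n_left = hch_node(task[:mid], budget)
    right, n_right = hch_node(task[mid:], budget)
    return left + right, 1 + n_left + n_right


if __name__ == "__main__":
    for budget in (50, 10, 1):
        print(budget, hch_node(list(range(100)), budget))
```

Shrinking `budget` changes how many nodes the tree needs, but not the final answer, which is the sense in which the outcome shouldn't depend on how much the first node can do on its own.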


Sorry, this answer is a bit rambly, I can spend some more time on an answer if this doesn't make sense! (There's also a good chance this doesn't make sense because it just doesn't make sense/I misunderstand stuff and not just because I explain it poorly)

Comment by Chi Nguyen on My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda · 2020-08-17T22:49:32.798Z · LW · GW

Thanks for the comment and I'm glad you like the post :)

On the other topic: I'm sorry, I'm afraid I can't be very helpful here. I'd be somewhat surprised if I'd have had a good answer to this a year ago and certainly don't have one now.

Some cop-out answers:

  • I often found his comments/remarks about corrigibility (including in discussions with others) in posts focused on something else more useful for finding out whether his thinking on this had changed than his blog posts that were obviously concentrating on corrigibility
  • You might have some luck reading through some of his newer blogposts and seeing if you can spot some mentions there
  • In case this was about "his current views" as opposed to "the views I tried to represent here, which are one year old": The comments he left are from this summer, so you can get some idea from there, or maybe assume that he endorses the parts I wrote that he didn't comment on (at least in the first third of the doc or so, where he still left comments)

FWIW, I just looked through my docs and found a "resources" doc with the following links under corrigibility:

Clarifying AI alignment

Can corrigibility be learned safely?

Problems with amplification/distillation

The limits of corrigibility

Addressing three problems with counterfactual corrigibility


Not vouching for any of those being the up-to-date or most relevant ones. I'm pretty sure I made this list early on in the process and it probably doesn't represent what I considered the latest Paul-view.

Comment by Chi Nguyen on My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda · 2020-08-17T21:01:13.137Z · LW · GW

Copied from my comment on this from the EA forum:

Yeah, that's a bit confusing. I think technically, yes, IDA is iterated distillation and amplification and that Iterated Amplification is just IA. However, IIRC many people referred to Paul Christiano's research agenda as IDA even though his sequence is called Iterated amplification, so I stuck to the abbreviation that I saw more often while also sticking to the 'official' name. (I also buried a comment on this in footnote 6)

I think lately, I've mostly seen people refer to the agenda and ideas as Iterated Amplification. (And IIRC I also think the amplification is the more relevant part.)

I agree that it's very not ideal and maybe I should just switch it to Iterated Distillation and Amplification :/