Posts

Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes 2024-04-16T10:10:13.338Z
AI strategy given the need for good reflection 2024-03-18T00:48:09.578Z
Beyond Maxipok — good reflective governance as a target for action 2024-03-15T22:22:03.704Z
Wholesome Culture 2024-03-01T12:08:17.877Z
Wholesomeness and Effective Altruism 2024-02-28T20:28:22.175Z
Acting Wholesomely 2024-02-26T21:49:16.526Z
On the future of language models 2023-12-20T16:58:28.433Z
Report from a civilizational observer on Earth 2022-07-09T17:26:09.223Z
Perils of optimizing in social contexts 2022-06-16T17:40:46.843Z
Don't Over-Optimize Things 2022-06-16T16:33:17.560Z
Deferring 2022-05-12T23:56:20.679Z
Truthful AI: Developing and governing AI that does not lie 2021-10-18T18:37:38.325Z
"Good judgement" and its components 2020-08-24T10:32:50.445Z
Learning to get things right first time 2015-05-29T22:06:56.171Z
Counterfactual trade 2015-03-09T13:23:54.252Z
Neutral hours: a tool for valuing time 2015-03-04T16:38:50.419Z
Report -- Allocating risk mitigation across time 2015-02-20T16:37:33.199Z
Existential Risk and Existential Hope: Definitions 2015-01-10T19:09:43.882Z
Factoring cost-effectiveness 2014-12-24T23:25:24.230Z
Make your own cost-effectiveness Fermi estimates for one-off problems 2014-12-11T12:00:07.145Z
Estimating the cost-effectiveness of research 2014-12-11T11:55:01.025Z
Decision theories as heuristics 2014-09-28T14:36:27.460Z
Why we should err in both directions 2014-08-21T11:10:59.654Z
How to treat problems of unknown difficulty 2014-07-30T11:27:33.014Z

Comments

Comment by owencb on Express interest in an "FHI of the West" · 2024-04-20T08:43:37.164Z · LW · GW

I think that you're right that people's jobs are a significant thing driving the difference here (thanks), but I'd guess that the bigger impact of jobs is via jobs --> culture than via jobs --> individual decisions. This impression is based on a sense of "when visiting Constellation, I feel less pull to engage in open-ended idea exploration than I do at FHI", as well as "at FHI, I think people whose main job was something else would still not-infrequently spend some time engaging with the big open questions of the day".

I might be wrong about that ¯\_(ツ)_/¯

Comment by owencb on Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes · 2024-04-19T22:32:52.751Z · LW · GW

I feel awkward about trying to offer examples because (1) I'm often bad at that when on the spot, and (2) I don't want people to over-index on particular ones I give. I'd be happy to offer thoughts on putative examples, if you wanted (while being clear that the judges will all ultimately assess things as seem best to them). 

Will probably respond to emails on entries (which might be to decline to comment on aspects of it).

Comment by owencb on Express interest in an "FHI of the West" · 2024-04-19T22:08:28.419Z · LW · GW

I don't really disagree with anything you're saying here, and am left confused about what your confusion is about (it seemed like you were offering these as examples of disagreement?).

Comment by owencb on Express interest in an "FHI of the West" · 2024-04-19T20:22:10.035Z · LW · GW

(Caveat: it's been a while since I've visited Constellation, so if things have changed recently I may be out of touch.)

I'm not sure that Constellation should be doing anything differently. I think there's a spectrum from a culture of blue-skies thinking to one highly prioritized on the most important things. I think that FHI was more towards the first end of this spectrum, and Constellation is more towards the second. I think that there are a lot of good things that come with being further in that direction, but I do think it means you're less likely to produce very novel ideas.

To illustrate via caricatures in a made-up example: say someone turned up in one of the offices and said "OK here's a model I've been developing of how aliens might build AGI". I think the vibe in Constellation would trend towards people being interested to chat about it for fifteen minutes at lunch (questions a mix of treating-it-as-a-game and the pointed but-how-will-this-help-us), and then saying they've got work they've got to get back to. I think the vibe in FHI would have trended more towards people treating it as a serious question (assuming there's something interesting to the model), with it generating an impromptu 3-hour conversation at a whiteboard with four people fleshing out details and variations, which ends with someone volunteering to send round a first draft of a paper. I also think Constellation is further in the direction of being bought into some common assumptions than FHI was; e.g. it would seem to me more culturally legit to start a conversation questioning whether AI risk was real at FHI than at Constellation.

I kind of think there's something valuable about the Constellation culture on this one, and I don't want to just replace it with the FHI one. But I think there's something important and valuable about the FHI thing which I'd love to see existing in some more places.

(In the process of writing this comment it occurred to me that Constellation could perhaps decide to have some common spaces which try to be more FHI-like, while trying not to change the rest. Honestly I think this is a little hard without giving that subspace a strong distinct identity. It's possible they should do that; my immediate take now that I've thought to pose the question is that I'm confused about it.)

Comment by owencb on Express interest in an "FHI of the West" · 2024-04-19T15:12:15.428Z · LW · GW

I completely agree that Oliver is a great fit for leading on research infrastructure (and the default thing I was imagining was that he would run the institute; although it's possible it would be even better if he could arrange to be number two with a strong professional lead, giving him more freedom to focus attention on new initiatives within the institute, that isn't where I'd start). But I was specifically talking about the "research lead" role. By default I'd guess people in this role would report to the head of the institute, but also have a lot of intellectual freedom. (It might not even be a formal role; I think sometimes "star researchers" might do a lot of this work without it being formalized, but it still seems super important for someone to be doing.) I don't feel like Oliver's track record blows me away on any of the three subdimensions I named there, and your examples of successes at research infrastructure don't speak to it. This is compatible with him being stronger than I guess, because he hasn't tried in earnest at the things I'm pointing to. (I'm including some adjustment for this, but perhaps I'm undershooting. On the other hand I'd also expect him to level up at it faster if he's working on it in conjunction with people with strong track records.)

I think it's obvious that you want some beacon function (to make it an attractive option for people with strong outside options). That won't come entirely from having excellent people whose presence makes internal research conversations really good, but it seems to me like that was a significant part of what made FHI work (NB this wasn't just Nick, but people like Toby or Anders or Eric); I think it could be make-or-break for any new endeavour, in a way that might be somewhat path-dependent in how it turns out; it seems right and proper to give it attention at this stage.

Comment by owencb on Express interest in an "FHI of the West" · 2024-04-19T10:51:31.721Z · LW · GW

Makes sense! My inference came from the discussion at this stage being a high-level one about ways to set things up, but it does seem good to have space to discuss object-level projects that people might get into.

Comment by owencb on Express interest in an "FHI of the West" · 2024-04-19T09:50:21.932Z · LW · GW

I agree in the abstract with the idea of looking for niches, and I think that several of these ideas have something to them. Nevertheless when I read the list of suggestions my overall feeling is that it's going in a slightly wrong direction, or missing the point, or something. I thought I'd have a go at articulating why, although I don't think I've got this to the point where I'd firmly stand behind it:

It seems to me like some of the central FHI virtues were:

  • Offering a space to top thinkers where the offer was pretty much "please come here and think about things that seem important in a collaborative truth-seeking environment"
    • I think that the freedom of direction, rather than focusing on an agenda or path to impact, was important for:
      • attracting talent
      • finding good underexplored ideas (b/c of course at the start of the thinking people don't know what's important)
    • Caveats:
      • This relies on your researchers having some good taste in what's important (so this needs to be part of what you select people on)
      • FHI also had some success launching research groups where people were hired to more focused things
        • I think this was not the heart of the FHI magic, though, but more like a particular type of entrepreneurship picking up and running with things from the core
  • Willingness to hang around at whiteboards for hours talking and exploring things that seemed interesting
    • With an attitude of "OK but can we just model this?" and diving straight into it
      • Someone once described FHI as "professional amateurs", which I think is apt
        • The approach is a bit like the attitude ascribed to physicists in this xkcd, but applied more to problems-that-nobody-has-good-answers-for than things-with-lots-of-existing-study (and with more willingness to dive into understanding existing fields when they're importantly relevant for the problem at hand)
    • Importantly mostly without directly asking "ok but where is this going? what can we do about it?"
      • Prioritization at a local level is somewhat ruthless, but is focused on "how do we better understand important dynamics?" and not "what has external impact in the world?"
  • Sometimes orienting to "which of our ideas does the world need to know about? what are the best ways to disseminate these?" and writing about those in high-quality ways
    • I'd draw some contrast with MIRI here, who I think were also good at getting people to think of interesting things, but less good at finding articulations that translated to broadly-accessible ideas

Reading your list, a bunch of it seems to be about decisions about what to work on or what locally to pursue. My feeling is that those are the types of questions which are largely best left open to future researchers to figure out, and that the appropriate focus right now is more like trying to work out how to create the environment which can lead to some of this stuff.

Overall, the take in the previous paragraph is slightly too strong. I think it is in fact good to think through these things to get a feeling for possible future directions. And I also think that some of the good paths towards building a group like this start out by picking a topic or two to convene people on and get them thinking about. But if places want to pick up the torch, I think it's really important to attend to the ways in which it was special that aren't necessarily well-represented in the current x-risk ecosystem.

Comment by owencb on Express interest in an "FHI of the West" · 2024-04-18T12:00:33.328Z · LW · GW

I think FHI was an extremely special place and I was privileged to get to spend time there. 

I applaud attempts to continue its legacy. However, at a gut level I'd feel more optimistic about plans that are grounded in thinking about how circumstances are different now, and then attempting to create the thing that is live and good given that, rather than attempting to copy FHI as closely as possible.

Differences in circumstance

You mention not getting to lean on Bostrom's research taste as one driver of differences, and I think this is correct but may be worth tracing out the implications of even at the stage of early planning. Other things that seem salient and important to me:

  • For years, FHI was one of the only places in the world that you could seriously discuss many of these topics
    • There are now much bigger communities and other institutions where these topics are at least culturally permissible (and some of them, e.g. AI safety, are the subject of very active work)
    • This means that:
      • One of FHI's purposes was serving a crucial niche which is now less undersupplied
      • FHI benefited from being the obvious Schelling location to go to think about these topics
        • Whereas even in Berkeley you want to think a bit about how you sit in the ecosystem relative to Constellation (which I think has some important FHI-like virtues, although makes different tradeoffs and misses on others)
  • FHI benefited from the respectability of being part of the University
    • In terms of getting outsiders to take it seriously, getting meetings with interesting people, etc.
    • I'm not saying this was crucial for its success, and in any case the world looks different now; but I think it had some real impact and is worth bearing in mind
  • As you mention -- you have a campus!
    • I think it would be strange if this didn't have some impact on the shape of plans that would be optimal for you

Pathways to greatness

If I had to guess at the shape of plans you might engage in that would lead to something deserving of the name "FHI of the West", they're less like "poll LessWrong for interest to discover if there's critical mass" (because I think that whether there's critical mass depends a lot on people's perceptions of what's there already, and because many of the people you might most want probably don't regularly read LessWrong), and more like thinking about pathways to scale gracefully while building momentum and support.

When I think about this, two ideas that seem to me like they'd make the plan more promising (that you could adopt separately or in conjunction) are (1) starting by finding research leads, and/or (2) starting small as-a-proportion-of-time. I'll elaborate on these:

Finding research leads

I think that Bostrom's taste was extremely important for FHI. There are a couple of levels this was true on:

  • Cutting through unimportant stuff in seminars
    • I think it's very easy for people, in research, to get fixated on things that don't really matter. Sometimes this is just about not asking often enough which questions are really important (or not being good enough at answering that); sometimes it's kind of performative, about people trying to show off how cool their work is
    • Nick had low tolerance for this, as well as excellent taste. He wasn't afraid to be a bit disagreeable in trying to get to the heart of things
    • This had a number of benefits:
      • Helping discussions in seminars to be well-focused
      • Teaching people (by example) how to do the cut-through-the-crap move
      • Shaping incentives for researchers in the institute, towards tackling the important questions head on
  • Gatekeeping access to the space
    • Bostrom was good at selecting people who would really contribute in this environment
      • This wasn't always the people who were keenest to be there; and saying "no" to people who would add a little bit but not enough (and dilute things) was probably quite important
      • In some cases this meant finding outsiders (e.g. professors elsewhere) to visit, and keeping things intellectually vibrant by having discussions with people with a wide range of current interests and expertise, rather than have FHI just become an echo chamber
  • Being a beacon
    • Nick had a lot of good ideas, which meant that people were interested to come and talk to him, or give seminars, etc.

If you want something to really thrive, at some point you're going to have to wrestle with who is providing these functions. I think that one thing you could do is to start with this piece. Rather than think about "who are all the people who might be part of this? does that sound like critical mass?", start by asking "who are the people who could be providing these core functions?". I'd guess if you brainstorm names you'll end up with like 10-30 that might be viable (if they were interested). Then I'd think about trying to approach them to see if you can persuade one or more to play this role. (For one thing, I think this could easily end up with people saying "yes" who wouldn't express interest on the current post, and that could help you in forming a strong nucleus.)

I say "leads" rather than "lead" because it seems to me decently likely that you're best aiming to have these responsibilities be shared over a small fellowship. (I'm not confident in this.)

Your answer might also be "I, Oliver, will play this role". My gut take would be excited for you to be like one of three people in this role (with strong co-leads, who are maybe complementary in the sense that they're strong at some styles of thinking you don't know exactly how to replicate), and kind of weakly pessimistic about you doing it alone. (It certainly might be that that pessimism is misplaced.)

Starting small as-a-proportion-of-time

Generally, things start a bit small, and then scale up. People can be reluctant to make a large change in life circumstance (like moving job or even city) for something where it's unknown what the thing they're joining even is. By starting small you get to iron out kinks and then move on from there.

Given that you have the campus, I'd seriously consider starting small not as-a-number-of-people but as-a-proportion-of-time. You might not have the convening power to get a bunch of great people to make this their full time job right now (especially if they don't have a good sense who their colleagues will be etc.). But you probably do have the convening power to get a bunch of great people to show up for a week or two, talk through big issues, and spark collaborations. 

I think that you could run some events like this. Maybe to start they're just kind of like conferences / workshops, with a certain focus. (I'd still start by trying to find something like "research leads" for the specific events, as I think it would help convening power as well as helping the things to go well.) In some sense that might be enough for carrying forward the spirit of FHI -- it's important that there are spaces for it, not that these spaces are open 365. But if it goes well and they seem productive, it could be expanded. Rather than just "research weeks", offer "visiting fellowships" where people take a (well-paid) 1-3 month sabbatical from their regular job to come and be in person all at the same time. And then if that's going well consider expanding to a permanent research group. (Or not! Perhaps the ephemeral nature of short-term things, and constantly having new people, would prove even more productive.)

Comment by owencb on Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes · 2024-04-16T21:06:01.432Z · LW · GW

It's a fair point that wisdom might not be straightforwardly safety-increasing. If someone wanted to explore e.g. assumptions/circumstances under which it is vs isn't, that would certainly be within scope for the competition.

Comment by owencb on Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes · 2024-04-16T20:56:26.014Z · LW · GW

Multiple entries are very welcome!

[With some kind of anti-munchkin caveat. Submitting your analyses of several different disjoint questions seems great; submitting two versions of largely the same basic content in different styles not so great. I'm not sure exactly how we'd handle it if someone did the latter, but we'd aim for something sensible that didn't incentivise people to have been silly about it.]

Comment by owencb on Acting Wholesomely · 2024-04-08T08:43:54.417Z · LW · GW

Thanks, yes, I think that you're looking at things essentially the same way that I am. I particularly like your exploration of what the inner motions feel like; I think "unfixation" is a really good word.

Comment by owencb on Acting Wholesomely · 2024-03-21T22:56:05.447Z · LW · GW

I think that for most of what I'm saying, the meaning wouldn't change too much if you replaced the word "wholesome" with "virtuous" (though the section contrasting it with virtue ethics would become more confusing to read). 

As practical guidance, however, I'm deliberately piggybacking off what people already know about the words. I think the advice to make sure that you pay attention to ways in which things feel unwholesome is importantly different from (and, I hypothesize, more useful than) advice to make sure you pay attention to ways in which things feel unvirtuous. And the advice to make sure you pay attention to things which feel frobby would obviously not be very helpful, since readers will not have much of a sense of what feels frobby.

Comment by owencb on Acting Wholesomely · 2024-03-17T16:29:36.647Z · LW · GW

If you personally believe it to be wrong, it's unwholesome. But generically no. See the section on revolutionary action in the third essay.

Comment by owencb on Acting Wholesomely · 2024-03-16T16:31:51.349Z · LW · GW

I think this is essentially correct. The essays (especially the later ones) do contain some claims about ways in which it might or might not be useful; of course I'm very interested to hear counter-arguments or further considerations.

Comment by owencb on Acting Wholesomely · 2024-03-15T01:49:39.994Z · LW · GW

The most straightforward criterion would probably be "things they themselves feel to be mistakes a year or two later". That risks people just failing to own their mistakes so would only work with people I felt enough trust in to be honest with themselves. Alternatively you could have an impartial judge. (I'd rather defer to "someone reasonable making judgements" than try to define exactly what a mistake is, because the latter would cover a lot of ground and I don't think I'd do a good job of it; also my claims don't feel super sensitive to how mistakes are defined.)

Comment by owencb on Acting Wholesomely · 2024-03-14T20:54:44.093Z · LW · GW

I would certainly update in the direction of "this is wrong" if I heard a bunch of people had tried to apply this style of thinking over an extended period, I got to audit it a bit by chatting to them and it seemed like they were doing a fair job, and the outcome was they made just as many/serious mistakes as before (or worse!).

(That's not super practically testable, but it's something. In fact I'll probably end up updating some from smaller anecdata than that.)

Comment by owencb on Acting Wholesomely · 2024-03-14T20:48:55.509Z · LW · GW

I definitely agree that this fails as a complete formula for assessing what's good or bad. My feeling is that it offers an orientation that can be helpful for people aggregating stuff they think into all-things-considered judgements (and e.g. I would in retrospect have preferred to have had more of this orientation in the past).

If someone were using this framework to stop thinking about things that I thought they ought to consider, I couldn't be confident that they weren't making a good faith effort to act wholesomely, but I at least would think that their actions weren't wholesome by my lights.

Comment by owencb on Acting Wholesomely · 2024-03-14T10:08:12.444Z · LW · GW

Good question, my answer on this is nuanced (and I'm kind of thinking it through in response to your question).

I think that what feels to you to be wholesome will depend on your values. And I'm generally in favour of people acting according to their own feeling of what is wholesome.

On the other hand I also think there would be some choices of values that I would describe as "not wholesome". These are the ones which ignore something of what's important about some dimension (perhaps justifying ignoring it by saying "I just don't value this"), at least as felt-to-be-important by a good number of other people in society.

But although "avoiding unwholesomeness" provides some constraints on values, it's not specifying exactly what values or tradeoffs are good to have. And then for any among the range of possible wholesome values, when you come to make decisions acting wholesomely will depend on your personal values. (Or, depending on the situation, perhaps not; in the case of the business plan, if it's supposed to be for the sake of the local community then what is wholesome could depend a lot more on the community's values than on your own.)

So there is an element of "paying at least some attention to traditional values" (at least while fair numbers of people care about them), but it's definitely not trying to say "optimize for them".

Comment by owencb on Acting Wholesomely · 2024-03-13T09:44:42.489Z · LW · GW

I doubt this is very helpful for our carefully-considered ethical notions of what's good.

I think it may be helpful as a heuristic for helping people to more consistently track what's good, and avoid making what they'd later regard as mistakes.

Comment by owencb on Acting Wholesomely · 2024-03-12T23:27:52.404Z · LW · GW

I agree that "paying attention to the whole system" isn't literally a thing that can be done, and I should have been clearer about what I actually meant. It's more like "making an earnest attempt to pay attention to the whole system (while truncating attention at a reasonable point)". It's not that you literally get to attend to everything, it's that you haven't excluded some important domain from things you care about. I think habryka (quoting and expanding on Ben Pace's thoughts) has a reasonable description of this in a comment

I definitely don't think this is just making an arbitrary choice of what things to value, or that it's especially anchored in traditional values (though I do think it's correlated with traditional values).

I discuss a bit about making the tradeoffs of when to stop giving things attention in the section "wholesomeness vs expedience" in the second essay.

Comment by owencb on Acting Wholesomely · 2024-03-12T23:19:37.639Z · LW · GW

I think that there is some important unwholesomeness in these things, but that isn't supposed to mean that they're never permitted. (Sorry, I see how it could give that impression; but in the cases you're discussing there would often be greater unwholesomeness in not doing something.)

I discuss how I think my notion of wholesomeness intersects with these kind of examples in the section on visionary thought and revolutionary action in the third essay.

Comment by owencb on Acting Wholesomely · 2024-03-12T23:15:04.123Z · LW · GW

I think that there's something interesting here. One of the people I talked about this with asked me why children seem exceptionally wholesome (it's certainly not because they're unusually good at tracking the whole of things), and I thought the answer was about them being a part of the world where it may be especially important to avoid doing accidental harm, so our feelings of harms-to-children have an increased sense of unwholesomeness. But I'm now thinking that something like "robustly not evil" may be an important part of it.

Now we can trace out some of the links between wholesomeness1 and wholesomeness2. If evil is something like "consciously disregarding the impacts of your actions on (certain) others", then wholesomeness1 should robustly avoid it. And failures of wholesomeness1 which aren't evil might still be failures of wholesomeness2 -- because they involve a failure to attend to some impacts of actions, while observers may not be able to tell whether that failure to attend was accidental or deliberate.

A couple more notes:

  • I don't think that wholesomeness2 is a crisp thing -- it's dependent on the audience, and how much they get to observe. Someone could have wholesomeness2 in a strong way with respect to one audience, and really not with respect to another audience.
  • I think in expectation / in the long run / as your audiences get smarter (or something), pursuing wholesomeness1 may be a good proxy for wholesomeness2. Basically for the kind of reasons discussed in Integrity for consequentialists.

Comment by owencb on Acting Wholesomely · 2024-03-12T09:30:44.493Z · LW · GW

FWIW I quite like your way of pointing at things here, though maybe I'm more inclined towards letting things hang out for a while in the (conflationary?) alliance space to see which seem to be the deepest angles of what's going on in this vicinity, and doing more of the conceptual analysis a little later.

That said, if someone wanted to suggest a rewrite I'd seriously consider adopting it (or using it as a jumping-off point); I just don't think that I'm yet at the place where a rewrite will flow naturally for me.

Comment by owencb on Wholesome Culture · 2024-03-12T09:18:35.778Z · LW · GW

I largely think that the section of the second essay on "wholesomeness vs expedience" is also applicable here.

Basically I agree that you sometimes have to not look at things, and I like your framing of the hard question of wholesomeness. I think that the full art of deciding when it's appropriate to not think about something would be better discussed via a bunch of examples, rather than trying to describe it in generalities. But the individual decisions are ones that you can make wholesomely or not, and I think that's my current best guess approach for how to handle this. Setting something aside, when it feels right to do so, with some sadness that you don't get to get to the bottom of it, feels wholesome. Blithely dismissing something as not worth attention typically feels unwholesome, because of something like a missing mood (and relatedly, it not being clear that you're attending enough to notice if it were worth more attention).

There's also a question about how this relates to social reality. I think that if you're choosing not to look at something because it doesn't feel like it's worth the attention, then if someone else raises it (because it seems important to them) it's natural to engage with some curiosity that you now -- for the space of the conversation -- get to look at the thing a bit. You may explain why you don't normally think about it, but you're not actively trying to suppress it. I think the more unwholesome versions of not looking at something are much more likely to try to actively avoid or shut the conversation down.

Comment by owencb on Wholesomeness and Effective Altruism · 2024-02-29T16:57:39.931Z · LW · GW

DALL·E. I often told it in abstract terms the themes I wanted to include, used prompts including "stylized and slightly abstract", and regenerated a few times till I got something I was happy with.

(There are also a few that I drew, but that's probably obvious.)

Comment by owencb on Acting Wholesomely · 2024-02-29T09:16:52.762Z · LW · GW

I'd be tempted to make it a question, and ask something like "what do you think the impacts of this on [me/person] are?".

It might be that question would already do work by getting them to think about the thing they haven't been thinking about. But it could also elicit a defence like "it doesn't matter because the mission is more important" in which case I'd follow up with an argument that it's likely worth at least understanding the impacts because it might help to find actions which are better on those grounds while being comparably good -- or even better -- for the mission. Or it might elicit a mistaken model of the impacts, in which case I'd follow up by saying that I thought it was mistaken and explaining how.

Comment by owencb on New LessWrong review winner UI ("The LeastWrong" section and full-art post pages) · 2024-02-28T22:39:42.042Z · LW · GW

Maybe consider asking the authors if they'd want to volunteer a ?50? word summary for this purpose, and include summaries for those who do?

Comment by owencb on Wholesomeness and Effective Altruism · 2024-02-28T20:35:18.658Z · LW · GW

Examples of EA errors as failures of wholesomeness

In this comment (cross-posted from the EA forum) I’ll share a few examples of things I mean as failures of wholesomeness. I don’t really mean to over-index on these examples. I actually feel like a decent majority of what I wish that EA had been doing differently relates to this wholesomeness stuff. However, I’m choosing examples that are particularly easy to talk about — around FTX and around mistakes I've made — because I have good visibility of them, and in order not to put other people on the spot. Although I’m using these examples to illustrate my points, my beliefs don’t hinge too much on the particulars of these cases. (But the fact that the “failures of wholesomeness” frame can be used to provide insight on a variety of different types of error does increase the degree to which I think there’s a deep and helpful insight here.)

Fraud at FTX
To the extent that the key people at FTX were motivated by EA reasons, it looks like a catastrophic failure of wholesomeness — most likely supported by a strong desire for expedience and a distorted picture where people's gut sense of what was good was dominated by the terms where they had explicit models of impact on EA-relevant areas. It is uncomfortable to think that people could have caused this harm while believing they were doing good, but I find that it has some plausibility. It is hard to imagine that they would have made the same mistakes if they had explicitly held “be wholesome” as a major desideratum in their decision-making.

EA relationship to FTX
Assume that we don't get to intervene to change SBF’s behaviour. I still think that EA would have had a healthier relationship with FTX if it had held wholesomeness as a core virtue. I think many people had some feeling of unwholesomeness associated with FTX, even if they couldn't point to all of the issues. I think focusing on this might have helped EA to keep FTX at more distance, not to extol SBF so much just for doing a great job at making a core metric ($ to be donated) go up, etc. It could have gone a long way to reducing inappropriate trust, if people felt that their degree of trust in other individuals or organizations should vary not just with who espouses EA principles, but with how much people act wholesomely in general.

My relationship to attraction
I had an unhealthy relationship to attraction, and took actions which caused harm. (I might now say that I related to my attraction as unwholesome — arguably a mistake in itself, but compounded because I treated that unwholesomeness as toxic and refused to think about it. This blinded me to a lot of what was going on for other people, which led to unwholesome actions.)

Though I now think my actions were wrong, at some level I felt at the time like I was acting rightly. But (though I never explicitly thought in these terms), I do not think I would have felt like I was acting wholesomely. So if wholesomeness had been closer to a core part of my identity I might have avoided the harms — even without getting to magically intervene to fix my mistaken beliefs.

(Of course this isn’t precisely an EA error, as I wasn’t regarding these actions as in pursuit of EA — but it’s still very much an error where I’m interested in how I could have avoided it via a different high-level orientation.)

Wytham Abbey
Although I still think that the Wytham Abbey project was wholesome in its essence, in retrospect I think that I was prioritizing expedience over wholesomeness in choosing to move forward quickly and within the EV umbrella. I think that the more wholesome thing to do would have been, up front, to establish a new charity with appropriate governance structures. This would have been more inconvenient, and slowed things down — but everything would have been more solid, more auditable in its correctness. Given the scale of the project and its potential to attract public scrutiny, having a distinct brand that was completely separate from “the Centre for Effective Altruism” would have been a real benefit.

I knew at the time that that wasn’t entirely the wholesome way to proceed. I can remember feeling “you know, it would be good to sort out governance properly — but this isn’t urgent, so maybe let’s move on and revisit this later”. Of course there were real tradeoffs there, and I’m less certain than for the other points that there was a real error here; but I think I was a bit too far in the direction of wanting expedience, and expecting that we’d be able to iron out small unwholesomenesses later. Leaning further towards caring about wholesomeness might have led to more-correct actions.

Comment by owencb on Acting Wholesomely · 2024-02-28T11:00:02.875Z · LW · GW

I think this is a fair complaint! I think it's quite unwholesome, if you think we're in a crisis, to turn away and not look at that, or not work towards helping. It seems important to think about safety rails against that. It's less obviously unwholesome to keep devoting some effort towards typical human things while also devoting some effort towards fighting. (And I think there are a lot of stories I could tell where such paths end up doing better for the world than monomania about the things which seem most important.)

BTW one agenda here is thinking about what kinds of properties we might want societies of AI systems to have. I think there's some slice of worlds where things go better if we teach our AI systems to be something-in-the-vicinity-of-wholesome than if we don't.

Comment by owencb on Acting Wholesomely · 2024-02-27T17:55:54.661Z · LW · GW

I like this articulation. Would you object if I were to borrow it into the main text?

At the same time I'm not certain, if you just gave someone this definition, whether they'd properly grasp the idea (if they didn't kind of understand it already). There are lots of different possible interpretations. Some are obviously impossible and so not action-guiding ("have individual compassion and respect for each electron, leaving none out"). More realistically, I think someone might hear it as automatically satisfied by an EA-style impartiality (and I think there's more to it than that, and also guess you think there's more to it than that).

Comment by owencb on Acting Wholesomely · 2024-02-27T16:25:20.661Z · LW · GW

Sorry, I didn't realise until reading your comment today that that was a prevalent reading. (Although in retrospect it makes some sense.)

I think this error is pretty related to the original errors. I had a bunch of illusion of transparency and some psychological block around possible interpretations of things I did as sex-seeking. Also at the time of the public discussion I just felt really bad and it seemed kind of appropriate that people should think badly of me (I understood that they did think badly, but not the detail of that).

By the time I'd largely worked through the psychological issues and talked further about things publicly (in my December update, or my response to the EV investigation), it seemed better to talk at a more general level about what had gone wrong, and not try to rehash the original discussion. (I was aware that I'd be able to talk about it more clearly vs where I was last year, but I don't think it's a super comfortable topic of conversation for anyone, and it's not clear the extra clarity made it worth getting into it.)

Comment by owencb on Acting Wholesomely · 2024-02-27T11:04:22.571Z · LW · GW

This is partly a feature: since I'm more confident that I'm pointing to a coherent concept than I am happy with any of my short articulations of it, one of the ways to continue to point is to use it in various ways, and let the way it's being used implicitly convey some information about the boundaries or subtleties of the concept.

Of course it seems great to continue to try to find better short articulations, to explore more examples, or otherwise to explicitly specify the boundaries/subtleties of the concept. If some of that happens in the comments it seems great to me. (Or if it gets poked at and turns out to be less of a coherent concept than I think, I'd very much like to know that!)

Interesting observation that this comes up in religious texts about how to live a good life. I guess the reason is similar: there's a lot of complexity and nuance about how to live a good life, and it's helpful to be able to talk about some general directions and features of the landscape, even if one can't give a precise articulation of those features.

Comment by owencb on Acting Wholesomely · 2024-02-27T10:56:23.520Z · LW · GW

I really appreciate the example that you spelled out. I think this is solidly pointing at the same concept.

On this paragraph:

When I am choosing an action and justifying it as wholesome, what it often feels like is that I am trying to track all the obvious considerations, but some (be it internal or external) force is pushing me to ignore one of them. Not merely to trade off against it, but to look away from it in my mind. And against that force I'm trying to defend a particular action as the best one, all things considered - the "wholesome" action.

... I am inclined to try not to use the word "wholesome" to mean "right" (= the thing you should do, all things considered). I'm trying to use "wholesome" to mean "not looking away from any of the considerations". This then allows "choosing what feels wholesome is a good heuristic for choosing what is right" to be a substantive claim.

Comment by owencb on Acting Wholesomely · 2024-02-27T10:45:32.764Z · LW · GW

I think that there's something fair about your complaint in that I don't think I've fully specified the concept, and am gesturing rather than defining.

At the same time I feel like your rewrites with substituted words are less coherent than the original. I think this is true both with respect to the everyday English senses of the words (they're not completely incoherent, and of course we could imagine a world where the words were used in ways which made them comparably coherent -- I just think on actual usage they make a bit less sense), and also with respect to what I have outlined about my sense of "wholesome" in the essay prior to that, where it's important that "wholesome" is about paying attention to the whole of things.

Comment by owencb on Acting Wholesomely · 2024-02-27T10:41:08.439Z · LW · GW

I agree that this is relevant; but also I don't want to drum on about it. The most natural place to discuss it seemed to be on the second essay, so I was planning to post some stuff about the connections there. Apologies if this was the wrong call.

Comment by owencb on Acting Wholesomely · 2024-02-27T10:39:32.557Z · LW · GW

The act in question happened before she arrived, not after. (I wanted to reduce the impact of attraction on my experience while she was staying there.) But in any case I was not attuned to what her experience might be, and I now agree that it was highly inappropriate for me to have shared that information.

Comment by owencb on On the future of language models · 2023-12-20T23:46:42.255Z · LW · GW

Thanks. At a first look at what you're saying I'm understanding these to be subcategories of using finetuning or scaffolding (in the case of leveraging semantic knowledge graphs) in order to get useful tools. But I don't understand the sense in which you think finetuning in this context has completely different properties. Do you mean different properties from the point where I discuss agency entering via finetuning? If so I agree.

(Apologies for not having thought this through in greater depth.)

Comment by owencb on A note about differential technological development · 2022-07-15T14:35:22.116Z · LW · GW

I kind of want you to get quantitative here? Like pretty much every action we take has some effect on AI timelines, but I think effect-on-AI-timelines is often swamped by other considerations (like effects on attitudes around those who will be developing AI).

Of course it's prima facie more plausible that the most important effect of AI research is the effect on timelines, but I'm actually still kind of sceptical. On my picture, I think a key variable is the length of time between when-we-understand-the-basic-shape-of-things-that-will-get-to-AGI and when-it-reaches-strong-superintelligence. Each doubling of that length of time feels to me like it could be worth on the order of 0.5-1% of the future. Keeping implemented-systems close to the technological-frontier-of-what's-possible could help with this, and may be more affectable than the

Note that I don't think this really factors into an argument in terms of "advancing alignment" vs "aligning capabilities" (I agree that if "alignment" is understood abstractly the work usually doesn't add too much to that). It's more like a DTD argument about different types of advancing capabilities.

I think it's unfortunate if that strategy looks actively bad on your worldview. But if you want to persuade people not to do it, I think you either need to persuade them of the whole case for your worldview (for which I've appreciated your discussion of the sharp left turn), or to explain not just that you think this is bad, but also how big a deal you think it is. Is this something your model cares about enough to trade for in some kind of grand inter-worldview bargaining? I'm not sure. I kind of think it shouldn't be (that relative to the size of the ask, you'd get a much bigger benefit from someone starting to work on things you cared about than from them stopping this type of capabilities research), but I think it's pretty likely I couldn't pass your ITT here.

Comment by owencb on Report from a civilizational observer on Earth · 2022-07-11T08:05:34.866Z · LW · GW

I'd be very interested to read more about the assumptions of your model, if there's a write-up somewhere.

Comment by owencb on Report from a civilizational observer on Earth · 2022-07-11T08:04:46.875Z · LW · GW

Fair question. I just did the lazy move of looking up world GDP figures. In fact I don't think that my observers would measure GDP the same way we do. But it would be a measurement of some kind of fundamental sense of "capacity for output (of various important types)". And I'm not sure whether that has been growing faster or slower than real GDP, so the GDP figures seem a not-terrible proxy.

Comment by owencb on Report from a civilizational observer on Earth · 2022-07-09T23:09:17.450Z · LW · GW

I'd be interested to dig into this claim more. What exactly is the claim, and what is the justification for it? If the claim is something like "For most tasks, the thinking machines seem to need 0 to 3 orders of magnitude more experience on the task before they equal human performance" then I tentatively agree. But if it's instead 6 to 9 OOMs, or even just a solid 3 OOMs, I'd say "citation needed!"

No precise claim, I'm afraid! The whole post was written from a place of "OK but what are my independent impressions on this stuff?", and then setting down the things that felt most true in impression space. I guess I meant something like "IDK, seems like they maybe need 0 to 6 OOMs more", but I just don't think my impressions should be taken as strong evidence on this point. 

The general point about the economic viability of automating specialized labour is about more than just data efficiency; there are other ~fixed costs for automating industries, which mean that small specialized industries will be automated later.

(It's maybe worth commenting that the scenarios I describe here are mostly not like "current architecture just scales all the way to human-level and beyond with more compute". If they actually do scale then maybe superhuman generalization happens significantly earlier in the process.)

Comment by owencb on Report from a civilizational observer on Earth · 2022-07-09T18:51:50.856Z · LW · GW

It's a lightly fictionalized account of my independent impressions of AI trajectories.

Comment by owencb on Don't Over-Optimize Things · 2022-06-16T23:01:06.534Z · LW · GW

Interesting, I think there's some kind of analogy (or maybe generalization) here, but I don't fully see it.

I at least don't think it's a direct reinvention because slack (as I understand it) is a thing that agents have, rather than something which determines what's good or bad about a particular decision.

(I do think I'm open to legit accusations of reinvention, but it's more like reinventing alignment issues.)

Comment by owencb on AMA on Truthful AI: Owen Cotton-Barratt, Owain Evans & co-authors · 2021-10-27T11:34:31.182Z · LW · GW

I'm relatively a fan of their approach (although I haven't spent an enormous amount of time thinking about it). I like starting with problems which are concrete enough to really go at but which are microcosms for things we might eventually want.

I actually kind of think of truthfulness as sitting somewhere on the spectrum between the problem Redwood are working on right now and alignment. Many of the reasons I like truthfulness as medium-term problem to work on are similar to the reasons I like Redwood's current work.

Comment by owencb on AMA on Truthful AI: Owen Cotton-Barratt, Owain Evans & co-authors · 2021-10-27T11:30:57.757Z · LW · GW

I think it would be an easier challenge to align 100 small ones (since solutions would quite possibly transfer across).

I think it would be a bigger victory to align the one big one.

I'm not sure from the wording of your question whether I'm supposed to assume success.

Comment by owencb on AMA on Truthful AI: Owen Cotton-Barratt, Owain Evans & co-authors · 2021-10-27T11:23:19.415Z · LW · GW

To add to what Owain said:

  • I think you're pointing to a real and harmful possible dynamic
  • However I'm generally a bit sceptical of arguments of the form "we shouldn't try to fix problem X because then people will get complacent"
    • I think that the burden of proof lies squarely with the "don't fix problem X" side, and that usually it's good to fix the problem and then also give attention to the secondary problem that's come up
  • I note that I don't think of politicians and CEOs to be the primary audience of our paper
    • Rather I think in the next several years such people will naturally start having more of their attention drawn to AI falsehoods (as these become a real-world issue), and start looking for what to do about it
    • I think that at that point it would be good if the people they turn to are better informed about the possible dynamics and tradeoffs. I would like these people to have read work which builds upon what's in our paper. It's these further researchers (across a few fields) that I regard as the primary audience for our paper.

Comment by owencb on AMA on Truthful AI: Owen Cotton-Barratt, Owain Evans & co-authors · 2021-10-27T09:58:31.947Z · LW · GW

I don't think I'm yet at "here's regulation that I'd just like to see", but I think it's really valuable to try to have discussions about what kind of regulation would be good or bad. At some point there will likely be regulation in this space, and it would be great if that was based on as deep an understanding as possible about possible regulatory levers, and their direct and indirect effects, and ultimate desirability.

I do think it's pretty plausible that regulation about AI and truthfulness could end up being quite positive. But I don't know enough to identify in exactly what circumstances it should apply, and I think we need a bit more groundwork on building and recognising truthful AI systems first. I guess quite a bit of our paper is trying to open the conversation on that.

Comment by owencb on "Good judgement" and its components · 2020-08-25T10:40:06.597Z · LW · GW

I think there's also a capability component, distinct from "understanding/modeling the world", about self-alignment or self-control - the ability to speak or act in accordance with good judgement, even when that conflicts with short-term drives.

In my ontology I guess this is about the heuristics which are actually invoked to decide what to do given a clash between abstract understanding of what would be good and short-term drives (i.e. it's part of meta-level judgement). But I agree that there's something helpful about having terminology to point to that part in particular. Maybe we could say that self-alignment and self-control are strategies for acting according to one's better judgement?

Comment by owencb on "Good judgement" and its components · 2020-08-25T10:32:16.660Z · LW · GW
Do you consider "good decision-making" and "good judgement" to be identical? I think there's a value alignment component to good judgement that's not as strongly implied by good decision-making.

I agree that there's a useful distinction to be made here. I don't think of it as fitting into "judgement" vs "decision-making" (and would regard those as pretty much the same), but rather about how "good" is interpreted/assessed. I was mostly using good to mean something like "globally good" (i.e. with something like your value alignment component), but there's a version of "prudentially good judgement/decision-making" which would exclude this.

I'm open to suggestions for terminology to capture this!

Comment by owencb on Acausal trade: double decrease · 2017-05-16T20:48:58.000Z · LW · GW

I think the double decrease effect kicks in with uncertainty, but not with confident expectation of a smaller network.