LessWrong 2.0 Reader
“[optimization process] did kind of shockingly well aligning humans to [a random goal that the optimization process wasn’t aiming for (and that’s not reproducible with a higher bandwidth optimization such as gradient descent over a neural network’s parameters)]”
Nope: if your optimization process is able to crystallize some goals into an agent, that's not a surprising success unless you picked those goals. If an agent starts to want paperclips in a coherent way, and then every training step makes it even better at wanting and pursuing paperclips, your training process isn't "surprisingly successful" at aligning the agent with making paperclips.
“This makes me way less confident about the standard ‘evolution failed at alignment’ story.”
If people become more optimistic because they see some goals in an agent and conclude that the optimization process successfully optimized for them, without any evidence that the process was actually targeting the goals they observe, they're just clearly doing something wrong.
Evolutionary physiology is a thing! It is simply invalid to say “[a physiological property of humans that is the result of evolution] existing in humans now is a surprising success of evolution at aligning humans”.
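To make the "crystallized proxy goal" dynamic above concrete, here is a minimal toy sketch (an editor's illustration, not anyone's actual model; every name and number is hypothetical): selection sees only behavior, never goals, yet each selection step makes the surviving agents better at wanting and pursuing whatever proxy they already have.

import random

random.seed(0)

def behavior_score(goal_strength, competence):
    # The outer optimizer scores behavior only; it never inspects the goal.
    # If the proxy goal correlates with the outer objective in this
    # environment, pursuing the proxy harder simply scores better.
    return goal_strength * competence

# Population of (proxy-goal strength, competence) pairs.
pop = [(random.random(), random.random()) for _ in range(100)]

for step in range(200):
    # Select on behavior...
    pop.sort(key=lambda a: behavior_score(*a), reverse=True)
    survivors = pop[:50]
    # ...and mutate. The goal itself is never directly chosen by the process.
    pop = survivors + [
        (g + random.gauss(0, 0.05), c + random.gauss(0, 0.05))
        for g, c in survivors
    ]

best = max(pop, key=lambda a: behavior_score(*a))
print("proxy-goal strength: %.2f, competence: %.2f" % best)
# Both climb together: no "surprising success" at aligning the agent with
# the proxy, just selection amplifying a goal it never targeted.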
chris_leong on Express interest in an "FHI of the West"
“Reading your list, a bunch of it seems to be about decisions about what to work on or what locally to pursue.”
I think my list appears more this way than I intended, because I gave some examples of projects I would be excited by if they happened. I wasn't intending to stake out a strong position as to whether these should be projects chosen by the institute or simply examples of projects that it might be reasonable for a researcher to choose.
kaj_sotala on Evolution did a surprising good job at aligning humans...to social status
Agree. This connects to why I think that the standard argument for evolutionary misalignment is wrong [LW · GW]: it's meaningless to say that evolution has failed to align humans with inclusive fitness, because fitness is not any one constant thing. Rather, what evolution can do is to align humans with drives that in specific circumstances promote fitness. And if we look at how well the drives we've actually been given generalize, we find that they have largely continued to generalize quite well [LW(p) · GW(p)], implying that while there's likely to still be a left turn [? · GW], it may very well be much milder than is commonly implied.
quetzal_rainbow on When is a mind me?
I always thought that in naive MWI what matters is not whether something happens in an absolute sense, but whether the Born measure is concentrated on branches that contain good things instead of bad things.
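One way to make the Born-measure point precise (an editor's gloss in a standard expected-utility reading; the symbols below are not from the original comment): with branches b, policy-dependent amplitudes \alpha_b(\pi), and a utility U over branch contents,

\[
V(\pi) \;=\; \sum_{b} \lvert \alpha_b(\pi) \rvert^{2}\, U(b),
\]

so what your choices move is the Born weight on good branches, not whether a branch with any given content exists somewhere.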
will_pearson on Express interest in an "FHI of the West"
As well as thinking about the need for the place in terms of providing a space for research, it is probably worth thinking about it in terms of what it provides the world. What subjects are currently under-represented in the world and need strong representation to guide us to a positive future? That will guide who you want to lead the organisation.
jas-ho on Creating unrestricted AI Agents with Command R+
Glad you're doing this. By default, it seems we're going to end up with very strong tool-use models where any potential safety measures are easily removed by jailbreaks or fine-tuning. I understand you as working on: how are we going to know that it happened? Is that a fair characterization?
Another important question: what should the response be to the appearance of such a model? Any thoughts?
I've now updated the event information to include summaries/abstracts for the projects/talks. Some of these are still under construction.
owencb on Express interest in an "FHI of the West"
I agree in the abstract with the idea of looking for niches, and I think that several of these ideas have something to them. Nevertheless, when I read the list of suggestions my overall feeling is that it's going in a slightly wrong direction, or missing the point, or something. I thought I'd have a go at articulating why, although I don't think I've got this to the point where I'd firmly stand behind it:
It seems to me like some of the central FHI virtues were:
Reading your list, a bunch of it seems to be about decisions about what to work on or what locally to pursue. My feeling is that those are the types of questions which are largely best left open to future researchers to figure out, and that the appropriate focus right now is more like trying to work out how to create the environment which can lead to some of this stuff.
Overall, the take in the previous paragraph is slightly too strong. I think it is in fact good to think through these things to get a feeling for possible future directions. And I also think that some of the good paths towards building a group like this start out by picking a topic or two to convene people on and get them thinking about. But if places want to pick up the torch, I think it's really important to attend to the ways in which it was special that aren't necessarily well-represented in the current x-risk ecosystem.
vanessa-kosoy on When is a mind me?
Not sure what you mean by "this would require a pretty small universe".
If we live in naive MWI, an IBP agent would not care, and for good reason: naive MWI is a "library of babel" where essentially every conceivable thing happens no matter what you do.
Also not sure what you mean by "some sort of sampling". AFAICT, quantum IBP is the closest thing to a coherent answer that we have, by a significant margin.
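A compact way to state the "library of babel" problem (an editor's formalization, assuming the naive-MWI picture above; notation as in the earlier sketch): for every event E and every policy \pi there is some branch with nonzero amplitude in which E occurs,

\[
\forall E\ \forall \pi\ \exists b:\ \alpha_b(\pi) \neq 0 \ \text{and}\ b \models E,
\]

so the bare question "does E happen?" has the same answer under every policy; only the Born measure \mu_\pi(E) = \sum_{b \,\models\, E} \lvert \alpha_b(\pi) \rvert^{2} can distinguish one policy from another.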
cousin_it on I'm open for projects (sort of)
Done! I didn't do it at first because I thought it'd have to be in person only, but then clicked around in the form and found that remote is also possible.