LessWrong 2.0 Reader
“[optimization process] did kind of shockingly well aligning humans to [a random goal that the optimization process wasn’t aiming for (and that’s not reproducible with a higher bandwidth optimization such as gradient descent over a neural network’s parameters)]”
Nope: if your optimization process is able to crystallize some goals into an agent, that's not a surprising success unless you picked those goals. If an agent starts to want paperclips in a coherent way, and then every training step makes it even better at wanting and pursuing paperclips, your training process isn't "surprisingly successful" at aligning the agent with making paperclips.
“This makes me way less confident about the standard ‘evolution failed at alignment’ story.”
If people become more optimistic because they see some goals in an agent and conclude that the optimization process successfully optimized for them, without any evidence that the process was actually targeting the goals they observe, they're just clearly doing something wrong.
Evolutionary physiology is a thing! It is simply invalid to say “[a physiological property of humans that is the result of evolution] existing in humans now is a surprising success of evolution at aligning humans”.
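To make the "crystallized proxy goal" dynamic above concrete, here is a minimal toy sketch (an editor's illustration, not anyone's actual model; every name and number is hypothetical): selection sees only behavior, never goals, yet each selection step makes the surviving agents better at wanting and pursuing whatever proxy they already have.

import random

random.seed(0)

def behavior_score(goal_strength, competence):
    # The outer optimizer scores behavior only; it never inspects the goal.
    # If the proxy goal correlates with the outer objective in this
    # environment, pursuing the proxy harder simply scores better.
    return goal_strength * competence

# Population of (proxy-goal strength, competence) pairs.
pop = [(random.random(), random.random()) for _ in range(100)]

for step in range(200):
    # Select on behavior...
    pop.sort(key=lambda a: behavior_score(*a), reverse=True)
    survivors = pop[:50]
    # ...and mutate. The goal itself is never directly chosen by the process.
    pop = survivors + [
        (g + random.gauss(0, 0.05), c + random.gauss(0, 0.05))
        for g, c in survivors
    ]

best = max(pop, key=lambda a: behavior_score(*a))
print("proxy-goal strength: %.2f, competence: %.2f" % best)
# Both climb together: no "surprising success" at aligning the agent with
# the proxy, just selection amplifying a goal it never targeted.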
chris_leong on Express interest in an "FHI of the West"
“Reading your list, a bunch of it seems to be about decisions about what to work on or what locally to pursue.”
I think my list appears more this way than I intended, because I gave some examples of projects I would be excited by if they happened. I wasn't intending to stake out a strong position as to whether these should be projects chosen by the institute or simply examples of projects that it might be reasonable for a researcher to choose.
kaj_sotala on Evolution did a surprising good job at aligning humans...to social status
Agree. This connects to why I think that the standard argument for evolutionary misalignment is wrong [LW · GW]: it's meaningless to say that evolution has failed to align humans with inclusive fitness, because fitness is not any one constant thing. Rather, what evolution can do is to align humans with drives that in specific circumstances promote fitness. And if we look at how well the drives we've actually been given generalize, we find that they have largely continued to generalize quite well [LW(p) · GW(p)], implying that while there's likely to still be a left turn [? · GW], it may very well be much milder than is commonly implied.
quetzal_rainbow on When is a mind me?
I always thought that in naive MWI what matters is not whether something happens in an absolute sense, but whether the Born measure is concentrated on branches that contain good things instead of bad things.
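One way to make the Born-measure point precise (an editor's gloss in a standard expected-utility reading; the symbols below are not from the original comment): with branches b, policy-dependent amplitudes \alpha_b(\pi), and a utility U over branch contents,

\[
V(\pi) \;=\; \sum_{b} \lvert \alpha_b(\pi) \rvert^{2}\, U(b),
\]

so what your choices move is the Born weight on good branches, not whether a branch with any given content exists somewhere.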
will_pearson on Express interest in an "FHI of the West"
As well as thinking about the need for the place in terms of providing a space for research, it is probably worth thinking about it in terms of what it provides the world. What subjects are currently under-represented in the world and need strong representation to guide us to a positive future? That will guide who you want to lead the organisation.
jas-ho on Creating unrestricted AI Agents with Command R+
Glad you're doing this. By default, it seems we're going to end up with very strong tool-use models where any potential safety measures are easily removed by jailbreaks or fine-tuning. I understand you as working on: how are we going to know that it happened? Is that a fair characterization?
Another important question: what should the response be to the appearance of such a model? Any thoughts?
I've now updated the event information to include summaries/abstracts for the projects/talks. Some of these are still under construction.
owencb on Express interest in an "FHI of the West"
I agree in the abstract with the idea of looking for niches, and I think that several of these ideas have something to them. Nevertheless, when I read the list of suggestions my overall feeling is that it's going in a slightly wrong direction, or missing the point, or something. I thought I'd have a go at articulating why, although I don't think I've got this to the point where I'd firmly stand behind it:
It seems to me like some of the central FHI virtues were:
Reading your list, a bunch of it seems to be about decisions about what to work on or what locally to pursue. My feeling is that those are the types of questions which are largely best left open to future researchers to figure out, and that the appropriate focus right now is more like trying to work out how to create the environment which can lead to some of this stuff.
Overall, the take in the previous paragraph is slightly too strong. I think it is in fact good to think through these things to get a feeling for possible future directions. And I also think that some of the good paths towards building a group like this start out by picking a topic or two to convene people on and get them thinking about. But if places want to pick up the torch, I think it's really important to attend to the ways in which it was special that aren't necessarily well-represented in the current x-risk ecosystem.
vanessa-kosoy on When is a mind me?
Not sure what you mean by "this would require a pretty small universe".
If we live in naive MWI, an IBP agent would not care, and for good reason: naive MWI is a "library of babel" where essentially every conceivable thing happens no matter what you do.
Also not sure what you mean by "some sort of sampling". AFAICT, quantum IBP is the closest thing to a coherent answer that we have, by a significant margin.
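A compact way to state the "library of babel" problem (an editor's formalization, assuming the naive-MWI picture above; notation as in the earlier sketch): for every event E and every policy \pi there is some branch with nonzero amplitude in which E occurs,

\[
\forall E\ \forall \pi\ \exists b:\ \alpha_b(\pi) \neq 0 \ \text{and}\ b \models E,
\]

so the bare question "does E happen?" has the same answer under every policy; only the Born measure \mu_\pi(E) = \sum_{b \,\models\, E} \lvert \alpha_b(\pi) \rvert^{2} can distinguish one policy from another.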
cousin_it on I'm open for projects (sort of)
Done! I didn't do it at first because I thought it'd have to be in person only, but then clicked around in the form and found that remote is also possible.