LessWrong 2.0 Reader
In this context, see https://blog.givewell.org/2011/08/18/why-we-cant-take-expected-value-estimates-literally-even-when-theyre-unbiased/ .
sharmake-farah on Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development
I pretty much agree that you can end up in arbitrary places with extrapolated values, and I don't think morality is convergent. But I also don't think this matters for the purpose of existential risk: assuming something like instruction following works, the extrapolation problem can be solved by ordering AIs not to extrapolate values into cases where they get tortured/killed in an ethical scenario. More generally, I don't expect value extrapolation to matter for the purpose of making an AI safe to use.
The real impact is on CEV-style alignment plans/plans for what to do with a future AI, which are really bad plans to carry out given a lot of people's current values, and thus I really don't want CEV to be the basis of alignment.
Thankfully, it's unlikely to ever be this, but it still matters somewhat, especially since Anthropic is targeting value alignment (though thankfully there are implicit constraints/grounding based on the values chosen).
charlie-sanders on Sea Change
Thank you! I think there's a lot of value to be explored in increasing people's awareness of AI progress via fiction.
martin-randall on Martin Randall's Shortform
Makes sense. Short timelines mean faster societal changes and so less stability. But I could see factoring societal instability risk into time-based risk and tech-based risk. If so, short timelines are net positive for the question "I'm going to die tomorrow, should I get frozen?".
gwern on Implications of the inference scaling paradigm for AI safety
"Overtraining" isn't Chinchilla; Chinchilla is just "training". The overtraining being advocated was supra-Chinchilla, with the logic that while you were going off compute-optimal training, sure, you were more than making up for it with compute savings in the deployment phase, which the Chinchilla scaling laws do not address in any way. So there was a fad for training small models for a lot longer.
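A rough back-of-the-envelope sketch of the training-vs-deployment tradeoff described here. The approximations (training compute ≈ 6·N·D, inference ≈ 2·N FLOPs per generated token, Chinchilla-optimal ≈ 20 training tokens per parameter) are common rules of thumb; the budget and deployment-volume numbers are invented for illustration and are not taken from the comment or the Chinchilla paper.

```python
# Sketch, not the Chinchilla paper's own analysis. Assumed approximations:
#   - training compute  ~ 6 * N * D FLOPs
#   - inference compute ~ 2 * N FLOPs per generated token
#   - Chinchilla-optimal ratio ~ 20 training tokens per parameter
# TRAIN_BUDGET and TOKENS_SERVED below are made up for illustration.

def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training cost in FLOPs."""
    return 6.0 * n_params * n_tokens

def inference_flops(n_params: float, n_tokens_served: float) -> float:
    """Approximate deployment cost in FLOPs for serving n_tokens_served tokens."""
    return 2.0 * n_params * n_tokens_served

TRAIN_BUDGET = 1e23    # fixed training compute budget (FLOPs), illustrative
TOKENS_SERVED = 1e13   # lifetime tokens served in deployment, illustrative

# Chinchilla-optimal split of the budget: D ~ 20 * N, so C ~ 6 * N * 20N = 120 * N^2
n_opt = (TRAIN_BUDGET / 120.0) ** 0.5
d_opt = 20.0 * n_opt

# "Overtrained" alternative: a model 4x smaller trained on 4x more tokens
# (same training compute, worse loss per Chinchilla, but cheaper to serve).
n_small = n_opt / 4.0
d_small = d_opt * 4.0

for name, n, d in [("chinchilla-optimal", n_opt, d_opt),
                   ("overtrained-small", n_small, d_small)]:
    total = train_flops(n, d) + inference_flops(n, TOKENS_SERVED)
    print(f"{name:20s} params={n:.2e} tokens={d:.2e} train+serve FLOPs={total:.2e}")
```

With a large enough deployment volume, the smaller overtrained model comes out ahead on total FLOPs even though its final loss is worse than the compute-optimal model's, which is the logic the comment attributes to the overtraining fad.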
martin-randall on Martin Randall's Shortform
Check the comments Yudkowsky is responding to on Twitter:
Ok, I hear you, but I really want to live forever. And the way I see it is: Chances of AGI not killing us and helping us cure aging and disease: small. Chances of us curing aging and disease without AGI within our lifetime: even smaller.
And:
For every day AGI is delayed, there occurs an immense amount of pain and death that could have been prevented by AGI abundance. Anyone who unnecessarily delays AI progress has an enormous amount of blood on their hands.
Cryonics can have a symbolism of "I really want to live forever" or "every death is blood on our hands" that is very compatible with racing to AGI.
(I agree with all your disclaimers about symbolic action)
annapurna on Annapurna's Shortform
I am aware of that, and as a Canadian, this concerns me.
hastings-greer on Daniel Kokotajlo's Shortform
I have a hypothesis: someone (probably OpenAI) got reinforcement learning to actually start putting new capabilities into the model with their Strawberry project. Up to this point, it had just been eliciting. But getting a new capability this way is horrifically expensive: roughly, it takes hundreds of rollouts to set one weight, whereas language modelling loss sets a weight every few tokens. The catch is, as soon as any model that is reinforcement-learned acts in the world basically at all, all the language models can clone the reinforcement-learned capability by training on anything causally downstream of the lead model's actions (and then eliciting). A capability that took a thousand rollouts to learn leaks as soon as the model takes hundreds of tokens' worth of action (a rough bits-per-update sketch follows below).
This hypothesis predicts that the R1 training algorithm won't work to boost AIME scores on any model trained with an enforced 2023 data cutoff (specifically, on any model with no 4o synthetically generated tokens; I think 4o is causally downstream of the Strawberry breakthrough).
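A minimal arithmetic sketch of the cost asymmetry in this hypothesis, using the comment's rough figures. The bits-per-rollout and bits-per-token constants are assumptions chosen only to show orders of magnitude, not measurements.

```python
# Minimal arithmetic sketch of the cost asymmetry in the hypothesis above.
# All constants are rough assumptions for illustration, not measurements:
#   - an RL rollout with a binary reward yields at most ~1 bit of signal
#   - next-token prediction yields on the order of a few bits per token
BITS_PER_RL_ROLLOUT = 1.0   # assumed upper bound for a binary-reward rollout
BITS_PER_LM_TOKEN = 3.0     # assumed; order-of-magnitude only

rollouts_to_learn_capability = 1_000   # the "thousand rollouts" in the comment
bits_in_capability = rollouts_to_learn_capability * BITS_PER_RL_ROLLOUT

# If the RL-trained model then emits text that encodes the capability, an
# imitating model can recover those bits at language-modelling rates:
tokens_to_leak = bits_in_capability / BITS_PER_LM_TOKEN
print(f"~{bits_in_capability:.0f} bits of optimization, "
      f"recoverable from roughly {tokens_to_leak:.0f} tokens of downstream text")
```

Under these made-up constants, a capability worth about a thousand binary-reward rollouts corresponds to roughly a thousand bits, which an imitating model could in principle absorb from a few hundred tokens of the teacher's output, matching the "leaks in hundreds of tokens" intuition.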
cole-wyeth on Implications of the inference scaling paradigm for AI safety
What does the Chinchilla scaling laws paper (overtraining small models) have to do with distilling larger models? It's about optimizing the performance of your best model, not inference costs. The compute-optimal small model would presumably be a better thing to distill, since the final quality is higher.
mmontag on Make an Extraordinary Effort
See also: https://www.lesswrong.com/posts/bx3gkHJehRCYZAF3r/pain-is-not-the-unit-of-effort