Posts

Call for evaluators: Participate in the European AI Office workshop on general-purpose AI models and systemic risks 2024-11-27T02:54:16.263Z
Workshop Report: Why current benchmark approaches are not sufficient for safety 2024-11-26T17:20:47.453Z

Comments

Comment by Tom DAVID (tom-david) on A list of core AI safety problems and how I hope to solve them · 2023-08-26T16:34:19.539Z · LW · GW

"Instead of building in a shutdown button, build in a shutdown timer."

-> Isn't that a form of corrigibility with an added constraint? I'm not sure what would prevent the AI from convincing humans that respecting the timer is a bad idea, for example. Is it because we'll formally verify that we avoid instances of deception? It's not clear to me, but maybe I've misunderstood.