Is anyone working on formally verified AI toolchains?
post by metachirality · 2024-03-12T19:36:38.780Z · LW · GW · 4 comments
This is a question post.
Not talking about solving alignment, but preventing stuff like "we solved alignment but we died anyways because of a race condition."
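For concreteness, here's a minimal sketch of the kind of bug I mean - a classic lost-update race, in hypothetical Python that has nothing to do with any real AI codebase:

```python
import threading

# Two workers increment a shared counter with no synchronization.
# Each read-modify-write is not atomic, so increments can be lost.
counter = 0

def worker(n_iters: int) -> None:
    global counter
    for _ in range(n_iters):
        current = counter      # read
        counter = current + 1  # write - another thread may have run in between

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Expected 200000; the actual result is usually lower, and varies run to run.
print(counter)
```

The point isn't this toy counter - it's that a system can be correct in its intended logic and still fail nondeterministically in how that logic gets executed.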
Answers
answer by Zac Hatfield-Dodds
Safety properties aren't the kind of properties you can prove; they're statements about the world, not about mathematical objects. I very strongly encourage anyone reading this comment to go read Leveson's Engineering a Safer World (free pdf from author) through to the end of chapter three - it's the best introduction to systems safety that I know of and a standard reference for anyone working with life-critical systems. how.complexsystems.fail is the short-and-quotable catechism.
I'm not really sure what you mean by "AI toolchain", nor what threat model would have a race condition present an existential risk. More generally, formal verification is a research topic - there are some neat demonstration systems, and they're used in certain niches with relatively small amounts of code and compute, simple hardware, and a tolerance for long development times. None of those conditions hold for AI systems, or even for libraries such as PyTorch.
For flavor, here are some of the most exciting developments in formal methods: I expect the Lean FRO to improve usability, and 'autoformalization' tricks like Proofster (pdf) might also help - but it's still niche, and "proven correct" software can still have bugs from under-specified components, incorrect axioms, or outright hardware issues (e.g. Spectre, Rowhammer, cosmic rays, etc.). The seL4 microkernel is great, but you still have to supply an operating system and application layer on top of it, and then ensure that the composition is still safe. To test an entire application stack, I'd instead turn to Antithesis, which is amazing so long as you can run everything in an x86 hypervisor (with no GPUs).
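To make the "incorrect axioms" point concrete, here's a toy sketch (hypothetical Lean 4, not drawn from any real verified system): the proof goes through, but only relative to an assumed memory model - exactly the kind of assumption that Rowhammer-style faults violate.

```lean
-- Toy sketch: an abstract memory, axiomatized so that reads return the
-- last value written to an address.
axiom Mem : Type
axiom read  : Mem → Nat → Nat
axiom write : Mem → Nat → Nat → Mem

-- The modelling assumption that real hardware (Rowhammer, cosmic rays)
-- can silently violate.
axiom read_write : ∀ (m : Mem) (addr v : Nat), read (write m addr v) addr = v

-- "Proven correct" - but only relative to the axioms above, not to the
-- hardware the code eventually runs on.
theorem store_then_load (m : Mem) (addr v : Nat) :
    read (write m addr v) addr = v :=
  read_write m addr v
```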
(as always, opinions my own)
4 comments
Comments sorted by top scores.
comment by Algon · 2024-03-12T21:12:02.347Z · LW(p) · GW(p)
I don't know the answer to this, but I strong-upvoted it because I think this question, and variants like "is anyone working on ensuring AI labs don't sign-flip parts of the reward function" and equally silly things, are important.
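(For instance, the sign-flip failure is trivially easy to write and hard for a type checker to notice - a hypothetical sketch, not any lab's actual training code:)

```python
# Hypothetical sketch: the optimizer *minimizes* loss, so the sign carries
# the whole meaning. Flip it and nothing crashes - the run just quietly
# optimizes for the opposite of what you wanted.
def policy_loss(reward: float) -> float:
    return reward   # bug: should be -reward when the optimizer minimizes loss
```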
Replies from: d4hines
↑ comment by d4hines · 2024-06-21T01:09:22.017Z · LW(p) · GW(p)
Alas, formal methods can't really help with that part. If you have the correct spec, formal methods can give you about as much certainty as we know how to get that your program implements the spec, without failing in undefined ways on weird edge cases. But even experienced, motivated formal-methods practitioners sometimes get the spec wrong. I suspect "getting the sign of the reward function right" is part of the spec, where theorem provers don't provide much leverage beyond what a marker and whiteboard (or a program and unit tests) give you.
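For a concrete toy example (a sketch in hypothetical Lean 4, not code from any real project): you can "verify" an obviously broken sort if the spec only asks for sortedness and forgets to require that the output is a permutation of the input.

```lean
/-- A deliberately broken "sort": it throws the input away. -/
def badSort (_xs : List Nat) : List Nat := []

/-- Is the list sorted in non-decreasing order? -/
def isSorted : List Nat → Bool
  | [] => true
  | [_] => true
  | a :: b :: rest => decide (a ≤ b) && isSorted (b :: rest)

/-- This is provable - but only because the spec never asks for the output
    to be a permutation of the input. The prover did its job; the spec is
    the bug. -/
theorem badSort_sorted (xs : List Nat) : isSorted (badSort xs) = true := rfl
```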
Replies from: Algon