Requesting feedback/advice: what Type Theory to study for AI safety?

post by rvnnt · 2020-06-23T17:03:32.452Z · LW · GW · 2 comments

This is a question post.

Contents

  TL;DR:
  Motivation, Goal, Background
    Sub-goals, skills
  Questions
    Update
  Answers
    5 aweinstock
2 comments

TL;DR:

Requesting feedback or advice on what math to study (or what projects to engage in) in order to learn how to prove things about programs, and how to assess the difficulty (or impossibility) of proving things about programs. Ulterior motive: AI safety (and also progress towards a university degree).

Motivation, Goal, Background

To my current understanding, any endeavor to figure out how to program[1] safe AI would probably eventually[2] benefit a lot from an ability to prove various properties of programs:
The AI will have to self-modify, or create successor agents, i.e. (re)write (parts of) new AI-software; and we'd probably want the AI to be able to prove[3] that said AI-software satisfies various "safety properties" (which I would not yet even know how to specify).

I'd like to get closer to being able to design the above kind of AI, and to better understand (the limits of) proving things about programs. To my current tentative understanding, the knowledge I'm looking for probably lies mostly under the umbrella of type theory.
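To make "proving things about programs" concrete, here's a toy example of the kind of artifact I have in mind, written in Coq (chosen because it's the tool the resources below use; the function and the property are arbitrary, and just illustrate the shape of a machine-checked proof about a program, as I currently understand it):

```coq
(* A program... *)
Fixpoint double (n : nat) : nat :=
  match n with
  | O => O
  | S n' => S (S (double n'))
  end.

(* ...and a machine-checked theorem about its behavior. *)
Theorem double_even : forall n : nat, Nat.even (double n) = true.
Proof.
  induction n as [| n' IH].
  - reflexivity.       (* base case: double 0 = 0, which is even *)
  - simpl. exact IH.   (* double (S n') = S (S (double n')), so evenness carries over *)
Qed.
```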

(Also, I'm currently studying for a university degree and have the opportunity to choose a project for myself; I think learning about type theory, lambda calculus, and/or automated theorem proving might be the most useful way to use that opportunity, since I don't see a way to pass deconfusion work off as a university project.)

Sub-goals, skills

To clarify the top-level goal here (viz. being able to design AIs that are able to prove stuff about AI-programs), I've listed below some (semi-)concrete examples of skills/abilities which I imagine would constitute progress towards said goal:

Questions

Problem: There appears to be a very wide variety of theory[4] that might (or might not) be useful for the above goals. I lack the knowledge to discern which theory would be useful. And so, to avoid wasting a lot of time studying non-useful things, my questions are:

Update

I tried to read MIRI’s paper on Vingean reflection, and skimmed "Proof-Producing Reflection for HOL", but didn’t understand much. Notable knowledge I seem to be missing includes things like

Now I’m wondering: How useful/necessary would it be to study those subjects? Would it be possible to approach the AI-proves-things-about-AI-software problem via type theory, without going much into the above? (Although I’m guessing that some kinds of incompleteness results à la Gödel would come up in type theory too.)
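(For reference, my current, possibly wrong, summary of the self-reference obstacle behind Vingean reflection is Löb's theorem: for a sufficiently strong consistent theory $T$ and any sentence $\varphi$,

$$T \vdash \Box_T(\Box_T \varphi \to \varphi) \to \Box_T \varphi,$$

i.e. $T$ can prove "if I prove $\varphi$ then $\varphi$ is true" only for sentences it already proves. So an agent that trusts a successor's proofs outright seems to run into exactly this.)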


  1. To program safe seed AI "by hand", as opposed to outsourcing the programming (/weight-finding) to e.g. gradient descent. ↩︎

  2. I'm guessing there's probably a large amount of deconfusion work [LW · GW] that would need to be done before program-verification-stuff becomes applicable to AI safety. ↩︎

  3. My (ignorant, fuzzy-intuition-based) guess is that in practice, to get over Vingean reflection / corrigibility problems, we'll need to use some means other than trying to design an AI that can formally prove complicated properties of AI-software it writes for itself. Like figuring out how to design an AI that can derive probabilistic proofs about its mental software. (Something something Logical Induction?) This raises the question: Is studying type theory -- or other approaches to agent-formally-proves-things-about-programs -- a waste of time? ↩︎ ↩︎

  4. For example: ↩︎

Answers

answer by aweinstock · 2020-06-23T19:39:37.475Z · LW(p) · GW(p)

"Types and Programming Languages" by Benjamin Pierce is a good textbook for learning about typesystems in general (both theory and implementation). While it doesn't cover dependent types in detail (it only goes as far as F_Omega in that direction), it provides a solid background for further reading.

"Software Foundations" (available at https://softwarefoundations.cis.upenn.edu/) is about proving programs correct in Coq (with embeddings of various Hoare logics). VST (https://vst.cs.princeton.edu/) shows that the methodology in Software Foundation extends to proving properties about actual C programs.

MIRI's papers "Proof-Producing Reflection for HOL" (https://intelligence.org/files/ProofProducingReflection.pdf) and "Definability of Truth in Probabilistic Logic" (https://intelligence.org/files/DefinabilityTruthDraft.pdf) are partial positive results on side-stepping some problems of self-reference in formal logic (the former via stratification of the logic, the latter via generalizing truth values to probabilities).

comment by rvnnt · 2020-06-24T09:40:46.261Z · LW(p) · GW(p)

Thanks for the suggestions!

Software Foundations in particular looks highly apposite, and in a useful format, to boot.

TaPL was indeed one of the books I was considering reading. (Hesitating between that and Lambda Calculus and Combinators: An Introduction.) Gave it another point in my mental ranking.

Based on the little I understood of MIRI’s papers (the first of which also prompted me to read their paper on Vingean Reflection), they seem interesting, but currently inaccessible to me. Added an edit/update to my post.

2 comments

Comments sorted by top scores.

comment by Rudi C (rudi-c) · 2020-06-24T08:12:56.363Z · LW(p) · GW(p)

Disclaimer: I am no expert or even amateur on this topic.

The Little Prover and The Little Typer might also be of interest. I played some with https://softwarefoundations.cis.upenn.edu/, and I liked it. It is an interactive textbook. I recommend using it with Emacs (Spacemacs has a Coq layer).

comment by Thomas Kwa (thomas-kwa) · 2020-06-26T16:04:14.164Z · LW(p) · GW(p)

I'm on a very similar path; DM me if you want my thoughts.