The problem/solution matrix: Calculating the probability of AI safety "on the back of an envelope"
post by John_Maxwell (John_Maxwell_IV) · 2019-10-20T08:03:23.934Z
If It’s Worth Doing, It’s Worth Doing With Made-Up Statistics
My earlier post, Three Stories for How AGI Comes Before FAI, was an "AI research strategy" post. "AI research strategy" is not the same as "AI strategy research". "AI research strategy" relates to the Hamming question for technical AI alignment researchers: What are the most important questions in technical AI alignment, and why aren't you trying to answer them?
Here's a different way of looking at these questions (which hopefully has a different set of flaws).
Suppose we had a matrix where each column corresponds to an AI safety problem and each row corresponds to a proposed safety measure. Each cell contains an estimate of the probability of that safety measure successfully addressing that problem. If we assume measure successes are independent, we could estimate the probability that any given problem gets solved if we build an AGI which applies all known safety measures. If we assume problem successes are independent, we could estimate the probability that all known problems will be solved.
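To make the arithmetic concrete, here is a minimal sketch of that calculation under the two independence assumptions above. The function names and the numbers in `example_matrix` are made up for illustration; they are not estimates of anything.

```python
from math import prod

def problem_solved_probs(matrix):
    """matrix[i][j] is the estimated probability that safety measure i
    (row) successfully addresses safety problem j (column).

    Assuming measure successes are independent, a problem stays unsolved
    only if every measure fails on it.
    """
    n_problems = len(matrix[0])
    return [
        1 - prod(1 - row[j] for row in matrix)
        for j in range(n_problems)
    ]

def all_solved_prob(matrix):
    """Assuming problem successes are independent, multiply the
    per-problem probabilities to get the chance that all are solved."""
    return prod(problem_solved_probs(matrix))

# Hypothetical matrix: 2 measures (rows) x 3 problems (columns).
example_matrix = [
    [0.5, 0.0, 0.2],
    [0.5, 0.4, 0.0],
]

per_problem = problem_solved_probs(example_matrix)  # roughly [0.75, 0.4, 0.2]
overall = all_solved_prob(example_matrix)           # roughly 0.06
```

Note how quickly the overall number drops: even with one problem at 75%, the weakest columns dominate the product.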
What about unknown problems? Suppose that God has a list of all AI safety problems. Suppose every time an AI alignment researcher goes out to lunch, there's some probability that they'll think of an AI safety problem chosen uniformly at random from God's list. In that case, if we know the number of AI alignment researchers who go out to lunch on any given day, and we also know the date on which any given AI safety problem was first discovered, then this is essentially a card collection process which lets us estimate the length of God's list as an unknown parameter (say, using maximum likelihood on the intervals between discovery times, or you could construct a prior based on the number of distinct safety problems in other engineering fields). In lay terms, if no one has proposed any new AI safety problems recently, it's plausible there aren't that many problems left.
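Here is one rough way the maximum likelihood version could look, under a simplification that is mine rather than part of the setup above: I count only "draws", meaning occasions on which a researcher thought of some problem (new or already known), so the per-lunch probability drops out. In that simplified model, the likelihood of the list length N depends on the discovery history only through the total number of draws and the number of distinct problems found so far. The names `log_likelihood`, `mle_list_length`, and `max_n` are hypothetical.

```python
from math import log

def log_likelihood(n, distinct, draws):
    """Log-likelihood of list length n (up to a constant not depending on n).

    With k problems already known, a uniform draw is new with probability
    (n - k) / n and a repeat with probability k / n. Multiplying over the
    whole history, the part that depends on n is
        n * (n - 1) * ... * (n - distinct + 1) / n ** draws.
    """
    return sum(log(n - k) for k in range(distinct)) - draws * log(n)

def mle_list_length(distinct, draws, max_n=10_000):
    """Search integer list lengths from `distinct` up to `max_n`.

    The cap is needed because if every draw so far was a new problem,
    the likelihood keeps increasing with n and there is no finite maximum.
    """
    candidates = range(distinct, max_n + 1)
    return max(candidates, key=lambda n: log_likelihood(n, distinct, draws))

# Many draws, few distinct problems: the list is probably exhausted.
saturated = mle_list_length(distinct=3, draws=1000)

# Almost every draw was new: the list is probably much longer than 10.
unsaturated = mle_list_length(distinct=10, draws=12)
```

This matches the lay intuition: lots of rediscovery and few new problems pushes the estimate down toward the number already known, while a history with almost no repeats pushes it up.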
How to estimate the probability that unknown problems will be successfully dealt with? Suppose known problems are representative of the entire distribution of problems. Then we can estimate the probability that an unknown problem will be successfully dealt with as follows: For each known problem, cover up the rows corresponding to the safety measures inspired by that problem, then compute its new probability of success. That gives you a distribution of success probabilities for "unknown" problems. Assume success probabilities for actually-unknown problems are sampled from that distribution. Then you could compute the probability of any given actually-unknown problem being solved by computing the expected value of that distribution. That also lets you compute the probability that your AGI will solve all the unknown problems too (assuming problem probabilities are independent).
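The "cover up the rows" procedure can be sketched as follows. This assumes we have recorded, for each safety measure, which known problem inspired it; the `inspired_by` list, the function names, and all the numbers are hypothetical and only there to show the mechanics.

```python
from math import prod

def held_out_probs(matrix, inspired_by):
    """For each known problem j, ignore the measures inspired by j and
    recompute the probability that j gets solved by the remaining ones.

    matrix[i][j]: probability that measure i addresses problem j.
    inspired_by[i]: index of the problem that inspired measure i,
                    or None if it was not inspired by a specific problem.
    """
    n_problems = len(matrix[0])
    return [
        1 - prod(
            1 - row[j]
            for i, row in enumerate(matrix)
            if inspired_by[i] != j  # "cover up" rows inspired by problem j
        )
        for j in range(n_problems)
    ]

def unknown_problem_success(matrix, inspired_by, n_unknown):
    """Return (probability one unknown problem is solved,
               probability all n_unknown unknown problems are solved),
    treating the held-out probabilities as samples from the distribution
    for unknown problems and assuming independence across problems."""
    probs = held_out_probs(matrix, inspired_by)
    expected = sum(probs) / len(probs)
    return expected, expected ** n_unknown

# Hypothetical example: 2 measures, 2 known problems.
toy_matrix = [
    [0.5, 0.5],  # measure 0, inspired by problem 0, also helps with problem 1
    [0.0, 0.5],  # measure 1, inspired by problem 1, useless on problem 0
]
toy_inspired_by = [0, 1]

one_unknown, all_unknown = unknown_problem_success(
    toy_matrix, toy_inspired_by, n_unknown=2
)
# held-out probabilities are [0.0, 0.5], so one_unknown is 0.25
# and all_unknown is 0.25 ** 2 = 0.0625
```

In this toy case, only measure 0 generalizes beyond the problem that inspired it, which is why the held-out numbers are so much lower than the full-matrix ones.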
What could this be useful for? Doing this kind of analysis could help us know whether it's more valuable to discover more problems or discover more safety measures on the current margin. It could tell us which problems are undersolved and would benefit from more people attacking them. Even without doing the analysis, you can see that trying to solve multiple problems with the same measure could be a good idea, since those measures are more likely to generalize to unknown problems. If overall success odds aren't looking good, we could make our first AI some kind of heavily restricted tool AI which tries to augment the matrix with additional rows and columns. If success odds are looking good, we could compare success odds with background odds of x-risk and try to figure out whether to actually turn this thing on.
Obviously there are many simplifying assumptions being made with this kind of "napkin math". For example, it could be that the implementation of safety measure A interacts negatively with the implementation of safety measure B and reverses its effect. It could be that we aren't sampling from God's list uniformly at random and some problems are harder to think of than others. Whether this project is actually worth doing is an "AI research strategy strategy" question, and thus above my pay grade. If it's possible to generate the matrix automatically using natural language processing on a corpus including e.g. the AI Alignment Forum, I guess that makes the project look more attractive.