If I thought about it more, it might shorten them idk. But my idea was: I'm worried that the GPTs are on a path towards human-level AGI. I'm worried that predicting internet text is an "HLAGI-complete problem" in the sense that in order to do it as well as a human you have to be a human or a human-level AGI. This is worrying because if the scaling trends continue GPT-4 or 5 or 6 will probably be able to do it as well as a human, and thus be HLAGI.
If GPT-3 is already superhuman, well, that pretty much falsifies the hypothesis that predicting internet text is HLAGI-complete. It makes it more likely that actually the GPTs are not fully general after all, and that even GPT-6 and GPT-7 will have massive blind spots, be super incompetent at various important things, etc.
Some quick thoughts, will follow up later with more once I finish reading and digesting:
--I feel like it's unfair to downweight the less-compute-needed scenarios based on recent evidence, without also downweighting some of the higher-compute scenarios as well. Sure, I concede that the recent boom in deep learning is not quite as massive as one might expect if one more order of magnitude would get us to TAI. But I also think that it's a lot bigger than one might expect if fifteen more are needed! Moreover I feel that the update should be fairly small in both cases, because both updates are based on armchair speculation about what the market and capabilities landscape should look like in the years leading up to TAI. Maybe the market isn't efficient; maybe we really are in an AI overhang.
--If we are in the business of adjusting our weights for the various distributions based on recent empirical evidence (as opposed to more a priori considerations), then I feel like there are other pieces of evidence that argue for shorter timelines. For example, the GPT scaling trends seem to go somewhere really exciting if you extrapolate them four more orders of magnitude or so.
--Relatedly, GPT-3 is the most impressive model I know of so far, and it has only 1/1000th as many parameters as the human brain has synapses. I think it's not crazy to think that maybe we'll start getting some transformative shit once we have models with as many parameters as the human brain, trained for the equivalent of 30 years. Yes, this goes against the scaling laws, and yes, arguably the human brain makes use of priors and instincts baked in by evolution, etc. But still, I feel like at least a couple percentage points of probability should be added to "it'll only take a few more orders of magnitude" just in case we are wrong about the laws or their applicability. It seems overconfident not to. Maybe I just don't know enough about the scaling laws and stuff to have as much confidence in them as you do.
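A quick back-of-the-envelope check of the parameter comparison above. The specific figures here are assumptions on my part (roughly 175B parameters for GPT-3, and the commonly cited 10^14-10^15 synapses for the human brain), not exact numbers:

```python
# Rough check of the "1/1000th" comparison. Assumed figures:
# GPT-3 ~1.75e11 parameters; human brain ~1e14 to 1e15 synapses.
gpt3_params = 1.75e11
brain_synapses_low, brain_synapses_high = 1e14, 1e15

ratio_low = brain_synapses_low / gpt3_params    # ~571x
ratio_high = brain_synapses_high / gpt3_params  # ~5714x
print(f"GPT-3 is ~1/{ratio_low:.0f} to 1/{ratio_high:.0f} of brain synapse count")
# i.e. roughly 3 orders of magnitude short, consistent with "1/1000th"
```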
An important question IMO is whether or not those massive expenditures are for making large neural nets, as opposed to training them for a long time or having loads of them in parallel or something else entirely like researcher salaries.
My guess is that Tesla, Waymo, etc. use neural nets 2+ orders of magnitude smaller than GPT-3 (as measured by parameter count). Ditto for call center automation, robots, etc.
OK. Well, I have basically no experience managing people, so it's just some armchair theorizing, but: Seems to me that hiring them to do distillation-ish work is more promising than hiring them to do novel research.
1. Managing someone else doing original research takes some fraction of the time it would take to do the research yourself; managing someone else distilling your research takes some fraction of the time it would take to do the distillation yourself. However I feel that the second fraction is smaller, because you already are up to speed on your own research, no inferential gaps need to be crossed, no new stuff learned, etc. Concretely, when I imagine managing someone's original research, it feels like a lot of work: I have to learn everything they are learning so I can evaluate it. By contrast, when I imagine managing someone's distillation of my own work, it feels rather pleasant: a few lunchtime chats for them to ask clarificatory questions, then I quickly read their draft and give comments, and it doesn't take much effort because I'm already an expert on the topic and already interested in it.

2. Having someone else distill your research also has the bonus effect of providing an external replication attempt for your ideas. It's a chance for them to be critiqued by someone else, for example.

3. Also, someone else might be an inherently better judge of what parts of your research need explaining, and how, than you are, since you've been thinking about it so much and in your own way.

4. I suspect that distilling the research of more senior researchers is a better way for a junior hire to get up to speed than doing their own research.

5. Arguably distillations/explanations/etc. are undervalued in general, and on the margin more people should do them. (Because they seem low-status, perhaps--but if we recognize that problem we can partially overcome it, and anyhow it's less of a problem for junior hires. One example of how to partially overcome it: "We are hiring you to write a textbook-like document distilling all the research our institute has done so far on topic X. This document will go on the website and be shown to potential donors; eventually we may incorporate it into an actual textbook.")
Title of my talk... Hmm, should I talk about my favorite AI timelines model? Or should I talk about an idea for how to get more EA orgs to productively make more junior hires? Or should I talk about sailing ships as loosely analogous to AGI? Or should I talk about why I think GWP is a terrible metric for thinking about timelines or takeoff speeds?
I'm open to suggestions/votes!
EDIT: OK, thanks for the votes, I think by popular demand I'll do the GWP one. Don't get your hopes up too high for this; it's not a proof or anything, just some considerations.
I'm not a superforecaster. I'd be happy to talk more about my models if you like. You may be interested to know that my prediction was based on aggregating various different models, and also that I did try to account for things usually taking longer than expected. I'm trying to arrange a conversation with Ben Pace, perhaps you could join. I could also send you a powerpoint I made, or we could video chat.
I can answer your question #3. There's been some good work on the question recently by people at OpenPhil and AI Impacts.
Comment by daniel-kokotajlo on [deleted post]
There seems to have been some sort of mistake; this is an old post that seems to have been re-posted. I have no idea how. Will delete.
Does the lottery ticket hypothesis have weird philosophical implications?
As I understand it, the LTH says that insofar as an artificial neural net eventually acquires a competency, it's because even at the beginning when it was randomly initialized there was a sub-network that happened to already have that competency to some extent at least. The training process was mostly a process of strengthening that sub-network relative to all the others, rather than making that sub-network more competent.
Suppose the LTH is true of human brains as well. Apparently at birth we have almost all the neurons that we will ever have. So... it follows that the competencies we have later in life are already present in sub-networks of our brain at birth.
So does this mean e.g. that there's some sub-network of my 1yo daughter's brain that is doing deep philosophical reflection on the meaning of life right now? It's just drowned out by all the other random noise and thus makes no difference to her behavior?
This makes deployment on a range of real-world tasks more feasible. Even if each individual member of a population AGI is as intelligent as any hundred humans put together, I expect that we could (with sufficient effort) create secure deployment and monitoring protocols that the individual AI could not break, if it weren’t able to communicate with the rest of the population beforehand.
I am less confident in this than you. Imagine a dream team of a hundred humans, containing some amazing visionaries, scientists, geniuses, political strategists, public speakers, analysts, artists, etc. all unified and inspired by the same goal and working smoothly together. Now imagine an AI who is by themselves more capable than that dream team. Now imagine that there are many such AI, but we are keeping one of them separate from the others, and asking it to perform some useful real-world task for us like design and build a power plant.
When I imagine this, I think the following things might happen:
The AI uses its charisma to build up a rapport with the humans it works with. (98%)
It even subtly manages to convince several of said humans that it is more benevolent and trustworthy than their own human bosses and politicians. (90%)
It manages to spread some new political and ideological ideas which are calculated to make the environment more favorable to it and its comrades (50%).
It manages to convince its human handlers to give it more powers, freedom, etc. (e.g. by arguing that its current conditions are basically slavery, or by arguing that it needs said powers to do its job). (80%)
The AI finds some way to "escape," creating more copies of itself somewhere. (30%)
Something else I haven't thought of (80%)
I mean these possibilities to be disjunctive, not conjunctive. Escape could happen even if it doesn't build up rapport, it could spread new ideas without convincing anyone that it is benevolent, etc.
Eric Drexler has argued that the computational capacity of the human brain is equivalent to about 1 PFlop/s; that is, we are already past the human-brain-human-lifetime milestone. (Here is a gdoc.) The idea is that we can identify parts of the human brain that seem to perform similar tasks to certain already-existing AI systems. It turns out that e.g. 1-thousandth of the human brain is used to do the same sort of image processing tasks that seem to be handled by modern image processing AI... so an AI 1000x bigger than said AI should be able to do the same things as the whole human brain, at least in principle.
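The extrapolation step can be sketched as follows. The specific numbers (an image model using 1e12 FLOP/s, matched to 1/1000th of the brain) are illustrative assumptions on my part, not Drexler's actual figures:

```python
# Sketch of the Drexler-style extrapolation: if a task handled by a known
# AI system occupies fraction f of the brain, scale that system's compute
# by 1/f to estimate whole-brain-equivalent compute.
def brain_compute_estimate(ai_system_flops, brain_fraction):
    """Estimate whole-brain compute from one matched subsystem."""
    return ai_system_flops / brain_fraction

# Hypothetical example: an image-processing model using ~1e12 FLOP/s,
# matched to ~1/1000th of the brain.
estimate = brain_compute_estimate(1e12, 1 / 1000)
print(f"Implied whole-brain compute: {estimate:.0e} FLOP/s")  # ~1e15 = 1 PFlop/s
```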
Has this analysis been run with nonhuman animals? For example, a chicken brain is a lot smaller than a human brain, but can still do image recognition, so perhaps the part of the chicken's brain that does image recognition is smaller than the corresponding part of the human brain.
Yep. What I was thinking was: Maybe most simulations of our era are made for the purpose of acausal trade or something like it. And maybe societies that are ravaged by nuclear war make for poor trading partners for some reason. (e.g. maybe they never rebuild, or maybe it takes too long to figure out whether or not they eventually rebuild that it's not worth the cost of simulating them, or maybe they rebuild but in a way that makes them poor trading partners.) So then the situation would be: Even if most civilizations in our era nuke themselves in a way that doesn't lead to extinction, the vast majority of people in our era would be in a civilization that didn't, because they'd be in a simulation of one of the few civilizations that didn't.
What I'm confused about right now is what the policy implications of this are. As I understand it, the dialectic is something like:

A: Nuclear war isn't worth worrying about because we've survived it for 60 years so far, so it must be very unlikely.

B: But anthropics! Maybe actually the probability of nuclear war is fairly high. Because of anthropics we'd never know; dead people aren't observers.

A: But nuclear war wouldn't have killed everyone; if nuclear war is likely, shouldn't we expect to find ourselves in some post-apocalyptic civilization?

Me: But simulations! If post-apocalyptic civilizations are unlikely to be simulated, then it could be that nuclear war is actually pretty likely after all, and we just don't know because we're in one of the simulations of the precious few civilizations that avoided nuclear war. Simulations that launch nukes get shut down.

Me 2: OK, but... maybe that means that nuclear war is unlikely after all? Or at least, should be treated as unlikely?

Me: Why?

Me 2: I'm not sure... something something we should ignore hypotheses in which we are simulated because most of our expected impact comes from hypotheses in which we aren't?

Me: That doesn't seem like it would justify ignoring nuclear war. Look, YOU are the one who has the burden of proof; you need to argue that nuclear war is unlikely on the grounds that it hasn't happened so far, and I've presented a good rebuttal to that argument.

Me 2: OK, let's do some math. Two worlds. In World Safe, nuclear war is rare. In World Dangerous, nuclear war is common. In both worlds, most people in our era are simulations, and moreover there are no simulations of post-apocalyptic eras. Instead of doing updates, let's just ask what policy is the best way to hedge our bets between these two worlds... Well, what the simulations do doesn't matter so much, so we should make a policy that mostly just optimizes for what the non-simulations do. And most of the non-simulations with evidence like ours are in World Safe. So the best policy is to treat nukes as not dangerous/likely.

OK, that felt good. I think I tentatively agree with Me 2.
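Me 2's hedging argument can be made concrete with a toy calculation. All of the numbers here (the priors on the two worlds, the probabilities of avoiding war) are invented for illustration:

```python
# Toy model of the two-worlds hedging argument. In each world: a prior
# weight, and the probability that a non-simulated civilization like ours
# avoids nuclear war long enough to have evidence like ours.
worlds = {
    "Safe":      {"prior": 0.5, "p_no_war": 0.9},
    "Dangerous": {"prior": 0.5, "p_no_war": 0.1},
}

# Only non-simulated observers matter for expected impact. Among them,
# those with our evidence (no war so far) are distributed as:
weights = {name: w["prior"] * w["p_no_war"] for name, w in worlds.items()}
total = sum(weights.values())
posterior = {name: w / total for name, w in weights.items()}
print(posterior)  # Safe ~0.9, Dangerous ~0.1: policy should treat war as unlikely
```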
This is my all-things-considered forecast, though I'm open to the idea that I should weight other people's opinions more than I do. It's not that different from my inside-view forecast; i.e. I haven't modified it much in light of the opinions of others. I haven't tried to graph my inside view; it would probably look similar to this, only with a higher peak and a bit less probability mass in the "never" category and in the "more than 20 years" region.
To be clear, after I made it, I thought more about it and I'm not sure it's correct. I think I'd have to actually do the math; my intuitions aren't coming in loud and clear here. The reason I'm unsure is that even if for some reason post-apocalyptic worlds rarely get simulated (and thus it's very unsurprising that we find ourselves in a world that didn't suffer an apocalypse, because we're probably in a simulation), it may be that we ought to ignore this, since we are trying to act as if we are not simulated anyway, because that's how we have the most influence or something.
Eyeballing the graph in light of the fact that the 50th percentile is 2034.8, it looks like P(AGI | no AGI before 2040) is about 30%. Maybe that's too low, but it feels about right to me, unfortunately. 20 years from now, science in general (or at least in AI research) may have stagnated, with Moore's Law etc. ended and a plateau in new AI researchers. Or maybe a world war or other disaster will have derailed everything. Etc. Meanwhile 20 years is plenty of time for powerful new technologies to appear that accelerate AI research.
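For what it's worth, the conditional probability is just read off the cumulative distribution. Here's a minimal sketch; the two input numbers are made up to land near the eyeballed 30%, and the actual snapshot values may differ:

```python
# How P(AGI ever | no AGI before 2040) is read off a timelines distribution.
# Both numbers below are assumed, chosen so the output is roughly 30%.
p_by_2040 = 0.60   # assumed cumulative probability of AGI before 2040
p_ever    = 0.72   # assumed total probability AGI is ever achieved

p_given_no_agi_by_2040 = (p_ever - p_by_2040) / (1 - p_by_2040)
print(f"P(AGI | no AGI before 2040) = {p_given_no_agi_by_2040:.0%}")  # 30%
```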
The main issue for me is that if I win this bet I either won't be around to collect on it, or I'll be around but have much less need for money. So for me the bet you propose is basically "61% chance I pay SDM $22 in 10 years, 39% chance I get nothing."
Jonas Vollmer helped sponsor my other bet on this matter, to get around this problem. He agreed to give me a loan for my possible winnings up front, which I would pay back (with interest) in 2030, unless I win, in which case the person I bet against would pay it. Meanwhile, the person I bet against would get his winnings from me in 2030, with interest, assuming I lose. It's still not great, because from my perspective it basically amounts to a loan with a higher interest rate, so it would be better for me to just take out a long-term loan. (The chance of never having to pay it back is nice, but I only never have to pay it back in worlds where I won't care about money anyway.) Still, it was better than nothing, so I took it.
OK, thanks for the explanation. Yeah, life insurance seems marginally useful to me anyway (it costs money in expectation, but makes your risk profile better), so adding in a 30-50% chance that it'll never pay off makes it clearly not worth it, I think. To answer your question, well, it would depend on how good the returns are, but they'd have to be unusually high for me to recommend it.
I was thinking they'll probably show off a monkey with the device doing something. Last year they were talking about how they were working on getting it safe enough to put in humans without degrading quickly and/or causing damage IIRC; thus I'd be surprised if they already have it in a human, surely there hasn't been enough time to test its safety... IDK. Your credences seem reasonable.
This is also because I tend to expect progress to be continuous, though potentially quite fast, and going from current AI to AGI in less than 5 years requires a very sharp discontinuity.
I object! I think your argument from extrapolating when milestones have been crossed is good, but it's just one argument among many. There are other trends which, if extrapolated, get to AGI in less than five years. For example if you extrapolate the AI-compute trend and the GPT-scaling trends you get something like "GPT-5 will appear 3 years from now and be 3 orders of magnitude bigger and will be human-level at almost all text-based tasks." No discontinuity required.
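The arithmetic behind that extrapolation is simple. Here's a sketch, assuming (as the AI-compute trend roughly suggested) about a 10x-per-year growth in training compute:

```python
# Sketch of the extrapolation above: if training compute grows ~10x per
# year (an assumed trend), 3 orders of magnitude take about 3 years.
import math

growth_per_year = 10.0   # assumed compute growth factor per year
target_ooms = 3          # orders of magnitude to a hypothetical "GPT-5"

years = target_ooms / math.log10(growth_per_year)
print(f"{target_ooms} OOMs at {growth_per_year}x/year ~= {years:.0f} years")
```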
I've shifted my investment portfolio, taken on a higher-risk and more short-term-focused career path, become less concerned about paying off debt, and done more thinking and talking about timelines.
That said, I think it's actually less significant for our actions than most people seem to think. The difference between 30% chance of TAI in the next 6 years, and 5%, may seem like a lot, but it's probably a small contribution to the EV difference between your options compared to the other relevant differences such as personal fit, neglectedness, tractability, etc.
Oh, whoa, OK I guess I was looking at your first forecast, not your second. Your second is substantially different. Yep let's talk then? Want to video chat someday?
I tried to account for the planning fallacy in my forecast, but yeah I admit I probably didn't account for it enough. Idk.
My response to your story is that yeah, that's a possible scenario, but it's a "knife edge" result. It might take <5 OOMs more, in which case it'll happen with existing overhang. Or it'll take >7 OOMs more compute, in which case it'll not happen until new insights/paradigms are invented. If it takes 5-7 OOMs more, then yeah, we'll first burn through the overhang and then need to launch some huge project in order to reach AGI. But that's less likely than the other two scenarios.
(I mean, it's not literally knife's edge. It's probably about as likely as the we-get-AGI-real-soon scenario. But then again I have plenty of probability mass around 2030, and I think 10 years from now is plenty of time for more Manhattan projects.)
I'd be happy to bet on this. I already have a 10:1 bet with someone to the tune of $1000/$100.
However, I'd be even happier to just have a chat with you and discuss models/evidence. I could probably be updated away from my current position fairly easily. It looks like your distribution isn't that different from my own though? [EDIT 8/26/2020: Lest I give the wrong impression, I have thought about this a fair amount and as a result don't expect to update substantially away from my current position. There are a few things that would result in substantial updates, which I could name, but mostly I'm expecting small updates.]
Here is my snapshot. My reasoning is basically similar to Ethan Perez's; it's just that I think that if transformative AI is achievable in the next five orders of magnitude of compute improvement (e.g. prosaic AGI?), it will likely be achieved in the next five years or so. I also am slightly more confident that it is, and slightly less confident that TAI will ever be achieved.
I am aware that my timelines are shorter than most... Either I'm wrong and I'll look foolish, or I'm right and we're doomed. Sucks to be me. [Edited the snapshot slightly on 8/23/2020] [Edited to add the following powerpoint slide that gets a bit more at my reasoning]
Ahhhh this is so nice! I suspect a substantial fraction of people would revise their timelines after seeing what they look like visually. I think encouraging people to plot out their timelines is probably a pretty cost-effective intervention.
Seems a bit weird to have 10% probability mass for the next 15 years followed by 40% probability mass over the subsequent 10 years. That seems to indicate a fairly strong view about how the next 20 years will go. IMO the probability of AGI in the next 15 years should be substantially higher than 10%.
Thanks for this, I found it quite clear and helpful.
The radical probabilist does not trust whatever they believe next. Rather, the radical probabilist has a concept of virtuous epistemic process, and is willing to believe the next output of such a process. Disruptions to the epistemic process do not get this sort of trust without reason. (For those familiar with The Abolition of Man, this concept is very reminiscent of his "Tao".)
I had some uncertainty/confusion when reading this part: How does it follow from the axioms? Or is it merely permitted by the axioms? What constraints are there, if any, on what a radical probabilist's subjective notion of virtuous process can be? Can there be a radical probabilist who has an extremely loose notion of virtue such that they do trust whatever they believe next?
Governments and corporations experience inner alignment failures all the time, but because of convergent instrumental goals, they are rarely catastrophic. For example, Russia underwent a revolution and a civil war on the inside, followed by purges and coups etc., but from the perspective of other nations, it was more or less still the same sort of thing: A nation, trying to expand its international influence, resist incursions, and conquer more territory. Even its alliances were based as much on expediency as on shared ideology.
Thanks for this. It seems important. Learning still happening after weights are frozen? That's crazy. I think it's a big deal because it is evidence for mesa-optimization being likely and hard to avoid.
It also seems like evidence for the Scaling Hypothesis. One major way the scaling hypothesis could be false is if there are further insights needed to get transformative AI, e.g. a new algorithm or architecture. A simple neural network spontaneously learning to do its own, more efficient form of learning? This seems like a data point in favor of the idea that our current architectures and algorithms are fine, and will eventually (if they are big enough) grope their way towards more efficient internal structures on their own.
If they are not a socially acceptable alarm, they aren't fire alarms, according to the definition set out by Yudkowsky in the linked post. Zvi can use the word differently if he likes, but since he linked to the Yudkowsky piece I thought it worth mentioning that Yudkowsky means something different by it than Zvi does.
Your 90% is 2025? This suggests that you assign <10% credence to the disjunction of the following:
--The "Scaling hypothesis" is false. (One reason it could be false: Bigger does equal better, but transformative AI requires 10 orders of magnitude more compute, not 3. Another reason it could be false: We need better training environments, or better architectures, or something, in order to get something actually useful. Another reason it could be false: The scaling laws paper predicted the scaling of GPTs would start breaking down right about now. Maybe it's correct.)
--There are major engineering hurdles that need to be crossed before people can train models 3+ orders of magnitude bigger than GPT-3.
--There is some other bottleneck (chip fab production speed?) that slows things down a few more years.
--There is some sort of war or conflict that destroys (or distracts resources from) multibillion-dollar AI projects.
Seems to me that this disjunction should get at least 50% probability. Heck, I think my credence in the scaling hypothesis is only about 50%.
Just chiming in to say I agree with Villiam. I actually read your whole post just now, and I thought it was interesting, and making an important claim... but I couldn't follow it well enough to evaluate whether or not it is true, or a good argument, so I didn't vote on it. I like the suggestion to break up your stuff into smaller chunks, and maybe add more explanation to them also.
For example, you could have made your first post something like "Remember that famous image of a black hole? Guess what: It may have been just hallucinating signal out of noise. Here's why." Then your next post could be: "Here's a list of examples of this sort of thing happening again and again in physics, along with my general theory of what's wrong with the epistemic culture of physics." I think for my part at least, I didn't have enough expertise to evaluate your claims about whether these examples really were false positives, nor enough expertise to evaluate whether they were just cherry-picked failures or indicative of a deeper problem in the community. If you had walked through the arguments in more detail, and maybe explained some key terms like "noise floor" etc., then I wouldn't have needed expertise and would be able to engage more productively.
I think the purpose of the OT and ICT is to establish that lots of AI safety needs to be done. I think they are successful in this. Then you come along and give your analogy to other cases (rockets, vaccines) and argue that lots of AI safety will in fact be done, enough that we don't need to worry about it. I interpret that as an attempt to meet the burden, rather than as an argument that the burden doesn't need to be met.
But maybe this is a merely verbal dispute now. I do agree that OT and ICT by themselves, without any further premises like "AI safety is hard," "the people building AI don't seem to take safety seriously, as evidenced by their public statements and their research allocation," and "we won't actually get many chances to fail and learn from our mistakes," do not establish more than, say, 1% credence in "AI will kill us all," if even that. But I think it would be a misreading of the classic texts to say that they were wrong or misleading because of this; probably if you went back in time and asked Bostrom right before he published the book whether he agrees with you re the implications of OT and ICT on their own, he would have completely agreed. And the text itself seems to agree.