How do scaling laws work for fine-tuning? 2021-04-04T12:18:34.559Z
Fun with +12 OOMs of Compute 2021-03-01T13:30:13.603Z
Poll: Which variables are most strategically relevant? 2021-01-22T17:17:32.717Z
Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain 2021-01-18T12:08:13.418Z
How can I find trustworthy dietary advice? 2021-01-17T13:11:54.158Z
Review of Soft Takeoff Can Still Lead to DSA 2021-01-10T18:10:25.064Z
DALL-E by OpenAI 2021-01-05T20:05:46.718Z
Dario Amodei leaves OpenAI 2020-12-29T19:31:04.161Z
Against GDP as a metric for timelines and takeoff speeds 2020-12-29T17:42:24.788Z
How long till Inverse AlphaFold? 2020-12-17T19:56:14.474Z
Incentivizing forecasting via social media 2020-12-16T12:15:01.446Z
What are the best precedents for industries failing to invest in valuable AI research? 2020-12-14T23:57:08.631Z
What technologies could cause world GDP doubling times to be <8 years? 2020-12-10T15:34:14.214Z
The AI Safety Game (UPDATED) 2020-12-05T10:27:05.778Z
Is this a good way to bet on short timelines? 2020-11-28T12:51:07.516Z
Persuasion Tools: AI takeover without AGI or agency? 2020-11-20T16:54:01.306Z
How Roodman's GWP model translates to TAI timelines 2020-11-16T14:05:45.654Z
How can I bet on short timelines? 2020-11-07T12:44:20.360Z
What considerations influence whether I have more influence over short or long timelines? 2020-11-05T19:56:12.147Z
AI risk hub in Singapore? 2020-10-29T11:45:16.096Z
The date of AI Takeover is not the day the AI takes over 2020-10-22T10:41:09.242Z
If GPT-6 is human-level AGI but costs $200 per page of output, what would happen? 2020-10-09T12:00:36.814Z
Where is human level on text prediction? (GPTs task) 2020-09-20T09:00:28.693Z
Forecasting Thread: AI Timelines 2020-08-22T02:33:09.431Z
What if memes are common in highly capable minds? 2020-07-30T20:45:17.500Z
What a 20-year-lead in military tech might look like 2020-07-29T20:10:09.303Z
Does the lottery ticket hypothesis suggest the scaling hypothesis? 2020-07-28T19:52:51.825Z
Probability that other architectures will scale as well as Transformers? 2020-07-28T19:36:53.590Z
Lessons on AI Takeover from the conquistadors 2020-07-17T22:35:32.265Z
What are the risks of permanent injury from COVID? 2020-07-07T16:30:49.413Z
Relevant pre-AGI possibilities 2020-06-20T10:52:00.257Z
Image GPT 2020-06-18T11:41:21.198Z
List of public predictions of what GPT-X can or can't do? 2020-06-14T14:25:17.839Z
Preparing for "The Talk" with AI projects 2020-06-13T23:01:24.332Z
Reminder: Blog Post Day III today 2020-06-13T10:28:41.605Z
Blog Post Day III 2020-06-01T13:56:10.037Z
Predictions/questions about conquistadors? 2020-05-22T11:43:40.786Z
Better name for "Heavy-tailedness of the world?" 2020-04-17T20:50:06.407Z
Is this viable physics? 2020-04-14T19:29:28.372Z
Blog Post Day II Retrospective 2020-03-31T15:03:21.305Z
Three Kinds of Competitiveness 2020-03-31T01:00:56.196Z
Reminder: Blog Post Day II today! 2020-03-28T11:35:03.774Z
What are the most plausible "AI Safety warning shot" scenarios? 2020-03-26T20:59:58.491Z
Could we use current AI methods to understand dolphins? 2020-03-22T14:45:29.795Z
Blog Post Day II 2020-03-21T16:39:04.280Z
What "Saving throws" does the world have against coronavirus? (And how plausible are they?) 2020-03-04T18:04:18.662Z
Blog Post Day Retrospective 2020-03-01T11:32:00.601Z
Cortés, Pizarro, and Afonso as Precedents for Takeover 2020-03-01T03:49:44.573Z
Reminder: Blog Post Day (Unofficial) 2020-02-29T15:10:17.264Z
Response to Oren Etzioni's "How to know if artificial intelligence is about to destroy civilization" 2020-02-27T18:10:11.129Z


Comment by Daniel Kokotajlo (daniel-kokotajlo) on Vaccine Rollout as Wheeled-Luggage Problem · 2021-05-14T20:36:50.690Z · LW · GW

If production were exponential but administration were exponential-then-linear then there should be massive stockpiles of unused vaccines by now. Are there?

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Understanding the Lottery Ticket Hypothesis · 2021-05-14T12:40:27.063Z · LW · GW

Thanks for this, I found it helpful!

If you are still interested in reading and thinking more about this topic, I would love to hear your thoughts on the papers below, in particular the "multi-prize LTH" one which seems to contradict some of the claims you made above. Also, I'd love to hear whether LTH-ish hypotheses apply to RNN's and more generally the sort of neural networks used to make, say, AlphaStar.

"In this paper, we propose (and prove) a stronger Multi-Prize Lottery Ticket Hypothesis:

A sufficiently over-parameterized neural network with random weights contains several subnetworks (winning tickets) that (a) have comparable accuracy to a dense target network with learned weights (prize 1), (b) do not require any further training to achieve prize 1 (prize 2), and (c) is robust to extreme forms of quantization (i.e., binary weights and/or activation) (prize 3)."

"An even stronger conjecture has been proven recently: Every sufficiently overparameterized network contains a subnetwork that, at random initialization, but without training, achieves comparable accuracy to the trained large network."

The strong lottery ticket hypothesis (LTH) postulates that one can approximate any target neural network by only pruning the weights of a sufficiently over-parameterized random network. A recent work by Malach et al. [MalachEtAl20] establishes the first theoretical analysis for the strong LTH: one can provably approximate a neural network of width d and depth l, by pruning a random one that is a factor O(d^4 l^2) wider and twice as deep. This polynomial over-parameterization requirement is at odds with recent experimental research that achieves good approximation with networks that are a small factor wider than the target. In this work, we close the gap and offer an exponential improvement to the over-parameterization requirement for the existence of lottery tickets. We show that any target network of width d and depth l can be approximated by pruning a random network that is a factor O(log(dl)) wider and twice as deep.

"Based on these results, we articulate the Elastic Lottery Ticket Hypothesis (E-LTH): by mindfully replicating (or dropping) and re-ordering layers for one network, its corresponding winning ticket could be stretched (or squeezed) into a subnetwork for another deeper (or shallower) network from the same family, whose performance is nearly as competitive as the latter's winning ticket directly found by IMP."

EDIT: Some more from my stash:

Sparse neural networks have generated substantial interest recently because they can be more efficient in learning and inference, without any significant drop in performance. The "lottery ticket hypothesis" has showed the existence of such sparse subnetworks at initialization. Given a fully-connected initialized architecture, our aim is to find such "winning ticket" networks, without any training data. We first show the advantages of forming input-output paths, over pruning individual connections, to avoid bottlenecks in gradient propagation. Then, we show that Paths with Higher Edge-Weights (PHEW) at initialization have higher loss gradient magnitude, resulting in more efficient training. Selecting such paths can be performed without any data.

We study whether a neural network optimizes to the same, linearly connected minimum under different samples of SGD noise (e.g., random data order and augmentation). We find that standard vision models become stable to SGD noise in this way early in training. From then on, the outcome of optimization is determined to a linearly connected region. We use this technique to study iterative magnitude pruning (IMP), the procedure used by work on the lottery ticket hypothesis to identify subnetworks that could have trained in isolation to full accuracy. We find that these subnetworks only reach full accuracy when they are stable to SGD noise, which either occurs at initialization for small-scale settings (MNIST) or early in training for large-scale settings (ResNet-50 and Inception-v3 on ImageNet).

“In some situations we show that neural networks learn through a process of “grokking” a pattern in the data, improving generalization performance from random chance level to perfect generalization, and that this improvement in generalization can happen well past the point of overfitting.”

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Understanding the Lottery Ticket Hypothesis · 2021-05-14T12:14:00.657Z · LW · GW

I confess I don't really understand what a tangent space is, even after reading the wiki article on the subject. It sounds like it's something like this: Take a particular neural network. Consider the "space" of possible neural networks that are extremely similar to it, i.e. they have all the same parameters but the weights are slightly different, for some definition of "slightly." That's the tangent space. Is this correct? What am I missing?

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Vaccine Rollout as Wheeled-Luggage Problem · 2021-05-14T07:42:39.575Z · LW · GW

I didn't mean to suggest they were manufacturing at cost; quite the opposite! I was saying they weren't going as fast as they possibly could, e.g. as fast as they would go if they were being paid $10K per vaccine, minus 50% for each month of delay.

Thanks for the data point about doses manufactured so far; that does indeed look like they are ramping up production, though idk if it's exponential, I'd want to see a graph. This is good evidence against my theory.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Vaccine Rollout as Wheeled-Luggage Problem · 2021-05-13T23:08:04.812Z · LW · GW

Vaccine production is, as far as I can tell, not exponentially increasing over time. I don't have data on the world as a whole but in the UK at least vaccinations-per-day isn't even increasing, it's holding steady!

This is not what you would expect to see if vaccine production was a priority, if it was being increased as fast and as much as possible, cost be damned. If that's what was happening we should see exponential growth in vaccine production.

The pattern of production we see (steady or slowly increasing) is what we would expect to see if cost was a major constraint for vaccine production, and in particular, if vaccine producers have high fixed costs (the factories, equipment, etc) that they are trying to amortize over many months of production. Better to run one factory for ten months than ten factories for one month, since the latter plan costs almost 10x as much.
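
The amortization point can be made concrete with a small sketch. The cost figures below are hypothetical, picked only so that fixed costs dominate running costs; they are not real vaccine-manufacturing numbers.

```python
# Hypothetical numbers, chosen only to illustrate the amortization argument.
FIXED_COST_PER_FACTORY = 100.0   # assumed cost to build and equip one factory
RUNNING_COST_PER_MONTH = 1.0     # assumed cost to operate one factory for one month


def plan_cost(factories: int, months: int) -> float:
    """Total cost of running `factories` factories for `months` months each."""
    fixed = factories * FIXED_COST_PER_FACTORY
    running = factories * months * RUNNING_COST_PER_MONTH
    return fixed + running


slow_plan = plan_cost(factories=1, months=10)   # one factory for ten months
fast_plan = plan_cost(factories=10, months=1)   # ten factories for one month

# Both plans deliver the same 10 factory-months of output.
cost_ratio = fast_plan / slow_plan
print(f"slow: {slow_plan}, fast: {fast_plan}, ratio: {cost_ratio:.2f}")
```

Under these assumed numbers the fast plan costs a bit over 9x as much for the same output. The ratio approaches 10x as fixed costs grow relative to running costs.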

I've heard that vaccine producers can't charge high prices and instead have to sell at low prices negotiated by governments that ban them from selling to anyone else. If that's true, then that also fits nicely with this theory of what's going on.

So, is this not what's going on? I'm curious to hear counterarguments.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Challenge: know everything that the best go bot knows about go · 2021-05-12T09:31:46.057Z · LW · GW
When choosing between two moves that are both judged to win the game with 0.9999999, AlphaGo not choosing the move that maximizes points suggests that it does not use patterns about what optimal moves are in certain local situations to make its judgements.

I nitpick/object to your use of "optimal moves" here. The move that maximizes points is NOT the optimal move; the optimal move is the move that maximizes win probability. In a situation where you are many points ahead, plausibly the way to maximize win probability is not to try to get more points, but rather to try to anticipate and defend against weird crazy high-variance strategies your opponent might try.
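
A minimal sketch of the distinction, with made-up move names and made-up evaluations (nothing here comes from an actual Go engine): the two selection rules can disagree when one move trades points for safety.

```python
from typing import NamedTuple


class Move(NamedTuple):
    name: str
    win_probability: float   # estimated chance of winning after this move
    expected_margin: float   # estimated final point lead after this move


# Hypothetical candidate moves for a player who is already far ahead.
candidates = [
    Move("grab more territory", win_probability=0.990, expected_margin=25.0),
    Move("defend a weak group", win_probability=0.999, expected_margin=12.0),
]

# Two different selection rules applied to the same candidates.
by_points = max(candidates, key=lambda m: m.expected_margin)
by_win_probability = max(candidates, key=lambda m: m.win_probability)

print("maximizes points:         ", by_points.name)
print("maximizes win probability:", by_win_probability.name)
```

With these assumed numbers the point-maximizing move and the win-probability-maximizing move differ, which is the situation described above.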

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Credibility of the CDC on SARS-CoV-2 · 2021-05-10T14:16:18.293Z · LW · GW

It would be interesting to see this post updated, e.g. to describe the situation today or (even better) how it evolved over the course of 2020-2021.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on NTK/GP Models of Neural Nets Can't Learn Features · 2021-05-10T10:52:16.723Z · LW · GW

I think I get this distinction; I realize the NN papers show the latter; I guess our disagreement is about how big a deal / how surprising this is.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Pre-Training + Fine-Tuning Favors Deception · 2021-05-08T21:42:01.626Z · LW · GW

Nice post! You may be interested in this related post and discussion.

I think you may have forgotten to put a link in "See Mesa-Search vs Mesa-Control for discussion."

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Open and Welcome Thread - May 2021 · 2021-05-07T12:26:21.750Z · LW · GW


Comment by Daniel Kokotajlo (daniel-kokotajlo) on NTK/GP Models of Neural Nets Can't Learn Features · 2021-05-07T11:10:16.381Z · LW · GW

Ah, OK. Interesting, thanks. Would you agree with the following view:

"The NTK/GP stuff has neural nets implementing a "pseudosimplicity prior" which is maybe also a simplicity prior but might not be, the evidence is unclear. A pseudosimplicity prior is like a simplicity prior except that there are some important classes of Kolmogorov-simple functions that don't get high prior / high measure."

Which would you say is more likely: (a) The NTK/GP stuff is indeed not universally data efficient, and thus modern neural nets aren't either, or (b) NTK/GP stuff is indeed not universally data efficient, and thus modern neural nets aren't well-characterized by the NTK/GP stuff?

Comment by Daniel Kokotajlo (daniel-kokotajlo) on NTK/GP Models of Neural Nets Can't Learn Features · 2021-05-07T10:55:54.640Z · LW · GW
Feature learning requires the intermediate neurons to adapt to structures in the data that are relevant to the task being learned, but in the NTK limit the intermediate neurons' functions don't change at all.
Any meaningful function like a 'car detector' would need to be there at initialization -- extremely unlikely for functions of any complexity.

I used to think it would be extremely unlikely for a randomly initialized neural net to contain a subnetwork that performs just as well as the entire neural net does after training. But the multi-prize lottery ticket results seem to show just that. So now I don't know what to think when it comes to what sorts of things are likely or unlikely when it comes to this stuff. In particular, is it really so unlikely that 'car detector' functions really do exist somewhere in the random jumble of a sufficiently big randomly initialized NN? Or maybe they don't exist right away, but with very slight tweaks they do?
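
One way to build intuition for why a good subnetwork could already sit inside random weights is to shrink the question to a single weight: given enough random numbers, some subset of them sums to nearly any target. The sketch below is a toy illustration under that simplification, not the construction from any of the papers quoted above; the target value and pool size are arbitrary assumptions.

```python
import itertools
import random


def best_subset(random_weights, target):
    """Brute-force the subset of random weights whose sum is closest to target.

    Keeping a subset and dropping the rest plays the role of pruning.
    """
    best_mask, best_error = (), abs(target)  # the empty subset sums to 0
    for size in range(1, len(random_weights) + 1):
        for mask in itertools.combinations(range(len(random_weights)), size):
            error = abs(sum(random_weights[i] for i in mask) - target)
            if error < best_error:
                best_mask, best_error = mask, error
    return best_mask, best_error


rng = random.Random(0)        # fixed seed so the run is reproducible
target_weight = 0.37          # hypothetical "trained" weight to imitate
pool = [rng.uniform(-1.0, 1.0) for _ in range(12)]

errors = []
for n in (2, 4, 8, 12):
    # Use the first n random weights, so each larger pool contains the smaller one.
    kept, error = best_subset(pool[:n], target_weight)
    errors.append(error)
    print(f"{n:2d} random weights -> keep {len(kept)}, error {error:.5f}")
```

Because each larger pool contains the smaller ones, the best achievable error can only stay the same or shrink as more random weights become available, and no weight is ever trained.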

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Why Neural Networks Generalise, and Why They Are (Kind of) Bayesian · 2021-05-07T09:51:13.382Z · LW · GW

Sorry I didn't notice this earlier! What do you think about the argument that Joar gave?

If a function is small-volume, it's complex, because it takes a lot of parameters to specify.

If a function is large-volume, it's simple, because it can be compressed a lot since most parameters are redundant.

It sounds like you are saying: Some small-volume functions are actually simple, or at least this might be the case for all we know, because maybe it's just really hard for neural networks to efficiently represent that function. This is especially clear when we think of simplicity in the minimum description length / Kolmogorov sense; the "+BusyBeaver(9)" function can be written in a few lines of code but would require a neural net larger than the universe to implement. Am I interpreting you correctly? Do you think there are other important senses of simplicity besides that one?
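
To make "volume" concrete, here is a toy sketch that counts how many parameter settings of a single threshold unit implement each Boolean function of two inputs. The architecture and the small integer parameter grid are assumptions made for illustration; the sketch says nothing about real networks beyond showing what the term measures.

```python
from collections import Counter
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
WEIGHT_VALUES = range(-2, 3)   # assumed toy parameter grid: -2, -1, 0, 1, 2


def truth_table(w1, w2, b):
    """Boolean function computed by one threshold unit with these parameters."""
    return tuple(int(w1 * x1 + w2 * x2 + b > 0) for x1, x2 in INPUTS)


# "Volume" of a function = number of parameter settings that implement it.
volume = Counter(
    truth_table(w1, w2, b) for w1, w2, b in product(WEIGHT_VALUES, repeat=3)
)

for table, count in volume.most_common():
    print(table, count)

xor = (0, 1, 1, 0)
print("XOR volume:", volume[xor])
```

In this toy setting the constant-zero function takes up the largest share of parameter space, while XOR, which is short to describe, has zero volume because a single threshold unit cannot represent it. That is a small-scale instance of a function being simple in the description-length sense but not large-volume for a given architecture.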

Comment by Daniel Kokotajlo (daniel-kokotajlo) on What are your favorite examples of adults in and around this community publicly changing their minds? · 2021-05-06T08:06:25.045Z · LW · GW

The funnest one off the top of my head is how Yudkowsky used to think that the best thing for altruists to do was build AGI as soon as possible, because that's the quickest way to solve poverty, disease, etc. and achieve a glorious transhuman future. Then he thought more (and talked to Bostrom, I was told) and realized that that's pretty much the exact opposite of what we should be doing. When MIRI was founded its mission was to build AGI as soon as possible.

(Disclaimer: This is the story as I remember it being told, it's entirely possible I'm wrong)

Comment by Daniel Kokotajlo (daniel-kokotajlo) on AMA: Paul Christiano, alignment researcher · 2021-05-05T10:43:18.481Z · LW · GW

My counterfactual attempts to get at the question "Holding ideas constant, how much would we need to increase compute until we'd have enough to build TAI/AGI/etc. in a few years?" This is (I think) what Ajeya is talking about with her timelines framework. Her median is +12 OOMs. I think +12 OOMs is much more than 50% likely to be enough; I think it's more like 80% and that's after having talked to a bunch of skeptics, attempted to account for unknown unknowns, etc. She mentioned to me that 80% seems plausible to her too but that she's trying to adjust downwards to account for biases, unknown unknowns, etc.

Given that, am I right in thinking that your answer is really close to 90%, since failure-to-achieve-TAI/AGI/etc-due-to-being-unable-to-adapt-quickly-to-magically-increased-compute "shouldn't count" for purposes of this thought experiment?

Comment by Daniel Kokotajlo (daniel-kokotajlo) on AMA: Paul Christiano, alignment researcher · 2021-05-05T10:36:58.290Z · LW · GW

Hmm, I don't count "It may work but we'll do something smarter instead" as "it won't work" for my purposes.

I totally agree that noise will start to dominate eventually... but the thing I'm especially interested in with Amp(GPT-7) is not the "7" part but the "Amp" part. Using prompt programming, fine-tuning on its own library, fine-tuning with RL, making chinese-room-bureaucracies, training/evolving those bureaucracies... what do you think about that? Naively the scaling laws would predict that we'd need far less long-horizon data to train them, since they have far fewer parameters, right? Moreover IMO evolved-chinese-room-bureaucracy is a pretty good model for how humans work, and in particular for how humans are able to generalize super well and make long-term plans etc. without many lifetimes of long-horizon training.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on AMA: Paul Christiano, alignment researcher · 2021-05-05T10:33:57.858Z · LW · GW

When you say hardware progress, do you just mean compute getting cheaper or do you include people spending more on compute? So you are saying, you guess that if we had 10 OOMs of compute today that would have a 50% chance of leading to human-level AI without any further software progress, but realistically you expect that what'll happen is we get +5 OOMs from increased spending and cheaper hardware, and then +5 "virtual OOMs" from better software?

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Draft report on existential risk from power-seeking AI · 2021-05-05T09:07:57.594Z · LW · GW

Thanks for the thoughtful reply. Here are my answers to your questions:

Here is what you say in support of your probability judgment of 10% on "Conditional it being both possible and strongly incentivized to build APS systems, APS systems will end up disempowering approximately all of humanity."

Beyond this, though, I’m also unsure about the relative difficulty of creating practically PS-aligned systems, vs. creating systems that would be practically PS-misaligned, if deployed, but which are still superficially attractive to deploy. One commonly cited route to this is via a system actively pretending to be more aligned than it is. This seems possible, and predictable in some cases; but it’s also a fairly specific behavior, limited to systems with a particular pattern of incentives (for example, they need to be sufficiently non-myopic to care about getting deployed, there need to be sufficient benefits to deployment, and so on), and whose deception goes undetected. It’s not clear to me how common to expect this to be, especially given that we’ll likely be on the lookout for it.
More generally, I expect decision-makers to face various incentives (economic/social backlash, regulation, liability, the threat of personal harm, and so forth) that reduce the attraction of deploying systems whose practical PS-alignment remains significantly uncertain. And absent active/successful deception, I expect default forms of testing to reveal many PS-alignment problems ahead of time.
The 35% on this premise being false comes centrally from the fact that (a) I expect us to have seen a good number of warning shots before we reach really high-impact practical PS-alignment failures, so this premise requires that we haven’t responded to those adequately, (b) the time-horizons and capabilities of the relevant practically PS-misaligned systems might be limited in various ways, thereby reducing potential damage, and (c) practical PS-alignment failures on the scale of trillions of dollars (in combination) are major mistakes, which relevant actors will have strong incentives, other things equal, to avoid/prevent (from market pressure, regulation, self-interested and altruistic concern, and so forth).
I’m going to say: 40%. There’s a very big difference between >$1 trillion dollars of damage (~6 Hurricane Katrinas), and the complete disempowerment of humanity; and especially in slower take-off scenarios, I don’t think it at all a foregone conclusion that misaligned power-seeking that causes the former will scale to the latter.

As I read it, your analysis is something like: Probably these systems won't be actively trying to deceive us. Even if they are, we'll probably notice it and stop it since we'll be on the lookout for it. Systems that may not be aligned probably won't be deployed, because people will be afraid of dangers, thanks to warning shots. Even if they are deployed, the damage will probably be limited, since probably even unaligned systems won't be willing and able to completely disempower humanity.

My response is: This just does not seem plausible conditional on it all happening by 2035. I think I'll concede that the issue of whether they'll be trying to deceive us is independent of whether timelines are short or long. However, in short-timelines scenarios there will be fewer (I would argue zero) warning shots, and less time for AI risk to be taken seriously by all the prestigious people. Moreover, takeoff is likely to be fast, with less time for policymakers and whatnot to react and less time for overseers to study and analyze their AIs. I think I'll also concede that timelines are not correlated with willingness to disempower humanity, but they are correlated with ability, due to takeoff speed considerations -- if timelines are short, then when we get crazy AI we'll be able to get crazier AI quickly by scaling up a bit more, and also separately it probably takes less time to "cross the human range." Moreover, if timelines are short then we should expect prestigious people, institutions, etc. to be as collectively incompetent as they are today--consider how COVID was handled and is still being handled. Even if we get warning shots, I don't expect the reactions to them to help much; I expect them to simply patch over problems and maybe delay doom for a bit. AI risk stuff will become a polarized partisan political issue with lots of talking heads yelling at each other and lots of misguided people trying to influence the powers that be to do this or that. In that environment finding the truth will be difficult, and so will finding and implementing the correct AI-risk-reducing policies.

My nuclear winter argument was, at a high level, something like: Your argument for 10% is pretty general, and could be used to argue for <10% risk for a lot of things, e.g. nuclear war. Yet empirically the risk for those things is higher than that.

Your argument as applied to nuclear war would be something like: Probably nations won't build enough nuclear weapons to cause nuclear winter. Even if they do, they wouldn't set up systems with a risk of accident, since there would be warning shots and people would be afraid of the dangers. Even if there is a failure and a nuke is set off, it probably wouldn't lead to nuclear winter since decision-makers would deescalate rather than escalate.

I would say: The probability of nuclear winter this century was higher than 10%, and moreover, nuclear winter is a significantly easier-to-avoid problem than APS-AI risk IMO, because psychologically and culturally it's a lot easier to convince people that nukes are dangerous and that they shouldn't be launched and that there should be lots of redundant safeguards on them than that [insert newest version of incredibly popular and profitable AI system here] is dangerous and shouldn't be deployed or even built in the first place. Moreover it's a lot easier, technically, to put redundant safeguards on nuclear weapons than to solve the alignment problem!

Nuclear winter was just the first thing that came to mind, but my argument would probably be a lot stronger if I chose other examples. The general idea is that on my reading of history, preventing APS-AI risk is just a lot harder, a lot less likely to succeed, than preventing various other kinds of risk, some of which in fact happened or very nearly happened.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on [AN #139]: How the simplicity of reality explains the success of neural nets · 2021-05-05T06:20:12.323Z · LW · GW

OK, thanks!

Comment by Daniel Kokotajlo (daniel-kokotajlo) on [AN #139]: How the simplicity of reality explains the success of neural nets · 2021-05-05T05:02:48.220Z · LW · GW
I agree with Zach above about the main point of the paper. One other thing I’d note is that SGD can’t have literally the same outcomes as random sampling, since random sampling wouldn’t display phenomena like double descent (AN #77).

Would you mind explaining why this is? It seems to me like random sampling would display double descent. For example, as you increase model size, at first you get more and more parameters that let you approximate the data better... but then you get too many parameters and just start memorizing the data... but then when you get even more parameters, you have enough functions available that simpler ones win out... Doesn't this story work just as well for random sampling as it does for SGD?

Comment by Daniel Kokotajlo (daniel-kokotajlo) on NTK/GP Models of Neural Nets Can't Learn Features · 2021-05-05T04:38:29.918Z · LW · GW
I'll confess that I would personally find it kind of disappointing if neural nets were mostly just an efficient way to implement some fixed kernels, when it seems possible that they could be doing something much more interesting -- perhaps even implementing something like a simplicity prior over a large class of functions, which I'm pretty sure NTK/GP can't be

Wait, why can't NTK/GP be implementing a simplicity prior over a large class of functions? They totally are, it's just that the prior comes from the measure in random initialization space, rather than from the gradient update process. As explained here. Right? No?

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Parsing Abram on Gradations of Inner Alignment Obstacles · 2021-05-05T04:27:01.324Z · LW · GW

Well, it seems to be saying that the training process basically just throws away all the tickets that score less than perfectly, and randomly selects one of the rest. This means that tickets which are deceptive agents and whatnot are in there from the beginning, and if they score well, then they have as much chance of being selected at the end as anything else that scores well. And since we should expect deceptive agents that score well to outnumber aligned agents that score well... we should expect deception.

I'm working on a much more fleshed out and expanded version of this argument right now.
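
The selection argument above can be sketched as a two-step filter. The ticket counts are hypothetical, chosen only to encode the assumption that deceptive high scorers outnumber aligned ones; the code does not model any real training process.

```python
import random

# Hypothetical counts, chosen only to illustrate the argument.
tickets = (
    [("aligned", 1.0)] * 5         # aligned tickets that score perfectly
    + [("deceptive", 1.0)] * 45    # deceptive tickets that also score perfectly
    + [("other", 0.6)] * 950       # everything that scores worse
)

# Step 1: throw away every ticket that scores less than perfectly.
survivors = [kind for kind, score in tickets if score == 1.0]

# Step 2: the final model is a uniformly random survivor.
p_deceptive = survivors.count("deceptive") / len(survivors)

# Cross-check the exact ratio with a seeded simulation.
rng = random.Random(0)
samples = [rng.choice(survivors) for _ in range(10_000)]
empirical = samples.count("deceptive") / len(samples)
print(f"exact: {p_deceptive:.2f}, simulated: {empirical:.2f}")
```

Under this model the chance of ending up with a deceptive ticket is just its share among the perfect scorers, so the conclusion rests entirely on the assumed ratio of deceptive to aligned tickets.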

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Why Neural Networks Generalise, and Why They Are (Kind of) Bayesian · 2021-05-04T21:18:20.659Z · LW · GW

Pinging you to see what your current thoughts are! I think that if "SGD is basically equivalent to random search" then that has huge, huge implications.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Parsing Abram on Gradations of Inner Alignment Obstacles · 2021-05-04T19:57:17.138Z · LW · GW

I think Abram's concern about the lottery ticket hypothesis wasn't about the "vanilla" LTH that you discuss, but rather the scarier "tangent space hypothesis." See this comment thread.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Why I Work on Ads · 2021-05-04T19:17:16.112Z · LW · GW

I think universal paywalls would be much better. Consider how video games typically work: You pay for the game, then you can play it as much as you like. Video games sometimes try to sell you things (e.g. political ideologies, products) but there is vastly less of that than on e.g. YouTube or Facebook, what with all the ads, propaganda, promoted content, etc. Imagine if instead all video games were free, but to make money the video game companies accepted bribes to fill their games with product placement and propaganda. I would not prefer that world, even though it would be less regressive in that poor people could play just as many games as rich people.

And that's not even including the benefits of privacy / not-Big-Data. In a different world than ours, Big Data would be used mostly for scientific research that benefits everyone. Not in our world. In our world it's mostly used to control populations, for surveillance and propaganda, and to sell stuff to people. (I agree that the "sell stuff to people" thing is partially good, but it's partially bad too, and it certainly isn't good enough to outweigh the surveillance and propaganda effects IMO).

If the internet used a universal paywalls model, it would be a lot easier for people to be private, I think. I'm not sure.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on The AI Timelines Scam · 2021-05-04T08:33:43.235Z · LW · GW

Is it really true that most people sympathetic to short timelines hold that view mainly because of a social proof cascade? I don't know any such person myself; the short-timelines people I know are either people who have thought about it a ton and developed detailed models, or people who just got super excited about GPT-3 and recent AI progress basically. The people who like to defer to others pretty much all have medium or long timelines, in my opinion, because that's the respectable/normal thing to think.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Open and Welcome Thread - May 2021 · 2021-05-03T20:02:41.347Z · LW · GW

Welcome! I recognize your username, we must have crossed paths before. Maybe something to do with SpaceX?

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Open and Welcome Thread - May 2021 · 2021-05-03T20:00:57.207Z · LW · GW

My guess is: Regulation. It would be illegal to build and rent out nano-apartments. (Evidence: In many places in the USA, it's illegal for more than X people not from the same family to live together, for X = 4 or something ridiculously small like that.)

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Open and Welcome Thread - May 2021 · 2021-05-03T19:58:59.305Z · LW · GW

Welcome! It's people like you (and perhaps literally you) on which the future of the world depends. :)

Wait... you started using the internet in 2006? Like, when you were 5???

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Naturalism and AI alignment · 2021-05-03T11:20:03.849Z · LW · GW

I'd be interested to see naturalism spelled out more and defended against the alternative view that (I think) prevails in this community. That alternative view is something like: "Look, different agents have different goals/values. I have mine and will pursue mine, and you have yours and pursue yours. Also, there are rules and norms that we come up with to help each other get along, analogous to laws and rules of etiquette. Also, there are game-theoretic principles like fairness, retribution, and bullying-resistance that are basically just good general strategies for agents in multi-agent worlds. Finally, there may be golden rules written in fire etched into the fabric of reality, or divine commands about what everyone should do, but there probably aren't and if there were they wouldn't matter. What we call 'morality' is an undefined, underdetermined, probably-equivocal-probably-ambiguous label for some combination of these things; probably different people mean different things by morality. Anyhow, this is why we talk about 'the alignment problem' rather than the 'making AIs moral problem,' because we can avoid all this confusion about what morality means and just talk about what really matters, which is making AI have the same goals/values as us."

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Naturalism and AI alignment · 2021-05-03T11:12:04.714Z · LW · GW
From another point of view: some philosophers are convinced that caring about conscious experiences is the rational thing to do. If it's possible to write an algorithm that works in a similar way to how their mind works, we already have an (imperfect, biased, etc.) agent that is somewhat aligned, and is likely to stay aligned after further reflection.

I think this is an interesting point -- but I don't conclude optimism from it as you do. Humans engage in explicit reasoning about what they should do, and they theorize and systematize, and some of them really enjoy doing this and become philosophers so they can do it a lot, and some of them conclude things like "The thing to do is maximize total happiness" or "You can do whatever you want, subject to the constraint that you obey the categorical imperative" or as you say "everyone should care about conscious experiences."

The problem is that every single one of those theories developed so far has either been (1) catastrophically wrong, (2) too vague, or (3) relative to the speaker's intuitions somehow (e.g. intuitionism).

By "catastrophically wrong" I mean that if an AI with control of the whole world actually followed through on the theory, it would kill everyone or do something similarly bad. (Classical utilitarianism is the classic example of this.)

Basically... I think you are totally right that some of our early AI systems will do philosophy and come to all sorts of interesting conclusions, but I don't expect them to be the correct conclusions. (My metaethical views may be lurking in the background here, driving my intuitions about this... see Eliezer's comment)

Do you have an account of how philosophical reasoning in general, or about morality in particular, is truth-tracking? Can we ensure that the AIs we build reason in a truth-tracking way? If truth isn't the right concept for thinking about morality, and instead we need to think about e.g. "human values" or "my values," then this is basically a version of the alignment problem.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Your Dog is Even Smarter Than You Think · 2021-05-01T19:03:19.398Z · LW · GW

Thanks for this! This definitely does intersect with my interests; it's relevant to artificial intelligence and to ethics. It does mostly just confirm what I already thought though, so my reaction is mostly just to pay attention to this sort of thing going forward.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on AMA: Paul Christiano, alignment researcher · 2021-05-01T07:50:42.045Z · LW · GW

I'm very glad to hear that! Can you say more about why?

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Daniel Kokotajlo's Shortform · 2021-04-30T10:56:42.024Z · LW · GW

Probably, when we reach an AI-induced point of no return, AI systems will still be "brittle" and "narrow" in the sense used in arguments against short timelines.

Argument: Consider AI Impacts' excellent point that "human-level" is superhuman (bottom of this page)

The point of no return, if caused by AI, could come in a variety of ways that don't involve human-level AI in this sense. See this post for more. The general idea is that being superhuman at some skills can compensate for being subhuman at others. We should expect the point of no return to be reached at a time when even the most powerful AIs have weak points, brittleness, narrowness, etc. -- that is, even they have various things that they totally fail at, compared to humans. (Note that the situation is symmetric; humans totally fail at various things compared to AIs even today)

I was inspired to make this argument by reading this blast from the past which argued that the singularity can't be near because AI is still brittle/narrow. I expect arguments like this to continue being made up until (and beyond) the point of no return, because even if future AI systems are significantly less brittle/narrow than today's, they will still be bad at various things (relative to humans), and so skeptics will still have materials with which to make arguments like this.


Comment by Daniel Kokotajlo (daniel-kokotajlo) on Draft report on existential risk from power-seeking AI · 2021-04-30T10:15:42.244Z · LW · GW

Thanks for this! I like your concept of APS systems; I think I might use that going forward. I think this document works as a good "conservative" (i.e. optimistic) case for worrying about AI risk. As you might expect, I think the real chances of disaster are higher. For more on why I think this, well, there are the sequences of posts I wrote and of course I'd love to chat with you anytime and run some additional arguments by you.

For now I'll just say: 5% total APS risk (seems to me to) fail a sanity check, as follows:

1. There's at least an X% chance of APS systems being made by 2035. (I think X = 60 and I think it's unreasonable to have X<30 (and I'm happy to say more about why) but you'll probably agree X is at least 10, right?)

2. Conditional on that happening, it seems like the probability of existential catastrophe is quite high, like 50% or so. (Conditional on APS happening that soon, takeoff is likely to be relatively fast, and there won't have been much time to do alignment research, and more generally the optimistic slow takeoff picture in which we get lots of nice scary warning shots and society has lots of time to react will just not be true)

3. Therefore the probability of doom-by-APS-by-2035 is at least 0.5X, so at least 5%.

4. Therefore the probability of doom-by-APS-by-2070 must be significantly higher than 5%.
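The arithmetic behind steps 1-3 can be sketched in a few lines of Python. This is only an illustration of the lower bound using the numbers given above; the function name `doom_lower_bound` and the particular values of X tried are my own choices, not anything from the report.

```python
def doom_lower_bound(p_aps_by_2035, p_catastrophe_given_aps=0.5):
    """Lower bound on P(doom via APS by 2035).

    Multiplies P(APS by 2035) by P(existential catastrophe | APS by 2035).
    The 0.5 default is the rough conditional probability from step 2.
    """
    return p_aps_by_2035 * p_catastrophe_given_aps


# X = 10% is the floor from step 1; 30% and 60% are the other values mentioned there.
for x in (0.10, 0.30, 0.60):
    print(f"X = {x:.0%}: doom-by-APS-by-2035 >= {doom_lower_bound(x):.0%}")
```

Even the most conservative X already reaches the report's 5% total, and that is before counting any risk between 2035 and 2070.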

Also: It seems that most of your optimism comes from assigning only 40%*65%*40% ~= 10% chance to the combined claim "Conditional on it being both possible and strongly incentivized to build APS systems, APS systems will end up disempowering approximately all of humanity." To me this sounds like you basically have 90% credence that the alignment problem will be solved and implemented successfully in time, in worlds where the problem is real (i.e. APS systems are possible and incentivized). It's hard for me to be that confident, considering how generally shitty the world is at solving problems even when they are obvious and simple and killing people every day and the solution is known, and considering that this problem is disputed and complex, won't be killing people until it is already too late or almost too late, and has no known solution. Perhaps a related argument would be: Couldn't you run your same arguments to conclude that the probability of nuclear war in the past 100 years was about 10%? And don't we have good reason to think that in fact the probability was higher than that and we just got lucky? (See: the history of close calls, plus independently the anthropic shadow stuff)
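For reference, here is the product I am pointing at, as a quick Python check. The variable names are mine; the three factors are the 40%, 65%, and 40% figures quoted above.

```python
# The three conditional probabilities quoted above.
factors = [0.40, 0.65, 0.40]

p_disempowerment = 1.0
for p in factors:
    p_disempowerment *= p

# Complement: the implied credence that things go fine in worlds where the problem is real.
p_no_disempowerment = 1.0 - p_disempowerment

print(f"P(disempowerment | possible and incentivized) = {p_disempowerment:.3f}")
print(f"Implied P(no disempowerment)                  = {p_no_disempowerment:.3f}")
```

The product comes to a little over 10%, which is where the "90% credence" reading comes from.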

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Predictive Coding has been Unified with Backpropagation · 2021-04-30T09:00:35.958Z · LW · GW

Thanks for this reply!

--I thought the paper about the methods of neuroscience applied to computers was cute, and valuable, but I don't think it's fair to conclude "methods are not up to the task." But you later said that "It makes a lot of sense to me that the brain does something resembling belief propagation on bayes nets. (I take this to be the core idea of predictive coding.)" so you aren't a radical skeptic about what we can know about the brain so maybe we don't disagree after all.

1 - 3: OK, I think I'll defer to your expertise on these points.

4, 5: Whoa whoa, just because we humans do some non-bayesian stuff and some better-than-backprop stuff doesn't mean that the brain isn't running pure bayes nets or backprop-approximation or whatever at the low level! That extra fancy cool stuff we do could be happening at a higher level of abstraction. Networks in the brain learned via backprop-approximation could themselves be doing the logical induction stuff and the super-efficient-learning stuff. In which case we should expect that big NN's trained via backprop might also stumble across similar networks which would then do similarly cool stuff.

I think my main crux is the question: (for some appropriate architecture, ie, not necessarily transformers) do human-brain-sized networks, with human-like opportunities for transfer learning, achieve human-level data-efficiency?

Indeed. Your crux is my question and my crux is your question. (My crux was: Does the brain, at the low level, use something more or less equivalent to the stuff modern NN's do at a low level? From this I hoped to decide whether human-brain-sized networks could have human-level efficiency)

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Gradations of Inner Alignment Obstacles · 2021-04-29T13:02:25.725Z · LW · GW
Part of my idea for this post was to go over different versions of the lottery ticket hypothesis, as well, and examine which ones imply something like this. However, this post is long enough as it is.

I'd love to see you do this!

Re: The Treacherous Turn argument: What do you think of the following spitball objections:

(a) Maybe the deceptive ticket that makes T' work is indeed there from the beginning, but maybe it's outnumbered by 'benign' tickets, so that the overall behavior of the network is benign. This is an argument against premise 4, the idea being that even though the deceptive ticket scores just as well as the rest, it still loses out because it is outnumbered.

(b) Maybe the deceptive ticket that makes T' work is not deceptive from the beginning, but rather is made so by the training process T'. If instead you just give it T, it does not exhibit malign off-T behavior. (Analogy: Maybe I can take you and brainwash you so that you flip out and murder people when a certain codeword reaches your ear, and moreover otherwise act completely normally so that you'd react exactly the same way to everything in your life so far as you in fact have. If so, then the "ticket" that makes this possible is already present inside you, even now as you read these words! But the 'ticket' is just you. And you won't actually flip out and murder people if the codeword reaches your ear, because you haven't in fact been brainwashed.)

Comment by Daniel Kokotajlo (daniel-kokotajlo) on AMA: Paul Christiano, alignment researcher · 2021-04-29T08:51:54.063Z · LW · GW

In this post I argued that an AI-induced point of no return would probably happen before world GDP starts to noticeably accelerate. You gave me some good pushback about the historical precedent I cited, but what is your overall view? If you can spare the time, what is your credence in each of the following PONR-before-GDP-acceleration scenarios, and why?

1. Fast takeoff

2. The sorts of skills needed to succeed in politics or war are easier to develop in AI than the sorts needed to accelerate the entire world economy, and/or have less deployment lag. (Maybe it takes years to build the relevant products and industries to accelerate the economy, but only months to wage a successful propaganda campaign to get people to stop listening to the AI safety community)

3. We get an "expensive AI takeoff" in which AI capabilities improve enough to cross some threshold of dangerousness, but this improvement happens in a very compute-intensive way that makes it uneconomical to automate a significant part of the economy until the threshold has been crossed.

4. Vulnerable world: Thanks to AI and other advances, a large number of human actors get the ability to make WMDs.

5. Persuasion/propaganda tools get good enough and are widely used enough that they significantly degrade the collective epistemology of the relevant actors (corps, governments, maybe even our community). (I know you've said at various times that probably AI-designed persuasive content will be banned or guarded against by other AIs, but what if this doesn't happen? We don't currently do much to protect ourselves from ordinary propaganda or algorithmically-selected content...)

6. Tech hoarding (The leading project(s) don't deploy their AI to improve the world economy, but nevertheless stay in the lead, perhaps due to massive investment, or perhaps due to weak or stifled competition)

Comment by Daniel Kokotajlo (daniel-kokotajlo) on AMA: Paul Christiano, alignment researcher · 2021-04-29T08:35:26.863Z · LW · GW

1. What credence would you assign to "+12 OOMs of compute would be enough for us to achieve AGI / TAI / AI-induced Point of No Return within five years or so"? (This is basically the same as, though not identical to, this poll question)

2. Can you say a bit about where your number comes from? E.g. maybe a 25% chance of scaling laws not continuing such that OmegaStar, Amp(GPT-7), etc. don't work, and a 25% chance that they happen but don't count as AGI / TAI / AI-PONR, for a total of about 60%? The more you say the better; this is my biggest crux! Thanks!
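To make the kind of breakdown I'm asking for concrete, here is a small Python sketch of how the two 25% figures in the example could combine. Which combination rule applies is an assumption on my part: the first treats the second 25% as conditional on scaling laws continuing, the second just subtracts both from 100%. The variable names are hypothetical.

```python
p_scaling_fails = 0.25       # scaling laws don't continue, so OmegaStar etc. don't work
p_works_but_not_agi = 0.25   # they happen but don't count as AGI / TAI / AI-PONR

# Reading 1 (assumption): the second failure mode is conditional on the first not occurring.
credence_sequential = (1 - p_scaling_fails) * (1 - p_works_but_not_agi)

# Reading 2 (assumption): the two failure modes are disjoint slices of the whole.
credence_additive = 1 - p_scaling_fails - p_works_but_not_agi

print(f"sequential reading: {credence_sequential:.4f}")
print(f"additive reading:   {credence_additive:.4f}")
```

Either way the example lands in the 50-60% range; an answer in this format, with your own numbers and failure modes, would be ideal.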

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Coherence arguments imply a force for goal-directed behavior · 2021-04-28T08:47:52.550Z · LW · GW

I love your health points analogy. Extending it, imagine that someone came up with "coherence arguments" that showed that for a rational doctor doing triage on patients, and/or for a group deciding who should do a risky thing that might result in damage, the optimal strategy involves a construct called "health points" such that:

--Each person at any given time has some number of health points

--Whenever someone reaches 0 health points, they (very probably) die

--Similar afflictions/disasters tend to cause similar amounts of decrease in health points, e.g. a bullet in the thigh causes me to lose 5 hp and you to lose 5 hp and Katja to lose 5hp.

Wouldn't these coherence arguments be pretty awesome? Wouldn't this be a massive step forward in our understanding (both theoretical and practical) of health, damage, triage, and risk allocation?

This is so despite the fact that someone could come along and say "Well these coherence arguments assume a concept (our intuitive concept) of 'damage,' they don't tell us what 'damage' means. (Ditto for concepts like 'die' and 'person' and 'similar')" That would be true, and it would still be a good idea to do further deconfusion research along those lines, but it wouldn't detract much from the epistemic victory the coherence arguments won.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on GPT-3: a disappointing paper · 2021-04-27T06:10:50.995Z · LW · GW

Ah! You are right, I misread the graph. *embarrassed* Thanks for the correction!

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Does the lottery ticket hypothesis suggest the scaling hypothesis? · 2021-04-25T13:10:46.264Z · LW · GW

OH this indeed changes everything (about what I had been thinking) thank you! I shall have to puzzle over these ideas some more then, and probably read the multi-prize paper more closely (I only skimmed it earlier)

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Daniel Kokotajlo's Shortform · 2021-04-25T13:07:22.744Z · LW · GW

OH ok thanks! Glad to hear that. I'll edit.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on The Fall of Rome, III: Progress Did Not Exist · 2021-04-25T10:27:31.754Z · LW · GW

There's another explanation for why the history books display that progression you mapped out: They are Dutch history books, so naturally they want to focus on the bits of history that are especially relevant to the Dutch. One should expect that the "center of action" of these books drifts towards the Netherlands over time, just as it drifts towards the USA over time in the USA, and (I would predict) towards Indonesia over time in Indonesia, towards Japan over time in Japan, etc.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Daniel Kokotajlo's Shortform · 2021-04-25T10:15:09.904Z · LW · GW

The International Energy Agency releases regular reports in which it forecasts the growth of various energy technologies for the next few decades. It's been astoundingly terrible at forecasting solar energy for some reason. Marvel at this chart:

This is from an article criticizing the IEA's terrible track record of predictions. The article goes on to say that there should be about 500 GW of installed capacity by 2020. This article was published in 2020; a year later, the 2020 data is in, and it's actually 714 GW. Even the article criticizing the IEA for their terrible track record managed to underestimate solar's rise one year out!
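To put a number on how far off even the critics were, here is a quick Python calculation using only the two figures above (about 500 GW projected, 714 GW actual). The variable names are mine.

```python
forecast_gw = 500  # the article's projection for installed capacity by 2020
actual_gw = 714    # the figure once the 2020 data was in

shortfall_gw = actual_gw - forecast_gw

# Two ways to express the miss: relative to what happened, and relative to the forecast.
shortfall_vs_actual = shortfall_gw / actual_gw
shortfall_vs_forecast = shortfall_gw / forecast_gw

print(f"missed by {shortfall_gw} GW")
print(f"forecast was {shortfall_vs_actual:.1%} below the actual figure")
print(f"actual was {shortfall_vs_forecast:.1%} above the forecast")
```

That is a miss of over 200 GW on a forecast made in the same year it was forecasting.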

Anyhow, probably there were other people who successfully predicted it. But not these people. (I'd be interested to hear more about this--was the IEA representative of mainstream opinion? Or was it being laughed at even at the time? EDIT: Zac comments to say that yeah, plausibly they were being laughed at even then, and certainly now. Whew.)

Meanwhile, here was Carl Shulman in 2012: 

The continuation of the solar cell and battery cost curves are pretty darn impressive. Costs halving about once a decade, for several decades, is pretty darn impressive. One more decade until solar is cheaper than coal is today, and then it gets cheaper (vast areas of equatorial desert could produce thousands of times current electricity production and export in the form of computation, the products of electricity-intensive manufacturing, high-voltage lines, electrolysis to make hydrogen and hydrocarbons, etc). These trends may end before that, but the outside view looks good.

There have also been continued incremental improvements in robotics and machine learning that are worth mentioning, and look like they can continue for a while longer. Vision, voice recognition, language translation, and the like have been doing well. 

"One more decade until solar is cheaper than coal is today..."

Anyhow, all of this makes me giggle, so I thought I'd share it. When money is abundant, knowledge is the real wealth. In other words, many important kinds of knowledge are not for sale. If you were a rich person who didn't have generalist research skills and didn't know anything about solar energy, and relied on paying other people to give you knowledge, you would have listened to the International Energy Agency's official forecasts rather than Carl Shulman or people like him, because you wouldn't know how to distinguish Carl from the various other smart opinionated uncredentialed people all saying different things.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Does the lottery ticket hypothesis suggest the scaling hypothesis? · 2021-04-24T06:20:52.344Z · LW · GW

Whoa, the thing you are arguing against is not at all what I had been saying -- but maybe it was implied by what I was saying and I just didn't realize it? I totally agree that there are many optima, not just one. Maybe we are talking past each other?

(Part of why I think the two tickets are the same is that the at-initialization ticket is found by taking the after-training ticket and rewinding it to the beginning! So for them not to be the same, the training process would need to kill the first ticket and then build a new ticket on exactly the same spot!)
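Here is a toy Python sketch of the rewinding procedure I have in mind, to show why the two tickets occupy exactly the same spot by construction. Everything in it is a simplified stand-in and an assumption on my part: `toy_train` is a hypothetical replacement for real training (it just pulls weights toward a fixed sparse target), and real lottery ticket experiments prune actual networks, usually iteratively.

```python
import random


def toy_train(weights, steps=200, lr=0.1):
    """Stand-in for real training: gradient descent on sum((w - target)^2)."""
    target = [2.0, 0.0, -3.0, 0.0, 0.0, 1.5]  # hypothetical sparse solution
    w = list(weights)
    for _ in range(steps):
        w = [wi - lr * 2.0 * (wi - ti) for wi, ti in zip(w, target)]
    return w


def magnitude_mask(weights, keep):
    """Return a 0/1 mask keeping the `keep` largest-magnitude weights."""
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    kept = set(ranked[:keep])
    return [1 if i in kept else 0 for i in range(len(weights))]


random.seed(0)
init_weights = [random.uniform(-1.0, 1.0) for _ in range(6)]
trained_weights = toy_train(init_weights)

# 1. Find the after-training ticket by pruning the trained network.
mask = magnitude_mask(trained_weights, keep=3)
ticket_after_training = [w * m for w, m in zip(trained_weights, mask)]

# 2. "Rewind": keep the same mask, but reset the surviving weights to their initial values.
ticket_at_init = [w * m for w, m in zip(init_weights, mask)]

print("mask:                 ", mask)
print("ticket at init:       ", ticket_at_init)
print("ticket after training:", ticket_after_training)
```

The mask is computed once, from the trained weights, and then reused at initialization. So the at-initialization ticket and the after-training ticket share the same positions; only the weight values in those positions differ.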

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Does the lottery ticket hypothesis suggest the scaling hypothesis? · 2021-04-23T21:01:47.840Z · LW · GW

Hmmm, ok. Can you say more about why? Isn't the simplest explanation that the two tickets are the same?

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Three reasons to expect long AI timelines · 2021-04-23T08:42:46.195Z · LW · GW

I definitely agree that our timelines forecasts should take into account the three phenomena you mention, and I also agree that e.g. Ajeya's doesn't talk about this much. I disagree that the effect size of these phenomena is enough to get us to 50 years rather than, say, adding 5 years to whatever our timelines were without these phenomena. I also disagree that overall Ajeya's model is an underestimate of timelines, because while indeed the phenomena you mention should cause us to shade timelines upward, there is a long list of other phenomena I could mention which should cause us to shade timelines downward, and it's unclear which list is overall more powerful.

On a separate note, would you be interested in a call sometime to discuss timelines? I'd love to share my overall argument with you and hear your thoughts, and I'd love to hear your overall timelines model if you have one.

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Three reasons to expect long AI timelines · 2021-04-23T06:19:10.372Z · LW · GW

Thanks for this post! I'll write a fuller response later, but for now I'll say: These arguments prove too much; you could apply them to pretty much any technology (e.g. self-driving cars, 3D printing, reusable rockets, smart phones, VR headsets...). There doesn't seem to be any justification for the 50-year number; it's not like you'd give the same number for those other techs, and you could have made exactly this argument about AI 40 years ago, which would lead to 10-year timelines now. You are just pointing out three reasons in favor of longer timelines and then concluding

it's a bit difficult to see how we will get transformative AI developments in the next 50 years. Even accepting some of the more optimistic assumptions in e.g. Ajeya Cotra's Draft report on AI timelines, it still seems to me that these effects will add a few decades to our timelines before things get really interesting.

Which seems unwarranted to me. I agree that the things you say push in the direction of longer timelines, but there are other arguments one could make that push in the direction of shorter timelines, and it's not like your arguments are so solid that we can just conclude directly from them that timelines are long--and specifically 50+ years long!

Comment by Daniel Kokotajlo (daniel-kokotajlo) on Does the lottery ticket hypothesis suggest the scaling hypothesis? · 2021-04-22T19:11:09.613Z · LW · GW

Yeah, fair enough. I should amend the title of the question. Re: reinforcing the winning tickets: Isn't that implied? If it's not implied, would you not agree that it is happening? Plausibly, if there is a ticket at the beginning that does well at the task, and a ticket at the end that does well at the task, it's reasonable to think that it's the same ticket? Idk, I'm open to alternative suggestions now that you mention it...