Book review: Architects of Intelligence by Martin Ford (2018) 2020-08-11T17:30:21.247Z · score: 15 (7 votes)
The recent NeurIPS call for papers requires authors to include a statement about the potential broader impact of their work 2020-02-24T07:44:20.850Z · score: 12 (5 votes)
ofer's Shortform 2019-11-26T14:59:40.664Z · score: 4 (1 votes)
A probabilistic off-switch that the agent is indifferent to 2018-09-25T13:13:16.526Z · score: 11 (5 votes)
Looking for AI Safety Experts to Provide High Level Guidance for RAISE 2018-05-06T02:06:51.626Z · score: 43 (14 votes)
A Safer Oracle Setup? 2018-02-09T12:16:12.063Z · score: 12 (4 votes)


Comment by ofer on [AN #121]: Forecasting transformative AI timelines using biological anchors · 2020-10-24T06:22:52.616Z · score: 1 (1 votes) · LW · GW

We might get TAI due to efforts by, say, an algo-trading company that develops trading AI systems. The company can limit the mundane downside risks that it faces from non-robust behaviors of its AI systems (e.g. by limiting the fraction of its fund that the AI systems control). Of course, the actual downside risk to the company includes outcomes like existential catastrophes, but it's not clear to me why we should expect that prior to such extreme outcomes their AI systems would behave in ways that are detrimental to economic value.

Comment by ofer on [AN #121]: Forecasting transformative AI timelines using biological anchors · 2020-10-23T07:13:37.354Z · score: 1 (1 votes) · LW · GW

you need to ensure that your model is aligned, robust, and reliable (at least if you want to deploy it and get economic value from it).

I think it suffices for the model to be inner aligned (or deceptively inner aligned) for it to have economic value, at least in domains where (1) there is a usable training signal that corresponds to economic value (e.g. users' time spent in social media platforms, net income in algo-trading companies, or even the stock price in any public company); and (2) the downside economic risk from a non-robust behavior is limited (e.g. an algo-trading company does not need its model to be robust/reliable, assuming the downside risk from each trade is limited by design).

Comment by ofer on The Solomonoff Prior is Malign · 2020-10-14T11:08:15.365Z · score: 5 (4 votes) · LW · GW

If arguments about acausal trade and value handshakes hold, then the resulting utility function might contain some fraction of human values.

I think Paul's Hail Mary via Solomonoff prior idea is not obviously related to acausal trade. (It does not privilege agents that engage in acausal trade over ones that don't.)

Comment by ofer on A prior for technological discontinuities · 2020-10-14T10:40:46.389Z · score: 1 (1 votes) · LW · GW

Seems like an almost central example of continuous progress if you're evaluating by typical language model metrics like perplexity.

I think we should determine whether GPT-3 is an example of continuous progress in perplexity based on the extent to which it lowered the SOTA perplexity (on huge internet-text corpora), and its wall clock training time. I don't see why the correctness of a certain scaling law or the researchers' beliefs/motivation should affect this determination.

Comment by ofer on If GPT-6 is human-level AGI but costs $200 per page of output, what would happen? · 2020-10-10T17:33:34.879Z · score: 3 (2 votes) · LW · GW

Yeah, human-level is supposed to mean not strongly superhuman at anything important, while also not being strongly subhuman in anything important.

I think that's roughly the concept Nick Bostrom used in Superintelligence when discussing takeoff dynamics. (The usage of that concept is my only major disagreement with that book.) IMO it would be very surprising if the first ML system that is not strongly subhuman at anything important would not be strongly superhuman at anything important (assuming this property is not optimized for).

Comment by ofer on If GPT-6 is human-level AGI but costs $200 per page of output, what would happen? · 2020-10-10T17:23:53.614Z · score: 3 (2 votes) · LW · GW

GPT-2 was benchmarked at 43 perplexity on the 1 Billion Word (1BW) benchmark vs a (highly extrapolated) human perplexity of 12

I wouldn't say that that paper shows a (highly extrapolated) human perplexity of 12. It compares human-written sentences to language model generated sentences on the degree to which they seem "clearly human" vs "clearly unhuman" as judged by humans. Amusingly, for every 8 human-written sentences that were judged as "clearly human", one human-written sentence was judged as "clearly unhuman". And that 8:1 ratio is the thing from which human perplexity is being derived. This doesn't make sense to me.

If the human annotators in this paper had never annotated human-written sentences as "clearly unhuman", this extrapolation would have shown human perplexity of 1! (As if humans can magically predict an entire page of text sampled from the internet.)

The LAMBADA dataset was also constructed using humans to predict the missing words, but GPT-3 falls far short of perfection there, so while I can't numerically answer it (unless you trust OA's reasoning there), it is still very clear that GPT-3 does not match or surpass humans at text prediction.

If the comparison here is on the final LAMBADA dataset, after examples were filtered out based on disagreement between humans (as you mentioned in the newsletter), then it's an unfair comparison. The examples are selected for being easy for humans.

BTW, I think the comparison to humans on the LAMBADA dataset is indeed interesting in the context of AI safety (more so than "predict the next word in a random internet text"); because I don't expect the perplexity/accuracy to depend much on the ability to model very low-level stuff (e.g. "that's" vs "that is").

Comment by ofer on If GPT-6 is human-level AGI but costs $200 per page of output, what would happen? · 2020-10-09T18:35:04.912Z · score: 3 (2 votes) · LW · GW

Nevertheless, it works. That's how self-supervised training/pretraining works.

Right, I'm just saying that I don't see how to map that metric to things we care about in the context of AI safety. If a language model outperforms humans at predicting the next word, maybe it's just due to it being sufficiently superior at modeling low-level stuff (e.g. GPT-3 may be better than me at predicting you'll write "That's" rather than "That is".)

(As an aside, in the linked footnote I couldn't easily spot any paper that actually evaluated humans on predicting the next word.)

Comment by ofer on If GPT-6 is human-level AGI but costs $200 per page of output, what would happen? · 2020-10-09T15:50:45.607Z · score: 7 (5 votes) · LW · GW

Some quick thoughts/comments:

--It can predict random internet text better than the best humans

I wouldn't use this metric. I don't see how to map between it and anything we care about. If it's defined in terms of accuracy when predicting the next word, I won't be surprised if existing language models already outperform humans.

Also, I find the term "human-level AGI" confusing. Does it exclude systems that are super-human on some dimensions? If so, it seems too narrow to be useful. For the purpose of this post, I propose using the following definition: A system that can generate text in a way that allows any task humans can perform by writing text to be performed automatically.

Comment by ofer on If GPT-6 is human-level AGI but costs $200 per page of output, what would happen? · 2020-10-09T15:50:03.079Z · score: 5 (4 votes) · LW · GW

But I'm pretty sure stuff would go crazy even before then. How?

We can end up with an intelligence explosion via automated ML research. One of the tasks that could be automated by the language model is "brainstorming novel ML ideas". So you'll be able to pay $200 and get a text, that could have been written by a brilliant ML researcher, containing novel ideas that allow you to create a more efficient/capable language model. (Though I expect that this specific approach won't be competitive with fully automated approaches that do stuff like NAS.)

Comment by ofer on AI arms race · 2020-10-07T11:54:06.128Z · score: 1 (1 votes) · LW · GW

This model assumes that each AI lab chooses some level of safety precautions, and then acts accordingly until AGI is created. But the degree to which an AI lab invests in safety may change radically with time. Importantly, it may increase by a lot if the leadership of the AI lab comes to believe that their current or near-term work poses existential risk.

This seems like a reason to be more skeptical about the counter-intuitive conclusion that the information available to all the teams about their own capability or progress towards AI increases the risk. (Not to be confused with the other counter-intuitive conclusion from the paper that the information available about other teams increases the risk).

Comment by ofer on “Unsupervised” translation as an (intent) alignment problem · 2020-09-30T20:11:25.709Z · score: 1 (1 votes) · LW · GW

If the model is smart, this is only going to work if the (correct) translation is reasonably likely to appear in your English text database. You are (at best) going to get a prediction of what human researchers would conclude after studying Klingon, your model isn't actually going to expand what humans can do.

Agreed. Perhaps it's possible to iteratively train GPT models in an Amplification-like setup, where in each iteration we add to the English training corpus some newly possible translations; aiming to end up with something like an HCH translator. (We may not need to train a language model from scratch in each iteration; at the extreme, we just do fine-tuning on the new translations.)

Comment by ofer on “Unsupervised” translation as an (intent) alignment problem · 2020-09-30T16:50:48.357Z · score: 3 (2 votes) · LW · GW

Some tentative thoughts:

Re Debate:

Making things worse, to interpret each usage they’d need to agree about the meaning of the rest of the phrase, which isn’t necessarily any simpler than the original disagreement about “qapla.”

Consider a Debate experiment in which each of the two players outputs an entire English-Klingon dictionary (as avturchin mentioned). The judge then samples a random Klingon passage and decides which of the two dictionaries is more helpful for understanding that passage (maybe while allowing the two players to debate over which dictionary is more helpful).

Also, one might try to use GPT to complete prompts such as:

The researchers analyzed the Klingon phrase "מהדקי נייר" and concluded it roughly means 

In both of these approaches we still need to deal with the potential problem of catastrophic inner alignment failures occurring before the point where we have sufficiently useful helper models. [EDIT: and in the Debate-based approach there's also an outer alignment problem: a player may try to manipulate the judge into choosing them as the winner.]

Comment by ofer on ofer's Shortform · 2020-09-28T14:17:47.675Z · score: 1 (1 votes) · LW · GW

[researcher positions at FHI] 

(I'm not affiliated with FHI.)

FHI recently announced: "We have opened researcher positions across all our research strands and levels of seniority. Our big picture research focuses on the long-term consequences of our actions today and the complicated dynamics that are bound to shape our future in significant ways. These positions offer talented researchers freedom to think about the most important issues of our era in an environment with other brilliant minds willing to constructively engage with a broad range of ideas. Applications close 19th October 2020, noon BST."

Comment by ofer on Clarifying “What failure looks like” (part 1) · 2020-09-22T17:36:13.766Z · score: 1 (1 votes) · LW · GW

Isn't this what I said in the rest of that paragraph (although I didn't have an example)?

I meant to say that even if we replace just a single person (like a newspaper editor) with an ML system, it may become much harder to understand why each decision was made.

I agree this is possible but it doesn't seem very likely to me, since we'll very likely be training our AI systems to communicate in natural language, and those AI systems will likely be trained to behave in vaguely human-like ways.

The challenge here seems to me to train competitive models—that behave in vaguely human-like ways—for general real-world tasks (e.g. selecting content for a FB user feed or updating item prices on Walmart). In the business-as-usual scenario we would need such systems to be competitive with systems that are optimized for business metrics (e.g. users' time spent or profit).

Comment by ofer on Clarifying “What failure looks like” (part 1) · 2020-09-22T14:49:20.855Z · score: 3 (2 votes) · LW · GW

Note ML systems are way more interpretable than humans, so if they are replacing humans then this shouldn't make that much of a difference.

I guess you mean here that activations and weights in NNs are more interpretable to us than neurological processes in the human brain, but if so this comparison does not seem relevant to the text you quoted. Consider that it seems easier to understand why an editor of a newspaper placed some article on the front page than why FB's algorithm showed some post to some user (especially if we get to ask the editor questions or consult with other editors).

Overall I'd guess that for WFLL1 it's closer to "replacing humans" than "replacing institutions".

Even if so (which I would expect to become uncompetitive with "replacing institutions" at some point) you may still get weird dynamics between AI systems within an institution and across institutions (e.g. between a CEO advisor AI and a regulator advisor AI). These dynamics may be very hard to interpret (and may not even involve recognizable communication channels).

Comment by ofer on Needed: AI infohazard policy · 2020-09-22T13:45:24.305Z · score: 3 (2 votes) · LW · GW

Publishing under a pseudonym may end up being counterproductive due to the Streisand effect. Identities behind many pseudonyms may suddenly be publicly revealed following a publication on some novel method for detecting similarities in writing style between texts.

Comment by ofer on Draft report on AI timelines · 2020-09-19T14:02:00.802Z · score: 4 (3 votes) · LW · GW

From the draft (Part 3):

I think it is unlikely that the amount of computation that would be required to train a transformative model is in the range of AlphaStar or a few orders of magnitude more -- if that were feasible, I would expect some company to have already trained a transformative model, or at least to have trained models that have already had massive economic impact. To loosely approximate a Bayesian update based on the evidence from this “efficient markets” argument, I truncate and renormalize all the hypothesis probability distributions:


Can you give some concrete examples of models with massive economic impact that we don't currently see but should expect to see before affordable levels of computation are sufficient for training transformative models?

Comment by ofer on ofer's Shortform · 2020-09-15T15:03:10.986Z · score: 1 (1 votes) · LW · GW

I can't think of any specific source to check regularly, but you can periodically google [covid-19 long term effects] and look for new info from sources you trust.

Comment by ofer on ofer's Shortform · 2020-09-14T18:47:47.813Z · score: 2 (2 votes) · LW · GW

[COVID-19 related]

(Probably already obvious to most LW readers.)

There seems to be a lot of uncertainty about the chances of COVID-19 causing long-term effects (including for young healthy people who experience only mild symptoms). Make sure to take this into account when deciding how much effort you're willing to put into not getting infected.

Comment by ofer on Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda · 2020-09-06T13:59:15.791Z · score: 1 (1 votes) · LW · GW

When I said "we need GPT-N to learn a distribution over strings..." I was referring to the implicit distribution that the model learns during training. We need that distribution to assign more probability to the string [a modular NN specification followed by a prompt followed by a natural language description of the modules] than to [a modular NN specification followed by a prompt followed by an arbitrary string]. My concern is that maybe there is no prompt that will satisfy this requirement.

Re "curating enough examples", this assumes humans are already able* to describe the modules of a sufficiently powerful language model (powerful enough to yield such descriptions).

*Able in practice, not just in theory.

Comment by ofer on Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda · 2020-09-04T18:58:51.405Z · score: 3 (2 votes) · LW · GW

I just want to flag that, like Evan, I don't understand the usage of the term "microscope AI" in the OP. My understanding is that the term (as described here) describes a certain way to use a NN that implements a world model, namely, looking inside the NN and learning useful things about the world. It's an idea about how to use transparency, not how to achieve transparency.

Comment by ofer on Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda · 2020-09-04T16:45:39.333Z · score: 7 (2 votes) · LW · GW

Both the general idea of trying to train competitive NNs with modular architectures and the idea of trying to use language models to get descriptions of NNs (or parts thereof) seem extremely interesting! I hope a lot of work will be done on these research directions.

We’re assuming humans can interpret small NN’s, given enough time. A “Modular” NN is just a collection of small NN’s connected by sparse weights. If humans could interpret each module in theory, then GPT-N could too.

I'm not sure about that. Notice that we need GPT-N to learn a distribution over strings that assigns more probability to [a modular NN specification followed by a natural language description of its modules] than to [a modular NN specification followed by an arbitrary string]. Learning such a distribution may be unlikely if the training corpus doesn't contain anything as challenging-to-produce as the former (regardless of what humans can do in theory).

Comment by ofer on ofer's Shortform · 2020-08-24T21:45:53.619Z · score: 4 (3 votes) · LW · GW

It seems that the research team at Microsoft that trained Turing-NLG (the largest non-sparse language model other than GPT-3, I think) never published a paper on it. They just published a short blog post, in February. Is this normal? The researchers have an obvious incentive to publish such a paper, which would probably be cited a lot.

[EDIT: hmm maybe it's just that they've submitted a paper to NeurIPS 2020.]

[EDIT 2: NeurIPS permits putting the submission on arXiv beforehand, so why haven't they?]

Comment by ofer on Vanessa Kosoy's Shortform · 2020-08-19T18:59:44.072Z · score: 1 (1 votes) · LW · GW

convergence might literally never occur if the machine just doesn’t have the computational resources to contain such an upload

I think that in embedded settings (with a bounded version of Solomonoff induction) convergence may never occur, even in the limit as the amount of compute that is used for executing the agent goes to infinity. Suppose the observation history contains sensory data that reveals the probability distribution that the agent had, in the last time step, for the next number it's going to see in the target sequence. Now fix a large number $n$ and consider the program that says: "if the last number was predicted by the agent to be 0 with probability larger than $1 - 2^{-n}$ then the next number is 1; otherwise it is 0." Since it takes much less than $n$ bits to write that program, the agent will never predict two times in a row that the next number is 0 with probability larger than $1 - 2^{-n}$ (after observing only 0s so far).
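One way to spell out the step behind this argument, as a sketch under assumptions that are not stated in the comment: the agent is a Bayesian mixture over programs with prior weight $2^{-\text{length}}$, the program described above is called $q$ and has length $K$ bits, and the threshold is written as $1 - 2^{-n}$ with $K < n$. Here $w_t(q)$ denotes the posterior weight of $q$ after step $t$ and $P_t(\cdot)$ the agent's prediction for step $t$; these symbols are introduced only for illustration.

```latex
\begin{align*}
w_t(q) &\ge 2^{-K}
  && \text{(an unfalsified deterministic program keeps at least its prior weight)} \\
P_t(0) > 1 - 2^{-n} &\implies q \text{ predicts } 1 \text{ at step } t+1 \\
P_{t+1}(1) &\ge w_t(q) \ge 2^{-K} > 2^{-n} \\
P_{t+1}(0) &\le 1 - 2^{-K} < 1 - 2^{-n}
\end{align*}
```

So, as long as $q$ has not been falsified by the observations so far, a prediction of 0 above the threshold cannot be immediately followed by another one.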

Comment by ofer on [AN #108]: Why we should scrutinize arguments for AI risk · 2020-08-06T21:15:25.890Z · score: 1 (1 votes) · LW · GW

Thank you for clarifying!

Like, why didn't the earlier less intelligent versions of the system fail in some non-catastrophic way

Even if we assume there will be no algorithmic-related-discontinuity, I think the following are potential reasons:

  1. Detecting deceptive behaviors in complicated environments may be hard. To continue with the Facebook example, suppose that at some point in the future Facebook's feed-creation-agent would behave deceptively in some non-catastrophic way. Suppose it uses some unacceptable technique to increase user engagement (e.g. making users depressed), but it refrains from doing so in situations where it predicts that Facebook engineers would notice. The agent is not that great at being deceptive though, and a lot of times it ends up using the unacceptable technique when there's actually a high risk of the technique being noticed. Thus, Facebook engineers do notice the unacceptable technique at some point and fix the reward function accordingly (penalizing depressing content or whatever). But how will they detect the deceptive behavior itself? Will they be on the lookout for deceptive behavior and use clever techniques to detect it? (If so, what made Facebook transition into a company that takes AI safety seriously?)

  2. Huge scale-ups without much intermediate testing. Suppose at some point in the future, Facebook decides to scale up the model and training process of their feed-creation-agent by 100x (by assumption, data is not the bottleneck). It seems to me that this new agent may pose an existential risk even conditioned on the previous agent being completely benign. If you think that Facebook is unlikely to do a 100x scale-up in one go, suppose that their leadership comes to believe that the scale-up would cause their revenue to increase in expectation by 10%. That's ~$7B per year, so they are probably willing to spend a lot of money on the scale-up. Also, they may want to complete the scale-up ASAP because they "lose" $134M for every week of delay.
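The revenue figures in the second point can be checked with a quick calculation. This is only a back-of-the-envelope sketch: the annual revenue of roughly $70B is an assumption (the comment does not state it) chosen so that a 10% increase comes to about $7B per year.

```python
# Back-of-the-envelope check of the scale-up figures above.
# Assumption: annual revenue of roughly $70B (not stated in the comment).
ANNUAL_REVENUE_USD = 70e9
EXPECTED_INCREASE = 0.10
WEEKS_PER_YEAR = 52

# Expected extra revenue per year, and the amount forgone per week of delay.
extra_per_year = ANNUAL_REVENUE_USD * EXPECTED_INCREASE
extra_per_week = extra_per_year / WEEKS_PER_YEAR

print(f"extra revenue per year: ${extra_per_year / 1e9:.1f}B")
print(f"forgone per week of delay: ${extra_per_week / 1e6:.1f}M")
```

Under that assumption the weekly figure lands between $134M and $135M, in line with the number quoted in the comment.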

Comment by ofer on [AN #108]: Why we should scrutinize arguments for AI risk · 2020-08-06T16:13:27.163Z · score: 3 (2 votes) · LW · GW

Rohin's opinion: [...] Overall I agree pretty strongly with Ben. I do think that some of the counterarguments are coming from a different frame than the classic arguments. For example, a lot of the counterarguments involve an attempt to generalize from current ML practice to make claims about future AI systems. However, I usually imagine that the classic arguments are basically ignoring current ML, and instead claiming that if an AI system is superintelligent, then it must be goal-directed and have convergent instrumental subgoals.

I agree that the book Superintelligence does not mention any non-goal-directed approaches to AI alignment (as far as I can recall). But as long as we're in the business-as-usual state, we should expect some well-resourced companies to train competitive goal-directed agents that act in the real world, right? (E.g. Facebook plausibly uses some deep RL approach to create the feed that each user sees). Do you agree that for those systems, the classic arguments about instrumental convergence and the treacherous turn are correct? (If so, I don't understand how come you agree pretty strongly with Ben; he seems to be skeptical that those arguments can be mapped to contemporary ML methods.)

Comment by ofer on What specific dangers arise when asking GPT-N to write an Alignment Forum post? · 2020-07-28T09:57:19.176Z · score: 5 (3 votes) · LW · GW

It may be the case that solving inner alignment problems means hitting a narrow target; meaning that if we naively carry out a super-large-scale training process that spits out a huge AGI-level NN, dangerous logic is very likely to arise somewhere in the NN at some point during training. Since this concern doesn't point at any specific-type-of-dangerous-logic I guess it's not what you're after in this post; but I wouldn't classify it as part of the threat model that "we don't know what we don't know".

Having said all that, here's an attempt at describing a specific scenario as requested:

Suppose we finally train our AGI-level GPT-N and we think that the distribution it learned is "the human writing distribution", HWD for short. HWD is a distribution that roughly corresponds to our credences when answering questions like "which of these two strings is more likely to have appeared on the internet prior to 2020-07-28?". But unbeknown to us, the inductive bias of our training process made GPT-N learn the distribution HWD*, which is just like HWD except that some fraction of [the strings with a prefix that looks like "a prompt by humans-trying-to-automate-AI-safety"] are manipulative and make AI safety researchers, upon reading, invoke an AGI with a goal system X. Turns out that the inductive bias of our training process caused GPT-N to model agents-with-goal-system-X and such agents tend to sample lots of strings from the HWD* distribution in order to "steal" the cosmic endowment of reckless civilizations like ours. This is the same type of failure mode as the universal prior problem.

Comment by ofer on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-22T14:20:21.276Z · score: 3 (2 votes) · LW · GW

There are infinitely many distributions from which the training data of GPT could have been sampled [EDIT: including ones that could be catastrophic as the distribution our AGI learns], so it's worth mentioning an additional challenge on this route: making the future AGI-level-GPT learn the "human writing distribution" that we have in mind.

Comment by ofer on ofer's Shortform · 2020-07-15T20:11:55.756Z · score: 5 (3 votes) · LW · GW

Thank you for writing this!

When the upside of informing people who didn't get the memo is so large, saying the obvious things seems very beneficial. (I already knew almost all the info in this infodump, but it will probably slightly affect the way I prioritize things.)

I thought this was old news

It's probably old news to >90% of frequent LW readers (I added a TL;DR to save people time). It's not news to me. I wrote my original post for FB and then decided to post here too. (To be clear, I don't think it's old news to most people in general, at least not in the US or Israel).

Comment by ofer on ofer's Shortform · 2020-07-15T06:03:42.179Z · score: 7 (4 votes) · LW · GW

[COVID-19 related]

[EDIT: for frequent LW readers this is probably old news. TL;DR: people may catch COVID-19 by inhaling tiny droplets that are suspended in the air. Opening windows is good and being outdoors is better. Get informed about masks.]

There seems to be a growing suspicion that many people catch COVID-19 due to inhaling tiny droplets that were released into the air by infected people (even by just talking/breathing). It seems that these tiny droplets can be suspended in the air (or waft through the air) and may accumulate over time. (It's unclear to me how long they can remain in the air for - I've seen hypotheses ranging from a few minutes to 3 hours.)

Therefore, it seems that indoor spaces may pose much greater risk of catching COVID-19 than outdoor spaces, especially if they are poorly ventilated. So consider avoiding shared indoor spaces (especially elevators), keeping windows open when possible, and becoming more informed about masks.

Comment by ofer on Learning the prior · 2020-07-07T18:55:17.679Z · score: 3 (2 votes) · LW · GW

I'm confused about this point. My understanding is that, if we sample iid examples from some dataset and then naively train a neural network with them, in the limit we may run into universal prior problems, even during training (e.g. an inference execution that leverages some software vulnerability in the computer that runs the training process).

Comment by ofer on [AN #105]: The economic trajectory of humanity, and what we might mean by optimization · 2020-06-28T06:08:29.299Z · score: 1 (1 votes) · LW · GW

Claims of the form “neural nets are fundamentally incapable of X” are almost always false: recurrent neural nets are Turing-complete, and so can encode arbitrary computation.

I think RNNs are not Turing-complete (assuming the activations and weights can be represented by a finite number of bits). Models with finite state space (reading from an infinite input stream) can't simulate a Turing machine.

(Though I share the background intuition.)
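The finite-state point can be illustrated with a toy sketch. The update rule below is a hypothetical stand-in for an RNN cell (it is not a real network), with 2 units of 3 bits each, fed a constant input. With finite precision there are at most 2^(units × bits) distinct hidden states, so by the pigeonhole principle the state must repeat, and the model cannot, for example, keep an unbounded count.

```python
def max_distinct_states(num_units: int, bits_per_unit: int) -> int:
    """Upper bound on the number of distinct hidden states at finite precision."""
    return 2 ** (num_units * bits_per_unit)


def toy_step(state):
    """Hypothetical deterministic update on two 3-bit units (constant input)."""
    a, b = state
    return ((3 * a + b + 1) % 8, (a + 5 * b + 2) % 8)


def first_repeat(step, start, limit):
    """Iterate `step` from `start`; return (first_seen, repeated_at) or None."""
    seen = {}
    state = start
    for t in range(limit):
        if state in seen:
            return seen[state], t
        seen[state] = t
        state = step(state)
    return None


bound = max_distinct_states(num_units=2, bits_per_unit=3)
# bound + 1 iterations are enough to guarantee that some state repeats.
repeat = first_repeat(toy_step, (0, 0), limit=bound + 1)
print(f"at most {bound} states; a state repeats at steps {repeat}")
```

Once a state repeats under a constant input, the trajectory is periodic from then on, which is the sense in which such a model behaves like a finite automaton rather than a Turing machine.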

Comment by ofer on AI safety via market making · 2020-06-27T06:14:47.937Z · score: 4 (3 votes) · LW · GW

Interesting idea.

Suppose that in the first time step the adversary is able to output a string that will manipulate the human into: (1) giving a probability that is maximally different from the model's current prediction; and (2) not looking at the rest of the adversary's output (i.e. the human will never see any of the later strings).

Ignoring inner alignment problems, in the limit it seems plausible that the adversary will output such a string; resulting in [...], and the smallest possible [...] given [...].

[EDIT: actually, such problems are not specific to this idea and seem to generally apply to the 'AI safety via debate' approach.]

Comment by ofer on Likelihood of hyperexistential catastrophe from a bug? · 2020-06-21T05:47:39.751Z · score: 1 (1 votes) · LW · GW

The mugger scenario triggers strong game theoretical intuitions (e.g. "it's bad to be the sort of agent that other agents can benefit from making threats against") and the corresponding evolved decision-making processes. Therefore, when reasoning about scenarios that do not involve game theoretical dynamics (as is the case here), it may be better to use other analogies.

(For the same reason, "Pascal's mugging" is IMO a bad name for that concept, and "finite Pascal's wager" would have been better.)

Comment by ofer on ofer's Shortform · 2020-06-07T18:10:59.281Z · score: 1 (1 votes) · LW · GW

Paul Christiano's definition of slow takeoff may be too narrow, and sensitive to a choice of "basket of selected goods".

(I don't have a background in economics, so the following may be nonsense.)

Paul Christiano operationalized slow takeoff as follows:

There will be a complete 4 year interval in which world output doubles, before the first 1 year interval in which world output doubles. (Similarly, we’ll see an 8 year doubling before a 2 year doubling, etc.)

My understanding is that "world output" is defined with respect to some "basket of selected goods" (which may hide in the definition of inflation). Let's say we use whatever basket the World Bank used here.

Suppose that X years from now progress in AI makes half of the basket far cheaper to produce, but makes the other half only slightly cheaper to produce. The increase in the "world output" does not depend much on whether the first half of the basket is now 10x cheaper or 10,000x cheaper. In both cases the price of the basket is dominated by its second half.

If the thing we care about here is whether "incredibly powerful AI will emerge in a world where crazy stuff is already happening (and probably everyone is already freaking out)"—as Paul wrote—we shouldn't consider the above 10x and 10,000x cases to be similar.
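The basket arithmetic above can be made concrete with a small sketch. The assumptions are mine, not the comment's: a fixed-weight basket whose two halves each account for half of the original cost, "slightly cheaper" taken to mean 1.1x cheaper, and measured output taken to scale as the inverse of the basket price (nominal spending held fixed).

```python
def basket_price(first_half_factor: float, second_half_factor: float) -> float:
    """Price of the basket relative to an original price of 1.0.

    Each half starts at 0.5 of the total cost and becomes cheaper by
    the given factor.
    """
    return 0.5 / first_half_factor + 0.5 / second_half_factor


SLIGHT = 1.1  # assumption: "slightly cheaper" means 1.1x cheaper

for factor in (10, 10_000):
    price = basket_price(factor, SLIGHT)
    # With nominal spending held fixed, measured output scales as 1 / price.
    print(f"{factor:>6}x cheaper: basket price {price:.4f}, "
          f"output multiplier {1 / price:.2f}")
```

Under these assumptions the measured output multiplier is roughly 2.0 in the 10x case and roughly 2.2 in the 10,000x case, even though the two worlds differ by three orders of magnitude on half of the basket.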

Comment by ofer on OpenAI announces GPT-3 · 2020-05-31T09:04:39.181Z · score: 6 (4 votes) · LW · GW

As abergal wrote, not carrying the "1" can simply mean it does digit-wise addition (which seems trivial via memorization). But notice that just before that quote they also write:

To spot-check whether the model is simply memorizing specific arithmetic problems, we took the 3-digit arithmetic problems in our test set and searched for them in our training data in both the forms "<NUM1> + <NUM2> =" and "<NUM1> plus <NUM2>". Out of 2,000 addition problems we found only 17 matches (0.8%) and out of 2,000 subtraction problems we found only 2 matches (0.1%), suggesting that only a trivial fraction of the correct answers could have been memorized.

That seems like evidence against memorization, but maybe their simple search failed to find most cases with some relevant training signal, e.g.: "In this diet you get 350 calories during breakfast: 200 calories from X and 150 calories from Y."
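A rough sketch of the concern: the first function below mirrors the two literal forms quoted from the paper, while the second is a hypothetical looser heuristic (not something the paper describes) that flags any text in which both operands and their sum appear as numbers. The function names are made up for illustration.

```python
import re


def literal_match(a: int, b: int, text: str) -> bool:
    """True if the text contains one of the two literal forms quoted above."""
    return f"{a} + {b} =" in text or f"{a} plus {b}" in text


def loose_match(a: int, b: int, text: str) -> bool:
    """Hypothetical looser check: both operands and their sum occur as numbers in the text."""
    numbers = {int(n) for n in re.findall(r"\d+", text)}
    return {a, b, a + b} <= numbers


sentence = ("In this diet you get 350 calories during breakfast: "
            "200 calories from X and 150 calories from Y.")

print(literal_match(200, 150, sentence))  # False: neither literal form appears
print(loose_match(200, 150, sentence))    # True: 200, 150 and 350 all appear
```

The looser heuristic would of course produce many false positives on real data; the point is only that a literal search can report few matches while related training signal still exists.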

Comment by ofer on Databases of human behaviour and preferences? · 2020-04-22T08:53:20.720Z · score: 4 (3 votes) · LW · GW

Maybe Minecraft-related datasets can be helpful. I'm not familiar with them myself, but I found these two:

CraftAssist: A Framework for Dialogue-enabled Interactive Agents

MineRL: A Large-Scale Dataset of Minecraft Demonstrations

Comment by ofer on Three Kinds of Competitiveness · 2020-04-01T07:33:03.860Z · score: 3 (2 votes) · LW · GW

Good point about inner alignment problems being a blocker to date-competitiveness for IDA... but aren't they also a blocker to date-competitiveness for every other alignment scheme too pretty much?

I think every alignment approach (other than interpretability-as-a-standalone-approach) that involves contemporary ML (i.e. training large neural networks) may have its date-competitiveness affected by inner alignment.

What alignment schemes don't suffer from this problem?

Most alignment approaches may have their date-competitiveness affected by inner alignment. (It seems theoretically possible to use whole brain emulation without inner alignment related risks, but as you mentioned elsewhere someone may build a neuromorphic AGI before we get there.)

I'm thinking "Do anything useful that a human with a lot of time can do" is going to be substantially less capable than full-blown superintelligent AGI.

I agree. Even a "narrow AI" system that is just very good at predicting stock prices may outperform "a human with a lot of time" (by leveraging very-hard-to-find causal relations).

Instead of saying we should expect IDA to be performance-competitive, I should have said something like the following: If at some point in the future we get to a situation where trillions of safe AGI systems are deployed—and each system can "only" do anything that a human-with-a-lot-of-time can do—and we manage to not catastrophically screw up until that point, I think humanity will probably be out of the woods. (All of humanity's regular problems will probably get resolved very quickly, including the lack of coordination.)

Comment by ofer on Three Kinds of Competitiveness · 2020-03-31T15:41:15.346Z · score: 3 (2 votes) · LW · GW

Very interesting definitions! I like the way they're used here to compare different scenarios.

Proposal: Iterated Distillation and Amplification: [...] I currently think of this scheme as decently date-competitive but not as cost-competitive or performance-competitive.

I think IDA's date-competitiveness will depend on the progress we'll have in inner alignment (or our willingness to bet against inner alignment problems occurring, and whether we'll be correct about it). Also, I don't see why we should expect IDA to not be very performance-competitive (if I understand correctly the hope is to get a system that can do anything useful that a human with a lot of time can do).

Generally, when using these definitions for comparing alignment approaches (rather than scenarios) I suspect we'll end up talking a lot about "the combination of date- and performance-competitiveness", because I expect the performance-competitiveness of most approaches will depend on how much research effort is invested in them.

Comment by ofer on Largest open collection quotes about AI · 2020-03-31T13:27:44.913Z · score: 6 (4 votes) · LW · GW

This spreadsheet is super impressive and has been very useful to me (it allowed me to find some very interesting stuff, like this discussion with Bill Gates and Elon Musk), thank you for creating it!

Comment by ofer on ofer's Shortform · 2020-03-26T20:43:56.292Z · score: 2 (3 votes) · LW · GW

Uneducated hypothesis: All hominidae species tend to thrive in huge forests, unless they've discovered fire. From the moment a species discovers fire, any individual can unilaterally burn the entire forest (due to negligence/anger/curiosity/whatever), and thus a huge forest is unlikely to serve as a long-term habitat for many individuals of that species.

Comment by ofer on Where can we donate time and money to avert coronavirus deaths? · 2020-03-18T07:27:11.665Z · score: 2 (3 votes) · LW · GW

For donating money:

It may be worthwhile to look into the COVID-19 Solidarity Response Fund (co-created by WHO). From WHO's website:

The Covid-19 Solidarity Response Fund is a secure way for individuals, philanthropies and businesses to contribute to the WHO-led effort to respond to the pandemic.

The United Nations Foundation and the Swiss Philanthropy Foundation have created the solidarity fund to support WHO and partners in a massive effort to help countries prevent, detect, and manage the novel coronavirus – particularly those where the needs are the greatest.

The fund will enable us to:

  • Send essential supplies such as personal protective equipment to frontline health workers
  • Enable all countries to track and detect the disease by boosting laboratory capacity through training and equipment.
  • Ensure health workers and communities everywhere have access to the latest science-based information to protect themselves, prevent infection and care for those in need.
  • Accelerate efforts to fast-track the discovery and development of lifesaving vaccines, diagnostics and treatments

Comment by ofer on How's the case for wearing googles for COVID-19 protection when in public transportation? · 2020-03-15T15:07:50.326Z · score: 1 (1 votes) · LW · GW

After seeing this preprint I'm less confident in my above update.

Comment by ofer on How long does SARS-CoV-2 survive on copper surfaces · 2020-03-14T14:40:05.385Z · score: 1 (1 votes) · LW · GW

Disclaimer: I'm not an expert.

It seems to me that this preprint suggests that in certain conditions the half-life of HCoV-19 (SARS-CoV-2) is ~0.4 hours on copper, ~3.5 hours on cardboard, ~5.5 hours on steel, and ~7 hours on plastic.
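To get a feel for what these half-lives imply, here is a minimal sketch that assumes simple exponential decay (a modeling assumption made here for illustration; real-world conditions may differ). It uses the approximate half-lives listed above.

```python
import math

# Approximate half-lives in hours, as listed in the comment above.
HALF_LIFE_HOURS = {"copper": 0.4, "cardboard": 3.5, "steel": 5.5, "plastic": 7.0}


def remaining_fraction(half_life_hours: float, hours: float) -> float:
    """Fraction of viable virus left after `hours`, assuming exponential decay."""
    return 0.5 ** (hours / half_life_hours)


def time_to_fraction(half_life_hours: float, fraction: float) -> float:
    """Hours until only `fraction` of the initial amount remains."""
    return half_life_hours * math.log2(1.0 / fraction)


for surface, half_life in HALF_LIFE_HOURS.items():
    after_day = remaining_fraction(half_life, 24.0)
    to_one_percent = time_to_fraction(half_life, 0.01)
    print(f"{surface}: {after_day:.2e} left after 24h, "
          f"{to_one_percent:.1f}h to reach 1%")
```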

Comment by ofer on How's the case for wearing googles for COVID-19 protection when in public transportation? · 2020-03-11T13:24:40.832Z · score: 1 (1 votes) · LW · GW

[EDIT: You probably shouldn't read this comment, and instead read this post by Scott Alexander.]

FYI, regular surgical masks are insufficient for protection against COVID-19. A respirator graded n95 or higher is required.

Disclaimer: I'm not an expert.

[EDIT (2020-05-30): you really shouldn't use the following for updating your beliefs.]

After a quick look at some of the papers mentioned in Elizabeth's answers here I updated away from the belief that surgical masks are substantially less effective than N95 masks at preventing the wearer from getting infected with the novel coronavirus (it now seems plausible to me that surgical masks are not substantially less effective). But I can easily be wrong about that, and the evidence I've seen seems to me weak (the papers I've seen did not involve the novel coronavirus).

Comment by ofer on March Coronavirus Open Thread · 2020-03-10T15:11:51.615Z · score: 1 (3 votes) · LW · GW

Maybe citing the CDC:

It’s likely that at some point, widespread transmission of COVID-19 in the United States will occur. Widespread transmission of COVID-19 would translate into large numbers of people needing medical care at the same time. Schools, childcare centers, and workplaces, may experience more absenteeism. Mass gatherings may be sparsely attended or postponed. Public health and healthcare systems may become overloaded, with elevated rates of hospitalizations and deaths. Other critical infrastructure, such as law enforcement, emergency medical services, and sectors of the transportation industry may also be affected. Healthcare providers and hospitals may be overwhelmed. At this time, there is no vaccine to protect against COVID-19 and no medications approved to treat it.

Comment by ofer on What "Saving throws" does the world have against coronavirus? (And how plausible are they?) · 2020-03-04T21:02:04.751Z · score: 4 (3 votes) · LW · GW

Are there more?

Speaking as a layperson, it seems to me plausible that we'll see a "successful saving throw" in the form of a new coronavirus testing method (perhaps powered by machine learning) that will be cheap, quick and accurate. It will then be used on a massive scale all over the world and will allow governments to quarantine people much more effectively.

Comment by ofer on Coronavirus: Justified Practical Advice Thread · 2020-03-01T07:32:06.629Z · score: 8 (5 votes) · LW · GW

It is recommended to avoid touching your eyes, nose, and mouth[1]. People tend to inadvertently touch their eyes, nose, and mouth many times per hour[2]. If you think you can substantially reduce the number of times you touch your face by training yourself to avoid doing it, in some low-effort way, go for it. If it takes time to become good at not touching one's face, it may be worthwhile to start training at it now even if where you live is currently coronavirus-free.


[1]: The CDC (Centers for Disease Control and Prevention) writes:

The best way to prevent illness is to avoid being exposed to this virus. However, as a reminder, CDC always recommends everyday preventive actions to help prevent the spread of respiratory diseases, including:


  • Avoid touching your eyes, nose, and mouth.

[2]: The video by the CDC that Davidmanheim linked to claimed: "Studies have shown that people touch their eyes, nose, and mouth about 25 times every hour without even realizing it!"

Comment by ofer on ofer's Shortform · 2020-02-29T22:27:58.040Z · score: 1 (1 votes) · LW · GW

[Coronavirus related]

If some organization had perfect knowledge of the location of each person on earth (at any moment), and got an immediate update on any person who is diagnosed with the coronavirus, how much difference could that make in preventing the spread of the coronavirus?

What if the only type of action that the organization could take is sending people messages? For example, if Alice was just diagnosed with the coronavirus and 10 days ago she was on a bus with Bob, now Bob gets a message: "FYI the probability you have the coronavirus just increased from 0.01% to 0.5% due to someone that was near you 10 days ago. Please self-quarantine for 4 days." (These numbers are made up, obviously.)
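A toy sketch of the message-only scenario: given location records, find everyone who shared a place and time with a newly diagnosed person. The `Sighting` record, the function names, and the 14-day lookback window are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Sighting:
    """One person being at one place during [start, end). Hypothetical record format."""
    person: str
    place: str
    start: datetime
    end: datetime


def overlaps(a: Sighting, b: Sighting) -> bool:
    """True if both sightings are at the same place with intersecting time intervals."""
    return a.place == b.place and a.start < b.end and b.start < a.end


def contacts_to_notify(diagnosed, sightings, diagnosis_time, lookback_days=14):
    """Names of people who shared a place and time with `diagnosed` within the lookback window."""
    cutoff = diagnosis_time - timedelta(days=lookback_days)
    own = [s for s in sightings if s.person == diagnosed and s.end >= cutoff]
    return sorted({
        s.person
        for s in sightings
        if s.person != diagnosed and any(overlaps(s, o) for o in own)
    })


now = datetime(2020, 3, 1, 12, 0)
ride = now - timedelta(days=10)
records = [
    Sighting("Alice", "bus 5", ride, ride + timedelta(minutes=30)),
    Sighting("Bob", "bus 5", ride + timedelta(minutes=10), ride + timedelta(minutes=40)),
    Sighting("Carol", "bus 5", ride + timedelta(hours=3), ride + timedelta(hours=4)),
]
print(contacts_to_notify("Alice", records, now))  # ['Bob']
```

Turning such a match into a probability like the ones in the example message would need an epidemiological model, which this sketch does not attempt.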

Comment by ofer on Does iterated amplification tackle the inner alignment problem? · 2020-02-16T07:32:38.473Z · score: 3 (3 votes) · LW · GW

My understanding is that amplification-based approaches are meant to tackle inner alignment by using the amplified systems that are already trusted (e.g. humans + many invocations of a trusted model) to mitigate inner alignment problems in the next (slightly more powerful) models that are being trained. A few approaches for this have already been suggested (I'm not aware of published empirical results); see Evan's comment for some pointers.

I hope a lot more research will be done on this topic. It's not clear to me whether we should expect to have amplified systems that allow us to mitigate inner alignment risks to a satisfactory extent before the point where we have x-risk-posing systems, how we can make that more likely, and, if it's not feasible, how we can realize that as soon as possible.