Book review: Architects of Intelligence by Martin Ford (2018) 2020-08-11T17:30:21.247Z
The recent NeurIPS call for papers requires authors to include a statement about the potential broader impact of their work 2020-02-24T07:44:20.850Z
ofer's Shortform 2019-11-26T14:59:40.664Z
A probabilistic off-switch that the agent is indifferent to 2018-09-25T13:13:16.526Z
Looking for AI Safety Experts to Provide High Level Guidance for RAISE 2018-05-06T02:06:51.626Z
A Safer Oracle Setup? 2018-02-09T12:16:12.063Z


Comment by ofer on (USA) N95 masks are available on Amazon · 2021-01-22T12:27:05.853Z · LW · GW

To support/add-to what ErickBall wrote, my own personal experience with respirators is that one with headbands (rather than ear loops) and a nose clip + nose foam is more likely to seal well.

Comment by ofer on ofer's Shortform · 2021-01-22T12:20:45.726Z · LW · GW

[COVID-19 related]

It was nice to see this headline:

My own personal experience with respirators is that one with headbands (rather than ear loops) and a nose clip + nose foam is more likely to seal well.

Comment by ofer on Short summary of mAIry's room · 2021-01-20T12:17:34.188Z · LW · GW

The topic of risks related to morally relevant computations seems very important, and I hope a lot more work will be done on it!

My tentative intuition is that learning is not directly involved here. If the weights of a trained RL agent are no longer being updated after some point[1], my intuition is that the model is similarly likely to experience pain before and after that point (assuming the environment stays the same).

Consider the following hypothesis, which does not involve a direct relationship between learning and pain: At sufficiently large scale (and in sufficiently complex environments), TD learning tends to create components within the network, call them "evaluators", that evaluate certain metrics that correlate with expected return. In practice the model is trained to optimize directly for the output of the evaluators (and maximizing the output of the evaluators becomes the mesa-objective). Suppose we label possible outputs of the evaluators with "pain" and "pleasure". We get something that seems analogous to humans. A human cares directly about pleasure and pain (which are things that correlated with expected evolutionary fitness in the ancestral environment), even when those things don't affect their evolutionary fitness accordingly (e.g. pleasure from eating chocolate, and pain from getting a vaccine shot).

  1. In TD learning, if from some point the model always perfectly predicted the future, the gradient would always be zero and no weights would be updated. Also, if an already-trained RL agent is being deployed, and there's no longer reinforcement learning going on after deployment (which seems like a plausible setup in products/services that companies sell to customers), the weights would obviously not be updated. ↩︎
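The footnote's first point can be seen in a toy tabular TD(0) setup (a sketch with made-up numbers, not a claim about any particular RL system): once the value predictions are exact, the TD error, and hence every weight update, is zero, even though learning is still formally switched on.

```python
# Toy tabular TD(0) on a 3-state chain: 0 -> 1 -> 2 (terminal).
GAMMA = 1.0
ALPHA = 0.1
TRANSITIONS = {0: (1, 0.0), 1: (2, 1.0)}  # state -> (next state, reward); 2 is terminal

def td_update(values, state):
    """Apply one TD(0) update in place and return the TD error."""
    next_state, reward = TRANSITIONS[state]
    delta = reward + GAMMA * values[next_state] - values[state]
    values[state] += ALPHA * delta
    return delta

values = {0: 0.0, 1: 0.0, 2: 0.0}  # terminal state's value stays 0
for _ in range(1000):
    for s in (0, 1):
        td_update(values, s)

# The value function is now (essentially) exact, so further TD errors vanish:
# the agent keeps acting, but no weights move.
deltas = [td_update(values, s) for s in (0, 1)]
```

Here the true values of states 0 and 1 are both 1.0; after training converges, `deltas` is (numerically) zero, matching the footnote's "perfect prediction implies zero gradient" observation.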

Comment by ofer on Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain · 2021-01-19T12:56:20.224Z · LW · GW

My understanding is that the 2020 algorithms in Ajeya Cotra's draft report refer to algorithms that train a neural network on a given architecture (rather than algorithms that search for a good neural architecture etc.). So the only "special sauce" that can be found by such algorithms is one that corresponds to special weights of a network (rather than special architectures etc.).

Comment by ofer on Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain · 2021-01-19T09:45:07.080Z · LW · GW

Great post!

we’ll either have to brute-force search for the special sauce like evolution did

I would drop the "brute-force" here (evolution is not a random/naive search).

Re the footnote:

This "How much special sauce is needed?" variable is very similar to Ajeya Cotra's variable "how much compute would lead to TAI given 2020's algorithms."

I don't see how they are similar.

Comment by ofer on Why I'm excited about Debate · 2021-01-17T01:21:49.079Z · LW · GW

One might argue:

We don't need the model to use that much optimization power, to the point where it breaks the operator. We just need it to perform roughly at human-level, and then we can just deploy many instances of the trained model and accomplish very useful things (e.g. via factored cognition).

So I think it's important to also note that getting a neural network to "perform roughly at human-level in an aligned manner" may be a much harder task than getting a neural network to achieve maximal rating by breaking the operator. The former may be a much narrower target. This point is closely related to what you wrote here in the context of amplification:

Speaking of inexact imitation: It seems to me that having an AI output a high-fidelity imitation of human behavior, sufficiently high-fidelity to preserve properties like "being smart" and "being a good person" and "still being a good person under some odd strains like being assembled into an enormous Chinese Room Bureaucracy", is a pretty huge ask.

It seems to me obvious, though this is the sort of point where I've been surprised about what other people don't consider obvious, that in general exact imitation is a bigger ask than superior capability. Building a Go player that imitates Shuusaku's Go play so well that a scholar couldn't tell the difference, is a bigger ask than building a Go player that could defeat Shuusaku in a match. A human is much smarter than a pocket calculator but would still be unable to imitate one without using a paper and pencil; to imitate the pocket calculator you need all of the pocket calculator's abilities in addition to your own.

Correspondingly, a realistic AI we build that literally passes the strong version of the Turing Test would probably have to be much smarter than the other humans in the test, probably smarter than any human on Earth, because it would have to possess all the human capabilities in addition to its own. Or at least all the human capabilities that can be exhibited to another human over the course of however long the Turing Test lasts. [...]

Comment by ofer on Gradient hacking · 2021-01-13T17:59:36.674Z · LW · GW

It does seem useful to make the distinction between thinking about what gradient hacking failures look like in worlds where they cause an existential catastrophe, and thinking about how to best pursue empirical research today about gradient hacking.

Comment by ofer on Gradient hacking · 2021-01-12T06:18:03.291Z · LW · GW

Some of the networks that have an accurate model of the training process will stumble upon the strategy of failing hard if SGD would reward any other competing network

I think the part in bold should instead be something like "failing hard if SGD would (not) update weights in such and such way". (SGD is a local search algorithm; it gradually improves a single network.)

This strategy seems more complicated, so is less likely to randomly exist in a network, but it is very strongly selected for, since at least from an evolutionary perspective it appears like it would give the network a substantive advantage.

As I already argued in another thread, the idea is not that SGD creates the gradient hacking logic specifically (in case this is what you had in mind here). As an analogy, consider a human that decides to 1-box in Newcomb's problem (which is related to the idea of gradient hacking, because the human decides to 1-box in order to have the property of "being a person that 1-boxes", because having that property is instrumentally useful). The specific strategy to 1-box is not selected for by human evolution, but rather general problem-solving capabilities were (and those capabilities resulted in the human coming up with the 1-box strategy).

Comment by ofer on Gradient hacking · 2021-01-02T14:32:32.496Z · LW · GW

My point was that there's no reason that SGD will create specifically "deceptive logic" because "deceptive logic" is not privileged over any other logic that involves modeling the base objective and acting according to it. But I now think this isn't always true - see the edit block I just added.

Comment by ofer on Gradient hacking · 2021-01-02T13:59:11.134Z · LW · GW

"deceptive logic" is probably a pretty useful thing in general for the model, because it helps improve performance as measured through the base-objective.

But you can similarly say this for the following logic: "check whether 1+1<4 and if so, act according to the base objective". Why is SGD more likely to create "deceptive logic" than this simpler logic (or any other similar logic)?

[EDIT: actually, this argument doesn't work in a setup where the base objective corresponds to a sufficiently long time horizon during which it is possible for humans to detect misalignment and terminate/modify the model (in a way that is harmful with respect to the base objective).]

So my understanding is that deceptive behavior is a lot more likely to arise from general-problem-solving logic, rather than SGD directly creating "deceptive logic" specifically.

Comment by ofer on Gradient hacking · 2021-01-02T09:39:31.392Z · LW · GW

I think that if SGD makes the model slightly deceptive it's because it made the model slightly more capable (better at general problem solving etc.), which allowed the model to "figure out" (during inference) that acting in a certain deceptive way is beneficial with respect to the mesa-objective.

This seems to me a lot more likely than SGD creating specifically "deceptive logic" (i.e. logic that can't do anything generally useful other than finding ways to perform better on the mesa-objective by being deceptive).

Comment by ofer on Gradient hacking · 2021-01-01T23:03:20.684Z · LW · GW

The less philosophical approach to this problem is to notice that the appearance of gradient hacking would probably come from the training stumbling on a gradient hacker.

[EDIT: you may have already meant it this way, but...] The optimization algorithm (e.g. SGD) doesn't need to stumble upon the specific logic of gradient hacking (which seems very unlikely). I think the idea is that a sufficiently capable agent (with a goal system that involves our world) instrumentally decides to use gradient hacking, because otherwise the agent will be modified in a suboptimal manner with respect to its current goal system.

Comment by ofer on AGI safety from first principles: Introduction · 2021-01-01T11:46:35.376Z · LW · GW

Early work tends to be less relevant in the context of modern machine learning

I'm curious why you think the orthogonality thesis, instrumental convergence, the treacherous turn or Goodhart's law arguments are less relevant in the context of modern machine learning. (We can use here Facebook's feed-creation-algorithm as an example of modern machine learning, for the sake of concreteness.)
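On the Goodhart's law point: the argument carries over to modern ML essentially unchanged, since any trained system optimizes a measured proxy rather than the intended objective. A minimal statistical sketch of (regressional) Goodhart, with all numbers invented:

```python
import random

random.seed(0)

# Each item has a true value; the proxy metric is true value plus independent
# measurement noise. Selecting the items that score best on the proxy
# systematically overstates their true value.
items = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(10_000)]
scored = [(true + noise, true) for true, noise in items]

top = sorted(scored, reverse=True)[:100]  # optimize the proxy hard
mean_proxy = sum(p for p, _ in top) / len(top)
mean_true = sum(t for _, t in top) / len(top)
# mean_proxy substantially exceeds mean_true: the harder we select on the
# proxy, the more the selected items' apparent quality outruns their real one.
```

Nothing here depends on whether the selection is done by a 2014-era hand-tuned metric or by a modern feed-creation model, which is why the argument seems to retain its relevance.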

Comment by ofer on Against GDP as a metric for timelines and takeoff speeds · 2020-12-30T18:41:23.520Z · LW · GW

Thank you for writing this up! This topic seems extremely important and I strongly agree with the core arguments here.

I propose the following addition to the list of things we care about when it comes to takeoff dynamics, or when it comes to defining slow(er) takeoff:

  1. Foreseeability: No one creates an AI with a transformative capability X at a time when most actors (weighted by influence) believe it is very unlikely that an AI with capability X will be created within a year.

Perhaps this should replace (or be merged with) the "warning shots" entry in the list. (As an aside, I think the term "warning shot" doesn't fit, because the original term refers to an action that is carried out for the purpose of communicating a threat.)

Comment by ofer on What are the best precedents for industries failing to invest in valuable AI research? · 2020-12-15T17:53:25.619Z · LW · GW

And the other part of the core idea is that that's implausible.

I don't see why that's implausible. The condition I gave is also my explanation for why the EMH holds (in markets where it does), and it doesn't explain why big corporations should be good at predicting AGI.

it's in their self-interest (at least, given their lack of concern for AI risk) to pursue it aggressively

So the questions I'm curious about here are:

  1. What mechanism is supposed to cause big corporations to be good at predicting AGI?
  2. How come that mechanism doesn't also cause big corporations to understand the existential risk concerns?
Comment by ofer on What are the best precedents for industries failing to invest in valuable AI research? · 2020-12-15T11:07:27.877Z · LW · GW

(I'm not an economist but my understanding is that...) The EMH works in markets that fulfill the following condition: If Alice is way better than the market at predicting future prices, she can use her superior prediction capability to gain more and more control over the market, until the point where her control over the market makes the market prices reflect her prediction capability.

If Alice is way better than anyone else at predicting AGI, how can she use her superior prediction capability to gain more control over big corporations? I don't see how an EMH-based argument applies here.

Comment by ofer on ofer's Shortform · 2020-12-14T11:49:10.292Z · LW · GW

[Online dating services related]

The incentives of online dating service companies are ridiculously misaligned with their users'. (For users who are looking for a monogamous, long-term relationship.)

A "match" between two users that results in them both leaving the platform for good is a super-negative outcome with respect to the metrics that the company is probably optimizing for. They probably use machine learning models to decide which "candidates" to show a user at any given time, and they are incentivized to train these models to avoid matches that cause users to leave their platform for good. (And these models may be way better at predicting such matches than any human).

Comment by ofer on Seeking Power is Often Robustly Instrumental in MDPs · 2020-12-12T03:11:32.500Z · LW · GW

Instrumental convergence is a very simple idea that I understand very well, and yet I failed to understand this paper (after spending hours on it) [EDIT: and also the post], so I'm worried about using it for the purpose of 'standing up to more intense outside scrutiny'. (Though it's plausible I'm just an outlier here.)

Comment by ofer on Covid 12/10: Vaccine Approval Day in America · 2020-12-11T16:39:27.309Z · LW · GW

Regarding comparison of mask types, the best source I'm aware of is:

Comment by ofer on In a multipolar scenario, how do people expect systems to be trained to interact with systems developed by other labs? · 2020-12-11T06:31:11.230Z · LW · GW

I have quite a different intuition on this, and I'm curious if you have a particular justification for expecting non-simulated training for multi-agent problems.

In certain domains, there are very strong economic incentives to train agents that will act in a real-world multi-agent environment, where the ability to simulate the environment is limited (e.g. trading in stock markets and choosing content for social media users).

Comment by ofer on Forecasting Newsletter: November 2020 · 2020-12-10T13:27:38.491Z · LW · GW

Otherwise, some members of the broader Effective Altruism and rationality communities made a fair amount of money betting on the election.

I would caveat this by adding that people are probably more likely to mention that they invested in a prediction market when the market resolved in their favor.

Comment by ofer on In a multipolar scenario, how do people expect systems to be trained to interact with systems developed by other labs? · 2020-12-10T11:41:19.652Z · LW · GW

Some off-the-cuff thoughts:

It seems plausible that transformative agents will be trained exclusively on real-world data (without using simulated environments) [EDIT: in "data" I mean to include the observation/reward signal from the real-world environment in an online RL setup]; including social media feed-creation algorithms, and algo-trading algorithms. In such cases, the researchers don't choose how to implement the "other agents" (the other agents are just part of the real-world environment that the researchers don't control).

Focusing on agents that are trained on simulated environments that involve multiple agents: For a lab to use copies of other labs' agents, the labs will probably need to cooperate (or some other process that involves additional actors may need to exist). In any case, using copies of the agent that is being trained (i.e. self-play) seems to me very plausible. (Like, I think both AlphaZero and OpenAI Five were trained via self-play and that self-play is generally considered to be a very prominent technique for RL-in-simulated-environments-that-involve-multiple-agents).
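As a concrete (toy) illustration of the self-play idea — the "other agent" being a copy of the agent currently being trained — here is regret matching playing rock-paper-scissors against itself. This is only a sketch of the general concept; it does not reflect AlphaZero's or OpenAI Five's actual training setups.

```python
# Self-play sketch: a regret-matching learner whose opponent is a copy of its
# own current strategy. In rock-paper-scissors the time-averaged strategy
# approaches the symmetric equilibrium (1/3, 1/3, 1/3).
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row player's payoff: R, P, S
T = 20_000

def strategy(regrets):
    """Regret matching: play actions in proportion to their positive regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1 / 3] * 3

regrets = [1.0, 0.0, 0.0]  # start slightly off-equilibrium so dynamics are visible
avg = [0.0, 0.0, 0.0]
for _ in range(T):
    p = strategy(regrets)  # the opponent is this very strategy (self-play)
    # Expected payoff of each pure action against the copy, and of p itself.
    action_values = [sum(PAYOFF[a][b] * p[b] for b in range(3)) for a in range(3)]
    value_of_p = sum(p[a] * action_values[a] for a in range(3))
    for a in range(3):
        regrets[a] += action_values[a] - value_of_p
    avg = [x + y for x, y in zip(avg, p)]

avg = [x / T for x in avg]  # time-averaged strategy, close to uniform
```

The point of the sketch is just that no second lab's agent is needed: the training process supplies its own opponent.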

Comment by ofer on Cultural accumulation · 2020-12-07T11:23:46.683Z · LW · GW

I would probably be inclined to reject such a public offer on the grounds that some of the possible people who are similar to me (possible people whose decisions are correlated with mine), when finding themselves in a similar situation, may not trust the offer-maker or may not have $500 to spare, and would not want to publicly disclose that information.

Comment by ofer on I made an N95-level mask at home, and you can too · 2020-11-26T20:17:02.529Z · LW · GW

My uneducated concern is that masks that are not intended to seal may not allow air to flow sufficiently easily through their "filter" part (without it turning out to be a problem during "normal" use, due to the air easily flowing through the edges). Re the volume argument, maybe we also need to consider the volume of the air we inhale each time (and whether that volume becomes smaller if something is partially blocking the air flow, and whether we notice).

Comment by ofer on Working in Virtual Reality: A Review · 2020-11-21T11:59:50.111Z · LW · GW

That's very interesting.

I'd be concerned about the potential impact of prolonged usage of a VR headset for many hours per day on eye health. (Of course, I'm not at all an expert in this area.)

Comment by ofer on I made an N95-level mask at home, and you can too · 2020-11-19T15:05:16.976Z · LW · GW

I followed your tip to just google it but every result in the first 2 pages for me was either out of stock or outdated

As I said in that thread, I was not recommending the mentioned google search as a way to buy respirators, and one's best options (which may include buying from a well-known retailer and having a mechanism to substantially lower risks from counterfeit respirators) may depend on where they live.

Comment by ofer on Some AI research areas and their relevance to existential safety · 2020-11-19T13:13:15.593Z · LW · GW

Great post!

I suppose you'll be more optimistic about Single/Single areas if you update towards fast/discontinuous takeoff?

Comment by ofer on I made an N95-level mask at home, and you can too · 2020-11-18T19:07:53.249Z · LW · GW

Disclaimer: I'm not an expert.

It turns out that surgical masks are made of the exact same material as N95s! They both filter 95% of 0.1μm particles.

I very much doubt this claim, and the link you provide in support of it is to a website that you later suggest is being run by people that seem to you "a bit sketchy". I also doubt that the way you propose for checking the "electrostatic effect" (on large pieces of paper?) can provide strong evidence that the mask's material provides filtering protection that is similar to an N95 respirator's.

[EDIT: sorry, you later cite the Rengasamy et al. paper that seems to support that claim to some extent; I'm not sure how much to update on it.]

As a civilian you can’t purchase an N95 anywhere at any price.

This claim is false (see this thread).

BTW: since presumably surgical masks are not intended to be used in this way, I would also worry about potential risks of breathing too little oxygen or too much carbon dioxide.

BTW2: Maybe it's worth looking into using your approach for "upgrading" cheap KN95 respirators rather than surgical masks (I suspect that cheap KN95 respirators tend to not seal well due to a lack of nose clip and due to bands that go around the ears rather than around the head). Though the above concern regarding oxygen/carbon dioxide might still apply.

[EDIT: BTW3: for a comparison between cloth masks, surgical masks and N95 respirators see this page on]

Comment by ofer on What considerations influence whether I have more influence over short or long timelines? · 2020-11-07T21:50:36.582Z · LW · GW

(They may spend more on inference compute if doing so would sufficiently increase their revenue. They may train such a more-expensive model just to try it out for a short while, to see whether they're better off using it.)

Comment by ofer on What considerations influence whether I have more influence over short or long timelines? · 2020-11-07T15:24:44.613Z · LW · GW

I didn't follow this. FB doesn't need to run a model inference for each possible post that it considers showing (just like OpenAI doesn't need to run a GPT-3 inference for each possible token that can come next).

(BTW, I think the phrase "context window" would correspond to the model's input.)

FB's revenue from advertising in 2019 was $69.7 billion, or $191 million per day. So yea, it seems possible that in 2019 they used a model with an inference cost similar to GPT-3's, though not one that is 10x more expensive [EDIT: under this analysis' assumptions]; so I was overconfident in my previous comment.
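The revenue arithmetic here is easy to double-check (figures as quoted in this thread):

```python
annual_ad_revenue = 69.7e9  # FB's 2019 advertising revenue, USD
per_day = annual_ad_revenue / 365
print(f"${per_day / 1e6:.0f}M per day")
```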

Comment by ofer on What considerations influence whether I have more influence over short or long timelines? · 2020-11-07T13:20:17.113Z · LW · GW

That said, I'd be surprised if the feed-creation algorithm had as many parameters as GPT-3, considering how often it has to be run per day...

The relevant quantities here are the compute cost of each model usage (inference)—e.g. the cost of compute for choosing the next post to place on a feed—and the impact of such a potential usage on FB's revenue.

This post by Gwern suggests that OpenAI was able to run a single GPT-3 inference (i.e. generate a single token) at a cost of $0.00006 (6 cents for 1,000 tokens) or less. I'm sure it's worth much more than $0.00006 to FB to choose well the next post that a random user sees.

Comment by ofer on What considerations influence whether I have more influence over short or long timelines? · 2020-11-07T11:58:10.499Z · LW · GW

The frontrunners right now are OpenAI and DeepMind.

I'm not sure about this. Note that not all companies are equally incentivized to publish their ML research (some companies may be incentivized to be secretive about their ML work and capabilities due to competition/regulation dynamics). I don't see how we can know whether GPT-3 is further along on the route to AGI than FB's feed-creation algorithm, or the most impressive algo-trading system etc.

The other places have the money, but less talent

I don't know where the "less talent" estimate is coming from. I won't be surprised if there are AI teams with a much larger salary budget than any team at OpenAI/DeepMind, and I expect the "amount of talent" to correlate with salary budget (among prestigious AI labs).

and more importantly don't seem to be acting as if they think short timelines are possible.

I'm not sure how well we can estimate the beliefs and motivations of all well-resourced AI teams in the world. Also, a team need not be trying to create AGI (or believe they can) in order to create AGI. It's sufficient that they are incentivized to create systems that model the world as well as possible; which is the case for many teams, including ones working on feed-creation in social media services and algo-trading systems. (The ability to plan and find solutions to arbitrary problems in the real world naturally arises from the ability to model it, in the limit.)

Comment by ofer on What considerations influence whether I have more influence over short or long timelines? · 2020-11-06T17:26:42.846Z · LW · GW

This consideration favors short timelines, because (1) We have a good idea which AI projects will make TAI conditional on short timelines, and (2) Some of us already work there, they seem already at least somewhat concerned about safety, etc.

I don't see how we can have a good idea whether a certain small set of projects will make TAI first conditional on short timelines (or whether the first project will be one in which people are "already at least somewhat concerned about safety"). Like, why not some arbitrary team at Facebook/Alphabet/Amazon or any other well-resourced company? There are probably many well-resourced companies (including algo-trading companies) that are incentivized to throw a lot of money at novel, large scale ML research.

Comment by ofer on "Inner Alignment Failures" Which Are Actually Outer Alignment Failures · 2020-11-03T22:03:48.635Z · LW · GW

you should never get deception in the limit of infinite data (since a deceptive model has to defect on some data point).

I think a model can be deceptively aligned even if formally it maps every possible input to the correct (safe) output. For example, suppose that on input X the inference execution hacks the computer on which the inference is being executed, in order to do arbitrary consequentialist stuff (while the inference logic, as a mathematical object, formally yields the correct output for X).

Comment by ofer on ofer's Shortform · 2020-11-01T16:28:53.254Z · LW · GW

[Question about reinforcement learning]

What is the most impressive/large-scale published work in RL that you're aware of where—during training—the agent's environment is the real world (rather than a simulated environment)?

Comment by ofer on Responses to Christiano on takeoff speeds? · 2020-10-30T19:55:26.829Z · LW · GW

I'd love to give feedback on your version if you want! Could even collaborate.

Ditto for me!

Comment by ofer on Draft report on AI timelines · 2020-10-29T19:13:27.353Z · LW · GW

Let A1 and A2 be two optimization algorithms, each searching over some set of programs. Let E be some evaluation metric over programs, such that E(p) is our evaluation of program p, for the purpose of comparing a program found by A1 to a program found by A2. For example, E can be defined as a subjective impressiveness metric as judged by a human.

Intuitive definition: Suppose we plot a curve for each optimization algorithm, such that the x-axis is the inference compute of a yielded program and the y-axis is our evaluation value of that program. If the curves of A1 and A2 are similar up to scaling along the x-axis, then we say that A1 and A2 are similarly-scaling w.r.t inference compute, or SSIC for short.

Formal definition: Let A1 and A2 be optimization algorithms and let E be an evaluation function over programs. Let us denote with Ai(n) the program that Ai finds when it uses n FLOPs (which would correspond to the training compute if Ai is an ML algorithm). Let us denote with IC(p) the amount of inference compute that program p uses. We say that A1 and A2 are SSIC with respect to E if for any n1, n2, m1, m2 such that IC(A1(n1))/IC(A1(n2)) = IC(A2(m1))/IC(A2(m2)), if E(A1(n1)) ≥ E(A2(m1)) then E(A1(n2)) ≥ E(A2(m2)).

I think the report draft implicitly uses the assumption that human evolution and the first ML algorithm that will result in TAI are SSIC (with respect to a relevant E). It may be beneficial to discuss this assumption in the report. Clearly, not all pairs of optimization algorithms are SSIC (e.g. consider a pure random search + any optimization algorithm). Under what conditions should we expect a pair of optimization algorithms to be SSIC with respect to a given E?

Maybe that question should be investigated empirically, by looking at pairs of optimization algorithms, where one is a popular ML algorithm and the other is some evolutionary computation algorithm (searching over a very different model space), and checking to what extent the two algorithms are SSIC.
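Such an empirical check could look roughly like the following sketch: for each algorithm, collect (inference compute, evaluation) pairs of the programs it finds, then test whether one curve matches the other after rescaling the inference-compute axis. All data and function names here are invented for illustration.

```python
import math

def eval_at(curve, x):
    """Log-linearly interpolate the evaluation value at inference compute x."""
    pts = sorted(curve)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            t = (math.log(x) - math.log(x0)) / (math.log(x1) - math.log(x0))
            return y0 + t * (y1 - y0)
    raise ValueError("x outside curve range")

def ssic_mismatch(curve_a, curve_b, scale):
    """Mean |curve_a(x) - curve_b(scale * x)| over interior points of curve_a."""
    xs = [x for x, _ in sorted(curve_a)[1:-1]]
    return sum(abs(eval_at(curve_a, x) - eval_at(curve_b, scale * x))
               for x in xs) / len(xs)

# Two made-up curves that differ exactly by a 10x shift of the x-axis,
# i.e. a pair that is SSIC by construction:
curve_a = [(10 ** k, k) for k in range(1, 8)]
curve_b = [(10 ** (k + 1), k) for k in range(1, 8)]

# The mismatch is zero at the right scale factor and nonzero otherwise.
best = min(ssic_mismatch(curve_a, curve_b, s) for s in (1, 10, 100))
```

A real version would of course need to search over scale factors continuously and account for noise in the evaluation metric, but the basic test is just "does some x-axis rescaling align the two curves?".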

Comment by ofer on Draft report on AI timelines · 2020-10-29T19:13:04.730Z · LW · GW

Some thoughts:

  1. The development of transformative AI may involve a feedback loop in which we train ML models that help us train better ML models and so on (e.g. using approaches like neural architecture search which seems to be getting increasingly popular in recent years). There is nothing equivalent to such a feedback loop in biological evolution (animals don't use their problem-solving capabilities to make evolution more efficient). Does your analysis assume there won't be such a feedback loop (or at least not one that has a large influence on timelines)? Consider adding to the report a discussion about this topic (sorry if it's already there and I missed it).

  2. Part of the Neural Network hypothesis is the proposition that "a transformative model would perform roughly as many FLOP / subj sec as the human brain". It seems to me worthwhile to investigate this proposition further. Human evolution corresponds to a search over a tiny subset of all possible computing machines. Why should we expect that a different search algorithm over an entirely different subset of computing machines would yield systems (with certain capabilities) that use a similar amount of compute? One might pursue an empirical approach for investigating this topic, e.g. by comparing two algorithms for searching over a space of models, where one is some common supervised learning algorithm, and the other is some evolutionary computation algorithm.

    In a separate comment (under this one) I attempt to describe a more thorough and formal way of thinking about this topic.

  3. Regarding methods that involve adjusting variables according to properties of 2020 algorithms (or the models trained by them): It would be interesting to try to apply the same methods with respect to earlier points in time (e.g. as if you were writing the report back in 1998/2012/2015 when LeNet-5/AlexNet/DQN were introduced, respectively). To what extent would the results be consistent with the 2020 analysis?

Comment by ofer on [AN #121]: Forecasting transformative AI timelines using biological anchors · 2020-10-24T17:56:45.191Z · LW · GW

My point here is that in a world where an algo-trading company has the lead in AI capabilities, there need not be a point in time (prior to an existential catastrophe or existential security) where investing more resources into the company's safety-indifferent AI R&D does not seem profitable in expectation. This claim can be true regardless of researchers' observations, beliefs, and actions in given situations.

Comment by ofer on [AN #121]: Forecasting transformative AI timelines using biological anchors · 2020-10-24T06:22:52.616Z · LW · GW

We might get TAI due to efforts by, say, an algo-trading company that develops trading AI systems. The company can limit the mundane downside risks that it faces from non-robust behaviors of its AI systems (e.g. by limiting the fraction of its fund that the AI systems control). Of course, the actual downside risk to the company includes outcomes like existential catastrophes, but it's not clear to me why we should expect that prior to such extreme outcomes their AI systems would behave in ways that are detrimental to economic value.

Comment by ofer on [AN #121]: Forecasting transformative AI timelines using biological anchors · 2020-10-23T07:13:37.354Z · LW · GW

you need to ensure that your model is aligned, robust, and reliable (at least if you want to deploy it and get economic value from it).

I think it suffices for the model to be inner aligned (or deceptively inner aligned) for it to have economic value, at least in domains where (1) there is a usable training signal that corresponds to economic value (e.g. users' time spent in social media platforms, net income in algo-trading companies, or even the stock price in any public company); and (2) the downside economic risk from a non-robust behavior is limited (e.g. an algo-trading company does not need its model to be robust/reliable, assuming the downside risk from each trade is limited by design).

Comment by ofer on The Solomonoff Prior is Malign · 2020-10-14T11:08:15.365Z · LW · GW

If arguments about acausal trade and value handshakes hold, then the resulting utility function might contain some fraction of human values.

I think Paul's Hail Mary via Solomonoff prior idea is not obviously related to acausal trade. (It does not privilege agents that engage in acausal trade over ones that don't.)

Comment by ofer on A prior for technological discontinuities · 2020-10-14T10:40:46.389Z · LW · GW

Seems like an almost central example of continuous progress if you're evaluating by typical language model metrics like perplexity.

I think we should determine whether GPT-3 is an example of continuous progress in perplexity based on the extent to which it lowered the SOTA perplexity (on huge internet-text corpora), and its wall-clock training time. I don't see why the correctness of a certain scaling law or the researchers' beliefs/motivation should affect this determination.

Comment by ofer on If GPT-6 is human-level AGI but costs $200 per page of output, what would happen? · 2020-10-10T17:33:34.879Z · LW · GW

Yeah, human-level is supposed to mean not strongly superhuman at anything important, while also not being strongly subhuman in anything important.

I think that's roughly the concept Nick Bostrom used in Superintelligence when discussing takeoff dynamics. (The usage of that concept is my only major disagreement with that book.) IMO it would be very surprising if the first ML system that is not strongly subhuman at anything important were also not strongly superhuman at anything important (assuming this property is not optimized for).

Comment by ofer on If GPT-6 is human-level AGI but costs $200 per page of output, what would happen? · 2020-10-10T17:23:53.614Z · LW · GW

GPT-2 was benchmarked at 43 perplexity on the 1 Billion Word (1BW) benchmark vs a (highly extrapolated) human perplexity of 12

I wouldn't say that that paper shows a (highly extrapolated) human perplexity of 12. It compares human-written sentences to language model generated sentences on the degree to which they seem "clearly human" vs "clearly unhuman" as judged by humans. Amusingly, for every 8 human-written sentences that were judged as "clearly human", one human-written sentence was judged as "clearly unhuman". And that 8:1 ratio is the thing from which human perplexity is derived. This doesn't make sense to me.

If the human annotators in this paper had never annotated human-written sentences as "clearly unhuman", this extrapolation would have shown a human perplexity of 1! (As if humans could magically predict an entire page of text sampled from the internet.)
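For context on why the extrapolation bottoms out at 1: perplexity is the exponentiated average negative log-probability assigned to the tokens that actually occurred, so a predictor that is always certain and correct hits a floor of exactly 1. A minimal sketch (my own illustration, not the paper's derivation):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability
    assigned to the tokens that actually occurred."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

# A predictor that is always certain and correct hits the floor of 1:
print(perplexity([1.0, 1.0, 1.0]))  # 1.0

# Uniform guessing over a 50k-token vocabulary gives perplexity ~50k:
print(round(perplexity([1 / 50_000] * 10)))  # 50000
```

So deriving "human perplexity" from the annotators' 8:1 judgment ratio implicitly treats that ratio as if it pinned down humans' per-token probabilities, which is the step that seems unjustified to me.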

The LAMBADA dataset was also constructed using humans to predict the missing words, but GPT-3 falls far short of perfection there, so while I can't numerically answer it (unless you trust OA's reasoning there), it is still very clear that GPT-3 does not match or surpass humans at text prediction.

If the comparison here is on the final LAMBADA dataset, after examples were filtered out based on disagreement between humans (as you mentioned in the newsletter), then it's an unfair comparison. The examples are selected for being easy for humans.

BTW, I think the comparison to humans on the LAMBADA dataset is indeed interesting in the context of AI safety (more so than "predict the next word in a random internet text"); because I don't expect the perplexity/accuracy to depend much on the ability to model very low-level stuff (e.g. "that's" vs "that is").

Comment by ofer on If GPT-6 is human-level AGI but costs $200 per page of output, what would happen? · 2020-10-09T18:35:04.912Z · LW · GW

Nevertheless, it works. That's how self-supervised training/pretraining works.

Right, I'm just saying that I don't see how to map that metric to things we care about in the context of AI safety. If a language model outperforms humans at predicting the next word, maybe it's just due to it being sufficiently superior at modeling low-level stuff (e.g. GPT-3 may be better than me at predicting you'll write "That's" rather than "That is".)

(As an aside, in the linked footnote I couldn't easily spot any paper that actually evaluated humans on predicting the next word.)

Comment by ofer on If GPT-6 is human-level AGI but costs $200 per page of output, what would happen? · 2020-10-09T15:50:45.607Z · LW · GW

Some quick thoughts/comments:

--It can predict random internet text better than the best humans

I wouldn't use this metric. I don't see how to map between it and anything we care about. If it's defined in terms of accuracy when predicting the next word, I won't be surprised if existing language models already outperform humans.

Also, I find the term "human-level AGI" confusing. Does it exclude systems that are super-human on some dimensions? If so, it seems too narrow to be useful. For the purpose of this post, I propose using the following definition: a system that is able to generate text in a way that allows one to automatically perform any task that humans can perform by writing text.

Comment by ofer on If GPT-6 is human-level AGI but costs $200 per page of output, what would happen? · 2020-10-09T15:50:03.079Z · LW · GW

But I'm pretty sure stuff would go crazy even before then. How?

We can end up with an intelligence explosion via automated ML research. One of the tasks that could be automated by the language model is "brainstorming novel ML ideas". So you'll be able to pay $200 and get a text, that could have been written by a brilliant ML researcher, containing novel ideas that allow you to create a more efficient/capable language model. (Though I expect that this specific approach won't be competitive with fully automated approaches that do stuff like neural architecture search (NAS).)

Comment by ofer on AI arms race · 2020-10-07T11:54:06.128Z · LW · GW

This model assumes that each AI lab chooses some level of safety precautions, and then acts accordingly until AGI is created. But the degree to which an AI lab invests in safety may change radically with time. Importantly, it may increase by a lot if the leadership of the AI lab comes to believe that their current or near-term work poses existential risk.

This seems like a reason to be more skeptical about the counter-intuitive conclusion that the information available to all the teams about their own capability or progress towards AI increases the risk. (Not to be confused with the other counter-intuitive conclusion from the paper that the information available about other teams increases the risk).