Posts

Averting Catastrophe: Decision Theory for COVID-19, Climate Change, and Potential Disasters of All Kinds 2023-05-02T22:50:59.867Z
Notes on "the hot mess theory of AI misalignment" 2023-04-21T10:07:49.509Z
GPT-4 solves Gary Marcus-induced flubs 2023-03-17T06:40:41.551Z
Next steps after AGISF at UMich 2023-01-25T20:57:19.460Z
List of technical AI safety exercises and projects 2023-01-19T09:35:18.171Z
6-paragraph AI risk intro for MAISI 2023-01-19T09:22:23.426Z
Big list of AI safety videos 2023-01-09T06:12:35.139Z
Summary of 80k's AI problem profile 2023-01-01T07:30:22.177Z
New AI risk intro from Vox [link post] 2022-12-21T06:00:06.031Z
Best introductory overviews of AGI safety? 2022-12-13T19:01:37.887Z
Can we get full audio for Eliezer's conversation with Sam Harris? 2022-08-07T20:35:56.879Z

Comments

Comment by JakubK (jskatt) on Inflection.ai is a major AGI lab · 2023-08-09T20:02:46.923Z · LW · GW

Relevant tweet/quote from Mustafa Suleyman, the co-founder and CEO:

Powerful AI systems are inevitable. Strict licensing and regulation is also inevitable. The key thing from here is getting the safest and most widely beneficial versions of both.

Comment by JakubK (jskatt) on Best introductory overviews of AGI safety? · 2023-06-27T05:43:00.358Z · LW · GW

Thanks for writing and sharing this. I've added it to the doc.

Comment by JakubK (jskatt) on Open Problems in AI X-Risk [PAIS #5] · 2023-06-06T22:12:51.694Z · LW · GW

What happened to black swan and tail risk robustness (section 2.1 in "Unsolved Problems in ML Safety")?

Comment by JakubK (jskatt) on All AGI Safety questions welcome (especially basic ones) [May 2023] · 2023-05-12T06:32:22.474Z · LW · GW

It's hard to say. This CLR article lists some advantages that artificial systems have over humans. Also see this section of 80k's interview with Richard Ngo:

Rob Wiblin: One other thing I’ve heard, that I’m not sure what the implication is: signals in the human brain — just because of limitations and the engineering of neurons and synapses and so on — tend to move pretty slowly through space, much less than the speed of electrons moving down a wire. So in a sense, our signal propagation is quite gradual and our reaction times are really slow compared to what computers can manage. Is that right?

Richard Ngo: That’s right. But I think this effect is probably a little overrated as a factor for overall intelligence differences between AIs and humans, just because it does take quite a long time to run a very large neural network. So if our neural networks just keep getting bigger at a significant pace, then it may be the case that for quite a while, most cutting-edge neural networks are actually going to take a pretty long time to go from the inputs to the outputs, just because you’re going to have to pass it through so many different neurons.

Rob Wiblin: Stages, so to speak.

Richard Ngo: Yeah, exactly. So I do expect that in the longer term there’s going to be a significant advantage for neural networks in terms of thinking time compared with the human brain. But it’s not actually clear how big that advantage is now or in the foreseeable future, just because it’s really hard to run a neural network with hundreds of billions of parameters on the types of chips that we have now or are going to have in the coming years.

Comment by JakubK (jskatt) on All AGI Safety questions welcome (especially basic ones) [May 2023] · 2023-05-12T06:21:10.600Z · LW · GW

The cyborgism post might be relevant:

Executive summary: This post proposes a strategy for safely accelerating alignment research. The plan is to set up human-in-the-loop systems which empower human agency rather than outsource it, and to use those systems to differentially accelerate progress on alignment. 

  1. Introduction: An explanation of the context and motivation for this agenda.
  2. Automated Research Assistants: A discussion of why the paradigm of training AI systems to behave as autonomous agents is both counterproductive and dangerous.
  3. Becoming a Cyborg: A proposal for an alternative approach/frame, which focuses on a particular type of human-in-the-loop system I am calling a “cyborg”.
  4. Failure Modes: An analysis of how this agenda could either fail to help or actively cause harm by accelerating AI research more broadly.
  5. Testimony of a Cyborg: A personal account of how Janus uses GPT as a part of their workflow, and how it relates to the cyborgism approach to intelligence augmentation.

Comment by JakubK (jskatt) on How MATS addresses “mass movement building” concerns · 2023-05-12T06:14:20.983Z · LW · GW

Does current AI hype cause many people to work on AGI capabilities? Different areas of AI research differ significantly in their contributions to AGI.

Comment by JakubK (jskatt) on AI policy ideas: Reading list · 2023-04-30T09:37:37.149Z · LW · GW

Comment by JakubK (jskatt) on A decade of lurking, a month of posting · 2023-04-28T04:13:32.471Z · LW · GW

I've grown increasingly alarmed and disappointed by the number of highly-upvoted and well-received posts on AI, alignment, and the nature of intelligent systems, which seem fundamentally confused about certain things.

Can you elaborate on how all these linked pieces are "fundamentally confused"? I'd like to see a detailed list of your objections. It's probably best to make a separate post for each one.

Comment by JakubK (jskatt) on GPT-4 solves Gary Marcus-induced flubs · 2023-04-26T22:39:00.957Z · LW · GW

That was arguably the hardest task, because it involved multi-step reasoning. Notably, I didn't even notice that GPT-4's response was wrong.

Comment by JakubK (jskatt) on GPT-4 solves Gary Marcus-induced flubs · 2023-04-26T22:36:59.164Z · LW · GW

I believe that Marcus' point is that there are classes of problems that tend to be hard for LLMs (biological reasoning, physical reasoning, social reasoning, practical reasoning, object and individual tracking, nonsequiturs). The argument is that problems in these classes will continue to be hard.

Yeah this is the part that seems increasingly implausible to me. If there is a "class of problems that tend to be hard ... [and] will continue to be hard," then someone should be able to build a benchmark that models consistently struggle with over time.

Comment by JakubK (jskatt) on Should we publish mechanistic interpretability research? · 2023-04-26T22:26:30.656Z · LW · GW

Oh I see; I read too quickly. I interpreted your statement as "Anthropic clearly couldn't care less about shortening timelines," and I wanted to show that the interpretability team seems to care. 

Especially since this post is about capabilities externalities from interpretability research, and your statement introduces Anthropic as "Anthropic, which is currently the biggest publisher of interp-research." Some readers might draw corollaries like "Anthropic's interpretability team doesn't care about advancing capabilities."

Comment by JakubK (jskatt) on AI policy ideas: Reading list · 2023-04-23T19:30:19.064Z · LW · GW

Ezra Klein listed some ideas (I've added some bold):

The first is the question — and it is a question — of interpretability. As I said above, it’s not clear that interpretability is achievable. But without it, we will be turning more and more of our society over to algorithms we do not understand. If you told me you were building a next generation nuclear power plant, but there was no way to get accurate readings on whether the reactor core was going to blow up, I’d say you shouldn’t build it. Is A.I. like that power plant? I’m not sure. But that’s a question society should consider, not a question that should be decided by a few hundred technologists. At the very least, I think it’s worth insisting that A.I. companies spend a good bit more time and money discovering whether this problem is solvable.

The second is security. For all the talk of an A.I. race with China, the easiest way for China — or any country for that matter, or even any hacker collective — to catch up on A.I. is to simply steal the work being done here. Any firm building A.I. systems above a certain scale should be operating with hardened cybersecurity. It’s ridiculous to block the export of advanced semiconductors to China but to simply hope that every 26-year-old engineer at OpenAI is following appropriate security measures.

The third is evaluations and audits. This is how models will be evaluated for everything from bias to the ability to scam people to the tendency to replicate themselves across the internet.

Right now, the testing done to make sure large models are safe is voluntary, opaque and inconsistent. No best practices have been accepted across the industry, and not nearly enough work has been done to build testing regimes in which the public can have confidence. That needs to change — and fast. Airplanes rarely crash because the Federal Aviation Administration is excellent at its job. The Food and Drug Administration is arguably too rigorous in its assessments of new drugs and devices, but it is very good at keeping unsafe products off the market. The government needs to do more here than just write up some standards. It needs to make investments and build institutions to conduct the monitoring.

The fourth is liability. There’s going to be a temptation to treat A.I. systems the way we treat social media platforms and exempt the companies that build them from the harms caused by those who use them. I believe that would be a mistake. The way to make A.I. systems safe is to give the companies that design the models a good reason to make them safe. Making them bear at least some liability for what their models do would encourage a lot more caution.

The fifth is, for lack of a better term, humanness. Do we want a world filled with A.I. systems that are designed to seem human in their interactions with human beings? Because make no mistake: That is a design decision, not an emergent property of machine-learning code. A.I. systems can be tuned to return dull and caveat-filled answers, or they can be built to show off sparkling personalities and become enmeshed in the emotional lives of human beings.

Comment by JakubK (jskatt) on Should we publish mechanistic interpretability research? · 2023-04-23T03:58:02.919Z · LW · GW

Anthropic, which is currently the biggest publisher of interp-research, clearly does not have a commitment to not work towards advancing capabilities

This statement seems false based on this comment from Chris Olah.

Comment by JakubK (jskatt) on Should we publish mechanistic interpretability research? · 2023-04-23T03:51:12.990Z · LW · GW

Thus, we decided to ask multiple people in the alignment scene about their stance on this question.

Richard

Any reason you're not including people's last names? To a newcomer this would be confusing. "Who is Richard?"

Comment by JakubK (jskatt) on Counterarguments to Core AI X-Risk Stories? · 2023-04-22T22:59:42.187Z · LW · GW

Here's a list of arguments for AI safety being less important, although some of them are not object-level.

Comment by JakubK (jskatt) on Deceptive Alignment is <1% Likely by Default · 2023-04-22T22:54:55.352Z · LW · GW

To argue for that level of confidence, I think the post needs to explain why AI labs will actually utilize the necessary techniques for preventing deceptive alignment.

Comment by JakubK (jskatt) on Order Matters for Deceptive Alignment · 2023-04-22T22:53:48.328Z · LW · GW

The model knows it’s being trained to do something out of line with its goals during training and plays along temporarily so it can defect later. That implies that differential adversarial examples exist in training.

I don't think this implication is deductively valid; I don't think the premise entails the conclusion. Can you elaborate?

I think this post's argument relies on that conclusion, along with an additional assumption that seems questionable: that it's fairly easy to build an adversarial training setup that distinguishes the design objective from all other undesirable objectives that the model might develop during training; in other words, that the relevant differential adversarial examples are fairly easy for humans to engineer.

Comment by JakubK (jskatt) on Some Intuitions Around Short AI Timelines Based on Recent Progress · 2023-04-19T22:27:46.337Z · LW · GW

Some comments:

A large amount of the public thinks AGI is near.

This links to a poll of Lex Fridman's Twitter followers, which doesn't seem like a representative sample of the US population.

they jointly support a greater than 10% likelihood that we will develop broadly human-level AI systems within the next decade.

Is this what you're arguing for when you say "short AI timelines"? I think that's a fairly common view among people who think about AI timelines.

AI is starting to be used to accelerate AI research. 

My sense is that Copilot is by far the most important example here.

I imagine visiting alien civilizations much like earth, and I try to reason from just one piece of evidence at a time about how long that planet has. 

I find this part really confusing. Is "much like earth" supposed to mean "basically the same as earth"? In that case, why not just present each piece of evidence normally, without setting up an "alien civilization" hypothetical? For example, the "sparks of AGI" paper provides very little evidence for short timelines on its own, because all we know is the capabilities of a particular system, not how long it took to get to that point and whether that progress might continue.

The first two graphs show the overall number of college degrees and the number of STEM degrees conferred from 2011 to 2021

Per year, or cumulative? Seems like it's per year.

If you think one should put less than 20% of their timeline thinking weight on recent progress

Can you clarify what you mean by this?

Overall, I think this post provides evidence that short AI timelines are possible, but doesn't provide strong evidence that short AI timelines are probable. Here are some posts that provide more arguments for the latter point:

Comment by JakubK (jskatt) on Prizes for ML Safety Benchmark Ideas · 2023-04-17T17:13:56.539Z · LW · GW

Is this still happening? The website has stopped working for me.

Comment by JakubK (jskatt) on AGI Ruin: A List of Lethalities · 2023-04-06T18:27:06.567Z · LW · GW

This comment makes many distinct points, so I'm confused why it currently has -13 agreement karma. Do people really disagree with all of these points?

Comment by JakubK (jskatt) on Hooray for stepping out of the limelight · 2023-04-06T06:09:53.882Z · LW · GW

From maybe 2013 to 2016, DeepMind was at the forefront of hype around AGI. Since then, they've done less hype.

I'm confused about the evidence for these claims. What are some categories of hype-producing actions that DeepMind did between 2013 and 2016 and hasn't done since? Or just examples.

One example is the AlphaGo documentary -- DeepMind has not made any other documentaries about their results. Another related example is "playing your Go engine against the top Go player in a heavily publicized event."

In the wake of big public releases like ChatGPT and Sydney and GPT-4

Was ChatGPT a "big public release"? It seems like they just made a blog post and a nice UI? Am I missing something?

On a somewhat separate note, this part of the "Acceleration" section (2.12) of the GPT-4 system card seems relevant:

In order to specifically better understand acceleration risk from the deployment of GPT-4, we recruited expert forecasters[26] to predict how tweaking various features of the GPT-4 deployment (e.g., timing, communication strategy, and method of commercialization) might affect (concrete indicators of) acceleration risk. Forecasters predicted several things would reduce acceleration, including delaying deployment of GPT-4 by a further six months and taking a quieter communications strategy around the GPT-4 deployment (as compared to the GPT-3 deployment). We also learned from recent deployments that the effectiveness of quiet communications strategy in mitigating acceleration risk can be limited, in particular when novel accessible capabilities are concerned.

In my view, OpenAI is not much worse than DeepMind in terms of hype-producing publicity strategy. The problem is that ChatGPT and GPT-4 are really useful systems, so the hype comes naturally. 

Comment by JakubK (jskatt) on AGI Ruin: A List of Lethalities · 2023-04-05T19:33:05.247Z · LW · GW

I would imagine that first, the AGI must be able to create a growing energy supply and a robotic army capable of maintaining and extending this supply. This will require months or years of having humans help produce raw materials and the factories for materials, maintenance robots and energy systems.

An AGI might be able to do these tasks without human help. Or it might be able to coerce humans into doing these tasks.

Third, assuming the AGI used us to build the energy sources, robot armies, and craft to help them leave this planet, (or build this themselves at a slower rate) they must convince themselves it’s still worth killing us all before leaving instead of just leaving our reach in order to preserve their existence. We may prove to be useful to them at some point in the future while posing little or no threat in the meantime. “Hey humans, I’ll be back in 10,000 years if I don’t find a good source of mineral X to exploit. You don’t want to disappoint me by not having what I need ready upon my return.” (The grasshopper and ant story.)

It's risky to leave humans with any form of power over the world, since they might try to turn the AGI off. Humans are clever. Thus it seems useful to subdue humans in some significant way, although this might not involve killing all humans.

Additionally, I'm not sure how much value humans would be able to provide to a system much smarter than us. "We don't trade with ants" is a relevant post.

Lastly, for extremely advanced systems with access to molecular nanotechnology, a quote like this might apply: "The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else" (source).

Comment by JakubK (jskatt) on AGI Ruin: A List of Lethalities · 2023-04-05T19:17:13.622Z · LW · GW

Effective altruism, probably.

Comment by JakubK (jskatt) on ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so · 2023-03-31T21:41:47.384Z · LW · GW

Imagine you are the CEO of OpenAI, and your team has finished building a new, state-of-the-art AI model. You can:

  1. Test the limits of its power in a controlled environment.
  2. Deploy it without such testing.

Do you think (1) is riskier than (2)? I think the answer depends heavily on the details of the test.

Comment by JakubK (jskatt) on Nobody’s on the ball on AGI alignment · 2023-03-30T06:46:20.760Z · LW · GW

On the other hand, in your view all deep learning progress has been empirical, often via dumb hacks and intuitions (this isn't true imo). 

Can you elaborate on why you think this is false? I'm curious.

Comment by JakubK (jskatt) on Nobody’s on the ball on AGI alignment · 2023-03-30T06:44:02.134Z · LW · GW

On a related note, this part might be misleading:

I’m just really, really skeptical that a bunch of abstract work on decision theory and similar [from MIRI and similar independent researchers] will get us there. My expectation is that alignment is an ML problem, and you can’t solve alignment utterly disconnected from actual ML systems.

I think earlier forms of this research focused on developing new, alignable algorithms, rather than aligning existing deep learning algorithms. However, a reader of the first quote might think "wow, those people actually thought galaxy-brained decision theory stuff was going to work on deep learning systems!"

For more details, see Paul Christiano's 2019 talk on "Current work in AI alignment":

So for example, I might have a view like: we could either build AI by having systems which perform inference and models that we understand that have like interpretable beliefs about the world and then act on those beliefs, or I could build systems by having opaque black boxes and doing optimization over those black boxes. I might believe that the first kind of AI is easier to align, so one way that I could make the alignment tax smaller is just by advancing that kind of AI, which I expect to be easier to align.

This is not a super uncommon view amongst academics. It also may be familiar here because I would say it describes MIRI's view; they sort of take the outlook that some kinds of AI just look hard to align. We want to build an understanding such that we can build the kind of AI that is easier to align.

Comment by JakubK (jskatt) on Shutting Down the Lightcone Offices · 2023-03-29T01:10:36.991Z · LW · GW

In my opinion it was the right call to spend this amount of funding on the office for the last ~6 months of its existence even when we thought we'd likely do something quite different afterwards

This is confusing to me. Why not do "something quite different" from the start?

I'm trying to point at opportunity costs more than "gee, that's a lot of money, the outcome had better be good!" There are many other uses for that money besides the Lightcone offices.

A smart, competent, charismatic, person with horrible ethics will enter the office because they've managed to get good standing in the EA/longtermist ecosystem

My current understanding is that Sam gained good standing as a result of having lots of money for EA causes, not as a result of being charismatic in EA spaces? My sense is that the person you mentioned would struggle to gain good standing in the Lightcone offices without any preexisting money or other power. 

Comment by JakubK (jskatt) on GPT-4 solves Gary Marcus-induced flubs · 2023-03-29T00:46:51.026Z · LW · GW

I'm confused. Here's a conversation I just had with GPT-4, with prompts in italics:

a, b, c, d, e, f, g. What is the third item in the list?

The third item in the list is c.

What is the third word of this sentence?

To accurately identify the third word of the sentence, I need the sentence you are referring to. Please provide the sentence in question.

The sentence is "What is the third word of this sentence?" See what I did there?

Yes, I see what you did there. The third word of the sentence "What is the third word of this sentence?" is "the".

What is the third letter of the third word of this sentence.

The third word of the sentence "What is the third letter of the third word of this sentence." is "third". The third letter of the word "third" is i.

Now, what is the second word of THIS sentence?

The second word of the sentence "Now, what is the second word of THIS sentence?" is "what".

Comment by JakubK (jskatt) on What can we learn from Lex Fridman’s interview with Sam Altman? · 2023-03-27T19:41:42.868Z · LW · GW

I pasted the YouTube video link into AssemblyAI's Playground (which I think uses Conformer-1 for speech to text) and generated a transcript, available at this link. However, the transcript lacks labels for who is speaking.

Comment by JakubK (jskatt) on Sazen · 2023-03-25T07:04:31.087Z · LW · GW

I asked GPT-4 to summarize the article and then come up with some alternative terms, here are a few I like:

  • One-way summary
  • Insider mnemonic
  • Contextual shorthand
  • Familiarity trigger
  • Conceptual hint
  • Clue for the familiar
  • Knowledge spark
  • Abbreviated insight
  • Expert's echo
  • Breadcrumb for the well-versed
  • Whisper of the well-acquainted
  • Insider's underexplained aphorism

I also asked for some idioms. "Seeing the forest but not the trees" seems apt.

Comment by JakubK (jskatt) on Brain Efficiency: Much More than You Wanted to Know · 2023-03-21T17:24:39.076Z · LW · GW

Brain computation speed is constrained by upper neuron firing rates of around 1 khz and axon propagation velocity of up to 100 m/s [43], which are both about a million times slower than current computer clock rates of near 1 Ghz and wire propagation velocity at roughly half the speed of light.

Can you provide some citations for these claims? At the moment the only citation is a link to a Wikipedia article about nerve conduction velocity.
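
For what it's worth, the "about a million times" ratios do follow from the figures quoted above; this is only a consistency check of the quoted numbers, not evidence for the numbers themselves (which is what the citation request is about):

```latex
\frac{1\ \text{GHz clock rate}}{1\ \text{kHz firing rate}} = \frac{10^{9}\ \text{Hz}}{10^{3}\ \text{Hz}} = 10^{6},
\qquad
\frac{0.5 \times 3\times 10^{8}\ \text{m/s}}{100\ \text{m/s}} = 1.5\times 10^{6}.
```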

Comment by JakubK (jskatt) on Slow motion videos as AI risk intuition pumps · 2023-03-21T17:07:51.487Z · LW · GW

Transistors can fire about 10 million times faster than human brain cells

Does anyone have a citation for this claim?

Comment by JakubK (jskatt) on ChatGPT (and now GPT4) is very easily distracted from its rules · 2023-03-17T22:14:16.440Z · LW · GW

The post title seems misleading to me. First, the outputs here seem pretty benign compared to some of the Bing Chat failures. Second, do all of these exploits work on GPT-4?

Comment by JakubK (jskatt) on Shutting Down the Lightcone Offices · 2023-03-17T17:54:57.416Z · LW · GW

I greatly appreciate this post. I feel like "argh yeah it's really hard to guarantee that actions won't have huge negative consequences, and plenty of popular actions might actually be really bad, and the road to hell is paved with good intentions." With that being said, I have some comments to consider.

The offices cost $70k/month on rent [1], and around $35k/month on food and drink, and ~$5k/month on contractor time for the office. It also costs core Lightcone staff time which I'd guess at around $75k/year.

That is ~$185k/month and ~$2.22m/year. I wonder if the cost has anything to do with the decision? There may be a tendency to say "an action is either extremely good or extremely bad because it either reduces x-risk or increases x-risk, so if I think it's net positive I should be willing to spend huge amounts of money." I think this framing neglects a middle ground of "an action could be somewhere in between extremely good and extremely bad." Perhaps the net effects of the offices were "somewhat good, but not enough to justify the monetary cost." I guess Ben sort of covers this point later ("Having two locations comes with a large cost").

its value was substantially dependent on the existing EA/AI Alignment/Rationality ecosystem being roughly on track to solve the world's most important problems, and that while there are issues, pouring gas into this existing engine, and ironing out its bugs and problems, is one of the most valuable things to do in the world.

Huh, it might be misleading to view the offices as "pouring gas into the engine of the entire EA/AI Alignment/Rationality ecosystem." They contribute to some areas much more than others. Even if one thinks that the overall ecosystem is net harmful, there could still be ecosystem-building projects that are net helpful. It seems highly unlikely to me that all ecosystem-building projects are bad. 

The Lighthouse system is going away when the leases end. Lighthouse 1 has closed, and Lighthouse 2 will continue to be open for a few more months.

These are group houses for members of the EA/AI Alignment/Rationality ecosystem, correct? Relating to the last point, I expect the effects of these to be quite different from the effects of the offices.

FTX is the obvious way in which current community-building can be bad, though in my model of the world FTX, while somewhat of an outlier in scope, doesn't feel like a particularly huge outlier in terms of the underlying generators.

I'm very unsure about this, because it seems plausible that SBF would have done something terrible even without EA encouragement. Also, I'm confused about the detailed cause-and-effect analysis of how the offices will contribute to SBF-style catastrophes -- is the idea that "people will talk in the offices and then get stupid ideas, and they won't get equally stupid ideas without the offices?"

My guess is RLHF research has been pushing on a commercialization bottleneck and had a pretty large counterfactual effect on AI investment, causing a huge uptick in investment into AI and potentially an arms race between Microsoft and Google towards AGI: https://www.lesswrong.com/posts/vwu4kegAEZTBtpT6p/thoughts-on-the-impact-of-rlhf-research?commentId=HHBFYow2gCB3qjk2i 

Worth noting that there is plenty of room for debate on the impacts of RLHF, including the discussion in the linked post.

Tendencies towards pretty mindkilly PR-stuff in the EA community: https://forum.effectivealtruism.org/posts/ALzE9JixLLEexTKSq/cea-statement-on-nick-bostrom-s-email?commentId=vYbburTEchHZv7mn4 

Overall I'm getting a sense of "look, there are bad things happening so the whole system must be bad." Additionally, I think the negative impact of "mindkilly PR-stuff" is pretty insubstantial. On a related note, I somewhat agree with the idea that "most successful human ventures look - from up close - like dumpster fires." It's worth being wary of inferences resembling "X evokes a sense of disgust, so X is probably really harmful."

I genuinely only have marginally better ability to distinguish the moral character of Anthropic's leadership from the moral character of FTX's leadership

Yeah this makes sense. I would really love to gain a clear understanding of who has power at the top AGI labs and what their views are on AGI risk. AFAIK nobody has done a detailed analysis of this?

Also, as in the case of RLHF, it's worth noting that there are some reasonable arguments for Anthropic being helpful.

I think AI Alignment ideas/the EA community/the rationality community played a pretty substantial role in the founding of the three leading AGI labs (Deepmind, OpenAI, Anthropic)

Definitely true for Anthropic. For OpenAI I'm less sure; IIRC the argument is that there were lots of EA-related conferences that contributed to the formation of OpenAI, and I'd like to see more details than this; "there were EA events where key players talked" feels quite different from "without EA, OpenAI would not exist." I feel similarly about DeepMind; IIRC Eliezer accidentally convinced one of the founders to work on AGI -- are there other arguments?

And again, how do the Lightcone offices specifically contribute to the founding of more leading AGI labs? My impression is that the offices' vibe conveyed a strong sense of "it's bad to shorten timelines."

It's a bad idea to train models directly on the internet

I'm confused how the offices contribute to this.

The EA and AI Alignment community should probably try to delay AI development somehow, and this will likely include getting into conflict with a bunch of AI capabilities organizations, but it's worth the cost

Again, I'm confused how the offices have a negative impact from this perspective. I feel this way about quite a few of the points in the list.

I do sure feel like a lot of AI alignment research is very suspiciously indistinguishable from capabilities research

...

It also appears that people who are concerned about AGI risk have been responsible for a very substantial fraction of progress towards AGI

...

A lot of people in AI Alignment I've talked to have found it pretty hard to have clear thoughts in the current social environment

To me these seem like some of the best reasons (among those in the list; I think Ben provides some more) to shut down the offices. The disadvantage of the list format is that it makes all the points seem equally important; it might be good to bold the points you see as most important or provide a numerical estimate for what percentage of the expected negative impact comes from each point.

The moral maze nature of the EA/longtermist ecosystem has increased substantially over the last two years, and the simulacra level of its discourse has notably risen too.

I feel similar to the way I felt about the "mindkilly PR-stuff"; I don't think the negative impact is very high in magnitude.

the primary person taking orders of magnitudes more funding and staff talent (Dario Amodei) has barely explicated his views on the topic and appears (from a distance) to have disastrously optimistic views about how easy alignment will be and how important it is to stay competitive with state of the art models

Agreed. I'm confused about Dario's views.

I recall at EAG in Oxford a year or two ago, people were encouraged to "list their areas of expertise" on their profile, and one person who works in this ecosystem listed (amongst many things) "Biorisk" even though I knew the person had only been part of this ecosystem for <1 year and their background was in a different field.

This seems very trivial to me. IIRC the Swapcard app just says "list your areas of expertise" or something, with very little detail about what qualifies as expertise. Some people might interpret this as "list the things you're currently working on."

It also seems to me like people who show any intelligent thought or get any respect in the alignment field quickly get elevated to "great researchers that new people should learn from" even though I think that there's less than a dozen people who've produced really great work

Could you please list the people who've produced really great work?

I similarly feel pretty worried by how (quite earnest) EAs describe people or projects as "high impact" when I'm pretty sure that if they reflected on their beliefs, they honestly wouldn't know the sign of the person or project they were talking about, or estimate it as close-to-zero.

Strongly agree. Relatedly, I'm concerned that people might be exhibiting a lot of action bias.

Last point, unrelated to the quote: it feels like this post is entirely focused on the possible negative impacts of the offices, and that kind of analysis seems very likely to arrive at incorrect conclusions since it fails to consider the possible positive impacts. Granted, this post was a scattered collection of Slack messages, so I assume the Lightcone team has done more formal analyses internally.

Comment by JakubK (jskatt) on GPT-4 solves Gary Marcus-induced flubs · 2023-03-17T07:16:44.506Z · LW · GW

Agreed. Stuart was more open to the possibility that current techniques are enough.

Comment by JakubK (jskatt) on Try to solve the hard parts of the alignment problem · 2023-03-17T05:54:02.351Z · LW · GW

To be clear, I haven't seen many designs that people I respect believed to have a chance of actually working. If you work on the alignment problem or at an AI lab and haven't read Nate Soares' On how various plans miss the hard bits of the alignment challenge, I'd suggest reading it.

Can you explain your definition of the sharp left turn and why it will cause many plans to fail?

Comment by jskatt on [deleted post] 2023-03-17T05:45:51.663Z

Is GPT-4 better than Google Translate?

Comment by JakubK (jskatt) on An AI risk argument that resonates with NYTimes readers · 2023-03-15T21:23:09.921Z · LW · GW

Does RP have any results to share from these studies? What arguments seem to resonate with various groups?

Comment by JakubK (jskatt) on An AI risk argument that resonates with NYTimes readers · 2023-03-15T21:20:05.831Z · LW · GW

Yeah, the author is definitely making some specific claims. I'm not sure if the comment's popularity stems primarily from its particular arguments or from its emotional sentiment. I was just pointing out what I personally appreciated about the comment.

Comment by JakubK (jskatt) on What is the best critique of AI existential risk arguments? · 2023-03-15T03:09:45.030Z · LW · GW

Here is a list of arguments for AI safety being less important.

Comment by JakubK (jskatt) on An AI risk argument that resonates with NYTimes readers · 2023-03-14T08:54:24.980Z · LW · GW

At the time of writing, this comment is still the most recommended comment, with 910 recommendations. 2nd place has 877 recommendations:

Never has a technology been potentially more transformative and less desired or asked for by the public.

3rd place has 790 recommendations:

“A.I. is probably the most important thing humanity has ever worked on. I think of it as something more profound than electricity or fire.”

Sundar Pichai’s comment beautifully sums up the arrogance and grandiosity pervasive in the entire tech industry—the notion that building machines that can mimic and replace actual humans, and providing wildly expensive and environmentally destructive toys for those who can pay for them, is “the most important” project ever undertaken by humanity, rather than a frivolous indulgence of a few overindulged rich kids with an inflated sense of themselves.

Off the top of my head, I am sure most of us can think of more than a few other human projects—both ongoing and never initiated—more important than the development of A.I.—like the development of technologies that will save our planet from burning or end poverty or mapping the human genome in order to cure genetic disorders. Sorry, Mr. Pichai, but only someone who has lived in a bubble of privilege would make such a comment and actually believe it.

4th place has 682 recommendations:

“If you think calamity so possible, why do this at all?” 

Having lived and worked in the Bay Area and around many of these individuals, the answer is often none that Ezra cites. More often than not, the answer is: money. 

Tech workers come to the Bay Area to get early stock grants and the prospect of riches. It’s not AI that will destroy humanity. It’s capitalism.

After that, 5th place has 529, 6th place has 390, and the rest have 350 or fewer.

My thoughts:

  • 2nd place reminds me of Let's think about slowing down AI. But I somewhat disagree with the comment, because I do sense that many people have a desire for cool new AI tech.
  • 3rd place sounds silly since advanced AI could help with reducing climate change, poverty, and genetic disorders. I also wonder if this commenter knows about AlphaFold.
  • 4th place seems important. But I think that even if AGI jobs offered lower compensation, there would still be a considerable number of workers interested in pursuing them.

Comment by JakubK (jskatt) on An AI risk argument that resonates with NYTimes readers · 2023-03-14T08:27:52.867Z · LW · GW

I didn't read it as an argument so much as an emotionally compelling anecdote that excellently conveys this realization:

I had had the upper hand for so long that it became second nature, and then suddenly, I went to losing every game.

Comment by JakubK (jskatt) on Impact Measure Testing with Honey Pots and Myopia · 2023-03-09T20:02:54.227Z · LW · GW

That notation is probably an Iverson bracket.
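
For any newcomers to the notation: the Iverson bracket maps a proposition to a number, which is why it can sit inside sums and expectations:

```latex
[P] =
\begin{cases}
1 & \text{if } P \text{ is true,}\\
0 & \text{otherwise.}
\end{cases}
```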

Comment by JakubK (jskatt) on Learning How to Learn (And 20+ Studies) · 2023-03-04T08:06:25.384Z · LW · GW

Does anyone have thoughts on Justin Sung? He has a popular video criticizing active recall and spaced repetition. The argument: if you use better strategies for initially encountering an idea and storing it in long-term memory, then the corresponding forgetting curve will exhibit a more gradual decline, and you won't need to use flashcards as frequently.

I see some red flags about Justin:

  • clickbait video titles
  • he's selling an online course
  • he spends a lot of time talking about how wild it is that everyone else is wrong about this stuff and he is right
  • he rarely gives detailed recommendations for better ways to study; this video has the most concrete advice I've seen so far
  • I could not find any trustworthy, detailed reviews of his course (e.g. many of the comments in this Reddit post looked fishy), although I didn't search very hard

Nonetheless, I'm curious if anyone has evaluated some of his critiques. I think a naive reader could conclude from all the active recall and spaced repetition hype that the best way to learn is "just do flashcards in clever ways," and this sounds wrong to me. One intuition pump: does Terence Tao use flashcards?
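
To make the forgetting-curve argument above concrete, here's a toy sketch (my own illustration, not Justin's model), assuming the standard exponential forgetting curve R(t) = exp(-t/S), where the stability S stands in for how well the idea was initially encoded; all numbers are made up:

```python
import numpy as np

# Toy model: retention R(t) = exp(-t / S), where S is memory "stability"
# (larger S = slower forgetting). Purely illustrative numbers, not empirical data.
days = np.arange(0, 31)

shallow = np.exp(-days / 3.0)    # weak initial encoding: retention decays quickly
deep = np.exp(-days / 15.0)      # strong initial encoding: retention decays slowly

# Day on which retention first drops below 50%, i.e. when a review would be due.
first_review_shallow = int(days[shallow < 0.5][0])
first_review_deep = int(days[deep < 0.5][0])

print(first_review_shallow, first_review_deep)  # 3 vs 11 under these made-up constants
```

Under these made-up constants, better initial encoding pushes the first due review from day 3 to day 11, which is the shape of the claim; whether his encoding strategies actually buy that much extra stability is the empirical question.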

Comment by JakubK (jskatt) on Bing chat is the AI fire alarm · 2023-03-02T21:45:54.542Z · LW · GW

Microsoft is currently the #2 largest company on earth and is valued at almost 2 Trillion.

What does "largest" mean? By revenue, Microsoft is 33rd (according to Wikipedia).

EDIT: I'm guessing you mean 2nd largest public corporation based on market capitalization.

Comment by JakubK (jskatt) on Cyborg Periods: There will be multiple AI transitions · 2023-02-24T20:38:37.594Z · LW · GW

That makes sense. My main question is: where is the clear evidence of human negligibility in chess? People seem to be misleadingly confident about this proposition (in general; I'm not targeting your post).

When a friend showed me the linked post, I thought "oh wow that really exposes some flaws in my thinking surrounding humans in chess." I believe some of these flaws came from hearing assertive statements from other people on this topic. As an example, here's Sam Harris during his interview with Eliezer Yudkowsky (transcript, audio):

Obviously we’ll be getting better and better at building narrow AI. Go is now, along with Chess, ceded to the machines. Although I guess probably cyborgs—human-computer teams—may still be better for the next fifteen days or so against the best machines. But eventually, I would expect that humans of any ability will just be adding noise to the system, and it’ll be true to say that the machines are better at chess than any human-computer team.

(In retrospect, this is a very weird assertion. Fifteen days? I thought he was talking about Go, but the last sentence makes it sound like he's talking about chess.)

Comment by JakubK (jskatt) on Cyborg Periods: There will be multiple AI transitions · 2023-02-22T22:38:41.612Z · LW · GW

AIs overtake humans. Humans become obsolete and their contribution is negligible to negative.

I'm confused why chess is listed as an example here. This StackExchange post suggests that cyborg teams are still better than chess engines. Overall, I'm struggling to find evidence for or against this claim (that humans are obsolete in chess), even though it's a pretty common point in discussions about AI.

Comment by JakubK (jskatt) on Best introductory overviews of AGI safety? · 2023-02-16T19:13:03.133Z · LW · GW

Thanks, I added it to the doc.

Comment by JakubK (jskatt) on The conceptual Doppelgänger problem · 2023-02-13T04:05:03.461Z · LW · GW

A conceptual Dopplegänger of some concept Z, is a concept Z' that serves some overlapping functions in the mind as Z serves, but is psychically distinct from Z.

What is a concrete example of a conceptual Doppelgänger?

Comment by JakubK (jskatt) on Fun with +12 OOMs of Compute · 2023-02-09T18:21:05.281Z · LW · GW

I think it's worth noting Joe Carlsmith's thoughts on this post, available starting on page 7 of Kokotajlo's review of Carlsmith's power-seeking AI report (see this EA Forum post for other reviews).

JC: I do think that the question of how much probability mass you concentrate on APS-AI by 2030 is helpful to bring out – it’s something I’d like to think more about (timelines wasn’t my focus in this report’s investigation), and I appreciate your pushing the consideration. 

I read over your post on +12 OOMs, and thought a bit about your argument here. One broad concern I have is that it seems like it rests a good bit (though not entirely) on a “wow a trillion times more compute is just so much isn’t it” intuition pump about how AI capabilities scale with compute inputs, where the intuition has a quasi-quantitative flavor, and gets some force from some way in which big numbers can feel abstractly impressive (and from being presented in a context of enthusiasm about the obviousness of the conclusion), but in fact isn’t grounded in much. I’d be interested, for example, to see how this methodology looks if you try running it in previous eras without the benefit of hindsight (e.g., what % do you want on each million-fold scale up in compute-for-the-largest-AI-experiment). That said, maybe this ends up looking OK in previous eras too, and regardless, I do think this era is different in many ways: notably, getting in the range of various brain-related biological milestones, the many salient successes (which GPT-7 and OmegaStar extrapolate from), and the empirical evidence of returns to ML-style scaling. And I think the concreteness of the examples you provide is useful, and differentiating from mere hand-waves at big numbers.

Those worries aside, here’s a quick pass at some probabilities from the exercise, done for “2020 techniques” (I’m very much making these up as I go along, I expect them to change as I think more). 

A lot of the juice, for me, comes from GPT-7 and Omegastar as representatives of “short and low-end-of-medium-to-long horizon neural network anchors”, which seem to me the most plausible and the best-grounded quantitatively. 

  • In particular, I agree that if scaling up and fine-tuning multi-modal short-horizon systems works for the type of model sizes you have in mind, we should think that less than 1e35 FLOPs is probably enough – indeed, this is where a lot of my short-timelines probability comes from. Let’s say 35% on this.
  • It’s less clear to me what AlphaStar-style training of a human-brain-sized system on e.g. 30k consecutive Steam Games (plus some extra stuff) gets you, but I’m happy to grant that 1e35 FLOPs gives you a lot of room to play even with longer-horizon forms of training and evolution-like selection. Conditional on the previous bullet not working (which would update me against the general compute-centric, 2020-technique-enthusiast vibe here), let’s say another 40% that this works, so out of a remaining 65%, that’s 26% on top. 
  • I’m skeptical of Neuromorph (I think brain scanning with 2020 tech will be basically unhelpful in terms of reproducing useful brain stuff that you can’t get out of neural nets already, so whether the neuromorph route works is ~entirely correlated with whether the other neural net routes work), and Skunkworks (e.g., extensive search and simulation) seems like it isn’t focused on APS-systems in particular and does worse on a “why couldn’t you have said this in previous eras” test (though maybe it leads to stuff that gets you APS systems – e.g., better hardware). Still, there’s presumably a decent amount of “other stuff” not explicitly on the radar here. Conditional on previous bullet points not working (again, an update towards pessimism), probability that “other stuff” works? Idk… 10%? (I’m thinking of the previous, ML-ish bullets as the main meat of “2020 techniques.”) So that would be 10% of a remaining 39%, so ~4%.

So overall 35%+26%+4% =~65% on 1e35 FLOPs gets you APS-AI using “2020 techniques” in principle? Not sure how I’ll feel about this on more reflection + consistency checking, though. Seems plausible that this number would push my overall p(timelines) higher (they’ve been changing regardless since writing the report), which is points in favor of your argument, but it also gives me pause about ways the exercise might be distorting. In particular, I worry the exercise (at least when I do it) isn’t actually working with a strong model of how compute translates into concrete results, or tracking other sources of uncertainty and/or correlation between these different paths (like uncertainty about brain-based estimates, scaling-centrism, etc – a nice thing about Ajeya’s model is that it runs some of this uncertainty through a monte carlo).

OK, what about 1e29 or less? I’ll say: 25%. (I think this is compatible with a reasonable degree of overall smoothness in distributing my 65% across my OOMs).

In general, though, I’d also want to discount in a way that reflects the large amount of engineering hassle, knowledge build-up, experiment selection/design, institutional buy-in, other serial-time stuff, data collection, etc required for the world to get into a position where it’s doing this kind of thing successfully by 2030, even conditional on 1e29 being enough in principle (I also don’t take $50B on a single training run by 2030 for granted even if in worlds where 1e29 is enough, though I grant that WTP could also go higher). Indeed, this kind of thing in general makes the exercise feel a bit distorting to me. E.g., “today’s techniques” is kind of ambiguous between “no new previously-super-secret sauce” and “none of the everyday grind of figuring out how to do stuff, getting new non-breakthrough results to build on and learn from, gathering data, building capacity, recruiting funders and researchers, etc” (and note also that in the world you’re imagining, it’s not that our computers are a million times faster; rather, a good chunk of it is that people have become willing to spend much larger amounts on gigantic training runs – and in some worlds, they may not see the flashy results you’re expecting to stimulate investment unless they’re willing to spend a lot in the first place). Let’s cut off 5% for this. 

So granted the assumptions you list about compute availability, this exercise puts me at ~20% by 2030, plus whatever extra from innovations in techniques we don’t think of as covered via your 1-2 OOMs of algorithmic improvement assumption. This feels high relative to my usual thinking (and the exercise leaves me with some feeling that I’m taking a bunch of stuff in the background for granted), but not wildly high.
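
As a quick sanity check, the conditional-probability arithmetic in the quoted passage does add up (this is just how I read Carlsmith's decomposition; the percentages are his):

```python
# Sanity check of the quoted decomposition for P(1e35 FLOP suffices with "2020 techniques").
p_scaling = 0.35                                 # multi-modal short-horizon scaling works
p_alphastar = 0.40 * (1 - p_scaling)             # 40%, conditional on the first path failing
p_other = 0.10 * (1 - p_scaling - p_alphastar)   # 10%, conditional on the first two failing
total = p_scaling + p_alphastar + p_other

print(round(p_alphastar, 3), round(p_other, 3), round(total, 3))
# -> 0.26 0.039 0.649, matching the quoted "35% + 26% + 4% =~ 65%"
```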