Paul Christiano's views on "doom" (video explainer) 2023-09-29T21:56:01.069Z
Neel Nanda on the Mechanistic Interpretability Researcher Mindset 2023-09-21T19:47:02.745Z
Panel with Israeli Prime Minister on existential risk from AI 2023-09-18T23:16:39.965Z
Eric Michaud on the Quantization Model of Neural Scaling, Interpretability and Grokking 2023-07-12T22:45:45.753Z
Jesse Hoogland on Developmental Interpretability and Singular Learning Theory 2023-07-06T15:46:00.116Z
Should AutoGPT update us towards researching IDA? 2023-04-12T16:41:13.735Z
Collin Burns on Alignment Research And Discovering Latent Knowledge Without Supervision 2023-01-17T17:21:40.189Z
Victoria Krakovna on AGI Ruin, The Sharp Left Turn and Paradigms of AI Alignment 2023-01-12T17:09:03.431Z
David Krueger on AI Alignment in Academia, Coordination and Testing Intuitions 2023-01-07T19:59:09.785Z
Ethan Caballero on Broken Neural Scaling Laws, Deception, and Recursive Self Improvement 2022-11-04T18:09:04.759Z
Shahar Avin On How To Regulate Advanced AI Systems 2022-09-23T15:46:46.155Z
Katja Grace on Slowing Down AI, AI Expert Surveys And Estimating AI Risk 2022-09-16T17:45:47.341Z
Alex Lawsen On Forecasting AI Progress 2022-09-06T09:32:54.071Z
Robert Long On Why Artificial Sentience Might Matter 2022-08-28T17:30:34.410Z
Ethan Perez on the Inverse Scaling Prize, Language Feedback and Red Teaming 2022-08-24T16:35:43.086Z
Connor Leahy on Dying with Dignity, EleutherAI and Conjecture 2022-07-22T18:44:19.749Z
Raphaël Millière on Generalization and Scaling Maximalism 2022-06-24T18:18:10.503Z
Blake Richards on Why he is Skeptical of Existential Risk from AI 2022-06-14T19:09:26.783Z
Ethan Caballero on Private Scaling Progress 2022-05-05T18:32:18.673Z
Why Copilot Accelerates Timelines 2022-04-26T22:06:19.507Z
OpenAI Solves (Some) Formal Math Olympiad Problems 2022-02-02T21:49:36.722Z
Phil Trammell on Economic Growth Under Transformative AI 2021-10-24T18:11:17.694Z
The Codex Skeptic FAQ 2021-08-24T16:01:18.844Z
Evan Hubinger on Homogeneity in Takeoff Speeds, Learned Optimization and Interpretability 2021-06-08T19:20:25.977Z
What will GPT-4 be incapable of? 2021-04-06T19:57:57.127Z
An Increasingly Manipulative Newsfeed 2019-07-01T15:26:42.566Z
Book Review: AI Safety and Security 2018-08-21T10:23:24.165Z
Human-Aligned AI Summer School: A Summary 2018-08-11T08:11:00.789Z
A Gym Gridworld Environment for the Treacherous Turn 2018-07-28T21:27:34.487Z


Comment by Michaël Trazzi (mtrazzi) on Stampy's AI Safety Info - New Distillations #4 [July 2023] · 2023-08-16T20:57:56.418Z · LW · GW

Thanks for the work!

Quick questions:

  • do you have any stats on how many people visit every month? How many people end up wanting to get involved as a result?
  • is anyone trying to fine-tune an LLM on Stampy's Q&A (probably not enough data, but one could add other datasets) to get an alignment chatbot? Passing things into a large Claude 2 context window might also work.
Comment by Michaël Trazzi (mtrazzi) on Jesse Hoogland on Developmental Interpretability and Singular Learning Theory · 2023-07-07T06:44:44.172Z · LW · GW

Thanks, should be fixed now.

Comment by Michaël Trazzi (mtrazzi) on Statement on AI Extinction - Signed by AGI Labs, Top Academics, and Many Other Notable Figures · 2023-06-01T03:56:15.104Z · LW · GW

FYI, your Epoch literature review link is currently pointing to

Comment by Michaël Trazzi (mtrazzi) on Clarifying and predicting AGI · 2023-05-09T22:23:39.259Z · LW · GW

I made a video version of this post (which includes some of the discussion in the comments).

Comment by Michaël Trazzi (mtrazzi) on My views on “doom” · 2023-04-28T17:09:22.573Z · LW · GW

I made another visualization using a Sankey diagram, which solves the problem of not really knowing how things split (across different takeover scenarios) and allows you to recombine probabilities at the end (e.g. for "most humans die after 10 years").
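A minimal sketch of that "recombine at the end" idea (all numbers below are made up for illustration, not the estimates from the post or the diagram): each branch carries some probability mass, and the terminal outcome's probability is recovered by summing the flows back together, which is exactly what a Sankey diagram lets you read off.

```python
# Hypothetical branch probabilities (illustrative only).
branches = {
    "ai takeover, scenario A": 0.10,
    "ai takeover, scenario B": 0.05,
    "human-directed catastrophe": 0.07,
}

# Fraction of each branch flowing into the shared terminal node
# "most humans die within 10 years" (also illustrative).
flow_to_terminal = {
    "ai takeover, scenario A": 0.8,
    "ai takeover, scenario B": 0.5,
    "human-directed catastrophe": 0.9,
}

# Recombine: total probability of the terminal outcome across branches.
p_terminal = sum(branches[k] * flow_to_terminal[k] for k in branches)
print(round(p_terminal, 3))  # → 0.168
```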

Comment by Michaël Trazzi (mtrazzi) on Should AutoGPT update us towards researching IDA? · 2023-04-12T23:22:20.339Z · LW · GW

The evidence I'm interested in goes something like:

  • we have more empirical ways to test IDA
  • it seems like future systems will decompose / delegate tasks to sub-agents, so if we think either 1) this will be an important part of the final model that successfully recursively self-improves, or 2) there are non-trivial chances that this leads us to AGI before we can try other things, maybe it's high EV to focus more on IDA-like approaches?
Comment by Michaël Trazzi (mtrazzi) on What can we learn from Lex Fridman’s interview with Sam Altman? · 2023-03-27T13:27:26.102Z · LW · GW

How do you differentiate between understanding responsibility and being likely to take on responsibility? Between empathising with other people who believe the risk is high and actively working on minimising the risk? Between saying that you are open to coordination and regulation and actually cooperating in a prisoner's dilemma when the time comes?

As a datapoint, SBF was the most vocal about being pro-regulation in the crypto space, fooling even regulators and many EAs, but when Kelsey Piper confronted him in DMs on the issue he admitted he had said this only for PR, because "fuck regulations".

Comment by Michaël Trazzi (mtrazzi) on Aspiring AI safety researchers should ~argmax over AGI timelines · 2023-03-03T08:11:20.973Z · LW · GW

[Note: written on a phone, quite rambly and disorganized]

I broadly agree with the approach, some comments:

  • people's timelines seem to be consistently updated in the same direction (getting shorter). If one were to make a plan based on current evidence, I'd strongly suggest considering how their timelines might shrink because of not having updated strongly enough in the past.
  • a lot of my conversations with aspiring AI safety researchers go something like "if timelines were that short I'd have basically no impact, that's why I'm choosing to do a PhD" or "[specific timelines report] gives X% of TAI by YYYY anyway". I believe people who choose to do research drastically underestimate the impact they could have in short-timelines worlds (esp. through under-explored non-research paths, like governance / outreach etc.) and overestimate the probability of AI timelines reports being right.
  • as you said, it makes sense to consider plans that work in short timelines and improve things in medium/long timelines as well. Thus you might actually want to estimate the EV of a research policy for 2023-2027 (A), 2027-2032 (B) and 2032-2042 (C), where by policy I mean you apply a strategy for A alone and update if there's no AGI in 2027, or you apply a strategy for A+B and update in 2032, etc.
  • it also makes sense to consider who could help you with your plan. If you plan to work at Anthropic, OpenAI, Conjecture, etc., it seems that many people there take the 2027 scenario seriously, and teams there would be working on short-timelines agendas no matter what.
  • if you'd have 8x more impact in a long-timelines scenario than in short timelines, but consider short timelines only 7x more likely, working as if long timelines were true would create a lot of cognitive dissonance, which could turn out to be counterproductive.
  • if everyone were doing this and going into PhDs, the community would end up producing less research now, and therefore have less research for the ML community to interact with in the meantime. It would also reduce the amount of low-quality research, and admittedly during a PhD one would also publish papers, which may be a better way to attract more academics to the field.
  • one should stress the importance of testing for personal fit early on. If you think you'd be a great researcher in 10 years but have never tried research, consider doing internships / publishing research before going through the grad school pipeline. Also, a PhD can be a lonely path and unproductive for many. Especially if the goal is to do AI safety research, test the fit for direct work as early as possible (alignment research is surprisingly more pre-paradigmatic than mainstream ML research).
Comment by Michaël Trazzi (mtrazzi) on Spreading messages to help with the most important century · 2023-01-26T04:57:13.726Z · LW · GW

meta: it seems like the collapse feature doesn't work on mobile, and the table is hard to read (especially the first column)

Comment by Michaël Trazzi (mtrazzi) on Collin Burns on Alignment Research And Discovering Latent Knowledge Without Supervision · 2023-01-18T00:33:28.492Z · LW · GW

That sounds right, thanks!

Comment by Michaël Trazzi (mtrazzi) on Victoria Krakovna on AGI Ruin, The Sharp Left Turn and Paradigms of AI Alignment · 2023-01-13T00:42:30.209Z · LW · GW

Fixed, thanks!

Comment by Michaël Trazzi (mtrazzi) on All AGI Safety questions welcome (especially basic ones) [~monthly thread] · 2022-11-03T23:16:06.408Z · LW · GW

Use the dignity heuristic as reward shaping

“There's another interpretation of this, which I think might be better where you can model people like AI_WAIFU as modeling timelines where we don't win with literally zero value. That there is zero value whatsoever in timelines where we don't win. And Eliezer, or people like me, are saying, 'Actually, we should value them in proportion to how close to winning we got'. Because that is more healthy... It's reward shaping! We should give ourselves partial reward for getting partially the way. He says that in the post, how we should give ourselves dignity points in proportion to how close we get.

And this is, in my opinion, a much psychologically healthier way to actually deal with the problem. This is how I reason about the problem. I expect to die. I expect this not to work out. But hell, I'm going to give it a good shot and I'm going to have a great time along the way. I'm going to spend time with great people. I'm going to spend time with my friends. We're going to work on some really great problems. And if it doesn't work out, it doesn't work out. But hell, we're going to die with some dignity. We're going to go down swinging.”

Comment by Michaël Trazzi (mtrazzi) on Katja Grace on Slowing Down AI, AI Expert Surveys And Estimating AI Risk · 2022-09-17T01:31:53.291Z · LW · GW

Thanks for the feedback! Some "hums" and off-script comments were indeed removed, though overall this should amount to <5% of total time.

Comment by Michaël Trazzi (mtrazzi) on Understanding Conjecture: Notes from Connor Leahy interview · 2022-09-16T01:31:27.513Z · LW · GW

Great summary!

You can also find some quotes of our conversation here:

Comment by Michaël Trazzi (mtrazzi) on Connor Leahy on Dying with Dignity, EleutherAI and Conjecture · 2022-07-24T21:11:49.566Z · LW · GW

I like this comment, and I personally think the framing you suggest is useful. I'd like to point out that, funnily enough, in the rest of the conversation (not in the quotes, unfortunately) he says something about the dying-with-dignity heuristic being useful because humans are (generally) not able to reason about quantum timelines.

Comment by Michaël Trazzi (mtrazzi) on Connor Leahy on Dying with Dignity, EleutherAI and Conjecture · 2022-07-24T07:06:54.457Z · LW · GW

First point: by "really want to do good" (the "really" is important here) I mean someone who would be fundamentally altruistic and would not have any status/power desire, even subconsciously.

I don't think Conjecture is an "AGI company", everyone I've met there cares deeply about alignment and their alignment team is a decent fraction of the entire company. Plus they're funding the incubator.

I think it's also a misconception that it was a unilateralist intervention. They talked to other people in the community before starting it; it was not a secret.

Comment by Michaël Trazzi (mtrazzi) on Connor Leahy on Dying with Dignity, EleutherAI and Conjecture · 2022-07-23T19:04:21.304Z · LW · GW

tl;dr: people change their minds, the reasons why things happen are complex, we should adopt a forgiving mindset / align AI, and long-term impact is hard to measure. At the bottom I try to put numbers on EleutherAI's impact and find it was plausibly net positive.

I don't think discussing whether someone really wants to do good or whether there is some (possibly unconscious?) status-optimization process is going to help us align AI.

The situation is often mixed for a lot of people, and it evolves over time. The culture we need here to solve AI existential risk needs to be more forgiving. Imagine an ML professor who has been publishing papers advancing the state of the art for 20 years suddenly goes "Oh, actually alignment seems important, I changed my mind". Would you write a LW post condemning them, and another lengthy comment about their status-seeking behavior in publishing papers just to become a better professor?

I recently talked to an OpenAI employee who met Connor something like three years ago, when the whole "reproducing GPT-2" thing came about. He mostly remembered things like the model not having been benchmarked carefully enough. Sure, it did not perform nearly as well on a lot of metrics, though that's kind of missing the point of how this actually happened. As Connor explains, he did not know this would go anywhere, and spent something like two weeks working on it, without much DL experience. He ended up being convinced by some MIRI people not to release it, since this would establish a "bad precedent".

I like to think that people can start with a wrong model of what is good and then update in the right direction. Yes, starting yet another "open-sourcing GPT-3" endeavor the next year is not evidence of having completely updated towards "let's minimize the risk of advancing capabilities research at all costs", though I do think that some fraction of people at EleutherAI truly care about alignment and just did not think that the marginal impact of "GPT-Neo/-J accelerating AI timelines" justified not publishing them at all.

My model of what happened with EleutherAI is mostly "when all you have is a hammer, everything looks like a nail". Like, you've reproduced GPT-2 and you have access to lots of compute, so why not try GPT-3? And that's fine. Who knew the thing would become a Discord server with thousands of people talking about ML? That they would somewhat succeed? And then, when the thing is already somewhat on the rails, what choice do you even have? Delete the server? Tell the people who have been working hard for months to open-source GPT-3-like models that "we should not publish it after all"? Sure, that would have minimized the risk of accelerating timelines. Though when I try to put numbers on it below, I find that it's not just "stop something clearly net negative"; it's much more nuanced than that.

And after talking to one of the guys who worked on GPT-J for hours, talking to Connor for three hours, and then having to replay what he said multiple times while editing the video/audio etc., I have a clearer sense of where they're coming from. I think a more productive way of making progress in the future is to look at what the positives and negatives were, and put numbers on what was plausibly net good and plausibly net bad, so we can focus on doing the good things in the future and maximize EV (not just minimize the risk of negatives!).

To be clear, I started the interview with a lot of questions about the impact of EleutherAI, and right now I have a lot more positive or mixed evidence for why it was not "certainly a net negative" (not saying it was certainly net positive). Here is my estimate of the impact of EleutherAI, where I measure things in my 80% likelihood interval for positive impact on aligning AI, and where the unit is "-1" for the negative impact of publishing the GPT-3 paper. E.g. (-2, -1) means: "an 80% chance that the impact was between 2x the GPT-3 paper and 1x the GPT-3 paper".

Mostly Negative
-- Publishing the Pile: (-0.4, -0.1) (AI labs, including top ones, use the Pile to train their models)
-- Making ML researchers more interested in scaling: (-0.1, -0.025) (GPT-3 spread the scaling meme, not EleutherAI)
-- The potential harm that might arise from the next models that might be open-sourced in the future using the current infrastructure: (-1, -0.1) (it does seem that they're open to open-sourcing more stuff, although plausibly more careful)

-- Publishing GPT-J: (-0.4, 0.2) (easier to finetune than GPT-Neo, some people use it, though admittedly it was not SoTA when it was released. Top AI labs had supposedly better models. Interpretability / Alignment people, like at Redwood, use GPT-J / GPT-Neo models to interpret LLMs)

Mostly Positive
-- Making ML researchers more interested in alignment: (0.2, 1) (cf. the part when Connor mentions ML professors moving to alignment somewhat because of Eleuther) 
-- Four of the five core people of EleutherAI changing their career to work on alignment, some of them setting up Conjecture, with tacit knowledge of how these large models work: (0.25, 1)
-- Making alignment people more interested in prosaic alignment: (0.1, 0.5)
-- Creating a space with a strong rationalist and ML culture where people can talk about scaling and where alignment is high-status and alignment people can talk about what they care about in real-time + scaling / ML people can learn about alignment: (0.35, 0.8)

Summing these up I get (if one could just add confidence intervals, which I know is not how probability works) an 80% chance of the impact being in (-1, 3.275), so plausibly net good.
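The back-of-the-envelope total can be reproduced by naively adding the interval endpoints listed above (which, as noted, is not how probabilities actually compose):

```python
# 80% intervals from the estimates above, in units of "impact of
# publishing the GPT-3 paper" (negative = harmful for alignment).
negative = [(-0.4, -0.1), (-0.1, -0.025), (-1, -0.1), (-0.4, 0.2)]
positive = [(0.2, 1), (0.25, 1), (0.1, 0.5), (0.35, 0.8)]

# Naive endpoint sums; NOT a valid way to combine confidence intervals,
# just the arithmetic behind the (-1, 3.275) figure.
low = round(sum(lo for lo, _ in negative + positive), 3)
high = round(sum(hi for _, hi in negative + positive), 3)
print((low, high))  # → (-1.0, 3.275)
```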

Comment by Michaël Trazzi (mtrazzi) on Connor Leahy on Dying with Dignity, EleutherAI and Conjecture · 2022-07-22T20:55:04.710Z · LW · GW

In their announcement post they mention:

Mechanistic interpretability research in a similar vein to the work of Chris Olah and David Bau, but with less of a focus on circuits-style interpretability  and more focus on research whose insights can scale to models with many billions of parameters and larger. Some example approaches might be: 

  • Locating and editing factual knowledge in a transformer language model.
  • Using deep learning to automate deep learning interpretability - for example, training a language model to give semantic labels to neurons or other internal circuits.
  • Studying the high-level algorithms that models use to perform e.g, in-context learning or prompt programming.
Comment by Michaël Trazzi (mtrazzi) on AI Forecasting: One Year In · 2022-07-05T21:05:20.032Z · LW · GW

I believe the forecasts were aggregated around June 2021. When was the GPT-2 fine-tune released? What about GPT-3 few-shot?

Re jumps in performance: Jack Clark has a screenshot on Twitter about saturated benchmarks from the Dynabench paper (2021); it would be interesting to make something up-to-date with MATH.

Comment by Michaël Trazzi (mtrazzi) on Raphaël Millière on Generalization and Scaling Maximalism · 2022-06-24T18:57:34.644Z · LW · GW

I think it makes sense (for him) to not believe AI x-risk is an important problem to solve (right now) if he believes that "fast enough" means "not in his lifetime", and he also puts a lot of moral weight on near-term issues. For completeness' sake, here are some claims more relevant to "not being able to solve the core problem".

1) From the part about compositionality, I believe he is making a point about the inability, within the current deep learning paradigm, to generate an image that would contradict the training set distribution:

Generating an image for the caption, a horse riding on an astronaut. That was the example that Gary Marcus talked about, where a human would be able to draw that because a human understand the compositional semantics of that input and current models are struggling also because of distributional statistics and in the image to text example, that would be for example, stuff that we've been seeing with Flamingo from DeepMind, where you look at an image and that might represent something very unusual and you are unable to correctly describe the image in the way that's aligned with the composition of the image. So that's the parsing problem that I think people are mostly concerned with when it comes to compositionality and AI.

2) From the part about generalization, he is saying that we are unable to build truly general systems. I do not agree with his claim, but if I were to steelman the argument it would be something like "even if it seems deep learning is making progress, Boston Dynamics is not using deep learning and there is no progress in the kind of generalization needed for the Wozniak test":

the Wozniak test, which was proposed by Steve Wozniak, which is building a system that can walk into a room, find the coffee maker and brew a good cup of coffee. So these are tasks or capacities that require adapting to novel situations, including scenarios that were not foreseen by the programmers where, because there are so many edge cases in driving, or indeed in walking into an apartment, finding a coffee maker of some kind and making a cup of coffee. There are so many potential edge cases. And, this very long tail of unlikely but possible situations where you can find yourself, you have to adapt more flexibly to this kind of thing.


But I don't know whether that would even make sense, given the other aspect of this test, which is the complexity of having a dexterous robot that can manipulate objects seamlessly and the kind of thing that we're still struggling with today in robotics, which is another interesting thing that, we've made so much progress with disembodied models and there are a lot of ideas flying around with robotics, but in some respect, the state of the art in robotics where the models from Boston Dynamics are not using deep learning, right?

Comment by Michaël Trazzi (mtrazzi) on The inordinately slow spread of good AGI conversations in ML · 2022-06-22T01:31:24.016Z · LW · GW

I have never thought of such a race. I think this comment is worth its own post.

Comment by Michaël Trazzi (mtrazzi) on Where I agree and disagree with Eliezer · 2022-06-22T01:26:15.600Z · LW · GW

Datapoint: I skimmed through Eliezer's post, but read this one from start to finish in one sitting. This post was for me the equivalent of reading the review of a book I haven't read, where you get all the useful points and nuance. I can't stress enough how useful that was for me. Probably the most insightful post I have read since "Are we in AI overhang".

Comment by Michaël Trazzi (mtrazzi) on Blake Richards on Why he is Skeptical of Existential Risk from AI · 2022-06-15T09:05:23.657Z · LW · GW

Thanks for bringing up the rest of the conversation. It is indeed unfortunate that I cut certain quotes out of their full context. For completeness' sake, here is the full excerpt without interruptions, including my prompts. Emphasis mine.

Michaël: Got you. And I think Yann LeCun’s point is that there is no such thing as AGI because it’s impossible to build something truly general across all domains.

Blake: That’s right. So that is indeed one of the sources of my concerns as well. I would say I have two concerns with the terminology AGI, but let’s start with Yann’s, which he’s articulated a few times. And as I said, I agree with him on it. We know from the no free lunch theorem that you cannot have a learning algorithm that outperforms all other learning algorithms across all tasks. It’s just an impossibility. So necessarily, any learning algorithm is going to have certain things that it’s good at and certain things that it’s bad at. Or alternatively, if it’s truly a Jack of all trades, it’s going to be just mediocre at everything. Right? So with that reality in place, you can say concretely that if you take AGI to mean literally good at anything, it’s just an impossibility, it cannot exist. And that’s been mathematically proven.

Blake: Now, all that being said, the proof for the no free lunch theorem, refers to all possible tasks. And that’s a very different thing from the set of tasks that we might actually care about. Right?

Michaël: Right.

Blake: Because the set of all possible tasks will include some really bizarre stuff that we certainly don’t need our AI systems to do. And in that case, we can ask, “Well, might there be a system that is good at all the sorts of tasks that we might want it to do?” Here, we don’t have a mathematical proof, but again, I suspect Yann’s intuition is similar to mine, which is that you could have systems that are good at a remarkably wide range of things, but it’s not going to cover everything you could possibly hope to do with AI or want to do with AI.

Blake: At some point, you’re going to have to decide where your system is actually going to place its bets as it were. And that can be as general as say a human being. So we could, of course, obviously humans are a proof of concept that way. We know that an intelligence with a level of generality equivalent to humans is possible and maybe it’s even possible to have an intelligence that is even more general than humans to some extent. I wouldn’t discount it as a possibility, but I don’t think you’re ever going to have something that can truly do anything you want, whether it be protein folding, predictions, managing traffic, manufacturing new materials, and also having a conversation with you about your grand’s latest visit that can’t be… There is going to be no system that does all of that for you.

Michaël: So we will have system that do those separately, but not at the same time?

Blake: Yeah, exactly. I think that we will have AI systems that are good at different domains. So, we might have AI systems that are good for scientific discovery, AI systems that are good for motor control and robotics, AI systems that are good for general conversation and being assistants for people, all these sorts of things, but not a single system that does it all for you.

Michaël: Why do you think that?

Blake: Well, I think that just because of the practical realities that one finds when one trains these networks. So, what has happened with, for example, scaling laws? And I said this to Ethan the other day on Twitter. What’s happened with scaling laws is that we’ve seen really impressive ability to transfer to related tasks. So if you train a large language model, it can transfer to a whole bunch of language-related stuff, very impressively. And there’s been some funny work that shows that it can even transfer to some out-of-domain stuff a bit, but there hasn’t been any convincing demonstration that it transfers to anything you want. And in fact, I think that the recent paper… The Gato paper from DeepMind actually shows, if you look at their data, that they’re still getting better transfer effects if you train in domain than if you train across all possible tasks.

Comment by Michaël Trazzi (mtrazzi) on Blake Richards on Why he is Skeptical of Existential Risk from AI · 2022-06-15T08:56:49.559Z · LW · GW

The goal of the podcast is to explore why people believe certain things by discussing their inside views about AI. In this particular case, the guest gives roughly three reasons for his views:

  • the no free lunch theorem showing why you cannot have a model that outperforms all other learning algorithms across all tasks.
  • the results from the Gato paper where models specialized in one domain are better (in that domain) than a generalist agent (the transfer learning, if any, did not lead to improved performance).
  • society as a whole being similar to some "general intelligence", with humans being the individual constituents who have a more specialized intelligence

If I were to steelman his point about humans being specialized, I think he basically meant that what happened with society is that we have many specialized agents, and that's probably what will happen as AIs automate our economy: AIs specialized in one domain will be better than general ones at specific tasks.

He is also saying that, with respect to general agents, we have evidence from humans, the impossibility result from the no free lunch theorem, and basically no evidence for anything in between. For current models, there is evidence of positive transfer on NLP tasks, but less evidence for a broad set of tasks like in Gato.

The best version of the "different levels of generality" argument I can think of (though I don't buy it) goes something like: "The reason humans are able to do impressive things like building smartphones is because they are multiple specialized agents who teach other humans what they have done before they die. No human alive today could build the latest iPhone from scratch, yet as a society we build it. It is not clear that a single ML model that is never turned off would be trivially capable of learning to do virtually everything needed to build a smartphone, spaceships, and whatever else humans have not yet discovered is necessary to expand through space. And even if that is a possibility, what will most likely happen (and sooner) is a society full of many specialized agents (cf. CAIS)."

Comment by Michaël Trazzi (mtrazzi) on Why all the fuss about recursive self-improvement? · 2022-06-13T19:57:20.604Z · LW · GW

For people who think that agenty models of recursive self-improvement do not fully apply to the current approach of training large neural nets, you could consider {Human, AI} systems as already recursively self-improving through tools like Copilot.

Comment by Michaël Trazzi (mtrazzi) on AGI Safety FAQ / all-dumb-questions-allowed thread · 2022-06-09T08:39:49.177Z · LW · GW

I believe the Counterfactual Oracle uses the same principle

Comment by Michaël Trazzi (mtrazzi) on Shortform · 2022-06-06T10:09:43.461Z · LW · GW

I think the best way to look at it is as climate change way before it was mainstream.

Comment by Michaël Trazzi (mtrazzi) on On saving one's world · 2022-05-17T21:01:43.811Z · LW · GW

I found the concept of flailing and becoming what works useful.

I think the world will be saved by a diverse group of people. Some will be high-integrity groups, others will be playful intellectuals, but the most important ones (the ones I think we currently need the most) will lead, take risks, and explore new strategies.

In that regard, I believe we need more posts like lc's containment strategy one, or the one about pulling the fire alarm for AGI, even if those plans are different from the ones the community has tried so far. Integrity alone will not save the world. A more diverse portfolio might.

Comment by Michaël Trazzi (mtrazzi) on Why I'm Optimistic About Near-Term AI Risk · 2022-05-17T13:40:58.887Z · LW · GW

Note: I updated the parent comment to take into account interest rates.

In general, the way to mitigate the trust issue would be to use an escrow, though when betting on doom-ish scenarios there would be little benefit in having $1000 in escrow if I "win".

For anyone reading this who also thinks it would need to be >$2000 to be worth it, I am happy to give $2985 at the end of 2032, i.e. an additional 10% on top of the average annual return of the S&P 500 (i.e. 1.1 * (1.105^10 * 1000)), if that sounds less risky than the SPY ETF bet.
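The $2985 figure follows from the arithmetic in the parenthesis, assuming a 10.5% average annual S&P 500 return over the 10 years:

```python
# Payout sketch: $1000 compounded at an assumed 10.5% average annual
# S&P 500 return for 10 years, plus an extra 10% on top.
principal = 1000
spy_value_2032 = principal * 1.105 ** 10  # ≈ $2714 after 10 years
payout = 1.1 * spy_value_2032             # add 10% on top
print(round(payout))  # → 2985
```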

Comment by Michaël Trazzi (mtrazzi) on Why I'm Optimistic About Near-Term AI Risk · 2022-05-17T09:48:47.163Z · LW · GW

For any of those (supposedly) >50% of respondents claiming a <10% probability, I am happy to take a 1:10 odds $1000 bet on:

"by the end of 2032, fewer than a million humans are alive on the surface of the earth, primarily as a result of AI systems not doing/optimizing what the people deploying them wanted/intended"

Where, similar to Bryan Caplan's bet with Yudkowsky, I get paid $1000 now, and at the end of 2032 I give it back, adding 100 dollars.

(Given inflation and interest, this seems like a bad deal for the one giving the money now, though I find it hard to predict 10-year inflation and I do not want the extra pressure of investing those $1000 for 10 years. If someone has another deal in mind that sounds more interesting, let me know here or by DM.)

To make the bet fair, the size of the bet would be the equivalent of the value in 2032 of $1000 worth in SPY ETF bought today (400.09 at May 16 close). And to mitigate the issue of not being around to receive the money, I would receive a payment of $1000 now. If I lose I give back whatever $1000 of SPY ETF from today is worth in 2032, adding 10% to that value.

Comment by Michaël Trazzi (mtrazzi) on Why I'm Optimistic About Near-Term AI Risk · 2022-05-17T09:31:40.178Z · LW · GW

Thanks for the survey. A few nitpicks:
- the survey you mention is ~1y old (May 3-May 26 2021). I would expect those researchers to have updated from the scaling laws trend continuing with Chinchilla, PaLM, Gato, etc. (Metaculus at least did update significantly, though one could argue that people taking the survey at CHAI, FHI, DeepMind etc. would be less surprised by the recent progress.)

- I would prefer the question to mention "1M humans alive on the surface of the earth" to avoid people surviving inside "mine shafts" or on Mars/the Moon (similar to the Bryan Caplan / Yudkowsky bet).

Comment by Michaël Trazzi (mtrazzi) on Why I'm Optimistic About Near-Term AI Risk · 2022-05-17T08:41:29.650Z · LW · GW

I can't see this path leading to high existential risk in the next decade or so.

Here is my write-up for a reference class of paths that could lead to high existential risk this decade. I think such paths are not hard to come up with and I am happy to pay a bounty of $100 for someone else to sit for one hour and come up with another story for another reference class (you can send me a DM).

Comment by Michaël Trazzi (mtrazzi) on Why I'm Optimistic About Near-Term AI Risk · 2022-05-17T08:37:05.306Z · LW · GW

Even if Tool AIs are not dangerous by themselves, they will foster productivity. (You say it yourself: "These specialized models will be oriented towards augmenting human productivity".) There are already many more people working in AI than in the 2010s, and those people are much more productive. This trend will accelerate, because AI benefits compound (e.g. using Copilot to write the next Copilot), and the more ML applications automate the economy, the more investment in AI we will observe.

Comment by Michaël Trazzi (mtrazzi) on "A Generalist Agent": New DeepMind Publication · 2022-05-13T07:58:57.416Z · LW · GW

from the lesswrong docs

An Artificial general intelligence, or AGI, is a machine capable of behaving intelligently over many domains. The term can be taken as a contrast to narrow AI, systems that do things that would be considered intelligent if a human were doing them, but that lack the sort of general, flexible learning ability that would let them tackle entirely new domains. Though modern computers have drastically more ability to calculate than humans, this does not mean that they are generally intelligent, as they have little ability to invent new problem-solving techniques, and their abilities are targeted in narrow domains.

If we consider only the first sentence, then yes. The rest of the paragraph points to something like "being able to generalize to new domains". Not sure if Gato counts. (NB: this is just a LW tag, not a full-fledged definition.)

Comment by Michaël Trazzi (mtrazzi) on Why Copilot Accelerates Timelines · 2022-05-01T18:37:32.331Z · LW · GW

The first two are about data, and as far as I know compilers do not use machine learning on data.

The third one could technically apply to compilers, though I think in ML there is a feedback loop "impressive performance -> investments in scaling -> more research", whereas you cannot just throw more compute at compiler performance (and compiler results are less mainstream, less of a public PR thing).

Comment by Michaël Trazzi (mtrazzi) on Why Copilot Accelerates Timelines · 2022-04-28T08:08:42.739Z · LW · GW

Well, I agree that if two worlds I had in mind were 1) foom without real AI progress beforehand 2) continuous progress, then seeing more continuous progress from increased investments should indeed update me towards 2).

The key parameter here is substitutability between capital and labor: in what sense is human labor the bottleneck, and in what sense is capital the bottleneck. From the substitutability equations you can infer different growth trajectories. (For a paper / video on this, see the last paragraph here.)
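For concreteness, the standard CES (constant elasticity of substitution) production function used in this kind of growth model looks something like the following (a generic textbook form, not necessarily the exact equation from the paper):

```latex
Y = A \left[ \alpha K^{\rho} + (1-\alpha) L^{\rho} \right]^{1/\rho},
\qquad \sigma = \frac{1}{1-\rho}
```

where σ is the elasticity of substitution between capital K and labor L. As σ → ∞ (i.e. ρ → 1), AI-powered capital can fully substitute for human labor, and growth stops being bottlenecked by labor.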

The world in which DALL-E 2 happens and people start using GitHub Copilot looks to me like a world where human labour is substitutable by AI labour, which right now essentially means being part of the GitHub Copilot open beta, but in the future might look like capital (paying for the product, or investing in building the technology yourself). My intuition right now is that big companies are more bottlenecked by ML talent than by capital (cf. the "are we in an AI overhang" post explaining how much more capital Google could invest in AI).

Comment by Michaël Trazzi (mtrazzi) on Why Copilot Accelerates Timelines · 2022-04-27T08:21:51.364Z · LW · GW

Thanks for the pointer. Any specific section / sub-section I should look into?

Comment by Michaël Trazzi (mtrazzi) on Why Copilot Accelerates Timelines · 2022-04-27T08:18:59.077Z · LW · GW

I agree that we are already in this regime. In the section "AI Helping Humans with AI" I tried to make more precise at what threshold we would see a substantial change in how humans interact with AI to build more advanced AI systems. Essentially, it will be when most people use those tools most of the time (like on a daily basis) and observe substantial gains in productivity (like using some oracle to make a lot of progress on a problem they are stuck on, or Copilot auto-completing a lot of their lines of code without manual editing). The intuition for a threshold is "most people would need to use it".

Re diminishing returns: see my other comment. In summary, if you just consider one team building AIHHAI, they would get more data and research as input from the outside world, and they would get increases in productivity from using more capable AIHHAIs. Diminishing returns could happen if: 1) scaling laws for coding AI no longer hold, 2) we are not able to gather coding data (or do other tricks like data augmentation) at a high enough pace, 3) investments for some reason do not follow, or 4) there are hardware bottlenecks in building larger and larger infrastructure. For now I have only seen evidence for 2), and this seems like something that can be solved via transfer learning or new ML research.

Better modeling of those different interactions between AI labor and AI capability tech are definitely needed. For some high-level picture that mostly thinks about substitutability between capital and labor, applying to AI, I would recommend this paper (or video and slides). The equation that is the closest to self-improving {H,AI} would be this one.

Comment by Michaël Trazzi (mtrazzi) on Why Copilot Accelerates Timelines · 2022-04-27T07:55:20.994Z · LW · GW

Some arguments for why that might be the case:

-- the more useful it is, the more people use it, the more telemetry data the model has access to

-- while scaling laws do not exhibit diminishing returns, most of the development time would be spent on things like infrastructure, data collection and training, rather than on squeezing out additional performance

-- the higher the performance, the more people get interested in the field, and the more research there is publicly accessible to improve performance just by implementing what is in the literature (Note: this argument does not apply to a company making a lot of progress without ever sharing any of it.)

Comment by Michaël Trazzi (mtrazzi) on April 2022 Welcome & Open Thread · 2022-04-22T07:54:11.467Z · LW · GW

When logging in to the AF using GitHub, I get an Internal Server Error (callback).

Comment by Michaël Trazzi (mtrazzi) on jsd's Shortform · 2022-04-19T17:12:37.883Z · LW · GW

Great quotes. Posting podcast excerpts is underappreciated. Happy to read more of them.

Comment by Michaël Trazzi (mtrazzi) on Daniel Kokotajlo's Shortform · 2022-04-19T08:27:22.253Z · LW · GW

The straightforward argument goes like this:

1. a human-level AGI would be running on hardware that makes human constraints on memory and speed mostly go away, by ~10 orders of magnitude

2. if you could store 10 orders of magnitude more information and read 10 orders of magnitude faster, and if you were able to copy your own code somewhere else, and the kind of AI research and code generation tools available online were good enough to have created you, wouldn't you be able to FOOM?

Comment by Michaël Trazzi (mtrazzi) on Code Generation as an AI risk setting · 2022-04-19T07:52:02.790Z · LW · GW

Code generation is the example I use to convince all of my software or AI friends of the likelihood of AI risk.

  • most of them have heard of it or even tried it (e.g. via GitHub Copilot)
  • they all recognize it as directly useful for their work
  • it's easier to understand labour automation when it applies to your own job

Comment by Michaël Trazzi (mtrazzi) on Matthew Barnett's Shortform · 2022-04-13T08:47:57.709Z · LW · GW

fast takeoff folks believe that we will only need a minimal seed AI that is capable of rewriting its source code, and recursively self-improving into superintelligence

Speaking only for myself, the minimal seed AI is a strawman of why I believe in "fast takeoff". In the list of benchmarks you mentioned in your bet, I think APPS is one of the most important.

I think the "self-improving" part will come from the system "AI Researchers + code synthesis model" with a direct feedback loop (modulo enough hardware), cf. here. That's the self-improving superintelligence.

Comment by Michaël Trazzi (mtrazzi) on A concrete bet offer to those with short AGI timelines · 2022-04-10T08:59:09.367Z · LW · GW

I haven't looked deeply at what the % on the ML benchmarks actually means. On the one hand, it would be a bit weird to me if by 2030 we still had not made enough progress on them, given the current rate. On the other hand, I trust the authors that passing those benchmarks should be AGI-ish, and then I don't want to bet money on something far into the future if money might not matter as much then. (Also, setting aside money mattering less or the money possibly not being delivered in 2030, I think anyone taking the 2026 bet should take the 2030 bet, since if you're 50/50 in 2026 you're probably 75/25 with 4 extra years.)

The more rational thing would then be to take the bet for 2026, when money still matters, though apart from the ML benchmarks there is this dishwashing condition where the terms of the bet are super tough, and I don't imagine anyone doing all the reliability tests, filming a dishwasher, etc. in 3.5y. And then for Tesla I feel the same about those big errors every 100k miles. Like, 1) why only Tesla? 2) wouldn't most humans make risky blunders over such long distances? 3) would anyone really run all those tests on a Tesla?

I'll have another look at the ML benchmarks, but in the meantime it seems that we should agree on different odds because of the Tesla + dishwasher conditions.

Comment by Michaël Trazzi (mtrazzi) on We Are Conjecture, A New Alignment Research Startup · 2022-04-08T13:00:31.769Z · LW · GW

Great news. What kind of products do you plan on releasing?

Comment by Michaël Trazzi (mtrazzi) on OpenAI Solves (Some) Formal Math Olympiad Problems · 2022-02-06T21:10:08.513Z · LW · GW

Oh right I had missed that comment. Edited the post to mention 8% instead. Thanks.

Comment by Michaël Trazzi (mtrazzi) on OpenAI Solves (Some) Formal Math Olympiad Problems · 2022-02-06T20:53:53.245Z · LW · GW

On the usefulness of proving theorems vs. writing them down: I think there's more of a back and forth. See for instance Nature's post on how DeepMind used an AI to guide intuition.

To me, it's going to help humans "babble" more in math by using some extension, like a GitHub Copilot but for math. And I feel that overall increasing math generation is more "positive for alignment in EV" than generating code, especially when considering agent foundations.

Besides, in ML the correct proofs can come many years after algorithms are used by everyone in the community (e.g. Adam's proof was shown to be wrong in 2018). Having a more grounded understanding of things could help both with interpretability and with having safety guarantees.

Comment by Michaël Trazzi (mtrazzi) on OpenAI Solves (Some) Formal Math Olympiad Problems · 2022-02-03T20:31:22.407Z · LW · GW

I think it's worth distinguishing how hard it is for a lean programmer to write the solution, how hard it is to solve the math problem in the first place, and how hard it is to write down an ML algorithm that spits out the right lean tactics.

Like, even if something can be written in a compact form, there might be only a dozen combinations of ~10 tokens that give a correct solution like nlinarith (b - a), ..., where by token I count "nlinarith", "sq_nonneg", "b", "-", "a", etc., and the actual search space for something of length 10 is probably ~(grammar size)^10, where the grammar is possibly of size 10-100. (Note: I don't know how traditional solvers perform on statements of that size; it's maybe not that hard.)
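A back-of-the-envelope version of that estimate (using the rough grammar-size bounds of 10 and 100 from the paragraph above):

```python
# Brute-force search space for a proof of ~10 tokens, assuming each
# token is drawn independently from a grammar of a given size.
proof_length = 10

for grammar_size in (10, 100):  # rough lower and upper bounds
    search_space = grammar_size ** proof_length
    print(f"grammar size {grammar_size}: ~{search_space:.0e} candidate proofs")
```

So even under these crude assumptions, the space ranges from ~10^10 to ~10^20 candidates, which is why naive enumeration is hopeless and some learned guidance over tactics matters.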

I agree that traditional methods work well for algebraic problems where proofs are short, and that AI doing search with nlinarith seems "dumb", but the real question here is whether OAI has found a method to solve such problems at scale.

As you said, the one-liner is not really convincing, but the multi-step solution, introducing a new axiom in the middle, seems like a general construction for solving all algebraic problems, and even more. (Though they do mention how the infinite action space and the lack of self-play limit scaling in general.)

I do agree with the general impression that it's not a huge breakthrough. To me, it's mostly an update like "look, two years after GPT-f, it's still hard, but we can solve a theorem that requires multiple steps with transformers now!".

Comment by Michaël Trazzi (mtrazzi) on OpenAI Solves (Some) Formal Math Olympiad Problems · 2022-02-03T19:54:47.833Z · LW · GW

If I understand correctly, you are saying that one can technically create an arbitrarily long axiom at any step, so the possibilities are infinite considering the action "add new axiom".

I think Gurkenglas was saying something like "you could break down creating a new axiom in the steps it takes to write an axiom, like the actions 'start axiom', and then add a symbol/token (according to the available grammar)."

OpenAI was possibly mentioning that the number of actions is unbounded, like you could create an axiom, which becomes part of your grammar, and then have one more action, etc.