## Posts

The Inside View #4–Sav Sidorov–Learning, Contrarianism and Robotics 2021-07-22T18:53:55.855Z
What will GPT-4 be incapable of? 2021-04-06T19:57:57.127Z
An Increasingly Manipulative Newsfeed 2019-07-01T15:26:42.566Z
Book Review: AI Safety and Security 2018-08-21T10:23:24.165Z
Human-Aligned AI Summer School: A Summary 2018-08-11T08:11:00.789Z
A Gym Gridworld Environment for the Treacherous Turn 2018-07-28T21:27:34.487Z

Comment by Michaël Trazzi (mtrazzi) on How will OpenAI + GitHub's Copilot affect programming? · 2021-07-01T21:42:30.548Z · LW · GW

Summary of the debate

1. jim originally said that copilot produces code with vulnerability, which, if used extensively, could generate loads of vulnerabilities, giving more opportunities for exploits overall. jim mentions it worsening "significantly" infosec

2. alex responds that given that the model tries to produce the code it was trained on, it will (by def.) produce average level code (with average level of vulnerability), so it won't change the situation "significantly" as the % of vulnerabilities per line of code produced (in the world) won't change much

3. vanessa asks if the absence of change from copilot results from a) lack of use b) lack of change in speed/vulnerability code production from using (ie. used as some fun help but without strong influence on the safety on the code as people would still be rigorous) c) change in speed/productivity, but not in the % of vulnerability

4. alex answers that indeed it makes users more productive and it helps him a lot, but that doesn't affect overall infosec in terms of % of vulnerability (same argument as 2). He nuances his claim a bit, saying that a) it would moderately affect outputs b) some stuff like cost will limit how much it affects those c) it won't change substantially at first (conjunction of two conditions).
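To make the crux between point 1 and points 2-4 concrete, here is a toy calculation. It is only a sketch with made-up numbers (the line counts and the 1% rate are assumptions, not figures from the thread): if Copilot raises output but leaves the per-line vulnerability rate unchanged, the percentage stays flat while the absolute number of vulnerable lines grows.

```python
def vulnerability_stats(lines_written: int, vuln_rate: float) -> tuple[float, float]:
    """Return (absolute number of vulnerable lines, share of vulnerable lines)."""
    vulnerable = lines_written * vuln_rate
    return vulnerable, vulnerable / lines_written


# Hypothetical numbers: the same 1% rate, but twice as much code once people use Copilot.
before = vulnerability_stats(1_000_000, 0.01)
after = vulnerability_stats(2_000_000, 0.01)

print(f"before: {before[0]:.0f} vulnerable lines ({before[1]:.1%})")
print(f"after:  {after[0]:.0f} vulnerable lines ({after[1]:.1%})")
```

Under these assumptions jim's worry (more exploitable code in total) and alex's claim (same % of vulnerabilities) are both true at once, which might be why the two seem to talk past each other.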

What I think is the implicit debate

i) I think jim kind of implicitly assumes that whenever someone writes code by himself, he would be forced to have good habits for security etc., and that whenever the code is automatically generated then people won't use their "security" muscles that much & will assume the AI produced clean work... which apparently (given the examples from jim) it does not by default. Like a Tesla not being safe enough at self-driving.

ii) I think what's missing from the debate is that the overall "infosec level" depends heavily on what a few key actors decide to do, those being in charge of safety-critical codebases for society-level tools (like nukes). So one argument could be that, although the masses might be more productive for prototyping etc., the actual infosec people might just still be as careful / not use it, so the overall important infosec won't change, and thus the overall infosec won't change.

iii) I think vanessa's point kind of re-states i) and disagrees with ii) by saying that everyone will use this anyway? Because by definition if it's useful it will change their code/habits, otherwise it's not useful?

iv) I guess alex's implicit points are that code generation with Language Models producing average human code was going to happen anyway & that saying it is a significant change is an overstatement, & we should probably just assume no drastic change in %vulnerability distribution at least for now.

Comment by mtrazzi on [deleted post] 2021-06-25T12:36:54.807Z

Sure, in this scenario I think "Atlantis" would count as "aliens" somehow. Anything that is not from 2021 humans really, like even humans who started their own private lab in the forest in 1900 and discovered new tech are "not part of humanity's knowledge". It's maybe worth distinguishing between "humans in 2021", "homo sapiens originated civilization not from 2021", "Earth but not homo sapiens" (eg Atlantis) and extraterrestrial life (aka "aliens").

As for why we should jump to alien civilizations being on Earth, there are arguments on how a sufficiently advanced civilization could go for a fast space colonization. Other answers to the Fermi paradox even consider alien civilizations to be around the corner but just inactive, and in that case one might consider that humans reaching some level of technological advancement might trigger some defense mechanism? I agree that this might fall into the conjunction fallacy and we may want to reject it using Occam's razor. However, I found the "inactive" theory to be one of the most "first principles" answers to Fermi's paradox out there, so the "defense mechanism" scenario might be worth considering (it's at least more reasonable than aliens visiting from another galaxy).

I guess there's also the unknown unknowns about how the laws of physics work: we've only been considering the limit to speed being the speed of light for about a century, so we might find ways of bypassing it (eg with wormholes) before the end of the universe.

Comment by mtrazzi on [deleted post] 2021-06-25T12:22:08.670Z

To be honest, I am not super excited about aliens being here, in the sense that if they are we are likely facing some post-AGI civilization and I don't really see strong reasons why they would be altruistic towards humans. Thus I believe I have the opposite bias, where I tried to find evidence for why the aliens are not here because the consequences would be too overwhelming if true. At the end of my journey on reddit I realized that I had found 1/2 plausible explanations vs. dozens of arguments for why the whole Shanghai thing was not due to CGI or shadowing (because of many locations, angles & the clouds not changing like they should). The whole thing seemed to go pretty much in the direction of aliens & I had done my motivated continuation, but I still felt like I had not fully accepted the possibility and faced the possible decision theoretic consequences, which explains this post.

Comment by mtrazzi on [deleted post] 2021-06-25T12:16:42.694Z

Thanks for the two insightful links! I've just added sources for Shanghai & other things I mention. For 2) in "something we consider impossible", I would add that we could split it in two: a) China/Russia are doing some secret projects & managed to get some decent advances in military/AI b) some unknown group has access to advanced technology that we don't currently know of (at least publicly). Estimating the probability of b) is similar to answering something like "what's the probability that we already have atom by atom clones" (see my discussion of this here); I would put something like 1e-6 to 1e-1 probability on it, or John Carmack's tweet on AGI.

I think we can also further split your "1 Aliens" into multiple sub-categories:

1a Aliens have sent something like von Neumann probes that found Earth by coincidence & for some reason (like better cameras / the internet) we seem to have them on record more often now.

1b Aliens are on purpose observing us & doing so more actively recently and that's why we find them more often. Possible reason would be that we're getting closer to an intelligence explosion.

1c Somehow the "aliens" are related to how human civilization started on earth or civilizations that we would become in the future (post-humans?) who are re-running history by "planting life" on earth.

Comment by Michaël Trazzi (mtrazzi) on Frequent arguments about alignment · 2021-06-23T11:49:51.614Z · LW · GW

Thanks for the post, it's a great idea to have both arguments.

My personal preference would be to have both arguments be the same length to properly compare the strength of the arguments (skeptic is one paragraph, advocate is 3-6x longer), and not always in the same order skeptic then advocate, but also advocate -> skeptic or even skeptic -> advocate -> skeptic -> ..., so it does not appear like one is the "haven't thought about it much" view.

Comment by Michaël Trazzi (mtrazzi) on Big picture of phasic dopamine · 2021-06-09T14:50:30.431Z · LW · GW

Right I just googled Marblestone and so you're approaching it with the dopamine side and not the acetylcholine. Without debating about words, their neuroscience paper is still at least trying to model the phasic dopamine signal as some RPE & the prefrontal network as an LSTM (IIRC), which is not acetylcholine based. I haven't read in detail this post & the one linked, I'll comment again when I do, thanks!

Comment by Michaël Trazzi (mtrazzi) on Big picture of phasic dopamine · 2021-06-09T07:44:52.625Z · LW · GW

Awesome post! I happen to also have tried to distill links between RPE and phasic dopamine in the "Prefrontal Cortex as a Meta-RL System" of this blog.

In particular I reference this paper on DL in the brain & this other one for RL in the brain. Also, I feel like the part 3 about links between RL and neuro of the RL book is a great resource for this.

Comment by Michaël Trazzi (mtrazzi) on Curated conversations with brilliant rationalists · 2021-06-01T10:13:47.831Z · LW · GW

for reference of how costly transcripts are, the first "speech-to-audio" conversion is about $1.25 per minute, and it could take 1x the time of the audio to fix the mistakes when both have native accents, and up to 2x the audio time for non-native speakers. For a 1h podcast, this would amount to $75 + hourly rate, so roughly $100/podcast. Additionally, there's a YT-generated-subtitles free alternative. I'm currently trying this out, I'll edit this to let you know how long it takes to fix them per audio hour.

Comment by Michaël Trazzi (mtrazzi) on Curated conversations with brilliant rationalists · 2021-06-01T10:06:04.328Z · LW · GW

great idea! blue yeti used to be a relatively cost-effective option ($100) for US/Canada. For Europe, I'd recommend the t.bone which comes with a suitcase, pop filter and support for $70 (including shipping). for headsets I'd recommend any studio one for about $50, such as the Audio Technica ones.

Comment by mtrazzi on [deleted post] 2021-05-05T22:28:52.443Z

Ace I'll try that too, thanks!

Comment by mtrazzi on [deleted post] 2021-05-05T22:28:26.811Z

Thanks for all of those tips. I'll definitely try rev!

Comment by mtrazzi on [deleted post] 2021-05-05T22:27:40.429Z

done! should be live in a few hours

Comment by mtrazzi on [deleted post] 2021-05-05T22:14:18.306Z

hey, dunno why it's obfuscated, here it is! https://anchor.fm/s/56df2194/podcast/rss

Comment by mtrazzi on [deleted post] 2021-05-04T22:19:34.616Z

Thanks for the feedback! I haven't really estimated how long it would take to have a transcript with speech-to-text + minor corrections, but that's definitely on the roadmap.

Re audio: cost of recording is probably like one hour (x2 if you have one guest). I think that if I were to write down the whole transcript without talking it would easily take me 4-10x the time it takes me to say it. I'm not sure how much worse the quality is though, but the way I see it conversation is essentially collaborative writing where you get immediate feedback about your flaws in reasoning. And even if I agree that a 1h podcast could be summarized in a few paragraphs, the use case is different (eg. people cooking, running, etc.) so it needs to be somewhat redundant because people are not paying attention.

Re not being interested in forecasting timelines: my current goal is to have people with different expertise share their insights on their particular field and how that could nuance our global understanding of technological progress. For instance, I had a 3h discussion with someone who did robotics competitions, and one planned with a neuroscience student turned ML engineer. I'm not that interested in "forecasting timelines" as an end goal, but more interested in digging into why people have those inside views about the future (assuming they unconsciously updated on things), so we can either destroy wrong initial reasons for believing something, or gain insight on the actual evidence behind those beliefs.

Anyway, I understand that there's a space about rigorous AI Alignment research discussions, which is currently being covered by AXRP, and the 80k podcasts also cover a lot of it, but it seems relatively low-cost to just record those conversations I would have anyway during conferences so people can decide by themselves what are the correct or bad arguments.

Comment by Michaël Trazzi (mtrazzi) on What will GPT-4 be incapable of? · 2021-04-07T20:57:15.417Z · LW · GW

sorry I meant a bot that plays random moves, not a randomly sampled go bot from KGS. agreed with GPT-4 not beating an average go bot

Comment by Michaël Trazzi (mtrazzi) on What will GPT-4 be incapable of? · 2021-04-07T20:56:02.589Z · LW · GW

If it's about explaining your answer with 5th grade gibberish then GPT-4 is THE solution for you! ;)

Comment by Michaël Trazzi (mtrazzi) on What will GPT-4 be incapable of? · 2021-04-07T18:40:04.563Z · LW · GW

let's say by concatenating your textbooks you get plenty of examples of  with "blablabla object sky blablabla gravity  blablabla  blabla". And then the exercise is: "blablabla object of mass blablabla thrown from the sky, what's the force? a) f=120 b) ... c) ... d) ...". Then what you need to do is just do some prompt programming at the beginning by "for looping answer" and teaching it to return either a, b, c or d. Now, I don't see any reason why a neural net couldn't approximate linear functions of two variables. It just needs to map words like "derivative of speed", "acceleration", "" to the same concept and then look at it with attention & multiply two digits.

Comment by Michaël Trazzi (mtrazzi) on What will GPT-4 be incapable of? · 2021-04-07T14:39:22.453Z · LW · GW

I think the general answer to testing seems AGI-complete in the sense that you should understand the edge-cases of a function (or correct output from "normal" input).

if we take the simplest testing case, let's say python using pytest, with typed code, with some simple test for each type (eg. 0 and 1 for integers, empty/random strings, etc.) then you could show it examples of how to generate tests from function names... but then you could also just do it with reg-ex, so I guess with hypothesis.
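As a rough illustration of that "simplest testing case", here is a sketch that derives edge-case inputs from type hints without any language model. The `EDGE_CASES` table, `generate_cases` and the `repeat` function are hypothetical names made up for this example; only the 0/1 and empty/random-string values come from the comment above.

```python
import inspect
import itertools
import typing

# Hypothetical table of "simple" values per type (0 and 1 for ints, empty/random strings).
EDGE_CASES = {
    int: [0, 1],
    str: ["", "random string"],
}


def generate_cases(func):
    """Return every combination of edge-case arguments for a fully typed function."""
    hints = typing.get_type_hints(func)
    params = list(inspect.signature(func).parameters)
    pools = [EDGE_CASES[hints[name]] for name in params]
    return list(itertools.product(*pools))


def repeat(text: str, times: int) -> str:
    """Toy typed function standing in for the code under test."""
    return text * times


def test_repeat_returns_str():
    # pytest would collect this function; it only checks the declared return type.
    for args in generate_cases(repeat):
        assert isinstance(repeat(*args), str)
```

This only covers the mechanical part (types to inputs). Knowing what the correct output should be for each input is the part that looks AGI-complete. The `hypothesis` library does the input-generation side far more thoroughly with its `@given` decorator.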

so maybe the right question to ask is: what do you expect GPT-4 to do better than GPT-3 relative to the train distribution (which will have maybe 1-2y of more github data) + context window? What's the bottleneck? When would you say "I'm pretty sure there's enough capacity to do it"? What are the few-shot examples you feed your model?

Comment by Michaël Trazzi (mtrazzi) on What will GPT-4 be incapable of? · 2021-04-07T14:31:05.330Z · LW · GW

well if we're doing a bet then at some point we need to "resolve" the prediction. so we ask GPT-4 the same physics question 1000 times and then some human judges count how many it got right. if it gets it right more than let's say 95% of the time (or any confidence interval), then we would resolve this positively. of course you could do more than 1000, and with the law of large numbers it should converge to the true probability of giving the right answer?
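One way to write down that resolution rule is sketched below. The simple rule (observed rate above 95%) is what the comment proposes; the stricter variant using a Wilson score interval is my own addition to illustrate the "or any confidence interval" remark, and the function names are made up for this example.

```python
import math


def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial proportion."""
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = p_hat + z**2 / (2 * trials)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2))
    return (centre - margin) / denom


def resolves_simple(successes: int, trials: int, threshold: float = 0.95) -> bool:
    """Resolve positively if the observed success rate exceeds the threshold."""
    return successes / trials > threshold


def resolves_strict(successes: int, trials: int, threshold: float = 0.95) -> bool:
    """Resolve positively only if the interval's lower bound clears the threshold."""
    return wilson_lower_bound(successes, trials) >= threshold


# 955 correct answers out of 1000 passes the simple rule but not the strict one.
print(resolves_simple(955, 1000), resolves_strict(955, 1000))
```

The two rules disagree near the threshold, so it is probably worth fixing which one applies before placing the bet.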

Comment by Michaël Trazzi (mtrazzi) on What will GPT-4 be incapable of? · 2021-04-07T14:20:46.169Z · LW · GW

re right prompt: GPT-3 has a context window of 2048 tokens, so this limits quite a lot what it could do. Also, it's not accurate at two-digit multiplication (what you would at least need to multiply your $ to %), even worse at 5-digit. So in this case, we're sure it can't do your taxes. And in the more general case, gwern wrote some debugging steps to check if the problem is GPT-3 or your prompt. Now, for GPT-4, given they keep scaling the same way, it won't be possible to have accurate enough digit multiplication (like 4-5 digits, cf. this thread) but with three more scalings it should do it. Prompt would be "here are a few examples on how to do tax multiplication and addition given my format, so please output result format", and concatenate those two. I'm happy to bet $1 1:1 on GPT-7 doing tax multiplication to 90% accuracy (given only integer precision).
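A minimal sketch of what "a few examples, then the query" and the 90% accuracy check could look like. The `a * b = c` line format and both function names are assumptions made for this example, and no model is actually called: the answers are passed in as plain strings.

```python
def build_prompt(examples: list[tuple[int, int]], query: tuple[int, int]) -> str:
    """Concatenate worked integer multiplications, then the unanswered query."""
    lines = [f"{a} * {b} = {a * b}" for a, b in examples]
    a, b = query
    lines.append(f"{a} * {b} =")
    return "\n".join(lines)


def accuracy(answers: list[str], queries: list[tuple[int, int]]) -> float:
    """Fraction of answers that match the exact integer product."""
    correct = sum(ans.strip() == str(a * b) for ans, (a, b) in zip(answers, queries))
    return correct / len(queries)


print(build_prompt([(12, 34), (7, 90)], (56, 78)))
```

The bet would then resolve positively if `accuracy(...)` over the agreed set of queries reaches 0.9.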

Comment by Michaël Trazzi (mtrazzi) on What will GPT-4 be incapable of? · 2021-04-07T07:07:03.306Z · LW · GW

So physics understanding.

How do you think it would perform on simpler questions closer to its training dataset, like "we throw a ball from a 500m building with no wind, and the same ball but with wind, which one hits the floor earlier" (on average, after 1000 questions)? If this still does not seem plausible, what is something you would bet $100 2:1 but not 1:1 that it would not be able to do?

Comment by Michaël Trazzi (mtrazzi) on What will GPT-4 be incapable of? · 2021-04-07T06:58:46.827Z · LW · GW

Interesting. Apparently GPT-2 could make (up to?) 14 non-invalid moves. Also, this paper mentions a cross-entropy log-loss of 0.7 and 10% of invalid moves after fine-tuning on 2.8M chess games. So maybe here data is the bottleneck, but assuming it's not, GPT-4's overall loss would be x smaller than GPT-2 (cf. Fig1 on parameters), and with the strong assumption of the overall loss transferring directly to chess loss, and chess invalid move accuracy being inversely proportional to chess loss, then it would make 5% of invalid moves.

Comment by Michaël Trazzi (mtrazzi) on What will GPT-4 be incapable of? · 2021-04-06T22:06:53.817Z · LW · GW

So from 2-digit subtraction to 5-digit subtraction it lost 90% accuracy, and scaling the model by ~10x gave a 3x improvement (from 10 to 30%) on two-digit multiplication. So assuming we get 3x more accuracy from each 10x increase and that 100% on two-digit corresponds to ~10% on 5-digit, we would need something like 3 more scalings like "13B -> 175B", so about 400 trillion params.
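The back-of-the-envelope numbers can be checked in a few lines. The parameter count follows directly from the comment; the accuracy extrapolation is my reading of it (30% on two-digit multiplication mapping to roughly 3% on five-digit, then tripling per scaling step) and should be treated as an assumption.

```python
GPT3_PARAMS = 175e9
SCALING_STEP = 175e9 / 13e9  # one "13B -> 175B" jump, roughly 13.5x

# Three more jumps of that size on top of GPT-3.
params_needed = GPT3_PARAMS * SCALING_STEP**3
print(f"{params_needed:.2e} parameters")  # on the order of 4e14, i.e. ~400 trillion

# Assumed reading: five-digit accuracy is ~10% of two-digit accuracy (30% today),
# and each scaling step multiplies accuracy by 3, capped at 100%.
five_digit_accuracy = 0.30 * 0.10
for _ in range(3):
    five_digit_accuracy = min(1.0, five_digit_accuracy * 3)
print(f"{five_digit_accuracy:.0%} five-digit accuracy after three steps")
```

Under this reading three steps land at about 81%, which would explain why three scalings (rather than two or four) is the number quoted.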

Comment by Michaël Trazzi (mtrazzi) on What will GPT-4 be incapable of? · 2021-04-06T21:10:42.891Z · LW · GW

That's a good one. What would be a claim you would be less confident (less than 80%) about but still confident enough to bet $100 at 2:1 odds? For me it would be "gpt-4 would beat a random go bot 99% of the time (in 1000 games) given the right input of less than 1000 bytes."

Comment by Michaël Trazzi (mtrazzi) on What will GPT-4 be incapable of? · 2021-04-06T21:05:24.178Z · LW · GW

A model released on openai.com with "GPT" in the name before end of 2022. Could be GPTX where X is a new name for GPT4, but it should be an iteration over GPT-3 and should have at least 10x more parameters.

Comment by mtrazzi on [deleted post] 2021-04-05T22:05:33.576Z

(note to mods: Ideally I would prefer to have larger Latex equations, not sure how to do that. If someone could just make those bigger, or even replace the equation screenshot with real Latex that would be awesome.)

Comment by mtrazzi on [deleted post] 2021-03-22T07:43:19.585Z

sure I agree that keeping your system's predictions to yourself makes more sense and tweeting doesn't necessarily help. Maybe what I'm pointing at is where the text you're tweeting is not necessarily "predictions" but maybe some "manipulation text" to maximize profit short term. Let's say you tweet "buy dogecoin" like Elon Musk, so the price goes higher and you can sell all of your doge when you predicted the price would drop. I'm not really sure how such planning would work, and exactly what to feed to some NLP model to manipulate the market in such a way... but now it seems we could just make a simple RL agent (without GPT) that can do either:
- 1. move money in its portfolio
- 2. tweet "price of X will rise" or "price of Y will go down".

but yes you're right that's pretty close to just "fund managers' predictions", and that would impact less than say Elon Musk tweeting (where there's common knowledge that his tweets change the stock/crypto prices quickly)

Comment by mtrazzi on [deleted post] 2021-03-22T07:35:57.218Z

yes that's 50 million dollars

Comment by Michaël Trazzi (mtrazzi) on Excusing a Failure to Adjust · 2020-08-26T16:05:04.365Z · LW · GW

More generally, there's a difference between things being true and being useful. Believing that sometimes you should not update isn't a really useful habit as it forces the rationalizations you mentioned.

Another example is believing "willpower is a limited quantity" vs. "it's a muscle and the more I use it the stronger I get". The first belief will push you towards not doing anything, which is similar to the default mode of not updating in your story.

Comment by mtrazzi on [deleted post] 2020-08-26T14:16:34.056Z

Note: I also know very little about this. Few thoughts on your guesses (and my corresponding credences):

--It seems pretty likely that it will be for humans (something that works for mice wouldn't be impressive enough for an announcement). In last year's white paper they were already inserting electrode arrays in the brain. But maybe you mean something that lives inside the brain independently? (90%)

--If by "significative damage" you mean "not altering basic human capabilities" then it sounds plausible. From the white paper they seem to focus on damage to "the blood-brain barrier" and the "brain’s inflammatory response to foreign objects". My intuition is that the brain would react pretty strongly to something inside it for 10 years though. (20%)

--Other BCI companies have done similar demos, so given that the presentation is long this might happen at some point. But Neuralink might also want to show they're different from mainstream companies. (35%)

--Seems plausible. Assigning lower credence because really specific. (15%)

Comment by Michaël Trazzi (mtrazzi) on Matt Botvinick on the spontaneous emergence of learning algorithms · 2020-08-18T13:18:56.105Z · LW · GW

Funnily enough, I wrote a blog distilling what I learned from reproducing experiments of that 2018 Nature paper, adding some animations and diagrams. I especially look at the two-step task, the Harlow task (the one with monkeys looking at a screen), and also try to explain some brain things (e.g. how DA interacts with the PFN) at the end.

Comment by Michaël Trazzi (mtrazzi) on OpenAI announces GPT-3 · 2020-05-29T12:58:43.921Z · LW · GW

HN comment unsure about the meta-learning generalization claims, arguing that OpenAI has a "serious duty [...] to frame their results more carefully"

Comment by Michaël Trazzi (mtrazzi) on Raemon's Shortform · 2020-05-28T21:13:09.951Z · LW · GW

re working memory: never thought of it during conversations, interesting. it seems that we sometimes hold the nodes of the conversation tree to go back to them afterward. and maybe if you're introducing new concepts while you're talking people need to hold those definitions in working memory as well.

Comment by Michaël Trazzi (mtrazzi) on What would flourishing look like in Conway's Game of Life? · 2020-05-13T09:23:23.729Z · LW · GW

Some friends tried (inconclusively) to apply AlphaZero to a two-player GoL. I can put you in touch if you want their feedback.

Comment by mtrazzi on [deleted post] 2020-05-10T11:15:28.977Z

Thanks for the tutorial to download documentation, I've never done that myself so will check it out next time I go offline for a while!

I usually just run python to look at docs, importing the library, and then do help(lib.module.function). If I don't really know what the class can do, I usually do dir(class_instance) to find the available methods/attributes, and do the help thing on them.
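For anyone who hasn't used this workflow, here is a small sketch using the standard-library `json` module as a stand-in for "the library" (the choice of `json` is arbitrary). `pydoc.render_doc` returns the same text that `help()` would page through, which is handy if you want it as a string.

```python
import json
import pydoc

# Same documentation that help(json.dumps) shows interactively, as plain text.
doc_text = pydoc.render_doc(json.dumps, renderer=pydoc.plaintext)
print(doc_text.splitlines()[0])

# When you don't know what an object can do, list its public methods/attributes.
decoder = json.JSONDecoder()
public_names = [name for name in dir(decoder) if not name.startswith("_")]
print(public_names)

# Then read the docstring of whatever looks relevant.
print(decoder.raw_decode.__doc__)
```

In an interactive session you would just type `help(json.dumps)` or `help(decoder.raw_decode)` directly; none of this needs a network connection.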

This only works if you know reasonably well where to look. If I were you I would try loading the "read the docs" html build offline in your browser (might be searchable this way), but then you still have a browser open (so you would really need to turn off wifi).

Comment by Michaël Trazzi (mtrazzi) on How to do remote co-working · 2020-05-08T21:01:29.220Z · LW · GW

Thanks for writing this up!

I've personally tried Complice coworking rooms where people synchronize on pomodoros and chat during breaks, especially EA France's study room (+discord to voice chat during breaks) but there's also a LW study hall: https://complice.co/rooms

Comment by mtrazzi on [deleted post] 2020-05-08T20:45:07.900Z

I've been experimenting with offline coding recently, sharing some of my conclusions.

Why I started

1) Most of the programming I do at the moment only needs a terminal and a text editor. I'm implementing things from scratch without needing libraries and I noticed I could just read the docs offline.
2) I came to the conclusion that googling things wasn't worth the cost of having a web browser open. Using the outside view, when I look back at all the instances of coding while having the internet in easy access, I always end up being distracted, and even if I code, my mind thinks about what else I could be doing.

How to go offline

(Computer) 1) turn off wi-fi 2) forget network
(Phone) if you're at home, put it out of reach. I turn it off then throw it on top of a closet, so high that I need to grab a chair from the living room to reach it. If you have an office, then do the same thing and go to your office without your phone.

When

My general rule in January was that I could only check the internet between 11pm and 12am. The rest of the "no work + no internet" time was for deep relaxation, meditation, journaling, eating, etc. In April I went without any internet connection for a week. I was amazed at how much free time I had, but the lack of social interactions was a bit counter-productive. Currently, I'm going offline from the moment I wake up to 7pm. This seems like a good balance where I'm not too tired but still productive throughout the day.

Let me know if you have any question about the process or similar experience to share.

Comment by mtrazzi on [deleted post] 2020-01-30T12:12:11.420Z

Thanks for all the references! I don't currently have much time to read all of it right now so I can't really engage with the specific arguments for the rejection of using utility functions/studying recursive self-improvement.

I essentially agree with most of what you wrote. There is maybe a slight disagreement in how you framed (not what you meant) how research focus shifted since 2014.

I see Superintelligence as essentially saying "hey, there is pb A. And even if we solve A, then we might also have B. And given C and D, there might be E." Now that the field is more mature and we have many more researchers getting paid to work on these problems, the arguments became much more goal focused. Now people are saying "I'm going to make progress on sub-problem X, by publishing a paper on Y. And working on Z is not cost-effective, so I'm not going to work on it given humanity's current time constraints."

These approaches are often grouped as "long-term problems-focused" and "making tractable progress now focused". In the first group you have Yudkowsky 2010, Bostrom 2014, MIRI's current research and maybe CAIS. In the second one you have current CHAI/FHI/OpenAI/DeepMind/Ought papers.

Your original framing can be interpreted as "after proving some mathematical theorems, people rejected the main arguments of Superintelligence and now most of the community agrees that working on X, Y and Z is tractable but A, B and C are more controversial".

I think a more nuanced and precise framing would be: "In Superintelligence Bostrom exhaustively exposes the risks associated with advanced AI. A short portion of the book is dedicated to the problems people are working on right now. Indeed, people stopped working on the other problems (largest portion of the book) because 1) there hasn't been really productive work on them 2) some rebuttals have been written online giving convincing arguments that those pbs are not tractable anyway 3) there are now well-funded research organizations with incentives to make tangible progress on those pbs."

In your last framing, you presented precise papers/rebuttals (thanks again!) for 2), and I think rebuttals are a great reason to stop working on a pb, but I think they're not the only reason and not the real reason people stopped working on those pbs. To be fair, I think 1) can be explained by many more factors than "it's theoretically impossible to make progress on those pbs". It can be that the research mindset required to work on these pbs is less socially/intellectually validating or requires much more theoretical approaches, so it will be off-putting/tiresome to most recent grads who enter the field. I also think that AI Safety is now much more intertwined with evidence-based approaches such as Effective Altruism than it was in 2014, which explains 3), so people start presenting their research as "partial solutions to the pb. of AI Safety" or "research agenda".

To be clear, I'm not criticizing the current shift in research. I think it's productive for the field, both in the short term and long term. To give a bit more personal context, I started getting interested in AI Safety after reading Bostrom and have always been more interested in the "finding problems" approach. I went to FHI to work on AI Safety because I was super interested in finding new pbs related to the treacherous turn. It's now almost taboo to say that we're working on pbs that are sub-optimally minimizing AI risk, but the real reason that pushed me to think about those pbs was because they were both important and interesting. The pb. with the current "shift in framing" is that it's making it socially unacceptable for people to think/work on more long-term pbs where there is more variance in research productivity.

I don't quite understand the question?

Sorry about that. I thought there was some link to our discussion about utility functions but I misunderstood.

EDIT: I also wanted to mention that the number of pages in a book doesn't account for how important the author thinks the pb. is (Bostrom even comments on this in the postface of his book). Again, the book is mostly about saying "here are all the pbs", not "these are the tractable pbs we should start working on, and we should dedicate research resources proportionally to the amount of pages I talk about it in the book".

Comment by mtrazzi on [deleted post] 2020-01-30T10:49:12.553Z

This framing really helped me think about gradual self-improvement, thanks for writing it down!

I agree with most of what you wrote. I still feel that in the case of an AGI re-writing its own code there's some sense of intent that hasn't been explicitly happening for the past thousand years.

Agreed, you could still model Humanity as some kind of self-improving Human + Computer Colossus (cf. Tim Urban's framing) that somehow has some agency. But it's much less effective at improving itself, and it's not thinking "yep, I need to invent this new science to optimize this utility function". I agree that the threshold is "when all the relevant action is from a single system improving itself".

there would also be warning signs before it was too late

And what happens then? Will we reach some kind of global consensus to stop any research in this area? How long will it take to build a safe "single system improving itself"? How will all the relevant actors behave in the meantime?

My intuition is that in the best scenario we reach some kind of AGI Cold War situation for long periods of time.

Comment by mtrazzi on [deleted post] 2020-01-30T00:17:16.832Z

I get the sense that the crux here is more between fast / slow takeoffs than unipolar / multipolar scenarios.

In the case of a gradual transition into more powerful technology, what happens when the children of your analogy discover recursive self-improvement?

Comment by mtrazzi on [deleted post] 2020-01-30T00:10:17.812Z

When you say "the last few years has seen many people here" for your 2nd/3rd paragraph, do you have any posts / authors in mind to illustrate?

I agree that there has been a shift in what people write about because the field grew (as Daniel Filan pointed out). However, I don't remember reading anyone dismiss convergent instrumental goals such as increasing your own intelligence or utility functions as a useful abstraction to think about agency.

In your thread with ofer, he asked what the difference was between using loss functions in neural nets vs. objective functions / utility functions and I haven't fully caught your opinion on that.

Comment by mtrazzi on [deleted post] 2020-01-28T23:04:11.218Z

the ones you mentioned

To be clear, this is a linkpost for Philip Trammell's blogpost. I'm not involved in the writing.

Comment by mtrazzi on [deleted post] 2020-01-28T23:02:15.909Z

As you say

To be clear, the author is Philip Trammell, not me. Added quotes to make it clearer.

Comment by Michaël Trazzi (mtrazzi) on Ultra-simplified research agenda · 2019-11-22T16:44:16.391Z · LW · GW

Having printed and read the full version, this ultra-simplified version was a useful summary.

Happy to read a (not-so-)simplified version (like 20-30 paragraphs).

Comment by Michaël Trazzi (mtrazzi) on Do you get value out of contentless comments? · 2019-11-21T23:38:21.881Z · LW · GW

Funny comment!

Comment by Michaël Trazzi (mtrazzi) on AI Alignment "Scaffolding" Project Ideas (Request for Advice) · 2019-07-11T12:07:45.888Z · LW · GW

A comprehensive AI alignment introductory web hub

RAISE and Robert Miles provide introductory content. You can think of LW->alignment forum as "web hubs" for AI Alignment research.

formal curriculum

There was a course on AGI Safety last fall in Berkeley.

A department or even a single outspokenly sympathetic official in any government of any industrialized nation

You can find a list of institutions/donors here.

A list of concrete and detailed policy proposals related to AI alignment

I would recommend reports from FHI/GovAI as a starting point.

Would this be valuable, and which resource would it be most useful to create?

Please give more detailed information about the project to receive feedback.

Comment by Michaël Trazzi (mtrazzi) on Modeling AI milestones to adjust AGI arrival estimates? · 2019-07-11T11:53:55.952Z · LW · GW

You can find AGI predictions, including Starcraft forecasts, in "When Will AI Exceed Human Performance? Evidence from AI Experts". Projects for having "all forecasts on AGI in one place" include ai.metaculus.com & foretold.io.

Comment by mtrazzi on [deleted post] 2019-07-04T16:42:00.970Z

Does that summarize your comment?

1. Proposals should make superintelligences less likely to fight you by using some conceptual insight true in most cases.
2. With CIRL, this insight is "we want the AI to actively cooperate with humans", so there's real value from it being formalized in a paper.
3. In the counterfactual paper, there's the insight "what if the AI thinks he's not on but still learns".
For the last bit, I have two interpretations:
4.a. However, it's unclear that this design avoids all manipulative behaviour and is completely safe.
4.b. However, it's unclear that adding the counterfactual feature to another design (e.g. CIRL) would make systems overall safer / would actually reduce manipulation incentives.

If I understand you correctly, there are actual insights from counterfactual oracles--the problem is that those might not be insights that would apply to a broad class of Alignment failures, but only to "engineered" cases of boxed oracle AIs (as opposed to CIRL where we might want AIs to be cooperative in general). Is that what you meant?

Comment by mtrazzi on [deleted post] 2019-07-04T16:22:18.203Z

The zero reward is in the paper. I agree that skipping would solve the problem. From talking to Stuart, my impression is that he thinks a zero reward would be equivalent to skipping for specifying "no learning", or would just slow down learning. My disagreement is that I think it can confuse learning to the point of not learning the right thing.

Why not do a combination of pre-training and online learning, where you do enough during the training phase to get a useful predictor, and then use online learning to deal with subsequent distributional shifts?

Yes, that should work. My quote saying that online learning "won't work and is unsafe" is imprecise. I should have said "if epsilon is small enough to be comparable to the probability of shooting an escape message at random, then it is not safe. Also, if we continue sending the wrong reward instead of skipping, then it might not learn the correct thing if epsilon is not big enough".

Although I guess that probably isn't really original either. What seems original is that during any episode where learning will take place, don't let humans (or any other system that might be insecure against the oracle) see the oracle's output until the episode is over.

That's exactly it!
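
To make the "skip vs. zero reward" distinction above concrete, here is a toy sketch (not the paper's algorithm). It assumes a learner that simply averages the rewards it receives for an answer that is always correct, with reward 1 in episodes where nobody reads the output. The function name `average_reward`, the `mode` values and the reward scale are all hypothetical choices for illustration.

```python
import random


def average_reward(mode, epsilon, episodes, seed=0):
    """Average reward credited to an always-correct answer (toy model).

    mode="skip": episodes where humans read the output give no training signal.
    mode="zero": episodes where humans read the output send a reward of 0.
    """
    if mode not in ("skip", "zero"):
        raise ValueError("mode must be 'skip' or 'zero'")
    rng = random.Random(seed)
    total, updates = 0.0, 0
    for _ in range(episodes):
        hidden = rng.random() < epsilon  # with probability epsilon, nobody reads the output
        if hidden:
            # Output stays unseen until the episode is over, then it is scored.
            total += 1.0
            updates += 1
        elif mode == "zero":
            # Output is read by humans; a reward of 0 is still sent to the learner.
            updates += 1
        # mode == "skip": output is read and the learner gets no update at all.
    return total / updates if updates else None


skip = average_reward("skip", epsilon=0.1, episodes=10_000)
zero = average_reward("zero", epsilon=0.1, episodes=10_000)
print(skip, zero)
```

In this toy model, skipping leaves the estimate for the correct answer at 1, while sending zeros dilutes it to roughly epsilon. Whether that dilution only slows learning down or actually misleads it depends on the learner, which this sketch does not settle.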

Comment by mtrazzi on [deleted post] 2019-06-12T17:54:57.854Z

The string is read with probability 1-