Posts

Make-A-Video by Meta AI 2022-09-29T17:07:15.664Z
Stable Diffusion has been released 2022-08-22T19:42:20.120Z
Has anyone actually tried to convince Terry Tao or other top mathematicians to work on alignment? 2022-06-08T22:26:55.710Z
DALL·E 2 by OpenAI 2022-04-06T14:17:36.035Z

Comments

Comment by P. on Refactoring cryonics as structural brain preservation · 2024-09-12T20:57:56.840Z · LW · GW

Does OBP plan to eventually expand their services outside the USA? And how much would it cost if you didn’t subsidize it? Cost is a common complaint about cryonics so I could see you becoming much bigger than the cryonics orgs, but judging by the website you look quite small. Do you know why that is?

Comment by P. on Open Thread Spring 2024 · 2024-05-25T17:53:48.727Z · LW · GW

Does anyone have advice on how I could work full-time on an alignment research agenda I have? It looks like trying to get an LTFF grant is the best option for this kind of thing, but if, after working on it alone for longer, it still looks like it could succeed, it would likely become too big for me alone; I would need help from other people, and that looks hard to get. So, any advice from anyone who's been in a similar situation? Also, how does this compare with getting a job at an alignment org? Is there any org where I would have a comparable amount of freedom if my ideas are good enough?

Edit: It took way longer than I thought it would, but I've finally sent my first LTFF grant application! Now let's just hope they understand it and think it is good.

Comment by P. on Decent plan prize announcement (1 paragraph, $1k) · 2024-01-12T13:25:31.801Z · LW · GW

It depends on what you know about the model and the reason you have to be concerned in the first place (if it's just "somehow", that's not very convincing).

You might be worried that training it leads to the emergence of inner optimizers, whether they are somehow "trying" to be good at prediction in a way that might generalize to taking real-life actions, approximating the searchy part of the humans they are trying to predict, or just being RL agents. If you are just using basically standard architectures with a lot more compute, these all seem unlikely. But if I were you, I might still test its ability to perform well in a domain it has never seen, where humans start out performing poorly but very quickly learn what to do (think of video games with new mechanics). If it does well, you have a qualitatively new thing on your hands: don't deploy it, study it instead. If for some reason you think a priori that this could happen, and only a small subset of all the data is necessary to achieve it, do a smaller training run first with just that data.

Or you might be worried about mostly external consequentialist cognition (think explicit textual if-then-elses). In that case, existing systems can already do this to some extent, and you should worry about how good its reasoning actually is, so perform capability evaluations. If it looks like there is some way of getting it to do novel research by any known method, or like it's getting close, don't deploy; otherwise someone might figure out how to use it to do AI research, and then you get a singularity.

And in any case, you should worry about the effects your system will have on the AI race. Your AI might not be dangerous, but if it is a good enough lawyer or programmer that it starts putting many people out of their jobs, investment in AI research will increase a lot and someone will figure out how to create an actual AGI sooner than they would otherwise.

Edit: And obviously you should also test how useful it could be to people trying to do mundane harm (e.g. with existing pathogens). Separately, there might not be a hard threshold on how good a model must be at research before it becomes dangerous, so models might get there little by little and you would be contributing to that.

Edit in response to the second clarification: Downscale the relevant factors, like amount of training data, number of parameters and training time, or use a known-to-be-inferior architecture until the worrying capabilities go away. Otherwise, you need to solve the alignment problem.

Edit in response to Beth Barnes's comment: You should probably have people reviewing outputs to check that the model behaves well, but if you actually think you need measures like "1000 workers with technical undergrad degrees, paid $50/hr" because you are worried it somehow kills you, then you simply shouldn't deploy it. It's absurd to need to check whether a commercial product is an existential threat, or anything close to that.

Comment by P. on 2023 Unofficial LessWrong Census/Survey · 2023-12-03T18:48:25.428Z · LW · GW

Done! There aren't enough mysterious old wizards.

Comment by P. on Vote on Interesting Disagreements · 2023-11-09T20:35:23.459Z · LW · GW

You know of a technology that has at least a 10% chance of having a very big novel impact on the world (think the internet or ending malaria) and that isn't included in the following list, isn't very similar to anything in it, and isn't downstream from some element of it: AI, mind uploads, cryonics, human space travel, geo-engineering, gene drives, human intelligence augmentation, anti-aging, cancer cures, regenerative medicine, human genetic engineering, artificial pandemics, nuclear weapons, proper nanotech, very good lie detectors, prediction markets, other mind-altering drugs, cryptocurrency, better batteries, BCIs, nuclear fusion, better nuclear fission, better robots, AR, VR, room-temperature superconductors, quantum computers, polynomial-time SAT solvers, cultured meat, solutions to antibiotic resistance, vaccines to some disease, optical computers, artificial wombs, de-extinction and graphene.

Bad options included just in case someone thinks they are good.
 

Comment by P. on Vote on Interesting Disagreements · 2023-11-08T22:10:00.179Z · LW · GW

Public mechanistic interpretability research is net positive in expectation.

Comment by P. on Vote on Interesting Disagreements · 2023-11-08T21:23:08.060Z · LW · GW

Cultural values are something like preferences over pairs of social environments and things we actually care about. So it makes sense to talk about jointly optimizing them.

Comment by P. on Vote on Interesting Disagreements · 2023-11-08T21:09:12.988Z · LW · GW

If we had access to a brain upload (and maybe a world simulator too) we could in principle extract something like a utility function, and the theory behind it relates more to agents in general than it does to humans in particular.

Comment by P. on Vote on Interesting Disagreements · 2023-11-08T20:55:01.183Z · LW · GW

Research into getting a mechanistic understanding of the brain, for the purpose of at least one of understanding how values/empathy work in people, brain uploading, or improving cryonics/plastination, is net positive and currently greatly underfunded.

Comment by P. on Feedbackloop-first Rationality · 2023-08-08T19:46:20.333Z · LW · GW

Came here to comment exactly that. It seems much more efficient to learn the cognitive strategies smart people use than to try to figure them out from scratch. Ideally, you would have people of different skill levels solve problems (and maybe even do research) while thinking out loud and describing or drawing the images they are manipulating. I know this has been done at least for chess, and it would be nice to have it for domains with more structure. Then you could catalog these strategies and measure the effectiveness of teaching the system 2 process (the whole process they use, not only the winning path) and of explicitly training, in isolation, the individual system 1 steps that make it up.

Comment by P. on Empathy as a natural consequence of learnt reward models · 2023-02-04T20:16:11.777Z · LW · GW

Also "indivudals".

Comment by P. on Why square errors? · 2022-11-26T14:34:31.096Z · LW · GW

Doesn't minimizing the L1 norm correspond to performing MLE with Laplacian errors?
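Spelling out the correspondence I have in mind (assuming i.i.d. Laplace errors with scale b):

```latex
% Model: y_i = f(x_i) + \epsilon_i, with \epsilon_i \sim \mathrm{Laplace}(0, b),
% i.e. p(\epsilon) = \frac{1}{2b} \exp\!\left(-\frac{|\epsilon|}{b}\right).
-\log p(y \mid f) \;=\; \sum_{i=1}^{n} \frac{\lvert y_i - f(x_i) \rvert}{b} \;+\; n \log(2b)
```

so maximizing the likelihood over f is exactly minimizing the L1 loss, the same way Gaussian errors give the squared loss.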

Comment by P. on Prizes for ML Safety Benchmark Ideas · 2022-11-11T13:32:32.922Z · LW · GW

Do you know whether this will be cancelled given the FTX situation?

Comment by P. on Paper+Summary: OMNIGROK: GROKKING BEYOND ALGORITHMIC DATA · 2022-10-04T16:51:57.300Z · LW · GW

If the optimal norm is below the minimum you can achieve just by rescaling, you are trading off training-set accuracy for weights with a smaller norm within each layer. It's not that weird that the best known way of making this trade-off is constrained optimization.
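As a minimal sketch of what I mean by constrained optimization here (my own illustration in PyTorch, not necessarily the paper's exact procedure): take an unconstrained optimizer step, then project the weights back onto a sphere of fixed total norm, so any gain in training accuracy has to come from redistributing that fixed norm budget.

```python
import torch

def project_to_norm(params, target_norm):
    """Projection step: rescale all parameters so their combined L2 norm
    equals target_norm (keeps training on a fixed-norm sphere)."""
    with torch.no_grad():
        current = torch.sqrt(sum((p ** 2).sum() for p in params))
        for p in params:
            p.mul_(target_norm / current)

def constrained_step(model, loss, optimizer, target_norm):
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # unconstrained gradient step...
    project_to_norm(list(model.parameters()), target_norm)  # ...then project back
```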

Comment by P. on Make-A-Video by Meta AI · 2022-09-29T19:39:54.526Z · LW · GW

And a 3D one by optimizing a differentiable volumetric representation using 2D diffusion: https://dreamfusionpaper.github.io/

Comment by P. on Make-A-Video by Meta AI · 2022-09-29T19:03:54.005Z · LW · GW

And here we have another one: https://phenaki.video/

Comment by P. on Resources to find/register the rationalists that specialize in a given topic? · 2022-09-29T18:56:12.378Z · LW · GW

It's not quite what you want, but there's this: https://forum.effectivealtruism.org/community#individuals and this: https://eahub.org/

Comment by P. on Make-A-Video by Meta AI · 2022-09-29T17:52:09.924Z · LW · GW

Emad from Stability AI (the people behind Stable Diffusion) says that they will make a model better than this.

Comment by P. on Enriching Youtube content recommendations · 2022-09-28T16:30:35.503Z · LW · GW

RatSLAM: Using Models of Rodent Hippocampus for Robot Navigation

Building Collision Simulations: An Introduction to Computer Graphics

Self-Driving Cars [S1E4: RALPH]

What is the graph of x^a when a is not an integer? An unusual look at familiar functions #some2

Advanced 4. Monte Carlo Tree Search

Researchers Use Group Theory to Speed Up Algorithms — Introduction to Groups

The Traveling Salesman Problem: When Good Enough Beats Perfect

The Snowflake Mystery

AI Alignment & AGI Fire Alarm - Connor Leahy

Vulcan | The Planet That Didn't Exist

How MRI Works - Part 1 - NMR Basics

The Enduring Mystery of Jack the Ripper

Mechanical Computer (All Parts) - Basic Mechanisms In Fire Control Computers

AI in Brainfuck

The microwave plasma mystery

The Universal S

Feynman's Lost Lecture (ft. 3Blue1Brown)

ML Tutorial: Gaussian Processes (Richard Turner)

Google I/O 2013 - Project Ground Truth: Accurate Maps Via Algorithms and Elbow Grease

DeepMind's AlphaFold 2 Explained! AI Breakthrough in Protein Folding! What we know (& what we don't)

Deep Blue | Down the Rabbit Hole

Illustrated Guide to Transformers Neural Network: A step by step explanation

Biology on Islands

Coding Adventure: Atmosphere

Cicada 3301 (All Clues and How They Were Solved)

Andrew Ng: Deep Learning, Self-Taught Learning and Unsupervised Feature Learning

Margo Seltzer - Automatically Scalable Computation - Code Mesh 2017

A Grape Made of... Meat?? - Tissue Recellularization

The First Video Game

What Bodies Think About: Bioelectric Computation Outside the Nervous System - NeurIPS 2018

Understanding Sensor Fusion and Tracking, Part 2: Fusing a Mag, Accel, & Gyro Estimate

Drone Control and the Complementary Filter

Launch Loops

World Record Progression: Super Mario Bros

Tesla AI Day

Why Do We Age ? Cellular Aging (HD)

The Complex Hack That Stopped a Secret Nuclear Power Plant

Richard Szeliski - "Visual Reconstruction and Image-Based Rendering" (TCSDLS 2017-2018)

    Might be the same talk as this one: https://youtu.be/0VIUbIzv_wc

Comment by P. on Gene drives: why the wait? · 2022-09-27T16:26:56.529Z · LW · GW

What are the others?

Comment by P. on LW Petrov Day 2022 (Monday, 9/26) · 2022-09-25T15:49:53.761Z · LW · GW

But the outcome IS uncertain. I want to know how low the karma threshold can go before the website gets nuked. There are other fun games, but this one is unique to LW and seems like an appropriate way of celebrating Petrov Day.

Comment by P. on Announcing $5,000 bounty for (responsibly) ending malaria · 2022-09-24T20:54:20.061Z · LW · GW

I guess it might be possible to repurpose the manual, but making mosquito species extinct isn't the only possible method, e.g. https://bioengineeringcommunity.nature.com/posts/gene-drive-rescue-system-for-population-modification-of-a-malaria-mosquito .

Comment by P. on Announcing $5,000 bounty for (responsibly) ending malaria · 2022-09-24T18:55:29.078Z · LW · GW

I mean, that's probably the case, since they asked whether they could spread them from their garage.

Comment by P. on Announcing $5,000 bounty for (responsibly) ending malaria · 2022-09-24T16:50:00.355Z · LW · GW

I wish I had a better source, but in this video, a journalist says that a well-equipped high schooler could do it. The information needed seems to be freely available online, but I don't know enough biology to be able to tell for sure. I think it is unknown whether it would spread to the whole population given a single release, though.

If you want it to happen and can't do it yourself nor pay someone else to do it, the best strategy might be to pay someone to translate the relevant papers into instructions that a regular smart person can follow and then publish them online. After making sure to the best of your capabilities (i.e. asking experts the right questions) that it actually is a good idea, that is.

Comment by P. on Evaluations project @ ARC is hiring a researcher and a webdev/engineer · 2022-09-14T13:25:31.204Z · LW · GW

The simplest possible acceptable value learning benchmark would look something like this:

  • Data is recorded of people playing a video game. They are told to maximize their reward (which can be exactly computed), have no previous experience playing the game, are actually trying to win and are clearly suboptimal (imitation learning would give very bad results).
  • The bot is first given all their inputs and outputs, but not their rewards.
  • Then it can play the game in place of the humans but again isn’t given the rewards. Preferably the score isn’t shown on screen.
  • The goal is to maximize the true reward function.
  • These rules are precisely described and are known by anyone who wants to test their algorithms.

None of the environments and datasets you mention are actually like this. Some people do test their IRL algorithms in a way similar to this (the difference being that they learn from another bot), but the details aren’t standardized.
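To make the rules above concrete, here is a minimal sketch of the kind of interface I have in mind (assuming a Gymnasium-style environment API; the names are just for illustration). The tested agent would also receive the recorded human observations and actions, but never any reward signal:

```python
import gymnasium as gym

class HiddenRewardBenchmark(gym.Wrapper):
    """The tested agent sees observations and can act, but never the reward.
    The true return is accumulated privately and only read by the evaluator."""

    def __init__(self, env):
        super().__init__(env)
        self._true_return = 0.0

    def reset(self, **kwargs):
        self._true_return = 0.0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._true_return += reward                 # hidden from the agent
        return obs, 0.0, terminated, truncated, {}  # no reward, no score in info

    def evaluator_score(self):
        return self._true_return
```

The benchmark would hand the agent the demonstration data (inputs and outputs only) plus this environment, and report `evaluator_score()` at the end.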

A harder and more realistic version that I have yet to see in any paper would look something like this:

  • Data is recorded of people playing a game with a second player. The second player can be a human or a bot, and friendly, neutral or adversarial.
  • The IO of both players is different, just like different people have different perspectives in real life.
  • A very good imitation learner is trained to predict the first player's output given their input. It comes with the benchmark.
  • The bot to be tested (which is different from the previous ones) has the same IO channels as the second player, but doesn't see the rewards. It also isn't given any of the recordings.
  • Optionally, it also receives the output of a bad visual object detector meant to detect the part of the environment directly controlled by the human/imitator.
  • It plays the game with the human imitator.
  • The goal is to maximize the human’s reward function.

It’s far from perfect, but if someone could obtain good scores there, it would probably make me much more optimistic about the probability of solving alignment.

Comment by P. on Evaluations project @ ARC is hiring a researcher and a webdev/engineer · 2022-09-13T19:17:45.838Z · LW · GW

By pure RL, I mean systems whose output channel is only directly optimized to maximize some value function, even if it might be possible to create other kinds of algorithms capable of getting good scores on the benchmark.

I don’t think that the lack of pretraining is a good thing in itself, but that you are losing a lot when you move from playing video games to completing textual tasks.

If someone is told to get a high score in a video game, we have access to the exact value function they are trying to maximize. So when the AI is either trying to play the game in the human's place or trying to help them, we can directly evaluate its performance without having to worry about deception. If it learns some proxy values and starts optimizing them to the point of goodharting, it will get a lower score. On most textual tasks that aren't purely about information manipulation, on the other hand, the AI could be making up plausible-sounding nonsense about the consequences of its actions, and we wouldn't have any way of knowing.

From the AI's point of view, being able to see the state of the thing we care about also seems very useful; preferences are about reality, after all. It's not obvious at all that internet text contains enough information to even learn a model of human values useful in the real world. Training it with other sources of information that more closely represent reality, like online videos, might, but that seems closer to my idea than to yours, since it can't be used to perform language-model-like imitation learning.

Additionally, if by “inability to learn human values” you mean isolating them enough so that they can in principle be optimized to get superhuman performance, as opposed to being buried in its world model, I don’t agree that that will happen by default. Right now we don’t have any implementations of proper value learning algorithms, nor do I think that any known theoretical algorithm (like PreDCA) would work even with limitless computing power. If you can show that I’m wrong, that would surprise me a lot, and I think it could change many people’s research directions and the chances they give to alignment being solvable.

Comment by P. on Evaluations project @ ARC is hiring a researcher and a webdev/engineer · 2022-09-10T14:51:14.856Z · LW · GW

Do you have plans to measure the alignment of pure RL agents, as opposed to repurposed language models? It surprised me a bit when I discovered that there isn’t a standard publicly available value learning benchmark, despite there being data to create one. An agent would be given first or third-person demonstrations of people trying to maximize their score in a game, and then it would try to do the same, without ever getting to see what the true reward function is. Having something like this would probably be very useful; it would allow us to directly measure goodharting, and being quantitative it might help incentivize regular ML researchers to work on alignment. Will you create something like this?

Comment by P. on How might we make better use of AI capabilities research for alignment purposes? · 2022-08-31T09:24:50.305Z · LW · GW

Do you mean from what already exists or from changing the direction of new research?

Comment by P. on Announcing the AI Safety Field Building Hub, a new effort to provide AISFB projects, mentorship, and funding · 2022-08-24T15:29:36.652Z · LW · GW

What are your thoughts on having 1-on-1s with the top researchers in similar fields (like maths) instead of regular researchers and with people that are explicitly trying to build AGIs (like John Carmack)?

Comment by P. on Stable Diffusion has been released · 2022-08-23T13:26:28.716Z · LW · GW

Positive:

People will pay way less for new pretty images than they did before.

Thanks to img2img, people who couldn't draw well before now finally can: https://www.reddit.com/r/StableDiffusion/comments/wvcyih/definitely_my_favourite_generation_so_far/

Because of this, a lot more art will be produced, and I can't wait to see it.

Since good drawings are now practically free, we will see them in places where we couldn't before, like in fanfiction.

Stable Diffusion isn't quite as good as a talented artist, but since we can request hundreds of variations and pick the best, the quality of art might increase.

Ambiguous or neutral:

It can produce realistic images, and it is easier to use and more powerful than Photoshop, so we will see a lot of misinformation online. But once most people realize how easy it is to fabricate false photographs, hopefully they will trust what they see online much less than they did before, and closer to the appropriate level.

Anyone will be able to make porn of anyone else. As long as people don't do anything stupid after seeing the images, this seems inconsequential. As discussed on HN, it might cause people to stop worrying about others seeing them naked, even if the photos are real.

Anyway, both of these will cause a lot of drama, which I at least, perhaps selfishly, consider to be slightly positive.

Negative:

I expect a lot of people will lose their jobs. Most companies will prefer to cut costs and hire a few non-artists to make their art rather than use the savings to make more art.

New kinds of scams will become possible and some people will keep believing everything they see online.

Unlike DALL-E 2, this is something anyone can access, so it will be much more popular and will make many people realize how advanced current AI is and how consequential it will be, which will probably lead to more funding.

Comment by P. on My vision of a good future, part I · 2022-07-06T10:44:13.279Z · LW · GW

Rot13: Vf gung cvrpr bs svpgvba Crefba bs Vagrerfg be Png Cvpgherf Cyrnfr?

Comment by P. on Who is this MSRayne person anyway? · 2022-07-01T22:55:35.428Z · LW · GW

Then you should at least try to talk to 80,000 hours, you might eventually relocate somewhere where meeting people is easier.

It wasn't intended to make fun of you. When I say that you shouldn't start a religion, I mean it literally; like most people here, I don't hold a favorable view of religions.

Sentences like "But I am fundamentally a mystic, prone to ecstatic states of communion with an ineffable divine force immanent in the physical universe which I feel is moving towards incarnating as an AI god such as I called Anima" make me think that what you are talking about doesn't correspond to anything real. But in any case I don't see why you shouldn't write about it. If you are right, you will give us interesting reading material. And if you are wrong, hopefully someone will explain why and you will update. It shouldn't matter how much you care about this: if it turns out to be wrong you should stop believing in it (and if it's right, keep your current belief). And again, I mean this literally and with no ill intent.

Comment by P. on Who is this MSRayne person anyway? · 2022-07-01T20:52:12.074Z · LW · GW

Have you considered:

Trying to find IRL friends through meetup.com

Going to nearby rationality meetups (https://www.lesswrong.com/community)

Using dating apps (and photofeeler.com)

Getting free career advice for your situation through 80,000 hours (https://80000hours.org/speak-with-us/)

Writing a few pages of your book and posting them on LW (but please don't start a religion)

?

Comment by P. on Has anyone actually tried to convince Terry Tao or other top mathematicians to work on alignment? · 2022-06-18T21:07:31.667Z · LW · GW

The email to Demis has been there since the beginning, I even received feedback on it. I think I will send it next week, but will also try to get to him through some DeepMind employee if that doesn’t work.

Comment by P. on Scott Aaronson is joining OpenAI to work on AI safety · 2022-06-18T21:01:11.100Z · LW · GW

He says he will be doing alignment work, the worst thing I can think of that can realistically happen is that he gives OpenAI unwarranted confidence in how aligned their AIs are. Working at OpenAI isn’t intrinsically bad, publishing capabilities research is.

Comment by P. on Scott Aaronson is joining OpenAI to work on AI safety · 2022-06-18T20:56:27.862Z · LW · GW

Thanks, I’ve added him to my list of people to contact. If someone else wants to do it instead, reply to this comment so that we don’t interfere with each other.

Comment by P. on Scott Aaronson is joining OpenAI to work on AI safety · 2022-06-18T15:25:44.591Z · LW · GW

There are already people taking care of that, see this question I asked recently.

Comment by P. on FYI: I’m working on a book about the threat of AGI/ASI for a general audience. I hope it will be of value to the cause and the community · 2022-06-15T20:58:23.689Z · LW · GW

No offense, but it's not obvious to me why communicating to a general audience could be a net positive. Exactly how do you expect this to help?

Comment by P. on Can you MRI a deep learning model? · 2022-06-13T17:57:37.558Z · LW · GW

Most neural networks don't have anything comparable to specialised brain areas, at least structurally, so you can't see which areas light up given some stimulus to determine what each part does. You can do it with individual neurons or channels, though. The best UI I know of for exploring this is the "Dataset Samples" option in the OpenAI Microscope, which shows which inputs activate each unit.
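If you want to do something similar yourself, here is a minimal sketch (my own illustration, not the Microscope's code) of ranking inputs by how strongly they activate one channel of a conv layer in a torchvision model:

```python
import torch
import torchvision

model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()

captured = {}
def save_channel_activation(module, inputs, output):
    # Mean activation of channel 0 over spatial positions, one score per image
    captured["score"] = output[:, 0].mean(dim=(1, 2))

model.layer3.register_forward_hook(save_channel_activation)

def channel_scores(batch):  # batch: [N, 3, 224, 224] normalized image tensor
    with torch.no_grad():
        model(batch)
    return captured["score"]

# Run channel_scores over your dataset and inspect the highest-scoring images;
# that is essentially what the "Dataset Samples" view shows for each unit.
```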

Comment by P. on Has anyone actually tried to convince Terry Tao or other top mathematicians to work on alignment? · 2022-06-11T17:29:51.343Z · LW · GW

Please do! You can DM me their contact info, tell them about my accounts: either this one or my EA Forum one, or ask me for my email address.

Comment by P. on Has anyone actually tried to convince Terry Tao or other top mathematicians to work on alignment? · 2022-06-11T17:17:41.897Z · LW · GW

Well, if he has, unbeknownst to me, already hired the “Terence Taos of the world” like he said on the podcast, that would be great, and I would move on to other tasks. But if he only has a regular alignment team, I don’t think either of us considers that to be enough. I’m just trying to convince him that it’s urgent and we can’t leave it for later.

Comment by P. on Has anyone actually tried to convince Terry Tao or other top mathematicians to work on alignment? · 2022-06-10T15:46:42.581Z · LW · GW

I think I get it, but even if I didn't, now I know that's how it sounds, and I think I know how to improve it. That will be for other mathematicians, though (at least Maxim Kontsevich); see the edit to the post. Does the tone in the email to Demis seem like the right one to you?

Comment by P. on Has anyone actually tried to convince Terry Tao or other top mathematicians to work on alignment? · 2022-06-09T16:10:54.362Z · LW · GW

I'll rewrite it, but I can't just model other people after myself. If I were writing it for someone like me, it would be a concise explanation of the main argument, enough to make me want to spend time thinking about it, followed by a more detailed explanation or links to further reading. As long as it isn't mean, I don't think I would care whether it's giving me orders, begging for help, or giving me information without asking for anything at all. But he at least already knows that unaligned AIs are a problem; I can only remind him of that, link to reading material, or say that other people also think he should work on it.

But now the priority of that is lower, see the edit to the post. Do you think that the email to Demis Hassabis has similar problems or that it should stay like it is now?

Comment by P. on Has anyone actually tried to convince Terry Tao or other top mathematicians to work on alignment? · 2022-06-09T12:45:01.729Z · LW · GW

Ok, I sent them an email.

Comment by P. on Has anyone actually tried to convince Terry Tao or other top mathematicians to work on alignment? · 2022-06-09T11:29:11.360Z · LW · GW

Do you mean website links about his plan? I found nothing.

I’m still not changing the deadlines but I’ve received information that made me want to change the order.

Comment by P. on Has anyone actually tried to convince Terry Tao or other top mathematicians to work on alignment? · 2022-06-09T11:15:32.370Z · LW · GW

Then do you think I should contact Jacob Steinhardt to ask him what I should write to interest Tao and avoid seeming like a crank?

There isn’t much I can do about SA other than telling him to work on the problem in his free time.

Unless something extraordinary happens I'm definitely not contacting anyone in politics. Politicians being interested in AGI is a nightmarish scenario, and the news about Huawei doesn't help my paranoia about the issue.

Comment by P. on Has anyone actually tried to convince Terry Tao or other top mathematicians to work on alignment? · 2022-06-09T10:14:44.026Z · LW · GW

I might try that, but “community health” is not really what I’m optimising for. Maybe the name is misleading?

Comment by P. on Has anyone actually tried to convince Terry Tao or other top mathematicians to work on alignment? · 2022-06-09T09:56:42.659Z · LW · GW

I was trying to rely on Tao’s trust in Demis's judgement, since he is an AI researcher. Mentioning Eliezer is mainly so he has someone to contact if he wants to get hired.

I wanted his thinking to be “this competent entity has spent some of his computational resources verifying that it is important to solve this problem, and now that I’m reminded of that I should also throw mine at it”.

Is he truly mostly interested in what he considers to be mentally stimulating? Not in improving the world, or in social nonsense, or guaranteeing that his family is completely safe from all threats?

Then was including this link a bad idea? It gives examples of areas a mathematician might find interesting. And if not that, then what should I say? I’ve got nothing better. Do you know any technical introduction to alignment that he might like?

And about getting him to talk to other people, if anyone volunteers just DM me your contact information so that I can include it in the email (or reply directly if you don’t care about it being public). I mean, what else could I do?

Comment by P. on Has anyone actually tried to convince Terry Tao or other top mathematicians to work on alignment? · 2022-06-09T09:17:27.088Z · LW · GW

That’s part of the point of the post, to coordinate so that fewer emails are sent. I asked if anyone tried something similar and asked people not to send their own emails without telling the rest of us.

Comment by P. on AGI Safety FAQ / all-dumb-questions-allowed thread · 2022-06-07T13:12:12.830Z · LW · GW

Which should be followed by reading https://lukemuehlhauser.com/a-reply-to-wait-but-why-on-machine-superintelligence/