Posts

Storyteller's convention, 2223 A.D. 2023-04-07T11:54:48.902Z
ea.domains - Domains Free to a Good Home 2023-01-12T13:32:19.451Z
aisafety.community - A living document of AI safety communities 2022-10-28T17:50:12.535Z
All AGI safety questions welcome (especially basic ones) [Sept 2022] 2022-09-08T11:56:50.421Z
Anti-squatted AI x-risk domains index 2022-08-12T12:01:24.927Z
All AGI safety questions welcome (especially basic ones) [July 2022] 2022-07-16T12:57:44.157Z
[Video] Intelligence and Stupidity: The Orthogonality Thesis 2021-03-13T00:32:21.227Z
Evolutions Building Evolutions: Layers of Generate and Test 2021-02-05T18:21:28.822Z
plex's Shortform 2020-11-22T19:42:38.852Z
What risks concern you which don't seem to have been seriously considered by the community? 2020-10-28T18:27:39.066Z
Why a Theory of Change is better than a Theory of Action for achieving goals 2017-01-09T13:46:19.439Z
Crony Beliefs 2016-11-03T20:54:07.716Z
[LINK] Collaborate on HPMOR blurbs; earn chance to win three-volume physical HPMOR 2016-09-07T02:21:32.442Z
[Link] - Policy Challenges of Accelerating Technological Change: Security Policy and Strategy Implications of Parallel Scientific Revolutions 2015-01-28T15:29:07.226Z
The Useful Definition of "I" 2014-05-28T11:44:23.789Z

Comments

Comment by plex (ete) on LLMs for Alignment Research: a safety priority? · 2024-04-20T08:20:23.307Z · LW · GW

DMed a link to an interface which lets you select the system prompt and model (including Claude). This is open to researchers to test, but not posted fully publicly as it is not very resistant to people who want to burn credits right now.

Other researchers feel free to DM me if you'd like access.

Comment by plex (ete) on LLMs for Alignment Research: a safety priority? · 2024-04-13T12:54:19.658Z · LW · GW

We're likely to switch to Claude 3 soon, but it's currently GPT-3.5. We mostly expect it to be useful initially as a way to interface with existing knowledge, but we could make an alternate prompt which is more optimized for being a research assistant brainstorming new ideas if that were wanted.

Would it be useful to be able to set your own system prompt for this? Or have a default one?

Comment by plex (ete) on Planning to build a cryptographic box with perfect secrecy · 2024-01-04T21:58:54.270Z · LW · GW

Seems like a useful tool to have available, glad someone's working on it.

Comment by plex (ete) on Which battles should a young person pick? · 2023-12-30T00:36:26.254Z · LW · GW

AI Safety Info's answer to "I want to help out AI Safety without making major life changes. What should I do?" is currently:

It's great that you want to help! Here are some ways you can learn more about AI safety and start contributing:

Learn More:

Learning more about AI alignment will provide you with good foundations for helping. You could start by absorbing content and thinking about challenges or possible solutions.

Consider these options:

Join the Community:

Joining the community is a great way to find friends who are interested and will help you stay motivated.

Donate, Volunteer, and Reach Out:

Donating to organizations or individuals working on AI safety can be a great way to provide support.


If you don’t know where to start, consider signing up for a navigation call with AI Safety Quest to learn what resources are out there and to find social support.

If you’re overwhelmed, you could look at our other article that offers more bite-sized suggestions.

Not all EA groups focus on AI safety; contact your local group to find out if it’s a good match.

Comment by plex (ete) on plex's Shortform · 2023-10-29T23:25:31.743Z · LW · GW

Life is Nanomachines

In every leaf of every tree
If you could look, if you could see
You would observe machinery
Unparalleled intricacy
 
In every bird and flower and bee
Twisting, churning, biochemistry
Sustains all life, including we
Who watch this dance, and know this key

Illustration: A magnified view of a vibrant green leaf, where molecular structures and biological nanomachines are visible. Hovering nearby, a bird's feathers reveal intricate molecular patterns. A bee is seen up close, its body showcasing complex biochemistry processes in the form of molecular chains and atomic structures. Nearby, a flower's petals and stem reveal the dance of biological nanomachines at work. Human silhouettes in the background observe with fascination, holding models of molecules and atoms.

Comment by plex (ete) on Announcing Timaeus · 2023-10-24T19:16:13.765Z · LW · GW

Congratulations on launching!

Added you to the map:

and your Discord to the list of communities, which is now a sub-page of aisafety.com.

One question: given that interpretability might well lead to systems powerful enough to be an x-risk long before we have a strong enough understanding to direct a superintelligence, publish-by-default seems risky. Are you considering adopting a non-publish-by-default policy? I know you talk about capabilities risks in general terms, but is this specific policy on the table?

Comment by plex (ete) on Josh Jacobson's Shortform · 2023-10-12T09:34:33.204Z · LW · GW

Yeah, that could well be listed on https://ea.domains/, would you be up for transferring it?

Comment by plex (ete) on Johannes C. Mayer's Shortform · 2023-10-08T14:44:12.228Z · LW · GW

Internal Double Crux, a CFAR technique.

Comment by plex (ete) on Johannes C. Mayer's Shortform · 2023-10-08T14:43:31.098Z · LW · GW

I think it's not super broadly known, but many CFAR techniques fit into the category, so it's around to some extent.

And yeah, brains are pretty programmable.

Comment by plex (ete) on Johannes C. Mayer's Shortform · 2023-10-07T16:29:30.127Z · LW · GW

Right, it can be way easier to learn it live. My guess is you're doing something quite IDC flavoured, but mixed with some other models of mind which IDC does not make explicit. Specific mind algorithms are useful, but exploring based on them and finding things which fit you is often best.

Comment by plex (ete) on Johannes C. Mayer's Shortform · 2023-10-07T12:06:13.870Z · LW · GW

Nice, glad you're getting value out of IDC and other mind stuff :)

Do you think an annotated reading list of mind stuff would be worth putting together?

Comment by plex (ete) on Related Discussion from Thomas Kwa's MIRI Research Experience · 2023-10-03T10:02:19.828Z · LW · GW

For convenience: Nate-culture communication handbook 

Comment by plex (ete) on Navigating an ecosystem that might or might not be bad for the world · 2023-09-19T14:12:46.041Z · LW · GW

Yup, there is a working prototype and a programmer who would like to work on it full-time if there were funding, but it's not been progressing much for the past year or so because no one has had the free bandwidth to work on it.

Comment by plex (ete) on Where might I direct promising-to-me researchers to apply for alignment jobs/grants? · 2023-09-19T14:10:27.935Z · LW · GW

https://aisafety.world/tiles/ has a bunch.

Comment by plex (ete) on Closing Notes on Nonlinear Investigation · 2023-09-18T20:43:14.749Z · LW · GW

That's fair, I've added a note to the bottom of the post to clarify my intended meaning. I am not arguing for it in a well-backed up way, just stating the output of my models from being fairly close to the situation and having watched a different successful mediation.

Comment by plex (ete) on Closing Notes on Nonlinear Investigation · 2023-09-18T15:56:07.703Z · LW · GW

Forced or badly done mediation does indeed seem terrible. Entering into a conversation facilitated by someone skilled, with an intent to genuinely understand the harms caused and to correct the underlying patterns, seems much less bad than the actual way the situation played out.

Comment by plex (ete) on Closing Notes on Nonlinear Investigation · 2023-09-17T23:36:47.631Z · LW · GW

I was asked to comment by Ben earlier, but have been juggling more directly impactful projects and retreats. I have been somewhat close to parts of the unfolding situation, including spending some time in person with Alice and Chloe, and (separately) with the Nonlinear team, and communicating online on-and-off with most parties.

I can confirm some of the patterns Alice complained about, specifically not reliably remembering or following through on financial and role agreements, and Emerson being difficult to talk to about some things. I do not feel notably harmed by these, and was able to work them out with Drew and Kat without much difficulty, but it does back up my perception that there were real grievances which would have been harmful to someone in a less stable position. I also think they've done some excellent work, and would like to see that continue, ideally with clear and well-known steps to mitigate the kinds of harms which set this in motion.

I have consistently attempted to shift Nonlinear away from what appears to me a wholly counterproductive adversarial emotional stance, with limited results. I understand that they feel defected against, especially Emerson, but they were in the position of power and failed to make sure those they were working with did not come out harmed, and their responses to the initial implosion continued to generate harm and distraction for the community. I am unsettled by the threat of legal action towards Lightcone and by the focus on controlling the narrative rather than repairing damage.

Emerson: You once said one of the main large failure modes you were concerned about becoming was Stalin's mistake: breaking the networks of information around you so you were unaware things were going so badly wrong. My read is you've been doing this in a way which is a bit more subtle than the gulags, by the intensity of your personality shaping the fragments of mind around you to not give you evidence that in fact you made some large mistakes here. I felt the effects of this indirectly, as well as directly. I hope you can halt, melt, and catch fire, and return to the effort as someone who does not make this magnitude of unforced error.

In a movement with the kind of co-protective nature ours has, you can't just push someone who is deeply good out, in the way you merely shouldn't in some other parts of the world; if there's intense conflict, call in a mediator and try to heal the damage.

Edit: To clarify, this is not intended as a blanket endorsement of mediation, or of avoiding other forms of handling conflict. I do think that going into a process where the parties genuinely try and understand each other's worlds much earlier would have been much less costly for everyone involved as well as the wider community in this case, but I can imagine mediation is often mishandled or forced in ways which are also counterproductive.

Comment by plex (ete) on Meta Questions about Metaphilosophy · 2023-09-05T13:13:27.383Z · LW · GW

How can I better recruit attention and resources to this topic?

Consider finding an event organizer/ops person and running regular retreats on the topic. This will give you exposure to people in a semi-informal setting, and help you find a few clear-thinking people who you might want to form a research group with and who can help structure future retreats.

I've had great success with a similar approach.

Comment by plex (ete) on Stampy's AI Safety Info - New Distillations #4 [July 2023] · 2023-08-18T17:02:12.101Z · LW · GW

We're getting about 20k uniques/month across the different URLs. I expect that to get much higher once we make a push for attention: when Rob Miles signs off on the quality, we'll launch to LessWrong and then in his videos.

Comment by plex (ete) on A simple way of exploiting AI's coming economic impact may be highly-impactful · 2023-08-05T14:50:25.731Z · LW · GW

AI safety is funding-constrained; we win in more timelines if there are a bunch of people successfully investing to give.

Comment by plex (ete) on Why I'm Not (Yet) A Full-Time Technical Alignment Researcher · 2023-07-31T16:54:41.283Z · LW · GW

If you're able to spend time in the UK, the EA Hotel offers free food and accommodation for up to two years in low-cost shared living. Relatedly, there should really be one of these in the US and mainland Europe.

Comment by plex (ete) on Why was the AI Alignment community so unprepared for this moment? · 2023-07-17T17:32:08.496Z · LW · GW

feel that I have a bad map of the AI Alignment/Safety community

This is true of many people, and why I built the map of AI safety :)

Next step is to rebuild aisafety.com into a homepage which ties all of this together, and offer AI Safety Info's database via an API for other websites (like aisafety.com, and hopefully lesswrong) to embed.

Comment by plex (ete) on Why was the AI Alignment community so unprepared for this moment? · 2023-07-16T22:31:29.320Z · LW · GW

Why did the Alignment community not prepare tools and plans for convincing the wider infosphere about AI safety years in advance?

I've been organizing the volunteer team who built AI Safety Info for the past two and a half years, alongside building a whole raft of other tools like AI Safety Training and AI Safety World.

But, yes, the movement as a whole has dropped the ball pretty hard on basic prep. The real answer is that things are not done by default, and this subculture has relatively few do-ers compared to thinkers. And the thinkers had very little faith in the wider info-sphere, sometimes actively discouraging most do-ers from trying broad outreach.

Comment by plex (ete) on Wildfire of strategicness · 2023-06-19T12:41:31.082Z · LW · GW

Early corporations, like the East India Company, might be a decent reference class?

Comment by plex (ete) on I can see how I am Dumb · 2023-06-11T00:46:49.576Z · LW · GW

I'm pretty sure that at some level what sorts of things your brain spits out into your consciousness and how useful that information is in the given situation, is something that you can't fundamentally change. I expect this to be a hard-coded algorithm

Tune Your Cognitive Strategies purports to offer a technique which can improve that class of algorithm significantly.

Edit: Oh, no, you were meaning a different thing, and this probably goes into the inputs to the algorithm category?

Comment by plex (ete) on Transformative AGI by 2043 is <1% likely · 2023-06-10T16:47:56.094Z · LW · GW

Your probabilities are not independent; your estimates mostly flow from a world model which seems to me to be flatly and clearly wrong.

The plainest examples seem to be assigning

We invent a way for AGIs to learn faster than humans: 40%
AGI inference costs drop below $25/hr (per human equivalent): 16%

despite current models already learning vastly faster than humans (the training time of LLMs is far less than a human lifetime, and covers vastly more data), current systems nearing AGI, and inference already being dramatically cheaper and plummeting further with algorithmic improvements. There is a general factor of progress, where progress leads to more progress, which you seem to be missing in the positive factors. On the negative side, a derailment that delays things enough to push us out that far would need to be extreme, on the order of a full-out nuclear exchange, given more reasonable models of progress.

I'll leave you with Yud's preemptive reply:

Taking a bunch of numbers and multiplying them together causes errors to stack, especially when those errors are correlated.
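
As a rough illustration of the correlated-errors point (a minimal sketch with made-up numbers, not the post's actual estimates): if five conjunctive steps each look ~50% likely but all load on a shared "general progress" factor, naively multiplying the marginals badly understates the joint probability.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Naive approach: treat five ~50% steps as independent and multiply.
naive = 0.5 ** 5  # about 3.1%

# Correlated approach: each step's success is driven mostly by a shared
# latent "general progress" factor, so the worlds where one step succeeds
# are largely the worlds where the others do too.
progress = rng.normal(size=n)              # shared latent factor
noise = rng.normal(size=(5, n))            # step-specific noise
steps = 0.8 * progress + 0.6 * noise       # unit variance, ~50% marginal each
joint = (steps > 0).all(axis=0).mean()     # joint success rate across worlds

print(f"independent product: {naive:.3f}")  # ~0.031
print(f"correlated joint:    {joint:.3f}")  # several times higher
```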

Comment by plex (ete) on Launching Lightspeed Grants (Apply by July 6th) · 2023-06-07T15:02:07.751Z · LW · GW

Nice! Glad to see more funding options entering the space, and excited to see the S-process rolled out to more grantmakers.

Added you to the map of AI existential safety:


One thing which might be nice as part of improving the grantee experience would be being able to submit applications as a Google Doc (with a template which gives the sections you list) rather than just by form. This increases re-usability and decreases stress, as it's easy to make updates later on, so it's less of a worry that you ended up missing something crucial. Might be more hassle than it's worth though, depending on how entangled your backend is with Airtable.

Comment by plex (ete) on Anti-squatted AI x-risk domains index · 2023-06-01T11:33:00.019Z · LW · GW

Cool! Feel free to add it with the form.

Comment by plex (ete) on Transparency for Generalizing Alignment from Toy Models · 2023-04-03T23:04:08.116Z · LW · GW

That was me; for context:

core claim seems reasonable and worth testing, though I'm not very hopeful that it will reliably scale through the sharp left turn

my guess is the intuitions don't hold in the new domain, and radical superintelligence requires intuitions that you can't develop on relatively weak systems, but it's a source of data for our intuition models which might help with other stuff, so it seems reasonable to attempt.

Comment by plex (ete) on Meta "open sources" LMs competitive with Chinchilla, PaLM, and code-davinci-002 (Paper) · 2023-02-25T17:13:22.749Z · LW · GW

Meta's previous LLM, OPT-175B, seemed good by benchmarks but was widely agreed to be much, much worse than GPT-3 (not even necessarily better than GPT-NeoX-20B). It's an informed guess, not a random dunk, and does leave open the possibility that they've turned it around and have a great model this time rather than something which goodharts the benchmarks.

Comment by plex (ete) on a narrative explanation of the QACI alignment plan · 2023-02-22T15:53:58.857Z · LW · GW

This is a Heuristic That Almost Always Works, and it's the one most likely to cut off our chances of solving alignment. Almost all clever schemes are doomed, but if we as a community let that meme stop us from assessing the object level question of how (and whether!) each clever scheme is doomed then we are guaranteed not to find one.

Security mindset means looking for flaws, not assuming all plans are so doomed you don't need to look.

If this is, in fact, a utility function which, if followed, would lead to a good future, that is concrete progress and lays out a new set of true names as a win condition. Not a solution, since we can't train AIs with arbitrary goals, but it's progress in the same way that quantilizers were progress on mild optimization.

Comment by plex (ete) on AI Safety Info Distillation Fellowship · 2023-02-17T23:52:55.137Z · LW · GW

Not inspired by them, no. Those did not have, as far as I'm aware, a clear outlet for use of the outputs. We have a whole platform we've been building towards for three years (starting on the FAQ long before those contests), and, thanks to Rob Miles, the ability to point large numbers of people at that platform once it has great content.

Comment by plex (ete) on a narrative explanation of the QACI alignment plan · 2023-02-15T14:47:14.577Z · LW · GW

As I said over on your Discord, this feels like it has a shard of hope, and the kind of thing that could plausibly work if we could hand AIs utility functions.

I'd be interested to see the explicit breakdown of the true names you need for this proposal.

Comment by plex (ete) on EigenKarma: trust at scale · 2023-02-09T00:28:26.590Z · LW · GW

Agreed, incentives probably block this from being picked up by megacorps. At one point, when Musk was talking about bots a lot, I had thought to try to get his Twitter to adopt it; it would be very effective, but it doesn't allow rent extraction in the way the solution he settled on (paid Twitter Blue) does.

Websites which have the slack to allow users to improve their experience even if it costs engagement might be better adopters; LessWrong has shown they will do this with, e.g., batching karma daily by default to avoid dopamine addiction.

Comment by plex (ete) on Two very different experiences with ChatGPT · 2023-02-07T23:21:24.500Z · LW · GW

Hypothesis #2: These bits of history are wrong for reasons you can check with simpler learned structures.

Maybe these historical patterns are easier to disprove with simple exclusions, like "these things were in different places"?

Comment by plex (ete) on Two very different experiences with ChatGPT · 2023-02-07T20:01:54.264Z · LW · GW

And if you use common but obviously wrong science or maths, it is less likely to.

Comment by plex (ete) on Two very different experiences with ChatGPT · 2023-02-07T20:01:17.753Z · LW · GW

Yeah, my guess is if you use really niche and plausible-sounding historical examples it is much more likely to hallucinate.

Comment by plex (ete) on Two very different experiences with ChatGPT · 2023-02-07T19:43:17.956Z · LW · GW

Maybe the agent selected for by RLHF expects the person giving feedback to correct it on the history example, but not to know that the latter example is false. If you asked a large sample of humans, more would be able to confidently say the first example is false than the latter one.

Comment by plex (ete) on Focus on the places where you feel shocked everyone's dropping the ball · 2023-02-05T11:16:46.382Z · LW · GW

Yeah, that makes a lot of sense and fits my experience of what works.

Comment by plex (ete) on Focus on the places where you feel shocked everyone's dropping the ball · 2023-02-03T12:03:05.228Z · LW · GW

Hell yeah!

This matches my internal experience that caused me to bring a ton of resources into existence in the alignment ecosystem (with various collaborators):

  • aisafety.info - Man, there really should be a single point of access that lets people self-onboard into the effort. (Helped massively by Rob Miles's volunteer community, soon to launch a paid distillation fellowship)
  • aisafety.training - Maybe we should have a unified place with all the training programs and conferences so people can find what to apply to? (AI Safety Support had a great database that just needed a frontend)
  • aisafety.world - Let's make a map of everything in AI existential safety so people know what orgs, blogs, funding sources, resources, etc exist, in a nice sharable format. (Hamish did the coding, Superlinear funded it)
  • ea.domains - Wow, there sure are a lot of vital domains that could get grabbed by squatters. Let's step in and save them for good orgs and projects.
  • aisafety.community - There's no up-to-date list of online communities. This is an obvious missing resource.
  • Rob Miles videos are too rare, almost entirely bottlenecked on the research and scriptwriting process. So I built some infrastructure which allows volunteers to collaborate as teams on scripts for him, being tested now.
  • Ryan Kidd said there should be a nice professional site which lists all the orgs in a format which helps people leaving SERI MATS decide where to apply. aisafety.careers is my answer, though it's not quite ready yet. Volunteers are wanted to help write up descriptions for orgs in the Google Docs we have auto-syncing with the site!
  • Nonlinear wanted a prize platform, and that seemed like a useful way to put the firehose of money to work while FTXFF was still a thing, so I built Superlinear.
  • There is a lot of obvious low-hanging fruit here, and I need more hands. Let's make a monthly call and a project database so I can easily pitch these to all the people who want to help save the world and don't know what to do. A bunch of great devs joined!
  • and 6+ more major projects as well as a ton of minor ones, but that's enough to list here.


I do worry I might be neglecting my actual highest-EV thing though, which is my moonshot formal alignment proposal (low chance of the research direction working out, but much more direct if it does). Fixing the alignment ecosystem is just so obviously helpful, and has nice feedback loops.

Comment by plex (ete) on Advice I found helpful in 2022 · 2023-01-29T18:18:57.457Z · LW · GW

This is some great advice. Especially 1 and 2 seem foundational for anyone trying to reliably shift the needle by a notable amount in the right direction.

Comment by plex (ete) on "Status" can be corrosive; here's how I handle it · 2023-01-24T12:57:25.958Z · LW · GW

My favorite frame is based on In The Future Everyone Will Be Famous To Fifteen People. If we as a civilization pass this test, we who lived at the turn of history will be outnumbered trillions to one, and the future historians will construct a pretty good model of how everyone contributed. We'll get to read about it, if we decide that that's part of our personal utopia.

I'd like to be able to look back from eternity and see that I shifted things a little in the right direction. That perspective helps defuse some of the local status-seeking drives, I think.

Comment by plex (ete) on How it feels to have your mind hacked by an AI · 2023-01-22T17:54:34.561Z · LW · GW

I can verify that the owner of the blaked[1] account is someone I have known for a significant amount of time, that he is a person with a serious, long-standing concern with AI safety (and all other details verifiable by me fit), and that based on the surrounding context I strongly expect him to have presented the story as he experienced it.

This isn't a troll.

  1. (also I get to claim memetic credit for coining the term "blaked" for being affected by this class of AI persuasion)

Comment by plex (ete) on What’s going on with ‘crunch time’? · 2023-01-21T21:53:08.456Z · LW · GW

Agree with Jim, and suggest starting with some Rob Miles videos. The Computerphile ones, and those on his main channel, are a good intro.

Comment by plex (ete) on List of technical AI safety exercises and projects · 2023-01-19T18:58:07.556Z · LW · GW

Great, thanks!

Comment by plex (ete) on List of technical AI safety exercises and projects · 2023-01-19T18:52:08.607Z · LW · GW

Nice! Would you be up for putting this in the aisafety.info Google Drive folder too, with a question-shaped title?

Comment by plex (ete) on Preparing for AI-assisted alignment research: we need data! · 2023-01-19T18:37:19.406Z · LW · GW

Yes, this is a robustly good intervention on the critical path. Have had it on the Alignment Ecosystem Development ideas list for most of a year now.

Some approaches to solving alignment go through teaching ML systems about alignment and getting research assistance from them. Training ML systems needs data, but we might not have enough alignment research to sufficiently fine-tune our models, and we might miss out on many concepts which have not been written up. Furthermore, training on the final outputs (AF posts, papers, etc.) might be less good at capturing the thought processes which go into hashing out an idea or poking holes in proposals, which would be the most useful things for a research assistant to be skilled at.

It might be significantly beneficial to capture many of the conversations between researchers, and use them to expand our dataset of alignment content to train models on. Additionally, some researchers may be fine with having some of their conversations available to the public, in case people want to do a deep dive into their models and research approaches.

The two parts of the system which I'm currently imagining addressing this are:

  1. An email address to which audio files can be sent, automatically run through Whisper, and added to the alignment dataset GitHub (a rough sketch of this part follows the list).
  2. Clear instructions for setting up a tool which captures audio from calls automatically (either a general tool or platform-specific advice), and makes it as easy as possible to send the right calls to the dataset platform.
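
A minimal sketch of what the transcription step in part 1 could look like, assuming the open-source openai-whisper package; the directory names and dataset repo layout are hypothetical placeholders:

```python
# Transcribe incoming audio files with Whisper and stage the transcripts
# for the alignment dataset repo. Paths below are hypothetical.
from pathlib import Path

import whisper  # open-source openai-whisper package

INBOX = Path("inbox")                           # audio pulled from the email inbox
OUTDIR = Path("alignment-dataset/transcripts")  # local clone of the dataset repo

model = whisper.load_model("base")  # larger models give better transcripts
OUTDIR.mkdir(parents=True, exist_ok=True)

for audio in sorted(INBOX.glob("*.mp3")):
    result = model.transcribe(str(audio))
    out = OUTDIR / f"{audio.stem}.md"
    out.write_text(f"# {audio.stem}\n\n{result['text'].strip()}\n")
    print(f"transcribed {audio.name} -> {out}")

# A separate step (or a human reviewer) would commit the new transcripts
# and open a pull request against the dataset repo.
```
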
Comment by plex (ete) on Announcing aisafety.training · 2023-01-19T12:33:31.314Z · LW · GW

I'm working on it. aisafety.world and /map are still a WIP, but very useful already.

Comment by plex (ete) on aisafety.community - A living document of AI safety communities · 2023-01-16T19:54:41.060Z · LW · GW

Updates!

  1. Have been continually adding to this, up to 49 online communities.
  2. Added a student groups section, with 12 entries and some extra fields (website, contact, calendar, mailing list), based on the AGISF list (in talks with the maintainer to set up painless syncing between the two systems).
  3. Made a form where you can easily add entries.

Still getting consistent traffic, happy to see it getting used :)

Comment by plex (ete) on Mid-Atlantic AI Alignment Alliance Unconference · 2023-01-14T16:43:13.079Z · LW · GW

Added to aisafety.training.