LessWrong 2.0 Reader
That would be fine by me if it were a stable long-term situation, but I don't think it is. It sounds like you're thinking mostly of AI, not of AGI that can self-improve at some point. My major point in this post is that the same logic about following human instructions applies to AGI, but that's vastly more dangerous to have proliferate. There won't have to be many RSI-capable AGIs before someone tells their AGI "figure out how to take over the world and turn it into my utopia, before some other AGI turns it into theirs". It seems like the game theory will resemble the nuclear standoff, but without the mutually assured destruction aspect that prevents deployment. The incentive will be to move first, to prevent others from deploying AGIs in ways you don't like.
mike_hawke on Ilya Sutskever and Jan Leike resign from OpenAI
Even acknowledging that the NDA exists is a violation of it.
This sticks out pretty sharply to me.
Was this explained to the employees during the hiring process? What kind of precedent is there for this kind of NDA?
wei-dai on Ilya Sutskever and Jan Leike resign from OpenAI
So these resignations don’t negatively impact my p(doom) in the obvious way. The alignment people at OpenAI were already powerless to do anything useful regarding changing the company direction.
How were you already sure of this before the resignations actually happened? I of course had my own suspicions that this was the case, but was uncertain enough that the resignations are still a significant negative update.
ETA: Perhaps worth pointing out here that Geoffrey Irving recently left Google DeepMind to be Research Director at UK AISI, but seemingly on good terms (since Google DeepMind recently reaffirmed its intention to collaborate with UK AISI).
wei-dai on Wei Dai's Shortform
Bad: AI developers haven't taken alignment seriously enough to have invested enough in scalable oversight, and/or those techniques are unworkable or too costly, causing them to be unavailable.
Turns out at least one scalable alignment team has been struggling for resources. From Jan Leike (formerly co-head of Superalignment at OpenAI):
Over the past few months my team has been sailing against the wind. Sometimes we were struggling for compute and it was getting harder and harder to get this crucial research done.
Even worse, apparently the whole Superalignment team has been disbanded.
mesaoptimizer on mesaoptimizer's Shortform
If your endgame strategy involved relying on OpenAI, DeepMind, or Anthropic to implement your alignment solution that solves science / super-cooperation / nanotechnology, consider figuring out another endgame plan.
matthew-barnett on Instruction-following AGI is easier and more likely than value aligned AGI
Yes, but I don't consider this outcome very pessimistic, because this is already what the current world looks like. How commonly do businesses work for the common good of all humanity, rather than for the sake of their shareholders? The world is not a utopia, but I guess that's something I've already gotten used to.
jay-bailey on Deep Q-Networks Explained
Thanks for this! I've changed the sentence to:
The target network gets to see one more step than the Q-network does, and thus is a better predictor.
Hopefully this prevents others from the same confusion :)
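To make the target-network point concrete, here is a minimal sketch of the standard one-step TD target (PyTorch-style pseudocode of my own, not code from the linked post; the function and variable names are illustrative). The frozen target network is evaluated on the *next* state, and the observed reward for that step is folded in, which is the sense in which the target "gets to see one more step" than the online Q-network.

```python
import torch

def dqn_target(q_target_net, reward, next_state, done, gamma=0.99):
    """One-step TD target used to train the online Q-network.

    The target network scores next_state (one step beyond the state the
    online network is being trained on), and the real reward observed on
    that step is added in, so the target incorporates one extra step of
    actual experience.
    """
    with torch.no_grad():
        # Greedy value of the next state under the frozen target network.
        next_q = q_target_net(next_state).max(dim=1).values
    # Bootstrap only if the episode did not terminate at this transition.
    return reward + gamma * (1.0 - done.float()) * next_q
```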
brendan-long on Is There Really a Child Penalty in the Long Run?
This is what I came to ask about. Randomizing based on health and then finding that the healthier group makes more despite other factors seems like it doesn't really prove the thing the paper is claiming.
Although the fact that wages matched between the groups beforehand is pretty interesting.
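To illustrate the confounding worry in a toy simulation (purely hypothetical numbers, not drawn from the paper): if some latent health factor raises both the chance of landing in the "treated" group and later wages, then a naive comparison of the two groups shows a wage gap even when the true effect is zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical setup: latent "health" drives both group assignment
# (e.g. a health-dependent treatment succeeding) and wages.
health = rng.normal(size=n)
treated = rng.random(n) < 1 / (1 + np.exp(-health))  # healthier -> more likely treated
true_effect = 0.0                                     # true penalty set to zero
wages = 50_000 + 5_000 * health + true_effect * treated + rng.normal(0, 2_000, n)

# Naive group comparison recovers a spurious "effect" despite the
# true effect being zero, because health confounds the comparison.
print(wages[treated].mean() - wages[~treated].mean())
```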
daphne_w on Ilya Sutskever and Jan Leike resign from OpenAI
On third order, people who openly worry about X-Risk may get influenced by their environment, becoming less worried as a result of staying with a company whose culture denies X-Risk, which could eventually even cause them to contribute negatively to AI Safety. Preventing them from getting hired prevents this.
review-bot on Gender Exploration
The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2025. The top fifty or so posts are featured prominently on the site throughout the year. Will this post make the top fifty?