LessWrong 2.0 Reader
This doesn’t take away from the point your post makes, but a small nitpick: there’s no proof that Einstein actually said that. It appears to be one of those tongue-in-cheek stories about Einstein; we don’t have a contemporary source quoting him on it.
thane-ruthenis on Ilya Sutskever and Jan Leike resign from OpenAI [updated]
How were you already sure of this before the resignations actually happened?
OpenAI enthusiastically commercializing AI + the "Superalignment" approach being exactly the approach [LW(p) · GW(p)] I'd expect someone doing safety-washing to pick + the November 2023 drama + the stated trillion-dollar plans to increase worldwide chip production (which are directly at odds [LW · GW] with the way OpenAI previously framed its safety concerns).
Some of the preceding resignations (chiefly, Daniel Kokotajlo's) also played a role here, though I didn't update off of them much either.
algon on Fund me please - I Work so Hard that my Feet start Bleeding and I Need to Infiltrate University
Alas, I think it's quite unlikely that this article will make somebody fund me. It's just that I noticed how extremely slow I am (without collaborators) to create a proper grant application.
IDGI. Why don't you work w/ someone to get funding? If you're 15x more productive, then you've got a much better shot at finding/filling out grants and then getting funding for you and your partner.
EDIT:
Also, you're a game dev and hence good at programming. Surely you could work for free as an engineer at an AI alignment org or something and then shift into discussions w/ them about alignment?
Of the abilities Janus demoed to me, this is probably the one that most convinced me GPT-3 does deep modeling of the data generator. The formulation they showed me guessed which famous authors an unknown author is most similar to. This is more useful because it doesn't require the model to know who the unknown author in particular is, just to know some famous author who is similar enough to invite comparison.
Twitter post I wrote about it:
https://x.com/jd_pressman/status/1617217831447465984
The prompt, if you want to try it yourself (a minimal sketch of running it follows the link). It used to be hard to find a base model to run this on, but it should now be fairly easy with LLaMA, Mixtral, et al.
https://gist.github.com/JD-P/632164a4a4139ad59ffc480b56f2cc99
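For anyone who wants to try this locally, here is a minimal sketch using the Hugging Face transformers library. The checkpoint name and the prompt filename are placeholders (not from the original comment); any base (non-instruct) model should work.

```python
# Minimal sketch: complete the author-similarity prompt with a local base model.
# Assumptions: "mistralai/Mistral-7B-v0.1" is just an example checkpoint, and
# "author_similarity_prompt.txt" holds the prompt text copied from the gist above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = open("author_similarity_prompt.txt").read()

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.8)
# Print only the completion, i.e. the model's guesses about similar authors.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```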
robo on To Limit Impact, Limit KL-Divergence
I think the weakness with KL divergence is that the potentially harmful model can do things the safe model would be exponentially unlikely to do. Even if the safe model has a 1 in 1 trillion chance of stabbing me in the face, the KL penalty for stabbing me in the face is log(1 trillion) (and logs make even huge numbers small).
What about limiting the unknown model to choosing among the actions that make up the safe model's cumulative top 98% of probability mass? If the safe model never has more than a 1% chance of taking an action that will kill you, then the unknown model won't be able to take an action that kills you. This isn't terribly different from the Top-K sampling many language models use in practice.
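To make the numbers concrete, here is a small toy sketch (mine, not robo's; the action names and probabilities are invented for illustration) contrasting the KL penalty with a cumulative-probability cutoff:

```python
import numpy as np

# Toy "safe" policy over four actions; one action is catastrophic but
# astronomically unlikely (1 in 1 trillion), as in the comment above.
actions = ["help", "clarify", "refuse", "stab"]
p_safe = np.array([0.60, 0.30, 0.10 - 1e-12, 1e-12])

# Worst case for a KL penalty: the unknown model puts nearly all of its mass
# on the catastrophic action. KL(q || p_safe) is then roughly -log(p_safe[stab]).
q_bad = np.array([1e-9, 1e-9, 1e-9, 1.0 - 3e-9])
kl = np.sum(q_bad * np.log(q_bad / p_safe))
print(f"KL penalty for (almost) always stabbing: {kl:.1f} nats")  # ~27.6 -- modest

# Cumulative-probability cutoff instead: only allow actions whose preceding
# cumulative mass under the safe model is below 98% (a nucleus-style filter).
order = np.argsort(p_safe)[::-1]               # actions sorted by safe-model probability
cumulative = np.cumsum(p_safe[order])
allowed = [actions[i] for i, c in zip(order, cumulative) if c - p_safe[i] < 0.98]
print("Allowed actions:", allowed)             # "stab" is excluded outright
```

Under the KL penalty, an optimizer willing to pay ~28 nats can still take the catastrophic action; under the cutoff, that action is simply unavailable no matter how much the unknown model wants it.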
thane-ruthenis on Ilya Sutskever and Jan Leike resign from OpenAI [updated]
Superalignment likely happened because (a) the safety faction (Ilya/Jan/etc.) wanted it, and (b) the Sam faction also wanted it, or tolerated it, or agreed to it due to perceived PR benefits (safety-washing), or let it happen as a result of internal negotiation/compromise, or something else, or some combination of these things.
Sure, that's basically my model as well. But if faction (b) only cares about alignment due to perceived PR benefits or in order to appease faction (a), and faction (b) turns out to have overriding power such that it can destroy or drive out faction (a) and then curtail all the alignment efforts, I think it's fair to compress all that into "OpenAI's alignment efforts are safety-washing". If (b) has the real power within OpenAI, then OpenAI's behavior and values can be approximately rounded off to (b)'s behavior and values, and (a) is a rounding error.
If OAI as a whole was really only doing anything safety-adjacent for pure PR or virtue signaling reasons, I think its activities would have looked pretty different
Not if (b) is concerned about fortifying OpenAI against future challenges, such as hypothetical futures in which the AGI Doomsayers get their way and the government/the general public wakes up and tries to nationalize or ban AGI research. In that case, having a prepared, well-documented narrative of going above and beyond to ensure that their products are safe, well before any other parties woke up to the threat, will leave OpenAI much better positioned to retain control over its research.
(I interpret Sam Altman's behavior at Congress as evidence of this kind of longer-term thinking. He didn't try to downplay the dangers of AI, which would have been easy and is what someone myopically optimizing for short-term PR would have done. Instead, he proactively brought up the concerns that future AI progress might awaken, getting ahead of them, and thereby established OpenAI as taking those concerns seriously and put himself in a position to control/manage them.)
And it's approximately what I would do, at least, if I were in charge of OpenAI and had a different model of AGI Ruin.
And this is the potential plot whose partial failure I'm currently celebrating.
dagon on "If we go extinct due to misaligned AI, at least nature will continue, right? ... right?"
If it's possible for super-intelligent AI to be non-sentient, wouldn't it be possible for insects to evolve non-sentient intelligence as well? I guess I didn't assume "non-sentient" in the definition of "unaligned".
elizabeth-1 on Stephen Fowler's Shortform
To go one step further, potentially any and every major decision they have played a part in needs to be reevaluated by objective third parties.
I like a lot of this post, but the sentence above seems very out of touch to me. Who are these third parties who are completely objective? Why is "objective" the adjective here, instead of "good judgement" or "predicted this problem at the time"?
mesaoptimizer on Stephen Fowler's Shortform
But the discussion of “repercussions” before there’s been an investigation goes into pure-scapegoating territory if you ask me.
Just to be clear, OP themselves seem to think that what they are saying will have little effect on the status quo. They literally called it a "Very Spicy Take". Their intention was to express how they felt about the situation. I'm not sure why you find this threatening, because again, the people they think ideally wouldn't continue to have influence over AI-safety-related decisions are incredibly influential and will very likely continue to have the influence they currently possess. Almost everyone else in this thread implicitly models this fact as they discuss things related to the OP comment.
There is not going to be any scapegoating. I imagine everything I say here is something I would say in person to the people involved, or to third parties, and I don't expect any sort of coordinated action to reduce their influence -- they are that irreplaceable to the community and to the ecosystem.
haiku-1 on robo's Shortform
Strong agree and strong upvote.
There are some efforts in the governance space and in the space of public awareness, but there should and can be much, much more.
My read of these survey results [LW · GW] is:
AI Alignment researchers are optimistic people by nature. Despite this, most of them don't think we're on track to solve alignment in time, and they are split on whether we will even make significant progress. Most of them also support pausing AI development to give alignment research time to catch up.
As for what to actually do about it: There are a lot of options, but I want to highlight PauseAI. (Disclosure: I volunteer with them. My involvement brings me no monetary benefit, and no net social benefit.) Their Discord server is highly active and engaged and is peopled with alignment researchers, community- and mass-movement organizers, experienced protesters, artists, developers, and a swath of regular people from around the world. They play the inside and outside game, both doing public outreach and also lobbying policymakers.
On that note, I also want to put a spotlight on the simple action of sending emails to policymakers. Doing so and following through is extremely OP (i.e. has much more utility than you might expect), and can result in face-to-face meetings to discuss the nature of AI x-risk and what they can personally do about it. Genuinely, my model of a world in 2040 that contains humans is almost always one in which a lot more people sent emails to politicians.