LessWrong 2.0 Reader


Recent comments

beck-stein on Funny Anecdote of Eliezer From His Sister

I am being told that Sheva Brachos in this example is the series of celebrations in the week after the wedding. I don't know if that's a correction or just context, but there you go.

danielfilan on DanielFilan's Shortform Feed

Frankfurt-style counterexamples for definitions of optimization

In "Bottle Caps Aren't Optimizers", I wrote about a type of definition of optimization that says system S is optimizing for goal G iff G has a higher value than it would if S didn't exist or were randomly scrambled. I argued against these definitions by providing examples of systems that satisfy the criterion but are not optimizers. But today, I realized that I could repurpose Frankfurt cases to get examples of optimizers that don't satisfy this criterion.

A Frankfurt case is a thought experiment designed to disprove the following intuitive principle: "a person is morally responsible for what she has done only if she could have done otherwise." Here's the basic idea: suppose Alice is considering whether or not to kill Bob. Upon consideration, she decides to do so, takes out her gun, and shoots Bob. But unbeknownst to her, a neuroscientist had implanted a chip in her brain that would have forced her to shoot Bob if she had decided not to. That said, the chip didn't activate, because she did decide to shoot Bob. The idea is that she's morally responsible, even though she couldn't have done otherwise.

Anyway, let's do this with optimizers. Suppose I'm playing Go, thinking about how to win - imagining what would happen if I played various moves, and playing moves that make me more likely to win. Further suppose I'm pretty good at it. You might want to say I'm optimizing my moves to win the game. But suppose that, unbeknownst to me, behind my shoulder is famed Go master Shin Jinseo. If I start playing really bad moves, or suddenly die or vanish etc, he will play my moves, and do an even better job at winning. Now, if you remove me or randomly rearrange my parts, my side is actually more likely to win the game. But that doesn't mean I'm optimizing to lose the game! So this is another way such definitions of optimizers are wrong.

That said, other definitions treat this counter-example well. E.g. I think the one given in "The ground of optimization" says that I'm optimizing to win the game (maybe only if I'm playing a weaker opponent).
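To make the Go case above concrete, here is a minimal Python sketch of the counterfactual criterion ("G is higher with S than without S"). Everything in it is an assumption for illustration: the function names and the win probabilities (0.7 for me, 0.95 for the master, 0.0 for an empty seat) are made up, and "removing S" is modelled simply as the player being absent.

```python
# Toy model of the counterfactual criterion. All numbers are hypothetical.

P_ME = 0.7       # assumed: I am pretty good at Go
P_BACKUP = 0.95  # assumed: the master behind my shoulder is better
P_NOBODY = 0.0   # assumed: with nobody playing, my side loses


def win_probability(player_present: bool, backup_present: bool) -> float:
    """Chance that my side wins, given who is available to play."""
    if player_present:
        # The backup never steps in while I am playing normally.
        return P_ME
    return P_BACKUP if backup_present else P_NOBODY


def counterfactual_criterion(backup_present: bool) -> bool:
    """True iff winning is more likely with me than with me removed."""
    with_me = win_probability(True, backup_present)
    without_me = win_probability(False, backup_present)
    return with_me > without_me


# My play is identical in both worlds, but the verdict flips.
print(counterfactual_criterion(backup_present=False))
print(counterfactual_criterion(backup_present=True))
```

Under these assumed numbers the criterion calls me an optimizer only when nobody is standing behind me, even though my behaviour is the same in both cases.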

metachirality on LessOnline (May 31—June 2, Berkeley, CA)

Isn't TLP's email on his website?

chipmonk on Key takeaways from our EA and alignment research surveys

How much higher were the neuroticism scores than in the general population?

chipmonk on Key takeaways from our EA and alignment research surveys

How many alignment researchers do you think there are in total? What percentage of the people you wanted to reach do you think this survey reached?

porby on Does reducing the amount of RL for a given capability level make AI safer?

But I disagree that there’s no possible RL system in between those extremes where you can have it both ways.

I don't disagree. For clarity, I would make these claims, and I do not think they are in tension:

Something being called "RL" alone is not the relevant question for risk. It's how much space the optimizer has to roam.
MuZero-like strategies are free to explore more space than something like current applications of RLHF. Improved versions of these systems working in more general environments have the capacity to do surprising things and will tend to be less 'bound' in expectation than RLHF. Because of that extra space, these approaches are more concerning in a fully general and open-ended environment.
MuZero-like strategies remain very distant from a brute-forced policy search, and that difference matters a lot in practice.
Regardless of the category of the technique, safe use requires understanding the scope of its optimization. This is not the same as knowing what specific strategies it will use. For example, despite finding unforeseen strategies, you can reasonably claim that MuZero (in its original form and application) will not be deceptively aligned to its task.
Not all applications of tractable RL-like algorithms are safe or wise.
There do exist safe applications of RL-like algorithms.

migueldev on [deleted]

I created my first fold. I'm not sure if this is something to be happy about, since everybody can do it now.

migueldev on [deleted]

Access to AlphaFold 3: https://golgi.sandbox.google.com/

Is allowing the world access to AlphaFold 3 a great idea? I don't know how this works, but I can imagine that a highly motivated bad actor could start from scratch by simply googling/LLM querying/multi-modal querying each symbol in this image.

ryan_greenblatt on jacquesthibs's Shortform

I do think that many of the safety advantages of LLMs come from their understanding of human intentions (and therefore implied values).

Did you mean something different from "AIs understand our intentions" (e.g. maybe you meant that humans can understand the AI's intentions)?

I think future more powerful AIs will surely be strictly better at understanding what humans intend.

jiao-bu on Dating Roundup #3: Third Time’s the Charm

I am perfectly happy that the patriarchal roles are no longer shackling women. I would not like to roll back time, personally, on these matters. I hope my question doesn't come across this way -- it is just that I am confused about expectations.
