Mikhail Samin's Shortform

post by Mikhail Samin (mikhail-samin) · 2023-02-07T15:30:24.006Z · LW · GW · 4 comments

4 comments

Comments sorted by top scores.

comment by Mikhail Samin (mikhail-samin) · 2025-02-01T18:49:13.641Z · LW(p) · GW(p)

Anthropic employees: stop deferring to Dario on politics. Think for yourself.

Do your company's actions actually make sense if it is optimizing for what you think it is optimizing for?

Anthropic lobbied against mandatory RSPs, against regulation, and, for the most part, didn't even support SB-1047. The difference between Jack Clark and OpenAI's lobbyists is that publicly, Jack Clark talks about alignment. But when they talk to government officials, there's little difference on the question of existential risk from smarter-than-human AI systems. They do not honestly tell the governments what the situation is like. Ask them yourself.

A while ago, OpenAI hired a lot of talent due to its nonprofit structure.

Anthropic is now doing the same. They publicly say the words that attract EAs and rats. But it's very unclear whether they institutionally care.

Dozens of people work at Anthropic on AI capabilities because they think it is net-positive to keep Anthropic at the frontier, even though they wouldn't work on capabilities at OAI or GDM.

It is not net-positive.

Anthropic is not our friend. Some people there do very useful work on AI safety (where "useful" mostly means "shows that the predictions of MIRI-style thinking are correct and we don't live in a world where alignment is easy", not "increases the chance of aligning superintelligence within a short timeframe"), but you should not work there on AI capabilities.

Anthropic's participation in the race makes everyone fall dead sooner and with a higher probability.

Work on alignment at Anthropic if you must. I don't have strong takes on that. But don't do work for them that advances AI capabilities.

Replies from: jbash
comment by jbash · 2025-02-01T21:44:29.772Z · LW(p) · GW(p)

But it's very unclear whether they institutionally care.

There are certain kinds of things that it's essentially impossible for any institution to effectively care about.

comment by Mikhail Samin (mikhail-samin) · 2024-03-28T22:06:53.516Z · LW(p) · GW(p)

People are arguing about the answer to the Sleeping Beauty problem! I thought this was pretty much dissolved with this post's title [LW · GW]! But there are lengthy posts [LW · GW] and even a prediction market!

Sleeping Beauty is an edge case where different reward structures are intuitively possible, and so people imagine different game payout structures behind the definition of “probability”. Once the payout structure is fixed, the confusion is gone. With a fixed payout structure & preference framework rewarding the number you output as “probability”, people don’t disagree about what the best number to output is. Sleeping Beauty is about definitions.

And still, I see posts arguing that if a tree falls on a deaf Sleeping Beauty, in a forest with no one to hear it, it surely doesn’t produce a sound, because here’s how humans perceive sounds, which is the definition of a sound, and there are demonstrably no humans around the tree. (Or maybe that it surely produces a sound, because here’s the physics of sound waves, and the tree surely abides by the laws of physics, and there are demonstrably sound waves.)

This is arguing about definitions. You feel strongly that “probability” is the thing that triggers the “probability” concept neuron in your brain. If a different concept triggers “this is probability” for other people, you feel like they must be wrong, because they’re pointing at something they call a sound and you say it isn’t one.

Probability is defined in math by necessity: there’s only one way to assign it that doesn’t get you exploited in the natural betting schemes/reward structures that everyone accepts when no anthropics are involved. But if there are multiple copies of the agent, there’s no longer a single possible betting scheme defining a single possible “probability”, and people draw the boundary/generalise differently in this situation.
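As a minimal sketch of this point (assuming the standard setup: a fair coin, one awakening on Heads, two on Tails, and a log-scoring reward for the credence Beauty reports on each awakening), a Monte Carlo sweep shows that the best number to report depends only on whether the reward is paid per awakening or once per experiment:

```python
import math
import random

def sleeping_beauty_scores(num_trials=100_000, seed=0):
    """Sweep candidate credences in Heads and score them with a log reward
    under two payout structures: paid on every awakening vs once per experiment."""
    rng = random.Random(seed)
    qs = [i / 100 for i in range(1, 100)]         # candidate credences in Heads
    log_heads = {q: math.log(q) for q in qs}      # log score of a report if the coin is Heads
    log_tails = {q: math.log(1 - q) for q in qs}  # log score of a report if the coin is Tails
    per_awakening = {q: 0.0 for q in qs}
    per_experiment = {q: 0.0 for q in qs}
    for _ in range(num_trials):
        heads = rng.random() < 0.5
        awakenings = 1 if heads else 2            # Heads: Monday only; Tails: Monday and Tuesday
        for q in qs:
            score = log_heads[q] if heads else log_tails[q]
            per_awakening[q] += awakenings * score   # every awakening is rewarded separately
            per_experiment[q] += score               # the whole experiment is rewarded once
    print("best credence in Heads, paid per awakening: ", max(qs, key=per_awakening.get))   # ~0.33
    print("best credence in Heads, paid per experiment:", max(qs, key=per_experiment.get))  # ~0.50

if __name__ == "__main__":
    sleeping_beauty_scores()
```

Paid per awakening, the best report is ~1/3 (the thirder answer); paid once per experiment, it is ~1/2 (the halfer answer). Fix the reward structure and the disagreement disappears.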

You all should just give these two probabilities two different names instead of arguing about which one is the correct definition of "probability".

comment by Mikhail Samin (mikhail-samin) · 2023-02-07T15:30:24.267Z · LW(p) · GW(p)

[RETRACTED after Scott Aaronson’s reply by email]

I'm surprised by Scott Aaronson's approach to alignment. He has mentioned in a talk that a research field needs to have at least one of two things: experiments or a rigorous mathematical theory, and so he's focusing on the experiments that can be done with current AI systems.

The alignment problem centers on optimization producing powerful consequentialist agents when you search in spaces that contain capable agents. The dynamics at the level of superhuman general agents are not something you get to experiment with (more than once); and we do indeed need a rigorous mathematical theory that describes this space and points at the parts of it that are agents aligned with us.

[removed]

I'm disappointed that, currently, only Infra-Bayesianism tries to achieve that[1], that I don't see dozens of other research directions trying to build a rigorous mathematical theory that would provide desiderata for AGI training setups, and that even actual scientists entering the field [removed].

  1. ^

    Infra-Bayesianism is an approach that tries to describe agents in a way that closely resembles the behaviour of AGIs: it starts with a computable way to model their probabilities about the world that solves non-realizability in RL (short explanation [LW · GW], a sequence with equations and proofs [? · GW]) and to model their decision-making in a way that optimization processes would select for, and continues with a formal theory of naturalized induction [LW · GW] and, finally, a proposal for an alignment protocol [LW · GW]. (A toy illustration of the non-realizability problem is sketched after this footnote.)

    To be clear, I don't expect Infra-Bayesianism to produce an answer to what loss functions should be used to train an aligned AGI in the time that we have remaining; but I'd expect that if there were a hundred research directions like that, trying to come up with a rigorous mathematical theory that successfully attacks the problem, with thousands of people working on them, some would succeed.
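A toy numerical illustration of the non-realizability problem mentioned in footnote 1 (a minimal sketch of the problem itself, not of Infra-Bayesianism's solution; the i.i.d.-Bernoulli hypothesis class and the alternating environment are assumptions chosen for illustration): a Bayesian agent whose entire hypothesis class is i.i.d. coin flips faces a deterministic alternating sequence. Because the true environment lies outside the class, the agent's predictions never beat chance, even though the sequence is trivially predictable.

```python
import math

def bernoulli_posterior_predictive(history, grid_size=101):
    """P(next bit = 1) for a Bayesian agent whose hypothesis class is
    i.i.d. Bernoulli(p), with a uniform prior over a grid of p values."""
    ones = sum(history)
    zeros = len(history) - ones
    num = den = 0.0
    for i in range(grid_size):
        p = i / (grid_size - 1)
        likelihood = (p ** ones) * ((1 - p) ** zeros)
        num += likelihood * p
        den += likelihood
    return num / den

def main(steps=200):
    # True environment: the deterministic alternating sequence 0, 1, 0, 1, ...
    # It lies outside the agent's i.i.d. hypothesis class, so no hypothesis is "true".
    history = []
    total_log_loss = 0.0
    for t in range(steps):
        bit = t % 2
        p_one = bernoulli_posterior_predictive(history)
        prob_of_truth = p_one if bit == 1 else 1 - p_one
        total_log_loss -= math.log2(prob_of_truth)
        history.append(bit)
    # Stays around 1 bit per symbol: the best hypothesis in the class (p = 0.5)
    # cannot exploit the obvious structure, although the sequence is trivially predictable.
    print(f"average log loss per bit: {total_log_loss / steps:.3f}")

if __name__ == "__main__":
    main()
```

Roughly, Infra-Bayesianism responds to this kind of misspecification by reasoning with convex sets of distributions and a worst-case decision rule rather than a single Bayesian posterior.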