Mikhail Samin's Shortform
post by Mikhail Samin (mikhail-samin) · 2023-02-07T15:30:24.006Z · LW · GW · 2 comments
2 comments
Comments sorted by top scores.
comment by Mikhail Samin (mikhail-samin) · 2024-03-28T22:06:53.516Z · LW(p) · GW(p)
People are arguing about the answer to the Sleeping Beauty problem! I thought this was pretty much dissolved with this post's title [LW · GW]! But there are lengthy posts [LW · GW] and even a prediction market!
(Sleeping Beauty is an edge case where different reward structures are intuitively possible, and so people imagine different game payout structures behind the definition of “probability”. Once the payout structure is fixed, the confusion is gone. With a fixed payout structure and preference framework rewarding the number you output as “probability”, people don’t disagree about what the best number to output is. Sleeping Beauty is about definitions.)
And still, I see posts arguing that if a tree falls on a deaf Sleeping Beauty, in a forest with no one to hear it, it surely doesn’t produce a sound, because here’s how humans perceive sounds, which is the definition of a sound, and there are demonstrably no humans around the tree. (Or maybe that it surely produces the sound because here’s the physics of the sound waves, and the tree surely abides by the laws of physics, and there are demonstrably sound waves.)
This is arguing about definitions. You feel strongly that “probability” is that thing that triggers the “probability” concept neuron in your brain. If people have a different concept triggering “this is probability”, you feel like they must be wrong, because they’re pointing at something they say is a sound and you say isn’t.
Probability is something defined in math by necessity. There’s only one way to do it that doesn’t get exploited in the natural betting schemes/reward structures that everyone accepts when there are no anthropics involved. But if there are multiple copies of the agent, there’s no longer a single possible betting scheme defining a single possible “probability”, and people draw the boundary/generalise differently in this situation.
You all should just call these two probabilities two different words instead of arguing which one is the correct definition for "probability".
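To make the "fix the payout structure and the disagreement goes away" point concrete, here's a minimal sketch. Everything specific in it is my assumption for illustration rather than something taken from the linked posts: the standard setup (fair coin, one awakening on heads, two on tails), a quadratic (Brier) penalty on the reported number, and the hypothetical function names `expected_loss` and `best_report`. The only thing that changes between the two calls is whether the penalty is charged once per awakening or once per experiment.

```python
from fractions import Fraction


def expected_loss(p, heads_wakings=1, tails_wakings=2, per_awakening=True):
    """Expected quadratic (Brier) loss for reporting p as P(heads).

    Assumed setup: fair coin; Beauty reports the same p at every awakening.
    per_awakening=True charges the penalty at each awakening;
    per_awakening=False charges it once per run of the experiment.
    """
    coin = Fraction(1, 2)  # fair coin
    n_heads = heads_wakings if per_awakening else 1
    n_tails = tails_wakings if per_awakening else 1
    # On heads the penalty is (1 - p)^2, on tails it is p^2.
    return coin * n_heads * (1 - p) ** 2 + coin * n_tails * p ** 2


def best_report(heads_wakings=1, tails_wakings=2, per_awakening=True):
    """Report that minimizes expected_loss.

    The loss is w_heads*(1-p)^2 + w_tails*p^2, whose derivative is zero
    at p = w_heads / (w_heads + w_tails).
    """
    coin = Fraction(1, 2)
    w_heads = coin * (heads_wakings if per_awakening else 1)
    w_tails = coin * (tails_wakings if per_awakening else 1)
    return w_heads / (w_heads + w_tails)


print(best_report(per_awakening=True))   # 1/3
print(best_report(per_awakening=False))  # 1/2
```

Under these assumptions, nobody disagrees about which number to output in either scheme; the "thirder" and "halfer" answers are just the optima of two different scoring setups.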
comment by Mikhail Samin (mikhail-samin) · 2023-02-07T15:30:24.267Z · LW(p) · GW(p)
[RETRACTED after Scott Aaronson’s reply by email]
I'm surprised by Scott Aaronson's approach to alignment. He has mentioned in a talk that a research field needs to have at least one of two: experiments or a rigorous mathematical theory, and so he's focusing on the experiments that are possible to do with the current AI systems.
The alignment problem is centered around optimization producing powerful consequentialist agents, which appear when you search in spaces that contain capable agents. The dynamics at the level of superhuman general agents are not something you get to experiment with (more than once); and we do indeed need a rigorous mathematical theory that would describe the space and point at parts of it that are agents aligned with us.
[removed]
I'm disappointed that, currently, only Infra-Bayesianism tries to achieve that[1], that I don't see dozens of other research directions trying to have a rigorous mathematical theory that would provide desiderata for AGI training setups, and that even actual scientists entering the field [removed].
[1] Infra-Bayesianism is an approach that tries to describe agents in a way that would closely resemble the behaviour of AGIs. It starts with a way to model them having probabilities about the world in a computable way that solves non-realizability in RL (short explanation [LW · GW], a sequence with equations and proofs [? · GW]) and making decisions in a way that optimization processes would select for, and continues with a formal theory of naturalized induction [LW · GW] and, finally, a proposal for an alignment protocol [LW · GW].
To be clear, I don't expect Infra-Bayesianism to produce an answer to what loss functions should be used to train an aligned AGI in the time that we have remaining; but I'd expect that if there were a hundred research directions like that, trying to come up with a rigorous mathematical theory that successfully attacks the problem, with thousands of people working on them, some would succeed.