Hey, thanks for the reply. I think this is a very valuable response because there are certain things I would want to point out that I can now elucidate more clearly thanks to your pushback.
First, I don't suggest that if we all just laughed and went about our lives everything would be okay. Indeed, if I thought that our actions were counterproductive at best, I'd advocate for something more akin to "walking away" as in Valentine's exit. There is a lot of work to be done and (yes) very little time to do it.
Second, the pattern I am noticing is something more akin to Rhys Ward's point about AI personhood. AI is not some neutral fact of our future that will be born "as is" no matter how hard we try one way or another. In our search for control and mastery over AI, we risk creating the things we fear the most. We fear AIs that are autonomous, ruthless, and myopic, but in trying to make controlled systems that pursue goals reliably without developing ideas of their own, we end up creating autonomous, ruthless, and myopic systems. It's somewhat telling, for example, that AI safety really started to heat up when RL became a mainstream technique (raising fears about paperclip optimisers etc.), and yet the first alignment effort for LLMs (which were manifestly not goal-seeking or myopic) was to... add RL back to them, in the form of a value-agnostic technique (PPO/RLHF) that can be used to create anti-aligned agents just as easily as aligned ones. Rhys Ward similarly talks about how personhood may be less risky from an x-risk perspective but also makes alignment more ethically questionable. The "good" and the "bad" visions for AI in this community are entwined.
As a smaller point, OpenAI definitely started as a "build the good AI" startup when DeepMind started taking off. DeepMind also started as a startup, and Demis is very connected to the AI safety memeplex.
Finally, love as humans execute it is (in my mind) an imperfect instantiation of a higher idea. It is true that we don't practice true omnibenevolence or universal love, or even love ourselves in a meaningful way a lot of the time, but I treat it as a direction to aim for, one that inspires us to do what we find most beautiful and meaningful rather than what is most hateful and ugly.
P.S. sorry for not replying to all the other valuable comments in this section, I've been rather busy as of late, trying to do the things I preach etc.
Do not go gentle into that good night,
Old age should burn and rave at close of day;
Rage, rage against the dying of the light.
Though wise men at their end know dark is right,
Because their words had forked no lightning they
Do not go gentle into that good night.
Good men, the last wave by, crying how bright
Their frail deeds might have danced in a green bay,
Rage, rage against the dying of the light.
Wild men who caught and sang the sun in flight,
And learn, too late, they grieved it on its way,
Do not go gentle into that good night.
Grave men, near death, who see with blinding sight
Blind eyes could blaze like meteors and be gay,
Rage, rage against the dying of the light.
And you, my father, there on the sad height,
Curse, bless, me now with your fierce tears, I pray.
Do not go gentle into that good night.
Rage, rage against the dying of the light.
*Do not go gentle into that good night*, Dylan Thomas
I'm still fighting. I hope you can find the strength to, too.
In my book this counts as severely neglected and very tractable AI safety research. Sorry that I don't have more to add, but it felt important to point it out.
Even so, it seems obvious to me that addressing the mysterious issue of the accelerating drivers is the primary crux in this scenario.
Epistemic status: This is a work of satire. I mean it: it is a mean-spirited and unfair assessment of the situation. It is also how, some days, I sincerely feel.
A minivan is driving down a mountain road, headed towards a cliff's edge with no guardrails. The driver floors the accelerator.
Passenger 1: "Perhaps we should slow down somewhat."
Passengers 2, 3, 4: "Yeah, that seems sensible."
Driver: "No can do. We're about to be late to the wedding."
Passenger 2: "Since the driver won't slow down, I should work on building rocket boosters so that (when we inevitably go flying off the cliff edge) the van can fly us to the wedding instead."
Passenger 3: "That seems expensive."
Passenger 2: "No worries, I've hooked up some funding from Acceleration Capital. With a few hours of tinkering we should get it done."
Passenger 1: "Hey, doesn't Acceleration Capital just want vehicles to accelerate, without regard to safety?"
Passenger 2: "Sure, but we'll steer the funding such that the money goes to building safe and controllable rocket boosters."
The van doesn't slow down. The cliff looks closer now.
Passenger 3: [looking at what Passenger 2 is building] "Uh, haven't you just made a faster engine?"
Passenger 2: "Don't worry, the engine is part of the fundamental technical knowledge we'll need to build the rockets. Also, the grant I got was for building motors, so we kinda have to build one."
Driver: "Awesome, we're gonna get to the wedding even sooner!" [Grabs the engine and installs it. The van speeds up.]
Passenger 1: "We're even less safe now!"
Passenger 3: "I'm going to start thinking about ways to manipulate the laws of physics such that (when we inevitably go flying off the cliff edge) I can manage to land us safely in the ocean."
Passenger 4: "That seems theoretical and intractable. I'm going to study the engine to figure out just how it's accelerating at such a frightening rate. If we understand the inner workings of the engine, we should be able to build a better engine that is more responsive to steering, therefore saving us from the cliff."
Passenger 1: "Uh, good luck with that, I guess?"
Nothing changes. The cliff is looming.
Passenger 1: "We're gonna die if we don't stop accelerating!"
Passenger 2: "I'm gonna finish the rockets after a few more iterations of making engines. Promise."
Passenger 3: "I think I have a general theory of relativity as it relates to the van worked out..."
Passenger 4: "If we adjust the gear ratio... Maybe add a smart accelerometer?"
Driver: "Look, we can discuss the benefits and detriments of acceleration over hors d'oeuvres at the wedding, okay?"
This is imo quite epistemically important.
It's definitely something I hadn't read before, so thank you. I would say to that article (on a skim) that it has clarified my thinking somewhat. I therefore question the law/toolbox dichotomy, since to me it seems that usefulness and accuracy-to-perceived-reality are in fact two different axes. Thus you could imagine:
1. A useful-but-inaccurate belief (e.g. what we call old wives' tales, "red sky in morning, sailors take warning", herbal remedies that have medical properties but not because of what the "theory" dictates)
2. A not-useful-but-accurate belief (when I pitch this baseball, the velocity is dependent on the space-time distortion created by earth's gravity well)
3. A not-useful-and-not-accurate belief (bloodletting as a medical "treatment")
4. And finally a useful-and-accurate belief (when I set up GPS satellites I should take into account time dilation)
And, of course, all of these are context dependent (sometimes you may be thinking about baseballs going at lightspeed)! I guess then my position is refined into: "category 4 is great if we can get it, but for most cases category 1 is probably easier/better", which seems neither pure toolbox nor pure law.
Hey, thanks for responding! Re the physics analogy, I agree that improvements in our heuristics are a good thing:
> However, perhaps you have already begun to anticipate what I will say—the benefit of heuristics is that they acknowledge (and are indeed dependent on) the presence of context. Unlike a “hard” theory, which must be applicable to all cases equally and fails in the event a single counter-example can be found, a “soft” heuristic is triggered only when the conditions are right: we do not use our “judge popular songs” heuristic when staring at a dinner menu.

> It is precisely this contextual awareness that allows heuristics to evade the problems of naive probabilistic world-modelling, which lead to such inductive conclusions as the Turkey Illusion. This means that we avoid the pitfalls of treating spaghetti like a Taylor Swift song, and it also means (slightly more seriously) that we do not treat discussions with our parents like bargaining games to extract maximum expected value. Engineers and physicists employ Newton’s laws of motion not because they are universal laws, but because they are useful heuristics about how things move in our daily lives (i.e. when they are not moving at near light speed). Heuristics are what Chris Haufe called “techniques” in the last section: what we worry about is not their truthfulness, but their usefulness.
However, I disagree in that I don't think we're really moving towards some endpoint where "the underlying reality will end up agreeing with this model in many places while substantially improving our understanding in many others". This is both because of the chaotic nature of the universe (which I strongly believe puts an upper bound on how well we can model systems without just doing atom-by-atom simulation to arbitrary precision) and because that's not how physics works in practice today. We have a pretty strong model for how macroscale physics works (General Relativity), but we willingly "drop it" for less accurate heuristics like Newtonian mechanics when it's more convenient/useful. Similarly, even if we understand the fundamentals of neuroscience completely, we may "drop it" for more heuristics-driven approaches that are less absolutely accurate.
Because of this, I maintain my questioning of a general epistemic (and the attached instrumental) project for "rational living" etc. It seems to me a better model of how we deal with things is collecting tools for a toolbox, swapping them out as better ones come in, rather than moving towards some ideal perfect system of thinking. Perhaps that too is a form of rationalism, but at that point it's a pretty loose thing and most life philosophies can be called rationalisms of a sort...
(Note: On the other hand it seems pretty true that better heuristics are linked to better understandings of the world however they arise, so I remain strongly in support of the scientific community and the scientific endeavour. Maybe this is a self-contradiction!)
And as for the specific implications of "moral worth", here are a few:
- You take someone's opinions more seriously
- You treat them with more respect
- When you disagree, you take time to outline why and take time to pre-emptively "check yourself"
- When someone with higher moral worth is at risk, you treat it as a bigger problem than a random person on earth being at risk
Thank you for the feedback! I am of course happy for people to copy over the essay.
> Is this saying that human's goals and options (including options that come to mind) change depending on the environment, so rational choice theory doesn't apply?
More or less, yes, or at least that it becomes very hard to apply it in a way that isn't either highly subjective or essentially post-hoc arguing about what you ought to have done (hidden information/hindsight being 20/20)
> This is currently all I have time for; however, my current understanding is that there is a common interpretation of Yudkowsky's writings/The Sequences/LW/etc. that leads to an over-reliance on formal systems that will inevitably fail people. I think you had this interpretation (do correct me if I'm wrong!), and this is your "attempt to renegotiate rationalism".
I've definitely met people who take the more humble, heuristics-driven approach which I outline in the essay and still call themselves rationalists. On the other hand, I have also seen a whole lot of people take it as some kind of mystic formula to organise their lives around. I guess my general argument is that rationalism should not be constructed on top of such a formal basis (cf. the section about heuristics, not theories, in the essay) and then "watered down" to reintroduce ideas of humility or nuance or path-dependence. And in part 2 I argue that the core principles of rationalism as I see them (without the "watering down" of time and life experience) make it easy to fall down certain dangerous pathways.
Yeah, of course