Posts

testingthewaters's Shortform 2025-02-10T02:06:40.503Z
A concise definition of what it means to win 2025-01-25T06:37:37.305Z
The Monster in Our Heads 2025-01-19T23:58:11.251Z
Some Comments on Recent AI Safety Developments 2024-11-09T16:44:58.936Z
Changing the Mind of an LLM 2024-10-11T22:25:37.464Z
The Existential Dread of Being a Powerful AI System 2024-09-26T10:56:32.904Z
Turning 22 in the Pre-Apocalypse 2024-08-22T20:28:25.794Z
How AI Fails Us: A non-technical view of the Alignment Problem 2022-11-18T19:02:42.056Z

Comments

Comment by testingthewaters on Make Superintelligence Loving · 2025-02-23T12:53:07.290Z · LW · GW

Yeah, that's basically the conclusion I came to a while ago. Either it loves us or we're toast. I call it universal love or pathos.

Comment by testingthewaters on If Neuroscientists Succeed · 2025-02-12T15:19:50.811Z · LW · GW

This seems like very important and neglected work; I hope you get the funds to continue.

Comment by testingthewaters on testingthewaters's Shortform · 2025-02-10T15:19:59.948Z · LW · GW

Yeah, definitely. My main gripe, where I see people disregarding unknown unknowns, is a similar one to yours: people who present definite, worked-out pictures of the future.

Comment by testingthewaters on testingthewaters's Shortform · 2025-02-10T02:06:40.501Z · LW · GW

Note to self: If you think you know where your unknown unknowns sit in your ontology, you don't. That's what makes them unknown unknowns.

If you think that you have a complete picture of some system, you can still find yourself surprised by unknown unknowns. That's what makes them unknown unknowns.

If your internal logic has almost complete predictive power, plus or minus a tiny bit of error, your logical system (but mostly not your observations) can still be completely overthrown by unknown unknowns. That's what makes them unknown unknowns.

You can respect unknown unknowns, but you can't plan around them. That's... You get it by now.

Therefore I respectfully submit that anyone who presents me with a foolproof and worked-out plan of the next ten/hundred/thousand/million years has failed to take into account some unknown unknowns.

Comment by testingthewaters on Shortform · 2025-02-08T00:54:09.187Z · LW · GW

The problem here is that you are dealing with survival necessities rather than trade goods. The outcome of this trade, if both sides honour the agreement, is that the scope insensitive humans die and their society is extinguished. The analogous situation here is that you know there will be a drought in, say, 10 years. The people of the nearby village are "scope insensitive": they don't know the drought is coming. Clearly the moral thing to do, if you place any value on their lives, is to talk to them, clear the information gap, and share access to resources. Failing that, you can prepare for the eventuality that they do realise the drought is happening and intervene to help them at that point.

Instead you propose exploiting their ignorance to buy up access to the local rivers and reservoirs. The implication here is that you are leaving them to die, or at least putting them at your mercy, by exploiting their lack of information. What's more, the process by which you do this turns a common good (the stars, the water) into a private good, such that when they realise the trouble they have no way out. If your plan succeeds, when their stars run out they will curse you and die in the dark. It is a very slow but calculated form of murder.

By the way, the easy resolution is to not buy up all the stars. If they're truly scope insensitive they won't be competing until after the singularity/uplift anyways, and then you can equitably distribute the damn resources.

As a side note: I think I fell for rage bait. This feels calculated to make me angry, and I don't like it.

Comment by testingthewaters on Shortform · 2025-02-07T18:26:11.609Z · LW · GW

Except that's a false dichotomy (between spending energy to "uplift" them and dealing treacherously with them). All it takes to not be a monster who obtains a stranglehold over all the watering holes in the desert is a sense of ethics that holds you to the reasonably low bar of "don't be a monster". The scope sensitivity or lack thereof of the other party is, in some sense, irrelevant.

Comment by testingthewaters on Shortform · 2025-02-07T15:12:34.731Z · LW · GW

The question as stated can be rephrased as "Should EAs establish a strategic stranglehold over all future resources necessary to sustain life using a series of unequal treaties, since other humans will be too short-sighted/insensitive to scope/ignorant to realise the importance of these resources in the present day?"

And people here wonder why these other humans see EAs as power hungry.

Comment by testingthewaters on The Monster in Our Heads · 2025-01-27T00:15:27.359Z · LW · GW

Hey, thanks for the reply. I think this is a very valuable response because there are certain things I would want to point out that I can now elucidate more clearly thanks to your pushback.

First, I don't suggest that if we all just laughed and went about our lives everything would be okay. Indeed, if I thought that our actions were counterproductive at best, I'd advocate for something more akin to "walking away" as in Valentine's exit. There is a lot of work to be done and (yes) very little time to do it.

Second, the pattern I am noticing is something more akin to Rhys Ward's point about AI personhood. AI is not some neutral fact of our future that will be born "as is" no matter how hard we try one way or another. In our search for control and mastery over AI, we risk creating the things we fear the most. We fear AIs that are autonomous, ruthless, and myopic, but in trying to make controlled systems that pursue goals reliably without developing ideas of their own we end up creating autonomous, ruthless, and myopic systems. It's somewhat telling, for example, that AI safety really started to heat up when RL became a mainstream technique (raising fears about paperclip optimisers etc.), and yet the first alignment efforts for LLMs (which were manifestly not goal-seeking or myopic) were to... add RL back to them, in the form of a value-agnostic technique (PPO/RLHF) that can be used to create anti-aligned agents just as easily as it can be used to create aligned agents. Rhys Ward similarly talks about how personhood may be less risky from an x-risk perspective but also makes alignment more ethically questionable. The "good" and the "bad" visions for AI in this community are entwined.

As a smaller point, OpenAI definitely started as a "build the good AI" startup when DeepMind started taking off. DeepMind also started as a startup and Demis is very connected to the AI safety memeplex.

Finally, love as humans execute it is (in my mind) an imperfect instantiation of a higher idea. It is true, we don't practice true omnibenevolence or universal love, or even love ourselves in a meaningful way a lot of the time, but I treat it as a direction to aim for, one that inspires us to do what we find most beautiful and meaningful rather than do what is most hateful and ugly.

P.S. Sorry for not replying to all the other valuable comments in this section; I've been rather busy as of late, trying to do the things I preach, etc.

Comment by testingthewaters on Benito's Shortform Feed · 2025-01-25T06:14:45.087Z · LW · GW

Do not go gentle into that good night,

Old age should burn and rave at close of day;

Rage, rage against the dying of the light.

Though wise men at their end know dark is right,

Because their words had forked no lightning they

Do not go gentle into that good night.

Good men, the last wave by, crying how bright

Their frail deeds might have danced in a green bay,

Rage, rage against the dying of the light.

Wild men who caught and sang the sun in flight,

And learn, too late, they grieved it on its way,

Do not go gentle into that good night.

Grave men, near death, who see with blinding sight

Blind eyes could blaze like meteors and be gay,

Rage, rage against the dying of the light.

And you, my father, there on the sad height,

Curse, bless, me now with your fierce tears, I pray.

Do not go gentle into that good night.

Rage, rage against the dying of the light.

(Do not go gentle into that good night, Dylan Thomas)

I'm still fighting. I hope you can find the strength to, too.

Comment by testingthewaters on Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well) · 2025-01-14T07:54:49.391Z · LW · GW

In my book this counts as severely neglected and very tractable AI safety research. Sorry that I don't have more to add, but it felt important to point it out.

Comment by testingthewaters on The Field of AI Alignment: A Postmortem, and What To Do About It · 2024-12-29T08:10:14.015Z · LW · GW

Even so, it seems obvious to me that addressing the mysterious issue of the accelerating drivers is the primary crux in this scenario.

Comment by testingthewaters on The Field of AI Alignment: A Postmortem, and What To Do About It · 2024-12-26T21:24:24.025Z · LW · GW

Epistemic status: This is a work of satire. I mean it: it is a mean-spirited and unfair assessment of the situation. It is also how, some days, I sincerely feel.

A minivan is driving down a mountain road, headed towards a cliff's edge with no guardrails. The driver floors the accelerator.

Passenger 1: "Perhaps we should slow down somewhat."

Passengers 2, 3, 4: "Yeah, that seems sensible."

Driver: "No can do. We're about to be late to the wedding."

Passenger 2: "Since the driver won't slow down, I should work on building rocket boosters so that (when we inevitably go flying off the cliff edge) the van can fly us to the wedding instead."

Passenger 3: "That seems expensive."

Passenger 2: "No worries, I've hooked up some funding from Acceleration Capital. With a few hours of tinkering we should get it done."

Passenger 1: "Hey, doesn't Acceleration Capital just want vehicles to accelerate, without regard to safety?"

Passenger 2: "Sure, but we'll steer the funding such that the money goes to building safe and controllable rocket boosters."

The van doesn't slow down. The cliff looks closer now.

Passenger 3: [looking at what Passenger 2 is building] "Uh, haven't you just made a faster engine?"

Passenger 2: "Don't worry, the engine is part of the fundamental technical knowledge we'll need to build the rockets. Also, the grant I got was for building motors, so we kinda have to build one."

Driver: "Awesome, we're gonna get to the wedding even sooner!" [Grabs the engine and installs it. The van speeds up.]

Passenger 1: "We're even less safe now!"

Passenger 3: "I'm going to start thinking about ways to manipulate the laws of physics such that (when we inevitably go flying off the cliff edge) I can manage to land us safely in the ocean."

Passenger 4: "That seems theoretical and intractable. I'm going to study the engine to figure out just how it's accelerating at such a frightening rate. If we understand the inner workings of the engine, we should be able to build a better engine that is more responsive to steering, therefore saving us from the cliff."

Passenger 1: "Uh, good luck with that, I guess?"

Nothing changes. The cliff is looming.

Passenger 1: "We're gonna die if we don't stop accelerating!"

Passenger 2: "I'm gonna finish the rockets after a few more iterations of making engines. Promise."

Passenger 3: "I think I have a general theory of relativity as it relates to the van worked out..."

Passenger 4: "If we adjust the gear ratio... Maybe add a smart accelerometer?"

Driver: "Look, we can discuss the benefits and detriments of acceleration over hors d'oeuvres at the wedding, okay?"

Comment by testingthewaters on The o1 System Card Is Not About o1 · 2024-12-14T00:21:59.653Z · LW · GW

This is imo quite epistemically important.

Comment by testingthewaters on Turning 22 in the Pre-Apocalypse · 2024-08-24T10:40:52.860Z · LW · GW

It's definitely something I hadn't read before, so thank you. I would say, in response to that article (on a skim), that it has clarified my thinking somewhat. I therefore question the law/toolbox dichotomy, since to me it seems that usefulness and accuracy-to-perceived-reality are in fact two different axes. Thus you could imagine:

  • A useful-and-inaccurate belief (e.g. what we call old wives' tales, "red sky in morning, sailors take warning", herbal remedies that have medical properties but not because of what the "theory" dictates)
  • A not-useful-but-accurate belief (when I pitch this baseball, the velocity is dependent on the space-time distortion created by Earth's gravity well)
  • A not-useful-and-not-accurate belief (bloodletting as a medical "treatment")
  • And finally a useful-and-accurate belief (when I set up GPS satellites I should take into account time dilation)

And, of course, all of these are context dependent (sometimes you may be thinking about baseballs going at lightspeed)! I guess then my position is refined into: "category 4 is great if we can get it, but for most cases category 1 is probably easier/better", which seems neither pure toolbox nor pure law.

Comment by testingthewaters on Turning 22 in the Pre-Apocalypse · 2024-08-23T23:19:40.887Z · LW · GW

Hey, thanks for responding! Re the physics analogy, I agree that improvements in our heuristics are a good thing:

However, perhaps you have already begun to anticipate what I will say—the benefit of heuristics is that they acknowledge (and are indeed dependent on) the presence of context. Unlike a “hard” theory, which must be applicable to all cases equally and fails in the event a single counter-example can be found, a “soft” heuristic is triggered only when the conditions are right: we do not use our “judge popular songs” heuristic when staring at a dinner menu.

It is precisely this contextual awareness that allows heuristics to evade the problems of naive probabilistic world-modelling, which leads to such inductive conclusions as the Turkey Illusion. This means that we avoid the pitfalls of treating spaghetti like a Taylor Swift song, and it also means (slightly more seriously) that we do not treat discussions with our parents like bargaining games to extract maximum expected value. Engineers and physicists employ Newton’s laws of motion not because they are universal laws, but because they are useful heuristics about how things move in our daily lives (i.e. when they are not moving at near light speed). Heuristics are what Chris Haufe called “techniques” in the last section: what we worry about is not their truthfulness, but their usefulness.

However, I disagree in that I don't think we're really moving towards some endpoint of "the underlying reality will end up agreeing with this model in many places while substantially improving our understanding in many others". This is both because of the chaotic nature of the universe (which I strongly believe puts an upper bound on how well we can model systems without just doing atom-by-atom simulation to arbitrary precision) and because that's not how physics works in practice today. We have a pretty strong model for how macroscale physics works (General Relativity), but we willingly "drop it" for less accurate heuristics like Newtonian mechanics when it's more convenient/useful. Similarly, even if we understand the fundamentals of neuroscience completely, we may "drop it" for more heuristics-driven approaches that are less absolutely accurate.

Because of this, I maintain my questioning of a general epistemic (and the attached instrumental) project for "rational living" etc. It seems to me a better model of how we deal with things is like collecting tools for a toolbox, swapping them out for better ones as better ones come in, rather than moving towards some ideal perfect system of thinking. Perhaps that too is a form of rationalism, but at that point it's a pretty loose thing and most life philosophies can be called rationalisms of a sort...

(Note: On the other hand, it seems pretty true that better heuristics are linked to better understandings of the world, however they arise, so I remain strongly in support of the scientific community and the scientific endeavour. Maybe this is a self-contradiction!)

Comment by testingthewaters on Turning 22 in the Pre-Apocalypse · 2024-08-23T23:10:12.597Z · LW · GW

And as for the specific implications of "moral worth", here are a few:

  • You take someone's opinions more seriously
  • You treat them with more respect
  • When you disagree, you take time to outline why and take time to pre-emptively "check yourself"
  • When someone with higher moral worth is at risk, you think this is a bigger problem, compared with the problem of a random person on Earth being at risk

Comment by testingthewaters on Turning 22 in the Pre-Apocalypse · 2024-08-23T11:22:49.682Z · LW · GW

Thank you for the feedback! I am of course happy for people to copy over the essay.

> Is this saying that human's goals and options (including options that come to mind) change depending on the environment, so rational choice theory doesn't apply?

More or less, yes, or at least that it becomes very hard to apply it in a way that isn't either highly subjective or essentially post-hoc arguing about what you ought to have done (hidden information/hindsight being 20/20).

> This is currently all I have time for; however, my current understanding is that there is a common interpretation of Yudkowsky's writings/The Sequences/LW/etc that leads to an over-reliance on formal systems that will inevitably fail people. I think you had this interpretation (do correct me if I'm wrong!), and this is your "attempt to renegotiate rationalism".

I've definitely met people who take the more humble, heuristics-driven approach which I outline in the essay and still call themselves rationalists. On the other hand, I have also seen a whole lot of people take it as some kind of mystic formula to organise their lives around. I guess my general argument is that rationalism should not be constructed on top of such a formal basis (cf. the section about heuristics not theories in the essay) and then "watered down" to reintroduce ideas of humility or nuance or path-dependence. And in part 2 I argue that the core principles of rationalism as I see them (without the "watering down" of time and life experience) make it easy to fall down certain dangerous pathways.

Comment by testingthewaters on Turning 22 in the Pre-Apocalypse · 2024-08-23T11:19:21.798Z · LW · GW

Yeah, of course