Posts

testingthewaters's Shortform 2025-02-10T02:06:40.503Z
A concise definition of what it means to win 2025-01-25T06:37:37.305Z
The Monster in Our Heads 2025-01-19T23:58:11.251Z
Some Comments on Recent AI Safety Developments 2024-11-09T16:44:58.936Z
Changing the Mind of an LLM 2024-10-11T22:25:37.464Z
The Existential Dread of Being a Powerful AI System 2024-09-26T10:56:32.904Z
Turning 22 in the Pre-Apocalypse 2024-08-22T20:28:25.794Z
How AI Fails Us: A non-technical view of the Alignment Problem 2022-11-18T19:02:42.056Z

Comments

Comment by testingthewaters on Make Superintelligence Loving · 2025-02-23T12:53:07.290Z · LW · GW

Yeah, that's basically the conclusion I came to a while ago. Either it loves us or we're toast. I call it universal love or pathos.

Comment by testingthewaters on If Neuroscientists Succeed · 2025-02-12T15:19:50.811Z · LW · GW

This seems like very important and neglected work; I hope you get the funds to continue.

Comment by testingthewaters on testingthewaters's Shortform · 2025-02-10T15:19:59.948Z · LW · GW

Yeah, definitely. My main gripe, where I see people disregarding unknown unknowns, is a similar one to yours: people who present definite, worked-out pictures of the future.

Comment by testingthewaters on testingthewaters's Shortform · 2025-02-10T02:06:40.501Z · LW · GW

Note to self: If you think you know where your unknown unknowns sit in your ontology, you don't. That's what makes them unknown unknowns.

If you think that you have a complete picture of some system, you can still find yourself surprised by unknown unknowns. That's what makes them unknown unknowns.

If your internal logic has almost complete predictive power, plus or minus a tiny bit of error, your logical system (but mostly not your observations) can still be completely overthrown by unknown unknowns. That's what makes them unknown unknowns.

You can respect unknown unknowns, but you can't plan around them. That's... You get it by now.

Therefore I respectfully submit that anyone who presents me with a foolproof and worked-out plan of the next ten/hundred/thousand/million years has failed to take into account some unknown unknowns.

Comment by testingthewaters on Shortform · 2025-02-08T00:54:09.187Z · LW · GW

The problem here is that you are dealing with survival necessities rather than trade goods. The outcome of this trade, if both sides honour the agreement, is that the scope insensitive humans die and their society is extinguished. The analogous situation here is that you know there will be a drought in, say, 10 years. The people of the nearby village are "scope insensitive": they don't know the drought is coming. Clearly the moral thing to do, if you place any value on their lives, is to talk to them, clear the information gap, and share access to resources. Failing that, you can prepare for the eventuality that they do realise the drought is happening and intervene to help them at that point.

Instead you propose exploiting their ignorance to buy up access to the local rivers and reservoirs. The implication here is that you are leaving them to die, or at least putting them at your mercy, by exploiting their lack of information. What's more, the process by which you do this turns a common good (the stars, the water) into a private good, such that when they realise the trouble they have no way out. If your plan succeeds, when their stars run out they will curse you and die in the dark. It is a very slow but calculated form of murder.

By the way, the easy resolution is to not buy up all the stars. If they're truly scope insensitive they won't be competing until after the singularity/uplift anyways, and then you can equitably distribute the damn resources.

As a side note: I think I fell for rage bait. This feels calculated to make me angry, and I don't like it.

Comment by testingthewaters on Shortform · 2025-02-07T18:26:11.609Z · LW · GW

Except that's a false dichotomy (between spending energy to "uplift" them and dealing treacherously with them). All it takes to not be a monster who obtains a stranglehold over all the watering holes in the desert is a sense of ethics that holds you to the reasonably low bar of "don't be a monster". The scope sensitivity or lack thereof of the other party is, in some sense, irrelevant.

Comment by testingthewaters on Shortform · 2025-02-07T15:12:34.731Z · LW · GW

The question as stated can be rephrased as "Should EAs establish a strategic stranglehold over all future resources necessary to sustain life using a series of unequal treaties, since other humans will be too short-sighted/insensitive to scope/ignorant to realise the importance of these resources in the present day?"

And people here wonder why these other humans see EAs as power hungry.

Comment by testingthewaters on The Monster in Our Heads · 2025-01-27T00:15:27.359Z · LW · GW

Hey, thanks for the reply. I think this is a very valuable response because there are certain things I would want to point out that I can now elucidate more clearly thanks to your pushback.

First, I don't suggest that if we all just laughed and went about our lives everything would be okay. Indeed, if I thought that our actions were counterproductive at best, I'd advocate for something more akin to "walking away" as in Valentine's exit. There is a lot of work to be done and (yes) very little time to do it.

Second, the pattern I am noticing is something more akin to Rhys Ward's point about AI personhood. AI is not some neutral fact of our future that will be born "as is" no matter how hard we try one way or another. In our search for control and mastery over AI, we risk creating the things we fear the most. We fear AIs that are autonomous, ruthless, and myopic, but in trying to make controlled systems that pursue goals reliably without developing ideas of their own we end up creating autonomous, ruthless, and myopic systems. It's somewhat telling, for example, that AI safety really started to heat up when RL became a mainstream technique (raising fears about paperclip optimisers etc.), and yet the first alignment efforts for LLMs (which were manifestly not goal-seeking or myopic) were to... add RL back to them, in the form of a value-agnostic technique (PPO/RLHF) that can be used to create anti-aligned agents just as easily as it can be used to create aligned agents. Rhys Ward similarly talks about how personhood may be less risky from an x-risk perspective but also makes alignment more ethically questionable. The "good" and the "bad" visions for AI in this community are entwined.

As a smaller point, OpenAI definitely started as a "build the good AI" startup when DeepMind started taking off. DeepMind also started as a startup and Demis is very connected to the AI safety memeplex.

Finally, love as humans execute it is (in my mind) an imperfect instantiation of a higher idea. It is true, we don't practice true omnibenevolence or universal love, or even love ourselves in a meaningful way a lot of the time, but I treat it as a direction to aim for, one that inspires us to do what we find most beautiful and meaningful rather than do what is most hateful and ugly.

P.S. Sorry for not replying to all the other valuable comments in this section; I've been rather busy as of late, trying to do the things I preach, etc.

Comment by testingthewaters on Benito's Shortform Feed · 2025-01-25T06:14:45.087Z · LW · GW

Do not go gentle into that good night,

Old age should burn and rave at close of day;

Rage, rage against the dying of the light.

Though wise men at their end know dark is right,

Because their words had forked no lightning they

Do not go gentle into that good night.

Good men, the last wave by, crying how bright

Their frail deeds might have danced in a green bay,

Rage, rage against the dying of the light.

Wild men who caught and sang the sun in flight,

And learn, too late, they grieved it on its way,

Do not go gentle into that good night.

Grave men, near death, who see with blinding sight

Blind eyes could blaze like meteors and be gay,

Rage, rage against the dying of the light.

And you, my father, there on the sad height,

Curse, bless, me now with your fierce tears, I pray.

Do not go gentle into that good night.

Rage, rage against the dying of the light.

(Do not go gentle into that good night, Dylan Thomas)

I'm still fighting. I hope you can find the strength to, too.

Comment by testingthewaters on Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well) · 2025-01-14T07:54:49.391Z · LW · GW

In my book this counts as severely neglected and very tractable AI safety research. Sorry that I don't have more to add, but it felt important to point it out.

Comment by testingthewaters on The Field of AI Alignment: A Postmortem, and What To Do About It · 2024-12-29T08:10:14.015Z · LW · GW

Even so, it seems obvious to me that addressing the mysterious issue of the accelerating drivers is the primary crux in this scenario.

Comment by testingthewaters on The Field of AI Alignment: A Postmortem, and What To Do About It · 2024-12-26T21:24:24.025Z · LW · GW

Epistemic status: This is a work of satire. I mean it: it is a mean-spirited and unfair assessment of the situation. It is also how, some days, I sincerely feel.

A minivan is driving down a mountain road, headed towards a cliff's edge with no guardrails. The driver floors the accelerator.

Passenger 1: "Perhaps we should slow down somewhat."

Passengers 2, 3, 4: "Yeah, that seems sensible."

Driver: "No can do. We're about to be late to the wedding."

Passenger 2: "Since the driver won't slow down, I should work on building rocket boosters so that (when we inevitably go flying off the cliff edge) the van can fly us to the wedding instead."

Passenger 3: "That seems expensive."

Passenger 2: "No worries, I've hooked up some funding from Acceleration Capital. With a few hours of tinkering we should get it done."

Passenger 1: "Hey, doesn't Acceleration Capital just want vehicles to accelerate, without regard to safety?"

Passenger 2: "Sure, but we'll steer the funding such that the money goes to building safe and controllable rocket boosters."

The van doesn't slow down. The cliff looks closer now.

Passenger 3: [looking at what Passenger 2 is building] "Uh, haven't you just made a faster engine?"

Passenger 2: "Don't worry, the engine is part of the fundamental technical knowledge we'll need to build the rockets. Also, the grant I got was for building motors, so we kinda have to build one."

Driver: "Awesome, we're gonna get to the wedding even sooner!" [Grabs the engine and installs it. The van speeds up.]

Passenger 1: "We're even less safe now!"

Passenger 3: "I'm going to start thinking about ways to manipulate the laws of physics such that (when we inevitably go flying off the cliff edge) I can manage to land us safely in the ocean."

Passenger 4: "That seems theoretical and intractable. I'm going to study the engine to figure out just how it's accelerating at such a frightening rate. If we understand the inner workings of the engine, we should be able to build a better engine that is more responsive to steering, therefore saving us from the cliff."

Passenger 1: "Uh, good luck with that, I guess?"

Nothing changes. The cliff is looming.

Passenger 1: "We're gonna die if we don't stop accelerating!"

Passenger 2: "I'm gonna finish the rockets after a few more iterations of making engines. Promise."

Passenger 3: "I think I have a general theory of relativity as it relates to the van worked out..."

Passenger 4: "If we adjust the gear ratio... Maybe add a smart accelerometer?"

Driver: "Look, we can discuss the benefits and detriments of acceleration over hors d'oeuvres at the wedding, okay?"

Comment by testingthewaters on The o1 System Card Is Not About o1 · 2024-12-14T00:21:59.653Z · LW · GW

This is imo quite epistemically important.

Comment by testingthewaters on Turning 22 in the Pre-Apocalypse · 2024-08-24T10:40:52.860Z · LW · GW

It's definitely something I hadn't read before, so thank you. I would say, in response to that article (on a skim), that it has clarified my thinking somewhat. I therefore question the law/toolbox dichotomy, since to me it seems that usefulness and accuracy-to-perceived-reality are in fact two different axes. Thus you could imagine:

  • A useful-and-inaccurate belief (e.g. what we call old wives' tales, "red sky in morning, sailors take warning", herbal remedies that have medical properties but not because of what the "theory" dictates)
  • A not-useful-but-accurate belief (when I pitch this baseball, the velocity is dependent on the space-time distortion created by Earth's gravity well)
  • A not-useful-and-not-accurate belief (bloodletting as a medical "treatment")
  • And finally a useful-and-accurate belief (when I set up GPS satellites I should take into account time dilation)

And, of course, all of these are context dependent (sometimes you may be thinking about baseballs going at lightspeed)! I guess then my position is refined into: "category 4 is great if we can get it, but for most cases category 1 is probably easier/better", which seems neither pure toolbox nor pure law.

Comment by testingthewaters on Turning 22 in the Pre-Apocalypse · 2024-08-23T23:19:40.887Z · LW · GW

Hey, thanks for responding! Re the physics analogy, I agree that improvements in our heuristics are a good thing:

However, perhaps you have already begun to anticipate what I will say—the benefit of heuristics is that they acknowledge (and are indeed dependent on) the presence of context. Unlike a “hard” theory, which must be applicable to all cases equally and fails in the event a single counter-example can be found, a “soft” heuristic is triggered only when the conditions are right: we do not use our “judge popular songs” heuristic when staring at a dinner menu.

It is precisely this contextual awareness that allows heuristics to evade the problems of naive probabilistic world-modelling, which leads to such inductive conclusions as the Turkey Illusion. This means that we avoid the pitfalls of treating spaghetti like a Taylor Swift song, and it also means (slightly more seriously) that we do not treat discussions with our parents like bargaining games to extract maximum expected value. Engineers and physicists employ Newton’s laws of motion not because they are universal laws, but because they are useful heuristics about how things move in our daily lives (i.e. when they are not moving at near light speed). Heuristics are what Chris Haufe called “techniques” in the last section: what we worry about is not their truthfulness, but their usefulness.

However, I disagree in that I don't think we're really moving towards some endpoint of "the underlying reality will end up agreeing with this model in many places while substantially improving our understanding in many others". This is both because of the chaotic nature of the universe (which I strongly believe puts an upper bound on how well we can model systems without just doing atom-by-atom simulation to arbitrary precision) and because that's not how physics works in practice today. We have a pretty strong model for how macroscale physics works (General Relativity), but we willingly "drop it" for less accurate heuristics like Newtonian mechanics when it's more convenient/useful. Similarly, even if we understand the fundamentals of neuroscience completely, we may "drop it" for more heuristics-driven approaches that are less absolutely accurate.

Because of this, I maintain my questioning of a general epistemic (and the attached instrumental) project for "rational living" etc. It seems to me a better model of how we deal with things is like collecting tools for a toolbox, swapping them out for better ones as better ones come in, rather than moving towards some ideal perfect system of thinking. Perhaps that too is a form of rationalism, but at that point it's a pretty loose thing and most life philosophies can be called rationalisms of a sort...

(Note: On the other hand, it seems pretty true that better heuristics are linked to better understandings of the world, however they arise, so I remain strongly in support of the scientific community and the scientific endeavour. Maybe this is a self-contradiction!)

Comment by testingthewaters on Turning 22 in the Pre-Apocalypse · 2024-08-23T23:10:12.597Z · LW · GW

And as for the specific implications of "moral worth", here are a few:

  • You take someone's opinions more seriously
  • You treat them with more respect
  • When you disagree, you take time to outline why and take time to pre-emptively "check yourself"
  • When someone with higher moral worth is at risk, you think this is a bigger problem, compared with the problem of a random person on Earth being at risk

Comment by testingthewaters on Turning 22 in the Pre-Apocalypse · 2024-08-23T11:22:49.682Z · LW · GW

Thank you for the feedback! I am of course happy for people to copy over the essay.

> Is this saying that human's goals and options (including options that come to mind) change depending on the environment, so rational choice theory doesn't apply?

More or less, yes, or at least that it becomes very hard to apply it in a way that isn't either highly subjective or essentially post-hoc arguing about what you ought to have done (hidden information/hindsight being 20/20).

> This is currently all I have time for; however, my current understanding is that there is a common interpretation of Yudkowsky's writings/The Sequences/LW/etc that leads to an over-reliance on formal systems that will inevitably fail people. I think you had this interpretation (do correct me if I'm wrong!), and this is your "attempt to renegotiate rationalism".

I've definitely met people who take the more humble, heuristics-driven approach which I outline in the essay and still call themselves rationalists. On the other hand, I have also seen a whole lot of people take it as some kind of mystic formula to organise their lives around. I guess my general argument is that rationalism should not be constructed on top of such a formal basis (cf. the section about heuristics not theories in the essay) and then "watered down" to reintroduce ideas of humility or nuance or path-dependence. And in part 2 I argue that the core principles of rationalism as I see them (without the "watering down" of time and life experience) make it easy to fall down certain dangerous pathways.

Comment by testingthewaters on Turning 22 in the Pre-Apocalypse · 2024-08-23T11:19:21.798Z · LW · GW

Yeah, of course