Posts

Searching for Impossibility Results or No-Go Theorems for provable safety. 2024-09-27T20:12:25.515Z
SCC/LW meetup Houston @ Agora 2018-10-17T19:36:11.216Z

Comments

Comment by Maelstrom on What are the good rationality films? · 2024-11-20T22:16:09.971Z · LW · GW

Epistemic status: half joking, but also half serious.
Warning: I totally wrote this.

Practical Rationality in John Carpenter’s The Thing: A Case Study

John Carpenter's The Thing (1982) is a masterclass in practical rationality, a cornerstone of effective decision-making under uncertainty—a concept deeply valued by the LessWrong community. The film’s narrative hinges on a group of Antarctic researchers encountering a shape-shifting alien capable of perfectly imitating its hosts, forcing them to confront dire stakes with limited information. Their survival depends on their ability to reason under pressure, assess probabilities, and mitigate catastrophic risks, making the movie a compelling example of applied rationality.

Key Lessons in Practical Rationality:

  1. Updating Beliefs with New Evidence. The researchers continually revise their understanding of the alien's capabilities as they gather evidence. For instance, after witnessing the creature’s ability to assimilate and mimic hosts, they abandon naive assumptions of safety and recalibrate their strategies to account for this new information. This aligns with Bayesian reasoning: beliefs must be updated in light of new data to avoid catastrophic errors (a toy numerical sketch follows this list).
  2. Decision-Making Under Uncertainty. The characters face extreme uncertainty: anyone could be the alien, and any wrong move could result in annihilation. The iconic blood test scene exemplifies this. The test, devised by MacReady, is an ingenious use of falsifiability—leveraging empirical experimentation to distinguish humans from the alien. It demonstrates how rational agents use creativity and empirical tests to reduce uncertainty.
  3. Coordination in Adversarial Environments. Cooperation becomes both vital and precarious when trust erodes. The film explores how rational agents can attempt to align incentives despite an adversarial context. MacReady takes control of the group by establishing credible threats to enforce compliance (e.g., wielding a flamethrower) while demonstrating his willingness to follow the same rules he imposes.
  4. Mitigating Existential Risk. The characters recognize that the alien represents an existential risk—not just to them, but to humanity. Their decisions prioritize long-term outcomes over immediate survival. For example, the decision to destroy the base to prevent the alien’s escape reflects a commitment to the global utility function, even at the cost of personal survival.
  5. The Role of Psychological Factors in Rationality. The film does not shy away from the psychological toll of high-stakes reasoning under uncertainty. Fear, paranoia, and isolation challenge the researchers’ ability to think clearly. This resonates with real-world rationality, where emotional regulation is essential to avoid biases and maintain clarity in decision-making.
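
As a minimal sketch of the Bayesian point in (1), using the blood test from (2) as the evidence: every number below is invented for illustration and nothing beyond the test's premise comes from the film.

```python
# Toy Bayesian update: how confident should the crew be that a given
# crew member is the Thing after a negative blood test?
# All probabilities are made-up illustrative values.

prior_infected = 0.25          # prior: roughly 1 of 4 remaining suspects is the Thing
p_react_if_infected = 0.95     # assumed chance infected blood recoils from the hot wire
p_react_if_human = 0.01        # assumed false-positive rate of the improvised test

# Observation: the blood sample does NOT react.
p_no_react_given_infected = 1 - p_react_if_infected
p_no_react_given_human = 1 - p_react_if_human

# Bayes' rule: P(infected | no reaction)
numerator = p_no_react_given_infected * prior_infected
evidence = numerator + p_no_react_given_human * (1 - prior_infected)
posterior_infected = numerator / evidence

print(f"P(infected) before the test: {prior_infected:.2f}")
print(f"P(infected) after a negative test: {posterior_infected:.3f}")  # ~0.017
```

With these assumed numbers, one negative test drops the suspicion from 25% to under 2%, which is the whole point of item (1): the evidence, not the vibe, should move the belief.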
Comment by Maelstrom on Lighthaven Sequences Reading Group #12 (Tuesday 11/26) · 2024-11-20T06:45:48.320Z · LW · GW

John Carpenter's The Thing.

Comment by Maelstrom on Alexander Gietelink Oldenziel's Shortform · 2024-11-16T22:54:52.568Z · LW · GW

One need only read four or so papers on category theory applied to AI to see the problem: none of them share a common foundation for which constructions to use or formalize in category theory. The core issue is that category theory is a general language for all of mathematics, and as commonly used it just exponentially increases the search space for useful mathematical ideas.

I want to be wrong about this, but I have yet to find category theory uniquely useful outside of some subdomains of pure math.

Comment by Maelstrom on Gwern: Why So Few Matt Levines? · 2024-10-29T09:34:00.629Z · LW · GW

"That is, where are the Matt Levines of, say, chemistry or drug development1⁠,"
You are looking for Derek Lowe's "In the pipeline." It appears on hacker news occasionally.  

Comment by Maelstrom on Proveably Safe Self Driving Cars [Modulo Assumptions] · 2024-09-18T03:25:00.685Z · LW · GW

The crux of these types of arguments seems to be the conflation of an agent's provable safety within a system with an expectation of absolute safety. In my experience, this is the norm, not the exception, and it needs to be addressed explicitly.

In agreement with what you posted above, I think it is formally trivial to construct a scenario in which a pedestrian jumps in front of a car so close that, by nothing more than high school physics, it is provably impossible for the vehicle to stop in time to avoid a collision (a toy calculation is sketched below).
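A minimal sketch of that calculation; all numbers (speed, deceleration, gap to the pedestrian) are assumptions chosen for illustration, not anything from the post above.

```python
# Toy kinematics for the "pedestrian jumps out" scenario.
# Assumed illustrative values: urban speed, dry-road braking, small gap.

v = 13.9        # initial speed in m/s (~50 km/h)
a = 8.0         # maximum braking deceleration in m/s^2 (good tires, dry road)
reaction = 0.0  # assume a perfect controller with zero reaction time

gap = 5.0       # distance in metres to the pedestrian when they step out

# Stopping distance = reaction distance + braking distance (v^2 / 2a)
stopping_distance = v * reaction + v**2 / (2 * a)   # ~12.1 m

print(f"Stopping distance: {stopping_distance:.1f} m, available gap: {gap:.1f} m")
print("Collision unavoidable" if stopping_distance > gap else "Car can stop in time")
```

Even with an idealized zero-latency controller, the braking distance exceeds the gap, so no control policy can be "safe" in the absolute sense; any safety proof has to be relative to assumptions about the environment.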

Likewise, I have the intuition that AI safety, in general, should have various "no-go theorems" about unprovability outside a reasonable problem scope, or that finding such proofs would be NP-hard or worse. If you know of any specific results (outside of general computability theory), could you please share them? It would be nice if the community could avoid falling into the trap of trying to prove too much.


(Sorry if this isn't the correct location for this post.)