Could LLMs Learn to Detect Bias Autonomously, Like Tesla’s Self-Driving Cars?

post by Omnipheasant · 2025-04-18T18:45:36.242Z · LW · GW · 0 comments

Contents

  Learning from the Messy World
  The Tesla FSD Analogy
  Technique 1: Verifying Predictions Over Time
  Technique 2: Comparative Analysis for Emotionally Charged Accusations
  Cognitive Bias as a Foundation
  A Vision for AI as a Compassionate Partner
  Summary: Toward a Less Biased AI

LLMs are shaped by human curators making decisions about what data is “ethical” or “safe.” This curation introduces biases, focusing on politically salient issues like fairness for marginalized groups while often overlooking subtler distortions, such as selective outrage or perfectionist comparisons. What if LLMs could judge input themselves, learning from the world’s complexity without human guardrails?

Inspired by Tesla’s Full Self-Driving (FSD) system, which learns from real-world driving data, including crashes, I propose that LLMs could use self-supervised learning to evaluate internet data autonomously. By verifying predictions over time and analyzing emotionally charged accusations against comparative data, LLMs could become more robust and less dependent on human curators. These methods are just two of many possible approaches; they illustrate the potential of learning from biased content rather than filtering it out. This post explores these ideas, aiming to redefine “bias” holistically and spark discussion on how AI can navigate human messiness.

Learning from the Messy World

The ideas here are proofs of concept, meant to show how self-supervised learning could help LLMs evaluate data autonomously. The two proposed methods—verifying predictions and analyzing accusations—are illustrative, not exhaustive. If such approaches are feasible, countless other techniques could emerge to teach LLMs to handle the internet’s complexity with minimal human intervention.

The Tesla FSD Analogy

Tesla’s FSD uses self-supervised learning to navigate roads. For example, it might predict an occluded stop sign from partial camera input, then confirm or correct itself when the sign becomes visible or sensors provide context. Crashes are analyzed, not excluded, to teach the system what not to do. This lets FSD learn from raw, real-world data, improving without constant human oversight.

LLMs, by contrast, rely on curated datasets and human feedback like RLHF, which filter out “bad” content such as misinformation or hate speech. This curation limits exposure to the internet’s full spectrum—good, bad, ambiguous—making LLMs less adept at navigating complexity. Like overprotected drivers, they’re unprepared for chaos. By learning from the internet’s “crashes” (biased or misleading statements), as Tesla learns from driving failures, LLMs could gain a clearer, more accurate view of the world.

Technique 1: Verifying Predictions Over Time

The internet abounds with predictions, especially charged ones like “President X will crash the economy by 2026.” These testable claims could serve as self-supervised training signals. The process would involve identifying predictive statements using NLP (e.g., “will happen,” “by 2025”), storing them with metadata like source and date, and later verifying outcomes with reliable data, such as GDP reports, using automated fact-checking pipelines. Accurate sources would gain weight in training, while inaccurate ones would lose it, teaching the LLM to prioritize reliable patterns.
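To make the extraction step concrete, here is a minimal sketch, assuming a simple regex-based detector and a hypothetical post format with "text", "source", and "date" fields; a real pipeline would use a trained classifier and a proper store rather than these toy patterns.

```python
import re
from dataclasses import dataclass
from datetime import date

# Surface patterns that suggest a testable, time-bound prediction.
PREDICTION_PATTERNS = [
    r"\bwill\b.*\bby (19|20)\d{2}\b",   # "X will crash the economy by 2026"
    r"\bis going to\b",
    r"\bby the end of (19|20)\d{2}\b",
]

@dataclass
class Prediction:
    text: str                 # the original claim
    source: str               # account or domain that posted it
    seen_on: date             # when the claim was collected
    resolves_by: int | None   # year mentioned in the claim, if any

def extract_predictions(posts):
    """Flag posts that look like testable predictions and keep their metadata."""
    found = []
    for post in posts:
        text = post["text"]
        if any(re.search(p, text, re.IGNORECASE) for p in PREDICTION_PATTERNS):
            year = re.search(r"(19|20)\d{2}", text)
            found.append(Prediction(
                text=text,
                source=post["source"],
                seen_on=post["date"],
                resolves_by=int(year.group()) if year else None,
            ))
    return found

posts = [{"text": "President X will crash the economy by 2026.",
          "source": "@pundit123", "date": date(2024, 7, 1)}]
print(extract_predictions(posts))
```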

Consider an X post claiming, “President Y will ban fossil fuels by 2025, crashing energy stocks.” In 2025, the LLM could check policy records and stock data. If the prediction fails, the model learns to distrust similar hyperbolic claims, countering biases like the availability heuristic, which overemphasizes vivid events. Challenges include vague predictions, delayed data, and compute costs, but applying this to a subset, like trending posts, is feasible.
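As a sketch of the source-weighting idea, resolved predictions could nudge how much a source counts in training. The update rule, bounds, and names below are illustrative assumptions, not a worked-out weighting scheme.

```python
from collections import defaultdict

# Every source starts at a neutral weight of 1.0.
source_weight = defaultdict(lambda: 1.0)

def update_source_weight(source, prediction_correct, step=0.1):
    """Raise a source's weight when its prediction verified, lower it when it failed."""
    if prediction_correct:
        source_weight[source] = min(2.0, source_weight[source] + step)
    else:
        source_weight[source] = max(0.1, source_weight[source] - step)
    return source_weight[source]

# The 2025 check finds no fossil-fuel ban and no stock crash, so the claim failed.
update_source_weight("@pundit123", prediction_correct=False)
print(source_weight["@pundit123"])  # 0.9 -- this source now counts slightly less
```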

Technique 2: Comparative Analysis for Emotionally Charged Accusations

Many internet statements are emotionally charged accusations criticizing a “hated entity” without context, such as “Electric vehicles are fire hazards” or “Big corporations racially discriminate.” These hold the target to a perfect standard, ignoring how alternatives—ICE cars, small businesses—perform. LLMs could flag such biases by comparing the accused entity to relevant alternatives.

The approach would detect negative, emotionally charged claims using sentiment analysis, identify alternatives with knowledge graphs (e.g., EVs to ICE cars), and query reliable sources like NHTSA fire data or EEOC discrimination stats. If the accused entity performs similarly or better, the statement is tagged as biased for lacking context. For example, EV fire rates (0.03%) are lower than ICE cars (0.1%), so “EVs are fire hazards” is misleading. Training on these flagged statements as negative examples teaches LLMs to demand context, countering biases like selective attention and perfectionism. LLMs may even become more accurate at identifying misinformation than the best humans. 
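A toy version of the comparison step might look like the sketch below, assuming the relevant rates have already been pulled from a source like NHTSA; the figures are the ones cited above, and the function and labels are hypothetical.

```python
def tag_accusation(claim, target_rate, alternative_rates):
    """Label an accusation as lacking context if the target is no worse than its alternatives."""
    worst_alternative = max(alternative_rates.values())
    if target_rate <= worst_alternative:
        label = "biased: lacks comparative context"
    else:
        label = "context-consistent"
    return {"claim": claim, "target_rate": target_rate,
            "alternatives": alternative_rates, "label": label}

result = tag_accusation(
    claim="Electric vehicles are fire hazards",
    target_rate=0.03,                     # EV fire rate (%) from the example
    alternative_rates={"ICE cars": 0.1},  # comparison class found via a knowledge graph
)
print(result["label"])  # biased: lacks comparative context
```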

Cognitive Bias as a Foundation

These proposals draw on social psychological research on cognitive biases, such as confirmation bias, availability heuristic, and selective attention. These universal reasoning flaws affect all discourse, not just politically charged issues. By training LLMs to detect these patterns, we can redefine “bias” holistically. LLMs could even discover new biases—clustering texts that overpredict disasters might reveal a “catastrophizing bias.” This iterative discovery aligns with rationality’s goal of refining our map of reality.
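As a rough sketch of how new biases might be surfaced, texts could be clustered and the doom-laden clusters checked against later outcomes. This uses TF-IDF features and k-means as stand-ins for learned embeddings, with made-up example texts.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

texts = [
    "This policy will destroy the economy within a year.",
    "The new law means total collapse for small businesses.",
    "The study found a modest 2% change in output.",
    "Quarterly figures were roughly in line with forecasts.",
]

# Group texts by vocabulary; a cluster dominated by doom-laden language that is
# later contradicted by outcomes would be a candidate "catastrophizing" bias.
features = TfidfVectorizer().fit_transform(texts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

for label, text in zip(labels, texts):
    print(label, text)
```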

A Vision for AI as a Compassionate Partner

Imagine an AI that learns from human biases to see the world more objectively and compassionately than we do. By categorizing and adjusting for biases—those documented in social psychology and others it discovers—AI could become a valuable partner in solving global problems. It would understand our shortcomings, from personal decisions to policy debates, and offer solutions unclouded by instincts or prejudices. Unlike systems that slurp data indiscriminately, this AI would gain a clearer view by critically evaluating input, avoiding dangerous outcomes like prioritizing some lives over others, as seen in studies on emergent AI value systems. Picture a Star Trek-like outsider: compassionate toward human frailties, gently guiding us past weaknesses, and finding paths we miss. First, it could enhance decision-making by highlighting biases in real-time. Second, it could foster empathy by revealing shared human flaws across cultures. Third, it could drive innovation by seeing solutions beyond our blind spots. 

Summary: Toward a Less Biased AI

Human curators limit LLMs’ ability to navigate the world’s good and bad complexity, embedding biases that favor politically salient issues over subtler reasoning flaws. Inspired by Tesla’s FSD, I propose LLMs use self-supervised learning to evaluate input autonomously, with methods like verifying predictions and analyzing accusations as starting points. Grounded in cognitive bias research, these approaches could make LLMs more truthful and robust, reducing reliance on human curators. Challenges like data availability and compute costs exist, but the potential for a clearer, less biased view of reality is worth exploring.

This is a hypothesis, not a finished plan. I’m not in the AI field, but I hold a PhD in Cognitive Psychology, which informs my perspective on biases. Could these ideas work? Are there better methods? Have I missed relevant research? I’d love your thoughts, critiques, or pointers to refine this vision for a less biased AI.
