LessWrong 2.0 Reader

The "public debate" about AI is confusing for the general public and for policymakers because it is a three-sided debate
Adam David Long (adam-david-long-1) · 2023-08-01T00:08:30.908Z · comments (30)

[question] Exercise: Solve "Thinking Physics"
Raemon · 2023-08-01T00:44:48.975Z · answers+comments (23)

[See link to Sept meetup below!] San Francisco ACX Meetup “First Saturday” August 5, 1 pm
guenael · 2023-08-01T03:38:21.840Z · comments (0)

[link] Evaluating Superhuman Models with Consistency Checks
Daniel Paleka · 2023-08-01T07:51:07.025Z · comments (2)

What is autonomy, and how does it lead to greater risk from AI?
Davidmanheim · 2023-08-01T07:58:06.366Z · comments (0)

AI romantic partners will harm society if they go unregulated
Roman Leventov · 2023-08-01T09:32:13.417Z · comments (71)

[link] What Is Childhood Supposed To Be?
Sable · 2023-08-01T09:51:24.537Z · comments (13)

[link] Untangling Infrabayesianism: A redistillation [PDF link; ~12k words + lots of math]
Lorxus · 2023-08-01T12:42:35.744Z · comments (16)

Barbieheimer: Across the Dead Reckoning
Zvi · 2023-08-01T13:00:05.700Z · comments (17)

[link] “Desperate Honesty” by Agnes Callard
David Gross (David_Gross) · 2023-08-01T13:34:57.180Z · comments (0)

[link] AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer
aogara (Aidan O'Gara) · 2023-08-01T15:39:47.841Z · comments (0)

[link] AISN #17: Automatically Circumventing LLM Guardrails, the Frontier Model Forum, and Senate Hearing on AI Oversight
aogara (Aidan O'Gara) · 2023-08-01T15:40:20.222Z · comments (0)

[question] When(if ever) are superstimuli good/useful/advantageous?
Perhaps · 2023-08-01T15:50:35.053Z · answers+comments (2)

Explainer - AutoInterpretation Finds Sparse Coding Beats Alternatives
Gauraventh (aryangauravyadav) · 2023-08-01T17:29:16.962Z · comments (0)

[link] ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks
Beth Barnes (beth-barnes) · 2023-08-01T18:30:57.068Z · comments (12)

Open Mic - August 2023
Adam Zerner (adamzerner) · 2023-08-01T19:24:33.351Z · comments (0)

Spiral Staircase
Michael Samoilov (michael-samoilov) · 2023-08-01T21:51:34.606Z · comments (2)

My current LK99 questions
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2023-08-01T22:48:00.733Z · comments (38)

[question] What is ontology?
Adam Zerner (adamzerner) · 2023-08-02T00:54:14.432Z · answers+comments (19)

[link] Bay Winter Solstice: call for speech pitches!
tcheasdfjkl · 2023-08-02T03:24:44.539Z · comments (0)

"Is There Anything That's Worth More"
Zack_M_Davis · 2023-08-02T03:28:16.116Z · comments (6)

South Bay ACX/SSC Meetup @ Whole Foods
allisona · 2023-08-02T03:44:22.158Z · comments (0)

[link] solar-thermal and techno-economic analysis
bhauth · 2023-08-02T06:22:25.490Z · comments (8)

Anthropical Motte and Bailey in two versions of Sleeping Beauty
Ape in the coat · 2023-08-02T07:08:42.437Z · comments (56)

[question] Could we breed/engineer intelligent parrots?
lukehmiles (lcmgcd) · 2023-08-02T07:32:17.686Z · answers+comments (18)

Long-Term Future Fund: April 2023 grant recommendations
abergal · 2023-08-02T07:54:49.083Z · comments (3)

[link] ChatGPT for translation
Varshul Gupta · 2023-08-02T11:57:21.099Z · comments (0)

3 levels of threat obfuscation
HoldenKarnofsky · 2023-08-02T14:58:32.506Z · comments (14)

[link] The Roots of Progress Blog-Building Intensive: advice for applicants, request for support
jasoncrawford · 2023-08-02T15:37:56.375Z · comments (0)

[question] Would you pay for a search engine limited to rationalist sites?
Conor (conor) · 2023-08-02T18:06:12.620Z · answers+comments (19)

[question] What works for ADHD and/or related things?
TeaTieAndHat (Augustin Portier) · 2023-08-02T18:37:18.216Z · answers+comments (13)

[link] Progress links digest, 2023-08-02: Superconductor edition
jasoncrawford · 2023-08-02T20:27:29.676Z · comments (0)

When performing a dimensionality reduction on tensors, the trace is often zero.
Joseph Van Name (joseph-van-name) · 2023-08-02T21:06:55.423Z · comments (1)

External rationality vs. internal rationality
metachirality · 2023-08-02T23:29:59.368Z · comments (0)

[question] Boxing
Zach Stein-Perlman · 2023-08-02T23:38:36.119Z · answers+comments (1)

Work culture creep
CrimsonChin · 2023-08-03T00:38:13.876Z · comments (15)

[link] Kolmogorov's theory of Algorithmic Probability
Aidan Rocke (aidanrocke) · 2023-08-03T00:58:08.395Z · comments (2)

Bad Imitation Instruments
jefftk (jkaufman) · 2023-08-03T02:30:02.937Z · comments (1)

AI #23: Fundamental Problems with RLHF
Zvi · 2023-08-03T12:50:11.852Z · comments (9)

Password-locked models: a stress case for capabilities evaluation
Fabien Roger (Fabien) · 2023-08-03T14:53:12.459Z · comments (14)

Embedding Ethical Priors into AI Systems: A Bayesian Approach
Justausername · 2023-08-03T15:31:50.087Z · comments (3)

[Linkpost] Deception Abilities Emerged in Large Language Models
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2023-08-03T17:28:19.193Z · comments (0)

[question] Hypothetical: what would you do?
JNS (jesper-norregaard-sorensen) · 2023-08-03T22:39:55.026Z · answers+comments (2)

[question] Is there any metric measuring ~"proportion of people creating extra value"?
Amal (asta-vista) · 2023-08-03T22:54:10.028Z · answers+comments (3)

[question] Has anyone tried creating a YouTube or TikTok series covering the sequences?
Max Rossi (max-rossi) · 2023-08-04T00:10:40.834Z · answers+comments (4)

Apollo Research is hiring evals and interpretability engineers & scientists
Marius Hobbhahn (marius-hobbhahn) · 2023-08-04T10:54:09.276Z · comments (0)

[Linkpost] Multimodal Neurons in Pretrained Text-Only Transformers
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2023-08-04T15:29:16.957Z · comments (0)

[link] Manifund: What we're funding (weeks 2-4)
Austin Chen (austin-chen) · 2023-08-04T16:00:33.227Z · comments (2)

When training AI, we should escalate the frequency of capability tests
Hauke Hillebrandt (hauke-hillebrandt) · 2023-08-04T16:07:33.776Z · comments (0)

Private notes on LW?
Raemon · 2023-08-04T17:35:37.917Z · comments (33)

Recent comments

mikbp on Extra Tall Crib

OK. We take our son out of the bed as soon as he wakes up anyway. He already sleeps long enough by himself.

benito on Raemon's Shortform

I mistyped a bit with the use of "relationships". Yes, names and faces both trigger social recognition, but I meant to make the point that they operate in significantly different ways in the brain, and facial recognition is tuned to processing a lot of emotional and social cues that we aren't tuned to from text. I have tons of social associations with people's physical forms that are beyond simply their character.

(ChatGPT helped me write this comment.)

oliver-daniels-koch on Oliver Daniels-Koch's Shortform

Here's a revised sketch

A few notes:

- I use Scalable Oversight to refer to both Alignment and Control
- I'm confused whether weak to strong learning is a restatement of scalable oversight, ELK, or its own thing, so I ignore it
- I don't explicitly include easy-to-hard, I think OOD basically covers it
- Taxonomies and abstractions are brittle and can be counterproductive

Scalable Oversight Taxonomy

Scalable Oversight
- Scalable Alignment
  - Benchmarks / Tasks
    - Sandwiching Experiments (human amateurs + model, gt from human experts)
    - Weak models supervising Strong models
  - Approaches
    - Debate
    - Recursive reward modeling
    - (Solution to Eliciting Latent Knowledge) + Narrow Elicitation
      - (Note: I think this assumes, more than prior scalable oversight ideas do, that there will be a base model with adequate knowledge, such that the hard part is extracting the knowledge rather than teaching the model)
      - Eliciting Latent Knowledge
        - Approaches
          - Contrast Consistent Search
          - Confidence
          - Intermediate Probing
          - "Speed Prior"
          - "Simplicity Prior"
          - Concept Extrapolation - learn all salient generalizations, use expensive supervision to select correct one
          - IID Mechanistic Anomaly Detection + expensive supervision on anomalies
        - Subclasses
          - Measurement Tampering Detection
            - Approaches
              - OOD Mechanistic Anomaly Detection
                - In distribution
                - Out of Distribution (likely? requires multiple measurement structure)
              - Concept Extrapolation
                - train diverse probes on untrusted data, select probe that predicts positive measurements less frequently
      - Narrow Elicitation
        ...
- Scalable Control
  - Weak Review
  - Untrusted Rephrase or whatever
  - Coup probes
  - MAD (Review all anomalies)
Trojans
- ...
- MAD (maybe?)
Adversarial Examples
- ...
- MAD (maybe?)
Natural Mechanism Distinction
- MAD
Spurious Correlate Detection / Resolution
- Concept Extrapolation

benito on LessOnline (May 31—June 2, Berkeley, CA)

I did find it and we sent him an email, hope he reads it and joins :)

tag on Super additivity of consciousness

Under physicalist epiphenomenalism (which is the standard approach to the mind-matter relation), the mind is super-impressed on reality, perfectly synchronized, and parallel to it.

Under dualist epiphenomenalism, that might be true. Physicalism has it either that consciousness is nonexistent rather than causally idle (eliminativism), or that it is identical to physical brain states (and therefore shares their causal powers).

Understanding why some physical systems make an emergent consciousness appear (the so called “hard problem of consciousness”) or finding a procedure that quantify the intensity of consciousness emerging from a physical system (the so called “pretty hard” problem of consciousness) is impossible:

You could have given a reason why.

duschkopf on Semantic Disagreement of Sleeping Beauty Problem

If it were true that the concept of "indexical sample space" does not capture the thirder position, how do you explain that it produces exactly the same probabilities that thirders entertain? Operating with indexicals is a necessary condition (and motivation) for Thirdism, which means assuming indexical sample spaces when it comes to the mathematical formalization of arguments in terms of probability theory. To my knowledge, no relevant thirder literature denies that. And within the thirder model, these probabilities indeed hold true. If we assume Monday and Tuesday to be mutually exclusive, then this is mathematically the case. Math is not a judge of our assumptions here; it is merely the executive organ, which in this case produces thirder probabilities. The point at issue is whether the theoretical assumptions of the thirder model fit reality and whether the probabilities can be transferred into the real world. Thirders say yes, speaking of regular probabilities; halfers say no, speaking of irregular, "weighted" probabilities.

lauro-langosco on RobertM's Shortform

Yeah, fair point. I do think labs have some nonzero amount of responsibility to be proactive about what others believe about their commitments. I agree it doesn't extend to 'rebut every random rumor'.

oliver-daniels-koch on Oliver Daniels-Koch's Shortform

I think I'm mostly right, but using a somewhat confused frame.

It makes more sense to think of MAD approaches as detecting all abnormal reasons (including deceptive alignment) by default, and then if we get that working we'll try to decrease false anomalies by doing something like comparing the least common ancestor of the measurements in a novel mechanism to the least common ancestor of the measurements on trusted mechanisms.

linda-linsefors on LessWrong Community Weekend 2024 [Applications Open]

Thanks :)

ramblindash on Dating Roundup #3: Third Time’s the Charm

[M]aybe being yourself and open works for people who happen to already be relationship-compatible. People who are not would be worse off by trying to be themselves. I think I have been burned in the past a lot by that kind of advice, although my experience is too much of an anecdote to infer an average.

I think you are maybe using a different definition of "worse off." I would submit that a relationship that is maintainable only by being inauthentic and unopen is, in the long run, significantly worse than no relationship, both because of the experience of being in it and because of the opportunity cost.

That's different than holding some things back at the beginning, or keeping some impolite thoughts to yourself sometimes. But if your goal is a long-term partnership, you move further away from that goal by spending time and energy on someone you know you aren't compatible with.