LessWrong 2.0 Reader




[link] [Paper] Stress-testing capability elicitation with password-locked models
Fabien Roger (Fabien) · 2024-06-04T14:52:50.204Z · comments (10)

Some for-profit AI alignment org ideas
Eric Ho (eh42) · 2023-12-14T14:23:20.654Z · comments (19)

[link] "AI Safety for Fleshy Humans" an AI Safety explainer by Nicky Case
habryka (habryka4) · 2024-05-03T18:10:12.478Z · comments (10)

Self-prediction acts as an emergent regularizer
Cameron Berg (cameron-berg) · 2024-10-23T22:27:03.664Z · comments (4)

[link] [Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij (teun-van-der-weij) · 2024-06-13T10:04:49.556Z · comments (10)

Newsom Vetoes SB 1047
Zvi · 2024-10-01T12:20:06.127Z · comments (6)

MATS Winter 2023-24 Retrospective
utilistrutil · 2024-05-11T00:09:17.059Z · comments (28)

Sparse Autoencoders Work on Attention Layer Outputs
Connor Kissane (ckkissane) · 2024-01-16T00:26:14.767Z · comments (9)

Why you should be using a retinoid
GeneSmith · 2024-08-19T03:07:41.722Z · comments (57)

Retirement Accounts and Short Timelines
jefftk (jkaufman) · 2024-02-19T18:50:05.231Z · comments (35)

OpenAI o1, Llama 4, and AlphaZero of LLMs
Vladimir_Nesov · 2024-09-14T21:27:41.241Z · comments (24)

AI #51: Altman’s Ambition
Zvi · 2024-02-20T19:50:07.439Z · comments (5)

[link] What Depression Is Like
Sable · 2024-08-27T17:43:22.549Z · comments (23)

Actually, Power Plants May Be an AI Training Bottleneck.
Lao Mein (derpherpize) · 2024-06-20T04:41:33.567Z · comments (13)

An Introduction To The Mandelbrot Set That Doesn't Mention Complex Numbers
Yitz (yitz) · 2024-01-17T09:48:07.930Z · comments (11)

[link] Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes
owencb · 2024-04-16T10:10:13.338Z · comments (12)

Agent Boundaries Aren't Markov Blankets. [Unless they're non-causal; see comments.]
abramdemski · 2023-11-20T18:23:40.443Z · comments (11)

Some Vacation Photos
johnswentworth · 2024-01-04T17:15:01.187Z · comments (0)

Release: Optimal Weave (P1): A Prototype Cohabitive Game
mako yass (MakoYass) · 2024-08-17T14:08:18.947Z · comments (21)

AI #83: The Mask Comes Off
Zvi · 2024-09-26T12:00:08.689Z · comments (19)

My Criticism of Singular Learning Theory
Joar Skalse (Logical_Lunatic) · 2023-11-19T15:19:16.874Z · comments (56)

Secular interpretations of core perennialist claims
zhukeepa · 2024-08-25T23:41:02.683Z · comments (32)

Decomposing the QK circuit with Bilinear Sparse Dictionary Learning
keith_wynroe · 2024-07-02T13:17:16.352Z · comments (7)

Values Are Real Like Harry Potter
johnswentworth · 2024-10-09T23:42:24.724Z · comments (17)

How to prevent collusion when using untrusted models to monitor each other
Buck · 2024-09-25T18:58:20.693Z · comments (5)

[link] Palworld development blog post
bhauth · 2024-01-28T05:56:19.984Z · comments (12)

AISafety.com – Resources for AI Safety
Søren Elverlin (soren-elverlin-1) · 2024-05-17T15:57:11.712Z · comments (3)

Constructability: Plainly-coded AGIs may be feasible in the near future
Épiphanie Gédéon (joy_void_joy) · 2024-04-27T16:04:45.894Z · comments (13)

Refusal mechanisms: initial experiments with Llama-2-7b-chat
Andy Arditi (andy-arditi) · 2023-12-08T17:08:01.250Z · comments (7)

[link] New voluntary commitments (AI Seoul Summit)
Zach Stein-Perlman · 2024-05-21T11:00:41.794Z · comments (17)

Studying The Alien Mind
Quentin FEUILLADE--MONTIXI (quentin-feuillade-montixi) · 2023-12-05T17:27:28.049Z · comments (10)

The Gemini Incident
Zvi · 2024-02-22T21:00:04.594Z · comments (19)

Survey of 2,778 AI authors: six parts in pictures
KatjaGrace · 2024-01-06T04:43:34.590Z · comments (1)

[link] Not every accommodation is a Curb Cut Effect: The Handicapped Parking Effect, the Clapper Effect, and more
Michael Cohn (michael-cohn) · 2024-09-15T05:27:36.691Z · comments (39)

Self-Referential Probabilistic Logic Admits the Payor's Lemma
Yudhister Kumar (randomwalks) · 2023-11-28T10:27:29.029Z · comments (14)

3C's: A Recipe For Mathing Concepts
johnswentworth · 2024-07-03T01:06:11.944Z · comments (5)

[link] My thesis (Algorithmic Bayesian Epistemology) explained in more depth
Eric Neyman (UnexpectedValues) · 2024-05-09T19:43:16.543Z · comments (4)

[link] MIRI's May 2024 Newsletter
Harlan · 2024-05-15T00:13:30.153Z · comments (1)

Quick look: applications of chaos theory
Elizabeth (pktechgirl) · 2024-08-18T15:00:07.853Z · comments (51)

LessWrong Community Weekend 2024, open for applications
UnplannedCauliflower · 2024-05-01T10:18:21.992Z · comments (2)

Thomas Kwa's research journal
Thomas Kwa (thomas-kwa) · 2023-11-23T05:11:08.907Z · comments (1)

[Intuitive self-models] 2. Conscious Awareness
Steven Byrnes (steve2152) · 2024-09-25T13:29:02.820Z · comments (48)

The case for a negative alignment tax
Cameron Berg (cameron-berg) · 2024-09-18T18:33:18.491Z · comments (20)

Graceful Degradation
Screwtape · 2024-11-05T23:57:53.362Z · comments (8)

[link] The Cognitive-Theoretic Model of the Universe: A Partial Summary and Review
jessicata (jessica.liu.taylor) · 2024-03-27T19:59:27.893Z · comments (36)

Scaffolding for "Noticing Metacognition"
Raemon · 2024-10-09T17:54:13.657Z · comments (4)

Corrigibility = Tool-ness?
johnswentworth · 2024-06-28T01:19:48.883Z · comments (8)

Spaciousness In Partner Dance: A Naturalism Demo
LoganStrohl (BrienneYudkowsky) · 2023-11-19T07:00:19.555Z · comments (5)

EU policymakers reach an agreement on the AI Act
tlevin (trevor) · 2023-12-15T06:02:44.668Z · comments (7)

[link] Is "superhuman" AI forecasting BS? Some experiments on the "539" bot from the Centre for AI Safety
titotal (lombertini) · 2024-09-18T13:07:40.754Z · comments (3)



Recent comments

seth-herd on OpenAI Email Archives (from Musk v. Altman)

What I'm saying is that the people you mention should put a little more time into it. When I've been involved in philosophy discussions with academics, people tend to treat it like a fun game, with the goal being more to score points and come up with clever new arguments than to converge on the truth.

I think most of the world doesn't take philosophy seriously, and they should.

I think the world thinks "there aren't real answers to philosophical questions, just personal preferences and a confusing mess of opinions". I think that's mostly wrong; LW does tend to cause convergence on a lot of issues for a lot of people. That might be groupthink, but I held almost identical philosophical views before engaging with LW - because I took the questions seriously and was truth-seeking.

I think Musk or Page are fully capable of LW-style philosophy if they put a little time into it - and took it seriously (were truth-seeking).

What would change people's attitudes? Well, I'm hoping that facing serious questions in how we create, use, and treat AI does cause at least some people to take the associated philosophical questions seriously.

anders-lindstroem on "It's a 10% chance which I did 10 times, so it should be 100%"

Yes. But I think you have mixed up expected value and expected utility. Please show your calculations.

satron on Why imperfect adversarial robustness doesn't doom AI control

Great post, very clearly written. Going to share it in my spaces.

anders-lindstroem on "It's a 10% chance which I did 10 times, so it should be 100%"

I do not understand your reasoning. Please show your calculations.

sherrinford on Monthly Roundup #24: November 2024

"Stephanie Murray reports that the village thing can still be done, and in particular has pulled off a ‘baby swapping’ system that periodically pools child care so parents can have time for themselves."

Maybe there is more detail in the linked blog, but from this post alone it sounds like a reinvention of kindergarten.

egor-timatkov on "It's a 10% chance which I did 10 times, so it should be 100%"

Ah, shoot. You're right. Probably not good to use "odds" and "probability" interchangeably for percentages like I did. Should be fixed now.

zy on Rauno's Shortform

Yeah, that makes sense; the knowledge should still be there, we just need to shift the distribution "back".

raemon on OpenAI Email Archives (from Musk v. Altman)

Noting, this doesn't really engage with any of the particular other claims in the previous comment's link, just makes a general assertion.

thomas-kwa on Thomas Kwa's Shortform

The North Wind, the Sun, and Abadar

One day, the North Wind and the Sun argued about which of them was the strongest. Abadar, the god of commerce and civilization, stopped to observe their dispute. “Why don’t we settle this fairly?” he suggested. “Let us see who can compel that traveler on the road below to remove his cloak.”

The North Wind agreed, and with a mighty gust, he began his effort. The man, feeling the bitter chill, clutched his cloak tightly around him and even pulled it over his head to protect himself from the relentless wind. After a time, the North Wind gave up, frustrated.

Then the Sun tried his turn. Beaming warmly from the heavens, the Sun caused the air to grow pleasant and balmy. The man, feeling the growing heat, loosened his cloak and eventually took it off, resting under the shade of a tree. The Sun began to declare victory, but as soon as he turned away, the man put on the cloak again.

The god of commerce then approached the traveler, jingling a pouch of gold coins in his hand.

“Good sir,” Abadar called out, “that cloak of yours—how much would you sell it for?”

The man considered the offer, his eyes lighting up. “Five coins,” he replied hesitantly.

“Done,” said Abadar, handing over the coins and taking the cloak. The traveler tucked the money away and continued on his way, unbothered by either wind or heat. He soon bought a new cloak and invested the remainder in an index fund. The returns were steady, and in time he prospered far beyond the value of his simple cloak.

“See,” Abadar declared to the Wind and the Sun, “strength lies neither in force nor persuasion but in creating opportunities for mutual benefit. The cloak is mine permanently, and the man is better off as well. My solution also has minimal deadweight loss, assuming the elasticity—”

Before Abadar could say any more, the North Wind grumbled, the Sun conceded, and Abadar strode away, his wisdom proven. Thus, it was decided that commerce, when conducted wisely, can accomplish what neither force nor gentle persuasion alone can achieve.

egor-timatkov on "It's a 10% chance which I did 10 times, so it should be 100%"

95% is a lower bound. It's more than 95% for every n and approaches 95% as n gets bigger. If n=2 (e.g. a coin flip), then you actually have a 98.4% chance of at least one success after 3n (which is 6) attempts.

I mentioned this in the "What I'm not saying" section, but this limit converges rather quickly. I would consider any to be "close enough"
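A minimal sketch of the arithmetic behind the comment above, assuming each attempt is independent with success probability 1/n (the helper name `p_at_least_one` is hypothetical). The chance of at least one success in 3n attempts is 1 - (1 - 1/n)^(3n); since (1 - 1/n)^n rises toward 1/e, this value falls toward 1 - e^(-3), about 95.02%, and stays above it for every finite n.

```python
import math


def p_at_least_one(n: int, attempts_per_n: int = 3) -> float:
    """Hypothetical helper: probability of at least one success in
    attempts_per_n * n independent tries, each succeeding with probability 1/n."""
    return 1 - (1 - 1 / n) ** (attempts_per_n * n)


# The n=2 (coin flip) case from the comment: 1 - (1/2)**6 = 63/64.
for n in (2, 5, 10, 100, 1000):
    print(n, round(p_at_least_one(n), 4))

# The limiting value as n grows without bound.
print("limit:", round(1 - math.exp(-3), 4))
```

The printed values shrink as n grows but never drop below the limit, which is why 95% works as a lower bound.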