LessWrong 2.0 Reader

The Field of AI Alignment: A Postmortem, and What To Do About It
johnswentworth · 2024-12-26T18:48:07.614Z · comments (69)

[link] Review: Planecrash
L Rudolf L (LRudL) · 2024-12-27T14:18:33.611Z · comments (6)

A Three-Layer Model of LLM Psychology
Jan_Kulveit · 2024-12-26T16:49:41.738Z · comments (3)

[question] What Have Been Your Most Valuable Casual Conversations At Conferences?
johnswentworth · 2024-12-25T05:49:36.711Z · answers+comments (19)

ReSolsticed vol I: "We're Not Going Quietly"
Raemon · 2024-12-26T17:52:33.727Z · comments (3)

AI #96: o3 But Not Yet For Thee
Zvi · 2024-12-26T20:30:06.722Z · comments (7)

[question] What are the most interesting / challenging evals (for humans) available?
Raemon · 2024-12-27T03:05:26.831Z · answers+comments (8)

Corrigibility's Desirability is Timing-Sensitive
RobertM (T3t) · 2024-12-26T22:24:17.435Z · comments (4)

Living with Rats in College
lsusr · 2024-12-25T10:44:13.085Z · comments (0)

[link] PCR retrospective
bhauth · 2024-12-26T21:20:56.484Z · comments (0)

If all trade is voluntary, then what is "exploitation?"
Darmani · 2024-12-27T11:21:30.036Z · comments (19)

Whistleblowing Twitter Bot
Mckiev · 2024-12-26T04:09:45.493Z · comments (5)

Open Thread Winter 2024/2025
habryka (habryka4) · 2024-12-25T21:02:41.760Z · comments (1)

[link] Letter from an Alien Mind
Shoshannah Tekofsky (DarkSym) · 2024-12-27T13:20:49.277Z · comments (4)

Coin Flip
XelaP (scroogemcduck1) · 2024-12-27T11:53:01.781Z · comments (0)

Greedy-Advantage-Aware RLHF
sej2020 · 2024-12-27T19:47:25.562Z · comments (0)

[question] What is your personal totalizing and self-consistent worldview/philosophy?
lsusr · 2024-12-27T23:59:30.641Z · answers+comments (0)

[question] What would be the IQ and other benchmarks of o3 that uses $1 million worth of compute resources to answer one question?
avturchin · 2024-12-26T11:08:23.545Z · answers+comments (2)

[link] From the Archives: a story
Richard_Ngo (ricraz) · 2024-12-27T16:36:50.735Z · comments (1)

[link] Exploring Cooperation: The Path to Utopia
Davidmanheim · 2024-12-25T18:31:55.565Z · comments (0)

[question] What's the best metric for measuring quality of life?
ChristianKl · 2024-12-27T14:29:30.813Z · answers+comments (4)

[link] Progress links and short notes, 2024-12-27: Clinical trial abundance, grid-scale fusion, permitting vs. compliance, crossword mania, and more
jasoncrawford · 2024-12-27T23:34:43.807Z · comments (0)

[question] Why don't we currently have AI agents?
ChristianKl · 2024-12-26T15:26:35.682Z · answers+comments (7)

[link] Streamlining my voice note process
Vlad Sitalo (harcisis) · 2024-12-26T06:04:01.990Z · comments (1)

Super human AI is a very low hanging fruit!
Hzn · 2024-12-26T19:00:22.822Z · comments (0)

Good Fortune and Many Worlds
Jonah Wilberg (jrwilb@googlemail.com) · 2024-12-27T13:21:43.142Z · comments (0)

[link] Deconstructing arguments against AI art
DMMF · 2024-12-27T19:40:13.015Z · comments (0)

[question] Are Sparse Autoencoders a good idea for AI control?
Gerard Boxo (gerard-boxo) · 2024-12-26T17:34:55.617Z · answers+comments (2)

Duplicate token neurons in the first layer of gpt2-small
Alex Gibson · 2024-12-27T04:21:55.896Z · comments (0)

[link] The Economics & Practicality of Starting Mars Colonization
Zero Contradictions · 2024-12-26T10:56:26.019Z · comments (1)

Algorithmic Asubjective Anthropics, Cartesian Subjective Anthropics
Lorec · 2024-12-27T01:58:39.880Z · comments (0)

[link] Human, All Too Human - Superintelligence requires learning things we can’t teach
Ben Turtel (ben-turtel) · 2024-12-26T16:26:27.328Z · comments (4)

The Opening Salvo: 1. An Ontological Consciousness Metric: Resistance to Behavioral Modification as a Measure of Recursive Awareness
Peterpiper · 2024-12-25T02:29:52.025Z · comments (0)

Terminal goal vs Intelligence
Donatas Lučiūnas (donatas-luciunas) · 2024-12-26T08:10:42.144Z · comments (19)

Recent comments

zach-stein-perlman on The Field of AI Alignment: A Postmortem, and What To Do About It

I feel like John's view entails that he would be able to convince my friends that various-research-agendas-my-friends-like are doomed. (And I'm pretty sure that's false.) I assume John doesn't believe that, and I wonder why he doesn't think his view entails it.

abstractapplic on D&D.Sci Dungeonbuilding: the Dungeon Tournament

Built a tree-based model; trialled a few solutions; got radically different answers which I'm choosing to trust.

The machines seem to think that the best solution I can offer is

BOG/OWH/GCD

and I've

found a row which confirms the adventurers-scout-one-room-ahead paradigm is, at the very least, not both eternal and absolute

so I'm making that my answer for now.

hopenope on What are the strongest arguments for very short timelines?

The lack of reliability eats away a huge amount of productivity. Everything should be double-checked, and with higher capabilities that becomes even harder, since we need to think more about the subtle ways the models' output is wrong. Unknown unknowns are also always a factor, but if o3-type models can be trained on less verifiable problems without being insanely compute-heavy, then 2026 is actually a reasonable guess.

kabir-kumar on johnswentworth's Shortform

Personally, I think o1 is uniquely trash; I think o1-preview was actually better. Getting, on average, better things from DeepSeek and Sonnet 3.5 atm.

kabir-kumar on Oliver Daniels-Koch's Shortform

I like bluesky for this atm

kabir-kumar on Kabir Kumar's Shortform

I'd like some feedback on my theory of impact for my currently chosen research path

**End goal**: Reduce x-risk from AI and risk of human disempowerment.
for x-risk:
- solving AI alignment - very important,
- knowing exactly how well we're doing in alignment, exactly how close we are to solving it, how much is left, etc seems important.
- how well different methods work,
- which companies are making progress in this, which aren't, which are acting like they're making progress vs actually making progress, etc
- put all on a graph, see who's actually making the line go up

- Also, a way that others can use to measure how good their alignment method/idea is, easily
so there's actually a target and a progress bar for alignment - seems like it'd make alignment research a lot easier and improve the funding space - and the space as a whole. Improving the quality and quantity of research.

- Currently, it's mostly a mixture of vibe checks, occasional benchmarks that test a few models, jailbreaks, etc
- all almost exclusively on the end models as a whole - which have many, many differences that could be contributing to the differences in the different 'alignment measurements'
by having a method that keeps things controlled as much as possible and just purely measures the different post training methods, this seems like a much better way to know how we're doing in alignment
and how to prioritize research, funding, governance, etc

On Goodharting the Line - will also make it modular, so that people can add their own benchmarks, and highlight people who redteam different alignment benchmarks.

mckiev on Whistleblowing Twitter Bot

I can also unblock it manually at any point, and keep the full uncensored log of posts on a blockchain

tylerjohnston on tylerjohnston's Shortform

What should I read if I want to really understand (in an ITT-passing way) how the CCP makes and justifies its decisions around censorship and civil liberties?

mckiev on Whistleblowing Twitter Bot

Thanks for sharing your opinion. Regarding security: using the full body of an email, you can generate a zero-knowledge proof with an offline tool (since all emails are hashed and signed by the email server). No new emails need to be exchanged.

quwgri on I would like to try double crux.

"An infinite universe can exist."
"A greatest infinity cannot exist."
I think there is some kind of logical contradiction here. If the Universe exists and if it is infinite, then it must correspond to the concept of "the greatest infinity." True, Bertrand Russell once expressed doubt that one can correctly reason about the "Universe as a whole." I don't know. It seems strange to me. As if we recognize the existence of individual things, but not of all things as a whole. It seems like some kind of arbitrary crutch, a private "ad hoc" solution, conditioned by the weakness of our brain.
As for God or Gods, then, hypothetically, in the case of the coincidence of their value systems and the mental interaction between them according to a common agreed protocol, these problems should not be very important.