LessWrong 2.0 Reader

← previous page (newer posts) · next page (older posts) →

Apparent Introspection in Claude: A Case Study in Projected Mind
robert_saltzman · 2025-03-31T00:51:08.748Z · comments (0)
Alignment First, Intelligence Later
Chipmonk · 2025-03-30T22:26:55.302Z · comments (0)
[question] Why do many people who care about AI Safety not clearly endorse PauseAI?
humnrdble · 2025-03-30T18:06:32.426Z · answers+comments (39)
Enumerating objects a model "knows" using entity-detection features.
Alex Gibson · 2025-03-30T16:58:01.957Z · comments (0)
Bonn ACX Meetup Spring 2025
Fernand0 · 2025-03-30T15:12:22.294Z · comments (1)
What does aligning AI to an ideology mean for true alignment?
StanislavKrym · 2025-03-30T15:12:09.802Z · comments (0)
How to enjoy fail attempts without self-deception (technique)
YanLyutnev (YanLutnev) · 2025-03-30T13:49:23.793Z · comments (0)
The g-Zombie Formal Argument
milanrosko · 2025-03-30T13:16:08.352Z · comments (23)
Memory Persistence within Conversation Threads with Multimodal LLMS
sjay8 · 2025-03-30T07:16:00.470Z · comments (0)
How I talk to those above me
Maxwell Peterson (maxwell-peterson) · 2025-03-30T06:54:59.869Z · comments (13)
I, G(Zombie)
milanrosko · 2025-03-30T01:24:28.127Z · comments (68)
How do SAE Circuits Fail? A Case Study Using a Starts-with-'E' Letter Detection Task
adsingh-64 · 2025-03-30T00:47:18.711Z · comments (0)
[link] Climbing the Hill of Experiments
nomagicpill (ethanmorse) · 2025-03-29T20:37:25.619Z · comments (0)
[question] Does the AI control agenda broadly rely on no FOOM being possible?
Noosphere89 (sharmake-farah) · 2025-03-29T19:38:23.971Z · answers+comments (3)
Exercising Rationality
Eggs (donald-sampson) · 2025-03-29T19:08:47.939Z · comments (0)
Yeshua's Basilisk
Alex Beyman (alexbeyman) · 2025-03-29T18:11:50.535Z · comments (1)
AI Needs Us? Information Theory and Humans as data
tomdekan (tomd@hey.com) · 2025-03-29T15:51:16.070Z · comments (6)
Auto Shutdown Script
jefftk (jkaufman) · 2025-03-29T13:10:05.227Z · comments (5)
Proposal for a Post-Labor Societal Structure to Mitigate ASI Risks: The 'Game Culture Civilization' (GCC) Model
Beyond Singularity (beyond-singularity) · 2025-03-29T11:31:04.894Z · comments (0)
Tormenting Gemini 2.5 with the [[[]]][][[]] Puzzle
Czynski (JacobKopczynski) · 2025-03-29T02:51:29.786Z · comments (36)
Singularity Survival Guide: A Bayesian Guide for Navigating the Pre-Singularity Period
mbrooks · 2025-03-28T23:21:39.191Z · comments (4)
[link] Softmax, Emmett Shear's new AI startup focused on "Organic Alignment"
Chipmonk · 2025-03-28T21:23:46.220Z · comments (1)
The Pando Problem: Rethinking AI Individuality
Jan_Kulveit · 2025-03-28T21:03:28.374Z · comments (11)
Selection Pressures on LM Personas
Raymond D · 2025-03-28T20:33:09.918Z · comments (0)
AXRP Episode 40 - Jason Gross on Compact Proofs and Interpretability
DanielFilan · 2025-03-28T18:40:01.856Z · comments (0)
[question] Share AI Safety Ideas: Both Crazy and Not. №2
ank · 2025-03-28T17:22:22.814Z · answers+comments (10)
AI x Bio Workshop
Allison Duettmann (allison-duettmann) · 2025-03-28T17:21:08.824Z · comments (0)
[question] How many times faster can the AGI advance the science than humans do?
StanislavKrym · 2025-03-28T15:16:52.320Z · answers+comments (0)
Gemini 2.5 is the New SoTA
Zvi · 2025-03-28T14:20:03.176Z · comments (1)
Will the Need to Retrain AI Models from Scratch Block a Software Intelligence Explosion?
Tom Davidson (tom-davidson-1) · 2025-03-28T14:12:02.163Z · comments (0)
[link] How We Might All Die in A Year
Greg C (greg-colbourn) · 2025-03-28T13:22:36.863Z · comments (6)
The vision of Bill Thurston
TsviBT · 2025-03-28T11:45:14.297Z · comments (34)
What Uniparental Disomy Tells Us About Improper Imprinting in Humans
Morpheus · 2025-03-28T11:24:47.133Z · comments (1)
[link] Explaining British Naval Dominance During the Age of Sail
Arjun Panickssery (arjun-panickssery) · 2025-03-28T05:47:28.561Z · comments (5)
Will the AGIs be able to run the civilisation?
StanislavKrym · 2025-03-28T04:50:07.568Z · comments (2)
[question] Is AGI actually that likely to take off given the world energy consumption?
StanislavKrym · 2025-03-27T23:13:14.959Z · answers+comments (2)
[Linkpost] The value of initiating a pursuit in temporal decision-making
Gunnar_Zarncke · 2025-03-27T21:47:05.123Z · comments (0)
Alignment through atomic agents
micseydel · 2025-03-27T18:43:14.569Z · comments (0)
Machines of Stolen Grace
Riley Tavassoli (riley-tavassoli) · 2025-03-27T18:15:23.736Z · comments (0)
An argument for asexuality
filthy_hedonist (sid-kolichala) · 2025-03-27T18:08:48.624Z · comments (10)
On the plausibility of a “messy” rogue AI committing human-like evil
Jacob Griffith (Jacob.Griffith) · 2025-03-27T18:06:45.505Z · comments (0)
[link] AI Moral Alignment: The Most Important Goal of Our Generation
Ronen Bar (ronen-bar) · 2025-03-27T18:04:07.212Z · comments (0)
[link] Tracing the Thoughts of a Large Language Model
Adam Jermyn (adam-jermyn) · 2025-03-27T17:20:02.162Z · comments (22)
Computational Superposition in a Toy Model of the U-AND Problem
Adam Newgas (BorisTheBrave) · 2025-03-27T16:56:34.474Z · comments (2)
Mistral Large 2 (123B) exhibits alignment faking
Marc Carauleanu (Marc-Everin Carauleanu) · 2025-03-27T15:39:02.176Z · comments (4)
AIS Netherlands is looking for a Founding Executive Director (EOI form)
gergogaspar (gergo-gaspar) · 2025-03-27T15:30:18.444Z · comments (0)
AI #109: Google Fails Marketing Forever
Zvi · 2025-03-27T14:50:01.825Z · comments (12)
What life will be like for humans if aligned ASI is created
james oofou (james-oofou) · 2025-03-27T10:06:56.846Z · comments (6)
[link] What is scaffolding?
Vishakha (vishakha-agrawal) · 2025-03-27T09:06:35.403Z · comments (0)
Workflow vs interface vs implementation
Sniffnoy · 2025-03-27T07:38:49.109Z · comments (0)