LessWrong 2.0 Reader

List of AI safety papers from companies, 2023–2024
Zach Stein-Perlman · 2025-01-15T18:00:30.242Z · comments (0)
Expected Utility, Geometric Utility, and Other Equivalent Representations
StrivingForLegibility · 2024-11-20T23:28:21.826Z · comments (0)
[link] What are the differences between AGI, transformative AI, and superintelligence?
Vishakha (vishakha-agrawal) · 2025-01-23T10:03:31.886Z · comments (3)
[link] LLMs for language learning
Benquo · 2025-01-15T14:08:54.620Z · comments (2)
[link] Bird's eye view: An interactive representation to see large collections of text "from above"
Alexandre Variengien (alexandre-variengien) · 2024-12-21T00:15:02.239Z · comments (4)
[link] Updating on Bad Arguments
Guive (GAA) · 2024-12-21T01:19:15.686Z · comments (2)
[link] Bridgewater x Metaculus Forecasting Contest Goes Global — Feb 3, $25k, Opportunities
ChristianWilliams · 2025-01-07T21:40:30.899Z · comments (0)
[link] o1 tried to avoid being shut down
Raelifin · 2024-12-05T19:52:03.620Z · comments (5)
Favorite colors of some LLMs
weightt an (weightt-an) · 2024-12-31T21:22:58.494Z · comments (3)
The Human Alignment Problem for AIs
rife (edgar-muniz) · 2025-01-22T04:06:10.872Z · comments (5)
[link] Exploring Cooperation: The Path to Utopia
Davidmanheim · 2024-12-25T18:31:55.565Z · comments (0)
AXRP Episode 38.6 - Joel Lehman on Positive Visions of AI
DanielFilan · 2025-01-24T23:00:07.562Z · comments (0)
[link] Predation as Payment for Criticism
Benquo · 2025-01-30T01:06:27.591Z · comments (5)
The Quantum Mars Teleporter: An Empirical Test Of Personal Identity Theories
avturchin · 2025-01-22T11:48:46.071Z · comments (18)
[question] Is the "hidden complexity of wishes" problem solved?
Roman Malov · 2025-01-05T22:59:30.911Z · answers+comments (4)
Your AI Safety focus is downstream of your AGI timeline
Michael Flood (michael-flood) · 2025-01-17T21:24:11.913Z · comments (0)
Is this a better way to do matchmaking?
Chipmonk · 2024-12-16T19:06:14.574Z · comments (4)
Introducing the Coalition for a Baruch Plan for AI: A Call for a Radical Treaty-Making Process for the Global Governance of AI
rguerreschi · 2025-01-30T15:26:09.482Z · comments (0)
[question] How likely is AGI to force us all to be happy forever? (much like in the Three Worlds Collide novel)
uhbif19 · 2025-01-18T15:39:21.549Z · answers+comments (5)
be the person that makes the meeting productive
Oldmanrahul · 2025-01-18T22:32:43.640Z · comments (0)
AXRP Episode 38.5 - Adrià Garriga-Alonso on Detecting AI Scheming
DanielFilan · 2025-01-20T00:40:07.077Z · comments (0)
[link] A Public Choice Take on Effective Altruism
vaishnav92 · 2024-12-15T16:58:50.683Z · comments (4)
AI Safety Outreach Seminar & Social (online)
Linda Linsefors · 2025-01-08T13:25:23.192Z · comments (0)
Revealing alignment faking with a single prompt
Florian_Dietz · 2025-01-29T21:01:15.000Z · comments (5)
AXRP Episode 38.4 - Shakeel Hashim on AI Journalism
DanielFilan · 2025-01-05T00:20:05.096Z · comments (0)
Arthropod (non) sentience
Arturo Macias (arturo-macias) · 2024-11-25T16:01:58.514Z · comments (8)
[link] My Experience With A Magnet Implant
Vale · 2025-01-07T03:01:21.410Z · comments (2)
[link] Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-11-26T09:58:44.025Z · comments (0)
[link] Corrigibility should be an AI's Only Goal
PeterMcCluskey · 2024-12-29T20:25:17.922Z · comments (1)
[question] Could my work, "Beyond HaHa," benefit the LessWrong community?
P. João (gabriel-brito) · 2024-12-29T16:14:13.497Z · answers+comments (6)
Does Claude Prioritize Some Prompt Input Channels Over Others?
keltan · 2024-12-29T01:21:26.755Z · comments (2)
How to make evals for the AISI evals bounty
TheManxLoiner · 2024-12-03T10:44:45.700Z · comments (0)
CCing Mailing Lists on External Communication
jefftk (jkaufman) · 2024-12-04T22:00:02.038Z · comments (0)
Cast it into the fire! Destroy it!
Aram Panasenco (panasenco) · 2025-01-13T07:30:19.356Z · comments (9)
Liron Shapira vs Ken Stanley on Doom Debates: A review
TheManxLoiner · 2025-01-24T18:01:56.646Z · comments (0)
[link] Biden administration unveils global AI export controls aimed at China
Chris_Leong · 2025-01-14T01:01:13.927Z · comments (0)
Smart people should do biology
Haotian (haotian-huang) · 2024-12-05T19:11:20.671Z · comments (2)
Near- and medium-term AI Control Safety Cases
Martín Soto (martinsq) · 2024-12-23T17:37:48.860Z · comments (0)
Notes from Copenhagen Secular Solstice 2024
Søren Elverlin (soren-elverlin-1) · 2024-12-22T15:08:20.848Z · comments (0)
[link] Densing Law of LLMs
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-12-08T19:35:09.244Z · comments (2)
I Have A New Paper Out Arguing Against The Asymmetry And For The Existence of Happy People Being Very Good
omnizoid · 2024-11-21T17:21:41.426Z · comments (3)
Executive Director for AIS France - Expression of interest
gergogaspar (gergo-gaspar) · 2024-12-19T08:14:54.023Z · comments (0)
Refuting Searle’s wall, Putnam’s rock, and Johnson’s popcorn
Davidmanheim · 2024-12-09T08:24:26.594Z · comments (30)
Topological Debate Framework
lunatic_at_large · 2025-01-16T17:19:25.816Z · comments (5)
[link] Frontier AI systems have surpassed the self-replicating red line
aproteinengine · 2024-12-11T03:06:14.927Z · comments (4)
On Responsibility
silentbob · 2025-01-21T10:47:37.562Z · comments (2)
Motivation Mapping through Information Theory
P. João (gabriel-brito) · 2024-12-16T23:17:17.254Z · comments (0)
What are the plans for solving the inner alignment problem?
[deleted] · 2025-01-17T21:45:28.330Z · comments (4)
[link] Are SAE features from the Base Model still meaningful to LLaVA?
Shan23Chen (shan-chen) · 2024-12-05T20:21:55.501Z · comments (2)
[link] Riffing on Machines of Loving Grace
an1lam · 2025-01-01T01:06:45.122Z · comments (0)