LessWrong 2.0 Reader



Testing which LLM architectures can do hidden serial reasoning
Filip Sondej · 2024-12-16T13:48:34.204Z · comments (9)
Human study on AI spear phishing campaigns
Simon Lermen (dalasnoin) · 2025-01-03T15:11:14.765Z · comments (8)
2025 Prediction Thread
habryka (habryka4) · 2024-12-30T01:50:14.216Z · comments (18)
🇫🇷 Announcing CeSIA: The French Center for AI Safety
Charbel-Raphaël (charbel-raphael-segerie) · 2024-12-20T14:17:13.104Z · comments (0)
When AI 10x's AI R&D, What Do We Do?
Logan Riggs (elriggs) · 2024-12-21T23:56:11.069Z · comments (16)
[link] Policymakers don't have access to paywalled articles
Adam Jones (domdomegg) · 2025-01-05T10:56:11.495Z · comments (10)
[link] Moderately More Than You Wanted To Know: Depressive Realism
JustisMills · 2025-01-13T02:57:32.022Z · comments (4)
No one has the ball on 1500 Russian olympiad winners who've received HPMOR
Mikhail Samin (mikhail-samin) · 2025-01-12T11:43:36.560Z · comments (17)
[link] Anthropic leadership conversation
Zach Stein-Perlman · 2024-12-20T22:00:45.229Z · comments (17)
[link] Learn to write well BEFORE you have something worth saying
eukaryote · 2024-12-29T23:42:31.906Z · comments (18)
Retrospective: PIBBSS Fellowship 2024
DusanDNesic · 2024-12-20T15:55:24.194Z · comments (1)
[link] "We know how to build AGI" - Sam Altman
Nikola Jurkovic (nikolaisalreadytaken) · 2025-01-06T02:05:05.134Z · comments (5)
Checking in on Scott's composition image bet with imagen 3
Dave Orr (dave-orr) · 2024-12-22T19:04:17.495Z · comments (0)
[link] Recommendations for Technical AI Safety Research Directions
Sam Marks (samuel-marks) · 2025-01-10T19:34:04.920Z · comments (1)
ReSolsticed vol I: "We're Not Going Quietly"
Raemon · 2024-12-26T17:52:33.727Z · comments (4)
Measuring whether AIs can statelessly strategize to subvert security measures
Alex Mallen (alex-mallen) · 2024-12-19T21:25:28.555Z · comments (0)
Stream Entry
lsusr · 2025-01-07T23:56:13.530Z · comments (7)
Chance is in the Map, not the Territory
Daniel Herrmann (Whispermute) · 2025-01-13T19:17:15.843Z · comments (10)
[link] Funding Case: AI Safety Camp 11
Remmelt (remmelt-ellen) · 2024-12-23T08:51:55.255Z · comments (4)
How do you deal w/ Super Stimuli?
Logan Riggs (elriggs) · 2025-01-14T15:14:51.552Z · comments (18)
[link] Testing for Scheming with Model Deletion
Guive (GAA) · 2025-01-07T01:54:13.550Z · comments (20)
Read The Sequences As If They Were Written Today
Peter Berggren (peter-berggren) · 2025-01-02T02:51:36.537Z · comments (3)
AI #95: o1 Joins the API
Zvi · 2024-12-19T15:10:05.196Z · comments (1)
AI #96: o3 But Not Yet For Thee
Zvi · 2024-12-26T20:30:06.722Z · comments (8)
I'm offering free math consultations!
Gurkenglas · 2025-01-14T16:30:40.115Z · comments (5)
o3, Oh My
Zvi · 2024-12-30T14:10:05.144Z · comments (17)
[link] new chinese stealth aircraft
bhauth · 2025-01-01T00:19:10.644Z · comments (3)
Vegans need to eat just enough Meat - empirically evaluate the minimum amount of meat that maximizes utility
Johannes C. Mayer (johannes-c-mayer) · 2024-12-22T22:08:31.971Z · comments (34)
[link] Ideas for benchmarking LLM creativity
gwern · 2024-12-16T05:18:55.631Z · comments (11)
AI Assistants Should Have a Direct Line to Their Developers
Jan_Kulveit · 2024-12-28T17:01:58.643Z · comments (6)
[question] What Have Been Your Most Valuable Casual Conversations At Conferences?
johnswentworth · 2024-12-25T05:49:36.711Z · answers+comments (21)
AI Safety as a YC Startup
Lukas Petersson (lukas-petersson-1) · 2025-01-08T10:46:29.042Z · comments (9)
[link] Discursive Warfare and Faction Formation
Benquo · 2025-01-09T16:47:31.824Z · comments (2)
The OODA Loop -- Observe, Orient, Decide, Act
Davis_Kingsley · 2025-01-01T08:00:27.979Z · comments (2)
Heritability: Five Battles
Steven Byrnes (steve2152) · 2025-01-14T18:21:17.756Z · comments (4)
DeepSeek v3: The Six Million Dollar Model
Zvi · 2024-12-31T15:10:06.924Z · comments (6)
A Solution for AGI/ASI Safety
Weibing Wang (weibing-wang) · 2024-12-18T19:44:29.739Z · comments (30)
[link] Careless thinking: A theory of bad thinking
Nathan Young · 2024-12-17T18:23:16.140Z · comments (17)
[link] Preference Inversion
Benquo · 2025-01-02T18:15:52.938Z · comments (46)
[link] Review: Breaking Free with Dr. Stone
TurnTrout · 2024-12-18T01:26:37.730Z · comments (5)
Greedy-Advantage-Aware RLHF
sej2020 · 2024-12-27T19:47:25.562Z · comments (15)
Implications of the inference scaling paradigm for AI safety
Ryan Kidd (ryankidd44) · 2025-01-14T02:14:53.562Z · comments (17)
Considerations on orca intelligence
Towards_Keeperhood (Simon Skade) · 2024-12-29T14:35:16.445Z · comments (5)
Role embeddings: making authorship more salient to LLMs
Nina Panickssery (NinaR) · 2025-01-07T20:13:16.677Z · comments (0)
[link] A dataset of questions on decision-theoretic reasoning in Newcomb-like problems
Caspar Oesterheld (Caspar42) · 2024-12-16T22:42:03.763Z · comments (1)
We probably won't just play status games with each other after AGI
Matthew Barnett (matthew-barnett) · 2025-01-15T04:56:38.330Z · comments (5)
Finding Features Causally Upstream of Refusal
Daniel Lee (daniel-lee) · 2025-01-14T02:30:04.321Z · comments (5)
[link] The Deep Lore of LightHaven, with Oliver Habryka (TBC episode 228)
Eneasz · 2024-12-24T22:45:50.065Z · comments (4)
Implications of the AI Security Gap
Dan Braun (dan-braun-1) · 2025-01-08T08:31:36.789Z · comments (0)
AI #97: 4
Zvi · 2025-01-02T14:10:06.505Z · comments (4)