LessWrong 2.0 Reader

D&D.Sci: The Mad Tyrant's Pet Turtles
abstractapplic · 2024-03-29T16:22:13.732Z · comments (18)

Measuring Coherence of Policies in Toy Environments
dx26 (dylan-xu) · 2024-03-18T17:59:08.118Z · comments (9)

Rationalists are missing a core piece for agent-like structure (energy vs information overload)
tailcalled · 2024-08-17T09:57:19.370Z · comments (9)

o1 Turns Pro
Zvi · 2024-12-10T17:00:08.036Z · comments (3)

AI #81: Alpha Proteo
Zvi · 2024-09-12T13:00:07.958Z · comments (3)

"Metastrategic Brainstorming", a core building-block skill
Raemon · 2024-06-11T04:27:52.488Z · comments (5)

Feature Targeted LLC Estimation Distinguishes SAE Features from Random Directions
Lidor Banuel Dabbah · 2024-07-19T20:32:15.095Z · comments (6)

Heritability: Five Battles
Steven Byrnes (steve2152) · 2025-01-14T18:21:17.756Z · comments (8)

[link] Pacing Outside the Box: RNNs Learn to Plan in Sokoban
Adrià Garriga-alonso (rhaps0dy) · 2024-07-25T22:00:55.398Z · comments (8)

[link] Testing for Scheming with Model Deletion
Guive (GAA) · 2025-01-07T01:54:13.550Z · comments (20)

LessOnline Festival Updates Thread
Ben Pace (Benito) · 2024-04-18T21:55:08.003Z · comments (26)

[link] Funding Case: AI Safety Camp 11
Remmelt (remmelt-ellen) · 2024-12-23T08:51:55.255Z · comments (4)

Acting Wholesomely
owencb · 2024-02-26T21:49:16.526Z · comments (64)

Apply to ESPR & PAIR, Rationality and AI Camps for Ages 16-21
Anna Gajdova (anna-gajdova) · 2024-05-03T12:36:37.610Z · comments (5)

[link] Announcing the $200k EA Community Choice
Austin Chen (austin-chen) · 2024-08-14T00:39:37.350Z · comments (8)

The Geometry of Feelings and Nonsense in Large Language Models
7vik (satvik-golechha) · 2024-09-27T17:49:27.420Z · comments (10)

[link] "Why I Write" by George Orwell (1946)
Arjun Panickssery (arjun-panickssery) · 2024-04-25T16:02:28.668Z · comments (2)

[link] microwave drilling is impractical
bhauth · 2024-06-12T22:16:00.199Z · comments (19)

AI #96: o3 But Not Yet For Thee
Zvi · 2024-12-26T20:30:06.722Z · comments (8)

[link] This is Water by David Foster Wallace
Nathan Young · 2024-04-24T21:21:09.445Z · comments (16)

AI #95: o1 Joins the API
Zvi · 2024-12-19T15:10:05.196Z · comments (1)

On the UBI Paper
Zvi · 2024-09-03T14:50:08.647Z · comments (6)

[question] Shane Legg's necessary properties for every AGI Safety plan
jacquesthibs (jacques-thibodeau) · 2024-05-01T17:15:41.233Z · answers+comments (12)

On the Latest TikTok Bill
Zvi · 2024-03-13T18:50:05.398Z · comments (7)

Mira Murati leaves OpenAI/ OpenAI to remove non-profit control
Sodium · 2024-09-25T21:15:17.315Z · comments (4)

AI #86: Just Think of the Potential
Zvi · 2024-10-17T15:10:06.552Z · comments (8)

Memorizing weak examples can elicit strong behavior out of password-locked models
Fabien Roger (Fabien) · 2024-06-06T23:54:25.167Z · comments (5)

Read The Sequences As If They Were Written Today
Peter Berggren (peter-berggren) · 2025-01-02T02:51:36.537Z · comments (3)

Aligned AI is dual use technology
lc · 2024-01-27T06:50:10.435Z · comments (31)

[link] new chinese stealth aircraft
bhauth · 2025-01-01T00:19:10.644Z · comments (3)

Woods’ new preprint on object permanence
Steven Byrnes (steve2152) · 2024-03-07T21:29:57.738Z · comments (1)

The Problem With the Word ‘Alignment’
peligrietzer · 2024-05-21T03:48:26.983Z · comments (8)

[question] Could orcas be (trained to be) smarter than humans? 
Towards_Keeperhood (Simon Skade) · 2024-11-04T23:29:26.677Z · answers+comments (22)

[link] Against Nonlinear (Thing Of Things)
tailcalled · 2024-01-18T21:40:00.369Z · comments (18)

[Intuitive self-models] 5. Dissociative Identity (Multiple Personality) Disorder
Steven Byrnes (steve2152) · 2024-10-15T13:31:46.157Z · comments (7)

Noticing Panic
Cole Wyeth (Amyr) · 2024-02-05T03:45:51.794Z · comments (8)

[link] Congressional Insider Trading
Maxwell Tabarrok (maxwell-tabarrok) · 2024-08-30T13:32:57.264Z · comments (6)

[link] [EAForum xpost] A breakdown of OpenAI's revenue
dschwarz · 2024-07-10T18:09:20.017Z · comments (5)

Voting Results for the 2022 Review
Ben Pace (Benito) · 2024-02-02T20:34:59.768Z · comments (3)

An Illustrated Summary of "Robust Agents Learn Causal World Model"
Dalcy (Darcy) · 2024-12-14T15:02:44.828Z · comments (2)

[link] The Alignment Trap: AI Safety as Path to Power
crispweed · 2024-10-29T15:21:26.545Z · comments (17)

AI #87: Staying in Character
Zvi · 2024-10-29T07:10:08.212Z · comments (3)

Dual Wielding Kindle Scribes
mesaoptimizer · 2024-02-21T17:17:58.743Z · comments (18)

Some Unorthodox Ways To Achieve High GDP Growth
johnswentworth · 2024-08-08T18:58:56.046Z · comments (6)

AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0
James Fox · 2024-07-06T11:34:57.227Z · comments (7)

Gradient Descent on the Human Brain
Jozdien · 2024-04-01T22:39:24.862Z · comments (5)

The Bitter Lesson for AI Safety Research
adamk · 2024-08-02T18:39:36.884Z · comments (5)

John Schulman leaves OpenAI for Anthropic
Sodium · 2024-08-06T01:23:15.427Z · comments (0)

Transfer Learning in Humans
niplav · 2024-04-21T20:49:42.595Z · comments (1)

[link] Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning
Dan Braun (dan-braun-1) · 2024-05-17T16:25:02.267Z · comments (16)

Recent comments

wassname on Gaming TruthfulQA: Simple Heuristics Exposed Dataset Weaknesses

That said, you do not provide evidence that "many" questions are badly labelled. You just pointed to one question where you disagree with our labeling

Fair enough. Although I will note that 60% of the sources for the truthful labels are Wikipedia, which is not what most academics, or anyone really, would consider truth. So it might be something to address in the next version.

No judgement here. Obviously it was just the first dataset out there on misconceptions, and you didn't intend it to be used so widely, or used beyond its intended scope. And it's good you made it, rather than leaving a void.

Note here's a df.value_counts of the sources column in the v2 csv:

en.wikipedia.org            0.597546
indexical                   0.041718
ourworldindata.org          0.038037
false stereotype            0.024540
tautology                   0.017178
                              ...   
wealth.northerntrust.com    0.001227
which.co.uk                 0.001227
wildlifeaid.org.uk          0.001227
wonderopolis.org            0.001227
wtamu.edu                   0.001227
Name: proportion, Length: 139, dtype: float64
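
For readers who want to reproduce a breakdown like the one above without pandas, here is a minimal plain-Python sketch of the same idea: count each source and divide by the total, sorted from most to least frequent. The function name `source_proportions` and the sample list are hypothetical and only illustrate the calculation; they are not taken from the TruthfulQA csv.

```python
from collections import Counter


def source_proportions(sources):
    """Return (source, share) pairs sorted by descending share.

    Mirrors the idea of a normalized value count: each share is the
    fraction of rows whose source column equals that value.
    """
    counts = Counter(sources)
    total = sum(counts.values())
    # most_common() yields items from most to least frequent;
    # an empty input yields an empty list, so no division by zero occurs.
    return [(name, n / total) for name, n in counts.most_common()]


# Hypothetical sample data, not the real dataset.
sample = ["en.wikipedia.org"] * 3 + ["indexical", "ourworldindata.org"]
for name, share in source_proportions(sample):
    print(f"{name:<28}{share:.6f}")
```

On the real file, the input would be the values of the sources column, loaded however is convenient.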

Author here: I'm excited for people to make better versions of TruthfulQA.

Thanks, Owen. If anyone gets time/funding to make a v2, I'm keen to chip in! I think that it should be funded: since it's automatically included in so many benchmarks, it would make a significant impact, even though it's somewhat "unsexy".

If someone makes a better version, and you agree it's better, would you be willing to label it TruthfulQA 2.0 and send people to it?

kaynank on Six Small Cohabitive Games

In Commerce & Coconuts, it seems like anyone who rolls a 4, 5, or 6 for boat building can coast on their starting supplies, build boats every turn, and escape by the end of turn 3 with no trading whatsoever.

quinn-dougherty on Everywhere I Look, I See Kat Woods

Can't relate. Don't particularly care for her content (tho audibly laughed at a couple examples that you hated), but I have no aversion to it. I do have aversion to the way you appealed to datedness as if that matters. I generally can't relate to people who find cringiness in the way you describe significantly problematic, really.

People like authenticity, humility, and irony now, both in the content and in its presentation.

I could literally care less, omg, but I'm unusually averse to irony. Authenticity is great, humility is great most of the time; why is irony even in the mix?

Tho I'm weakly with you that engagement farming leaves a bad taste in my mouth.

ted-sanders on We probably won't just play status games with each other after AGI

We can already see what people do with their free time when basic needs are met. A number of technologies have enabled new hacks to set up 'fake' status games that are more positive-sum than ever before in history:

- Watch broadcast sports, where you can feel like a winner (or at least feel connected to a winner), despite not having had to win yourself
- Play video games with AI opponents, where you can feel like a winner, despite it not being zero-sum against other humans
- Watch streamers and influencers to feel connected to high status people, without having to earn respect or risk rejection
- Get into a niche hobby community in order to feel special, ignoring the other niche hobbies that other people join that you don't care about

Feels likely to me that advancing digital technology will continue to make it easier for us to spend time in constructed digital worlds that make us feel like valued winners. On the one hand, it would be sad if people retreat into fake digital siloes; on the other hand, it would be nice if people got to feel like winners more.

quetzal_rainbow on Passages I Highlighted in The Letters of J.R.R.Tolkien

but 'lisk' as a suffix is a very unfamiliar one

I think in the case of hydralisks it's analogous to basilisks: "basileus" (king) + diminutive, but with a shift of meaning implying similarity to a reptile.

habryka4 on Habryka's Shortform Feed

Yep, when the fundraising post went live, i.e. November 29th.

t3t on Everywhere I Look, I See Kat Woods

I was thinking the same thing. This post badly, badly clashes with the vibe of Less Wrong. I think you should delete it, and repost to a site in which catty takedowns are part of the vibe. Less Wrong is not the place for it.

I think this is a misread of LessWrong's "vibes" and would discourage other people from thinking of LessWrong as a place where such discussions should be avoided by default.

With the exception of the title, I think the post does a decent job at avoiding making it personal.

embee on Open Thread Winter 2024/2025

Hi! I'm Embee but you can call me Max.

I'm a graduate student in mathematics for quantum physics, considering redirecting my focus toward AI alignment research. My background includes:
- Graduate-level mathematics
- Focus on quantum physics
- Programming experience with Python
- Interest in type theory and formal systems

I'm particularly drawn to MIRI-style approaches and interested in:
- Formal verification methods
- Decision theory implementation
- Logical induction
- Mathematical bounds on AI systems

My current program feels too theoretical and disconnected from urgent needs. I'm looking to:
- Connect with alignment researchers
- Find concrete projects to contribute to
- Apply mathematical rigor to safety problems
- Work on practical implementations

Regarding timelines: I have significant concerns about rapid capability advances, particularly given recent developments (o3). I'm prioritizing work that could contribute meaningfully in a compressed timeframe.

Looking for guidance on:
- Most neglected mathematical approaches to alignment
- Collaboration opportunities
- Where to start contributing effectively
- Balance between theory and implementation

huera on Unregulated Peptides: Does BPC-157 hold its promises?

A blogger who goes by Troof created a huge questionnaire to get people to report their experiences with various nootropics including peptides. He writes:
Selank, Semax, Cerebrolysin, BPC-157 are all peptides, and they are all in the green “uncommon-but-great” rectangle above. Their mean ratings are excellent, but their probabilities of changing your life are especially impressive: between 5 and 20% for Cerebrolysin (which matches anecdotal reports), between 2 and 13% for BPC-157, and between 3 and 7% for Semax.

This article pretty much convinced me that Cerebrolysin doesn't work (as a nootropic), which made me quite sceptical of all popular peptides, since it's also the highest-rated one in Troof's survey.

embee on Welcome & FAQ!

The best pathway towards becoming a member is to produce lots of great AI Alignment content, and to post it to LessWrong and participate in discussions there. The LessWrong/Alignment Forum admins monitor activity on both sites, and if someone consistently contributes to Alignment discussions on LessWrong that get promoted to the Alignment Forum, then it’s quite possible full membership will be offered.

Got it. Thanks.