LessWrong 2.0 Reader


← previous page (newer posts) · next page (older posts) →

[link] The Stag Hunt—cultivating cooperation to reap rewards
James Stephen Brown (james-brown) · 2025-02-25T23:45:07.472Z · comments (0)
[question] Sparks of Original Thought?
Annapurna (jorge-velez) · 2025-03-06T00:53:44.421Z · answers+comments (4)
Retroactive If-Then Commitments
MichaelDickens · 2025-02-01T22:22:43.031Z · comments (0)
[link] On AI Scaling
harsimony · 2025-02-05T20:24:56.977Z · comments (3)
Closed-ended questions aren't as hard as you think
electroswing · 2025-02-19T03:53:11.855Z · comments (0)
[question] Alignment Paradox and a Request for Harsh Criticism
Bridgett Kay (bridgett-kay) · 2025-02-05T18:17:22.701Z · answers+comments (7)
Bimodal AI Beliefs
Adam Train (aetrain) · 2025-02-14T06:45:53.933Z · comments (1)
[link] Recursive alignment with the principle of alignment
hive · 2025-02-27T02:34:37.940Z · comments (0)
Intelligence Is Jagged
Adam Train (aetrain) · 2025-02-19T07:08:46.444Z · comments (1)
Build a Metaculus Forecasting Bot in 30 Minutes: A Practical Guide
ChristianWilliams · 2025-02-22T03:52:14.753Z · comments (0)
One-dimensional vs multi-dimensional features in interpretability
charlieoneill (kingchucky211) · 2025-02-01T09:10:01.112Z · comments (0)
[link] Can a finite physical device be Turing equivalent?
Noosphere89 (sharmake-farah) · 2025-03-06T15:02:16.921Z · comments (10)
[question] shouldn't we try to get media attention?
KvmanThinking (avery-liu) · 2025-03-04T01:39:06.596Z · answers+comments (0)
Not-yet-falsifiable beliefs?
Benjamin Hendricks (benjamin-hendricks) · 2025-03-02T14:11:07.121Z · comments (4)
Beyond ELO: Rethinking Chess Skill as a Multidimensional Random Variable
Oliver Oswald (oliver-oswald) · 2025-02-10T19:19:36.233Z · comments (7)
Do No Harm? Navigating and Nudging AI Moral Choices
Sinem (sinem-erisken) · 2025-02-06T19:18:31.065Z · comments (0)
[question] Should I Divest from AI?
OKlogic · 2025-02-10T03:29:33.582Z · answers+comments (4)
[question] p(s-risks to contemporary humans)?
mhampton · 2025-02-08T21:19:53.821Z · answers+comments (5)
[question] Does human (mis)alignment pose a significant and imminent existential threat?
jr · 2025-02-23T10:03:40.269Z · answers+comments (3)
What new x- or s-risk fieldbuilding organisations would you like to see? An EOI form. (FBB #3)
gergogaspar (gergo-gaspar) · 2025-02-17T12:39:09.196Z · comments (0)
[link] AISN #49: Superintelligence Strategy
Corin Katzke (corin-katzke) · 2025-03-06T17:46:50.965Z · comments (1)
AIS Berlin, events, opportunities and the flipped gameboard - Fieldbuilders Newsletter, February 2025
gergogaspar (gergo-gaspar) · 2025-02-17T14:16:31.834Z · comments (0)
Fun, endless art debates v. morally charged art debates that are intrinsically endless
danielechlin · 2025-02-21T04:44:22.712Z · comments (2)
Blackpool Applied Rationality Unconference 2025
Henry Prowbell · 2025-02-01T14:09:44.673Z · comments (0)
[question] Name for Standard AI Caveat?
yrimon (yehuda-rimon) · 2025-02-26T07:07:16.523Z · answers+comments (5)
[link] AI Safety at the Frontier: Paper Highlights, January '25
gasteigerjo · 2025-02-11T16:14:16.972Z · comments (0)
Towards a Science of Evals for Sycophancy
andrejfsantos · 2025-02-01T21:17:15.406Z · comments (0)
There are a lot of upcoming retreats/conferences between March and July (2025)
gergogaspar (gergo-gaspar) · 2025-02-18T09:30:30.258Z · comments (0)
Have you actually tried raising the birth rate?
Yair Halberstadt (yair-halberstadt) · 2025-03-10T18:06:40.987Z · comments (5)
[link] Neural Scaling Laws Rooted in the Data Distribution
aribrill (Particleman) · 2025-02-20T21:22:10.306Z · comments (0)
Utilitarian AI Alignment: Building a Moral Assistant with the Constitutional AI Method
Clément L · 2025-02-04T04:15:36.917Z · comments (1)
[link] Social Dilemmas — public goods, free riders, and exploitation
James Stephen Brown (james-brown) · 2025-03-05T23:31:17.512Z · comments (0)
Arguing for the Truth? An Inference-Only Study into AI Debate
denisemester · 2025-02-11T03:04:58.852Z · comments (0)
The chessboard world
phdead · 2025-03-10T01:26:16.304Z · comments (0)
[link] Medical Windfall Prizes
PeterMcCluskey · 2025-02-06T23:33:27.263Z · comments (1)
Positional kernels of attention heads
Alex Gibson · 2025-03-10T23:17:25.068Z · comments (0)
Are current LLMs safe for psychotherapy?
PaperBike · 2025-02-12T19:16:34.452Z · comments (4)
Stress exists only where the Mind makes it
Noahh (noah-jackson) · 2025-03-10T19:44:42.887Z · comments (2)
Superintelligence Alignment Proposal
Davey Morse (davey-morse) · 2025-02-03T18:47:22.287Z · comments (3)
[question] How much do frontier LLMs code and browse while in training?
Joe Rogero · 2025-03-10T19:34:23.950Z · answers+comments (0)
Understanding Agent Preferences
martinkunev · 2025-02-24T17:46:04.022Z · comments (0)
Cross-Layer Feature Alignment and Steering in Large Language Model
dlaptev · 2025-02-08T20:18:20.331Z · comments (0)
Kairos is hiring a Head of Operations/Founding Generalist
agucova · 2025-03-12T20:58:49.661Z · comments (0)
Existentialists and Trolleys
David Gross (David_Gross) · 2025-02-28T14:01:49.509Z · comments (3)
[link] Linguistic Imperialism in AI: Enforcing Human-Readable Chain-of-Thought
Lukas Petersson (lukas-petersson-1) · 2025-02-21T15:45:00.146Z · comments (0)
[link] Sparse Autoencoder Features for Classifications and Transferability
Shan23Chen (shan-chen) · 2025-02-18T22:14:12.994Z · comments (0)
[link] (Anti)Aging 101
George3d6 · 2025-03-12T03:59:21.859Z · comments (2)
Claude 3.5 Sonnet (New)'s AGI scenario
Nathan Young · 2025-02-17T18:47:04.669Z · comments (2)
[link] How Language Models Understand Nullability
Anish Tondwalkar (anish-tondwalkar) · 2025-03-11T15:57:28.686Z · comments (0)
An Introduction to Evidential Decision Theory
Babić · 2025-02-02T21:27:35.684Z · comments (2)