LessWrong 2.0 Reader

Absorbing Your Friends' Powers
Alice Blair (Diatom) · 2025-01-30T02:32:27.091Z · comments (1)

Metacompilation
Donald Hobson (donald-hobson) · 2025-02-24T22:58:00.085Z · comments (0)

Sleeping Beauty: an Accuracy-based Approach
glauberdebona · 2025-02-10T15:40:29.619Z · comments (2)

[link] The Dilemma’s Dilemma
James Stephen Brown (james-brown) · 2025-02-19T23:50:47.485Z · comments (8)

Post-hoc reasoning in chain of thought
Kyle Cox (klye) · 2025-02-05T18:58:29.802Z · comments (0)

Exploring how OthelloGPT computes its world model
JMaar (jim-maar) · 2025-02-02T21:29:09.433Z · comments (0)

Make Superintelligence Loving
Davey Morse (davey-morse) · 2025-02-21T06:07:17.235Z · comments (9)

One-dimensional vs multi-dimensional features in interpretability
charlieoneill (kingchucky211) · 2025-02-01T09:10:01.112Z · comments (0)

[question] Does human (mis)alignment pose a significant and imminent existential threat?
jr · 2025-02-23T10:03:40.269Z · answers+comments (3)

[question] p(s-risks to contemporary humans)?
mhampton · 2025-02-08T21:19:53.821Z · answers+comments (5)

Proposal for a Form of Conditional Supplemental Income (CSI) in a Post-Work World
sweenesm · 2025-01-31T01:00:55.064Z · comments (2)

[link] On AI Scaling
harsimony · 2025-02-05T20:24:56.977Z · comments (3)

[question] Should I Divest from AI?
OKlogic · 2025-02-10T03:29:33.582Z · answers+comments (4)

AIS Berlin, events, opportunities and the flipped gameboard - Fieldbuilders Newsletter, February 2025
gergogaspar (gergo-gaspar) · 2025-02-17T14:16:31.834Z · comments (0)

Build a Metaculus Forecasting Bot in 30 Minutes: A Practical Guide
ChristianWilliams · 2025-02-22T03:52:14.753Z · comments (0)

[question] Alignment Paradox and a Request for Harsh Criticism
Bridgett Kay (bridgett-kay) · 2025-02-05T18:17:22.701Z · answers+comments (7)

Beyond ELO: Rethinking Chess Skill as a Multidimensional Random Variable
Oliver Oswald (oliver-oswald) · 2025-02-10T19:19:36.233Z · comments (7)

[link] Hello World
Charlie Sanders (charlie-sanders) · 2025-01-30T15:33:57.427Z · comments (0)

[link] AI Safety at the Frontier: Paper Highlights, January '25
gasteigerjo · 2025-02-11T16:14:16.972Z · comments (0)

Fun, endless art debates v. morally charged art debates that are intrinsically endless
danielechlin · 2025-02-21T04:44:22.712Z · comments (2)

Utilitarian AI Alignment: Building a Moral Assistant with the Constitutional AI Method
Clément L · 2025-02-04T04:15:36.917Z · comments (0)

Intelligence Is Jagged
Adam Train (aetrain) · 2025-02-19T07:08:46.444Z · comments (1)

[question] Does the ChatGPT (web)app sometimes show actual o1 CoTs now?
Sohaib Imran (sohaib-imran) · 2025-01-29T17:27:08.067Z · answers+comments (6)

[link] Neural Scaling Laws Rooted in the Data Distribution
aribrill (Particleman) · 2025-02-20T21:22:10.306Z · comments (0)

Positive jailbreaks in LLMs
dereshev · 2025-01-29T08:41:44.680Z · comments (0)

If you wanted to actually reduce the trade deficit, how would you do it?
Logan Zoellner (logan-zoellner) · 2025-01-26T18:04:54.702Z · comments (5)

Closed-ended questions aren't as hard as you think
electroswing · 2025-02-19T03:53:11.855Z · comments (0)

What new x- or s-risk fieldbuilding organisations would you like to see? An EOI form. (FBB #3)
gergogaspar (gergo-gaspar) · 2025-02-17T12:39:09.196Z · comments (0)

[link] Narratives as catalysts of catastrophic trajectories
EQ · 2025-01-26T19:01:21.558Z · comments (0)

Bimodal AI Beliefs
Adam Train (aetrain) · 2025-02-14T06:45:53.933Z · comments (1)

There are a lot of upcoming retreats/conferences between March and July (2025)
gergogaspar (gergo-gaspar) · 2025-02-18T09:30:30.258Z · comments (0)

Do No Harm? Navigating and Nudging AI Moral Choices
Sinem (sinem-erisken) · 2025-02-06T19:18:31.065Z · comments (0)

Towards a Science of Evals for Sycophancy
andrejfsantos · 2025-02-01T21:17:15.406Z · comments (0)

Blackpool Applied Rationality Unconference 2025
Henry Prowbell · 2025-02-01T14:09:44.673Z · comments (0)

Retroactive If-Then Commitments
MichaelDickens · 2025-02-01T22:22:43.031Z · comments (0)

Empirical Insights into Feature Geometry in Sparse Autoencoders
Jason Boxi Zhang (jason-boxi-zhang) · 2025-01-24T19:02:19.167Z · comments (0)

Jevon's paradox and economic intuitions
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2025-01-27T23:04:23.854Z · comments (0)

Superintelligence Alignment Proposal
Davey Morse (davey-morse) · 2025-02-03T18:47:22.287Z · comments (3)

An Introduction to Evidential Decision Theory
Babić · 2025-02-02T21:27:35.684Z · comments (2)

[link] Tetherware #1: The case for humanlike AI with free will
Jáchym Fibír · 2025-01-30T10:58:11.717Z · comments (10)

Are current LLMs safe for psychotherapy?
PaperBike · 2025-02-12T19:16:34.452Z · comments (4)

[link] Medical Windfall Prizes
PeterMcCluskey · 2025-02-06T23:33:27.263Z · comments (1)

[question] Popular materials about environmental goals/agent foundations? People wanting to discuss such topics?
Q Home · 2025-01-22T03:30:38.066Z · answers+comments (0)

Safe Distillation With a Powerful Untrusted AI
Alek Westover (alek-westover) · 2025-02-20T03:14:04.893Z · comments (1)

[link] Request for Information for a new US AI Action Plan (OSTP RFI)
agucova · 2025-02-07T20:40:36.034Z · comments (0)

[link] Sparse Autoencoder Features for Classifications and Transferability
Shan23Chen (shan-chen) · 2025-02-18T22:14:12.994Z · comments (0)

[link] Pre-ASI: The case for an enlightened mind, capital, and AI literacy in maximizing the good life
Noahh (noah-jackson) · 2025-02-21T00:03:47.922Z · comments (5)

[link] Linguistic Imperialism in AI: Enforcing Human-Readable Chain-of-Thought
Lukas Petersson (lukas-petersson-1) · 2025-02-21T15:45:00.146Z · comments (0)

Understanding Agent Preferences
martinkunev · 2025-02-24T17:46:04.022Z · comments (0)

The Dead Cradle Theory: Why Earth May Not Survive Humanity's Expansion into Space
Nicholas Andresen (nicholas-andresen) · 2025-01-22T17:43:48.950Z · comments (0)

Recent comments

legionnaire on Have LLMs Generated Novel Insights?

It's hard to see what a novel insight is exactly. Any example can be argued against. Can you give an example of one? Or of one you've personally had?

Various LLMs can spot issues in code bases that are not public. Do all of these count?

yair-halberstadt on what an efficient market feels from inside

I think a classic example of an efficient market is one where goods are mostly fungible, e.g. the market for grain, or screws of a particular specification, or copper.

I imagine that inside those markets it feels a lot less like there are any good deals to sniff out. There are definitely bad ones, like fraudsters or subpar quality, or someone selling holy screws for 10 times the price, or someone just preying on newcomers to the market who aren't yet calibrated to the standard price, but these are fairly easy to filter out with a bit of due diligence.

seth-herd on Ebenezer Dukakis's Shortform

Agreed, tone and framing are crucial. The populist framing might work for conservatives, but it will also set off the enemy rhetoric detectors among liberals. So coding it to either side is prone to backfire. Based on that logic, I'm leaning toward thinking that it needs to be framed to carefully avoid or walk the center line between the terms and framings of both sides.

It would be just as bad to have it polarized as conservative, right? Although we've got four years of conservatism, so it might be worth thinking seriously about whether that trade might be worth it. I'm not sure a liberal administration would undo restrictions on AI even if they had been conservative-coded...

Interesting. I'm feeling more like saying "the elites want to make AI that will make them rich while putting half the world out of a job". That's probably true as far as it goes, and it could be useful.

george-ingebretsen on Whose track record of AI predictions would you like to see evaluated?

I would love to see an analysis and overview of predictions from the Dwarkesh podcast with Leopold. One for Situational Awareness would be great too.

seth-herd on Alignment can be the ‘clean energy’ of AI

I think it's working on one part of the problem, while other parts remain. If I were to be equally uncharitable, I'd say you seem to assume that if you can't solve everything all at once, you shouldn't say anything.

I don't actually think you assume that.

What I do think is that Instruction-following AGI is easier and more likely than value aligned AGI, and that's a route to solving goodharting and deception. It's complex and unfinished, like every other proposed approach to avoiding death by AGI. You might like more meticulous detail; if so, see Max Harms' admirably detailed corrigibility as singular target (CAST) sequence on a very similar alignment target and approach to solving goodharting and deception.

ebenezer-dukakis on Ebenezer Dukakis's Shortform

I think the way the issue is framed matters a lot. If it's a "populist" framing ("elites are in it for themselves, they can't be trusted"), that frame seems to have resonated with a segment of the right lately. Climate change has a sanctimonious frame in American politics that conservatives hate.

tlevin on tlevin's Shortform

Biggest disagreement between the average worldview of people I met with at EAG and my own is something like "cluster thinking vs sequence thinking," where people at EAG were often like "but even if we get this specific policy/technical win, doesn't it not matter unless you also have this other, harder thing?" and I was often more like, "Well, very possibly we won't get that other, harder thing, but still seems really useful to get that specific policy/technical win, here's a story where we totally fail on that first thing and the second thing turns out to matter a ton!"

kman on How to Make Superbabies

I'm sort of confused by the image you posted? Von Neumann existed, and there are plenty of very smart people well beyond the "Nerdy programmer" range.

But I think I agree with your overall point about IQ being under stabilizing selection in the ancestral environment. If there was directional selection, it would need to have been weak or inconsistent; otherwise I'd expect the genetic low-hanging fruit we see to have been exhausted already. Not in the sense of all current IQ-increasing alleles being selected to fixation, but in the sense of the tradeoffs becoming much more obvious than they appear to us currently. I can't tell what the tradeoffs even were: apparently IQ isn't associated with the average energy consumption of the brain? The limitation of birth canal width isn't a good explanation either, since IQ apparently also isn't associated with head size at birth (and adult brain size only explains ~10% of the variance in IQ).

kave on o3

My understanding when I last looked into it is that the efficient updating of the NNUE basically doesn't matter, and what really matters for its performance and CPU-runnability is its small size.

jeremy-gillen on Training AI to do alignment research we don’t already know how to do

these are also alignment failures we see in humans.

Many of them have close analogies in human behaviour. But you seem to be implying "and therefore those are non-issues"???

There are many humans (or groups of humans) that, if you set them on the task of solving alignment, will at some point decide to do something else. In fact, most groups of humans will probably fail like this.

How is this evidence in favour of your plan ultimately resulting in a solution to alignment???

but these systems empirically often move in reasonable and socially-beneficial directions over time

Is this the actual basis of your belief in your plan to ultimately get a difficult scientific problem solved?

and i expect we can make AI agents a lot more aligned than humans typically are

Ahh I see. Yeah this is crazy, why would you expect this? I think maybe you're confusing yourself by using the word "aligned" here, can we taboo it? Human reflective instability looks like: they realize they don't care about being a lawyer and go become a monk. Or they realize they don't want to be a monk and go become a hippy (this one's my dad). Or they have a mid-life crisis and do a bunch of stereotypical mid-life crisis things. Or they go crazy in more extreme ways.

We have a lot of experience with the space of human reflective instabilities. We're pretty familiar with the ways that humans interact with tribes and are influenced by them, and sometimes break with them.

But the space of reflective-goal-weirdness is much larger and stranger than we have (human) experience with. There are a lot of degrees of freedom in goal specification that we can't nail down easily through training. Also, AIs will be much newer, much more in progress, than humans are (not quite sure how to express this; another way to say it is to point to the quantity of robustness and normality training that evolution has subjected humans to).

Therefore I think it's extremely, wildly wrong to expect "we can make AI agents a lot more [reflectively goal stable with predictable goals and safe failure-modes] than humans typically are".

but, Claude sure as hell seems to

Why do you even consider this relevant evidence?