LessWrong 2.0 Reader

The Field of AI Alignment: A Postmortem, and What To Do About It
johnswentworth · 2024-12-26T18:48:07.614Z · comments (88)

[link] Review: Planecrash
L Rudolf L (LRudL) · 2024-12-27T14:18:33.611Z · comments (14)

A Three-Layer Model of LLM Psychology
Jan_Kulveit · 2024-12-26T16:49:41.738Z · comments (3)

AI #96: o3 But Not Yet For Thee
Zvi · 2024-12-26T20:30:06.722Z · comments (8)

ReSolsticed vol I: "We're Not Going Quietly"
Raemon · 2024-12-26T17:52:33.727Z · comments (4)

[link] By default, capital will matter more than ever after AGI
L Rudolf L (LRudL) · 2024-12-28T17:52:58.358Z · comments (10)

[question] What are the most interesting / challenging evals (for humans) available?
Raemon · 2024-12-27T03:05:26.831Z · answers+comments (11)

Greedy-Advantage-Aware RLHF
sej2020 · 2024-12-27T19:47:25.562Z · comments (4)

AI Assistants Should Have a Direct Line to Their Developers
Jan_Kulveit · 2024-12-28T17:01:58.643Z · comments (4)

[question] What is your personal totalizing and self-consistent worldview/philosophy?
lsusr · 2024-12-27T23:59:30.641Z · answers+comments (7)

Corrigibility's Desirability is Timing-Sensitive
RobertM (T3t) · 2024-12-26T22:24:17.435Z · comments (4)

The average rationalist IQ is about 122
Rockenots (Ekefa) · 2024-12-28T15:42:07.067Z · comments (6)

[link] PCR retrospective
bhauth · 2024-12-26T21:20:56.484Z · comments (0)

[link] Letter from an Alien Mind
Shoshannah Tekofsky (DarkSym) · 2024-12-27T13:20:49.277Z · comments (6)

Whistleblowing Twitter Bot
Mckiev · 2024-12-26T04:09:45.493Z · comments (5)

If all trade is voluntary, then what is "exploitation?"
Darmani · 2024-12-27T11:21:30.036Z · comments (35)

[link] Why OpenAI’s Structure Must Evolve To Advance Our Mission
stuhlmueller · 2024-12-28T04:24:19.937Z · comments (1)

Coin Flip
XelaP (scroogemcduck1) · 2024-12-27T11:53:01.781Z · comments (0)

[link] Progress links and short notes, 2024-12-27: Clinical trial abundance, grid-scale fusion, permitting vs. compliance, crossword mania, and more
jasoncrawford · 2024-12-27T23:34:43.807Z · comments (0)

[link] From the Archives: a story
Richard_Ngo (ricraz) · 2024-12-27T16:36:50.735Z · comments (1)

[question] What would be the IQ and other benchmarks of o3 that uses $1 million worth of compute resources to answer one question?
avturchin · 2024-12-26T11:08:23.545Z · answers+comments (2)

[question] What's the best metric for measuring quality of life?
ChristianKl · 2024-12-27T14:29:30.813Z · answers+comments (4)

[link] Deconstructing arguments against AI art
DMMF · 2024-12-27T19:40:13.015Z · comments (1)

[question] Why don't we currently have AI agents?
ChristianKl · 2024-12-26T15:26:35.682Z · answers+comments (8)

[link] Streamlining my voice note process
Vlad Sitalo (harcisis) · 2024-12-26T06:04:01.990Z · comments (1)

No, the Polymarket price does not mean we can immediately conclude what the probability of a bird flu pandemic is. We also need to know the interest rate!
Christopher King (christopher-king) · 2024-12-28T16:05:47.037Z · comments (5)

Super human AI is a very low hanging fruit!
Hzn · 2024-12-26T19:00:22.822Z · comments (0)

Good Fortune and Many Worlds
Jonah Wilberg (jrwilb@googlemail.com) · 2024-12-27T13:21:43.142Z · comments (0)

The Robot, the Puppet-master, and the Psychohistorian
WillPetillo · 2024-12-28T00:12:08.824Z · comments (0)

[question] Has Someone Checked The Cold-Water-In-Left-Ear Thing?
Maloew (maloew-valenar) · 2024-12-28T20:15:35.951Z · answers+comments (0)

[question] Are Sparse Autoencoders a good idea for AI control?
Gerard Boxo (gerard-boxo) · 2024-12-26T17:34:55.617Z · answers+comments (2)

[link] The Economics & Practicality of Starting Mars Colonization
Zero Contradictions · 2024-12-26T10:56:26.019Z · comments (1)

Duplicate token neurons in the first layer of gpt2-small
Alex Gibson · 2024-12-27T04:21:55.896Z · comments (0)

Algorithmic Asubjective Anthropics, Cartesian Subjective Anthropics
Lorec · 2024-12-27T01:58:39.880Z · comments (0)

The Engineering Argument Fallacy: Why Technological Success Doesn't Validate Physics
Wenitte Apiou (wenitte-apiou) · 2024-12-28T00:49:53.300Z · comments (3)

[link] Human, All Too Human - Superintelligence requires learning things we can’t teach
Ben Turtel (ben-turtel) · 2024-12-26T16:26:27.328Z · comments (4)

Terminal goal vs Intelligence
Donatas Lučiūnas (donatas-luciunas) · 2024-12-26T08:10:42.144Z · comments (22)

Recent comments

sharmake-farah on By default, capital will matter more than ever after AGI

So to summarise: if we have a multipolar world, and the vulnerable world hypothesis is true, then conflict can be existentially bad and this is a reason to avoid a multipolar world. Didn't consider this, interesting point!

This applies even in a non-vulnerable world, though more weakly, because in an AGI world the incentives for peaceful cooperation between different values are much weaker.

Considerations:

offense/defense balance (if offense wins very hard, it's harder to let everyone do their own thing)

tunability-of-AGI-power / implementability of the harm principle (if you can give everyone AGI that reliably follows the rule "don't let these people harm other people", then you can give that AGI safely to everyone and they can build planets however they like but not death ray anyone else's planets)

I do think this requires severely restraining open-source, but conditional on that happening, I think the offense-defense balance/tunability will sort of work out.

Some of my general worries with singleton worlds are:

humanity has all its eggs in one basket—you better hope the governance structure is never corrupted, or never becomes sclerotic; real-life institutions so far have not given me many signs of hope on this count

cultural evolution is a pretty big part of how human societies seem to have improved and relies on a population of cultures / polities

vague instincts towards diversity being good and less fragile than homogeneity or centralisation

Yeah, I'm not a fan of singleton worlds, and tend towards multipolar worlds. It's just that a multipolar world might involve a large loss of life in the power struggles around AGI.

On governing the commons, I'd say Elinor Ostrom's observations are derivable from the folk theorems of game theory, which basically say that if the game is repeated and players have to keep dealing with each other, then any feasible outcome that gives each player at least what they could guarantee themselves can be sustained as a Nash equilibrium (with a few further conditions that depend on the specific theorem).

The problem is that AGI weakens the incentives for players to deal with each other, so Elinor Ostrom's solutions are much less effective.

More here:

https://en.wikipedia.org/wiki/Folk_theorem_(game_theory)
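The folk-theorem argument in this comment can be illustrated with the textbook two-player case. The Python sketch below is an illustration under stated assumptions, not something taken from the comment: it uses grim-trigger strategies in an infinitely repeated prisoner's dilemma with hypothetical payoffs T > R > P (temptation, reward, punishment) and a discount factor `delta` standing in for how much players value future dealings with each other. Mutual cooperation is sustainable exactly when delta >= (T - R) / (T - P). Reading "AGI weakens the incentives for players to deal with each other" as a lower `delta` is itself an assumption.

```python
from fractions import Fraction


def grim_trigger_threshold(T, R, P):
    """Smallest discount factor at which cooperating forever is at least as
    good as defecting once and then being punished forever (grim trigger)."""
    if not (T > R > P):
        raise ValueError("expected payoffs with T > R > P")
    # Solve R / (1 - d) >= T + d * P / (1 - d) for d.
    return Fraction(T - R, T - P)


def cooperation_sustainable(T, R, P, delta):
    """Compare the two discounted payoff streams directly."""
    if not (0 <= delta < 1):
        raise ValueError("delta must lie in [0, 1)")
    # Cooperate forever: R every round.
    cooperate = Fraction(R) / (1 - delta)
    # Defect once for T, then receive P forever after.
    defect = T + delta * Fraction(P) / (1 - delta)
    return cooperate >= defect


# Hypothetical payoffs, chosen only for illustration.
print(grim_trigger_threshold(5, 3, 1))                    # prints 1/2
print(cooperation_sustainable(5, 3, 1, Fraction(9, 10)))  # prints True
print(cooperation_sustainable(5, 3, 1, Fraction(1, 5)))   # prints False
```

With these payoffs, at delta = 9/10 cooperating is worth 30 against 14 for defecting, while at delta = 1/5 it is worth 3.75 against 5.25. The same players stop cooperating once the future matters less to them, which is the mechanism the comment appeals to. Exact `Fraction` arithmetic is used so the boundary case delta = 1/2 compares as equal rather than being decided by floating-point rounding.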

habryka4 on The Field of AI Alignment: A Postmortem, and What To Do About It

For example, agent foundations research sometimes assumes that AGI has infinite compute or that it has a strict boundary between its internal decision processes and the outside world.

It's one of the most standard results in ML that neural nets are universal function approximators. In the context of that proof, ML de facto also assumes that you have infinite computing power. It's just a standard tool in ML, AI or CS to see what models predict when you take them to infinity. Indeed, it's really one of the most standard tools in the modern math toolbox, used by every STEM discipline I can think of.

Similarly, a strict boundary between a system's internal decision processes and the outside world continues to be a standard assumption in ML. It's really hard to avoid, everything gets very loopy and tricky, and yes, we have to deal with that loopiness and trickiness, but if anything, agent foundations people were the actual people trying to figure out how to handle that loopiness and trickiness, whereas the ML community really has done very little to handle it. Contrary to your statement here, people on LW have for years been pointing out how important embedded agency is, and have been dismissed by active practitioners because they think the cartesian boundary here is just fine for "real" and "grounded" applications like "predicting the next token", which clearly don't have relevance to these weird and crazy scenarios about power-seeking AIs developing contextual awareness that you are talking about.

gabor-fuisz on By default, capital will matter more than ever after AGI

(indeed the politics of our era is moving towards greater acceptance of inequality)

How certain are you of this, and how much do you think it comes down to something like "to what extent can disempowered groups unionise against the elite?"

To be clear, by default I think AI will make unionising against the more powerful harder, but it might depend on the governance structure. Maybe if we are really careful, we can get something closer to "Direct Democracy", where individual preferences actually matter more!

gabor-fuisz on By default, capital will matter more than ever after AGI

[sorry, have only skimmed the post, but I feel compelled to comment.]

I feel like unless we make a lot of progress on some sort of "Science of Generalisation of Preferences", then for more abstract preferences (non-biological needs mostly fall into this category), even if certain individuals have, on paper, much more power than others, at the end of the day they will likely rely on vastly superintelligent AI advisors to realise those preferences, and at that point I think it is the AI advisor that is _really_ in control.

I'm not super certain of this. The Catholic Church definitely could decide to build a bunch of churches on some planets (though what counts as a church, in the limit?), but if they also want more complicated things like "people" "worshipping" "God" in those churches, it seems to be more and more up to the interpretation of the AI assistants building those worship-maximising communes.

interstice on No, the Polymarket price does not mean we can immediately conclude what the probability of a bird flu pandemic is. We also need to know the interest rate!

VC money. That disclaimer was misleading; they don't have fees on any markets.

habryka4 on The Field of AI Alignment: A Postmortem, and What To Do About It

I... am not very impressed by these predictions.

First, I don't think these are controversial predictions on LW (yes, a few people might disagree with him, but there is little boldness or disagreement with widely held beliefs in here), but most importantly, these predictions aren't about anything I care about. I don't care whether the world-model will have a single unambiguous self-versus-world boundary, I care whether the system is likely to convert the solar system into some form of computronium, or launch Dyson probes, or eliminate all potential threats and enemies, or whether the system will try to subvert attempts at controlling it, or whether it will try to amass large amounts of resources to achieve its aims, or be capable of causing large controlled effects via small information channels, or is capable of discovering new technologies with great offensive power.

The only bold prediction here is maybe "the behavior of the ASI will be a collection of heuristics", and indeed I would take a bet against this. Systems under reflection and extensive self-improvement stop being well-described by contextual heuristics, and it's likely ASI will both self-reflect and self-improve (as we are trying really hard to cause both to happen). Indeed, I already wouldn't particularly describe Claude as a collection of contextual heuristics; there is really quite a lot of consistent personality in there (which of course you can break with jailbreaks and stuff, but clearly the system is a lot less contextual than base models, and it seems like you are predicting a reversal of that trend?).

nate-showell on The Field of AI Alignment: A Postmortem, and What To Do About It

Some concrete predictions:

The behavior of the ASI will be a collection of heuristics that are activated in different contexts.
The ASI's software will not have any component that can be singled out as the utility function, although it may have a component that sets a reinforcement schedule.
The ASI will not wirehead.
The ASI's world-model won't have a single unambiguous self-versus-world boundary. The situational awareness of the ASI will have more in common with that of an advanced meditator than it does with that of an idealized game-theoretic agent.

l-rudolf-l on By default, capital will matter more than ever after AGI

Important other types of capital, as the term is used here, include:

the physical nuclear power plants
the physical nuts and bolts
data centres
military robots

Capital is not just money!

Why would an AI want to transfer resources to someone just because they have some fiat currency?

Because humans and other AIs will accept fiat currency as an input and give you valuable things as an output.

Surely they have some better way of coordinating exchanges.

All the infra for fiat currency exists; I don't see why the AIs would need to reinvent that, unless they're hiding from human government oversight or breaking some capacity constraint in the financial system, in which case they can just use crypto instead.

It's possible that it instead ends up belonging to whatever AI has the best military robots.

Military robots are yet another type of capital! Note that if it were human soldiers, there would be much more human leverage in the situation, because at least some humans would need to agree to do the soldiering, and presumably would get benefits for doing so, and would use the power and leverage they accrue from doing so to push broadly human goals.

The recruitment company is toast.

Or the recruitment company pivots to using human labour to improve AI, as actually happened with the hottest recent recruiting company! If AI is the best investment, then humans and AIs alike will spend their efforts on AI, and the economy will gradually cater more and more to AI needs over human needs. See Andrew Critch's post here, for example. Or my story here.

johannes-c-mayer on The Field of AI Alignment: A Postmortem, and What To Do About It

What are some concrete examples of the research that MIRI insufficiently engaged with? Are there general categories of prior research that you think are most underutilized by alignment researchers?

nick_tarleton on shortplav

I don't think it much affects the point you're making, but the way this is phrased conflates 'valuing doing X oneself' and 'valuing that X exist'.