LessWrong 2.0 Reader


The Field of AI Alignment: A Postmortem, and What To Do About It
johnswentworth · 2024-12-26T18:48:07.614Z · comments (124)
[link] Review: Planecrash
L Rudolf L (LRudL) · 2024-12-27T14:18:33.611Z · comments (24)
Orienting to 3 year AGI timelines
Nikola Jurkovic (nikolaisalreadytaken) · 2024-12-22T01:15:11.401Z · comments (37)
[link] By default, capital will matter more than ever after AGI
L Rudolf L (LRudL) · 2024-12-28T17:52:58.358Z · comments (64)
Shallow review of technical AI safety, 2024
technicalities · 2024-12-29T12:01:14.724Z · comments (22)
What o3 Becomes by 2028
Vladimir_Nesov · 2024-12-22T12:37:20.929Z · comments (15)
Hire (or Become) a Thinking Assistant
Raemon · 2024-12-23T03:58:42.061Z · comments (38)
A Three-Layer Model of LLM Psychology
Jan_Kulveit · 2024-12-26T16:49:41.738Z · comments (3)
[question] What are the strongest arguments for very short timelines?
Kaj_Sotala · 2024-12-23T09:38:56.905Z · answers+comments (72)
A breakdown of AI capability levels focused on AI R&D labor acceleration
ryan_greenblatt · 2024-12-22T20:56:00.298Z · comments (5)
AIs Will Increasingly Fake Alignment
Zvi · 2024-12-24T13:00:07.770Z · comments (0)
Some arguments against a land value tax
Matthew Barnett (matthew-barnett) · 2024-12-29T15:17:00.740Z · comments (23)
Why I'm Moving from Mechanistic to Prosaic Interpretability
Daniel Tan (dtch1997) · 2024-12-30T06:35:43.417Z · comments (17)
When AI 10x's AI R&D, What Do We Do?
Logan Riggs (elriggs) · 2024-12-21T23:56:11.069Z · comments (14)
2025 Prediction Thread
habryka (habryka4) · 2024-12-30T01:50:14.216Z · comments (13)
Is "VNM-agent" one of several options, for what minds can grow up into?
AnnaSalamon · 2024-12-30T06:36:20.890Z · comments (33)
Checking in on Scott's composition image bet with imagen 3
Dave Orr (dave-orr) · 2024-12-22T19:04:17.495Z · comments (0)
AI #96: o3 But Not Yet For Thee
Zvi · 2024-12-26T20:30:06.722Z · comments (8)
Vegans need to eat just enough Meat - empirically evaluate the minimum amount of meat that maximizes utility
Johannes C. Mayer (johannes-c-mayer) · 2024-12-22T22:08:31.971Z · comments (34)
ReSolsticed vol I: "We're Not Going Quietly"
Raemon · 2024-12-26T17:52:33.727Z · comments (4)
[question] What Have Been Your Most Valuable Casual Conversations At Conferences?
johnswentworth · 2024-12-25T05:49:36.711Z · answers+comments (19)
[link] Began a pay-on-results coaching experiment, made $40,300 since July
Chipmonk · 2024-12-29T21:12:02.574Z · comments (11)
AI Assistants Should Have a Direct Line to Their Developers
Jan_Kulveit · 2024-12-28T17:01:58.643Z · comments (4)
[link] Learn to write well BEFORE you have something worth saying
eukaryote · 2024-12-29T23:42:31.906Z · comments (10)
[link] The Deep Lore of LightHaven, with Oliver Habryka (TBC episode 228)
Eneasz · 2024-12-24T22:45:50.065Z · comments (4)
What happens next?
Logan Zoellner (logan-zoellner) · 2024-12-29T01:41:33.685Z · comments (19)
[link] Review: Good Strategy, Bad Strategy
L Rudolf L (LRudL) · 2024-12-21T17:17:04.342Z · comments (0)
Considerations on orca intelligence
Towards_Keeperhood (Simon Skade) · 2024-12-29T14:35:16.445Z · comments (4)
[question] What are the most interesting / challenging evals (for humans) available?
Raemon · 2024-12-27T03:05:26.831Z · answers+comments (13)
o3, Oh My
Zvi · 2024-12-30T14:10:05.144Z · comments (10)
Greedy-Advantage-Aware RLHF
sej2020 · 2024-12-27T19:47:25.562Z · comments (13)
If all trade is voluntary, then what is "exploitation?"
Darmani · 2024-12-27T11:21:30.036Z · comments (42)
People aren't properly calibrated on FrontierMath
cakubilo · 2024-12-23T19:35:44.467Z · comments (4)
Acknowledging Background Information with P(Q|I)
JenniferRM · 2024-12-24T18:50:25.323Z · comments (8)
Corrigibility's Desirability is Timing-Sensitive
RobertM (T3t) · 2024-12-26T22:24:17.435Z · comments (4)
Book Summary: Zero to One
bilalchughtai (beelal) · 2024-12-29T16:13:52.922Z · comments (1)
Living with Rats in College
lsusr · 2024-12-25T10:44:13.085Z · comments (0)
Good Reasons for Alts
jefftk (jkaufman) · 2024-12-21T01:30:03.113Z · comments (2)
[link] AI as systems, not just models
Andy Arditi (andy-arditi) · 2024-12-21T23:19:05.507Z · comments (0)
[question] What is your personal totalizing and self-consistent worldview/philosophy?
lsusr · 2024-12-27T23:59:30.641Z · answers+comments (11)
[link] The Alignment Simulator
Yair Halberstadt (yair-halberstadt) · 2024-12-22T11:45:55.220Z · comments (3)
[link] Funding Case: AI Safety Camp 11
Remmelt (remmelt-ellen) · 2024-12-23T08:51:55.255Z · comments (0)
[link] Letter from an Alien Mind
Shoshannah Tekofsky (DarkSym) · 2024-12-27T13:20:49.277Z · comments (7)
The average rationalist IQ is about 122
Rockenots (Ekefa) · 2024-12-28T15:42:07.067Z · comments (20)
[link] PCR retrospective
bhauth · 2024-12-26T21:20:56.484Z · comments (0)
Elon Musk and Solar Futurism
transhumanist_atom_understander · 2024-12-21T02:55:28.554Z · comments (27)
[link] Human-AI Complementarity: A Goal for Amplified Oversight
rishubjain · 2024-12-24T09:57:55.111Z · comments (1)
Non-Obvious Benefits of Insurance
jefftk (jkaufman) · 2024-12-23T03:40:02.184Z · comments (5)
[link] It looks like there are some good funding opportunities in AI safety right now
Benjamin_Todd · 2024-12-22T12:41:02.151Z · comments (0)
Theoretical Alignment's Second Chance
lunatic_at_large · 2024-12-22T05:03:51.653Z · comments (0)