LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

[question] Why do many people who care about AI Safety not clearly endorse PauseAI?
humnrdble · 2025-03-30T18:06:32.426Z · answers+comments (41)

Tabula Bio: towards a future free of disease (& looking for collaborators)
mpoon (michael-poon) · 2025-03-23T16:30:15.523Z · comments (15)

ALLFED emergency appeal: Help us raise $800,000 to avoid cutting half of programs
denkenberger · 2025-04-16T21:47:40.687Z · comments (9)

Paper
dynomight · 2025-04-11T12:20:04.200Z · comments (12)

The first AI war will be in your computer
Viliam · 2025-04-08T09:28:53.191Z · comments (10)

AI #109: Google Fails Marketing Forever
Zvi · 2025-03-27T14:50:01.825Z · comments (12)

A Dissent on Honesty
eva_ · 2025-04-15T02:43:44.163Z · comments (52)

D&D.Sci Tax Day: Adventurers and Assessments
aphyer · 2025-04-15T23:43:14.733Z · comments (10)

Handling schemers if shutdown is not an option
Buck · 2025-04-18T14:39:18.609Z · comments (0)

[link] Sentinel's Global Risks Weekly Roundup #15/2025: Tariff yoyo, OpenAI slashing safety testing, Iran nuclear programme negotiations, 1K H5N1 confirmed herd infections.
NunoSempere (Radamantis) · 2025-04-14T19:11:20.977Z · comments (0)

Follow me on TikTok
lsusr · 2025-04-01T08:22:29.521Z · comments (8)

[link] Automated Researchers Can Subtly Sandbag
gasteigerjo · 2025-03-26T19:13:26.879Z · comments (0)

An overview of control measures
ryan_greenblatt · 2025-03-24T23:16:49.400Z · comments (0)

Analyzing long agent transcripts (Docent)
jsteinhardt · 2025-03-24T20:49:54.472Z · comments (2)

[link] The case for AGI by 2030
Benjamin_Todd · 2025-04-09T20:35:55.167Z · comments (6)

[link] Map of all 40 copyright suits v. AI in U.S.
Remmelt (remmelt-ellen) · 2025-03-26T07:57:58.976Z · comments (3)

[link] The US Executive vs Supreme Court Deportations Clash
NunoSempere (Radamantis) · 2025-04-21T19:56:03.711Z · comments (0)

Crime and Punishment #1
Zvi · 2025-04-21T15:30:06.420Z · comments (4)

We need (a lot) more rogue agent honeypots
Ozyrus · 2025-03-23T22:24:52.785Z · comments (12)

[link] Existing Safety Frameworks Imply Unreasonable Confidence
Joe Rogero · 2025-04-10T16:31:50.240Z · comments (2)

Scaffolding Skills
Screwtape · 2025-04-18T17:39:25.634Z · comments (7)

Meditation and Reduced Sleep Need
niplav · 2025-04-04T14:42:54.792Z · comments (8)

o3 Will Use Its Tools For You
Zvi · 2025-04-18T21:20:02.566Z · comments (3)

The Rise of Hyperpalatability
Jack (jack-3) · 2025-04-02T20:18:04.407Z · comments (10)

[link] Forecasting time to automated superhuman coders [AI 2027 Timelines Forecast]
elifland · 2025-04-10T23:10:23.063Z · comments (0)

Call for Collaboration: Renormalization for AI safety
Lauren Greenspan (LaurenGreenspan) · 2025-03-31T21:01:56.500Z · comments (0)

Can SAE steering reveal sandbagging?
jordine · 2025-04-15T12:33:41.264Z · comments (3)

[link] Well-foundedness as an organizing principle of healthy minds and societies
Richard_Ngo (ricraz) · 2025-04-07T00:31:34.098Z · comments (6)

Austin Chen on Winning, Risk-Taking, and FTX
Elizabeth (pktechgirl) · 2025-04-07T19:00:08.039Z · comments (3)

Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols?
Alex Mallen (alex-mallen) · 2025-03-24T17:55:59.358Z · comments (0)

More Fun With GPT-4o Image Generation
Zvi · 2025-04-03T02:10:02.317Z · comments (3)

Avoid the Counterargument Collapse
marknm · 2025-03-26T03:19:58.655Z · comments (3)

OpenAI rewrote its Preparedness Framework
Zach Stein-Perlman · 2025-04-15T20:00:50.614Z · comments (1)

Is instrumental convergence a thing for virtue-driven agents?
mattmacdermott · 2025-04-02T03:59:20.064Z · comments (37)

Why do misalignment risks increase as AIs get more capable?
ryan_greenblatt · 2025-04-11T03:06:50.928Z · comments (6)

[link] Center on Long-Term Risk: Summer Research Fellowship 2025 - Apply Now
Tristan Cook · 2025-03-26T17:29:14.797Z · comments (0)

FLAKE-Bench: Outsourcing Awkwardness in the Age of AI
annas (annasoli) · 2025-04-01T17:08:25.092Z · comments (0)

How much progress actually happens in theoretical physics?
ChristianKl · 2025-04-04T23:08:00.633Z · comments (32)

Goodhart Typology via Structure, Function, and Randomness Distributions
JustinShovelain · 2025-03-25T16:01:08.327Z · comments (0)

More on Various AI Action Plans
Zvi · 2025-03-24T13:10:05.637Z · comments (0)

What Uniparental Disomy Tells Us About Improper Imprinting in Humans
Morpheus · 2025-03-28T11:24:47.133Z · comments (1)

When the Wannabe Rambo Comedian Cried
P. João (gabriel-brito) · 2025-03-31T14:47:50.660Z · comments (0)

On the Implications of Recent Results on Latent Reasoning in LLMs
Rauno Arike (rauno-arike) · 2025-03-31T11:06:23.939Z · comments (6)

An overview of areas of control work
ryan_greenblatt · 2025-03-25T22:02:16.178Z · comments (0)

Most Questionable Details in 'AI 2027'
scarcegreengrass · 2025-04-05T00:32:54.896Z · comments (4)

[link] How prediction markets can create harmful outcomes: a case study
B Jacobs (Bob Jacobs) · 2025-04-02T15:37:09.285Z · comments (2)

Non-Monotonic Infra-Bayesian Physicalism
Marcus Ogren · 2025-04-02T12:14:19.783Z · comments (0)

Llama Does Not Look Good 4 Anything
Zvi · 2025-04-09T13:20:01.799Z · comments (1)

[link] The 4-Minute Mile Effect
Parker Conley (parker-conley) · 2025-04-14T21:41:27.726Z · comments (6)

Meetups Notes (Q1 2025)
jenn (pixx) · 2025-03-31T01:12:11.774Z · comments (2)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

sustrik on Accountability Sinks

It’s always worth asking, do we own the process or does the process own us?

That's a nice shortcut to explain the distinction between "a process imposed upon yourself" vs. "a process handed to you from above".

pablo_stafforini on Pablo's Shortform

I think there is a vast difference between Gerard and Kruel, not just in the damage each has caused but also in their intellectual honesty and responsiveness to argument (null in the case of Gerard, decent in the case of Kruel, at least from my recollection).

samuelshadrach on Davidmanheim's Shortform

Why does this matter? To quote a Yudkowsky-ish example, maybe you can take a 16-th century human (before Newtonian physics was invented, after guns were invented) and explain to him how a nuclear bomb works. This doesn't matter for predicting the outcome of a hypothetical war between 16th century Britain and 21st century USA.

ASI inventions can be big surprises and yet be things that you could understand if someone taught you.

We could probably understand how a von Neumann probe or an anti-aging cure worked too, if someone taught us.

mo-putera on Accountability Sinks

I think this essay is going to be one I frequently recommend to others over the coming years, thanks for writing it.

But in the end, deep in the heart of any bureaucracy, the process is about responsibility and the ways to avoid it. It's not an efficiency measure, it’s an accountability management technique.

This vaguely reminded me of what Ivan Vendrov wrote in Metrics, Cowardice, and Mistrust. Ivan began by noting that "companies optimizing for simple engagement metrics aren’t even being economically rational... so why don't they?" It's not because "these more complex things are hard to measure", if you think about it. His answer is cowardice and mistrust, which lead to the selection of metrics "robust to an adversary trying to fool you":

But the other reason we use metrics, sadly much more common, is due to cowardice (sorry, risk-aversion) and mistrust.
Cowardice because nobody wants to be responsible for making a decision. Actually trying to understand the impact of a new feature on your users and then making a call is an inherently subjective process that involves judgment, i.e. responsibility, i.e. you could be fired if you fuck it up. Whereas if you just pick whichever side of the A/B test has higher metrics, you’ve successfully outsourced your agency to an impartial process, so you’re safe.
Mistrust because not only does nobody want to make the decision themselves, nobody even wants to delegate it! Delegating the decision to a specific person also involves a judgment call about that person. If they make a bad decision, that reflects badly on you for trusting them! So instead you insist that “our company makes data driven decisions” which is a euphemism for “TRUST NO ONE”. This works all the way up the hierarchy - the CEO doesn’t trust the Head of Product, the board members don’t trust the CEO, everyone insists on seeing metrics and so metrics rule.
Coming back to our original question: why can’t we have good metrics that at least try to capture the complexity of what users want? Again, cowardice and mistrust. There’s a vast space of possible metrics, and choosing any specific one is a matter of judgment. But we don’t trust ourselves or anyone else enough to make that judgment call, so we stick with the simple dumb metrics.
This isn’t always a bad thing! Police departments are often evaluated by their homicide clearance rate because murders are loud and obvious and their numbers are very hard to fudge. If we instead evaluated them by some complicated CRIME+ index that a committee came up with, I’d expect worse outcomes across the board.
Nobody thinks “number of murders cleared” is the best metric of police performance, any more than DAUs are the best metric of product quality, or GDP is the best metric of human well-being. However they are the best in the sense of being hardest to fudge, i.e. robust to an adversary trying to fool you. As trust declines, we end up leaning more and more on these adversarially robust metrics, and we end up in a gray Brezhnev world where the numbers are going up, everyone knows something is wrong, but the problems get harder and harder to articulate.

His preferred solution to counteracting this tendency to use adversarially robust but terrible metrics is to develop an ideology to promote mission alignment:

A popular attempt at a solution is monarchy. ... The big problem with monarchy is that it doesn’t scale. ...
A more decentralized and scalable solution is developing an ideology: a self-reinforcing set of ideas nominally held by everyone in your organization. Having a shared ideology increases trust, and ideologues are able to make decisions against the grain of natural human incentives. This is why companies talk so much about “mission alignment”, though very few organizations can actually pull off having an ideology: when real sacrifices need to be made, either your employees or your investors will rebel.

While his terminology feels somewhat loaded, I thought it natural to interpret all your examples of people breaking rules to get the thing done in mission alignment terms.

Another way to promote mission alignment is some combination of skilful message compression and resistance to proxies, which Eugene Wei wrote about in Compress to impress about Jeff Bezos (monarchy in Ivan's framing above). On the latter, quoting Bezos:

As companies get larger and more complex, there’s a tendency to manage to proxies. This comes in many shapes and sizes, and it’s dangerous, subtle, and very Day 2.

A common example is process as proxy. Good process serves you so you can serve customers. But if you’re not watchful, the process can become the thing. This can happen very easily in large organizations. The process becomes the proxy for the result you want. You stop looking at outcomes and just make sure you’re doing the process right. Gulp. It’s not that rare to hear a junior leader defend a bad outcome with something like, “Well, we followed the process.” A more experienced leader will use it as an opportunity to investigate and improve the process. The process is not the thing. It’s always worth asking, do we own the process or does the process own us? In a Day 2 company, you might find it’s the second.

I wonder how all this is going to look like in a (soonish?) future where most of the consequential decision-making has been handed over to AIs.

shawnghu on Accountability Sinks

Well, a 5 percent rate of being sent to a concentration camp or combat unit isn't exactly negligible, and a further 17 percent were threatened. So maybe it's correct that these effects are "surprisingly mild", but these stats are more justification for the "just following orders" explanation than I'd have imagined from the main text.

rasool on Eulogy to the Obits

It looks like this is a linkpost to:

https://press.asimov.com/articles/obit

yams on jacquesthibs's Shortform

Rather than make things worse as a means of compelling others to make things better, I would rather just make things better.

Brinksmanship and accelerationism (in the Marxist sense) are high variance strategies ill-suited to the stakes of this particular game.

[one way this makes things worse is stimulating additional investment on the frontier; another is attracting public attention to the wrong problem, which will mostly just generate action on solutions to that problem, and not to the problem we care most about. Importantly, the contingent of people-mostly-worried-about-jobs are not yet our allies, and it’s likely their regulatory priorities would not address our concerns, even though I share in some of those concerns.]

samuelshadrach on xpostah's Shortform

Suppose you are trying to figure out a function f(x,y,z | a,b,c) where x, y ,z are all scalar values and a, b, c are all constants.

If you knew a few zeroes of this function, you could figure out good approximations of this function. Let's say you knew

U(x,y, a=0) = x
U(x,y, a=1) = x
U(x,y, a=2) = y
U(x,y, a=3) = y

You could now guess U(x,y) = x if a<1.5, y if a>1.5

You will not be able to get a good approximation if you did not know enough zeroes.

This is a comment about morality. x, y, z are agent's multiple possibly-conflicting values and a, b, c are info about environment of agent. You lack data about how your own mind will react to hypothetical situations you have not faced. At best you can extrapolate from historical data around minds of other people that are different from yours. Bigger and more trustworthy dataset will help solve this.

tobias-h on Accountability Sinks

Quite an interesting paper you linked:

Conventional wisdom during World War II among German soldiers,
members of the SS and SD as well as police personnel, held that any order given
by a superior officer must be obeyed under any circumstances. Failure to carry
out such an order would result in a threat to life and limb or possibly serious
danger to loved ones. Many students of Nazi history have this same view, even
to this day.
Could a German refuse to participate in the round up and murder of
Jews, gypsies, suspected partisans,"commissars"and Soviet POWs - unarmed
groups of men, women, and children - and survive without getting himself shot
or put into a concentration camp or placing his loved ones in jeopardy?
We may never learn the full answer to this, the ultimate question for
all those placed in such a quandry, because we lack adequate documentation
in many cases to determine the full circumstances and consequences of such a
hazardous risk. There are, however, over 100 cases of individuals whose moral
scruples were weighed in the balance and not found wanting. These individuals
made the choice to refuse participation in the shooting of unarmed civilians or
POWs and none of them paid the ultimate penalty, death! Furthermore,very few
suffered any other serious consequence!

Table of the consequences they faced:

mondsemmel on Charbel-Raphaël's Shortform

Altman has already signed the CAIS Statement on AI Risk ("Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."), but OpenAI's actions almost exclusively exacerbate extinction risk, and nowadays Altman and OpenAI even downplay the very existence of this risk.