LessWrong 2.0 Reader

Linkpost to a Summary of "Imagining and building wise machines: The centrality of AI metacognition" by Johnson, Karimi, Bengio, et al.
Chris_Leong · 2025-04-10T11:54:37.484Z · comments (0)

Asunción – ACX Meetups Everywhere Spring 2025
NunoSempere (Radamantis) · 2025-03-25T23:49:06.392Z · comments (0)

Breaking down the MEAT of Alignment
JasonBrown · 2025-04-07T08:47:22.080Z · comments (2)

0 Motivation Mapping through Information Theory
P. João (gabriel-brito) · 2025-04-18T00:53:34.360Z · comments (0)

Populectomy.ai
YonatanK (jonathan-kallay) · 2025-03-24T22:06:24.680Z · comments (2)

Melbourne – ACX Meetups Everywhere Spring 2025
ShardPhoenix · 2025-03-25T23:49:20.139Z · comments (0)

[link] EA Reflections on my Military Career
TomGardiner (HorusXVI) · 2025-04-10T19:01:42.844Z · comments (0)

AI Control Methods Literature Review
Ram Potham (ram-potham) · 2025-04-18T21:15:34.682Z · comments (1)

The case for creating unaligned superintelligence
Yair Halberstadt (yair-halberstadt) · 2025-04-02T06:47:41.934Z · comments (0)

Mass Exposure Paradox
max-sixty · 2025-04-16T20:18:00.492Z · comments (0)

An Optimistic 2027 Timeline
Yitz (yitz) · 2025-04-06T16:39:36.554Z · comments (13)

I Have No Mouth but I Must Speak
Jack (jack-3) · 2025-04-05T07:42:54.424Z · comments (8)

[link] Knowledge, Reasoning, and Superintelligence
owencb · 2025-03-26T23:28:11.465Z · comments (0)

Commitment Races are a technical problem ASI can easily solve
Knight Lee (Max Lee) · 2025-04-12T22:22:47.790Z · comments (6)

[link] Epoch AI is hiring a CTO!
merilalama · 2025-04-02T20:29:29.362Z · comments (0)

Family-line selection optimizer
lemonhope (lcmgcd) · 2025-04-22T07:16:43.774Z · comments (0)

Tied Crosscoders: Explaining Chat Behavior from Base Model
Santiago Aranguri (aranguri) · 2025-03-22T18:07:21.751Z · comments (0)

Halifax – ACX Meetups Everywhere Spring 2025
interstice · 2025-03-25T23:48:47.910Z · comments (0)

Bordeaux – ACX Meetups Everywhere Spring 2025
vi21maobk9vp · 2025-03-25T23:49:31.853Z · comments (1)

On the plausibility of a “messy” rogue AI committing human-like evil
Jacob Griffith (Jacob.Griffith) · 2025-03-27T18:06:45.505Z · comments (0)

Singularity Survival Guide: A Bayesian Guide for Navigating the Pre-Singularity Period
mbrooks · 2025-03-28T23:21:39.191Z · comments (4)

Enumerating objects a model "knows" using entity-detection features.
Alex Gibson · 2025-03-30T16:58:01.957Z · comments (2)

[link] Large Language Models Pass the Turing Test
Matrice Jacobine · 2025-04-02T05:41:54.613Z · comments (0)

An idea for avoiding neuralese architectures
Knight Lee (Max Lee) · 2025-04-03T22:23:21.653Z · comments (2)

For Policy’s Sake: Why We Must Distinguish AI Safety from AI Security in Regulatory Governance
Katalina Hernandez (katalina-hernandez) · 2025-04-04T09:16:20.712Z · comments (11)

Join Vitalist Bay: An 8-Week Longevity and Radical Life Extension Event Series in Berkeley (April-May 2025)
Vitaran (vitalismops) · 2025-04-04T21:03:06.508Z · comments (0)

Arguing all sides with ChatGPT 4.5
Richard_Kennaway · 2025-04-07T13:10:11.562Z · comments (0)

You Are Not a Thought Experiment
Jacob Falkovich (Jacobian) · 2025-04-07T15:27:42.956Z · comments (0)

The Three Boxes: A Simple Model for Spreading Ideas
JohnGreer · 2025-04-10T17:15:55.163Z · comments (0)

[link] Nihilism Is Not Enough By Peter Thiel
shawkisukkar · 2025-04-15T00:13:01.375Z · comments (4)

$500 bounty for best short-form fiction about our near future world; $100 for recommending winning piece: new “Art of Near Future World” quarterly art project
Ramon Gonzalez (ramon-gonzalez) · 2025-04-15T00:46:10.637Z · comments (0)

[link] AISN #51: AI Frontiers
Corin Katzke (corin-katzke) · 2025-04-15T16:01:56.701Z · comments (1)

Some OthelloGPT Circuits
Alfred Wong (alfred-wong) · 2025-04-15T18:41:36.216Z · comments (0)

How Logic "Really" Works: An Engineering Perspective
Daniil Strizhov (mila-dolontaeva) · 2025-04-16T05:34:09.443Z · comments (0)

AI, Alignment & the Art of Relationship Design
Priyanka Bharadwaj (priyanka-bharadwaj) · 2025-04-19T00:47:02.591Z · comments (3)

March-April 2025 Progress in Guaranteed Safe AI
Quinn (quinn-dougherty) · 2025-04-20T19:00:07.879Z · comments (0)

[link] Q2 AI Forecasting Benchmark: $30,000 in Prizes
ChristianWilliams · 2025-04-21T17:29:55.461Z · comments (2)

Zürich – ACX Meetups Everywhere Spring 2025
Vitor · 2025-03-25T23:49:32.304Z · comments (0)

Gamify life from BayesianMind
P. João (gabriel-brito) · 2025-04-16T16:17:49.284Z · comments (2)

Grass Valley – ACX Meetups Everywhere Spring 2025
Raelifin · 2025-03-25T23:49:10.508Z · comments (0)

[link] What can we learn from expert AGI forecasts?
Benjamin_Todd · 2025-04-09T21:34:53.148Z · comments (0)

[link] What are the differences between a singularity, an intelligence explosion, and a hard takeoff?
Vishakha (vishakha-agrawal) · 2025-04-03T10:37:25.857Z · comments (0)

How to Defend the Indefensible
Alex Beyman (alexbeyman) · 2025-04-15T07:45:15.971Z · comments (1)

Emergent Misalignment and Emergent Alignment
Alvin Ånestrand (alvin-anestrand) · 2025-04-03T08:04:38.150Z · comments (0)

[link] CoreWeave Is A Time Bomb
Remmelt (remmelt-ellen) · 2025-03-31T03:52:23.582Z · comments (0)

The Structure of the Pain of Change
ReverendBayes (vedernikov-andrei) · 2025-04-13T21:51:53.823Z · comments (0)

Many Common Problems are NP-Hard, and Why that Matters for AI
Andrew Keenan Richardson (qemqemqem) · 2025-03-26T21:51:17.960Z · comments (9)

[link] Privateers Reborn: Cyber Letters of Marque
arealsociety (shane-zabel) · 2025-03-23T03:39:25.990Z · comments (2)

Apparent Introspection in Claude: A Case Study in Projected Mind
robert_saltzman · 2025-03-31T00:51:08.748Z · comments (0)

Load Bearing Magic
winstonBosan · 2025-04-21T18:53:25.014Z · comments (1)

Recent comments

samuelshadrach on xpostah's Shortform

Suppose you are trying to figure out a function f(x,y,z | a,b,c) where x, y, z are all scalar values and a, b, c are all constants.

If you knew a few zeroes of this function, you could figure out good approximations of this function. Let's say you knew

U(x,y, a=0) = x
U(x,y, a=1) = x
U(x,y, a=2) = y
U(x,y, a=3) = y

You could now guess U(x,y) = x if a<1.5, y if a>1.5

You will not be able to get a good approximation if you did not know enough zeroes.
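The guess above can be sketched in code. This is only an illustration of the extrapolation step, not the commenter's method: the helper names `fit_threshold` and `make_U`, and the choice of the midpoint as the boundary, are assumptions made for the example.

```python
from typing import Callable, Dict

# Known data points from the comment: for each value of a,
# which of its arguments U returns.
known: Dict[float, str] = {0: "x", 1: "x", 2: "y", 3: "y"}


def fit_threshold(samples: Dict[float, str]) -> float:
    """Guess the boundary as the midpoint between the largest a that
    returns x and the smallest a that returns y (an assumed rule)."""
    max_x = max(a for a, label in samples.items() if label == "x")
    min_y = min(a for a, label in samples.items() if label == "y")
    if max_x >= min_y:
        raise ValueError("samples are not separable by a single threshold")
    return (max_x + min_y) / 2


def make_U(threshold: float) -> Callable[[float, float, float], float]:
    """Build the guessed piecewise function U(x, y, a)."""
    def U(x: float, y: float, a: float) -> float:
        return x if a < threshold else y
    return U


threshold = fit_threshold(known)  # midpoint of 1 and 2
U = make_U(threshold)
```

With only two data points, say a=0 and a=3, the same procedure still returns a threshold, but the true boundary could lie anywhere between 0 and 3, which is the sense in which too little data gives a poor approximation.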

This is a comment about morality. Here x, y, z are an agent's multiple, possibly conflicting values, and a, b, c are information about the agent's environment. You lack data about how your own mind will react to hypothetical situations you have not faced. At best you can extrapolate from historical data about the minds of other people, which differ from yours. A bigger and more trustworthy dataset would help solve this.

tobias-h on Accountability Sinks

Quite an interesting paper you linked:

Conventional wisdom during World War II among German soldiers, members of the SS and SD as well as police personnel, held that any order given by a superior officer must be obeyed under any circumstances. Failure to carry out such an order would result in a threat to life and limb or possibly serious danger to loved ones. Many students of Nazi history have this same view, even to this day.

Could a German refuse to participate in the round up and murder of Jews, gypsies, suspected partisans, "commissars" and Soviet POWs - unarmed groups of men, women, and children - and survive without getting himself shot or put into a concentration camp or placing his loved ones in jeopardy?

We may never learn the full answer to this, the ultimate question for all those placed in such a quandary, because we lack adequate documentation in many cases to determine the full circumstances and consequences of such a hazardous risk. There are, however, over 100 cases of individuals whose moral scruples were weighed in the balance and not found wanting. These individuals made the choice to refuse participation in the shooting of unarmed civilians or POWs and none of them paid the ultimate penalty, death! Furthermore, very few suffered any other serious consequence!

Table of the consequences they faced:

mondsemmel on Charbel-Raphaël's Shortform

Altman has already signed the CAIS Statement on AI Risk ("Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."), but OpenAI's actions almost exclusively exacerbate extinction risk, and nowadays Altman and OpenAI even downplay the very existence of this risk.

ozyrus on Is Gemini now better than Claude at Pokémon?

Not being able to do it right now is perfectly fine; it still warrants setting it up to see exactly when they will start to be able to do it.

robo on Why Should I Assume CCP AGI is Worse Than USG AGI?

This is true, and it strongly influences the ways Americans think about how to provide public goods to the rest of the world. But they're thinking about how to provide public goods to the rest of the world^[1]. "America First" is controversial in American intellectual circles, whereas in my (limited) conversations in China people are usually confused about what other sort of policy you would have.

^[1] Disclosure: I'm American, I came of age in this era

charbel-raphael on Charbel-Raphaël's Shortform

X could also be agreeing to sign a public statement about the need to do something or whatever.

lblack on AE Studio is hiring!

I like AE Studio. They seem to genuinely care about AI not killing everyone, and have been willing to actually back original research ideas that don't fit into existing paradigms.

Side note:

Previous posts have been met with great reception by the likes of Eliezer Yudkowsky and Emmett Shear, so we’re up to something good.

This might be a joke, but just in case it's not: I don't think you should reason about your own alignment research agenda like this. I think Eliezer would probably be the first person to tell you that.

hiandrewquinn on Training AGI in Secret would be Unsafe and Unethical

If so, by default the existence of AGI will be a closely guarded secret for some months. Only a few teams within an internal silo, plus leadership & security, will know about the capabilities of the latest systems.

This will result in a situation where only a few dozen people will be charged with ensuring that, and figuring out whether, the latest AIs are aligned/trustworthy/etc.

These are precisely the kinds of outcomes fine-insured bounties would excel at stopping, in my thinking. https://www.lesswrong.com/posts/rgFh6kE9FFjMuYrhc/fine-insured-bounties-for-preventing-dangerous-ai

mondsemmel on Charbel-Raphaël's Shortform

Even if this strategy would work in principle among particularly honorable humans, surely Sam Altman in particular has already conclusively proven that he cannot be trusted to honor any important agreements? See: the OpenAI board drama; the attempt to turn OpenAI's nonprofit into a for-profit; etc.

adamzerner on adamzerner's Shortform

I've been doing Quantified Intuitions' Estimation Game every month. I really enjoy it. A big thing I've learned from it is the instinct to think in terms of orders of magnitude.

Well, not necessarily orders of magnitude, but something similar. For example, a friend just asked me about building a little web app calculator to provide better handicaps in golf scrambles. In the past I'd get a little overwhelmed thinking about how much time such a project would take and default to saying no. But this time I noticed myself approaching it differently.

Will it take minutes? Eh, probably not. Hours? Possibly, but seems a little optimistic. Days? Yeah, seems about right. Weeks? Eh, possibly, but even with the planning fallacy, I'd be surprised. Months? No, it won't take that long. Years? No way.

With this approach I can figure out the right ballpark very quickly. It's helpful.