LessWrong 2.0 Reader

AGI Ruin: A List of Lethalities
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2022-06-05T22:05:52.224Z · comments (701)
Where I agree and disagree with Eliezer
paulfchristiano · 2022-06-19T19:15:55.698Z · comments (220)
What an actually pessimistic containment strategy looks like
lc · 2022-04-05T00:19:50.212Z · comments (138)
[link] Simulators
janus · 2022-09-02T12:45:33.723Z · comments (162)
Let’s think about slowing down AI
KatjaGrace · 2022-12-22T17:40:04.787Z · comments (182)
The Redaction Machine
Ben (ben-lang) · 2022-09-20T22:03:15.309Z · comments (48)
[link] Luck based medicine: my resentful story of becoming a medical miracle
Elizabeth (pktechgirl) · 2022-10-16T17:40:03.702Z · comments (121)
Losing the root for the tree
Adam Zerner (adamzerner) · 2022-09-20T04:53:53.435Z · comments (31)
Counter-theses on Sleep
Natália (Natália Mendonça) · 2022-03-21T23:21:07.943Z · comments (131)
It’s Probably Not Lithium
Natália (Natália Mendonça) · 2022-06-28T21:24:10.246Z · comments (187)
chinchilla's wild implications
nostalgebraist · 2022-07-31T01:18:28.254Z · comments (128)
(My understanding of) What Everyone in Technical Alignment is Doing and Why
Thomas Larsen (thomas-larsen) · 2022-08-29T01:23:58.073Z · comments (90)
[link] It Looks Like You're Trying To Take Over The World
gwern · 2022-03-09T16:35:35.326Z · comments (120)
DeepMind alignment team opinions on AGI ruin arguments
Vika · 2022-08-12T21:06:40.582Z · comments (37)
[link] Reflections on six months of fatherhood
jasoncrawford · 2022-01-31T05:28:09.154Z · comments (24)
Reward is not the optimization target
TurnTrout · 2022-07-25T00:03:18.307Z · comments (123)
Lies Told To Children
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2022-04-14T11:25:10.282Z · comments (94)
You Are Not Measuring What You Think You Are Measuring
johnswentworth · 2022-09-20T20:04:22.899Z · comments (44)
[link] A Mechanistic Interpretability Analysis of Grokking
Neel Nanda (neel-nanda-1) · 2022-08-15T02:41:36.245Z · comments (47)
Counterarguments to the basic AI x-risk case
KatjaGrace · 2022-10-14T13:00:05.903Z · comments (124)
Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover
Ajeya Cotra (ajeya-cotra) · 2022-07-18T19:06:14.670Z · comments (94)
Accounting For College Costs
johnswentworth · 2022-04-01T17:28:19.409Z · comments (41)
Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment
elspood · 2022-06-21T23:55:39.918Z · comments (42)
What DALL-E 2 can and cannot do
Swimmer963 (Miranda Dixon-Luinenburg) (Swimmer963) · 2022-05-01T23:51:22.310Z · comments (303)
Staring into the abyss as a core life skill
benkuhn · 2022-12-22T15:30:05.093Z · comments (21)
MIRI announces new "Death With Dignity" strategy
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2022-04-02T00:43:19.814Z · comments (545)
What should you change in response to an "emergency"? And AI risk
AnnaSalamon · 2022-07-18T01:11:14.667Z · comments (60)
Why I think strong general AI is coming soon
porby · 2022-09-28T05:40:38.395Z · comments (141)
Looking back on my alignment PhD
TurnTrout · 2022-07-01T03:19:59.497Z · comments (64)
Beware boasting about non-existent forecasting track records
Jotto999 · 2022-05-20T19:20:03.854Z · comments (112)
Optimality is the tiger, and agents are its teeth
Veedrac · 2022-04-02T00:46:27.138Z · comments (42)
Models Don't "Get Reward"
Sam Ringer · 2022-12-30T10:37:11.798Z · comments (61)
Six Dimensions of Operational Adequacy in AGI Projects
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2022-05-30T17:00:30.833Z · comments (66)
Epistemic Legibility
Elizabeth (pktechgirl) · 2022-02-09T18:10:06.591Z · comments (30)
On how various plans miss the hard bits of the alignment challenge
So8res · 2022-07-12T02:49:50.454Z · comments (88)
Why Agent Foundations? An Overly Abstract Explanation
johnswentworth · 2022-03-25T23:17:10.324Z · comments (56)
A challenge for AGI organizations, and a challenge for readers
Rob Bensinger (RobbBB) · 2022-12-01T23:11:44.279Z · comments (33)
Two-year update on my personal AI timelines
Ajeya Cotra (ajeya-cotra) · 2022-08-02T23:07:48.698Z · comments (60)
Mysteries of mode collapse
janus · 2022-11-08T10:37:57.760Z · comments (57)
A central AI alignment problem: capabilities generalization, and the sharp left turn
So8res · 2022-06-15T13:10:18.658Z · comments (54)
We Choose To Align AI
johnswentworth · 2022-01-01T20:06:23.307Z · comments (16)
Don't die with dignity; instead play to your outs
Jeffrey Ladish (jeff-ladish) · 2022-04-06T07:53:05.172Z · comments (60)
What Are You Tracking In Your Head?
johnswentworth · 2022-06-28T19:30:06.164Z · comments (83)
Is AI Progress Impossible To Predict?
alyssavance · 2022-05-15T18:30:12.103Z · comments (39)
Sazen
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2022-12-21T07:54:51.415Z · comments (83)
Toni Kurz and the Insanity of Climbing Mountains
GeneSmith · 2022-07-03T20:51:58.429Z · comments (67)
Humans are very reliable agents
alyssavance · 2022-06-16T22:02:10.892Z · comments (35)
12 interesting things I learned studying the discovery of nature's laws
Ben Pace (Benito) · 2022-02-19T23:39:47.841Z · comments (40)
Changing the world through slack & hobbies
Steven Byrnes (steve2152) · 2022-07-21T18:11:05.636Z · comments (13)
Safetywashing
Adam Scholl (adam_scholl) · 2022-07-01T11:56:33.495Z · comments (20)