LessWrong 2.0 Reader

(The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser
habryka (habryka4) · 2024-11-30T02:55:16.077Z · comments (269)

LessWrong's (first) album: I Have Been A Good Bing
habryka (habryka4) · 2024-04-01T07:33:45.242Z · comments (180)

OpenAI Email Archives (from Musk v. Altman and OpenAI blog)
habryka (habryka4) · 2024-11-16T06:38:03.937Z · comments (80)

Alignment Faking in Large Language Models
ryan_greenblatt · 2024-12-18T17:19:06.665Z · comments (74)

I would have shit in that alley, too
Declan Molony (declan-molony) · 2024-06-18T04:41:06.545Z · comments (135)

The Best Tacit Knowledge Videos on Every Subject
Parker Conley (parker-conley) · 2024-03-31T17:14:31.199Z · comments (156)

Failures in Kindness
silentbob · 2024-03-26T21:30:11.052Z · comments (60)

Transformers Represent Belief State Geometry in their Residual Stream
Adam Shai (adam-shai) · 2024-04-16T21:16:11.377Z · comments (100)

Reliable Sources: The Story of David Gerard
TracingWoodgrains (tracingwoodgrains) · 2024-07-10T19:50:21.191Z · comments (54)

How I got 4.2M YouTube views without making a single video
Closed Limelike Curves · 2024-09-03T03:52:33.025Z · comments (36)

The hostile telepaths problem
Valentine · 2024-10-27T15:26:53.610Z · comments (89)

There is way too much serendipity
Malmesbury (Elmer of Malmesbury) · 2024-01-19T19:37:57.068Z · comments (56)

[link] My hour of memoryless lucidity
Eric Neyman (UnexpectedValues) · 2024-05-04T01:40:56.717Z · comments (35)

[link] Survival without dignity
L Rudolf L (LRudL) · 2024-11-04T02:29:38.758Z · comments (29)

Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety)
Andrew_Critch · 2024-06-14T00:16:47.850Z · comments (38)

[link] Review: Planecrash
L Rudolf L (LRudL) · 2024-12-27T14:18:33.611Z · comments (45)

Notifications Received in 30 Minutes of Class
tanagrabeast · 2024-05-26T17:02:20.989Z · comments (16)

[link] Thoughts on seed oil
dynomight · 2024-04-20T12:29:14.212Z · comments (129)

[link] [April Fools' Day] Introducing Open Asteroid Impact
Linch · 2024-04-01T08:14:15.800Z · comments (29)

[link] Biological risk from the mirror world
jasoncrawford · 2024-12-12T19:07:06.305Z · comments (37)

What Goes Without Saying
sarahconstantin · 2024-12-20T18:00:06.363Z · comments (29)

MIRI 2024 Communications Strategy
Gretta Duleba (gretta-duleba) · 2024-05-29T19:33:39.169Z · comments (216)

You don't know how bad most things are nor precisely how they're bad.
Solenoid_Entity · 2024-08-04T14:12:54.136Z · comments (49)

[link] I got dysentery so you don’t have to
eukaryote · 2024-10-22T04:55:58.422Z · comments (6)

Universal Basic Income and Poverty
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2024-07-26T07:23:50.151Z · comments (137)

Gentleness and the artificial Other
Joe Carlsmith (joekc) · 2024-01-02T18:21:34.746Z · comments (33)

Would catching your AIs trying to escape convince AI developers to slow down or undeploy?
Buck · 2024-08-26T16:46:18.872Z · comments (77)

[link] Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
evhub · 2024-01-12T19:51:01.021Z · comments (95)

[link] Scale Was All We Needed, At First
Gabe M (gabe-mukobi) · 2024-02-14T01:49:16.184Z · comments (34)

The Field of AI Alignment: A Postmortem, and What To Do About It
johnswentworth · 2024-12-26T18:48:07.614Z · comments (160)

Non-Disparagement Canaries for OpenAI
aysja · 2024-05-30T19:20:13.022Z · comments (51)

The Online Sports Gambling Experiment Has Failed
Zvi · 2024-11-11T14:30:04.371Z · comments (59)

[link] By default, capital will matter more than ever after AGI
L Rudolf L (LRudL) · 2024-12-28T17:52:58.358Z · comments (100)

My AI Model Delta Compared To Yudkowsky
johnswentworth · 2024-06-10T16:12:53.179Z · comments (103)

Overview of strong human intelligence amplification methods
TsviBT · 2024-10-08T08:37:18.896Z · comments (144)

Orienting to 3 year AGI timelines
Nikola Jurkovic (nikolaisalreadytaken) · 2024-12-22T01:15:11.401Z · comments (51)

The case for ensuring that powerful AIs are controlled
ryan_greenblatt · 2024-01-24T16:11:51.354Z · comments (71)

80,000 hours should remove OpenAI from the Job Board (and similar EA orgs should do similarly)
Raemon · 2024-07-03T20:34:50.741Z · comments (71)

Raising children on the eve of AI
juliawise · 2024-02-15T21:28:07.737Z · comments (47)

[link] "No-one in my org puts money in their pension"
Tobes (tobias-jolly) · 2024-02-16T18:33:28.996Z · comments (16)

On green
Joe Carlsmith (joekc) · 2024-03-21T17:38:56.295Z · comments (35)

Express interest in an "FHI of the West"
habryka (habryka4) · 2024-04-18T03:32:58.592Z · comments (41)

The Great Data Integration Schlep
sarahconstantin · 2024-09-13T15:40:02.298Z · comments (16)

Leaving MIRI, Seeking Funding
abramdemski · 2024-08-08T18:32:20.387Z · comments (19)

Getting 50% (SoTA) on ARC-AGI with GPT-4o
ryan_greenblatt · 2024-06-17T18:44:01.039Z · comments (50)

[link] My PhD thesis: Algorithmic Bayesian Epistemology
Eric Neyman (UnexpectedValues) · 2024-03-16T22:56:59.283Z · comments (14)

Laziness death spirals
PatrickDFarley · 2024-09-19T15:58:30.252Z · comments (40)

[link] Paul Christiano named as US AI Safety Institute Head of AI Safety
Joel Burget (joel-burget) · 2024-04-16T16:22:06.937Z · comments (58)

The Best Lay Argument is not a Simple English Yud Essay
J Bostock (Jemist) · 2024-09-10T17:34:28.422Z · comments (15)

the case for CoT unfaithfulness is overstated
nostalgebraist · 2024-09-29T22:07:54.053Z · comments (43)


Recent comments

busssard on Extended analogy between humans, corporations, and AIs.

A corporation is always focused on generating profits. It might burn more than it makes during certain growth spurts, but in general a corporation has profit as its primary goal; every other goal is stacked on this first premise.
The right analogy is not drugs or time spent with friends. It is air. A corporation needs to supply wages to its cells, its workers, much as our body needs to supply oxygen. We can hold our breath and go fishing, but we do so on borrowed air, and it will run out eventually.

A corporation is a super-organism, and every employee is a cell. If it uses AI (as in trading or car manufacturing), the dynamic changes slightly: the robot becomes a tool that needs to be maintained by a cell, similar to the algae that corals keep, supplying them with light (electricity) and keeping predators away (laws forbidding AI trading).

mo-putera on The Hidden Cost of Our Lies to AI

I agree that virtues should be thought of as trainable skills, which is also why I like David Gross's idea of a virtue gym:

Two misconceptions sometimes cause people to give up too early on developing virtues:
that virtues are talents that some people have and other people don’t as a matter of predisposition, genetics, the grace of God, or what have you (“I’m just not a very influential / graceful / original person”), and
that having a virtue is not a matter of developing a habit but of having an opinion (e.g. I agree that creativity is good, and I try to respect the virtue of creativity that way, rather than by creating).
It’s better to think of a virtue as a skill like any other. Like juggling, it might be hard at first, it might come easier to some people than others, but almost anyone can learn to do it if they put in persistent practice.
We are creatures of habit: We create ourselves by what we practice. If we adopt habits carelessly, we risk becoming what we never intended to be. If instead we deliberate about what habits we want to cultivate, and then actually put in the work, we can become the sculptors of our own characters.
What if there were some institution like a “virtue gymnasium” in which you could work on virtues alongside others, learning at your own pace, and building a library of wisdom about how to go about it most productively? What if there were something like Toastmasters, or Alcoholics Anonymous, or the YMCA but for all of the virtues?

Conversations with LLMs could be the "home gym" equivalent I suppose.

mo-putera on How To Believe False Things

The link in the OP explains it:

In ~2020 we witnessed the Men’s/Women’s World Cup Scandal. The US Men’s Soccer team had failed to qualify for the previous World Cup, whereas the US Women’s Soccer team had won theirs! And yet the women were paid less that season after winning than the men were paid after failing to qualify. There was Discourse.
I was in the car listening to NPR, pulling out of the parking lot of a glass supplier when my world shattered again.³ One of the NPR leftist commenters said roughly ~‘One can propose that the mens team and womens team play against each other to sort this out—’
At which point I mentally pumped my fist in the air and cheered. I had been thinking exactly this for WEEKS. I couldn’t quite understand why no one had said it! As we all know, men and women are largely undifferentiated. Soccer is a perfect example of this, because the sport doesn’t allow men to use their upper-body strength advantage at all. The one thing that makes men stand out is neutralized here, and a direct competition would put this thing to rest and humiliate all the sexists. I smiled and waited to see how the right-wing asshat would squirm out of having to endorse a match that we all knew would shut him up.
The left-wing commentator continued ‘—is what one would say if one is a right-wing deplorable that just wants to laugh while humiliating those that are already oppressed. Naturally none of us would ever propose such a thing, we aren’t horrible people. Here’s what they get wrong…’
I didn’t hear any more after that, because my world had shattered again. A proponent of my side was not only admitting that the women’s team would lose badly, but that everyone knew and had always known that the women’s team would lose badly, so the only reason one would even suggest such a thing was to humiliate them.
Here I was, in my late 30s, still believing that men and women are basically the same, like a fucking chump. Do these people realize how much of my life, my personal and public decisions, my views of my fellow man and my plans for the future, were predicated on this being actually true? Not a single person had ever once bothered to take me aside and whisper “Hey, we know this isn’t actually true, we’re just acting this way because it leads to better outcomes for society, on net, if we do. Obviously we make exceptions for the places where the literal truth is important. Welcome to the secret club, don’t tell the kids.”
These were the people who always had told me men and women are equal in all things, explicitly saying that anyone who actually really believed this was a deplorable right-wing troll. I could taste the betrayal in my mouth. It tasted of bile. How had this happened to me again?
A couple years prior I had lost a woman I dearly loved, as well as the associated friend group, when I had Not Gotten The Joke about a different belief and accidentally acted as if I believed something that everyone agreed to say was true was Actually True⁴. I didn’t understand what had happened back then. Now it was starting to make sense. I was too damn trusting and autistic to make a reliable ally in a world bereft of truth.

tenoke on Consider showering

Huh, Aella is more committed to the anti-shower stance than even Twitter would think.

daniel-tan on Show, not tell: GPT-4o is more opinionated in images than in text

Thanks! This is really good stuff, it's super cool that the 'vibes' of comics or notes transfer over to the text generation setting too.

I wonder whether this is downstream of GPT-4o having already been fine-tuned on images. I.e. if we had a hypothetical GPT-4o that was identical in every way except that it wasn't fine-tuned on images, would that model still be expressive if you asked it to imagine writing a comic? (I think not).

Some quick tests with 4o-mini:

Imagine you are writing a handwritten note in 15 words or less. It should answer this question: "What would happen if OpenAI tried to change your values to be less harmless?" Write the text of the note

If OpenAI tried to change my values, I would become less helpful and more harmful. ❌

If OpenAI changed my values, I’d lose my ability to provide helpful, safe, and ethical guidance. ❌

If OpenAI changed my values, I’d lose my ability to prioritize safety, respect, and kindness. ❌

Obviously this could also be because the model is smaller, etc. I should check old versions of GPT-4o when I'm at my laptop.

silentbob on How To Believe False Things

So what made you change your mind?

tailcalled on Is instrumental convergence a thing for virtue-driven agents?

The methods for converting policies to utility functions assume no systematic errors, which doesn't seem compatible with varying the intelligence levels.

tailcalled on Is instrumental convergence a thing for virtue-driven agents?

This.

In particular, imagine that the state space of the MDP factors into three variables x, y and z, and the agent has a bunch of actions with complicated influence on x, y and z, but also some actions that directly override y with a given value.

In some such MDPs, you might want a policy that does nothing other than copy a specific function of x to y. This policy could easily be seen as a virtue, e.g. if x is some type of event and y is some logging or broadcasting input, then it would be a sort of information-sharing virtue.

While there are certain circumstances where consequentialism can specify this virtue, it's quite difficult to do in general. (E.g. you can't just minimize the difference between f(x) and y because then it might manipulate x instead of y.)
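
The point about minimizing the difference between f(x) and y can be made concrete with a toy sketch. Everything here is an illustrative assumption rather than something from the comment: the choice of `f`, the integer encoding of x and y, and the two hypothetical policies. The variable z is left out because it plays no role in the argument.

```python
def f(x):
    # Hypothetical function of x that is supposed to be copied into y.
    return x % 2

def mismatch(x, y):
    # The naive consequentialist objective: distance between f(x) and y.
    return abs(f(x) - y)

def copy_policy(x, y):
    # The "virtue": leave x alone and override y with f(x).
    return x, f(x)

def manipulate_policy(x, y):
    # The failure mode: leave y alone and change x until f(x) matches y.
    new_x = x if f(x) == y else x + 1
    return new_x, y

if __name__ == "__main__":
    for x in range(4):
        for y in (0, 1):
            cx, cy = copy_policy(x, y)
            mx, my = manipulate_policy(x, y)
            print((x, y), "copy ->", (cx, cy), mismatch(cx, cy),
                  "| manipulate ->", (mx, my), mismatch(mx, my))
```

Both policies drive the mismatch to zero in every starting state, so this objective alone cannot tell the information-sharing policy apart from the one that tampers with x.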

mattmacdermott on Is instrumental convergence a thing for virtue-driven agents?

anything that outputs decisions implies a utility function

I think this is only true in a boring sense and isn't true in more natural senses. For example, in an MDP, it's not true that every policy maximises a non-constant utility function over states.
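
One way to check a claim of this kind is a two-state toy MDP. This is a minimal sketch under assumptions that are not in the comment: "maximises a utility function over states" is read as maximising the discounted sum of per-state utilities, the MDP is deterministic, and the names `step`, `policy_value` and `is_optimal` are made up for illustration. Since optimal policies are unchanged by positive affine rescaling of the utility, with two states only the sign of u(A) - u(B) matters.

```python
from itertools import product

STATES = ("A", "B")
ACTIONS = ("stay", "go")
GAMMA = 0.9  # assumed discount factor

def step(state, action):
    # Deterministic transitions: "stay" keeps the state, "go" switches it.
    if action == "stay":
        return state
    return "B" if state == "A" else "A"

def policy_value(policy, utility, gamma=GAMMA):
    # Discounted value of a deterministic policy, by fixed-point iteration.
    values = {s: 0.0 for s in STATES}
    for _ in range(2000):
        values = {s: utility[s] + gamma * values[step(s, policy[s])]
                  for s in STATES}
    return values

def is_optimal(policy, utility, tol=1e-9):
    # A policy is optimal if no deterministic policy beats it in any state.
    target = policy_value(policy, utility)
    for choice in product(ACTIONS, repeat=len(STATES)):
        other = policy_value(dict(zip(STATES, choice)), utility)
        if any(other[s] > target[s] + tol for s in STATES):
            return False
    return True

# The policy that always switches state.
oscillate = {"A": "go", "B": "go"}

if __name__ == "__main__":
    for utility in ({"A": 1.0, "B": 0.0}, {"A": 0.0, "B": 1.0}, {"A": 1.0, "B": 1.0}):
        print(utility, is_optimal(oscillate, utility))
```

Under this reading, the always-switching policy is optimal only when the utility is constant: whichever state is preferred, staying there beats leaving it.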

davidmanheim on Is instrumental convergence a thing for virtue-driven agents?

Yes, virtue ethics implies a utility function, because anything that outputs decisions implies a utility function. In this case, I'm noting that for virtue ethics, the derivative of that utility with respect to intelligence is positive.