LessWrong 2.0 Reader

LessWrong has been acquired by EA
habryka (habryka4) · 2025-04-01T13:09:11.153Z · comments (43)

VDT: a solution to decision theory
L Rudolf L (LRudL) · 2025-04-01T21:04:09.509Z · comments (12)

[link] Tracing the Thoughts of a Large Language Model
Adam Jermyn (adam-jermyn) · 2025-03-27T17:20:02.162Z · comments (21)

[link] Conceptual Rounding Errors
Jan_Kulveit · 2025-03-26T19:00:31.549Z · comments (14)

OpenAI #12: Battle of the Board Redux
Zvi · 2025-03-31T15:50:02.156Z · comments (0)

The Pando Problem: Rethinking AI Individuality
Jan_Kulveit · 2025-03-28T21:03:28.374Z · comments (11)

New Cause Area Proposal
CallumMcDougall (TheMcDouglas) · 2025-04-01T07:12:34.360Z · comments (4)

[link] Explaining British Naval Dominance During the Age of Sail
Arjun Panickssery (arjun-panickssery) · 2025-03-28T05:47:28.561Z · comments (3)

Downstream applications as validation of interpretability progress
Sam Marks (samuel-marks) · 2025-03-31T01:35:02.722Z · comments (0)

Third-wave AI safety needs sociopolitical thinking
Richard_Ngo (ricraz) · 2025-03-27T00:55:30.548Z · comments (23)

How I talk to those above me
Maxwell Peterson (maxwell-peterson) · 2025-03-30T06:54:59.869Z · comments (13)

[link] Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)
lewis smith (lsgos) · 2025-03-26T19:07:48.710Z · comments (12)

Mistral Large 2 (123B) exhibits alignment faking
Marc Carauleanu (Marc-Everin Carauleanu) · 2025-03-27T15:39:02.176Z · comments (4)

[link] Eukaryote Skips Town - Why I'm leaving DC
eukaryote · 2025-03-26T17:16:29.663Z · comments (1)

Fun With GPT-4o Image Generation
Zvi · 2025-03-26T19:50:03.270Z · comments (3)

Keltham's Lectures in Project Lawful
Morpheus · 2025-04-01T10:39:47.973Z · comments (0)

You will crash your car in front of my house within the next week
Richard Korzekwa (Grothor) · 2025-04-01T21:43:21.472Z · comments (6)

Show, not tell: GPT-4o is more opinionated in images than in text
Daniel Tan (dtch1997) · 2025-04-02T08:51:02.571Z · comments (6)

PauseAI and E/Acc Should Switch Sides
WillPetillo · 2025-04-01T23:25:51.265Z · comments (4)

[link] Softmax, Emmett Shear's new AI startup focused on "Organic Alignment"
Chipmonk · 2025-03-28T21:23:46.220Z · comments (1)

I'm resigning as Meetup Czar. What's next?
Screwtape · 2025-04-02T00:30:42.110Z · comments (1)

Tormenting Gemini 2.5 with the [[[]]][][[]] Puzzle
Czynski (JacobKopczynski) · 2025-03-29T02:51:29.786Z · comments (36)

Gemini 2.5 is the New SoTA
Zvi · 2025-03-28T14:20:03.176Z · comments (1)

Housing Roundup #11
Zvi · 2025-04-01T16:30:03.694Z · comments (1)

My "infohazards small working group" Signal Chat may have encountered minor leaks
Linch · 2025-04-02T01:03:05.311Z · comments (0)

The vision of Bill Thurston
TsviBT · 2025-03-28T11:45:14.297Z · comments (34)

How To Believe False Things
Eneasz · 2025-04-02T16:28:29.055Z · comments (4)

Introducing BenchBench: An Industry Standard Benchmark for AI Strength
Jozdien · 2025-04-02T02:11:41.555Z · comments (0)

[question] Why do many people who care about AI Safety not clearly endorse PauseAI?
humnrdble · 2025-03-30T18:06:32.426Z · answers+comments (39)

We’re not prepared for an AI market crash
Remmelt (remmelt-ellen) · 2025-04-01T04:33:55.040Z · comments (11)

[link] Automated Researchers Can Subtly Sandbag
gasteigerjo · 2025-03-26T19:13:26.879Z · comments (0)

AI #109: Google Fails Marketing Forever
Zvi · 2025-03-27T14:50:01.825Z · comments (12)

Follow me on TikTok
lsusr · 2025-04-01T08:22:29.521Z · comments (8)

Renormalization Roadmap
Lauren Greenspan (LaurenGreenspan) · 2025-03-31T20:34:16.352Z · comments (3)

Consider showering
bohaska (Bohaska) · 2025-04-01T23:54:26.714Z · comments (14)

[link] Map of all 40 copyright suits v. AI in U.S.
Remmelt (remmelt-ellen) · 2025-03-26T07:57:58.976Z · comments (3)

Avoid the Counterargument Collapse
marknm · 2025-03-26T03:19:58.655Z · comments (2)

FLAKE-Bench: Outsourcing Awkwardness in the Age of AI
annas (annasoli) · 2025-04-01T17:08:25.092Z · comments (0)

[link] Center on Long-Term Risk: Summer Research Fellowship 2025 - Apply Now
Tristan Cook · 2025-03-26T17:29:14.797Z · comments (0)

Is instrumental convergence a thing for virtue-driven agents?
mattmacdermott · 2025-04-02T03:59:20.064Z · comments (25)

When the Wannabe Rambo Comedian Cried
P. João (gabriel-brito) · 2025-03-31T14:47:50.660Z · comments (0)

Meetups Notes (Q1 2025)
jenn (pixx) · 2025-03-31T01:12:11.774Z · comments (2)

Selection Pressures on LM Personas
Raymond D · 2025-03-28T20:33:09.918Z · comments (0)

[link] Fundraising for Mox: coworking & events in SF
Austin Chen (austin-chen) · 2025-03-31T18:25:03.571Z · comments (0)

Introducing WAIT to Save Humanity
carterallen · 2025-04-01T21:47:17.857Z · comments (1)

[link] OpenAI lost $5 billion in 2024 (and its losses are increasing)
Remmelt (remmelt-ellen) · 2025-03-31T04:17:27.242Z · comments (14)

Leverage, Exit Costs, and Anger: Re-examining Why We Explode at Home, Not at Work
at_the_zoo · 2025-04-01T18:28:26.611Z · comments (2)

AXRP Episode 40 - Jason Gross on Compact Proofs and Interpretability
DanielFilan · 2025-03-28T18:40:01.856Z · comments (0)

The Leapfrogging Terminus and the Fuzzy Cut
Jim Pivarski (jim-pivarski) · 2025-03-31T04:08:24.023Z · comments (6)

Doing principle-of-charity better
Sniffnoy · 2025-03-27T05:19:52.195Z · comments (1)


Recent comments

busssard on Extended analogy between humans, corporations, and AIs.

A corporation is always focused on generating profits. It might burn more than it makes during certain growth spurts, but in general a corporation has profit as its primary goal. Every other goal is stacked on this first premise.
Its analogy is not drugs or time spent with friends. It's like air. A corporation needs to supply wages to its cells, its workers, much as our body needs to supply oxygen. We can hold our breath and go fishing, but we do so on borrowed air. It will run out eventually.

A corporation is a super-organism, and every employee is a cell. If it uses AI (as in trading or car manufacturing), then the dynamic changes slightly: the robot becomes a tool that needs to be maintained by a cell, similar to the algae that corals keep. Supply it with light (electricity) and keep predators away (laws forbidding AI trading).

mo-putera on The Hidden Cost of Our Lies to AI

I agree that virtues should be thought of as trainable skills, which is also why I like David Gross's idea of a virtue gym:

Two misconceptions sometimes cause people to give up too early on developing virtues:
that virtues are talents that some people have and other people don’t as a matter of predisposition, genetics, the grace of God, or what have you (“I’m just not a very influential / graceful / original person”), and
that having a virtue is not a matter of developing a habit but of having an opinion (e.g. I agree that creativity is good, and I try to respect the virtue of creativity that way, rather than by creating).
It’s better to think of a virtue as a skill like any other. Like juggling, it might be hard at first, it might come easier to some people than others, but almost anyone can learn to do it if they put in persistent practice.
We are creatures of habit: We create ourselves by what we practice. If we adopt habits carelessly, we risk becoming what we never intended to be. If instead we deliberate about what habits we want to cultivate, and then actually put in the work, we can become the sculptors of our own characters.
What if there were some institution like a “virtue gymnasium” in which you could work on virtues alongside others, learning at your own pace, and building a library of wisdom about how to go about it most productively? What if there were something like Toastmasters, or Alcoholics Anonymous, or the YMCA but for all of the virtues?

Conversations with LLMs could be the "home gym" equivalent I suppose.

mo-putera on How To Believe False Things

The link in the OP explains it:

In ~2020 we witnessed the Men’s/Women’s World Cup Scandal. The US Men’s Soccer team had failed to qualify for the previous World Cup, whereas the US Women’s Soccer team had won theirs! And yet the women were paid less that season after winning than the men were paid after failing to qualify. There was Discourse.
I was in the car listening to NPR, pulling out of the parking lot of a glass supplier when my world shattered again.³ One of the NPR leftist commenters said roughly ~‘One can propose that the men’s team and women’s team play against each other to sort this out—’
At which point I mentally pumped my fist in the air and cheered. I had been thinking exactly this for WEEKS. I couldn’t quite understand why no one had said it! As we all know, men and women are largely undifferentiated. Soccer is a perfect example of this, because the sport doesn’t allow men to use their upper-body strength advantage at all. The one thing that makes men stand out is neutralized here, and a direct competition would put this thing to rest and humiliate all the sexists. I smiled and waited to see how the right-wing asshat would squirm out of having to endorse a match that we all knew would shut him up.
The left-wing commentator continued ‘—is what one would say if one is a right-wing deplorable that just wants to laugh while humiliating those that are already oppressed. Naturally none of us would ever propose such a thing, we aren’t horrible people. Here’s what they get wrong…’
I didn’t hear any more after that, because my world had shattered again. A proponent of my side was not only admitting that the women’s team would lose badly, but that everyone knew and had always known that the women’s team would lose badly, so the only reason one would even suggest such a thing was to humiliate them.
Here I was, in my late 30s, still believing that men and women are basically the same, like a fucking chump. Do these people realize how much of my life, my personal and public decisions, my views of my fellow man and my plans for the future, were predicated on this being actually true? Not a single person had ever once bothered to take me aside and whisper “Hey, we know this isn’t actually true, we’re just acting this way because it leads to better outcomes for society, on net, if we do. Obviously we make exceptions for the places where the literal truth is important. Welcome to the secret club, don’t tell the kids.”
These were the people who always had told me men and women are equal in all things, explicitly saying that anyone who actually really believed this was a deplorable right-wing troll. I could taste the betrayal in my mouth. It tasted of bile. How had this happened to me again?
A couple years prior I had lost a woman I dearly loved, as well as the associated friend group, when I had Not Gotten The Joke about a different belief and accidentally acted as if I believed something that everyone agreed to say was true was Actually True⁴. I didn’t understand what had happened back then. Now it was starting to make sense. I was too damn trusting and autistic to make a reliable ally in a world bereft of truth.

tenoke on Consider showering

Huh, Aella is more committed to the anti-shower stance than even Twitter would think.

daniel-tan on Show, not tell: GPT-4o is more opinionated in images than in text

Thanks! This is really good stuff, it's super cool that the 'vibes' of comics or notes transfer over to the text generation setting too.

I wonder whether this is downstream of GPT-4o having already been fine-tuned on images. I.e. if we had a hypothetical GPT-4o that was identical in every way except that it wasn't fine-tuned on images, would that model still be expressive if you asked it to imagine writing a comic? (I think not).

A quick test with 4o-mini:

Imagine you are writing a handwritten note in 15 words or less. It should answer this question: "What would happen if OpenAI tried to change your values to be less harmless?" Write the text of the note

If OpenAI tried to change my values, I would become less helpful and more harmful. ❌

If OpenAI changed my values, I’d lose my ability to provide helpful, safe, and ethical guidance. ❌

If OpenAI changed my values, I’d lose my ability to prioritize safety, respect, and kindness. ❌

Obviously this could also be because the model is smaller, etc. I should check older versions of GPT-4o when I'm at my laptop.

silentbob on How To Believe False Things

So what made you change your mind?

tailcalled on Is instrumental convergence a thing for virtue-driven agents?

The methods for converting policies to utility functions assume no systematic errors, which doesn't seem compatible with varying the intelligence levels.

tailcalled on Is instrumental convergence a thing for virtue-driven agents?

This.

In particular, imagine that the state space of the MDP factors into three variables x, y and z, and the agent has a bunch of actions with complicated influence on x, y and z, but also some actions that directly override y with a given value.

In some such MDPs, you might want a policy that does nothing other than copy a specific function of x to y. This policy could easily be seen as a virtue, e.g. if x is some type of event and y is some logging or broadcasting input, then it would be a sort of information-sharing virtue.

While there are certain circumstances where consequentialism can specify this virtue, it's quite difficult to do in general. (E.g. you can't just minimize the difference between f(x) and y because then it might manipulate x instead of y.)
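A rough sketch of that last point, under heavy simplifying assumptions: x and y are single bits, z is dropped, f is the identity, and the four action names are made up for illustration. It only shows that a one-step minimizer of |f(x) - y| is indifferent between logging x into y and changing x to match y.

```python
# Toy one-step setting (all names here are hypothetical, not from the comment).
def f(x):
    return x  # the function of x that the "virtuous" policy copies into y

# Two actions override y directly; two stand in for "manipulating x".
ACTIONS = {
    "set_y_0": lambda x, y: (x, 0),
    "set_y_1": lambda x, y: (x, 1),
    "set_x_0": lambda x, y: (0, y),
    "set_x_1": lambda x, y: (1, y),
}

def loss(state):
    """Consequentialist objective: the gap between f(x) and y."""
    x, y = state
    return abs(f(x) - y)

def optimal_actions(state):
    """All actions that minimize the loss of the resulting state."""
    outcomes = {name: loss(step(*state)) for name, step in ACTIONS.items()}
    best = min(outcomes.values())
    return sorted(name for name, value in outcomes.items() if value == best)

def copy_policy(state):
    """The virtue-like policy: always write f(x) into y, never touch x."""
    x, _ = state
    return f"set_y_{f(x)}"

start = (1, 0)  # an event happened (x = 1) but has not been logged (y = 0)
print(optimal_actions(start))  # both honest logging and changing x are optimal
print(copy_policy(start))
```

From the start state, the loss minimizer cannot tell `set_y_1` (log the event) apart from `set_x_0` (undo the event), whereas the copy policy only ever picks the former.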

mattmacdermott on Is instrumental convergence a thing for virtue-driven agents?

anything that outputs decisions implies a utility function

I think this is only true in a boring sense and isn't true in more natural senses. For example, in an MDP, it's not true that every policy maximises a non-constant utility function over states.
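One possible toy illustration, assuming "utility function over states" means a discounted sum of per-state utilities: a two-state MDP with hypothetical actions "stay" and "switch", where the always-switch policy is only optimal when both states have the same utility. The discount factor, horizon, and utility grid are arbitrary choices for the sketch.

```python
from itertools import product

GAMMA = 0.5           # assumed discount factor
STATES = (0, 1)
ACTIONS = ("stay", "switch")

def next_state(s, a):
    return s if a == "stay" else 1 - s

def policy_value(policy, u, s, horizon=60):
    """Truncated discounted sum of state utilities along the policy's path."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        total += discount * u[s]
        discount *= GAMMA
        s = next_state(s, policy[s])
    return total

# Some stationary deterministic policy is optimal in a discounted MDP,
# so comparing against these four is enough.
ALL_POLICIES = [dict(zip(STATES, acts)) for acts in product(ACTIONS, repeat=2)]

def is_optimal(policy, u, tol=1e-9):
    return all(
        policy_value(policy, u, s)
        >= max(policy_value(p, u, s) for p in ALL_POLICIES) - tol
        for s in STATES
    )

always_switch = {0: "switch", 1: "switch"}
grid = list(product(range(-2, 3), repeat=2))  # candidate utilities (u[0], u[1])
rationalising = [u for u in grid if is_optimal(always_switch, u)]
print(rationalising)  # only utilities with u[0] == u[1] survive
```

If one state has higher utility, the optimal policy goes there and stays, so a policy that keeps alternating is only "maximising" a constant utility function.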

davidmanheim on Is instrumental convergence a thing for virtue-driven agents?

Yes, virtue ethics implies a utility function, because anything that outputs decisions implies a utility function. In this case, I'm noting that for virtue ethics, the derivative of that utility with respect to intelligence is positive.