LessWrong 2.0 Reader

Murder plots are infohazards
Chris Monteiro (chris-topher) · 2025-02-13T19:15:09.749Z · comments (34)
[link] Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion?
garrison · 2025-02-11T00:20:41.421Z · comments (8)
How to Make Superbabies
GeneSmith · 2025-02-19T20:39:38.971Z · comments (9)
[link] A History of the Future, 2025-2040
L Rudolf L (LRudL) · 2025-02-17T12:03:58.355Z · comments (14)
It's been ten years. I propose HPMOR Anniversary Parties.
Screwtape · 2025-02-16T01:43:14.586Z · comments (1)
[link] A computational no-coincidence principle
Eric Neyman (UnexpectedValues) · 2025-02-14T21:39:39.277Z · comments (26)
The Paris AI Anti-Safety Summit
Zvi · 2025-02-12T14:00:07.383Z · comments (20)
The News is Never Neglected
lsusr · 2025-02-11T14:59:48.323Z · comments (17)
[link] A short course on AGI safety from the GDM Alignment team
Vika · 2025-02-14T15:43:50.903Z · comments (1)
AGI Safety & Alignment @ Google DeepMind is hiring
Rohin Shah (rohinmshah) · 2025-02-17T21:11:18.970Z · comments (9)
Ambiguous out-of-distribution generalization on an algorithmic task
Wilson Wu (wilson-wu) · 2025-02-13T18:24:36.160Z · comments (6)
The Mask Comes Off: A Trio of Tales
Zvi · 2025-02-14T15:30:15.372Z · comments (1)
Dear AGI,
Nathan Young · 2025-02-18T10:48:15.030Z · comments (7)
Levels of Friction
Zvi · 2025-02-10T13:10:07.224Z · comments (0)
My model of what is going on with LLMs
Cole Wyeth (Amyr) · 2025-02-13T03:43:29.447Z · comments (37)
Microplastics: Much Less Than You Wanted To Know
jenn (pixx) · 2025-02-15T19:08:14.561Z · comments (5)
Gauging Interest for a Learning-Theoretic Agenda Mentorship Programme
Vanessa Kosoy (vanessa-kosoy) · 2025-02-16T16:24:57.654Z · comments (2)
[link] Thermodynamic entropy = Kolmogorov complexity
Aram Ebtekar (EbTech) · 2025-02-17T05:56:06.960Z · comments (11)
Arbital has been imported to LessWrong
RobertM (T3t) · 2025-02-20T00:47:33.983Z · comments (4)
[link] How do we solve the alignment problem?
Joe Carlsmith (joekc) · 2025-02-13T18:27:27.712Z · comments (8)
How might we safely pass the buck to AI?
joshc (joshua-clymer) · 2025-02-19T17:48:32.249Z · comments (17)
On Deliberative Alignment
Zvi · 2025-02-11T13:00:07.683Z · comments (1)
Not all capabilities will be created equal: focus on strategically superhuman agents
benwr · 2025-02-13T01:24:46.084Z · comments (4)
≤10-year Timelines Remain Unlikely Despite DeepSeek and o3
Rafael Harth (sil-ver) · 2025-02-13T19:21:35.392Z · comments (50)
Celtic Knots on Einstein Lattice
Ben (ben-lang) · 2025-02-16T15:56:06.888Z · comments (11)
Do models know when they are being evaluated?
Govind Pimpale (govind-pimpale) · 2025-02-17T23:13:22.017Z · comments (0)
Skepticism towards claims about the views of powerful institutions
tlevin (trevor) · 2025-02-13T07:40:52.257Z · comments (2)
[link] Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
Matrice Jacobine · 2025-02-12T09:15:07.793Z · comments (36)
Virtue signaling, and the "humans-are-wonderful" bias, as a trust exercise
lc · 2025-02-13T06:59:17.525Z · comments (16)
Go Grok Yourself
Zvi · 2025-02-19T20:20:09.371Z · comments (1)
Self-dialogue: Do behaviorist rewards make scheming AGIs?
Steven Byrnes (steve2152) · 2025-02-13T18:39:37.770Z · comments (0)
Extended analogy between humans, corporations, and AIs.
Daniel Kokotajlo (daniel-kokotajlo) · 2025-02-13T00:03:13.956Z · comments (1)
Proof idea: SLT to AIT
Lucius Bushnaq (Lblack) · 2025-02-10T23:14:24.538Z · comments (6)
Eliezer's Lost Alignment Articles / The Arbital Sequence
Ruby · 2025-02-20T00:48:10.338Z · comments (0)
How accurate was my "Altered Traits" book review?
lsusr · 2025-02-18T17:00:55.584Z · comments (3)
AI #103: Show Me the Money
Zvi · 2025-02-13T15:20:07.057Z · comments (9)
Nonpartisan AI safety
Yair Halberstadt (yair-halberstadt) · 2025-02-10T14:55:50.913Z · comments (4)
[link] Hunting for AI Hackers: LLM Agent Honeypot
Reworr R (reworr-reworr) · 2025-02-12T20:29:32.269Z · comments (0)
Notes on Occam via Solomonoff vs. hierarchical Bayes
JesseClifton · 2025-02-10T17:55:14.689Z · comments (7)
[link] SuperBabies podcast with Gene Smith
Eneasz · 2025-02-19T19:36:49.852Z · comments (1)
Why you maybe should lift weights, and How to.
samusasuke · 2025-02-12T05:15:32.011Z · comments (29)
Celtic Knots on a hex lattice
Ben (ben-lang) · 2025-02-14T14:29:08.223Z · comments (10)
[link] What is it to solve the alignment problem?
Joe Carlsmith (joekc) · 2025-02-13T18:42:07.215Z · comments (5)
Monthly Roundup #27: February 2025
Zvi · 2025-02-17T14:10:06.486Z · comments (3)
Knitting a Sweater in a Burning House
CrimsonChin · 2025-02-15T19:50:33.275Z · comments (2)
Abstract Mathematical Concepts vs. Abstractions Over Real-World Systems
Thane Ruthenis · 2025-02-18T18:04:46.717Z · comments (9)
[question] Should Open Philanthropy Make an Offer to Buy OpenAI?
mrtreasure · 2025-02-14T23:18:01.929Z · answers+comments (1)
World Citizen Assembly about AI - Announcement
Camille Berger (Camille Berger) · 2025-02-11T10:51:56.948Z · comments (1)
Medical Roundup #4
Zvi · 2025-02-18T13:40:06.574Z · comments (1)
[link] Notes on the Presidential Election of 1836
Arjun Panickssery (arjun-panickssery) · 2025-02-13T23:40:23.224Z · comments (0)