LessWrong 2.0 Reader

AXRP Episode 38.3 - Erik Jenner on Learned Look-Ahead
DanielFilan · 2024-12-12T05:40:06.835Z · comments (0)

Latent Adversarial Training (LAT) Improves the Representation of Refusal
alexandraabbas · 2025-01-06T10:24:53.419Z · comments (6)

[link] Update on the Mysterious Trump Buyers on Polymarket
Annapurna (jorge-velez) · 2024-11-04T19:22:06.540Z · comments (9)

[link] Mechanistic Interpretability of Llama 3.2 with Sparse Autoencoders
PaulPauls · 2024-11-24T05:45:20.124Z · comments (3)

Whistleblowing Twitter Bot
Mckiev · 2024-12-26T04:09:45.493Z · comments (5)

[link] Fragile, Robust, and Antifragile Preference Satisfaction
adamShimi · 2024-11-02T17:25:55.986Z · comments (0)

Measuring Nonlinear Feature Interactions in Sparse Crosscoders [Project Proposal]
Jason Gross (jason-gross) · 2025-01-06T04:22:12.633Z · comments (0)

QFT and neural nets: the basic idea
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-24T13:54:45.099Z · comments (0)

[link] Chess As The Model Game
criticalpoints · 2024-11-17T19:45:26.499Z · comments (0)

[link] Forecast 2025 With Vox's Future Perfect Team — $2,500 Prize Pool
ChristianWilliams · 2024-12-20T23:00:35.334Z · comments (0)

[link] How sci-fi can have drama without dystopia or doomerism
jasoncrawford · 2025-01-17T15:22:00.414Z · comments (3)

subfunctional overlaps in attentional selection history implies momentum for decision-trajectories
Emrik (Emrik North) · 2024-12-22T14:12:49.027Z · comments (1)

Proof Explained for "Robust Agents Learn Causal World Model"
Dalcy (Darcy) · 2024-12-22T15:06:16.880Z · comments (0)

Review: “The Case Against Reality”
David Gross (David_Gross) · 2024-10-29T13:13:29.643Z · comments (9)

[link] Why OpenAI’s Structure Must Evolve To Advance Our Mission
stuhlmueller · 2024-12-28T04:24:19.937Z · comments (1)

A Collection of Empirical Frames about Language Models
Daniel Tan (dtch1997) · 2025-01-02T02:49:05.965Z · comments (0)

Historical Net Worth
jefftk (jkaufman) · 2024-12-07T23:10:01.519Z · comments (1)

minifest
Austin Chen (austin-chen) · 2024-12-07T03:50:38.573Z · comments (1)

Rebuttals for ~all criticisms of AIXI
Cole Wyeth (Amyr) · 2025-01-07T17:41:10.557Z · comments (15)

The Type of Writing that Pushes Women Away
Dahlia (sdjfhkj-dkjfks) · 2025-01-08T18:54:52.070Z · comments (4)

Definition of alignment science I like
quetzal_rainbow · 2025-01-06T20:40:38.187Z · comments (0)

Really radical empathy
MichaelStJules · 2025-01-06T17:46:31.269Z · comments (0)

Higher and lower pleasures
Chris_Leong · 2024-12-05T13:13:46.526Z · comments (3)

AGI with RL is Bad News for Safety
Nadav Brandes (nadav-brandes) · 2024-12-21T19:36:03.970Z · comments (22)

Almost all growth is exponential growth
lemonhope (lcmgcd) · 2025-01-21T07:16:24.686Z · comments (7)

[link] AI safety content you could create
Adam Jones (domdomegg) · 2025-01-06T15:35:56.167Z · comments (0)

Bridging the VLM and mech interp communities for multimodal interpretability
Sonia Joseph (redhat) · 2024-10-28T14:41:41.969Z · comments (5)

Turning up the Heat on Deceptively-Misaligned AI
J Bostock (Jemist) · 2025-01-07T00:13:28.191Z · comments (16)

Write Good Enough Code, Quickly
Oliver Daniels (oliver-daniels-koch) · 2024-12-15T04:45:56.797Z · comments (10)

D/acc AI Security Salon
Allison Duettmann (allison-duettmann) · 2024-10-19T22:17:57.067Z · comments (0)

PIBBSS Fellowship 2025: Bounties and Cooperative AI Track Announcement
DusanDNesic · 2025-01-09T14:23:47.027Z · comments (0)

[link] Can o1-preview find major mistakes amongst 59 NeurIPS '24 MLSB papers?
Abhishaike Mahajan (abhishaike-mahajan) · 2024-12-18T14:21:03.661Z · comments (0)

[question] Does the "ancient wisdom" argument have any validity? If a particular teaching or tradition is old, to what extent does this make it more trustworthy?
SpectrumDT · 2024-11-04T15:20:14.822Z · answers+comments (49)

Reality is Fractal-Shaped
silentbob · 2024-12-17T13:52:16.946Z · comments (1)

We need a universal definition of 'agency' and related words
CstineSublime · 2025-01-11T03:22:56.623Z · comments (1)

Announcing the CLR Foundations Course and CLR S-Risk Seminars
JamesFaville (elephantiskon) · 2024-11-19T01:18:10.085Z · comments (0)

Efficiency spectra and “bucket of circuits” cartoons
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-29T15:06:50.768Z · comments (0)

[link] GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning
ChengCheng (ccstan99) · 2024-11-01T00:10:50.718Z · comments (0)

Beliefs and state of mind into 2025
RussellThor · 2025-01-10T22:07:01.060Z · comments (9)

[link] A primer on machine learning in cryo-electron microscopy (cryo-EM)
Abhishaike Mahajan (abhishaike-mahajan) · 2024-12-22T15:11:58.860Z · comments (0)

2024 NYC Secular Solstice & Megameetup
Joe Rogero · 2024-11-12T17:46:18.674Z · comments (0)

Word Spaghetti
Gordon Seidoh Worley (gworley) · 2024-10-23T05:39:20.105Z · comments (9)

[link] AI & wisdom 2: growth and amortised optimisation
L Rudolf L (LRudL) · 2024-10-28T21:07:39.449Z · comments (0)

[link] AI & wisdom 3: AI effects on amortised optimisation
L Rudolf L (LRudL) · 2024-10-28T21:08:56.604Z · comments (0)

[link] From the Archives: a story
Richard_Ngo (ricraz) · 2024-12-27T16:36:50.735Z · comments (1)

[question] What is the most impressive game LLMs can play well?
Cole Wyeth (Amyr) · 2025-01-08T19:38:18.530Z · answers+comments (20)

[link] Genesis
PeterMcCluskey · 2024-12-31T22:01:17.277Z · comments (0)

Feature request: comment bookmarks
dirk (abandon) · 2025-01-15T06:45:23.862Z · comments (2)

Monthly Roundup #25: December 2024
Zvi · 2024-12-23T14:20:04.682Z · comments (3)

In the Name of All That Needs Saving
pleiotroth · 2024-11-07T15:26:12.252Z · comments (2)

Recent comments

kabir-kumar on The Gentle Romance

Sure, but it seems like everyone died at some point anyway, and some collective copies of them went on?
I don't think so. I think they seem to be extremely lonely and sad and the AIs are the only way for them to get any form of empowerment. And each time they try to inch further with empowering themselves with the AIs, it leads to the AI actually getting more powerful and themselves only getting a brief moment of more power, but ultimately degrading in mental capacity. And needing to empower the AI more and more, like an addict needing an ever greater high. Until there is nothing left for them to do but die and let the AI become the ultimate power.
I don't particularly care if some non human semisentients manage to be kind of moral/good at coordinating, if it came at what seems to be the cost of all human life.

Even if offscreen all of humanity didn't die, these people dying, killing themselves and never realizing what's actually happening is still insanely horrific and tragic.

flandry39 on What if Alignment is Not Enough?

The only general remarks that I want to make
are in regards to your question about
the model of 150 year long vaccine testing
on/over some sort of sample group and control group.

I notice that there is nothing exponential assumed
about this test object, and so therefore, at most,
the effects are probably multiplicative, if not linear.
Therefore, there are lots of questions about power dynamics
that we can overall safely ignore, as a simplification,
which is in marked contrast to anything involving ASI.

If we assume, as you requested, "no side effects" observed,
in any test group, for any of those things
that we happened to be thinking of, to even look for,
then for any linear system, that is probably "good enough".
But for something that is known for sure to be exponential,
that by itself is not anywhere near enough to feel safe.

But what does this really mean?

Since the common and prevailing (world) business culture
is all about maximal profit, and therefore minimal cost,
and also to minimize any possible future responsibility
(or cost) in case anything with the vax goes badly/wrong,
then for anything that might be in the possible category
of unknown unknown risk, I would expect that company
to want to maintain sort of some plausible deniability --
i.e., to not look so hard for never-before-seen effects.
Or to otherwise ignore that they exist, or matter, etc.
(just like throughout a lot of ASI risk dialogue).

If there is some long future problem that crops up,
the company can say "we never looked for that"
and "we are not responsible for the unexpected",
because the people who made the deployment choices
have taken their profits and their pleasure in life,
and are now long dead. "Not my Job".

"Don't blame us for the sins of our forefathers".
Similarly, no one is going to ever admit or concede
any point, of any argument, on pain of ego death.
No one will check if it is an exponential system.

So of course, no one is going to want to look into
any sort of issues distinguishing the target effects,
from the also occurring changes in world equilibrium.
They will publish their glowing sanitized safety report,
deploy the product anyway, regardless, and make money.

"Pollution in the world is a public commons problem" --
so no corporation is held responsible for world states.
It has become "fashionable" to ignore long term evolution,
and to also ignore and deny everything about the ethics.

But this does not make the issue of ASI x-risk go away.
X-risks are generally the result of exponential processes,
and so the vaccine example is not really that meaningful.

With the presumed ASI levels of actually exponential power,
this is not so much about something like pollution,
as it is about maybe igniting the world atmosphere,
via a mistake in the calculations of the Trinity Test.
Or are you going to deny that Castle Bravo is a thing?

Beyond this one point, my feeling is that your notions
have become a bit too fanciful for me to want to respond
too seriously. You can, of course, feel free to
continue to assume and presume whatever you want,
and therefore reach whatever conclusions you want.

anon-user on Do you consider perfect surveillance inevitable?

Yes, potentially less than ASI, and security is definitely an issue, but people breaching the security would hoard their access - there will be periodic high-profile spills (e.g. celebrities engaged in sexual activities, or politicians engaged in something inappropriate would be obvious targets), but I'd expect most of the time people would have at least an illusion of privacy.

anon-user on Can someone, anyone, make superintelligence a more concrete concept?

I found Eliezer Yudkowsky's "blinking stars" story (That Alien Message, https://search.app/uYn3eZxMEi5FWZEw5) persuasive. That story also has a second layer of having the extra smart Earth with better functioning institutions, but at the level of intuition you are going for it is probably unnecessary and would detract from the message. I think imagining a NASA-like organisation dedicated to controlling a remote robot at say 1 cycle of control loop per month (where it is perhaps corresponding to 1/30 of a second for the aliens), showing how totally screwed up the aliens are in this scenario, then flipping it around, should be at least somewhat emotionally persuasive.

sharmake-farah on Rational Unilateralists Aren't So Cursed

This is an interesting post that, while not very relevant on its own, might become relevant in the future.

More importantly, it's a scenario where rational agents can outperform irrational agents.

+1 for this, which, while minor, still matters.

ryan_greenblatt on Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development

The paper says:

Christiano (2019) makes the case that sudden disempowerment is unlikely,

This isn't accurate. The post What failure looks like [LW · GW] includes a scenario involving sudden disempowerment [LW · GW]!

The post does say:

The stereotyped image of AI catastrophe is a powerful, malicious AI system that takes its creators by surprise and quickly achieves a decisive advantage over the rest of humanity.

I think this is probably not what failure will look like,

But I think it is mostly arguing against threat models involving fast AI capability takeoff (where the level of capabilities takes its creators and others by surprise and fast capabilities progress allows for AIs to suddenly become powerful enough to take over) rather than threat models involving sudden disempowerment from a point where AIs are already well known to be extremely powerful.

jiro on Allegory of the Tsunami

“I’m high up enough that I’ll be safe from the danger,” we think, and sometimes it’s true: we were lucky enough to have been in the right spot. But, when the new water level is so much higher than anyone had imagined, others drown.

Since the two possibilities are "I'm safe" and "I'm not safe", this covers 100% of the possibilities and is a flowery version of a tautology--it says nothing useful.

ryan_greenblatt on Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development

I (remain) skeptical that the sort of failure mode described here is plausible if we solve the problem of aligning individual AI systems with their designers' intentions without this alignment requiring any substantial additional costs (that is, we solve single-single alignment with minimal alignment tax).

This has previously been argued by Vanessa here [LW(p) · GW(p)] and Paul here [LW(p) · GW(p)] in response to a post making a similar claim.

I do worry about human power grabs: some humans obtaining greatly more power as enabled by AI (even if we have no serious alignment issues). However, I don't think this matches the story you describe and the mitigations seem substantially different than what you seem to be imagining.

I'm also somewhat skeptical of the threat model you describe in the case where alignment isn't solved. I think the difference between the story you tell and something more like We get what we measure [AF · GW] is important.

I worry I'm misunderstanding something because I haven't read the paper in detail.

charlie-steiner on Should you publish solutions to corrigibility?

I think the probability that some authority figure would use an order-following AI to get torturous revenge on me (probably for being part of a group they dislike) is quite slim. Maybe one in a few thousand, with more extreme suffering being less likely by a few more orders of magnitude? The probability that they have me killed for instrumental reasons, or otherwise waste the value of the future by my lights, is much higher - ten percent-ish, depending on my distribution over who's giving the orders. But this isn't any worse to me than being killed by an AI that wants to replace me with molecular smiley faces.

ete on Fertility Will Never Recover

give up large chunks of the planet to an ASI to prevent that

I know this isn't your main point but... that isn't a kind of trade that is plausible. Misaligned superintelligence disassembles the entire planet, sun, and everything it can reach [LW · GW]. Biological life does not survive, outside of some weird edge cases like "samples to sell to alien superintelligences that like life". Nothing in the galaxy is safe.