LessWrong 2.0 Reader

Inference-Only Debate Experiments Using Math Problems
Arjun Panickssery (arjun-panickssery) · 2024-08-06T17:44:27.293Z · comments (0)

Quick evidence review of bulking & cutting
jp · 2024-04-04T21:43:48.534Z · comments (5)

Book review: the Iliad
philh · 2024-06-18T18:50:21.941Z · comments (2)

[link] Thoughts on Zero Points
depressurize (anchpop) · 2024-04-23T02:22:27.448Z · comments (1)

Extracting SAE task features for in-context learning
Dmitrii Kharlapenko (dmitrii-kharlapenko) · 2024-08-12T20:34:13.747Z · comments (1)

Mapping the semantic void II: Above, below and between token embeddings
mwatkins · 2024-02-15T23:00:09.010Z · comments (4)

[link] self-fulfilling prophecies when applying for funding
Chipmonk · 2024-03-01T19:01:40.991Z · comments (0)

Good Bings copy, great Bings steal
dr_s · 2024-04-21T09:52:46.658Z · comments (6)

Falling fertility explanations and Israel
Yair Halberstadt (yair-halberstadt) · 2024-04-03T03:27:38.564Z · comments (4)

[link] New report: A review of the empirical evidence for existential risk from AI via misaligned power-seeking
Harlan · 2024-04-04T23:41:26.439Z · comments (5)

UDT1.01: Plannable and Unplanned Observations (3/10)
Diffractor · 2024-04-12T05:24:34.435Z · comments (0)

D&D.Sci (Easy Mode): On The Construction Of Impossible Structures [Evaluation and Ruleset]
abstractapplic · 2024-05-20T09:38:55.228Z · comments (2)

I was raised by devout Mormons, AMA [&|] Soliciting Advice
ErioirE (erioire) · 2024-03-13T16:52:19.130Z · comments (41)

Retrospective: PIBBSS Fellowship 2023
DusanDNesic · 2024-02-16T17:48:32.151Z · comments (1)

[link] Forecasting future gains due to post-training enhancements
elifland · 2024-03-08T02:11:57.228Z · comments (2)

Some Things That Increase Blood Flow to the Brain
romeostevensit · 2024-03-27T21:48:46.244Z · comments (15)

Protestants Trading Acausally
Martin Sustrik (sustrik) · 2024-04-01T14:46:26.374Z · comments (4)

[link] introduction to thermal conductivity and noise management
bhauth · 2024-03-06T23:14:02.288Z · comments (1)

A more systematic case for inner misalignment
Richard_Ngo (ricraz) · 2024-07-20T05:03:03.500Z · comments (4)

[LDSL#6] When is quantification needed, and when is it hard?
tailcalled · 2024-08-13T20:39:45.481Z · comments (0)

[LDSL#1] Performance optimization as a metaphor for life
tailcalled · 2024-08-08T16:16:27.349Z · comments (6)

Music in the AI World
Martin Sustrik (sustrik) · 2024-08-16T04:20:01.706Z · comments (8)

[link] [Linkpost] Statement from Scarlett Johansson on OpenAI's use of the "Sky" voice, that was shockingly similar to her own voice.
Linch · 2024-05-20T23:50:28.138Z · comments (8)

[link] A progress policy agenda
jasoncrawford · 2024-12-19T18:42:37.327Z · comments (1)

Eliciting bad contexts
Geoffrey Irving · 2025-01-24T10:39:39.358Z · comments (5)

Two Weeks Without Sweets
jefftk (jkaufman) · 2024-12-31T03:30:02.003Z · comments (0)

Compute and size limits on AI are the actual danger
Shmi (shminux) · 2024-11-23T21:29:37.433Z · comments (5)

People aren't properly calibrated on FrontierMath
cakubilo · 2024-12-23T19:35:44.467Z · comments (4)

[link] Epistemic states as a potential benign prior
Tamsin Leake (carado-1) · 2024-08-31T18:26:14.093Z · comments (2)

Book Review: What Even Is Gender?
Joey Marcellino · 2024-09-01T16:09:27.773Z · comments (14)

Mentorship in AGI Safety (MAGIS) call for mentors
Valentin2026 (Just Learning) · 2024-05-23T18:28:03.173Z · comments (3)

[link] A Narrative History of Environmentalism's Partisanship
Jeffrey Heninger (jeffrey-heninger) · 2024-05-14T16:51:01.029Z · comments (3)

Context-dependent consequentialism
Jeremy Gillen (jeremy-gillen) · 2024-11-04T09:29:24.310Z · comments (6)

AI labs can boost external safety research
Zach Stein-Perlman · 2024-07-31T19:30:16.207Z · comments (1)

Balancing Label Quantity and Quality for Scalable Elicitation
Alex Mallen (alex-mallen) · 2024-10-24T16:49:00.939Z · comments (1)

Apply to MATS 7.0!
Ryan Kidd (ryankidd44) · 2024-09-21T00:23:49.778Z · comments (0)

[link] What is it like to be psychologically healthy? Podcast ft. DaystarEld
Chipmonk · 2024-10-05T19:14:04.743Z · comments (8)

Incentive design and capability elicitation
Joe Carlsmith (joekc) · 2024-11-12T20:56:05.088Z · comments (0)

Elon Musk and Solar Futurism
transhumanist_atom_understander · 2024-12-21T02:55:28.554Z · comments (27)

Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker
Daniel Herrmann (Whispermute) · 2025-02-04T20:34:22.625Z · comments (15)

Quantum without complication
Optimization Process · 2025-01-16T08:53:11.347Z · comments (2)

AI #85: AI Wins the Nobel Prize
Zvi · 2024-10-10T13:40:07.286Z · comments (6)

[link] Safety tax functions
owencb · 2024-10-20T14:08:38.099Z · comments (0)

Some comments on intelligence
Viliam · 2024-08-01T15:17:07.215Z · comments (5)

Call for evaluators: Participate in the European AI Office workshop on general-purpose AI models and systemic risks
Tom DAVID (tom-david) · 2024-11-27T02:54:16.263Z · comments (0)

The Intentional Stance, LLMs Edition
Eleni Angelou (ea-1) · 2024-04-30T17:12:29.005Z · comments (3)

Disagreement on AGI Suggests It’s Near
tangerine · 2025-01-07T20:42:43.456Z · comments (15)

AI Safety Seed Funding Network - Join as a Donor or Investor
Alexandra Bos (AlexandraB) · 2024-12-16T19:30:43.812Z · comments (0)

I played the AI box game as the Gatekeeper — and lost
datawitch · 2024-02-12T18:39:35.777Z · comments (53)

Two Tales of AI Takeover: My Doubts
Violet Hour · 2024-03-05T15:51:05.558Z · comments (8)

Recent comments

raemon on Assume Bad Faith

I am sort of confused at what you got from this post. You say "But also, it doesn't do the core intellectual work of replacing a pointer with its substance.", but, I think he explicitly does that here?

What does "bad faith" mean, though? It doesn't mean "with ill intent." Following Wikipedia, bad faith is "a sustained form of deception which consists of entertaining or pretending to entertain one set of feelings while acting as if influenced by another." The great encyclopedia goes on to provide examples: the soldier who waves a flag of surrender but then fires when the enemy comes out of their trenches, the attorney who prosecutes a case she knows to be false, the representative of a company facing a labor dispute who comes to the negotiating table with no intent of compromising.
That is, bad faith is when someone's apparent reasons for doing something aren't the same as the real reasons. This is distinct from malign intent. The uniformed soldier who shoots you without pretending to surrender is acting in good faith, because what you see is what you get: the man whose clothes indicate that his job is to try to kill you is, in fact, trying to kill you.

It feels like you wanted some entirely different post, about how to navigate when someone accuses someone of bad faith, for a variety of possible definitions of bad faith? (As opposed to this post, which is mostly saying "Avoid accusing people of bad faith. Instead, do some kind of more specific and useful thing." Which honestly seems like good advice to me even for most people who are using the phrase to mean something else. "I disapprove of what you're doing here and am leaving now" seems totally fine)

sloonz on We Fell For It

I wish that both the "tip top" and the "medium top" of educated people had consensus that both "property is downstream of power" and "prices are downstream of supply and demand".

I really don’t like that quote. I can see multiple interpretations for "property is downstream of power", many of them subject to debate, and none with the firm grounding (and non-ambiguity) of Econ 101 enjoyed by "prices are downstream of supply and demand".

It looks to me like a false equivalence, trying to say "see? both sides clearly have a point", where one point is actually clear, and the other isn’t so clear.

Would you mind clarifying what exactly the consensus should be on this point? Maybe I’m just not interpreting "property is downstream of power" correctly.

david-scott-krueger-formerly-capybaralet on Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development

First, RE the role of "solving alignment" in this discussion, I just want to note that:
1) I disagree that alignment solves gradual disempowerment problems.
2) Even if it would, that does not imply that gradual disempowerment problems aren't important (since we can't assume alignment will be solved).
3) I'm not sure what you mean by "alignment is solved"; I'm taking it to mean "AI systems can be trivially intent aligned". Such a system may still say things like "Well, I can build you a successor that I think has only a 90% chance of being aligned, but will make you win (e.g. survive) if it is aligned. Is that what you want?" and people can respond with "yes" -- this is the sort of thing that probably still happens IMO.
4) Alternatively, you might say we're in the "alignment basin" -- I'm not sure what that means, precisely, but I would operationalize it as something like "the AI system is playing a roughly optimal CIRL game". It's unclear how good of performance that can yield in practice (e.g. it can't actually be optimal due to compute limitations), but I suspect it still leaves significant room for fuck-ups.
5) I'm more interested in the case where alignment is not "perfectly" "solved", and so there are simply clear and obvious opportunities to trade-off safety and performance; I think this is much more realistic to consider.
6) I expect such trade-off opportunities to persist when it comes to assurance (even if alignment is solved), since I expect high-quality assurance to be extremely costly. And it is irresponsible (because it's subjectively risky) to trust a perfectly aligned AI system absent strong assurances. But of course, people who are willing to YOLO it and just say "seems aligned, let's ship" will win. This is also part of the problem...

My main response, at a high level:
Consider a simple model:

We have 2 human/AI teams in competition with each other, A and B.
A and B both start out with the humans in charge, and then decide whether the humans should stay in charge for the next week.
Whichever group has more power at the end of the week survives.
The humans in A ask their AI to make A as powerful as possible at the end of the week.
The humans in B ask their AI to make B as powerful as possible at the end of the week, subject to the constraint that the humans in B are sure to stay in charge.

I predict that group A survives, but the humans are no longer in power. I think this illustrates the basic dynamic.
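
The simple model above can be sketched as a toy calculation. This is only an illustration under loud assumptions: the `Plan` type, the three plans, and their power scores are all hypothetical and not from the comment. The only general fact it relies on is that the best option under a constraint can never beat the best option without it; whether team A's winning plan actually removes the humans depends on the made-up numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    power: float            # hypothetical end-of-week power score
    humans_in_charge: bool  # whether the humans still hold control afterwards


# Hypothetical plans; the numbers are invented purely for illustration.
PLANS = [
    Plan("keep humans in the loop", power=1.0, humans_in_charge=True),
    Plan("partial delegation", power=1.5, humans_in_charge=True),
    Plan("full delegation to AI", power=2.0, humans_in_charge=False),
]


def best_plan(plans, require_humans_in_charge):
    """Return the highest-power plan, optionally restricted to plans
    that keep the humans in charge."""
    allowed = [
        p for p in plans
        if p.humans_in_charge or not require_humans_in_charge
    ]
    return max(allowed, key=lambda p: p.power)


team_a = best_plan(PLANS, require_humans_in_charge=False)  # unconstrained
team_b = best_plan(PLANS, require_humans_in_charge=True)   # constrained

# The group with more power survives (ties are not addressed in the comment;
# this sketch arbitrarily gives them to B).
survivor = "A" if team_a.power > team_b.power else "B"
print(survivor, team_a.name, team_a.humans_in_charge)
```

With these invented numbers, team A picks the plan that takes the humans out of control and ends the week with more power than team B. If the highest-power plan happened to keep humans in charge, the two teams would choose the same plan and the constraint would cost B nothing.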

Responding to some particular points below:

Sure, but these things don't result in non-human entities obtaining power right?

Yes, they do; they result in bureaucracies and automated decision-making systems obtaining power. People were already having to implement and interact with stupid automated decision-making systems before AI came along.

Like usually these are somewhat negative sum, but mostly just involve inefficient transfer of power. I don't see why these mechanisms would on net transfer power from human control of resources to some other control of resources in the long run. To consider the most extreme case, why would these mechanisms result in humans or human appointed successors not having control of what compute is doing in the long run?

My main claim was not that these are mechanisms of human disempowerment (although I think they are), but rather that they are indicators of the overall low level of functionality of the world.

teradimich on The Risk of Gradual Disempowerment from AI

But your P(doom) is still only 0.6? Or are you considering disempowerment from AI separately?

knight-lee on ozziegooen's Shortform

I do think that convincing the government to pause AI in a way which sacrifices $3000 billion economic value, is relatively easier than directly spending $3000 billion on AI safety.

Maybe spending $1 is similarly hard to sacrificing $10-$100 of future economic value via preemptive regulation.

But $0.1 billion AI safety spending is so ridiculously little (1000 times less than capabilities spending), increasing it may still be the "easiest" thing to do. Of course we should still push for regulation at the same time (it doesn't hurt).
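
The arithmetic behind these figures can be checked with a short sketch. Every number is taken from the comment itself; the variable names are made up here, and the 10x to 100x equivalence ratio is the commenter's tentative guess rather than an established figure.

```python
# All figures come from the comment above; the names are illustrative only.
SAFETY_SPENDING_MUSD = 100       # $0.1 billion, expressed in millions of dollars
CAPABILITIES_MULTIPLE = 1000     # "1000 times less than capabilities spending"
PAUSE_COST_BUSD = 3000           # $3000 billion of sacrificed economic value
RATIO_LOW, RATIO_HIGH = 10, 100  # guessed: $1 spent is as hard as $10-$100 sacrificed

# Capabilities spending implied by the "1000 times" claim, in billions.
implied_capabilities_busd = SAFETY_SPENDING_MUSD * CAPABILITIES_MULTIPLE // 1000

# Direct spending that would be "similarly hard" to the pause, in billions.
equivalent_spend_high_busd = PAUSE_COST_BUSD // RATIO_LOW
equivalent_spend_low_busd = PAUSE_COST_BUSD // RATIO_HIGH

print(f"Implied capabilities spending: ${implied_capabilities_busd}B")
print(f"Pause is as hard as spending ${equivalent_spend_low_busd}B "
      f"to ${equivalent_spend_high_busd}B directly")
```

Under the guessed ratio, a pause costing $3000 billion would be about as hard as $30 billion to $300 billion of direct spending, which is still far above the $0.1 billion figure for current safety spending.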

PS: what do you think of my open letter idea for convincing the government to increase funding?

unnamed on Assume Bad Faith

The "Taboo bad faith" title doesn't fit this post. I had hoped from the opening section that it was going in that direction, but it did not.

Most obviously, the post kept relying heavily on the terms "bad faith" and "good faith" and that conceptual distinction, rather than tabooing them.

But also, it doesn't do the core intellectual work of replacing a pointer with its substance. In the opening scenario where someone accuses their conversation partner of bad faith, conveying something along the lines of 'I disapprove of how you're approaching this conversation so I'm leaving', tabooing "bad faith" would mean articulating what pattern of behavior (they thought that) they saw and why disapproval & departure is an appropriate response. Zack doesn't try to do this, he just abandons this scenario to talk about other things involving his definition of "bad faith". (And similarly with "assume good faith".) I briefly hoped that the post would go in the "taboo your words" direction, describing what was happening in that sort of scenario with a clarity and precision that would make the label "bad faith" seem crude by comparison, but it did not.

This post also doesn't manage to avoid the main pitfall that tabooing a word is meant to prevent, where people talk past each other because they're using the same word with different definitions. Even though he says at the start of the post that other people are using the term "bad/good faith" wrong according to his understanding of the term, when he talks about the advice "assume good faith" he just plugs in his definition of "good faith" (and "assume") without noting that he's making an interpretation of what other people mean when they use the phrase and that they might mean something else. And similarly in other places like "being touchy about bad faith accusations seems counterproductive" and "the belief that persistent good faith disagreements are common would seem to be in bad faith". When someone says "you're acting in bad faith" are they claiming that you're showing the thing that Zack means by "bad faith"? Keeping that sort of thing straight is rationality 101 stuff that tabooing words helps with, and which this post repeatedly stumbles over.

paulbecon on DunCon @Lighthaven

I'm in! One suggestion. The URL for registering has de minimis info, https://www.havenbookings.space/events/Duncon

Ideally, that page should link to this one to help contextualize what's in the can

raemon on Wired on: "DOGE personnel with admin access to Federal Payment System"

I edited the OP but wanted to add for people who missed it (bold part is new)

(Please generally be cautious on LessWrong talking about politics. I am interested in people commenting here who have read the LessWrong Political Prerequisites sequence. I'll be deleting or at least unhesitatingly strong downvoting comments that seem to be doing unreflective partisan dunking)
((But, that's not meant to mean "don't talk about political actions." If this is as big a deal as it sounds, I want to be able to talk about "what to do?". But I want that talking-about-it to feel more like practically thinking through an action space, than blindly getting sucked into a political egregore))

ozziegooen on ozziegooen's Shortform

I'm not sure if it means much, but I'd be very happy if AI safety could get another $50B from smart donors today.

I'd flag that [stopping AI development] would cost far more than $50B. I'd expect that we could easily lose $3T of economic value in the next few years if AI progress seriously stopped.

I guess, it seems to me like duration is basically dramatically more expensive to get than funding, for amounts of funding people would likely want.

aram-panasenco on Deploying the Observer will save humanity from existential threats

From a software engineering perspective, misalignment is like a defect or a bug in software. Generally speaking, a piece of software that doesn't accept any user input is going to have fewer bugs than software that does. For a piece of software that doesn't accept any input or accepts some constrained user input, it's possible to formally prove that the software logic is correct. Think specialized software that controls nuclear power plants. To my knowledge, it's not possible to prove that software that accepts arbitrary unconstrained instructions from a user is defect free.

I claim that the Observer is the easiest ASI to align because it doesn't accept any instructions after it's been deployed and has a single very simple goal that avoids dealing with messy things like human happiness, human meaning, human intent, etc. I don't see how it could get simpler than that.