LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

next page (older posts) →

[link] My hour of memoryless lucidity
Eric Neyman (UnexpectedValues) · 2024-05-04T01:40:56.717Z · comments (23)

[link] Ilya Sutskever and Jan Leike resign from OpenAI
Zach Stein-Perlman · 2024-05-15T00:45:02.436Z · comments (76)

[link] Introducing AI Lab Watch
Zach Stein-Perlman · 2024-04-30T17:00:12.652Z · comments (25)

Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-04-30T18:51:13.493Z · comments (37)

Ironing Out the Squiggles
Zack_M_Davis · 2024-04-29T16:13:00.371Z · comments (34)

Dyslucksia
Shoshannah Tekofsky (DarkSym) · 2024-05-09T19:21:33.874Z · comments (42)

Deep Honesty
Aletheophile (aletheo) · 2024-05-07T20:31:48.734Z · comments (26)

Do you believe in hundred dollar bills lying on the ground? Consider humming
Elizabeth (pktechgirl) · 2024-05-16T00:00:05.257Z · comments (10)

[question] Which skincare products are evidence-based?
Vanessa Kosoy (vanessa-kosoy) · 2024-05-02T15:22:12.597Z · answers+comments (43)

[link] introduction to cancer vaccines
bhauth · 2024-05-05T01:06:16.972Z · comments (19)

Why I'm doing PauseAI
Joseph Miller (Josephm) · 2024-04-30T16:21:54.156Z · comments (16)

DeepMind's "Frontier Safety Framework" is weak and unambitious
Zach Stein-Perlman · 2024-05-18T03:00:13.541Z · comments (8)

Explaining a Math Magic Trick
Robert_AIZI · 2024-05-05T19:41:52.048Z · comments (10)

Key takeaways from our EA and alignment research surveys
Cameron Berg (cameron-berg) · 2024-05-03T18:10:41.416Z · comments (10)

[link] Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant
Olli Järviniemi (jarviniemi) · 2024-05-06T07:07:05.019Z · comments (4)

We might be missing some key feature of AI takeoff; it'll probably seem like "we could've seen this coming"
Lukas_Gloor · 2024-05-09T15:43:11.490Z · comments (35)

[link] "AI Safety for Fleshy Humans" an AI Safety explainer by Nicky Case
habryka (habryka4) · 2024-05-03T18:10:12.478Z · comments (10)

[link] MIRI's May 2024 Newsletter
Harlan · 2024-05-15T00:13:30.153Z · comments (1)

Teaching CS During Take-Off
andrew carle (andrew-carle) · 2024-05-14T22:45:39.447Z · comments (10)

ACX Covid Origins Post convinced readers
ErnestScribbler · 2024-05-01T13:06:20.818Z · comments (7)

Language Models Model Us
eggsyntax · 2024-05-17T21:00:34.821Z · comments (13)

MATS Winter 2023-24 Retrospective
Rocket (utilistrutil) · 2024-05-11T00:09:17.059Z · comments (28)

Q&A on Proposed SB 1047
Zvi · 2024-05-02T15:10:02.916Z · comments (6)

AXRP Episode 31 - Singular Learning Theory with Daniel Murfet
DanielFilan · 2024-05-07T03:50:05.001Z · comments (4)

[link] Advice for Activists from the History of Environmentalism
Jeffrey Heninger (jeffrey-heninger) · 2024-05-16T18:40:02.064Z · comments (5)

[link] Environmentalism in the United States Is Unusually Partisan
Jeffrey Heninger (jeffrey-heninger) · 2024-05-13T21:23:10.755Z · comments (11)

[link] My thesis (Algorithmic Bayesian Epistemology) explained in more depth
Eric Neyman (UnexpectedValues) · 2024-05-09T19:43:16.543Z · comments (4)

Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers
hugofry · 2024-04-29T20:57:35.127Z · comments (7)

Questions for labs
Zach Stein-Perlman · 2024-04-30T22:15:55.362Z · comments (10)

LessWrong Community Weekend 2024, open for applications
UnplannedCauliflower · 2024-05-01T10:18:21.992Z · comments (2)

Introducing AI-Powered Audiobooks of Rational Fiction Classics
Askwho · 2024-05-04T17:32:49.719Z · comments (13)

AISafety.com – Resources for AI Safety
Søren Elverlin (soren-elverlin-1) · 2024-05-17T15:57:11.712Z · comments (2)

AISC9 has ended and there will be an AISC10
Linda Linsefors · 2024-04-29T10:53:18.812Z · comments (4)

How to be an amateur polyglot
arisAlexis (arisalexis) · 2024-05-08T15:08:11.404Z · comments (16)

[link] DeepMind: Frontier Safety Framework
Zach Stein-Perlman · 2024-05-17T17:30:02.504Z · comments (0)

Transcoders enable fine-grained interpretable circuit analysis for language models
Jacob Dunefsky (jacob-dunefsky) · 2024-04-30T17:58:09.982Z · comments (14)

[link] How do open AI models affect incentive to race?
jessicata (jessica.liu.taylor) · 2024-05-07T00:33:20.658Z · comments (13)

Apply to ESPR & PAIR, Rationality and AI Camps for Ages 16-21
Anna Gajdova (anna-gajdova) · 2024-05-03T12:36:37.610Z · comments (0)

[question] Shane Legg's necessary properties for every AGI Safety plan
jacquesthibs (jacques-thibodeau) · 2024-05-01T17:15:41.233Z · answers+comments (12)

Now THIS is forecasting: understanding Epoch’s Direct Approach
Elliot_Mckernon (elliot) · 2024-05-04T12:06:48.144Z · comments (4)

[link] Questions are usually too cheap
Nathan Young · 2024-05-11T13:00:54.302Z · comments (19)

[link] OpenAI releases GPT-4o, natively interfacing with text, voice and vision
Martín Soto (martinsq) · 2024-05-13T18:50:52.337Z · comments (23)

some thoughts on LessOnline
Raemon · 2024-05-08T23:17:41.372Z · comments (5)

Towards a formalization of the agent structure problem
Alex_Altair · 2024-04-29T20:28:15.190Z · comments (4)

Can we build a better Public Doublecrux?
Raemon · 2024-05-11T19:21:53.326Z · comments (7)

Why Care About Natural Latents?
johnswentworth · 2024-05-09T23:14:30.626Z · comments (3)

[link] Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Gunnar_Zarncke · 2024-05-16T13:09:39.265Z · comments (4)

Observations on Teaching for Four Weeks
ClareChiaraVincent · 2024-05-06T16:55:59.315Z · comments (14)

Catastrophic Goodhart in RL with KL penalty
Thomas Kwa (thomas-kwa) · 2024-05-15T00:58:20.763Z · comments (7)

[link] Designing for a single purpose
Itay Dreyfus (itay-dreyfus) · 2024-05-07T14:11:22.242Z · comments (12)

next page (older posts) →

Archive

Recent comments

gilch on "If we go extinct due to misaligned AI, at least nature will continue, right? ... right?"

Humans instinctively like things like flowers and birdsong, because it meant a fertile area with food to our ancestors. We literally depended on Nature for our survival, and despite intensive agriculture, we aren't independent from it yet.

tamsin-leake on Tamsin Leake's Shortform

I believe that ChatGPT was not released with the expectation that it would become as popular as it did.

Well, even if that's true, causing such an outcome by accident should still count as evidence of vast irresponsibility imo.

akash-wasil on Akash's Shortform

Agreed— my main point here is that the marketplace of ideas undervalues criticism.

I think one perspective could be “we should all just aim to do objective truth-seeking”, and as stated I agree with it.

The main issue with that frame, imo, is that it’s very easy to forget that the epistemic environment can be tilted in favor of certain perspectives.

EG I think it can be useful for “objective truth-seeking efforts” to be aware of some of the culture/status games that underincentivize criticism of labs & amplify lab-friendly perspectives.

benito on Stephen Fowler's Shortform

Not OP, but I take the claim to be "endorsing getting into bed with companies on-track to make billions of dollars profiting from risking the extinction of humanity in order to nudge them a bit, is in retrospect an obviously doomed strategy, and yet many self-identified effective altruists trusted their leadership to have secret good reasons for doing so and followed them in supporting the companies (e.g. working there for years including in capabilities roles and also helping advertise the company jobs). now that a new consensus is forming that it indeed was obviously a bad strategy, it is also time to have evaluated the leadership's decision as bad at the time of making the decision and impose costs on them accordingly, including loss of respect and power".

So no, not disincentivizing making positive EV bets, but updating about the quality of decision-making that has happened in the past.

habryka4 on simeon_c's Shortform

Sure, I'll try to post here if I know of a clear opportunity to donate to either comes up.

zach-stein-perlman on DeepMind's "Frontier Safety Framework" is weak and unambitious

Sorry for brevity.

We just disagree. E.g. you "walked away with a much better understanding of how OpenAI plans to evaluate & handle risks than how Anthropic plans to handle & evaluate risks"; I felt like Anthropic was thinking about everything well.

I think Anthropic's ASL-3 is reasonable and OpenAI's thresholds and corresponding commitments are unreasonable. If the ASL-4 threshold was high or commitments are poor such that ASL-4 was meaningless, I agree Anthropic's RSP would be at least as bad as OpenAI's.

One thing I think is a big deal: Anthropic's RSP treats internal deployment like external deployment; OpenAI's has almost no protections for internal deployment.

I agree "an initial RSP that mostly spells out high-level reasoning, makes few hard commitments, and focuses on misuse while missing the all-important evals and safety practices for ASL-4" is also a fine characterization of Anthropic's current RSP.

edit: or, like, pf thresholds too high, so pf seems doomed / not on track, but rsp:v1 is consistent with rspv:1.1 being great. At least Anthropic knows and says there’s a big hole. That’s not relevant to evaluating labs’ current commitments but is very relevant to predicting.

vladimir_nesov on What Are Non-Zero-Sum Games?—A Primer

See "Zero Sum" is a misnomer [LW · GW], shifting and rescaling of utility functions breaks formulations that simply ask to take a sum of payoffs, but we can rescue the concept to mean that all outcomes/strategies of the game are Pareto efficient.

"Positive sum" seems to be about Kaldor-Hicks efficiency, strategies where in principle there is a post-game redistribution of resources that would turn the strategies Pareto efficient, but there is no commitment or possibly even practical feasibility to actually perform the redistribution. This hypothetical redistribution step takes care of comparing utilities of different players. A whole game/interaction/project would then be "positive-sum" if each outcome/strategy is equivalent to some Pareto efficient hypothetical "outcome" via a redistribution.

yonatan-cale-1 on simeon_c's Shortform

@habryka [LW · GW] , Would you reply to this comment if there's an opportunity to donate to either? Me and another person are interested, and others could follow this comment too if they wanted to

(only if it's easy for you, I don't want to add an annoying task to your plate)

zach-stein-perlman on Akash's Shortform

Sorry for brevity, I'm busy right now.

Noticing good stuff labs do, not just criticizing them, is often helpful. I wish you thought of this work more as "evaluation" than "criticism."
It's often important for evaluation to be quite truth-tracking. Criticism isn't obviously good by default.

akash-wasil on Akash's Shortform

My current perspective is that criticism of AGI labs is an under-incentivized public good. I suspect there's a disproportionate amount of value that people could have by evaluating lab plans, publicly criticizing labs when they break commitments or make poor arguments, talking to journalists/policymakers about their concerns, etc.

Some quick thoughts:

Soft power– I think people underestimate the how strong the "soft power" of labs is, particularly in the Bay Area.
Jobs– A large fraction of people getting involved in AI safety are interested in the potential of working for a lab one day. There are some obvious reasons for this– lots of potential impact from being at the organizations literally building AGI, big salaries, lots of prestige, etc.
- People (IMO correctly) perceive that if they acquire a reputation for being critical of labs, their plans, or their leadership, they will essentially sacrifice the ability to work at the labs.
- So you get an equilibrium where the only people making (strong) criticisms of labs are those who have essentially chosen to forgo their potential of working there.
Money– The labs and Open Phil (which has been perceived, IMO correctly, as investing primarily into metastrategies that are aligned with lab interests) have an incredibly large share of the $$$ in the space. When funding became more limited, this became even more true, and I noticed a very tangible shift in the culture & discourse around labs + Open Phil
Status games//reputation– Groups who were more inclined to criticize labs and advocate for public or policymaker outreach were branded as “unilateralist”, “not serious”, and “untrustworthy” in core EA circles. In many cases, there were genuine doubts about these groups, but my impression is that these doubts got amplified/weaponized in cases where the groups were more openly critical of the labs.
Subjectivity of "good judgment"– There is a strong culture of people getting jobs/status for having “good judgment”. This is sensible insofar as we want people with good judgment (who wouldn’t?) but this often ends up being so subjective that it ends up leading to people being quite afraid to voice opinions that go against mainstream views and metastrategies (particularly those endorsed by labs + Open Phil).
Anecdote– Personally, I found my ability to evaluate and critique labs + mainstream metastrategies substantially improved when I spent more time around folks in London and DC (who were less closely tied to the labs). In fairness, I suspect that if I had lived in London or DC *first* and then moved to the Bay Area, it’s plausible I would’ve had a similar feeling but in the “reverse direction”.

With all this in mind, I find myself more deeply appreciating folks who have publicly and openly critiqued labs, even in situations where the cultural and economic incentives to do so were quite weak (relative to staying silent or saying generic positive things about labs).

Examples: Habryka, Rob Bensinger, CAIS, MIRI, Conjecture, and FLI. More recently, @Zach Stein-Perlman [LW · GW], and of course Jan Leike and Daniel K.