LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

DunCon @Lighthaven
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2024-09-29T04:56:27.205Z · comments (0)

[link] NAO Updates, Fall 2024
jefftk (jkaufman) · 2024-10-18T00:00:04.142Z · comments (2)

Apply to MATS 7.0!
Ryan Kidd (ryankidd44) · 2024-09-21T00:23:49.778Z · comments (0)

Book Review: What Even Is Gender?
Joey Marcellino · 2024-09-01T16:09:27.773Z · comments (14)

[question] What's the Deal with Logical Uncertainty?
Ape in the coat · 2024-09-16T08:11:43.588Z · answers+comments (23)

[link] Concrete benefits of making predictions
Jonny Spicer (jonnyspicer) · 2024-10-17T14:23:17.613Z · comments (5)

RLHF is the worst possible thing done when facing the alignment problem
tailcalled · 2024-09-19T18:56:27.676Z · comments (10)

[question] When is reward ever the optimization target?
Noosphere89 (sharmake-farah) · 2024-10-15T15:09:20.912Z · answers+comments (12)

[link] What is it like to be psychologically healthy? Podcast ft. DaystarEld
Chipmonk · 2024-10-05T19:14:04.743Z · comments (8)

[link] Epistemic states as a potential benign prior
Tamsin Leake (carado-1) · 2024-08-31T18:26:14.093Z · comments (2)

[link] Aaron Silverbook on anti-cavity bacteria
DanielFilan · 2023-11-20T03:06:19.524Z · comments (3)

Mapping the semantic void II: Above, below and between token embeddings
mwatkins · 2024-02-15T23:00:09.010Z · comments (4)

[link] Thoughts on Zero Points
depressurize (anchpop) · 2024-04-23T02:22:27.448Z · comments (1)

D&D.Sci (Easy Mode): On The Construction Of Impossible Structures [Evaluation and Ruleset]
abstractapplic · 2024-05-20T09:38:55.228Z · comments (2)

[link] A Narrative History of Environmentalism's Partisanship
Jeffrey Heninger (jeffrey-heninger) · 2024-05-14T16:51:01.029Z · comments (3)

[link] Anthropic, Google, Microsoft & OpenAI announce Executive Director of the Frontier Model Forum & over $10 million for a new AI Safety Fund
Zach Stein-Perlman · 2023-10-25T15:20:52.765Z · comments (8)

[link] Self-Resolving Prediction Markets
PeterMcCluskey · 2024-03-03T02:39:42.212Z · comments (0)

Retrospective: PIBBSS Fellowship 2023
DusanDNesic · 2024-02-16T17:48:32.151Z · comments (1)

[link] self-fulfilling prophecies when applying for funding
Chipmonk · 2024-03-01T19:01:40.991Z · comments (0)

How Would an Utopia-Maximizer Look Like?
Thane Ruthenis · 2023-12-20T20:01:18.079Z · comments (23)

[question] When did Eliezer Yudkowsky change his mind about neural networks?
[deactivated] (Yarrow Bouchard) · 2023-11-14T21:24:00.000Z · answers+comments (15)

Game Theory without Argmax [Part 2]
Cleo Nardo (strawberry calm) · 2023-11-11T16:02:41.836Z · comments (14)

Attention Output SAEs Improve Circuit Analysis
Connor Kissane (ckkissane) · 2024-06-21T12:56:07.969Z · comments (0)

Why wasn't preservation with the goal of potential future revival started earlier in history?
Andy_McKenzie · 2024-01-16T16:15:08.550Z · comments (1)

The Byronic Hero Always Loses
Cole Wyeth (Amyr) · 2024-02-22T01:31:59.652Z · comments (4)

Extracting SAE task features for in-context learning
Dmitrii Kharlapenko (dmitrii-kharlapenko) · 2024-08-12T20:34:13.747Z · comments (1)

A more systematic case for inner misalignment
Richard_Ngo (ricraz) · 2024-07-20T05:03:03.500Z · comments (4)

Good Bings copy, great Bings steal
dr_s · 2024-04-21T09:52:46.658Z · comments (6)

On "Geeks, MOPs, and Sociopaths"
alkjash · 2024-01-19T21:04:48.525Z · comments (35)

Quick evidence review of bulking & cutting
jp · 2024-04-04T21:43:48.534Z · comments (5)

[link] New report: A review of the empirical evidence for existential risk from AI via misaligned power-seeking
Harlan · 2024-04-04T23:41:26.439Z · comments (5)

Some Things That Increase Blood Flow to the Brain
romeostevensit · 2024-03-27T21:48:46.244Z · comments (14)

AI's impact on biology research: Part I, today
octopocta · 2023-12-23T16:29:18.056Z · comments (6)

Mentorship in AGI Safety (MAGIS) call for mentors
Valentin2026 (Just Learning) · 2024-05-23T18:28:03.173Z · comments (3)

Music in the AI World
Martin Sustrik (sustrik) · 2024-08-16T04:20:01.706Z · comments (8)

UDT1.01: Plannable and Unplanned Observations (3/10)
Diffractor · 2024-04-12T05:24:34.435Z · comments (0)

[link] introduction to thermal conductivity and noise management
bhauth · 2024-03-06T23:14:02.288Z · comments (1)

[LDSL#6] When is quantification needed, and when is it hard?
tailcalled · 2024-08-13T20:39:45.481Z · comments (0)

[link] [Linkpost] Statement from Scarlett Johansson on OpenAI's use of the "Sky" voice, that was shockingly similar to her own voice.
Linch · 2024-05-20T23:50:28.138Z · comments (8)

[LDSL#1] Performance optimization as a metaphor for life
tailcalled · 2024-08-08T16:16:27.349Z · comments (4)

On Not Requiring Vaccination
jefftk (jkaufman) · 2024-02-01T19:20:12.657Z · comments (21)

I was raised by devout Mormons, AMA [&|] Soliciting Advice
ErioirE (erioire) · 2024-03-13T16:52:19.130Z · comments (41)

Falling fertility explanations and Israel
Yair Halberstadt (yair-halberstadt) · 2024-04-03T03:27:38.564Z · comments (4)

AI Constitutions are a tool to reduce societal scale risk
Sammy Martin (SDM) · 2024-07-25T11:18:17.826Z · comments (2)

AI #74: GPT-4o Mini Me and Llama 3
Zvi · 2024-07-25T13:50:06.528Z · comments (6)

Adversarial Robustness Could Help Prevent Catastrophic Misuse
aogara (Aidan O'Gara) · 2023-12-11T19:12:26.956Z · comments (18)

The Third Gemini
Zvi · 2024-02-20T19:50:05.195Z · comments (2)

Some comments on intelligence
Viliam · 2024-08-01T15:17:07.215Z · comments (5)

[link] Evaluating Stability of Unreflective Alignment
james.lucassen · 2024-02-01T22:15:40.902Z · comments (3)

Interpreting the Learning of Deceit
RogerDearnaley (roger-d-1) · 2023-12-18T08:12:39.682Z · comments (14)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

buck on Catastrophic sabotage as a major threat model for human-level AI systems

Oh yeah sorry, I misunderstood that

austin-chen on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)

Mm I'm extremely skeptical that the inner experience of an EA college organizer or CEA groups team is usefully modeled as "I want recruits at all costs". I predict that if you talk to one and asking them about it, you'd find the same.

I do think that it's easy to accidentally goodhart or be unreflective about the outcomes of pursuing a particular policy -- but I'd encourage y'all to extend somewhat more charity to these folks, who I generally find to be very kind and well-intentioned.

lsusr on [Intuitive self-models] 6. Awakening / Enlightenment / PNSE

Thanks!

sodium on (Maybe) A Bag of Heuristics is All There Is & A Bag of Heuristics is All You Need

I agree that if you put more limitations on what heuristics are and how they compose, you end up with a stronger hypothesis. I think it's probably better to leave that out and try do some more empirical work before making a claim there though (I suppose you could say that the hypothesis isn't actually making a lot of concrete predictions yet at this stage).

I don't think (2) necessarily follows, but I do sympathize with your point that the post is perhaps a more specific version of the hypothesis that "we can understand neural network computation by doing mech interp."

vaishnav92 on Liquid vs Illiquid Careers

Paying attention to social capital seems like one risk management mechanism. I try to ask - what sort of people is this likely to put me in tething alonouch with, and in what way? Will this increase the surface area of people to whom I can showcase my strengths and build relationships with ? I wrote something along these lines here (in the context of evaluating startups as an employee) - https://vaishnavsunil.substack.com/p/from-runway-to-career-capital-a-framework. Would be keen to hear what you think if you end up reading.

trevorone on Overcoming Bias Anthology

I notice that there's just shy of 128 here and they're mostly pretty short, so you can start the day by flipping a coin 7 times to decide which one to read. Not a bisection search, just convert the seven flips to binary and pick the corresponding number. At first, you only have to start over and do another 7 flips if you land on 1111110 (126), 1111111 (127), or 0000000 (128).

If you drink coffee in the morning, this is a way better way to start the day than social media, as the early phase of the stimulant effect reinforces behavior in most people. Hanson's approach to various topics is a good mentality to try boosting this way.

drake-thomas on Drake Thomas's Shortform

I work on a capabilities team at Anthropic, and I've spent (and spend) a while thinking about whether that's good for the world and which kinds of observations could update me up or down about it. This is an open offer to chat with anyone else trying to figure out questions of working on capability-advancing work at a frontier lab! I can be reached at "graham's number is big" sans spaces at gmail.

lincolnquirk on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)

I think AIS might have been what poisoned EA? The global development people seem much more grounded (to this day), and AFAIK the ponzi scheme recruiting is all aimed at AIS and meta

I agree, am fairly worried about AI safety taking over too much of EA. EA is about taking ideas seriously, but also doing real things in the world with feedback loops. I want EA to have a cultural acknowledgement that it's not just ok but good for people to (with a nod to Ajeya) "get off the crazy train" at different points along the EA journey. We currently have too many people taking it all the way into AI town. I again don't know what to do to fix it.

lincolnquirk on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)

(Commenting as myself, not representing any org)

Thanks Elizabeth and Timothy for doing this! Lots of valuable ideas in this transcript.

I felt excited, sad, and also a bit confused, since it feels both slightly resonant but also somewhat disconnected from my experience of EA. Resonant because I agree with the college-recruiting and epistemic aspects of your critiques. Disconnected, because while collectively the community doesn't seem to be going in the direction that I would hope, I do see many individuals in EA leadership positions who I deeply respect and trust to have good individual views and good process and I'm sad you don't see them (maybe they are people who aren't at their best online, and mostly aren't in the Bay).

I am pretty worried about the Forum and social media more broadly. We need better forms of engagement online - like this article + your other critiques. In the last few years, it's become clearer and clearer to me that EA's online strategy is not really serving the community well. If I knew what the right strategy was, I would try to nudge it. Regardless I still see lots of good in EA's work and overall trajectory.

[my critiques] dropped like a stone through water

I dispute this. Maybe you just don't see the effects yet? It takes a long time for things to take effect, even internally in places you wouldn't have access to, and even longer for them to be externally visible. Personally, I read approximately everything you (Elizabeth) write on the Forum and LW, and occasionally cite it to others in EA leadership world. That's why I'm pretty sure your work has had nontrivial impact. I am not too surprised that its impact hasn't become apparent to you though.

Personally, I'm still struggling with my own relationship to EA. I've been on the EV board for a year+ - an influential role at the most influential meta org - and I don't understand how to use this role to impact EA. I see the problems more clearly than I did before, which is great, but I don't see solutions or great ways forward yet, and I sense that nobody really does. We're mostly working on stuff to stay afloat rather than high level navigation.

I liked Zach's recent talk/Forum post about EA's commitment to principles first [EA · GW]. I hope this is at least a bit hope-inspiring, since I get the sense that a big part of your critique is that EA has lost its principles.

davekasten on Akash's Shortform

Wild speculation: they also have a sort of we're-watching-but-unsure provision about cyber operations capability in their most recent RSP update. In it, they say in part that "it is also possible that by the time these capabilities are reached, there will be evidence that such a standard is not necessary (for example, because of the potential use of similar capabilities for defensive purposes)." Perhaps they're thinking that automated vulnerability discovery is at least plausibly on-net-defensive-balance-favorable*, and so they aren't sure it should be regulated as closely, even if in still in some informal sense "dual use" ?

Again, WILD speculation here.

*A claim that is clearly seen as plausible by, e.g., the DARPA AI Grand Challenge effort.