LessWrong 2.0 Reader

[link] Google Gemini Announced
Jacob G-W (g-w1) · 2023-12-06T16:14:07.192Z · comments (22)

On Anthropic’s Sleeper Agents Paper
Zvi · 2024-01-17T16:10:05.145Z · comments (5)

[link] Come to Manifest 2024 (June 7-9 in Berkeley)
Saul Munn (saul-munn) · 2024-03-27T21:30:17.306Z · comments (2)

[link] Unlocking Solutions—By Understanding Coordination Problems
James Stephen Brown (james-brown) · 2024-07-27T04:52:13.435Z · comments (4)

[link] Land Reclamation is in the 9th Circle of Stagnation Hell
Maxwell Tabarrok (maxwell-tabarrok) · 2024-01-12T13:36:27.159Z · comments (6)

[link] How Likely Are Various Precursors of Existential Risk?
NunoSempere (Radamantis) · 2024-10-28T13:27:31.620Z · comments (4)

[question] Could orcas be (trained to be) smarter than humans? 
Towards_Keeperhood (Simon Skade) · 2024-11-04T23:29:26.677Z · answers+comments (11)

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
leogao · 2023-12-16T05:39:10.558Z · comments (5)

Thiel on AI & Racing with China
Ben Pace (Benito) · 2024-08-20T03:19:18.966Z · comments (10)

[link] On the Role of Proto-Languages
adamShimi · 2024-09-22T16:50:34.720Z · comments (1)

[link] the micro-fulfillment cambrian explosion
bhauth · 2023-12-04T01:15:34.342Z · comments (5)

[Closed] PIBBSS is hiring in a variety of roles (alignment research and incubation program)
Nora_Ammann · 2024-04-09T08:12:59.241Z · comments (0)

Win/continue/lose scenarios and execute/replace/audit protocols
Buck · 2024-11-15T15:47:24.868Z · comments (2)

Trading off Lives
jefftk (jkaufman) · 2024-01-03T03:40:05.603Z · comments (12)

Fat Tails Discourage Compromise
niplav · 2024-06-17T09:39:16.489Z · comments (5)

AI #50: The Most Dangerous Thing
Zvi · 2024-02-08T14:30:13.168Z · comments (4)

Calendar feature geometry in GPT-2 layer 8 residual stream SAEs
Patrick Leask (patrickleask) · 2024-08-17T01:16:53.764Z · comments (0)

[link] Breaking Circuit Breakers
mikes · 2024-07-14T18:57:20.251Z · comments (13)

Per protocol analysis as medical malpractice
braces · 2024-01-31T16:22:21.367Z · comments (8)

A D&D.Sci Dodecalogue
abstractapplic · 2024-04-12T01:10:01.625Z · comments (0)

We are headed into an extreme compute overhang
devrandom · 2024-04-26T21:38:21.694Z · comments (33)

[question] Can we get an AI to "do our alignment homework for us"?
Chris_Leong · 2024-02-26T07:56:22.320Z · answers+comments (33)

Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.
Andrew_Critch · 2024-09-11T04:41:24.872Z · comments (11)

A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps
Linch · 2024-11-18T00:44:57.133Z · comments (2)

AI #40: A Vision from Vitalik
Zvi · 2023-11-30T17:30:08.350Z · comments (12)

Causal Graphs of GPT-2-Small's Residual Stream
David Udell · 2024-07-09T22:06:55.775Z · comments (7)

[link] S-Risks: Fates Worse Than Extinction
aggliu · 2024-05-04T15:30:36.666Z · comments (2)

2022 (and All Time) Posts by Pingback Count
Raemon · 2023-12-16T21:17:00.572Z · comments (14)

AI #71: Farewell to Chevron
Zvi · 2024-07-04T13:40:05.905Z · comments (9)

Be More Katja
Nathan Young · 2024-03-11T21:12:14.249Z · comments (0)

Neuroscience of human social instincts: a sketch
Steven Byrnes (steve2152) · 2024-11-22T16:16:52.552Z · comments (0)

AI #76: Six Shorts Stories About OpenAI
Zvi · 2024-08-08T13:50:04.659Z · comments (10)

[link] LLMs seem (relatively) safe
JustisMills · 2024-04-25T22:13:06.221Z · comments (24)

Acting Wholesomely
owencb · 2024-02-26T21:49:16.526Z · comments (64)

[link] The Mysterious Trump Buyers on Polymarket
Annapurna (jorge-velez) · 2024-10-18T13:26:25.565Z · comments (9)

Two LessWrong speed friending experiments
mikko (morrel) · 2024-06-15T10:52:26.081Z · comments (3)

How to Give in to Threats (without incentivizing them)
Mikhail Samin (mikhail-samin) · 2024-09-12T15:55:50.384Z · comments (26)

[question] If I wanted to spend WAY more on AI, what would I spend it on?
Logan Zoellner (logan-zoellner) · 2024-09-15T21:24:46.742Z · answers+comments (16)

Pseudonymity and Accusations
jefftk (jkaufman) · 2023-12-21T19:20:19.944Z · comments (20)

AI #43: Functional Discoveries
Zvi · 2023-12-21T15:50:04.442Z · comments (26)

Was Releasing Claude-3 Net-Negative?
Logan Riggs (elriggs) · 2024-03-27T17:41:56.245Z · comments (5)

Anthropical Paradoxes are Paradoxes of Probability Theory
Ape in the coat · 2023-12-06T08:16:26.846Z · comments (18)

BatchTopK: A Simple Improvement for TopK-SAEs
Bart Bussmann (Stuckwork) · 2024-07-20T02:20:51.848Z · comments (0)

The case for stopping AI safety research
catubc (cat-1) · 2024-05-23T15:55:18.713Z · comments (38)

Announcing the Double Crux Bot
sanyer (santeri-koivula) · 2024-01-09T18:54:15.361Z · comments (8)

Reflections on my first year of AI safety research
Jay Bailey · 2024-01-08T07:49:08.147Z · comments (3)

Schelling points in the AGI policy space
mesaoptimizer · 2024-06-26T13:19:25.186Z · comments (2)

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)
Joe Carlsmith (joekc) · 2024-10-28T21:57:12.063Z · comments (5)

Can we build a better Public Doublecrux?
Raemon · 2024-05-11T19:21:53.326Z · comments (6)

Gradient Descent on the Human Brain
Jozdien · 2024-04-01T22:39:24.862Z · comments (5)

Recent comments

david-matolcsi on "The Solomonoff Prior is Malign" is a special case of a simpler argument

Even assuming that the simulators have wildly different values, why would doing something insane be a good thing to do?

steve2152 on Crosspost: Developing the middle ground on polarized topics

(Fun tangent, not directly addressing this argument thread.)

There’s a trio of great posts from 2015 by @JonahS: The Truth About Mathematical Ability; Innate Mathematical Ability; and Is Scott Alexander bad at math?, which (among other things) argue that you can be “good at math” along the dimension(s) of noticing patterns very quickly, AND/OR you can be “good at math” along the dimension(s) of an “aesthetic” sense for concepts being right and sensible. (My summary, not his.)

The “aesthetics” is sorta a loss function that provides a guidestar for developing good deep novel understanding—but that process may take a very long time. He offers Scott Alexander, and himself, and Alexander Grothendieck as examples of people with lopsided profiles—stronger on “aesthetics” than they are on “fast pattern-recognition”.

I found it a thought-provoking hypothesis. I wish JonahS had written more.

ozyrus on Are You More Real If You're Really Forgetful?

There are more bullets to bite that I have personally thought of but never wrote up because they lean too much into "crazy" territory. Is there any place other than LessWrong to discuss this anthropic rabbit hole?

adamshimi on Please don't throw your mind away

I remember reading this post, and really disliking it.

Then today, as I was reflecting on things, I recalled that this existed, and went back to read it. And this time, my reaction was instead "yep, that's pointing to the mental move that I've lost and that I'm now trying to relearn".

Which is interesting. Because that means that from a year or two ago up till now, I was the kind of person who would benefit from this post; yet I couldn't get the juice out of it. I think a big reason is that while the description of the play/fun mental move is good and clear, the description of the opposite mental move, the one short-circuiting play/fun, felt very caricatured and fake.

My conjecture (though beware mind fallacy) is that it's because you emphasize "naive deference" to others, which looks obviously wrong to me and obviously not what most people I know who suffer from this tend to do (but might be representative of the people you actually met).

Instead, the mental move that I know intimately is what I call "instrumentalization" (or to be more memey, "tyranny of whys"). It's a move that doesn't require another or a social context (though it often includes internalized social judgements from others, aka superego); it only requires caring deeply about a goal (the goal doesn't actually matter that much), and being invested in it, somewhat neurotically.

Then, the move is that whenever a new, curious, fun, unexpected idea pops up, it almost instantly hits a filter: is this useful to reach the goal?

Obviously this filter removes almost all ideas, but even the ones it lets through don't survive unharmed: they get trimmed, twisted, and simplified to fit the goal, to actually sound like they're going to help with the goal. And then in my personal case, all ideas start feeling like shoulds, like weight and responsibility and obligations.

Anyway, I do like this post now, and I am trying to relearn how to use the "play" mental move without instrumentalizing everything away.

dalcy on Open Thread Summer 2024

Thank you! I tried it on this post and while the post itself is pretty short, the raw content that I get seems to be extremely long (making it larger than the o1 context window, for example), with a bunch of font-related information in between. Is there a way to fix this?

mako-yass on Are You More Real If You're Really Forgetful?

Disidentifying the consciousness from the body/shadow/subconscious it belongs to and is responsible for coordinating and speaking for, like many of the things some meditators do, wouldn't be received well by the shadow, and I'd expect it to result in decreased introspective access and control. So, psychonauts be warned.

milan-w on Yonatan Cale's Shortform

A word of caution about interpreting results from these evals:

Sometimes, depending on social context, it's fine to be kind of a jerk if it's in the context of a game. Crucially, LLMs know that Minecraft is a game. Granted, the default Assistant personas implemented in RLHF'd LLMs don't seem like the type of Minecraft player to pull pranks of their own accord. Still, it's a factor to keep in mind for evals that stray a bit more off-distribution from the "request-assistance" setup typical of the expected use cases of consumer LLMs.

hastings-greer on Cole Wyeth's Shortform

Beauty of notation is an optimization target and so should fail as a metric, but in my experience it seems to hold up, especially compared to other optimization targets I’ve pushed on. The exceptions appear to be string theory and category theory, and two failures in a field the size of math is not so bad.

alexey on A superficially plausible promising alternate Earth without lockstep

Bywayeans are pretty censorious and scrupulous about violations of the NAP

Except against people who enjoy sunsets, apparently?

sharmake-farah on What are the best arguments for/against AIs being "slightly 'nice'"?

In retrospect, I am more pessimistic that an AI having small amounts of niceness would be enough to keep humans alive, and I now think that some stronger form of alignment than pseudokindness is necessary for humans to survive with AI (though maybe not as strong as MIRI thinks). This is essentially because niceness to humans requires giving up opportunities to save compute on modeling the world, which is anti-incentivized by AI companies:

https://www.lesswrong.com/posts/xvBZPEccSfM8Fsobt/what-are-the-best-arguments-for-against-ais-being-slightly#wy9cSASwJCu7bjM6H