LessWrong 2.0 Reader

← previous page (newer posts) · next page (older posts) →

[link] Review: Breaking Free with Dr. Stone
TurnTrout · 2024-12-18T01:26:37.730Z · comments (5)

[link] Careless thinking: A theory of bad thinking
Nathan Young · 2024-12-17T18:23:16.140Z · comments (17)

[link] cancer rates after gene therapy
bhauth · 2024-10-16T15:32:53.949Z · comments (0)

Greedy-Advantage-Aware RLHF
sej2020 · 2024-12-27T19:47:25.562Z · comments (15)

Cognitive Biases Contributing to AI X-risk — a deleted excerpt from my 2018 ARCHES draft
Andrew_Critch · 2024-12-03T09:29:49.745Z · comments (2)

Dave Kasten's AGI-by-2027 vignette
davekasten · 2024-11-26T23:20:47.212Z · comments (8)

Role embeddings: making authorship more salient to LLMs
Nina Panickssery (NinaR) · 2025-01-07T20:13:16.677Z · comments (0)

Looking back on the Future of Humanity Institute - Asterisk
jakeeaton · 2024-11-19T00:44:40.928Z · comments (0)

[link] Active Recall and Spaced Repetition are Different Things
Saul Munn (saul-munn) · 2024-11-08T20:14:56.092Z · comments (2)

Live Machinery: An Interface Design Philosophy for Wholesome AI Futures
Sahil · 2024-11-01T17:24:09.957Z · comments (3)

An alternative approach to superbabies
Towards_Keeperhood (Simon Skade) · 2024-11-05T22:56:15.740Z · comments (19)

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
aphyer · 2024-10-29T01:21:03.075Z · comments (13)

[link] Analyzing how SAE features evolve across a forward pass
bensenberner · 2024-11-07T22:07:02.827Z · comments (0)

AI Safety as a YC Startup
Lukas Petersson (lukas-petersson-1) · 2025-01-08T10:46:29.042Z · comments (7)

[link] What Ketamine Therapy Is Like
Sable · 2024-11-11T11:09:08.602Z · comments (8)

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams · 2024-11-07T15:39:06.854Z · comments (6)

Book a Time to Chat about Interp Research
Logan Riggs (elriggs) · 2024-12-03T17:27:46.808Z · comments (3)

[link] Discursive Warfare and Faction Formation
Benquo · 2025-01-09T16:47:31.824Z · comments (2)

[link] Epistemic status: poetry (and other poems)
Richard_Ngo (ricraz) · 2024-11-21T18:13:17.194Z · comments (5)

AI #91: Deep Thinking
Zvi · 2024-11-21T14:30:06.930Z · comments (10)

[link] A dataset of questions on decision-theoretic reasoning in Newcomb-like problems
Caspar Oesterheld (Caspar42) · 2024-12-16T22:42:03.763Z · comments (1)

Considerations on orca intelligence
Towards_Keeperhood (Simon Skade) · 2024-12-29T14:35:16.445Z · comments (5)

Anthropic rewrote its RSP
Zach Stein-Perlman · 2024-10-15T14:25:12.518Z · comments (19)

The Shallow Bench
Karl Faulks (karl-faulks) · 2024-11-05T05:07:27.357Z · comments (5)

AI #88: Thanks for the Memos
Zvi · 2024-10-31T15:00:07.412Z · comments (5)

AI as a powerful meme, via CGP Grey
TheManxLoiner · 2024-10-30T18:31:58.544Z · comments (8)

[link] The Deep Lore of LightHaven, with Oliver Habryka (TBC episode 228)
Eneasz · 2024-12-24T22:45:50.065Z · comments (4)

~80 Interesting Questions about Foundation Model Agent Safety
RohanS · 2024-10-28T16:37:04.713Z · comments (4)

AI #97: 4
Zvi · 2025-01-02T14:10:06.505Z · comments (4)

Minimal Motivation of Natural Latents
johnswentworth · 2024-10-14T22:51:58.125Z · comments (14)

Detection of Asymptomatically Spreading Pathogens
jefftk (jkaufman) · 2024-12-05T18:20:02.473Z · comments (7)

Motivation control
Joe Carlsmith (joekc) · 2024-10-30T17:15:50.881Z · comments (7)

Monthly Roundup #24: November 2024
Zvi · 2024-11-18T13:20:06.086Z · comments (14)

Implications of the AI Security Gap
Dan Braun (dan-braun-1) · 2025-01-08T08:31:36.789Z · comments (0)

[link] Conjecture: A Roadmap for Cognitive Software and A Humanist Future of AI
Connor Leahy (NPCollapse) · 2024-12-02T13:28:57.977Z · comments (10)

Start an Upper-Room UV Installation Company?
jefftk (jkaufman) · 2024-10-19T02:00:10.691Z · comments (9)

[link] The Choice Transition
owencb · 2024-11-18T12:30:56.198Z · comments (4)

[link] Preference Inversion
Benquo · 2025-01-02T18:15:52.938Z · comments (44)

[link] Dangerous capability tests should be harder
LucaRighetti (Error404Dinosaur) · 2024-11-21T17:20:50.610Z · comments (3)

[link] Literacy Rates Haven't Fallen By 20% Since the Department of Education Was Created
Maxwell Tabarrok (maxwell-tabarrok) · 2024-11-22T20:53:59.007Z · comments (0)

Preppers Are Too Negative on Objects
jefftk (jkaufman) · 2024-12-18T02:30:01.854Z · comments (2)

Claude's Constitutional Consequentialism?
1a3orn · 2024-12-19T19:53:33.254Z · comments (6)

[link] Oppression and production are competing explanations for wealth inequality.
Benquo · 2025-01-05T14:13:15.398Z · comments (15)

Analysis of Global AI Governance Strategies
Sammy Martin (SDM) · 2024-12-04T10:45:25.311Z · comments (10)

[link] Review: Good Strategy, Bad Strategy
L Rudolf L (LRudL) · 2024-12-21T17:17:04.342Z · comments (0)

[link] Began a pay-on-results coaching experiment, made $40,300 since July
Chipmonk · 2024-12-29T21:12:02.574Z · comments (14)

Practicing Bayesian Epistemology with "Two Boys" Probability Puzzles
Liron · 2025-01-02T04:42:20.362Z · comments (14)

ARENA 4.0 Impact Report
Chloe Li (chloe-li-1) · 2024-11-27T20:51:54.844Z · comments (3)

AI #89: Trump Card
Zvi · 2024-11-07T16:30:05.684Z · comments (12)

[link] IAPS: Mapping Technical Safety Research at AI Companies
Zach Stein-Perlman · 2024-10-24T20:30:41.159Z · comments (13)


Recent comments

benito on Ought We to Be Doing More Than We Are?

I was scrolling for a while, assuming I'd neared the end, only to look at the position of the scrollbar and find I was barely 5% through. This must have taken a fair bit of effort! I think it's a helpful page and I'm glad I know about it. I encourage you to make a linkpost for it sometime if you haven't already.

programcrafter on Testing for Scheming with Model Deletion

Doesn't the "threat" to delete the model have to be DT-credible instead of "credible conditioned on being human-made", given that LW with all its discussion about threat resistance and ignoring is in training sets?

(If I remember correctly, a decision theory must ignore "you're threatened to not do X, and the other agent is claiming to respond in such a way that even they lose in expectation" and "another agent [self-]modifies/instantiates an agent making them prefer that you don't do X".)

nim on nim's Shortform

One lens to view AI is as a prediction engine -- predict what color to make each pixel, predict what word to put next.

Whoever is first to apply this predictive skill to stock markets will probably make immense amounts of money. Then again, people are probably already trying to do this, which creates a situation unlike the one that produced the historical data we would train on, which might render it impossible?

On the gripping hand, large slow and powerful institutions want to make the numbers go up and to the right.

maxwell-peterson on Drake Thomas's Shortform

Thanks for putting this together!

I have a vague memory of a post saying that taking zinc early, while the virus was replicating in the upper respiratory tract, was much more important than taking it later, because by then the virus would have spread all over the body and the zinc couldn't get to it, or something like this. So I tend to take a couple early on, then stop. But it sounds like you don't consider that difference important.

Is it your current (Not asking you to do more research!) impression that it’s useful to take zinc throughout the illness?

daniel-tan on Daniel Tan's Shortform

In this context, the “resample ablation” used in AI control is like adding more noise into the communication channel

lukas-finnveden on What are the strongest arguments for very short timelines?

Ok, gotcha.

It's that she didn't accept the reasoning behind that number enough to really believe it. She added a discount factor based on fallacious reasoning around "if it were that easy, it'd be here already".

Just to clarify: There was no such discount factor that changed the median estimate of "human brain compute". Instead, this discount factor was applied to go from "human brain compute estimate" to "human-brain-compute-informed estimate of the compute-cost of training TAI with current algorithms", adjusting for how our current algorithms seem to be worse than those used to run the human brain. (As you mention and agree with, although I infer that you expect algorithmic progress to be faster than Ajeya did at the time.) The most relevant section is here.

daniel-tan on Daniel Tan's Shortform

Comment from @James Chua in internal Slack group

interesting idea! thanks for writing this out. this is my initial reaction, i may have more later on. i agree on this -- this is what i'll expect.

we’d expect that a model using its own chain of thought to arrive at an answer is likely to perform better than when using a different chain of thought, even if that chain of thought is provided by a more capable model.

but if the above is true, I'm unsure about evidence for steganography.
for example, gpt-4o is more capable than gpt-3.5. suppose we provide gpt-3.5 with gpt-4o's CoT.
one reason why gpt-3.5 may not improve is that gpt-4o's CoT skips certain steps since gpt-4o is a more capable model.

e.g. gpt-4o can calculate 4^4 directly, but gpt-3.5 cannot. So gpt-4o's CoT will leave out the steps 4 * 4 * 4 * 4, which gpt-3.5 needs to get the correct answer. do we then say that gpt-3.5 is doing steganography?

possible improvement: provide gpt-3.5 with its own CoT, but paraphrased by another model. gpt-3.5 sees the same steps of "4 * 4 * 4 * 4", but paraphrased. Does it still perform as well?
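
A minimal sketch of how the paraphrase comparison suggested above could be scored, assuming the model and the paraphraser are available as plain Python callables. The names `accuracy_with_cot`, `paraphrase_gap`, `toy_answer`, and `toy_paraphrase` are hypothetical and not from the original comment, and the toy stand-ins only mimic the 4^4 example rather than calling any real model.

```python
from typing import Callable, Sequence


def accuracy_with_cot(
    answer_fn: Callable[[str, str], str],
    questions: Sequence[str],
    cots: Sequence[str],
    gold: Sequence[str],
) -> float:
    """Fraction of questions answered correctly given a fixed chain of thought."""
    if not questions:
        raise ValueError("need at least one question")
    correct = sum(
        answer_fn(q, cot).strip() == g.strip()
        for q, cot, g in zip(questions, cots, gold)
    )
    return correct / len(questions)


def paraphrase_gap(
    answer_fn: Callable[[str, str], str],
    paraphrase_fn: Callable[[str], str],
    questions: Sequence[str],
    own_cots: Sequence[str],
    gold: Sequence[str],
) -> float:
    """Accuracy with the model's own CoT minus accuracy with a paraphrase of it."""
    paraphrased = [paraphrase_fn(cot) for cot in own_cots]
    own_acc = accuracy_with_cot(answer_fn, questions, own_cots, gold)
    para_acc = accuracy_with_cot(answer_fn, questions, paraphrased, gold)
    return own_acc - para_acc


# Toy stand-ins (assumptions for illustration only): a "model" that answers
# correctly only when all four factors are spelled out in the CoT, and a
# trivial paraphraser that rewrites the operator.
def toy_answer(question: str, cot: str) -> str:
    return "256" if cot.count("4") >= 4 else "unknown"


def toy_paraphrase(cot: str) -> str:
    return cot.replace("*", "times")


questions = ["What is 4^4?"]
gold = ["256"]
own_cots = ["4 * 4 * 4 * 4"]

# Gap between own CoT and its paraphrase for the toy model.
gap = paraphrase_gap(toy_answer, toy_paraphrase, questions, own_cots, gold)
# Accuracy when given a CoT that skips the intermediate steps.
skipped = accuracy_with_cot(toy_answer, questions, ["4^4, computed directly"], gold)
```

In this toy setup the paraphrased CoT keeps every step, so the gap is zero, while the step-skipping CoT fails. That is the distinction the comment draws: a drop caused by missing steps is not by itself evidence of steganography, whereas a drop under a step-preserving paraphrase would be more suggestive.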

austin-chen on MATS mentor selection

Curious, is the list of advisors public?

notfnofn on You are too dumb to understand insurance

I think even the scaling thing doesn't apply here because they're not insuring bigger trips: they're insuring more trips (which makes things strictly better). I'm having some trouble understanding Dennis' point.

raemon on The Soul Key

In addition to being hauntingly beautiful, this story helped me adjust to the idea of the trans/posthuman future.

14 years ago, I very much did not identify with the Transhuman Vision. It was too alien, too much, and I didn't feel ready for it. I also didn't actively oppose it. I knew that, as I hung out around rationalists, I would probably slowly come to identify more with humanity's longterm future.

I have indeed come to identify more with the longterm future and all of its weirdness. It was mostly not because of this story, but I did particularly resonate with the framing here – in large part because it met me where I am, instead of jumping into Future Shock. It presents the increasing alienness in gentle increments, from multiple perspectives, including that of someone currently living a more Ancestral Human life.

This story doesn't tell you what sort of choices are good to make, but it makes it feel easier to wrap my brain around how I (or others) might eventually make such choices.