LessWrong 2.0 Reader

Misnaming and Other Issues with OpenAI's “Human Level” Superintelligence Hierarchy
Davidmanheim · 2024-07-15T05:50:17.770Z · comments (2)

[link] Contra Acemoglu on AI
Maxwell Tabarrok (maxwell-tabarrok) · 2024-06-28T13:13:15.796Z · comments (0)

Conflating value alignment and intent alignment is causing confusion
Seth Herd · 2024-09-05T16:39:51.967Z · comments (18)

[link] JumpReLU SAEs + Early Access to Gemma 2 SAEs
Senthooran Rajamanoharan (SenR) · 2024-07-19T16:10:54.664Z · comments (10)

[Valence series] 4. Valence & Liking / Admiring
Steven Byrnes (steve2152) · 2024-06-10T14:19:51.194Z · comments (12)

AI and the Technological Richter Scale
Zvi · 2024-09-04T14:00:08.625Z · comments (8)

SRE's review of Democracy
Martin Sustrik (sustrik) · 2024-08-03T07:20:01.483Z · comments (2)

Interested in Cognitive Bootcamp?
Raemon · 2024-09-19T22:12:13.348Z · comments (0)

Demis Hassabis and Geoffrey Hinton Awarded Nobel Prizes
Anna Gajdova (anna-gajdova) · 2024-10-09T12:56:24.856Z · comments (14)

Philosophers wrestling with evil, as a social media feed
David Gross (David_Gross) · 2024-06-03T22:25:22.507Z · comments (2)

Evaluating the truth of statements in a world of ambiguous language.
Hastings (hastings-greer) · 2024-10-07T18:08:09.920Z · comments (19)

[link] Designing for a single purpose
Itay Dreyfus (itay-dreyfus) · 2024-05-07T14:11:22.242Z · comments (12)

How to do conceptual research: Case study interview with Caspar Oesterheld
Chi Nguyen · 2024-05-14T15:09:30.390Z · comments (5)

Mechanistic Interpretability Workshop Happening at ICML 2024!
Neel Nanda (neel-nanda-1) · 2024-05-03T01:18:26.936Z · comments (6)

The Mom Test: Summary and Thoughts
Adam Zerner (adamzerner) · 2024-04-18T03:34:21.020Z · comments (3)

D&D.Sci(-fi): Colonizing the SuperHyperSphere
abstractapplic · 2024-01-12T23:36:54.248Z · comments (23)

[link] Web-surfing tips for strange times
eukaryote · 2024-05-31T07:10:25.805Z · comments (19)

Saving the world sucks
Defective Altruism (Elijah Bodden) · 2024-01-10T05:55:46.504Z · comments (29)

Highlights from Lex Fridman’s interview of Yann LeCun
[deleted] · 2024-03-13T20:58:13.052Z · comments (15)

D&D.Sci: The Mad Tyrant's Pet Turtles [Evaluation and Ruleset]
abstractapplic · 2024-04-09T14:01:34.426Z · comments (6)

Safety First: safety before full alignment. The deontic sufficiency hypothesis.
Chipmonk · 2024-01-03T17:55:19.825Z · comments (3)

Run evals on base models too!
orthonormal · 2024-04-04T18:43:25.468Z · comments (6)

Cognitive Biases Contributing to AI X-risk — a deleted excerpt from my 2018 ARCHES draft
Andrew_Critch · 2024-12-03T09:29:49.745Z · comments (2)

Live Machinery: An Interface Design Philosophy for Wholesome AI Futures
Sahil · 2024-11-01T17:24:09.957Z · comments (3)

[link] Active Recall and Spaced Repetition are Different Things
Saul Munn (saul-munn) · 2024-11-08T20:14:56.092Z · comments (2)

Dave Kasten's AGI-by-2027 vignette
davekasten · 2024-11-26T23:20:47.212Z · comments (8)

Role embeddings: making authorship more salient to LLMs
Nina Panickssery (NinaR) · 2025-01-07T20:13:16.677Z · comments (0)

An alternative approach to superbabies
Towards_Keeperhood (Simon Skade) · 2024-11-05T22:56:15.740Z · comments (19)

[link] Discursive Warfare and Faction Formation
Benquo · 2025-01-09T16:47:31.824Z · comments (2)

Looking back on the Future of Humanity Institute - Asterisk
jakeeaton · 2024-11-19T00:44:40.928Z · comments (0)

Greedy-Advantage-Aware RLHF
sej2020 · 2024-12-27T19:47:25.562Z · comments (15)

[link] Epistemic status: poetry (and other poems)
Richard_Ngo (ricraz) · 2024-11-21T18:13:17.194Z · comments (5)

Book a Time to Chat about Interp Research
Logan Riggs (elriggs) · 2024-12-03T17:27:46.808Z · comments (3)

[link] What Ketamine Therapy Is Like
Sable · 2024-11-11T11:09:08.602Z · comments (8)

AI #91: Deep Thinking
Zvi · 2024-11-21T14:30:06.930Z · comments (10)

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams · 2024-11-07T15:39:06.854Z · comments (6)

[link] Analyzing how SAE features evolve across a forward pass
bensenberner · 2024-11-07T22:07:02.827Z · comments (0)

[link] A dataset of questions on decision-theoretic reasoning in Newcomb-like problems
Caspar Oesterheld (Caspar42) · 2024-12-16T22:42:03.763Z · comments (1)

Extended Interview with Zhukeepa on Religion
Ben Pace (Benito) · 2024-08-18T03:19:05.625Z · comments (59)

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
aphyer · 2024-10-29T01:21:03.075Z · comments (13)

What distinguishes "early", "mid" and "end" games?
Raemon · 2024-06-21T17:41:30.816Z · comments (22)

Caring about excellence
owencb · 2024-07-22T14:24:37.892Z · comments (4)

We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap
johnswentworth · 2024-09-19T22:22:05.307Z · comments (47)

I finally got ChatGPT to sound like me
lsusr · 2024-09-17T09:39:59.415Z · comments (18)

[link] "If we go extinct due to misaligned AI, at least nature will continue, right? ... right?"
plex (ete) · 2024-05-18T14:09:53.014Z · comments (23)

[link] Soft Prompts for Evaluation: Measuring Conditional Distance of Capabilities
porby · 2024-02-02T05:49:11.189Z · comments (1)

Some Experiments I'd Like Someone To Try With An Amnestic
johnswentworth · 2024-05-04T22:04:19.692Z · comments (33)

Value learning in the absence of ground truth
Joel_Saarinen (joel_saarinen) · 2024-02-05T18:56:02.260Z · comments (8)

2023 Prediction Evaluations
Zvi · 2024-01-08T14:40:07.377Z · comments (0)

[link] Constructive Cauchy sequences vs. Dedekind cuts
jessicata (jessica.liu.taylor) · 2024-03-14T23:04:07.300Z · comments (23)

Recent comments

drake-morrison on Killing Socrates

I think this post was important, and pointing out a very real dynamic. It also seems to have sparked some conversations about moderation on the site, and so feels important as a historical artifact. I don't know if it should be in the Best Of, but I think something in this reference class should be.

simon on Fluoridation: The RCT We Still Haven't Run (But Should)

We may thus rule out negative effects larger than 0.14 standard deviations in cognitive ability if fluoride is increased by 1 milligram/liter (the level often considered when artificially fluoridating the water).

Realistically it's very unlikely that there would be that level of cognitive effect, but still that's a high level of hypothetical harm that they are ruling out (~2 IQ points?). I would take the dental harms many times over to avoid that much cognitive ability loss.
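
As a rough check on the "~2 IQ points" figure, here is the conversion under the assumption that IQ is scored on the conventional scale with a standard deviation of 15 points (the quoted passage does not state the scale, so this is an assumption):

```latex
\[
  0.14\ \text{SD} \times 15\ \frac{\text{IQ points}}{\text{SD}} = 2.1\ \text{IQ points}
\]
```

On that assumption, the ruled-out effect is about 2 IQ points per 1 mg/L increase in fluoride, consistent with the estimate above.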

dagon on You are too dumb to understand insurance

Is it sufficient to understand that insurance only applies to the transactional monetary level, and most of the post was about other levels and considerations? Or that the characters didn't MAKE any clear arguments, just some noises about modeling that doesn't obviously apply to the question at hand (how to share/smooth the risk of variable but overall-profitable actions)?

the idea is that if we understand insurance, it should be easy to tell if the characters' arguments are sound-and-valid, or not.

Umm, the difficulty was even understanding what the arguments are. At first glance, they are mostly irrelevant to the proposal (of an insurance pool among voyage-financiers).

tsvibt on Views on when AGI comes and on strategy to reduce existential risk

But ok:

Come up, on its own, with many math concepts that mathematicians consider interesting + mathematically relevant on a similar level to concepts that human mathematicians come up with.
Do insightful science on its own.
Perform at the level of current LLMs, but with 300x less training data.

tsvibt on Views on when AGI comes and on strategy to reduce existential risk

I did give a response in that comment thread. Separately, I think that's not a great standard, e.g. as described in the post and in this comment (https://www.lesswrong.com/posts/i7JSL5awGFcSRhyGF/shortform-2?commentId=zATQE3Lhq66XbzaWm):

Second, 2024 AI is specifically trained on short, clear, measurable tasks. Those tasks also overlap with legible stuff--stuff that's easy for humans to check. In other words, they are, in a sense, specifically trained to trick your sense of how impressive they are--they're trained on legible stuff, with not much constraint on the less-legible stuff (and in particular, on the stuff that becomes legible but only in total failure on more difficult / longer time-horizon stuff).

In fact, all the time in real life we make judgements about things that we couldn't describe in terms that would be considered well-operationalized by betting standards, and we rely on these judgements, and we largely endorse relying on these judgements. E.g. inferring intent in criminal cases, deciding whether something is interesting or worth doing, etc. I should be able to just say "but you can tell that these AIs don't understand stuff", and then we can have a conversation about that, without me having to predict a minimal example of something which is operationalized enough for you to be forced to recognize it as judgeable and also won't happen to be surprisingly well-represented in the data, or surprisingly easy to do without creativity, etc.

gwern on Fluoridation: The RCT We Still Haven't Run (But Should)

The potential neurotoxic effects of fluoride are no longer a fringe concern. National Toxicology Program (NTP) monograph is clear: "moderate confidence" that >1.5 mg/L fluoride in drinking water associates with lower IQ in children.

Their meta-analysis is, as usual for fluoride studies, based heavily on the well-known Chinese studies, and the correlate is much smaller in the low-risk-of-bias studies, also as usual. It doesn't add much. None of these studies are very good, and none use powerful designs like sibling comparisons or natural experiments. They can't be taken too seriously.

The claimed harms of fluoride on IQ are strongly ruled out by the population-registry study "The Effects of Fluoride in Drinking Water", Aggeborn & Öhman 2021, which was published after the cutoff in their literature review.

gwern on Policymakers don't have access to paywalled articles

Dylan seems like a decent enough guy. Why not email him and request a free subscription for a specific email address such as the personal email addresses of key action officers at the redacted Office? (It's worth noting that because proprietary newsletters have zero marginal cost, their operators tend to be a lot more chill about giving away subscriptions, even ones with high face-values, than most people let themselves believe, especially if the person receiving the subscription is of any interest.)

ryan_greenblatt on Views on when AGI comes and on strategy to reduce existential risk

I think if you want to convince people with short timelines (e.g., 7 year medians) of your perspective, probably the most productive thing would be to better operationalize things you expect that AIs won't be able to do soon (but that AGI could do). As in, flesh out a response to this comment such that it is possible for someone to judge.

jessica-liu-taylor on On Eating the Sun

We might disagree about the value of thinking about "we are all dead" timelines. To my mind, forecasting should be primarily descriptive, not normative; reality keeps going after we are all dead, and having realistic models of that is probably a useful input regarding what our degrees of freedom are. (I think people readily accept this in e.g. biology, where people can think about what happens to life after human extinction, or physics, where "all humans are dead" isn't really a relevant category that changes how physics works.)

Of course, I'm not implying it's useful for alignment to "see that the AI has already eaten the sun", it's about forecasting future timelines by defining thresholds and thinking about when they're likely to happen and how they relate to other things.

(See this post, section "Models of ASI should start with realism")

lorec on On Dwarkesh Patel’s 4th Podcast With Tyler Cowen

Cowen, like Hanson, discounts large qualitative societal shifts from AI that lack corresponding contemporary measurables.

Einstein was not an experimentalist, yet was perfectly capable of physics; his successors have largely not touched his unfinished work, and not for lack of data.