LessWrong 2.0 Reader


← previous page (newer posts) · next page (older posts) →

Three Subtle Examples of Data Leakage
abstractapplic · 2024-10-01T20:45:27.731Z · comments (16)
Thoughts on the AI Safety Summit company policy requests and responses
So8res · 2023-10-31T23:54:09.566Z · comments (14)
2023 Unofficial LessWrong Census/Survey
Screwtape · 2023-12-02T04:41:51.418Z · comments (81)
[link] Recommendation: reports on the search for missing hiker Bill Ewasko
eukaryote · 2024-07-31T22:15:03.174Z · comments (28)
Reconsider the anti-cavity bacteria if you are Asian
Lao Mein (derpherpize) · 2024-04-15T07:02:02.655Z · comments (43)
The 'Neglected Approaches' Approach: AE Studio's Alignment Agenda
Cameron Berg (cameron-berg) · 2023-12-18T20:35:01.569Z · comments (21)
Many arguments for AI x-risk are wrong
TurnTrout · 2024-03-05T02:31:00.990Z · comments (86)
[link] The King and the Golem
Richard_Ngo (ricraz) · 2023-09-25T19:51:22.980Z · comments (16)
RSPs are pauses done right
evhub · 2023-10-14T04:06:02.709Z · comments (70)
How useful is mechanistic interpretability?
ryan_greenblatt · 2023-12-01T02:54:53.488Z · comments (54)
[link] Boycott OpenAI
PeterMcCluskey · 2024-06-18T19:52:42.854Z · comments (26)
Announcing ILIAD — Theoretical AI Alignment Conference
Nora_Ammann · 2024-06-05T09:37:39.546Z · comments (18)
Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI
Jeremy Gillen (jeremy-gillen) · 2024-01-26T07:22:06.370Z · comments (60)
You can remove GPT2’s LayerNorm by fine-tuning for an hour
StefanHex (Stefan42) · 2024-08-08T18:33:38.803Z · comments (11)
Is being sexy for your homies?
Valentine · 2023-12-13T20:37:02.043Z · comments (92)
The likely first longevity drug is based on sketchy science. This is bad for science and bad for longevity.
BobBurgers · 2023-12-12T02:42:18.559Z · comments (34)
[link] Sycophancy to subterfuge: Investigating reward tampering in large language models
Carson Denison (carson-denison) · 2024-06-17T18:41:31.090Z · comments (22)
[link] Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data
Johannes Treutlein (Johannes_Treutlein) · 2024-06-21T15:54:41.430Z · comments (13)
And All the Shoggoths Merely Players
Zack_M_Davis · 2024-02-10T19:56:59.513Z · comments (57)
DeepMind's "Frontier Safety Framework" is weak and unambitious
Zach Stein-Perlman · 2024-05-18T03:00:13.541Z · comments (14)
Vote on Interesting Disagreements
Ben Pace (Benito) · 2023-11-07T21:35:00.270Z · comments (129)
[link] Making every researcher seek grants is a broken model
jasoncrawford · 2024-01-26T16:06:26.688Z · comments (41)
Sparse Autoencoders Find Highly Interpretable Directions in Language Models
Logan Riggs (elriggs) · 2023-09-21T15:30:24.432Z · comments (8)
Most People Don't Realize We Have No Idea How Our AIs Work
Thane Ruthenis · 2023-12-21T20:02:00.360Z · comments (42)
Holly Elmore and Rob Miles dialogue on AI Safety Advocacy
jacobjacob · 2023-10-20T21:04:32.645Z · comments (30)
What’s up with LLMs representing XORs of arbitrary features?
Sam Marks (samuel-marks) · 2024-01-03T19:44:33.162Z · comments (61)
EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024
scasper · 2024-05-21T20:15:36.502Z · comments (16)
[link] Succession
Richard_Ngo (ricraz) · 2023-12-20T19:25:03.185Z · comments (48)
[link] Masterpiece
Richard_Ngo (ricraz) · 2024-02-13T23:10:35.376Z · comments (21)
You can just spontaneously call people you haven't met in years
lc · 2023-11-13T05:21:05.726Z · comments (21)
Deep Honesty
Aletheophile (aletheo) · 2024-05-07T20:31:48.734Z · comments (25)
Formal verification, heuristic explanations and surprise accounting
Jacob_Hilton · 2024-06-25T15:40:03.535Z · comments (11)
Announcing Dialogues
Ben Pace (Benito) · 2023-10-07T02:57:39.005Z · comments (52)
My thoughts on the social response to AI risk
Matthew Barnett (matthew-barnett) · 2023-11-01T21:17:08.184Z · comments (37)
Language Models Model Us
eggsyntax · 2024-05-17T21:00:34.821Z · comments (54)
[link] "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation
titotal (lombertini) · 2023-09-29T14:01:15.453Z · comments (79)
[link] Comp Sci in 2027 (Short story by Eliezer Yudkowsky)
sudo · 2023-10-29T23:09:56.730Z · comments (22)
Dyslucksia
Shoshannah Tekofsky (DarkSym) · 2024-05-09T19:21:33.874Z · comments (45)
OpenAI: Exodus
Zvi · 2024-05-20T13:10:03.543Z · comments (26)
Ironing Out the Squiggles
Zack_M_Davis · 2024-04-29T16:13:00.371Z · comments (36)
The Incredible Fentanyl-Detecting Machine
sarahconstantin · 2024-06-28T22:10:01.223Z · comments (26)
[question] things that confuse me about the current AI market.
DMMF · 2024-08-28T13:46:56.908Z · answers+comments (28)
Apologizing is a Core Rationalist Skill
johnswentworth · 2024-01-02T17:47:35.950Z · comments (42)
Tips for Empirical Alignment Research
Ethan Perez (ethan-perez) · 2024-02-29T06:04:54.481Z · comments (4)
My takes on SB-1047
leogao · 2024-09-09T18:38:37.799Z · comments (8)
[link] Daniel Dennett has died (1942-2024)
kave · 2024-04-19T16:17:04.742Z · comments (5)
2023 Survey Results
Screwtape · 2024-02-16T22:24:28.132Z · comments (26)
[link] Using axis lines for good or evil
dynomight · 2024-03-06T14:47:10.989Z · comments (39)
[link] Will no one rid me of this turbulent pest?
Metacelsus · 2023-10-14T15:27:21.497Z · comments (23)
A Rocket–Interpretability Analogy
plex (ete) · 2024-10-21T13:55:18.184Z · comments (31)