LessWrong 2.0 Reader

Thiel on AI & Racing with China
Ben Pace (Benito) · 2024-08-20T03:19:18.966Z · comments (10)
Dating Roundup #2: If At First You Don’t Succeed
Zvi · 2024-01-02T16:00:04.955Z · comments (29)
Complexity of value but not disvalue implies more focus on s-risk. Moral uncertainty and preference utilitarianism also do.
Chi Nguyen · 2024-02-23T06:10:05.881Z · comments (18)
[link] OpenAI releases GPT-4o, natively interfacing with text, voice and vision
Martín Soto (martinsq) · 2024-05-13T18:50:52.337Z · comments (23)
[link] Come to Manifest 2024 (June 7-9 in Berkeley)
Saul Munn (saul-munn) · 2024-03-27T21:30:17.306Z · comments (2)
[link] Google Gemini Announced
Jacob G-W (g-w1) · 2023-12-06T16:14:07.192Z · comments (22)
[link] Unlocking Solutions—By Understanding Coordination Problems
James Stephen Brown (james-brown) · 2024-07-27T04:52:13.435Z · comments (4)
AI #44: Copyright Confrontation
Zvi · 2023-12-28T14:30:10.237Z · comments (13)
[Intuitive self-models] 5. Dissociative Identity (Multiple Personality) Disorder
Steven Byrnes (steve2152) · 2024-10-15T13:31:46.157Z · comments (7)
Per protocol analysis as medical malpractice
braces · 2024-01-31T16:22:21.367Z · comments (8)
[link] Breaking Circuit Breakers
mikes · 2024-07-14T18:57:20.251Z · comments (13)
Causal Graphs of GPT-2-Small's Residual Stream
David Udell · 2024-07-09T22:06:55.775Z · comments (7)
Be More Katja
Nathan Young · 2024-03-11T21:12:14.249Z · comments (0)
Fat Tails Discourage Compromise
niplav · 2024-06-17T09:39:16.489Z · comments (5)
2022 (and All Time) Posts by Pingback Count
Raemon · 2023-12-16T21:17:00.572Z · comments (14)
AI #40: A Vision from Vitalik
Zvi · 2023-11-30T17:30:08.350Z · comments (12)
Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.
Andrew_Critch · 2024-09-11T04:41:24.872Z · comments (11)
AI #71: Farewell to Chevron
Zvi · 2024-07-04T13:40:05.905Z · comments (9)
[question] Can we get an AI to "do our alignment homework for us"?
Chris_Leong · 2024-02-26T07:56:22.320Z · answers+comments (33)
Acting Wholesomely
owencb · 2024-02-26T21:49:16.526Z · comments (64)
Calendar feature geometry in GPT-2 layer 8 residual stream SAEs
Patrick Leask (patrickleask) · 2024-08-17T01:16:53.764Z · comments (0)
AI #76: Six Short Stories About OpenAI
Zvi · 2024-08-08T13:50:04.659Z · comments (10)
AI #50: The Most Dangerous Thing
Zvi · 2024-02-08T14:30:13.168Z · comments (4)
We are headed into an extreme compute overhang
devrandom · 2024-04-26T21:38:21.694Z · comments (33)
AI #87: Staying in Character
Zvi · 2024-10-29T07:10:08.212Z · comments (3)
Trading off Lives
jefftk (jkaufman) · 2024-01-03T03:40:05.603Z · comments (12)
[link] LLMs seem (relatively) safe
JustisMills · 2024-04-25T22:13:06.221Z · comments (24)
A D&D.Sci Dodecalogue
abstractapplic · 2024-04-12T01:10:01.625Z · comments (0)
[link] S-Risks: Fates Worse Than Extinction
aggliu · 2024-05-04T15:30:36.666Z · comments (2)
Two LessWrong speed friending experiments
mikko (morrel) · 2024-06-15T10:52:26.081Z · comments (3)
Pseudonymity and Accusations
jefftk (jkaufman) · 2023-12-21T19:20:19.944Z · comments (20)
AI #43: Functional Discoveries
Zvi · 2023-12-21T15:50:04.442Z · comments (26)
Can we build a better Public Doublecrux?
Raemon · 2024-05-11T19:21:53.326Z · comments (6)
BatchTopK: A Simple Improvement for TopK-SAEs
Bart Bussmann (Stuckwork) · 2024-07-20T02:20:51.848Z · comments (0)
The Shutdown Problem: Incomplete Preferences as a Solution
EJT (ElliottThornley) · 2024-02-23T16:01:16.378Z · comments (23)
[link] The Mysterious Trump Buyers on Polymarket
Annapurna (jorge-velez) · 2024-10-18T13:26:25.565Z · comments (9)
Announcing the Double Crux Bot
sanyer (santeri-koivula) · 2024-01-09T18:54:15.361Z · comments (8)
Reflections on my first year of AI safety research
Jay Bailey · 2024-01-08T07:49:08.147Z · comments (3)
AI #45: To Be Determined
Zvi · 2024-01-04T15:00:05.936Z · comments (4)
The case for stopping AI safety research
catubc (cat-1) · 2024-05-23T15:55:18.713Z · comments (38)
Schelling points in the AGI policy space
mesaoptimizer · 2024-06-26T13:19:25.186Z · comments (2)
Gradient Descent on the Human Brain
Jozdien · 2024-04-01T22:39:24.862Z · comments (5)
Was Releasing Claude-3 Net-Negative?
Logan Riggs (elriggs) · 2024-03-27T17:41:56.245Z · comments (5)
Anthropical Paradoxes are Paradoxes of Probability Theory
Ape in the coat · 2023-12-06T08:16:26.846Z · comments (18)
Parental Writing Selection Bias
jefftk (jkaufman) · 2024-10-13T14:00:03.225Z · comments (3)
[link] OpenAI Staff (including Sutskever) Threaten to Quit Unless Board Resigns
Seth Herd · 2023-11-20T14:20:33.539Z · comments (28)
On OpenAI’s Preparedness Framework
Zvi · 2023-12-21T14:00:05.144Z · comments (4)
[link] Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Gunnar_Zarncke · 2024-05-16T13:09:39.265Z · comments (20)
Llama Llama-3-405B?
Zvi · 2024-07-24T19:40:07.565Z · comments (9)