LessWrong 2.0 Reader

A suite of Vision Sparse Autoencoders
Louka Ewington-Pitsos (louka-ewington-pitsos) · 2024-10-27T04:05:20.377Z · comments (0)

Thoughts after the Wolfram and Yudkowsky discussion
Tahp · 2024-11-14T01:43:12.920Z · comments (13)

[link] Death notes - 7 thoughts on death
Nathan Young · 2024-10-28T15:01:13.532Z · comments (1)

[question] When engaging with a large amount of resources during a literature review, how do you prevent yourself from becoming overwhelmed?
corruptedCatapillar · 2024-11-01T07:29:49.262Z · answers+comments (2)

Fun With The Tabula Muris (Senis)
sarahconstantin · 2024-09-20T18:20:01.901Z · comments (0)

[link] Conventional footnotes considered harmful
dkl9 · 2024-10-01T14:54:01.732Z · comments (16)

[link] Linkpost: "Imagining and building wise machines: The centrality of AI metacognition" by Johnson, Karimi, Bengio, et al.
Chris_Leong · 2024-11-11T16:13:26.504Z · comments (6)

How to put California and Texas on the campaign trail!
Yair Halberstadt (yair-halberstadt) · 2024-11-06T06:08:25.673Z · comments (4)

Abstractions are not Natural
Alfred Harwood · 2024-11-04T11:10:09.023Z · comments (21)

[link] Sticker Shortcut Fallacy — The Real Worst Argument in the World
ymeskhout · 2024-06-12T14:52:41.988Z · comments (15)

[link] Announcing Open Philanthropy's AI governance and policy RFP
Julian Hazell (julian-hazell) · 2024-07-17T02:02:39.933Z · comments (0)

You're Playing a Rough Game
jefftk (jkaufman) · 2024-10-17T19:20:06.251Z · comments (2)

[link] An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs
Adam Karvonen (karvonenadam) · 2024-06-25T15:57:16.872Z · comments (0)

[link] Executive Dysfunction 101
DaystarEld · 2024-05-23T12:43:13.785Z · comments (1)

Proving the Geometric Utilitarian Theorem
StrivingForLegibility · 2024-08-07T01:39:10.920Z · comments (0)

Housing Roundup #9: Restricting Supply
Zvi · 2024-07-17T12:50:05.321Z · comments (8)

A Visual Task that's Hard for GPT-4o, but Doable for Primary Schoolers
Lennart Finke (l-f) · 2024-07-26T17:51:28.202Z · comments (4)

A Triple Decker for Elfland
jefftk (jkaufman) · 2024-10-11T01:50:02.332Z · comments (0)

[question] When can I be numerate?
FinalFormal2 · 2024-09-12T04:05:27.710Z · answers+comments (3)

Beta Tester Request: Rallypoint Bounties
lukemarks (marc/er) · 2024-05-25T09:11:11.446Z · comments (4)

Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?
scasper · 2024-07-30T14:57:06.807Z · comments (0)

[link] A primer on the next generation of antibodies
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-01T22:37:59.207Z · comments (0)

Economics Roundup #1
Zvi · 2024-03-26T14:00:06.332Z · comments (4)

AXRP Episode 36 - Adam Shai and Paul Riechers on Computational Mechanics
DanielFilan · 2024-09-29T05:50:02.531Z · comments (0)

[link] Was a Subway in New York City Inevitable?
Jeffrey Heninger (jeffrey-heninger) · 2024-03-30T00:53:21.314Z · comments (4)

To Boldly Code
StrivingForLegibility · 2024-01-26T18:25:59.525Z · comments (4)

[link] Structured Transparency: a framework for addressing use/mis-use trade-offs when sharing information
habryka (habryka4) · 2024-04-11T18:35:44.824Z · comments (0)

[link] MIRI's July 2024 newsletter
Harlan · 2024-07-15T21:28:17.343Z · comments (2)

An experiment on hidden cognition
Olli Järviniemi (jarviniemi) · 2024-07-22T03:26:05.564Z · comments (2)

[link] what becoming more secure did for me
Chipmonk · 2024-08-22T17:44:48.525Z · comments (5)

[question] How to Model the Future of Open-Source LLMs?
Joel Burget (joel-burget) · 2024-04-19T14:28:00.175Z · answers+comments (9)

Twin Peaks: under the air
KatjaGrace · 2024-05-31T01:20:04.624Z · comments (2)

A Review of In-Context Learning Hypotheses for Automated AI Alignment Research
alamerton · 2024-04-18T18:29:33.892Z · comments (4)

Trying to be rational for the wrong reasons
Viliam · 2024-08-20T16:18:06.385Z · comments (8)

[link] Beware the science fiction bias in predictions of the future
Nikita Sokolsky (nikita-sokolsky) · 2024-08-19T05:32:47.372Z · comments (20)

Virtually Rational - VRChat Meetup
Tomás B. (Bjartur Tómas) · 2024-01-28T05:52:36.934Z · comments (3)

Improving Model-Written Evals for AI Safety Benchmarking
Sunishchal Dev (sunishchal-dev) · 2024-10-15T18:25:08.179Z · comments (0)

Using an LLM perplexity filter to detect weight exfiltration
Adam Karvonen (karvonenadam) · 2024-07-21T18:18:05.612Z · comments (11)

Distinctions when Discussing Utility Functions
ozziegooen · 2024-03-09T20:14:03.592Z · comments (7)

[link] Let's Design A School, Part 2.3 School as Education - The Curriculum (Phase 2, Specific)
Sable · 2024-05-15T20:58:50.981Z · comments (0)

[link] Was Partisanship Good for the Environmental Movement?
Jeffrey Heninger (jeffrey-heninger) · 2024-05-15T17:30:54.796Z · comments (0)

[link] "25 Lessons from 25 Years of Marriage" by honorary rationalist Ferrett Steinmetz
CronoDAS · 2024-10-02T22:42:30.509Z · comments (2)

I didn't think I'd take the time to build this calibration training game, but with websim it took roughly 30 seconds, so here it is!
mako yass (MakoYass) · 2024-08-02T22:35:21.136Z · comments (2)

[link] Secret US natsec project with intel revealed
Nathan Helm-Burger (nathan-helm-burger) · 2024-05-25T04:22:11.624Z · comments (0)

[link] Altruism and Vitalism Aren't Fellow Travelers
Arjun Panickssery (arjun-panickssery) · 2024-08-09T02:01:11.361Z · comments (2)

[link] Truth is Universal: Robust Detection of Lies in LLMs
Lennart Buerger · 2024-07-19T14:07:25.162Z · comments (3)

[link] Scenario planning for AI x-risk
Corin Katzke (corin-katzke) · 2024-02-10T00:14:11.934Z · comments (12)

Language and Capabilities: Testing LLM Mathematical Abilities Across Languages
Ethan Edwards · 2024-04-04T13:18:54.909Z · comments (2)

UDT1.01: Local Affineness and Influence Measures (2/10)
Diffractor · 2024-03-31T07:35:52.831Z · comments (0)

[link] Robert Caro And Mechanistic Models In Biography
adamShimi · 2024-07-14T10:56:42.763Z · comments (5)

Recent comments

david-james on Neutrality

Durable institutions find ways to survive beyond their first leader. Such an institution's policies would have to protect it against the leader’s whims and potential corruption. In the case of Elon, based on his mercurial history, I would not bet that Musk would agree to the requisite policies.

viliam on What are Emotions?

Emotions are about reality, but emotions are also a part of reality, so we also have emotions about emotions. I can feel happy about some good thing happening in the outside world. And, separately, I can feel happy about being happy.

In the thought experiments about wireheading, people often say that they don't just want to experience (possibly fake) happy thoughts about X; they also want X to actually happen.

But let's imagine the converse: what if someone proposed a surgery that would make you unable to ever feel happy about X, even if you knew that X actually happened in the world? People would probably refuse that, too. Intuitively, we want to feel good emotions that we "deserve", plus there is also the factor of motivation. Okay, so let's imagine a surgery that removes your ability to feel happy about X, but solves the problem of motivation by e.g. giving you an urge to do X. People would probably refuse that, too.

So I think we actually want both the emotions and the things the emotions are about.

mr-hire on Matt Goldenberg's Short Form Feed

A lot of people are looking at the implications of o1's training process as a future scaling paradigm. But it seems to me that this implementation, which applies inference-time compute to fine-tune the model just in time for hard questions, is equally promising. It may have equally impressive results if it scales with compute, and it has just as much low-hanging fruit left to pick.

Don't sleep on test-time training as a potential future scaling paradigm.

turntrout on Announcing turntrout.com, my new digital home

IIRC my site checks (in descending priority):

localStorage to see if they've already told my site a light/dark preference;
whether the user's browser indicates a global light/dark preference (this is the "auto");
if there's no preference, the site defaults to light.

The idea is "I'll try doing the right thing (auto), and if the user doesn't like it they can change it and I'll listen to that choice." Possibly it will still be counterintuitive to many folks, as Said quoted in a sibling comment.
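
The priority order described above can be sketched as a small pure function. This is only an illustration, not the site's actual code: the function name `resolve_theme` and its two parameters are hypothetical stand-ins for "what localStorage holds" and "what the browser reports", and Python is used here purely to show the fallback logic.

```python
from typing import Optional

VALID_THEMES = ("light", "dark")


def resolve_theme(stored_preference: Optional[str],
                  system_preference: Optional[str]) -> str:
    """Pick a theme using the descending-priority checks from the comment.

    Both arguments are hypothetical: `stored_preference` stands in for a
    value the user previously saved, and `system_preference` for the
    browser's global light/dark setting (the "auto" case).
    """
    # 1. An explicit choice the user already made wins.
    if stored_preference in VALID_THEMES:
        return stored_preference
    # 2. Otherwise follow the browser-level preference, if there is one.
    if system_preference in VALID_THEMES:
        return system_preference
    # 3. With no preference at all, default to light.
    return "light"


print(resolve_theme("dark", "light"))
print(resolve_theme(None, "dark"))
print(resolve_theme(None, None))
```

Keeping the saved choice ahead of the browser setting is what lets the site "listen" once a user overrides the automatic behaviour.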

viliam on What are some positive developments in AI safety in 2024?

Welp, this was a short list.

viliam on Neutrality

Speaking only for myself, I can agree with the abstract approach (therefore: upvote), but I am not familiar with any of the existing projects mentioned in the article (therefore: no vote; because I have no idea how useful the projects actually are, and thus how useful the list of them is).

anders-lindstroem on "It's a 10% chance which I did 10 times, so it should be 100%"

Why would the expectation of finding a polyamorous partner be higher in the case you gave? Same chance per try and same number of tries should equal same expectation.

jeremy-gillen on "It's a 10% chance which I did 10 times, so it should be 100%"

Nice.

A similar rule of thumb I find handy: divide 70 by a growth rate (in percent) to get the doubling time it implies. I find it way easier to think about doubling times than growth rates.

E.g. 3% interest rate means 70/3 ≈ 23 year doubling time.
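
As a rough check on the rule of thumb above, the sketch below compares 70 divided by the rate with the exact doubling time under annual compounding, ln(2) / ln(1 + r). The function names are made up for illustration, and annual compounding is an assumption the comment does not state.

```python
import math


def rule_of_70(rate_percent: float) -> float:
    """Approximate doubling time: 70 divided by the growth rate in percent."""
    return 70 / rate_percent


def exact_doubling_time(rate_percent: float) -> float:
    """Exact doubling time, assuming growth compounds once per period."""
    return math.log(2) / math.log(1 + rate_percent / 100)


# Compare the approximation with the exact value for a few rates.
for rate in (1, 3, 7, 10):
    approx = rule_of_70(rate)
    exact = exact_doubling_time(rate)
    print(f"{rate}%: rule of 70 = {approx:.1f}, exact = {exact:.1f}")
```

The shortcut works because ln(2) is about 0.693 and ln(1 + r) is close to r for small r, so it is most accurate at low growth rates.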

davekasten on Proposing the Conditional AI Safety Treaty (linkpost TIME)

Is there a longer-form version with draft treaty language (even an outline)? I'd be curious to read it.

gunnar_zarncke on Gunnar_Zarncke's Shortform

agents that have preferences about the state of the world in the distant future

What are these preferences? For biological agents, these preferences are grounded in some mechanism - what you call Steering System - that evaluates "desirable states" of the world in some more or less directly measurable way (grounded in perception via the senses) and derives a signal of how desirable the state is, which the brain is optimizing for. For ML models, the mechanism is somewhat different but there is also an input to the training algorithm that determines how "good" the output is. This signal is called reward and drives the system toward outputs that lead to states of high reward. But the path there depends on the specific optimization method, and the algorithm has to navigate such a complex loss landscape that it can get stuck for a very long time, if not forever, in areas of the search space that correspond to imperfect models. These imperfect models can be off in significant ways and that's why it may be useful to say that Reward is not the optimization target.

The connection to Intuitive Self-Models is that even though the internal models of an LLM may be very different from human self-models, I think it is still quite plausible that LLMs and other models form models of the self. Such models are instrumentally convergent. Humans talk about the self. The LLM does things that match these patterns. Maybe the underlying process in humans that gives rise to this is different, but humans learning about this can't know the actual process either. And in the same way the approximate model the LLM forms is not maximizing the reward signal but can be quite far from it as long as it is useful (in the sense of having higher reward than other such models/parameter combinations).

I think of my toenail as “part of myself”, but I’m happy to clip it.

Sure, the (body of the) self can include parts that can be cut/destroyed without that "causing harm" but instead having an overall positive effect. The AI in a compute center would in analogy also consider decommissioning failed hardware. And when defining humanity, we do have to be careful what we mean when these "parts" could be humans.