LessWrong 2.0 Reader

[link] When the Scientific Method Doesn't Really Help...
casualphysicsenjoyer (hatta_afiq) · 2024-11-27T19:52:30.023Z · comments (1)

Enabling New Applications with Today's Mechanistic Interpretability Toolkit
ananya_joshi · 2024-10-25T17:53:23.960Z · comments (0)

Hope to live or fear to die?
Knight Lee (Max Lee) · 2024-11-27T10:42:37.070Z · comments (0)

Workshop Report: Why current benchmarks approaches are not sufficient for safety?
Tom DAVID (tom-david) · 2024-11-26T17:20:47.453Z · comments (1)

[question] How do we quantify non-philanthropic contributions from Buffet and Soros?
Philosophistry (philip-dhingra) · 2024-12-20T22:50:32.260Z · answers+comments (0)

Should you increase AI alignment funding, or increase AI regulation?
Knight Lee (Max Lee) · 2024-11-26T09:17:01.809Z · comments (1)

[question] Are Sparse Autoencoders a good idea for AI control?
Gerard Boxo (gerard-boxo) · 2024-12-26T17:34:55.617Z · answers+comments (2)

The boat
RomanS · 2024-11-22T12:56:45.050Z · comments (0)

Don't want Goodhart? — Specify the variables more
YanLyutnev (YanLutnev) · 2024-11-21T22:43:48.362Z · comments (2)

How to Teach Your Brain to Hate Procrastination
10xyz (10xyz-coder) · 2024-10-21T20:12:40.809Z · comments (0)

[link] Podcast discussing Hanson's Cultural Drift Argument
vaishnav92 · 2024-10-20T17:58:41.416Z · comments (0)

Exploring the Platonic Representation Hypothesis Beyond In-Distribution Data
rokosbasilisk · 2024-10-20T08:40:04.404Z · comments (2)

[question] 2025 Alignment Predictions
anaguma · 2025-01-02T05:37:36.912Z · answers+comments (3)

Methodology: Contagious Beliefs
James Stephen Brown (james-brown) · 2024-10-19T03:58:17.966Z · comments (0)

3. Improve Cooperation: Better Technologies
Allison Duettmann (allison-duettmann) · 2025-01-02T19:03:16.588Z · comments (2)

5. Uphold Voluntarism: Digital Defense
Allison Duettmann (allison-duettmann) · 2025-01-02T19:05:33.963Z · comments (0)

[question] EndeavorOTC legit?
FinalFormal2 · 2024-10-17T01:33:12.606Z · answers+comments (0)

Some implications of radical empathy
MichaelStJules · 2025-01-07T16:10:16.755Z · comments (0)

Antonym Heads Predict Semantic Opposites in Language Models
Jake Ward (jake-ward) · 2024-11-15T15:32:14.102Z · comments (0)

Bellevue Meetup
Cedar (xida-ren) · 2024-10-16T01:07:58.761Z · comments (0)

Thoughts on the In-Context Scheming AI Experiment
ExCeph · 2025-01-09T02:19:09.558Z · comments (0)

On the Practical Applications of Interpretability
Nick Jiang (nick-jiang) · 2024-10-15T17:18:25.280Z · comments (1)

Personal Philosophy
Xor · 2024-10-13T03:01:59.324Z · comments (0)

Have frontier AI systems surpassed the self-replicating red line?
nsage (wheelspawn) · 2025-01-11T05:31:31.672Z · comments (0)

[link] Social Science in its epistemological context
Arturo Macias (arturo-macias) · 2024-12-05T16:12:29.034Z · comments (0)

Duplicate token neurons in the first layer of gpt2-small
Alex Gibson · 2024-12-27T04:21:55.896Z · comments (0)

Walking Sue
Matthew McRedmond (matthew-mcredmond) · 2024-12-18T13:19:41.575Z · comments (5)

Which AI Safety Benchmark Do We Need Most in 2025?
Loïc Cabannes (loic-cabannes) · 2024-11-17T23:50:56.337Z · comments (2)

[link] Sparks of Consciousness
Charlie Sanders (charlie-sanders) · 2024-11-13T04:58:27.222Z · comments (0)

Gothenburg LW/ACX meetup
Stefan (stefan-1) · 2024-10-29T20:40:22.754Z · comments (0)

aspirational leadership
dhruvmethi · 2024-11-20T16:07:43.507Z · comments (0)

I Recommend More Training Rationales
Gianluca Calcagni (gianluca-calcagni) · 2024-12-31T14:06:44.007Z · comments (0)

[question] Why do futurists care about the culture war?
Knight Lee (Max Lee) · 2025-01-14T07:35:05.136Z · answers+comments (2)

Algorithmic Asubjective Anthropics, Cartesian Subjective Anthropics
Lorec · 2024-12-27T01:58:39.880Z · comments (0)

Launching Third Opinion: Anonymous Expert Consultation for AI Professionals
karl (oaisis) · 2024-12-19T19:06:15.355Z · comments (0)

Truth Terminal: A reconstruction of events
crvr.fr (crdevio) · 2024-11-17T23:51:21.279Z · comments (1)

Gothenburg LW/ACX meetup
Stefan (stefan-1) · 2024-11-24T19:40:52.215Z · comments (0)

Reminder: AI Safety is Also a Behavioral Economics Problem
zoop · 2024-12-20T01:40:53.847Z · comments (0)

[question] is it possible to comment anonymously on a post?
KvmanThinking (avery-liu) · 2024-10-24T22:24:49.565Z · answers+comments (2)

[link] The Golden Opportunity for American AI
Annapurna (jorge-velez) · 2025-01-04T10:26:05.430Z · comments (8)

[link] Technical Risks of (Lethal) Autonomous Weapons Systems
Heramb · 2024-10-23T20:41:13.238Z · comments (0)

Towards a Unified Interpretability of Artificial and Biological Neural Networks
jan_bauer · 2024-12-21T23:10:45.842Z · comments (0)

How Your Physiology Affects the Mind's Projection Fallacy
YanLyutnev (YanLutnev) · 2024-12-14T21:10:23.240Z · comments (0)

The Technist Reformation: A Discussion with o1 About The Coming Economic Event Horizon
Yuli_Ban · 2024-12-11T02:34:22.329Z · comments (1)

Introducing Avatarism: A Rational Framework for Building actual Heaven
ratiba ro (ratiba-ro) · 2024-12-15T17:17:45.440Z · comments (2)

Governance Course - Week 1 Reflections
la .alis. (Diatom) · 2025-01-09T04:48:27.502Z · comments (0)

The CARLIN Method: Teaching AI How to Be Genuinely Funny
Greg Robison (grobison) · 2024-12-09T21:51:05.504Z · comments (0)

A Systematic Approach to AI Risk Analysis Through Cognitive Capabilities
Tom DAVID (tom-david) · 2025-01-09T00:18:04.608Z · comments (0)

[question] Poll: what’s your impression of altruism?
David Gross (David_Gross) · 2024-11-09T20:28:15.418Z · answers+comments (4)

Gothenburg LW / ACX meetup
Stefan (stefan-1) · 2025-01-08T21:39:18.309Z · comments (0)

Recent comments

milan-w on Why do futurists care about the culture war?

As you repeatedly point out, there are multiple solutions to each issue. Assuming good enough technology, all of them are viable. Which (if any) solutions end up being illegal, incentivized, made fun of, or made mandatory, becomes a matter of which values end up being normative. Thus, these people may be culture-warring because they think they're influencing "post-singularity" values. This would betray the fact that they aren't really thinking in classical singularitarian terms.

Alternatively, they just spent too much time on Twitter and got caught up in dumb tribal instincts. Happens to the best of us.

richard_kennaway on Why do futurists care about the culture war?

A lot of people care about the culture wars because they don't believe the singularity is coming soon. Yet a lot of people who do believe it is coming soon still seem just as invested (e.g. Elon Musk, Peter Thiel, and others on the left wing).

Why?

Because the results of culture wars now will determine the post-singularity culture.

dr_s on Passages I Highlighted in The Letters of J.R.R.Tolkien

Possibly perfectionism? I experience this form of creative paralysis a lot: as soon as I get far enough into the weeds of one creative form, I start seeing the endless ramifications of the tiniest decision and basically can't move a step without trying to achieve endlessly deep optimisation over the whole. Meanwhile, people who can just not give a fuck and let the creative juices flow get shit done.

mattmacdermott on Implications of the inference scaling paradigm for AI safety

I think all of the following:

- process-based feedback pushes against faithfulness because it incentivises having a certain kind of CoT independently of the outcome
- outcome-based feedback pushes towards faithfulness because it incentivises making use of earlier tokens to get the right answer
- outcome-based feedback pushes against legibility because it incentivises the model to discover new strategies that we might not know it's using
- combining process-based feedback with outcome-based feedback:
  - pushes extra hard against legibility because it incentivises obfuscating said strategies[1]
  - unclear sign wrt faithfulness

[1] Or at least has the potential to, depending on the details.

dr_s on Passages I Highlighted in The Letters of J.R.R.Tolkien

I think that's a bit too extreme. Are all machines bad? No, obviously better to have mechanised agriculture than be all peasants. But he is grasping something here which we are now dealing with more directly. It's the classic Moloch trap of "if you have enough power to optimise hard enough then all slack is destroyed and eventually life itself". If you thought that was an inevitable end of all technological development (and we haven't proven it isn't yet), you may end up thinking being peasants is better too.

chavam on Applying traditional economic thinking to AGI: a trilemma

Minor quibble: It's a bit misleading to call B "experience curves", since it is also about capital accumulation and shifts in labor allocation. Without any additional experience/learning, if demand for candy doubles, we could simply build a second candy factory that does the same thing as the first one, and hire the same number of workers for it.

seth-herd on Implications of the inference scaling paradigm for AI safety

Because accurate prediction in a specialized domain requires expertise more than motivation. Forecasting is one relevant skill, but knowledge of both current AI and theoretical paths to AGI is also highly relevant.

seth-herd on Implications of the inference scaling paradigm for AI safety

That's right. Being financially motivated to accurately predict timelines is a whole different thing than having the relevant expertise to predict timelines.

raphael-roche on Passages I Highlighted in The Letters of J.R.R.Tolkien

"I think the Fall is not true historically".

While all men must die and all civilizations must collapse, the end of all things is merely the counterpart of the beginning of all things. Creation, the birth of men, and the rise of civilizations are also great patterns and memorable events, both in myths and in history. However, the feeling does not respect this symmetry: perhaps due to loss aversion or the peak-end rule, the Fall, and tragedy in general, carries a uniquely strong poetic resonance. There is something epic, something existential in it, even more than in the beginning of things. Fatum represents the story's inevitable conclusion. I believe there is something deeply rooted, hardwired, in most of us that makes this so. Perhaps it is tied to our consciousness of finitude and our fear of the future, the unknown, death. Even if it represents a traditional and biased interpretation of history, I cannot help but feel moved. Tolkien has an unmatched ability to evoke and magnify this feeling, especially in The Silmarillion and other unfinished works; I think naturally of The Fall of Valinor and the Fall of Gondolin, among other things.

joanv on Implications of the inference scaling paradigm for AI safety

Moreover, in this paradigm, forms of hidden reasoning seem likely to emerge: in multi-step reasoning, for example, the model might find it efficient to compress backtracking or common reasoning cues into cryptic tokens (e.g., "Hmmm") as a kind of shorthand that encodes arbitrarily dense or unclear information. This is especially true under financial pressure to compress or shorten chains of thought, which would allow models to perform potentially long serial reasoning outside of human or AI oversight.