LessWrong 2.0 Reader


Thoughts about Policy Ecosystems: The Missing Links in AI Governance
Echo Huang (echo-huang) · 2025-02-01T01:54:54.333Z · comments (0)
Re: Taste
lsusr · 2025-02-01T03:34:10.918Z · comments (8)
2024 was the year of the big battery, and what that means for solar power
transhumanist_atom_understander · 2025-02-01T06:27:39.082Z · comments (1)
Can 7B-8B LLMs judge their own homework?
dereshev · 2025-02-01T08:29:32.639Z · comments (0)
One-dimensional vs multi-dimensional features in interpretability
charlieoneill (kingchucky211) · 2025-02-01T09:10:01.112Z · comments (0)
Blackpool Applied Rationality Unconference 2025
Henry Prowbell · 2025-02-01T13:04:12.774Z · comments (2)
[question] How likely is an attempted coup in the United States in the next four years?
Alexander de Vries (alexander-de-vries) · 2025-02-01T13:12:04.053Z · answers+comments (2)
[link] Poetic Methods I: Meter as Communication Protocol
adamShimi · 2025-02-01T18:22:39.676Z · comments (0)
[link] Unlocking Ethical AI and Improving Jailbreak Defenses: Reinforcement Learning with Layered Morphology (RLLM)
MiguelDev (whitehatStoic) · 2025-02-01T19:17:32.071Z · comments (2)
Post AGI effect prediction
Juliezhanggg · 2025-02-01T21:16:36.829Z · comments (0)
Towards a Science of Evals for Sycophancy
andrejfsantos · 2025-02-01T21:17:15.406Z · comments (0)
Machine Unlearning in Large Language Models: A Comprehensive Survey with Empirical Insights from the Qwen 1.5 1.8B Model
Saketh Baddam (saketh-baddam) · 2025-02-01T21:26:58.171Z · comments (2)
Exploring the coherence of features explanations in the GemmaScope
Mattia Proietti (mattia-proietti) · 2025-02-01T21:28:33.690Z · comments (0)
Retroactive If-Then Commitments
MichaelDickens · 2025-02-01T22:22:43.031Z · comments (0)
[link] Rationalist Movie Reviews
Nicholas / Heather Kross (NicholasKross) · 2025-02-01T23:10:53.184Z · comments (2)
Interpreting autonomous driving agents with attention based architecture
Manav Dahra (manav-dahra) · 2025-02-01T23:20:27.162Z · comments (0)
Falsehoods you might believe about people who are at a rationalist meetup
Screwtape · 2025-02-01T23:32:50.398Z · comments (12)
AI acceleration, DeepSeek, moral philosophy
Josh H (joshua-haas) · 2025-02-02T00:08:11.593Z · comments (0)
Seasonal Patterns in BIDA's Attendance
jefftk (jkaufman) · 2025-02-02T02:40:03.768Z · comments (0)
Chinese room AI to survive the inescapable end of compute governance
rotatingpaguro · 2025-02-02T02:42:03.627Z · comments (0)
[question] Would anyone be interested in pursuing the Virtue of Scholarship with me?
japancolorado (russell-white) · 2025-02-02T04:02:27.116Z · answers+comments (2)
ChatGPT: Exploring the Digital Wilderness, Findings and Prospects
Bill Benzon (bill-benzon) · 2025-02-02T09:54:26.008Z · comments (0)
Escape from Alderaan I
lsusr · 2025-02-02T10:48:06.533Z · comments (2)
Thoughts on Toy Models of Superposition
james__p · 2025-02-02T13:52:54.505Z · comments (2)
Gradual Disempowerment, Shell Games and Flinches
Jan_Kulveit · 2025-02-02T14:47:53.404Z · comments (36)
The Simplest Good
Jesse Hoogland (jhoogland) · 2025-02-02T19:51:14.155Z · comments (6)
Tracing Typos in LLMs: My Attempt at Understanding How Models Correct Misspellings
Ivan Dostal (#R@q0YSDZ3ov$f6J) · 2025-02-02T19:56:34.771Z · comments (1)
Conditional Importance in Toy Models of Superposition
james__p · 2025-02-02T20:35:38.655Z · comments (4)
"DL training == human learning" is a bad analogy
kman · 2025-02-02T20:59:21.259Z · comments (0)
An Introduction to Evidential Decision Theory
Babić · 2025-02-02T21:27:35.684Z · comments (2)
Exploring how OthelloGPT computes its world model
JMaar (jim-maar) · 2025-02-02T21:29:09.433Z · comments (0)
Humanity Has A Possible 99.98% Chance Of Extinction
st3rlxx · 2025-02-02T21:46:49.620Z · comments (1)
Some Theses on Motivational and Directional Feedback
abstractapplic · 2025-02-02T22:50:04.270Z · comments (3)
Use computers as powerful as in 1985 or AI controls humans or ?
jrincayc (nerd_gatherer) · 2025-02-03T00:51:05.706Z · comments (0)
[link] Keeping Capital is the Challenge
LTM · 2025-02-03T02:04:27.142Z · comments (2)
[link] Language Models and World Models, a Philosophy
kyjohnso · 2025-02-03T02:55:36.577Z · comments (0)
Pick two: concise, comprehensive, or clear rules
Screwtape · 2025-02-03T06:39:05.815Z · comments (27)
[question] Can we infer the search space of a local optimiser?
Lucius Bushnaq (Lblack) · 2025-02-03T10:17:01.661Z · answers+comments (5)
Neuron Activations to CLIP Embeddings: Geometry of Linear Combinations in Latent Space
Roman Malov · 2025-02-03T10:30:48.866Z · comments (0)
[link] OpenAI releases deep research agent
Seth Herd · 2025-02-03T12:48:44.925Z · comments (21)
o3-mini Early Days
Zvi · 2025-02-03T14:20:06.443Z · comments (0)
The Outer Levels
Jerdle (daniel-amdurer) · 2025-02-03T14:30:29.230Z · comments (3)
Stopping unaligned LLMs is easy!
Yair Halberstadt (yair-halberstadt) · 2025-02-03T15:38:27.083Z · comments (11)
The Self-Reference Trap in Mathematics
Alister Munday (alister-munday) · 2025-02-03T16:12:21.392Z · comments (23)
Gettier Cases [repost]
Antigone (luke-st-clair) · 2025-02-03T18:12:22.253Z · comments (4)
Superintelligence Alignment Proposal
Davey Morse (davey-morse) · 2025-02-03T18:47:22.287Z · comments (3)
Part 1: Enhancing Inner Alignment in CLIP Vision Transformers: Mitigating Reification Bias with SAEs and Grad ECLIP
Gilber A. Corrales (mysticdeepai) · 2025-02-03T19:30:52.505Z · comments (0)
Sleeper agents appear resilient to activation steering
Lucy Wingard (lucy-wingard) · 2025-02-03T19:31:30.702Z · comments (0)
The Overlap Paradigm: Rethinking Data's Role in Weak-to-Strong Generalization (W2SG)
Serhii Zamrii (aligning_bias) · 2025-02-03T19:31:55.282Z · comments (0)
next page (older posts) →