LessWrong 2.0 Reader


← previous page (newer posts) · next page (older posts) →

[link] Request for Information for a new US AI Action Plan (OSTP RFI)
agucova · 2025-02-07T20:40:36.034Z · comments (0)
Jevons paradox and economic intuitions
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2025-01-27T23:04:23.854Z · comments (0)
[link] Whereby: The Zoom alternative you probably haven't heard of
Itay Dreyfus (itay-dreyfus) · 2025-01-29T13:01:08.564Z · comments (0)
[question] Why not train reasoning models with RLHF?
CBiddulph (caleb-biddulph) · 2025-01-30T07:58:35.742Z · answers+comments (4)
[link] Bayesian Reasoning on Maps
Sjlver (jonas-wagner) · 2025-01-22T10:45:03.584Z · comments (0)
Thoughts on Toy Models of Superposition
james__p · 2025-02-02T13:52:54.505Z · comments (0)
ML4Good Colombia - Applications Open to LatAm Participants
Alejandro Acelas (alejandro-acelas) · 2025-02-10T15:03:03.929Z · comments (0)
[link] A concise definition of what it means to win
testingthewaters · 2025-01-25T06:37:37.305Z · comments (1)
[link] How do you make a 250x better vaccine at 1/10 the cost? Develop it in India.
Abhishaike Mahajan (abhishaike-mahajan) · 2025-02-09T03:53:17.050Z · comments (5)
Will LLMs supplant the field of creative writing?
Declan Molony (declan-molony) · 2025-01-28T06:42:24.799Z · comments (14)
Claude 3.5 Sonnet (New)'s AGI scenario
Nathan Young · 2025-02-17T18:47:04.669Z · comments (2)
Permanent properties of things are a self-fulfilling prophecy
YanLyutnev (YanLutnev) · 2025-02-19T00:08:20.776Z · comments (0)
Moral gauge theory: A speculative suggestion for AI alignment
James Diacoumis (james-diacoumis) · 2025-02-23T11:42:31.083Z · comments (2)
OpenAI’s NSFW policy: user safety, harm reduction, and AI consent
8e9 · 2025-02-13T13:59:22.911Z · comments (3)
[link] Understanding AI World Models w/ Chris Canal
jacobhaimes · 2025-01-27T16:32:47.724Z · comments (0)
Neuron Activations to CLIP Embeddings: Geometry of Linear Combinations in Latent Space
Roman Malov · 2025-02-03T10:30:48.866Z · comments (0)
Cross-Layer Feature Alignment and Steering in Large Language Models
dlaptev · 2025-02-08T20:18:20.331Z · comments (0)
Allegory of the Tsunami
Evan Hu (evan-hu) · 2025-01-29T19:09:33.761Z · comments (1)
[link] Unlocking Ethical AI and Improving Jailbreak Defenses: Reinforcement Learning with Layered Morphology (RLLM)
MiguelDev (whitehatStoic) · 2025-02-01T19:17:32.071Z · comments (2)
[link] Demonstrating specification gaming in reasoning models
Matrice Jacobine · 2025-02-20T19:26:20.563Z · comments (0)
Sleeper agents appear resilient to activation steering
Lucy Wingard (lucy-wingard) · 2025-02-03T19:31:30.702Z · comments (0)
Response to the US Govt's Request for Information Concerning Its AI Action Plan
Davey Morse (davey-morse) · 2025-02-14T06:14:08.673Z · comments (0)
[question] How likely is an attempted coup in the United States in the next four years?
Alexander de Vries (alexander-de-vries) · 2025-02-01T13:12:04.053Z · answers+comments (2)
How are Those AI Participants Doing Anyway?
mushroomsoup · 2025-01-24T22:37:47.999Z · comments (0)
When you downvote, explain why
KvmanThinking (avery-liu) · 2025-02-07T01:03:44.097Z · comments (31)
[link] AISN #48: Utility Engineering and EnigmaEval
Corin Katzke (corin-katzke) · 2025-02-18T19:15:16.751Z · comments (0)
Proposal: Safeguarding Against Jailbreaking Through Iterative Multi-Turn Testing
jacquesallen · 2025-01-31T23:00:42.665Z · comments (0)
[question] are there 2 types of alignment?
KvmanThinking (avery-liu) · 2025-01-23T00:08:20.885Z · answers+comments (9)
Detailed Ideal World Benchmark
Knight Lee (Max Lee) · 2025-01-30T02:31:39.852Z · comments (0)
Making the case for average-case AI Control
Nathaniel Mitrani (nathaniel-mitrani) · 2025-02-05T18:56:38.181Z · comments (0)
Death vs. Suffering: The Endurist-Serenist Divide on Life’s Worst Fate
Alex_Steiner · 2025-01-27T03:59:40.279Z · comments (7)
[question] hypnosis question
KvmanThinking (avery-liu) · 2025-02-06T02:41:53.314Z · answers+comments (5)
Disproving the "People-Pleasing" Hypothesis for AI Self-Reports of Experience
rife (edgar-muniz) · 2025-01-26T15:53:10.530Z · comments (18)
[link] AISN #47: Reasoning Models
Corin Katzke (corin-katzke) · 2025-02-06T18:52:29.843Z · comments (0)
[Translation] In the Age of AI don't Look for Unicorns
mushroomsoup · 2025-02-07T21:06:24.198Z · comments (0)
AI Safety Oversights
Davey Morse (davey-morse) · 2025-02-08T06:15:52.896Z · comments (0)
Scanless Whole Brain Emulation
Knight Lee (Max Lee) · 2025-01-27T10:00:08.036Z · comments (4)
How identical twin sisters feel about nieces vs their own daughters
Dave Lindbergh (dave-lindbergh) · 2025-02-09T17:36:25.830Z · comments (19)
Use computers as powerful as in 1985 or AI controls humans or ?
jrincayc (nerd_gatherer) · 2025-02-03T00:51:05.706Z · comments (0)
"DL training == human learning" is a bad analogy
kman · 2025-02-02T20:59:21.259Z · comments (0)
Rethinking AI Safety Approach in the Era of Open-Source AI
Weibing Wang (weibing-wang) · 2025-02-11T14:01:39.167Z · comments (0)
Where Would Good Forecasts Most Help AI Governance Efforts?
Violet Hour · 2025-02-11T18:15:33.082Z · comments (0)
Tracing Typos in LLMs: My Attempt at Understanding How Models Correct Misspellings
Ivan Dostal (#R@q0YSDZ3ov$f6J) · 2025-02-02T19:56:34.771Z · comments (1)
Sparse Autoencoder Feature Ablation for Unlearning
aludert · 2025-02-13T19:13:48.388Z · comments (0)
[link] Interviews with Moonshot AI's CEO, Yang Zhilin
Cosmia_Nebula · 2025-01-31T09:19:36.561Z · comments (0)
Can someone, anyone, make superintelligence a more concrete concept?
Ori Nagel (ori-nagel) · 2025-01-30T23:25:36.135Z · comments (6)
Undesirable Conclusions and Origin Adjustment
Jerdle (daniel-amdurer) · 2025-02-19T18:35:23.732Z · comments (0)
[Translation] AI Generated Fake News is Taking Over my Family Group Chat
mushroomsoup · 2025-01-30T20:24:22.175Z · comments (0)
[link] Constitutions for ASI?
ukc10014 · 2025-01-28T16:32:39.307Z · comments (0)
Topological Data Analysis and Mechanistic Interpretability
Gunnar Carlsson (gunnar-carlsson) · 2025-02-24T19:56:02.498Z · comments (0)