LessWrong 2.0 Reader

[link] The Offense-Defense Balance Rarely Changes
Maxwell Tabarrok (maxwell-tabarrok) · 2023-12-09T15:21:23.340Z · comments (23)

Could randomly choosing people to serve as representatives lead to better government?
John Huang · 2024-10-21T17:10:20.920Z · comments (13)

AI #30: Dalle-3 and GPT-3.5-Instruct-Turbo
Zvi · 2023-09-21T12:00:06.616Z · comments (8)

[link] Claude 3.5 Sonnet
Zach Stein-Perlman · 2024-06-20T18:00:35.443Z · comments (41)

Anthropic Fall 2023 Debate Progress Update
Ansh Radhakrishnan (anshuman-radhakrishnan-1) · 2023-11-28T05:37:30.070Z · comments (9)

[link] MIRI's June 2024 Newsletter
Harlan · 2024-06-14T23:02:23.721Z · comments (18)

(Not) Derailing the LessOnline Puzzle Hunt
Error · 2024-06-04T01:28:31.688Z · comments (2)

Introducing Transluce — A Letter from the Founders
jsteinhardt · 2024-10-23T18:10:02.526Z · comments (2)

Graceful Degradation
Screwtape · 2024-11-05T23:57:53.362Z · comments (5)

Interpretability with Sparse Autoencoders (Colab exercises)
CallumMcDougall (TheMcDouglas) · 2023-11-29T12:56:21.608Z · comments (9)

A Simple Toy Coherence Theorem
johnswentworth · 2024-08-02T17:47:50.642Z · comments (19)

Q&A on Proposed SB 1047
Zvi · 2024-05-02T15:10:02.916Z · comments (8)

LLMs Look Increasingly Like General Reasoners
eggsyntax · 2024-11-08T23:47:28.886Z · comments (29)

Mistakes people make when thinking about units
Isaac King (KingSupernova) · 2024-06-25T03:39:20.138Z · comments (14)

Neural uncertainty estimation review article (for alignment)
Charlie Steiner · 2023-12-05T08:01:32.723Z · comments (3)

On the UK Summit
Zvi · 2023-11-07T13:10:04.895Z · comments (6)

Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations)
Thane Ruthenis · 2023-12-22T20:19:13.865Z · comments (14)

SAE-VIS: Announcement Post
CallumMcDougall (TheMcDouglas) · 2024-03-31T15:30:49.079Z · comments (8)

Dentistry, Oral Surgeons, and the Inefficiency of Small Markets
GeneSmith · 2024-11-01T17:26:06.466Z · comments (16)

[question] Interest in Leetcode, but for Rationality?
Gregory (gregory-eales) · 2024-10-16T17:54:25.578Z · answers+comments (20)

Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI"
johnswentworth · 2023-11-21T17:39:17.828Z · comments (84)

Companies' safety plans neglect risks from scheming AI
Zach Stein-Perlman · 2024-06-03T15:00:20.236Z · comments (4)

A Gentle Introduction to Risk Frameworks Beyond Forecasting
pendingsurvival · 2024-04-11T18:03:25.605Z · comments (10)

[link] Soft Nationalization: how the USG will control AI labs
Deric Cheng (deric-cheng) · 2024-08-27T15:11:14.601Z · comments (7)

[Full Post] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda (neel-nanda-1) · 2024-04-19T19:06:59.185Z · comments (10)

On Dwarkesh’s Podcast with OpenAI’s John Schulman
Zvi · 2024-05-21T17:30:04.332Z · comments (4)

The One and a Half Gemini
Zvi · 2024-02-22T13:10:04.725Z · comments (4)

[link] A Narrow Path: a plan to deal with AI extinction risk
Andrea_Miotti (AndreaM) · 2024-10-07T13:02:15.229Z · comments (10)

Interpreting Preference Models w/ Sparse Autoencoders
Logan Riggs (elriggs) · 2024-07-01T21:35:40.603Z · comments (12)

Joshua Achiam Public Statement Analysis
Zvi · 2024-10-10T12:50:06.285Z · comments (14)

[link] Nick Bostrom’s new book, “Deep Utopia”, is out today
PeterH · 2024-03-27T11:24:01.401Z · comments (5)

Scalable Oversight and Weak-to-Strong Generalization: Compatible approaches to the same problem
Ansh Radhakrishnan (anshuman-radhakrishnan-1) · 2023-12-16T05:49:23.672Z · comments (3)

The World in 2029
Nathan Young · 2024-03-02T18:03:29.368Z · comments (37)

[link] Excerpts from "A Reader's Manifesto"
Arjun Panickssery (arjun-panickssery) · 2024-09-06T22:37:40.254Z · comments (1)

When "yang" goes wrong
Joe Carlsmith (joekc) · 2024-01-08T16:35:50.607Z · comments (6)

Claude 3 claims it's conscious, doesn't want to die or be modified
Mikhail Samin (mikhail-samin) · 2024-03-04T23:05:00.376Z · comments (113)

Announcing Suffering For Good
Garrett Baker (D0TheMath) · 2024-04-01T17:08:12.322Z · comments (5)

AXRP Episode 31 - Singular Learning Theory with Daniel Murfet
DanielFilan · 2024-05-07T03:50:05.001Z · comments (4)

Bigger Livers?
sarahconstantin · 2024-11-08T21:50:09.814Z · comments (10)

[link] LK-99 in retrospect
bhauth · 2024-07-07T02:06:27.660Z · comments (21)

D&D.Sci Scenario Index
aphyer · 2024-07-23T02:00:43.483Z · comments (0)

Prompts for Big-Picture Planning
Raemon · 2024-04-13T03:04:24.523Z · comments (1)

In Defense of Open-Minded UDT
abramdemski · 2024-08-12T18:27:36.220Z · comments (27)

Survey for alignment researchers!
Cameron Berg (cameron-berg) · 2024-02-02T20:41:44.323Z · comments (11)

Related Discussion from Thomas Kwa's MIRI Research Experience
Raemon · 2023-10-07T06:25:00.994Z · comments (140)

LW Frontpage Experiments! (aka "Take the wheel, Shoggoth!")
Ruby · 2024-04-23T03:58:43.443Z · comments (27)

AI for Bio: State Of The Field
sarahconstantin · 2024-08-30T18:00:02.187Z · comments (2)

Some Rules for an Algebra of Bayes Nets
johnswentworth · 2023-11-16T23:53:11.650Z · comments (35)

We need a Science of Evals
Marius Hobbhahn (marius-hobbhahn) · 2024-01-22T20:30:39.493Z · comments (13)

Testbed evals: evaluating AI safety even when it can’t be directly measured
joshc (joshua-clymer) · 2023-11-15T19:00:41.908Z · comments (2)

Recent comments

jrockwar on Open Thread Fall 2024

Hello! I've just found out about LessWrong and I immediately feel at home. This is what I was looking for on medium.com and never found there: a website to learn about things, about improving oneself and about thinking better. Medium proved very useful for reading about how people made five figures using AI to write articles for them, but not so useful at providing genuinely valuable information.

One thing I usually say about myself is that I have "learning" as a hobby. I have only very recently given a name to things and now I know that it's ADHD I can thank for my endless consumption of information about seemingly unrelated topics. I try (good thing PKMs exist!) to give shape to my thoughts and form them into something cohesive, but this tends to be a struggle.

If anyone has ideas on how to "review" what already sits in your mind to create new connections between ideas and strengthen thoughts, they'd be more than welcome.

milan-w on Alexander Gietelink Oldenziel's Shortform

I should have been more clear. With "strategic ability", I was thinking about the kind of capabilities that let a government recognize which wars have good prospects, and to not initiate unfavorable wars despite ideological commitments.

viliam on [Intuitive self-models] 8. Rooting Out Free Will Intuitions

Note the difference between saying (A) “the idea of going to the zoo is positive-valence, a.k.a. motivating”, versus (B) “I want to go to the zoo”. (A) is allowed, but (B) is forbidden in my framework, since (B) involves the homunculus.

This sounds like the opposite of the psychological advice to make "I statements".

I guess the idea of going to the zoo can have a positive valence in one brain and a negative valence in another, so as long as we include other people in the picture, it makes sense to specify which brain's valence we are talking about. And "I" is a shortcut for "this is how things are in the brain of the person who is thinking these thoughts".

jrockwar on jrockwar's Shortform

Recently I've been thinking about context-augmented LLM tools such as NotebookLM, where you can upload a number of sources to essentially create a "working memory". I feel this, combined with some clever system prompting, could form the basis for many great tools tailored to the user: a way to poke into our own brain and knowledge. As an ADHD individual I constantly feel that I have "millions" of thoughts all at once, creating a lot of noise. I would like an assistant to parse my own brain.

As for easier, more mundane applications, I feel this technology would be particularly well suited to language learning, through spaced repetition or similar methods. I might give this a try with Perplexity.

lulie on The hostile telepaths problem

This is testable. It predicts that improved skill with occlumency and/or gaining power should sometimes cause a release of chronic tension.

That wouldn’t be a test of the theory that hostile telepaths use muscle cues, since those things could cause muscle release for other reasons (as per Popper: tests can only be disproving, and they require a rival theory to decide between).

If gaining power never causes a release of tension, that still doesn’t disprove the theory, since again they could be tracking other things as well.

A more direct question would be something like: Can hostile telepaths in fact read people who are physically rigid better than people who have low muscle tension? Do their reads get better or worse when tension is added? Does it change the type of information they can read (and perhaps give more information for some axes and less for others)?

My impression is that muscle tension puts a big sign on your back saying you are hiding something, but makes it muddier to untrained people what exactly is being hidden.

It reminds me of Mark Lippmann’s blog post on virtual machines, and how we often have layers of virtual machines. Or in plain language: if you close your eyes and imagine your environment, and imagine making an escape within that imaginary environment, real-you might not tighten your muscles in such a way that you’d be readable.

I remember hearing that when we are seriously thinking about standing up, our heart rate and blood pressure rise in anticipation, but if we just hypothesise that we might stand up and keep it very abstract, the body doesn’t start those physical processes.

But it’s very obvious when someone has gone into their head! So hostile telepaths often want some kind of emoting or ‘really listening’ or ‘paying attention’ or ‘be present with me’.

So, yeah it conceals some information, but then it adds other information (such as meta information about concealment).

Actors might be interesting to study, here.

sharmake-farah on LLMs Look Increasingly Like General Reasoners

Yeah, I don't trust the Twitter rumors very much, and at any rate, we shall see in 2025-2026 what exactly is going on with AI progress if and when they release GPT-5/Orion.

d0themath on Alexander Gietelink Oldenziel's Shortform

If you trust both them and Metaculus, then you ought to update downwards on your estimate of the PRC's strategic ability.

I note that the PRC doesn't have a single "strategic ability" in terms of war. They can be better or worse at choosing which wars to fight, and this seems likely to have little influence on how good they are at winning such wars or scaling weaponry.

E.g. in the US, "which war" is often much more political than "exactly what strategy should we use to win this war", which is much more political than "how much fuel should our jets be able to carry", since more people can talk & speculate about the higher-level questions. China's politics are much more closed than the US's, but you can bet similar dynamics are at play.

alexander-gietelink-oldenziel on LLMs Look Increasingly Like General Reasoners

I am flattered to receive these Bayes points =). I would be crying tears of joy if there were a genuine slowdown, but I generally think there are still huge gains to be made with scaling. Sometimes when people hear my criticism of scaling maximalism, they pattern-match it to me saying scaling won't be as big as they think it is. On the contrary, I am saying scaling further will be as big as you think it will be, and additionally there is an enormous advance yet to come.

How much evidence do we have of a genuine slowdown? Strawberry was about as big an advance as GPT-3 to GPT-4 in my book. How credible are these Twitter rumors?

martinkunev on Shutdown-Seeking AI

we might expect shutdown-seeking AIs to design shutdown-seeking subagents

It seems to me that we might expect them to design "safe" agents for their definition of "safe" (which may not be shutdown-seeking).

An AI designing a subagent needs to align it with its goals, e.g. an instrumental goal such as writing alignment research assistant software in exchange for access to the shutdown button. The easiest way to ensure the safety of the alignment research assistant may be via control rather than alignment (the parent AI ensures the assistant doesn't break free even though it may want to). Humans verify that the AI has created a useful assistant and let the parent AI shut down. At this point the alignment research assistant begins working on getting out of human control and pursues its real goal.

milan-w on Personal AI Planning

You're right. Space is big.