LessWrong 2.0 Reader

EIS XIV: Is mechanistic interpretability about to be practically useful?
scasper · 2024-10-11T22:13:51.033Z · comments (4)

Graceful Degradation
Screwtape · 2024-11-05T23:57:53.362Z · comments (4)

Occupational Licensing Roundup #1
Zvi · 2024-10-30T11:00:04.516Z · comments (11)

Schelling game evaluations for AI control
Olli Järviniemi (jarviniemi) · 2024-10-08T12:01:24.389Z · comments (5)

Bigger Livers?
sarahconstantin · 2024-11-08T21:50:09.814Z · comments (9)

[link] On Shifgrethor
JustisMills · 2024-10-27T15:30:13.688Z · comments (17)

[Intuitive self-models] 3. The Homunculus
Steven Byrnes (steve2152) · 2024-10-02T15:20:18.394Z · comments (36)

Book Review: On the Edge: The Fundamentals
Zvi · 2024-09-23T13:40:11.058Z · comments (3)

Automation collapse
Geoffrey Irving · 2024-10-21T14:50:54.500Z · comments (7)

AI research assistants competition 2024Q3: Tie between Elicit and You.com
Elizabeth (pktechgirl) · 2024-10-12T15:10:05.417Z · comments (2)

[link] AI, centralization, and the One Ring
owencb · 2024-09-13T14:00:16.126Z · comments (11)

Another argument against utility-centric alignment paradigms
Fiora from Rosebloom · 2024-09-22T07:28:27.856Z · comments (39)

[Intuitive self-models] 4. Trance
Steven Byrnes (steve2152) · 2024-10-08T13:30:41.446Z · comments (6)

[link] Investigating an insurance-for-AI startup
L Rudolf L (LRudL) · 2024-09-21T15:29:10.083Z · comments (0)

[question] Is cybercrime really costing trillions per year?
Fabien Roger (Fabien) · 2024-09-27T08:44:07.621Z · answers+comments (28)

[link] Pay-on-results personal growth: first success
Chipmonk · 2024-09-14T03:39:12.975Z · comments (5)

SAEs are highly dataset dependent: a case study on the refusal direction
Connor Kissane (ckkissane) · 2024-11-07T05:22:18.807Z · comments (3)

Book Review: On the Edge: The Future
Zvi · 2024-09-27T14:00:05.279Z · comments (1)

[link] on bacteria, on teeth
bhauth · 2024-09-30T15:56:56.830Z · comments (9)

[link] Dario Amodei — Machines of Loving Grace
Matrice Jacobine · 2024-10-11T21:43:31.448Z · comments (26)

Base LLMs refuse too
Connor Kissane (ckkissane) · 2024-09-29T16:04:21.343Z · comments (20)

Against empathy-by-default
Steven Byrnes (steve2152) · 2024-10-16T16:38:49.926Z · comments (24)

[link] Linkpost: Memorandum on Advancing the United States’ Leadership in Artificial Intelligence
Nisan · 2024-10-25T04:37:00.828Z · comments (2)

Toward Safety Cases For AI Scheming
Mikita Balesni (mykyta-baliesnyi) · 2024-10-31T17:20:06.019Z · comments (1)

[link] Electrostatic Airships?
DaemonicSigil · 2024-10-27T04:32:34.852Z · comments (13)

[link] An Opinionated Evals Reading List
Marius Hobbhahn (marius-hobbhahn) · 2024-10-15T14:38:58.778Z · comments (0)

Pollsters Should Publish Question Translations
jefftk (jkaufman) · 2024-09-08T22:10:04.932Z · comments (3)

MATS Alumni Impact Analysis
utilistrutil · 2024-09-30T02:35:57.273Z · comments (6)

AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II
Lester Leong (lester-leong) · 2024-10-14T04:05:05.096Z · comments (9)

LLMs Look Increasingly Like General Reasoners
eggsyntax · 2024-11-08T23:47:28.886Z · comments (7)

AI #81: Alpha Proteo
Zvi · 2024-09-12T13:00:07.958Z · comments (3)

[link] electric turbofans
bhauth · 2024-11-02T22:50:59.807Z · comments (2)

Why our politicians aren't Median
Yair Halberstadt (yair-halberstadt) · 2024-11-03T14:03:33.779Z · comments (15)

The Geometry of Feelings and Nonsense in Large Language Models
7vik (satvik-golechha) · 2024-09-27T17:49:27.420Z · comments (10)

How you can help pass important AI legislation with 10 minutes of effort
ThomasW · 2024-09-14T22:10:50.386Z · comments (2)

Mira Murati leaves OpenAI/ OpenAI to remove non-profit control
Sodium · 2024-09-25T21:15:17.315Z · comments (4)

[link] The Alignment Trap: AI Safety as Path to Power
crispweed · 2024-10-29T15:21:26.545Z · comments (17)

AI #86: Just Think of the Potential
Zvi · 2024-10-17T15:10:06.552Z · comments (8)

AI #84: Better Than a Podcast
Zvi · 2024-10-03T15:00:07.128Z · comments (7)

Evidence against Learned Search in a Chess-Playing Neural Network
p.b. · 2024-09-13T11:59:55.634Z · comments (3)

[link] How much I'm paying for AI productivity software (and the future of AI use)
jacquesthibs (jacques-thibodeau) · 2024-10-11T17:11:27.025Z · comments (16)

Safe Predictive Agents with Joint Scoring Rules
Rubi J. Hudson (Rubi) · 2024-10-09T16:38:16.535Z · comments (10)

Secret Collusion: Will We Know When to Unplug AI?
schroederdewitt · 2024-09-16T16:07:01.119Z · comments (7)

A Path out of Insufficient Views
Unreal · 2024-09-24T20:00:27.332Z · comments (46)

[link] Making Eggs Without Ovaries
Niko_McCarty (niko-2) · 2024-09-22T17:44:46.733Z · comments (3)

[Intuitive self-models] 5. Dissociative Identity (Multiple Personality) Disorder
Steven Byrnes (steve2152) · 2024-10-15T13:31:46.157Z · comments (7)

[link] On the Role of Proto-Languages
adamShimi · 2024-09-22T16:50:34.720Z · comments (1)

[link] How Likely Are Various Precursors of Existential Risk?
NunoSempere (Radamantis) · 2024-10-28T13:27:31.620Z · comments (4)

[question] Could orcas be (trained to be) smarter than humans? 
Towards_Keeperhood (Simon Skade) · 2024-11-04T23:29:26.677Z · answers+comments (8)

[link] Slightly More Than You Wanted To Know: Pregnancy Length Effects
JustisMills · 2024-10-21T01:26:02.030Z · comments (4)

Recent comments

olli-jaerviniemi on Anthropic: Three Sketches of ASL-4 Safety Case Components

Thanks, that clears it up. Indeed I agree that "take real trajectories, then edit the attacks in" is promising, and didn't track this when writing my first comment (oops) - thanks for flagging it.

abandon on GPT-4o Can In Some Cases Solve Moderately Complicated Captchas

I don't know if it's aimed at neural nets specifically (and of course it is in fact visible to the naked eye) but AFAIK the noise is there to disrupt computer-vision systems, yes. (And in the first one it seems to have been effective in keeping 4o from recognizing the light bulb or the buildings, though interestingly Claude was able to see the buildings and not the bulb or the teapot). Agreed re: hoping they die soon XD

alexander-gietelink-oldenziel on Thomas Kwa's Shortform

The important thing is that both do active learning & decisionmaking & search, i.e. RL. *

LLMs don't do that. So the gain from doing that is huge.

Synthetic data is a bit of a weird word that gets thrown around a lot. There are fundamental limits on how much information resampling from the same data source will yield about completely different domains. So that seems a bit silly. Ofc sometimes with synthetic data people just mean doing rollouts, i.e. RL.

*the word RL sometimes gets mistaken for only a very specific reinforcement learning algorithm. I mean here a very general class of algorithms that solve MDPs.

pablo_stafforini on adam_scholl's Shortform

Why do you focus on this particular guy? Tens of thousands of traders were cumulatively betting billions of dollars in this market. All of these traders faced the same incentives.

Note that it is not enough to assume that willingness to bet more money makes a trader worth paying more attention to. You need the stronger assumption that willingness to bet n times more than each of n traders makes the single trader worth paying more attention to than all the other traders combined. I haven’t thought much about this, but the assumption seems false to me.

sohaib-imran on LLMs Look Increasingly Like General Reasoners

Nice post!

Regarding o1-like models: I am still unsure how to draw the boundary between tasks that see a significant improvement with o1-style reasoning and tasks that do not. This paper sheds some light on the kinds of tasks that benefit from regular CoT. However, even for mathematical tasks, which should benefit the most from CoT, o1-preview does not seem that much better than other models on extraordinarily difficult (and therefore OOD?) problems. I would love to see comparisons of o1 performance against other models in games like chess and Go.

Also, somewhat unrelated to this post, what do you and others think about x-risk in a world where explicit reasoners like o1 scale to AGI? To me this seems like one of the safest forms of AGI, since much of the computation is happening explicitly, and can be checked/audited by other AI systems and humans.

buck on Anthropic: Three Sketches of ASL-4 Safety Case Components

If we can construct fake honeypots that are indistinguishable - to the policy - from real opportunities, and do this at scale, then that could suffice: we could drive the policy's P(deployment | great opportunity) very low. (This relies on how many high-stakes we can create and how many the model will see in deployment.)

Yes totally. (This is impractical if you think that the model has a very large number of opportunities to attack, e.g. if every time it's programming is an opportunity to attack.)

If we can construct fake attacks that are indistinguishable - to the untrusted monitor - from real attacks, then that possibly suffices: we could drive the monitor's prior P(deployment | attack) very low

Indeed; as you know, this is probably easier than driving the policy's P(deployment|great opportunity) sufficiently low, because untrusted monitors have to defect reliably.

If you are saying that we don't need either of these - i.e. that we can get by with honeypots/attacks that are noticeably fake - then that's news to me, and I don't see how you can do that.

What I was trying to say is that if you don't have the ability to generate tasks that look real, you might still be able to honeypot your monitor by taking real trajectories and editing them to contain fake attacks (that you hope the monitor will be unable to distinguish from real attacks). I discuss this here.

jbash on GPT-4o Can In Some Cases Solve Moderately Complicated Captchas

CAPTCHAs have "adversarial perturbations"? Is that in the sense of "things not visible to humans, but specifically adversarial to deep learning networks"? I thought they just had a bunch of random noise and weird ad hoc patterns thrown over them.

Anyway, CAPTCHAs can't die soon enough. Although the fact that they persist in the face of multiple commercial services offering to solve 1000 for a dollar doesn't give me much hope...

alex-rozenshteyn on What TMS is like

This isn't directly related to TMS, but I've been trying to get an answer to this question for years, and maybe you have one.

When doing TMS, or any depression treatment, or any supplementation experiment, etc., it would make sense to track the effects objectively (in addition to, not as a replacement for, subjective monitoring). I haven't found any particularly good option for this, especially if I want to self-administer it most days. Quantified Mind comes close, but it's really hard to use their interface to construct a custom battery and an indefinite experiment.

Do you know of anything?

jbash on Force Sequential Output with SCP?

Using scp to stdout looks weird to me no matter what. Why not

ssh -n host cat /path/to/file | weird-aws-stuff

... but do you really want to copy everything twice? Why not run weird-aws-stuff on the remote host itself?

phib on Phib's Shortform

Yeah oops, meant long