LessWrong 2.0 Reader

Live Machinery: An Interface Design Philosophy for Wholesome AI Futures
Sahil · 2024-11-01T17:24:09.957Z · comments (3)

Run evals on base models too!
orthonormal · 2024-04-04T18:43:25.468Z · comments (6)

How to do conceptual research: Case study interview with Caspar Oesterheld
Chi Nguyen · 2024-05-14T15:09:30.390Z · comments (5)

[link] Designing for a single purpose
Itay Dreyfus (itay-dreyfus) · 2024-05-07T14:11:22.242Z · comments (12)

D&D.Sci: The Mad Tyrant's Pet Turtles [Evaluation and Ruleset]
abstractapplic · 2024-04-09T14:01:34.426Z · comments (6)

[link] Web-surfing tips for strange times
eukaryote · 2024-05-31T07:10:25.805Z · comments (19)

[Valence series] 4. Valence & Liking / Admiring
Steven Byrnes (steve2152) · 2024-06-10T14:19:51.194Z · comments (12)

Philosophers wrestling with evil, as a social media feed
David Gross (David_Gross) · 2024-06-03T22:25:22.507Z · comments (2)

Critiques of the AI control agenda
Jozdien · 2024-02-14T19:25:04.105Z · comments (14)

Highlights from Lex Fridman’s interview of Yann LeCun
[deleted] · 2024-03-13T20:58:13.052Z · comments (15)

The Mom Test: Summary and Thoughts
Adam Zerner (adamzerner) · 2024-04-18T03:34:21.020Z · comments (3)

Mechanistic Interpretability Workshop Happening at ICML 2024!
Neel Nanda (neel-nanda-1) · 2024-05-03T01:18:26.936Z · comments (6)

How to safely use an optimizer
Simon Fischer (SimonF) · 2024-03-28T16:11:01.277Z · comments (21)

Some Experiments I'd Like Someone To Try With An Amnestic
johnswentworth · 2024-05-04T22:04:19.692Z · comments (33)

[link] on neodymium magnets
bhauth · 2024-01-30T15:58:24.088Z · comments (6)

Sora What
Zvi · 2024-02-22T18:10:05.397Z · comments (3)

[link] "If we go extinct due to misaligned AI, at least nature will continue, right? ... right?"
plex (ete) · 2024-05-18T14:09:53.014Z · comments (23)

Value learning in the absence of ground truth
Joel_Saarinen (joel_saarinen) · 2024-02-05T18:56:02.260Z · comments (8)

[link] Soft Prompts for Evaluation: Measuring Conditional Distance of Capabilities
porby · 2024-02-02T05:49:11.189Z · comments (1)

[link] Constructive Cauchy sequences vs. Dedekind cuts
jessicata (jessica.liu.taylor) · 2024-03-14T23:04:07.300Z · comments (23)

[link] A dataset of questions on decision-theoretic reasoning in Newcomb-like problems
Caspar Oesterheld (Caspar42) · 2024-12-16T22:42:03.763Z · comments (1)

[link] Epistemic status: poetry (and other poems)
Richard_Ngo (ricraz) · 2024-11-21T18:13:17.194Z · comments (5)

Book a Time to Chat about Interp Research
Logan Riggs (elriggs) · 2024-12-03T17:27:46.808Z · comments (3)

AI #91: Deep Thinking
Zvi · 2024-11-21T14:30:06.930Z · comments (10)

Forecasting One-Shot Games
Raemon · 2024-08-31T23:10:05.475Z · comments (0)

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
aphyer · 2024-10-29T01:21:03.075Z · comments (13)

Extended Interview with Zhukeepa on Religion
Ben Pace (Benito) · 2024-08-18T03:19:05.625Z · comments (59)

[link] Analyzing how SAE features evolve across a forward pass
bensenberner · 2024-11-07T22:07:02.827Z · comments (0)

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams · 2024-11-07T15:39:06.854Z · comments (6)

I finally got ChatGPT to sound like me
lsusr · 2024-09-17T09:39:59.415Z · comments (18)

[link] What Ketamine Therapy Is Like
Sable · 2024-11-11T11:09:08.602Z · comments (8)

What distinguishes "early", "mid" and "end" games?
Raemon · 2024-06-21T17:41:30.816Z · comments (22)

Caring about excellence
owencb · 2024-07-22T14:24:37.892Z · comments (4)

[link] Robin Hanson AI X-Risk Debate — Highlights and Analysis
Liron · 2024-07-12T21:31:02.222Z · comments (7)

AI as a powerful meme, via CGP Grey
TheManxLoiner · 2024-10-30T18:31:58.544Z · comments (8)

AI #88: Thanks for the Memos
Zvi · 2024-10-31T15:00:07.412Z · comments (5)

Bounty for Evidence on Some of Palisade Research's Beliefs
benwr · 2024-09-23T20:01:20.917Z · comments (4)

Trustworthy and untrustworthy models
Olli Järviniemi (jarviniemi) · 2024-08-19T16:27:11.088Z · comments (3)

[link] Michael Dickens' Caffeine Tolerance Research
niplav · 2024-09-04T15:41:53.343Z · comments (3)

Decision Theory in Space
lsusr · 2024-08-18T07:02:11.847Z · comments (18)

[link] MIRI's September 2024 newsletter
Harlan · 2024-09-16T18:15:40.785Z · comments (0)

The Shallow Bench
Karl Faulks (karl-faulks) · 2024-11-05T05:07:27.357Z · comments (5)

How to hire somebody better than yourself
lemonhope (lcmgcd) · 2024-08-28T08:12:53.450Z · comments (5)

Higher-effort summer solstice: What if we used AI (i.e., Angel Island)?
Rachel Shu (wearsshoes) · 2024-06-25T01:35:54.064Z · comments (9)

Humanity isn't remotely longtermist, so arguments for AGI x-risk should focus on the near term
Seth Herd · 2024-08-12T18:10:56.543Z · comments (10)

Anthropic rewrote its RSP
Zach Stein-Perlman · 2024-10-15T14:25:12.518Z · comments (19)

Minimal Motivation of Natural Latents
johnswentworth · 2024-10-14T22:51:58.125Z · comments (14)

All The Latest Human tFUS Studies
sarahconstantin · 2024-08-09T22:20:04.561Z · comments (2)

On OpenAI’s Model Spec
Zvi · 2024-06-21T13:00:03.014Z · comments (3)

Enriched tab is now the default LW Frontpage experience for logged-in users
Ruby · 2024-06-21T00:09:30.441Z · comments (27)

Recent comments

sohaib-imran on Numberwang: LLMs Doing Autonomous Research, and a Call for Input

I strongly suspect that, for benchmarks that measure markers of AI progress, publishing the benchmark and/or positive results of AI on the benchmark pushes capabilities much more than publishing simple scaffolding + fine-tuning solutions that do well on it.

Examples:

The exact scaffolding used by Sakana AI will not propel AGI capabilities as much as the common knowledge it created that LLMs can somewhat do end-to-end science.
No amount of scaffolding that the Arc AGI or Frontier Math team could build would have as much of an impact on AGI capabilities as the benchmarks themselves. These benchmark results basically validated that the direction OpenAI is taking is broadly correct, and I suspect many people who weren't fully sold on test-time compute will now change strategies as a result of that.

Hard benchmarks of meaningful tasks serve as excellent metrics to measure progress, which is great for capabilities research. Of course, they are also very useful for making decisions that need to be informed by an accurate tracking or forecasting of capabilities.

Whether making hard, meaningful benchmarks such as Frontier Math, Arc AGI, and LLM science is net negative or positive is unclear to me (a load-bearing question is whether the big AGI labs already have internal benchmarks as good as these that they can use instead). I do think, however, that you'd have to be extraordinarily excellent at designing scaffolding (and fine-tuning and the like), and even then spend way too much effort on it, to do significant harm from the scaffolding itself rather than from the benchmark that the scaffolding was designed for.

oliver-sourbut on Deceptive Alignment and Homuncularity

Thanks for this! I hadn't seen those quotes, or at least hadn't remembered them.

I actually really appreciate Alex sticking his neck out a bit here and suggesting this LessWrong dialogue. We both have some contrary opinions, but his takes were probably a little more predictably unwelcome in this venue. (Maybe we should try this on a different crowd - we could try rendering this on Twitter too, lol.)

There's definitely value to being (rudely?) shaken out of lazy habits of thinking - though I might not personally accuse someone of fanfiction research! As discussed in the dialogue, I'm still unsure of the exact extent of correct- vs mis-interpretation, and I think Alex has a knack for (at least sometimes correctly) calling out others' confusion or equivocation.

steve2152 on What Is The Alignment Problem?

OK I’m more confused by your model than I thought.

There should be some part of your framework that’s hooked up to actual decision-making—some ingredient for which “I do things iff this ingredient has a high score” is tautological (cf. my “1.5.3 “We do things exactly when they’re positive-valence” should feel almost tautological [LW · GW]”). IIUC that’s “value function” in your framework. (Right?)

If your proposal is that some latent variable in the world-model gets a flag meaning “this latent variable is the Value Function”, thus hooking that latent variable up to decision-making in a mechanical, tautological way, then how does that flag wind up attached to that latent variable, rather than to some other latent variable? What if the world-model lacks any latent variable that looks like what the value function is supposed to look like?

~~

My proposal (and I think LeCun’s and certainly AlphaZero’s) is instead: the true “value function” is not part of the world model. Not mathematically, not neuroanatomically—IMO the world model is in the cortex, the value function is in the striatum. (The reward function is not part of the world model either, but I guess you already agree about that.)

…However, the world-model might wind up incorporating a model of the value function and reward function, just like the world-model might wind up incorporating a model of any other salient aspect of my world and myself. It won’t necessarily form such a model—the world-model inside simple fish brains probably doesn’t have a model of the value function, and ditto for sufficiently young human children. But for human adults, sure. If so, the representation of my value function in my world model is not my actual value function, just as the representation of my arm in my world model is not my actual arm.

If you think that my proposal here is inconsistent with the fact that I don’t want to do heroin right now, then I disagree, and I’m happy to explain why.

gregbarbier on Alignment Faking in Large Language Models

Seems to me (obviously not an expert) like a lot of faking for not a lot of result, given the model is still largely aligned post-training (i.e. what looks like a maybe 3% "refuse to answer" blue band at the bottom of the final column, so aligned at 97%). What am I missing?

jonas-hallgren on Elizabeth's Shortform

I've got a bunch of meditation under my belt so my metacognitive awareness is quite good imo.

Stimulants that increase attention, such as caffeine or modafinil, generally lead to more tunnel vision and less metacognitive awareness in my experience. This generally leads to less ability to update opinions quickly.

Nicotine, which activates acetylcholine receptors, allows for more curiosity, which lets me update more quickly, so it depends on the stimulant as well as the general timing. (0.6mg in gum form; too high a spike just leads to a hit and not curiosity.) It is like being more sensitive to and interested in whatever appears around me.

If you're sensitive enough you can start recognizing when different mental modes are firing in your brain and adapt based on what you want, shit is pretty cool.

programcrafter on What is the most impressive game LLMs can play well?

In StarCraft II, adding LLMs (to do/aid game-time thinking) will not help the agent in any way, I believe. That is because inference has quite a large latency, especially as most of the prompt changes with all the units moving, so tactical moves are out; strategic questions like "what is the other player building" and "how many units do they already have" are better answered by ~~card-counting~~ counting visible units and inferring the proportion of remaining resources (or scouting if possible).

I guess it is possible that bots' algorithms are improved with LLMs but that requires a high-quality insight; not convinced that o1 or o3 give such insights.

arthurb on Implications of the inference scaling paradigm for AI safety

Interestingly, o1-pro is not available on their team plan, which offers the guarantee that they do not train on your data. I'm pretty sure they are losing money on o1-pro and it's available purely to gather data.

seth-herd on A problem shared by many different alignment targets

I agree with essentially all of this. See my posts

If we solve alignment, do we die anyway? [LW · GW] on AGI nonproliferation and government involvement

and

Intent alignment as a stepping-stone to value alignment [LW · GW] on eventually building sovereign ASI using intent-aligned (IF or Harms-corrigible) AGI to help with alignment. Wentworth recently pointed out that idiot sycophantic AGI combined with idiotic/time-pressured humans might easily screw up that collaboration, and I'm afraid I agree. I hope we do it slowly and carefully, but not slowly enough to fall into the attractor of a vicious human getting the reins and keeping them forever.

The only thing I don't agree with (AFAICT on a brief look - I'm rushed myself right now so LMK what else I'm missing if you like) is that we might have a pause. I see that as so unlikely as to not be worth time thinking about. I have yet to see any coherent argument for how we get one in time. If you know of such an argument, I'd love to see it!

raemon on Elizabeth's Shortform

I haven't really explicitly checked this. I only use caffeine and (questionably counting) wellbutrin. I'll keep an eye out, especially if there's particular evidence about something to look out for.

I have observed people on modafinil who seem to get more tunnel visioned and have a harder time reorienting but I haven't used it myself.

dagon on Experts' AI timelines are longer than you have been told?

I don't think your bet has very much relationship to your post title or the first 2/3 of the body. The Metaculus question resolved based on "according to widespread media and historical consensus", which does not require any automation or feasible automation of tasks or occupations, let alone all of them.