LessWrong 2.0 Reader




Jitters No Evidence of Stupidity in RL
1a3orn · 2021-09-16T22:43:57.972Z · comments (18)
Being Productive With Chronic Health Conditions
lynettebye · 2020-11-05T00:22:50.871Z · comments (7)
The Telephone Theorem: Information At A Distance Is Mediated By Deterministic Constraints
johnswentworth · 2021-08-31T16:50:13.483Z · comments (26)
Pivotal outcomes and pivotal processes
Andrew_Critch · 2022-06-17T23:43:19.230Z · comments (31)
Key takeaways from our EA and alignment research surveys
Cameron Berg (cameron-berg) · 2024-05-03T18:10:41.416Z · comments (10)
[question] Lying to chess players for alignment
Zane · 2023-10-25T17:47:15.033Z · answers+comments (54)
Thinking About Filtered Evidence Is (Very!) Hard
abramdemski · 2020-03-19T23:20:05.562Z · comments (32)
The Prototypical Negotiation Game
johnswentworth · 2021-02-20T21:33:34.195Z · comments (16)
[link] New blog: Planned Obsolescence
Ajeya Cotra (ajeya-cotra) · 2023-03-27T19:46:25.429Z · comments (7)
Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy
Buck · 2023-07-26T17:02:56.456Z · comments (18)
Shah (DeepMind) and Leahy (Conjecture) Discuss Alignment Cruxes
OliviaJ (olivia-jimenez-1) · 2023-05-01T16:47:41.655Z · comments (10)
Exercise: Taboo "Should"
johnswentworth · 2021-01-22T21:02:46.649Z · comments (28)
Deceptive AI ≠ Deceptively-aligned AI
Steven Byrnes (steve2152) · 2024-01-07T16:55:13.761Z · comments (19)
Does SGD Produce Deceptive Alignment?
Mark Xu (mark-xu) · 2020-11-06T23:48:09.667Z · comments (9)
Recommending Understand, a Game about Discerning the Rules
MondSemmel · 2021-10-28T14:53:16.901Z · comments (53)
Omicron Post #8
Zvi · 2021-12-20T23:10:01.630Z · comments (33)
Refactoring cryonics as structural brain preservation
Andy_McKenzie · 2024-09-11T18:36:30.285Z · comments (14)
[link] Ilya Sutskever created a new AGI startup
harfe · 2024-06-19T17:17:17.366Z · comments (35)
[question] What is Going On With CFAR?
niplav · 2022-05-28T15:21:51.397Z · answers+comments (34)
A circuit for Python docstrings in a 4-layer attention-only transformer
StefanHex (Stefan42) · 2023-02-20T19:35:14.027Z · comments (8)
Luna Lovegood and the Chamber of Secrets - Part 2
lsusr · 2020-11-30T08:12:07.238Z · comments (5)
Introducing Pastcasting: A tool for forecasting practice
Sage Future (aaron-ho-1) · 2022-08-11T17:38:06.474Z · comments (10)
The Liar and the Scold
Tomás B. (Bjartur Tómas) · 2022-01-20T20:31:14.765Z · comments (13)
AI Safety in China: Part 2
Lao Mein (derpherpize) · 2023-05-22T14:50:54.482Z · comments (28)
On Claude 3.5 Sonnet
Zvi · 2024-06-24T12:00:05.719Z · comments (14)
Catching AIs red-handed
ryan_greenblatt · 2024-01-05T17:43:10.948Z · comments (20)
AI #5: Level One Bard
Zvi · 2023-03-30T23:00:00.690Z · comments (9)
Contest: An Alien Message
DaemonicSigil · 2022-06-27T05:54:54.144Z · comments (100)
[link] Explaining Impact Markets
Saul Munn (saul-munn) · 2024-01-31T09:51:27.587Z · comments (2)
Help ARC evaluate capabilities of current language models (still need people)
Beth Barnes (beth-barnes) · 2022-07-19T04:55:18.189Z · comments (6)
I am the Golden Gate Bridge
Zvi · 2024-05-27T14:40:03.216Z · comments (6)
OpenAI's Sora is an agent
CBiddulph (caleb-biddulph) · 2024-02-16T07:35:52.171Z · comments (25)
[link] MIRI's April 2024 Newsletter
Harlan · 2024-04-12T23:38:20.781Z · comments (0)
Formal Inner Alignment, Prospectus
abramdemski · 2021-05-12T19:57:37.162Z · comments (57)
[link] Compact Proofs of Model Performance via Mechanistic Interpretability
LawrenceC (LawChan) · 2024-06-24T19:27:21.214Z · comments (3)
My intellectual influences
Richard_Ngo (ricraz) · 2020-11-22T18:00:04.648Z · comments (1)
[link] Trying to Make a Treacherous Mesa-Optimizer
MadHatter · 2022-11-09T18:07:03.157Z · comments (14)
How to Lurk Less (and benefit others while benefiting yourself)
romeostevensit · 2020-02-17T06:18:54.978Z · comments (17)
[Intro to brain-like-AGI safety] 3. Two subsystems: Learning & Steering
Steven Byrnes (steve2152) · 2022-02-09T13:09:11.945Z · comments (3)
[link] Almost everyone I’ve met would be well-served thinking more about what to focus on
Henrik Karlsson (henrik-karlsson) · 2024-01-05T21:01:27.861Z · comments (8)
Bayesian Networks Aren't Necessarily Causal
Zack_M_Davis · 2023-05-14T01:42:24.319Z · comments (37)
Contra Yudkowsky on AI Doom
jacob_cannell · 2023-04-24T00:20:48.561Z · comments (111)
Kids or No kids
Kids or no kids (grosseholz.f@gmail.com) · 2023-11-14T18:37:02.799Z · comments (10)
[question] How to get nerds fascinated about mysterious chronic illness research?
riceissa · 2024-05-27T22:58:29.707Z · answers+comments (50)
RLHF does not appear to differentially cause mode-collapse
Arthur Conmy (arthur-conmy) · 2023-03-20T15:39:45.353Z · comments (9)
Most people should probably feel safe most of the time
Kaj_Sotala · 2023-05-09T09:35:11.911Z · comments (28)
Quotes from the WWMoR Podcast Episode with Eliezer
MondSemmel · 2021-03-13T21:43:41.672Z · comments (3)
The Story of the Reichstag
Martin Sustrik (sustrik) · 2021-02-05T05:51:59.243Z · comments (21)
Slack Has Positive Externalities For Groups
johnswentworth · 2021-07-29T15:03:25.929Z · comments (11)
[link] Ideological Bayesians
Kevin Dorst · 2024-02-25T14:17:25.070Z · comments (4)