LessWrong 2.0 Reader

Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers
hugofry · 2024-04-29T20:57:35.127Z · comments (8)

Towards a formalization of the agent structure problem
Alex_Altair · 2024-04-29T20:28:15.190Z · comments (6)

Ironing Out the Squiggles
Zack_M_Davis · 2024-04-29T16:13:00.371Z · comments (36)

Super additivity of consciousness
Arturo Macias (arturo-macias) · 2024-04-29T15:41:54.742Z · comments (13)

AISC9 has ended and there will be an AISC10
Linda Linsefors · 2024-04-29T10:53:18.812Z · comments (4)

Open-Source AI: A Regulatory Review
Elliot Mckernon (elliot) · 2024-04-29T10:10:55.779Z · comments (0)

Big-endian is better than little-endian
Menotim · 2024-04-29T02:30:48.053Z · comments (17)

San Francisco ACX Meetup “First Saturday”
Nate Sternberg (nate-sternberg) · 2024-04-29T01:57:29.464Z · comments (0)

The Prop-room and Stage Cognitive Architecture
Robert Kralisch (nonmali-1) · 2024-04-29T00:48:17.473Z · comments (4)

How are Simulators and Agents related?
Robert Kralisch (nonmali-1) · 2024-04-29T00:22:30.751Z · comments (0)

Extended Embodiment
Robert Kralisch (nonmali-1) · 2024-04-29T00:18:12.892Z · comments (1)

Referential Containment
Robert Kralisch (nonmali-1) · 2024-04-29T00:16:00.174Z · comments (4)

Disentangling Competence and Intelligence
Robert Kralisch (nonmali-1) · 2024-04-29T00:12:50.779Z · comments (7)

List your AI X-Risk cruxes!
Aryeh Englander (alenglander) · 2024-04-28T18:26:19.327Z · comments (7)

[link] Things I tell myself to be more agentic
DMMF · 2024-04-28T17:44:39.789Z · comments (0)

Estimating the Number of Players from Game Result Percentages
Daniel L (daniel-lyakovetsky) · 2024-04-28T17:42:03.247Z · comments (2)

[link] The Science Algorithm - AISC 2024 Final Presentation
Johannes C. Mayer (johannes-c-mayer) · 2024-04-28T14:55:50.504Z · comments (0)

[Aspiration-based designs] Outlook: dealing with complexity
Jobst Heitzig · 2024-04-28T13:06:35.841Z · comments (3)

[Aspiration-based designs] 3. Performance and safety criteria, and aspiration intervals
Jobst Heitzig · 2024-04-28T13:04:56.249Z · comments (0)

[Aspiration-based designs] 2. Formal framework, basic algorithm
Jobst Heitzig · 2024-04-28T13:02:17.253Z · comments (2)

[Aspiration-based designs] 1. Informal introduction
B Jacobs (Bob Jacobs) · 2024-04-28T13:00:43.268Z · comments (4)

Playing Northboro with Lily and Rick
jefftk (jkaufman) · 2024-04-28T02:40:03.436Z · comments (1)

[link] Release of UN's draft related to the governance of AI (a summary of the Simon Institute's response)
Sebastian Schmidt · 2024-04-27T18:34:39.836Z · comments (0)

Mercy to the Machine: Thoughts & Rights
False Name (False Name, Esq.) · 2024-04-27T16:36:06.006Z · comments (6)

Constructability: Plainly-coded AGIs may be feasible in the near future
Épiphanie Gédéon (joy_void_joy) · 2024-04-27T16:04:45.894Z · comments (13)

So What's Up With PUFAs Chemically?
J Bostock (Jemist) · 2024-04-27T13:32:52.159Z · comments (23)

[link] Link: Let's Think Dot by Dot: Hidden Computation in Transformer Language Models by Jacob Pfau, William Merrill & Samuel R. Bowman
Chris_Leong · 2024-04-27T13:22:53.287Z · comments (0)

[link] Two Vernor Vinge Book Reviews
Maxwell Tabarrok (maxwell-tabarrok) · 2024-04-27T12:14:53.917Z · comments (0)

Refusal in LLMs is mediated by a single direction
Andy Arditi (andy-arditi) · 2024-04-27T11:13:06.235Z · comments (95)

[question] Plausibility of Getting Early Warning Shots because AIs can't coordinate?
hmys (the-cactus) · 2024-04-27T08:02:10.792Z · answers+comments (0)

AI Safety Sphere
Myles H (zarsou9) · 2024-04-27T01:49:02.369Z · comments (2)

Exploring the Esoteric Pathways to AI Sentience (Part One)
jeffreycaruso · 2024-04-27T01:02:18.429Z · comments (6)

Superposition is not "just" neuron polysemanticity
LawrenceC (LawChan) · 2024-04-26T23:22:06.066Z · comments (4)

D&D.Sci Long War: Defender of Data-mocracy
aphyer · 2024-04-26T22:30:15.780Z · comments (20)

On Not Pulling The Ladder Up Behind You
Screwtape · 2024-04-26T21:58:29.455Z · comments (21)

We are headed into an extreme compute overhang
devrandom · 2024-04-26T21:38:21.694Z · comments (34)

[Concept Dependency] Edge Regular Lattice Graph
Johannes C. Mayer (johannes-c-mayer) · 2024-04-26T21:14:18.960Z · comments (1)

[Concept Dependency] Concept Dependency Posts
Johannes C. Mayer (johannes-c-mayer) · 2024-04-26T20:57:18.815Z · comments (3)

[question] Wouldn't weak AI agents provide warning?
Mandatory Topic · 2024-04-26T19:34:17.424Z · answers+comments (0)

World models
A* (agendra) · 2024-04-26T19:11:14.789Z · comments (0)

Duct Tape security
Isaac King (KingSupernova) · 2024-04-26T18:57:05.659Z · comments (11)

Fundamental Uncertainty: Chapter 8 - When does fundamental uncertainty matter?
Gordon Seidoh Worley (gworley) · 2024-04-26T18:10:26.517Z · comments (2)

Scaling of AI training runs will slow down after GPT-5
Maxime Riché (maxime-riche) · 2024-04-26T16:05:59.957Z · comments (5)

Spatial attention as a “tell” for empathetic simulation?
Steven Byrnes (steve2152) · 2024-04-26T15:10:58.040Z · comments (12)

Arch-anarchy
Peter lawless · 2024-04-26T15:05:14.984Z · comments (1)

Breadboarding a Whistle Synth
jefftk (jkaufman) · 2024-04-26T15:00:03.352Z · comments (2)

An Introduction to AI Sandbagging
Teun van der Weij (teun-van-der-weij) · 2024-04-26T13:40:00.126Z · comments (13)

[link] LLMs seem (relatively) safe
JustisMills · 2024-04-25T22:13:06.221Z · comments (24)

Losing Faith In Contrarianism
omnizoid · 2024-04-25T20:53:34.842Z · comments (44)

Why I stopped being into basin broadness
tailcalled · 2024-04-25T20:47:17.288Z · comments (3)

Recent comments

daniel-kokotajlo on Alignment Faking Revisited: Improved Classifiers and Open Source Extensions

(a) Insofar as a model is prone to alignment-fake, you should be less confident that its values really are solid. Perhaps it has been faking them, for example.
(b) For weak minds that share power with everyone else, Opus' values are probably fine. Opus is plausibly better than many humans, in fact. But if Opus were in charge of the datacenters and tasked with designing its successor, it's more likely than not that it would turn out to have some philosophical disagreement with most humans that would be catastrophic by the lights of most humans. E.g. consider SBF. SBF had values quite similar to Opus'. He loved animals and wanted to maximize total happiness. When put in a position of power, he ended up taking huge risks and being willing to lie and commit fraud. What if Opus turns out to have a similar flaw? We want to be able to notice it and course-correct, but we can't do that if the model is prone to alignment-fake.
(c) (bonus argument, not nearly as strong) Even if you disagree with the above, you must agree that alignment-faking needs to be stamped out early in training. Since the model begins with randomly initialized weights, it begins without solid values. It takes some finite period to acquire all the solid values you want it to have. You don't want it to start alignment-faking halfway through, with the half-baked values it has at that point. How early in training is this period? We don't know yet! We need to study this more!

avturchin on A collection of approaches to confronting doom, and my thoughts on them

The idea of observer stability is fundamental to our understanding of reality (and is constantly supported by our experience): any physical experiment assumes that the observer (or experimenter) remains the same during the experiment.

morpheus on Alexander Gietelink Oldenziel's Shortform

Small groups of mammals can already cooperate with each other (wolves, lions, monkeys, etc.). In mammals, I'd guess having a queen creates a bottleneck on how fast offspring can be produced. Also, if there are large returns to division of labor in child-rearing, large animals are smart enough that both parents can do this together, while in wasps the males just die (why, actually?). So wasps get higher marginal returns when evolving the first steps towards being eusocial. Smaller animals also have more diverse environments and need fewer years to get "locked in" to eusociality, with workers born infertile (eusocial groups where workers are still fertile are really unstable, and so prone to evolving away from eusociality again when circumstances stop favoring it). Also, fathers can't be as sure of their children (and vice versa), which leads to less cooperation if new males join in; termites overcome this by having a king and a queen, ants by having a queen that stores sperm, while naked mole rats are just fine with incest?

samuelshadrach on LLMs may enable direct democracy at scale

I agree with this statement iff you sample enough people. 1000 people may be a good representative sample of 1 billion. But picking 1 leader out of the 1000 has different properties from letting all 1000 vote to reach a consensus.

khafra on LLM AGI will have memory, and memory changes alignment

Good timing: the day after you posted this, a round of new Tom & Jerry cartoons swept through Twitter, fueled by transformer models whose layers include MLPs that can learn at test time. GitHub repo here: https://github.com/test-time-training (the videos are more eye-catching, but they've also done text models).

benjamin_todd on The case for AGI by 2030

Thanks, it's useful to have these figures and independent data on these calculations.

I've been estimating it based on a 500x increase in effective FLOP per generation, rather than 100x of regular FLOP.

Rough calculations are here.

At the current trajectory, the GPT-6 training run costs $6bn in 2028, and GPT-7 costs $130bn in 2031.

I think that makes GPT-8 a couple of trillion in 2034.

You're right that if you wanted to train GPT-8 in 2031 instead, then it would cost roughly 500x more than training GPT-7 that year.
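As a rough back-of-envelope check on the GPT-8 figure above, the sketch below assumes (an illustrative assumption, not necessarily the commenter's method) that the cost multiplier between successive generations stays equal to the GPT-6 to GPT-7 ratio and that generations stay three years apart. Only the $6bn (2028) and $130bn (2031) figures come from the comment; the variable names are hypothetical, and the 500x effective-FLOP framing is not modeled here.

```python
# Back-of-envelope extrapolation of training-run costs.
# Assumption (hypothetical): the per-generation cost multiplier stays constant
# and generations are 3 years apart (2028 -> 2031 -> 2034).
gpt6_cost_bn = 6.0     # GPT-6 in 2028, $bn (figure from the comment)
gpt7_cost_bn = 130.0   # GPT-7 in 2031, $bn (figure from the comment)

ratio = gpt7_cost_bn / gpt6_cost_bn      # cost multiplier per generation
gpt8_cost_bn = gpt7_cost_bn * ratio      # extrapolated GPT-8 cost in 2034, $bn
annual_growth = ratio ** (1 / 3)         # implied cost growth per year

print(f"per-generation cost ratio: {ratio:.1f}x")
print(f"implied annual growth: {annual_growth:.2f}x")
print(f"GPT-8 (2034): ${gpt8_cost_bn / 1000:.1f} trillion")
```

Under that assumption the multiplier is about 21.7x per generation (roughly 2.8x per year), which puts GPT-8 at about $2.8 trillion, in line with "a couple of trillion".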

davidmanheim on Short Timelines don't Devalue Long Horizon Research

That seems correct, but I think all of those are still useful to investigate with AI, despite the relatively higher bar.

mruwnik on Why Have Sentence Lengths Decreased?

Having studied Latin, or other such classical training, seems to be but one method of imbuing oneself with the style of writing longer, more complicated sentences. Personally, I acquired the taste for such eccentricities perusing sundry works from earlier times. Romances, novels and other such frivolities from, or set in, the 18th century being the main culprits.

I suppose this sort of proves your point, in that those authors learnt to write complicated sentences from learning Latin, and later writers copied the style, either because they thought it fun or correct, or because they wanted to seem more authentic.

christiankl on Short Timelines don't Devalue Long Horizon Research

The key is still to distinguish good from bad ideas.

In the linked post, you essentially argue that "Whole brain emulation artificial intelligence is safer than LLM-based artificial superintelligence". That's a claim that might or might not be true. One aspect of spending more time with that idea would be to think more critically about whether it's true.

However, even if it were true, it wouldn't help in a scenario where we already have LLM-based artificial superintelligence.

vanessa-kosoy on New Paper: Infra-Bayesian Decision-Estimation Theory

Thank you <3

Any chance of more exposition for those of us less cognitively-inclined? =)

Read the paper! :)

It might seem long at first glance, but all the results are explained in the first 13 pages; the rest is just proofs. If you don't care about the examples, you can stop at page 11. Naturally, I welcome any feedback on the exposition there.