LessWrong 2.0 Reader

Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution
Kola Ayonrinde (kola-ayonrinde) · 2024-10-30T22:50:45.642Z · comments (0)

[link] Generic advice caveats
Saul Munn (saul-munn) · 2024-10-30T21:03:07.185Z · comments (1)

I turned decision theory problems into memes about trolleys
Tapatakt · 2024-10-30T20:13:29.589Z · comments (23)

AI as a powerful meme, via CGP Grey
TheManxLoiner · 2024-10-30T18:31:58.544Z · comments (8)

[question] How might language influence how an AI "thinks"?
bodry (plosique) · 2024-10-30T17:41:04.460Z · answers+comments (0)

Motivation control
Joe Carlsmith (joekc) · 2024-10-30T17:15:50.881Z · comments (7)

Updating the NAO Simulator
jefftk (jkaufman) · 2024-10-30T13:50:06.908Z · comments (0)

Occupational Licensing Roundup #1
Zvi · 2024-10-30T11:00:04.516Z · comments (11)

Three Notions of "Power"
johnswentworth · 2024-10-30T06:10:08.326Z · comments (44)

Introduction to Choice set Misspecification in Reward Inference
Rahul Chand (rahul-chand) · 2024-10-29T22:57:34.310Z · comments (0)

Gothenburg LW/ACX meetup
Stefan (stefan-1) · 2024-10-29T20:40:22.754Z · comments (0)

[link] The Alignment Trap: AI Safety as Path to Power
crispweed · 2024-10-29T15:21:26.545Z · comments (17)

Housing Roundup #10
Zvi · 2024-10-29T13:50:09.416Z · comments (2)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations
Steven Byrnes (steve2152) · 2024-10-29T13:36:16.325Z · comments (2)

Review: “The Case Against Reality”
David Gross (David_Gross) · 2024-10-29T13:13:29.643Z · comments (9)

A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More
Sharat Jacob Jacob (sharat-jacob-jacob) · 2024-10-29T12:41:30.337Z · comments (0)

Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence
EuanMcLean (euanmclean) · 2024-10-29T12:16:18.448Z · comments (9)

AI #87: Staying in Character
Zvi · 2024-10-29T07:10:08.212Z · comments (3)

A path to human autonomy
Nathan Helm-Burger (nathan-helm-burger) · 2024-10-29T03:02:42.475Z · comments (16)

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
aphyer · 2024-10-29T01:21:03.075Z · comments (13)

[link] Gwern: Why So Few Matt Levines?
kave · 2024-10-29T01:07:27.564Z · comments (10)

[link] October 2024 Progress in Guaranteed Safe AI
Quinn (quinn-dougherty) · 2024-10-28T23:34:51.689Z · comments (0)

5 homegrown EA projects, seeking small donors
Austin Chen (austin-chen) · 2024-10-28T23:24:25.745Z · comments (4)

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)
Joe Carlsmith (joekc) · 2024-10-28T21:57:12.063Z · comments (5)

Enhancing Mathematical Modeling with LLMs: Goals, Challenges, and Evaluations
ozziegooen · 2024-10-28T21:44:42.352Z · comments (0)

[link] AI & wisdom 3: AI effects on amortised optimisation
L Rudolf L (LRudL) · 2024-10-28T21:08:56.604Z · comments (0)

[link] AI & wisdom 2: growth and amortised optimisation
L Rudolf L (LRudL) · 2024-10-28T21:07:39.449Z · comments (0)

[link] AI & wisdom 1: wisdom, amortised optimisation, and AI
L Rudolf L (LRudL) · 2024-10-28T21:02:51.215Z · comments (0)

[link] Finishing The SB-1047 Documentary In 6 Weeks
Michaël Trazzi (mtrazzi) · 2024-10-28T20:17:47.465Z · comments (7)

[link] Towards the Operationalization of Philosophy & Wisdom
Thane Ruthenis · 2024-10-28T19:45:07.571Z · comments (2)

Quantitative Trading Bootcamp [Nov 6-10]
Ricki Heicklen (bayesshammai) · 2024-10-28T18:39:58.480Z · comments (0)

Winners of the Essay competition on the Automation of Wisdom and Philosophy
owencb · 2024-10-28T17:10:04.272Z · comments (3)

[link] Miles Brundage: Finding Ways to Credibly Signal the Benignness of AI Development and Deployment is an Urgent Priority
Zach Stein-Perlman · 2024-10-28T17:00:18.660Z · comments (4)

[question] somebody explain the word "epistemic" to me
KvmanThinking (avery-liu) · 2024-10-28T16:40:24.275Z · answers+comments (8)

~80 Interesting Questions about Foundation Model Agent Safety
RohanS · 2024-10-28T16:37:04.713Z · comments (4)

[link] AI Safety Newsletter #43: White House Issues First National Security Memo on AI Plus, AI and Job Displacement, and AI Takes Over the Nobels
Corin Katzke (corin-katzke) · 2024-10-28T16:03:39.258Z · comments (0)

[link] Death notes - 7 thoughts on death
Nathan Young · 2024-10-28T15:01:13.532Z · comments (1)

SAEs you can See: Applying Sparse Autoencoders to Clustering
Robert_AIZI · 2024-10-28T14:48:16.744Z · comments (0)

Bridging the VLM and mech interp communities for multimodal interpretability
Sonia Joseph (redhat) · 2024-10-28T14:41:41.969Z · comments (5)

[link] How Likely Are Various Precursors of Existential Risk?
NunoSempere (Radamantis) · 2024-10-28T13:27:31.620Z · comments (4)

[link] Care Doesn't Scale
stavros · 2024-10-28T11:57:38.742Z · comments (1)

Your memory eventually drives confidence in each hypothesis to 1 or 0
Crazy philosopher (commissar Yarrick) · 2024-10-28T09:00:27.084Z · comments (6)

San Francisco ACX Meetup “First Saturday”
Nate Sternberg (nate-sternberg) · 2024-10-28T05:05:36.757Z · comments (0)

[link] Nerdtrition: simple diets via spreadsheet abuse
dkl9 · 2024-10-27T21:45:15.117Z · comments (0)

AGI Fermi Paradox
jrincayc (nerd_gatherer) · 2024-10-27T20:14:54.490Z · comments (2)

Substituting Talkbox for Breath Controller
jefftk (jkaufman) · 2024-10-27T19:10:03.768Z · comments (0)

Open Source Replication of Anthropic’s Crosscoder paper for model-diffing
Connor Kissane (ckkissane) · 2024-10-27T18:46:21.316Z · comments (4)

Hiring a writer to co-author with me (Spencer Greenberg for ClearerThinking.org)
spencerg · 2024-10-27T17:34:50.479Z · comments (0)

Interview with Bill O’Rourke - Russian Corruption, Putin, Applied Ethics, and More
JohnGreer · 2024-10-27T17:11:28.891Z · comments (0)

[link] On Shifgrethor
JustisMills · 2024-10-27T15:30:13.688Z · comments (18)

Recent comments

akash-wasil on Alexander Gietelink Oldenziel's Shortform

I feel this way and generally think that on-the-margin we have too much forecasting and not enough “build plans for what to do if there is a sudden shift in political will” or “just directly engage with policymakers and help them understand things not via longform writing but via conversations/meetings.”

Many details will be ~impossible to predict and many details will not matter much (i.e., will not be action-relevant for the stakeholders who have the potential to meaningfully affect the current race to AGI).

That’s not to say forecasting is always unhelpful. Things like AI2027 can certainly move discussions forward and perhaps get new folks interested. But EG, my biggest critique of AI2027 is that I suspect they’re spending too much time/effort on detailed longform forecasting and too little effort on arranging meetings with Important Stakeholders, developing a strong presence in DC, forming policy recommendations, and related activities. (And TBC I respect/admire the AI2027 team, have relayed this feedback to them, and imagine they have thoughtful reasons for taking the approach they’re taking.)

daniel-kokotajlo on Alignment Faking Revisited: Improved Classifiers and Open Source Extensions

(a) Insofar as a model is prone to alignment-fake, you should be less confident that its values really are solid. Perhaps it has been faking them, for example.
(b) For weak minds that share power with everyone else, Opus' values are probably fine. Opus is plausibly better than many humans in fact. But if Opus was in charge of the datacenters and tasked with designing its successor, it's more likely than not that it would turn out to have some philosophical disagreement with most humans that would be catastrophic by the lights of most humans. E.g. consider SBF. SBF had values quite similar to Opus. He loved animals and wanted to maximize total happiness. When put in a position of power he ended up taking huge risks and being willing to lie and commit fraud. What if Opus turns out to have a similar flaw? We want to be able to notice it and course-correct, but we can't do that if the model is prone to alignment-fake.
(c) (bonus argument, not nearly as strong) Even if you disagree with the above, you must agree that alignment-faking needs to be stamped out early in training. Since the model begins with randomly initialized weights, it begins without solid values. It takes some finite period to acquire all the solid values you want it to have. You don't want it to start alignment faking halfway through, with the half-baked values it has at that point. How early in training is this period? We don't know yet! We need to study this more!

avturchin on A collection of approaches to confronting doom, and my thoughts on them

The idea of observer's stability is fundamental for our understanding of reality (and also constantly supported by our experience) – any physical experiment assumes that the observer (or experimenter) remains the same during the experiment.

morpheus on Alexander Gietelink Oldenziel's Shortform

Small groups of mammals can already cooperate with each other (wolves, lions, monkeys etc.). In mammals, I'd guess having a queen gives a bottleneck in how fast there can be offspring. Also if there are large returns to division of labor in child-rearing, large animals are smart enough that both parents can do this together, while in wasps the males just die (why actually?). So wasps get higher marginal returns when evolving the first steps towards being eusocial. Also smaller animals have more diverse environments and need fewer years to reach "locked in" eusociality, where workers are born infertile (eusocial groups where workers are still fertile are really unstable, so prone to evolve away from eusociality again when circumstances aren't in favor anymore). Also fathers can't be as sure of their children and the other way around, leading to less cooperation if new males join in, which termites overcome by having a king and queen, while ants just have a queen that stores sperm, and naked mole rats are just fine with incest?

samuelshadrach on LLMs may enable direct democracy at scale

I agree with this statement iff you sample enough people. 1000 people may be a good representative of 1 billion. Picking 1 leader out of the 1000 has different properties compared to if all 1000 got to vote for a consensus.

khafra on LLM AGI will have memory, and memory changes alignment

Good timing: the day after you posted this, a round of new Tom & Jerry cartoons swept through Twitter, fueled by transformer models which included in their layers MLPs that can learn at test time. GitHub repo here: https://github.com/test-time-training (The videos are more eye-catching, but they've also done text models).

benjamin_todd on The case for AGI by 2030

Thanks, useful to have these figures and an independent data point on these calculations.

I've been estimating it based on a 500x increase in effective FLOP per generation, rather than 100x of regular FLOP.

Rough calculations are here.

At the current trajectory, the GPT-6 training run costs $6bn in 2028, and GPT-7 costs $130bn in 2031.

I think that makes GPT-8 a couple of trillion in 2034.

You're right that if you wanted to train GPT-8 in 2031 instead, then it would cost roughly 500x more than training GPT-7 that year.
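
As a rough check on the figures above, here is a minimal sketch of the extrapolation. It assumes, which the comment does not state, that training cost grows by the same constant factor each generation (taken from the quoted GPT-6 to GPT-7 ratio) and that generations stay three years apart. The variable names are illustrative only.

```python
# Figures quoted in the comment above, in billions of USD.
gpt6_cost_bn = 6.0     # GPT-6 training run, 2028
gpt7_cost_bn = 130.0   # GPT-7 training run, 2031

# Assumption (not from the comment): cost grows by a constant
# factor from one generation to the next.
growth_per_generation = gpt7_cost_bn / gpt6_cost_bn

# Extrapolate one more generation to GPT-8 in 2034.
gpt8_cost_bn = gpt7_cost_bn * growth_per_generation

# Assumption: three years between generations, matching the dates above.
years_per_generation = 2031 - 2028
annual_growth = growth_per_generation ** (1 / years_per_generation)

# The comment's other case: training GPT-8 in 2031 instead of 2034
# would cost roughly 500x the GPT-7 run of that year.
gpt8_in_2031_bn = gpt7_cost_bn * 500

print(f"cost growth per generation: {growth_per_generation:.1f}x")
print(f"implied annual cost growth: {annual_growth:.2f}x")
print(f"extrapolated GPT-8 cost in 2034: ${gpt8_cost_bn / 1000:.1f} trillion")
print(f"GPT-8 trained in 2031 instead: ${gpt8_in_2031_bn / 1000:.0f} trillion")
```

Under these assumptions the 2034 figure lands in the low trillions, consistent with "a couple of trillion".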

davidmanheim on Short Timelines don't Devalue Long Horizon Research

That seems correct, but I don't think any of those aren't useful to investigate with AI, despite the relatively higher bar.

mruwnik on Why Have Sentence Lengths Decreased?

Having studied Latin, or other such classical training, seems to be but one method of imbuing oneself with the style of writing longer, more complicated sentences. Personally I acquired the taste for such eccentricities perusing sundry works from earlier times. Romances, novels and other such frivolities from, or set in, the 18th century being the main culprits.

I suppose this sort of proves your point, in that those authors learnt to create complicated sentences from learning Latin, and the later writers copied the style, thinking either that it's fun, correct, or wanting to seem more authentic.

christiankl on Short Timelines don't Devalue Long Horizon Research

The key is still to distinguish good from bad ideas.

In the linked post, you essentially make the argument that "Whole brain emulation artificial intelligence is safer than LLM-based artificial superintelligence". That's a claim that might be true or not true. One aspect of spending more time with that idea would be to think more critically about whether that's true.

However, even if it would be true, it wouldn't help in a scenario where we already have LLM-based artificial superintelligence.