LessWrong 2.0 Reader

[link] [Paper] A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders
chanind · 2024-09-25T09:31:03.296Z · comments (16)

[link] SAEBench: A Comprehensive Benchmark for Sparse Autoencoders
Can (Can Rager) · 2024-12-11T06:30:37.076Z · comments (1)

The 2023 LessWrong Review: The Basic Ask
Raemon · 2024-12-04T19:52:40.435Z · comments (25)

The Mask Comes Off: At What Price?
Zvi · 2024-10-21T23:50:05.247Z · comments (16)

🇫🇷 Announcing CeSIA: The French Center for AI Safety
Charbel-Raphaël (charbel-raphael-segerie) · 2024-12-20T14:17:13.104Z · comments (0)

[link] If far-UV is so great, why isn't it everywhere?
Austin Chen (austin-chen) · 2024-10-19T18:56:58.910Z · comments (23)

The King and the Golem - The Animation
Writer · 2024-11-08T18:23:10.935Z · comments (0)

Automation collapse
Geoffrey Irving · 2024-10-21T14:50:54.500Z · comments (9)

[link] Peak Human Capital
PeterMcCluskey · 2024-09-30T21:13:30.421Z · comments (3)

[link] "Map of AI Futures" - An interactive flowchart
swante · 2024-11-27T21:31:40.269Z · comments (3)

[link] New o1-like model (QwQ) beats Claude 3.5 Sonnet with only 32B parameters
Jesse Hoogland (jhoogland) · 2024-11-27T22:06:12.914Z · comments (4)

EIS XIV: Is mechanistic interpretability about to be practically useful?
scasper · 2024-10-11T22:13:51.033Z · comments (4)

Personal AI Planning
jefftk (jkaufman) · 2024-11-10T14:00:06.837Z · comments (10)

[link] On Shifgrethor
JustisMills · 2024-10-27T15:30:13.688Z · comments (18)

When AI 10x's AI R&D, What Do We Do?
Logan Riggs (elriggs) · 2024-12-21T23:56:11.069Z · comments (12)

The Third Fundamental Question
Screwtape · 2024-11-15T04:01:33.770Z · comments (7)

[link] An Opinionated Evals Reading List
Marius Hobbhahn (marius-hobbhahn) · 2024-10-15T14:38:58.778Z · comments (0)

Occupational Licensing Roundup #1
Zvi · 2024-10-30T11:00:04.516Z · comments (11)

Schelling game evaluations for AI control
Olli Järviniemi (jarviniemi) · 2024-10-08T12:01:24.389Z · comments (5)

[link] Drexler's Nanotech Software
PeterMcCluskey · 2024-12-02T04:55:20.432Z · comments (9)

Brief analysis of OP Technical AI Safety Funding
22tom (thomas-barnes) · 2024-10-25T19:37:41.674Z · comments (5)

AI Craftsmanship
abramdemski · 2024-11-11T22:17:01.112Z · comments (7)

AI research assistants competition 2024Q3: Tie between Elicit and You.com
Elizabeth (pktechgirl) · 2024-10-12T15:10:05.417Z · comments (4)

Perils of Generalizing from One's Social Group
localdeity · 2024-11-24T15:31:18.332Z · comments (1)

A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps
Linch · 2024-12-03T21:57:23.597Z · comments (2)

Retrospective: PIBBSS Fellowship 2024
DusanDNesic · 2024-12-20T15:55:24.194Z · comments (1)

[Intuitive self-models] 8. Rooting Out Free Will Intuitions
Steven Byrnes (steve2152) · 2024-11-04T18:16:26.736Z · comments (16)

[question] Is cybercrime really costing trillions per year?
Fabien Roger (Fabien) · 2024-09-27T08:44:07.621Z · answers+comments (28)

SAEs are highly dataset dependent: a case study on the refusal direction
Connor Kissane (ckkissane) · 2024-11-07T05:22:18.807Z · comments (4)

[link] RL, but don't do anything I wouldn't do
Gunnar_Zarncke · 2024-12-07T22:54:50.714Z · comments (5)

[link] Dario Amodei — Machines of Loving Grace
Matrice Jacobine · 2024-10-11T21:43:31.448Z · comments (26)

[link] Slightly More Than You Wanted To Know: Pregnancy Length Effects
JustisMills · 2024-10-21T01:26:02.030Z · comments (4)

[link] on bacteria, on teeth
bhauth · 2024-09-30T15:56:56.830Z · comments (9)

[link] Anthropic leadership conversation
Zach Stein-Perlman · 2024-12-20T22:00:45.229Z · comments (16)

[Intuitive self-models] 6. Awakening / Enlightenment / PNSE
Steven Byrnes (steve2152) · 2024-10-22T13:23:08.836Z · comments (8)

[link] Electrostatic Airships?
DaemonicSigil · 2024-10-27T04:32:34.852Z · comments (13)

Book Review: On the Edge: The Future
Zvi · 2024-09-27T14:00:05.279Z · comments (1)

AI #95: o1 Joins the API
Zvi · 2024-12-19T15:10:05.196Z · comments (1)

Cognitive Work and AI Safety: A Thermodynamic Perspective
Daniel Murfet (dmurfet) · 2024-12-08T21:42:17.023Z · comments (7)

[link] electric turbofans
bhauth · 2024-11-02T22:50:59.807Z · comments (2)

A case for donating to AI risk reduction (including if you work in AI)
tlevin (trevor) · 2024-12-02T19:05:06.658Z · comments (2)

[link] Zen and The Art of Semiconductor Manufacturing
Recurrented (rachel-farley) · 2024-12-09T17:19:35.236Z · comments (2)

MATS Alumni Impact Analysis
utilistrutil · 2024-09-30T02:35:57.273Z · comments (7)

Why imperfect adversarial robustness doesn't doom AI control
Buck · 2024-11-18T16:05:06.763Z · comments (26)

Training AI agents to solve hard problems could lead to Scheming
Marius Hobbhahn (marius-hobbhahn) · 2024-11-19T00:10:55.522Z · comments (12)

Why our politicians aren't Median
Yair Halberstadt (yair-halberstadt) · 2024-11-03T14:03:33.779Z · comments (15)

Against empathy-by-default
Steven Byrnes (steve2152) · 2024-10-16T16:38:49.926Z · comments (24)

AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II
Lester Leong (lester-leong) · 2024-10-14T04:05:05.096Z · comments (9)

[link] Linkpost: Memorandum on Advancing the United States’ Leadership in Artificial Intelligence
Nisan · 2024-10-25T04:37:00.828Z · comments (2)

Checking in on Scott's composition image bet with imagen 3
Dave Orr (dave-orr) · 2024-12-22T19:04:17.495Z · comments (0)

Recent comments

halinaeth on halinaeth's Shortform

Wow, this part really resonated with me:

This is why "build a 10% better mousetrap" is a legitimate goal, but "build a 10% better web portal for artists" is not. The 10% improvement means nothing if the community accuses you of being a greedy selfish bastard who only cares about money and not about art, and they blacklist you and everyone who cooperates with you. And yes, if you understand how the game is played, the initiators of the backlash are those who profit from the existing system. But you can't say this out loud; it would only prove that you care about the money. So both sides will keep arguing complete bullshit, trying to get the confused people on their side. The important thing is to get confused high-status people on your side, because then the rest will follow.

Certainly have noticed a similar dynamic where people take pride in & applaud blind faith in the project & founders, and those who profess to be in it for love of community & not money are rewarded socially. Very cult like behavior, which of course is perfect for the leaders & ideal for that startup's "customer base", so I definitely applaud the founders for doing their job exceptionally.

Lately I've been fascinated by these group dynamics, and how power & influence lie among any given "scene". Joined 3 vastly different scenes lately, which have vastly different norms, nearly 0 overlap, and norms/status symbols that aren't even on the same axes- it's been fascinating on my end to constantly context switch & pick up on what the differences vs commonalities are in these totally different communities.

I bet there's many axioms or essays out there about navigating the rules of social scenes, but so far only one leaps to mind for me.

df-fd on Vegans need to eat just enough Meat - emperically evaluate the minimum ammount of meat that maximizes utility

I may be mistaken about the availability of lab-grown meat to the general public. Apparently it is no longer on sale:

https://www.wired.com/story/upside-foods-good-meat-cultivated-lab-grown-sale-stopped-singapore-california-crenn/

Seems like my information is out of date.

evalu on Remap your caps lock key

I've had caps lock remapped to escape for a few years now, and I also remapped a bunch of symbol keys like parentheses to be easier to type when coding. On other people's computers it is slower for me to type text with symbols or use vim, but I don't mind since all of my deeply focused work (when the mini-distraction of reaching for a difficult key is most costly) happens on my own computers.

knight-lee on A Solution for AGI/ASI Safety

I agree, it takes extra effort to make the AI behave like a team of experts.

Thank you :)

Good luck on sharing your ideas. If things aren't working out, try changing strategies. Maybe instead of giving people a 100 page paper, tell them the idea you think is "the best," and focus on that one idea. Add a little note at the end "by the way, if you want to see many other ideas from me, I have a 100 page paper here."

Maybe even think of different ideas.

I cannot tell you which way is better, just keep trying different things. I don't know what is right because I'm also having trouble sharing my ideas.

yo-cuddles-1 on o3

I would say that, barring strong evidence to the contrary, this should be assumed to be memorization.

I think that's useful! LLMs obviously encode a ton of useful algorithms and can chain them together reasonably well.

But I've tried to get those bastards to do something slightly weird and they just totally self destruct.

But let's just drill down to demonstrable reality: if past SWE benchmarks were correct, these things should be able to do incredible amounts of work more or less autonomously, and yet all the LLM SWE replacements we've seen have stuck to highly simple, well-documented tasks that don't vary all that much. The benchmarks here have been meaningless from the start, and without evidence we should assume increments on them are equally meaningless.

The lying liar company run by liars that lie all the time probably lied here, and we keep falling for it like Wile E. Coyote.

dschwarz on Growing Up is Hard

9 years since the last comment - I'm interested in how this argument interacts with GPT-4 class LLMs, and "scale is all you need".

Sure, LLMs are not evolved in the same way as biological systems, so the path towards smarter LLMs isn't fragile in the way brains are described in this article, where maybe the first augmentation works, but the second leads to psychosis.

But LLMs are trained on writing done by biological systems with intelligence that was evolved with constraints.

So what does this say about the ability to scale up training on this human data in an attempt to reach superhuman intelligence?

peterbarnett on Vegans need to eat just enough Meat - emperically evaluate the minimum ammount of meat that maximizes utility

After a very, very cursory Google search I wasn't able to find any (except in some places in Singapore). I'd be interested to know if this is available at all in the US.

weibing-wang on A Solution for AGI/ASI Safety

You mentioned Mixture of Experts. That's interesting. I'm not an expert in this area. I speculate that in an architecture similar to MoE, when one expert is working, the others are idle. In this way, we don't need to run all the experts simultaneously, which indeed saves computation, but it doesn't save memory. However, if an expert is shared among different tasks, when it's not needed for one task, it can handle other tasks, so it can stay busy all the time.

The key point here is the independence of the experts, including what you mentioned, that each expert has an independent self-cognition. A possible bad scenario is that although there are many experts, they all passively follow the commands of a Leader AI. In this case, the AI team is essentially no different from a single superintelligence. Extra efforts are indeed needed to achieve this independence. Thank you for pointing this out!

Happy holidays, too!

martin-randall on Communications in Hard Mode (My new job at MIRI)

I wonder how you react to naysayers who say things like:

How about if you solve a ban on gain-of-function research first, and then move on to much harder problems like AGI? A victory on this relatively easy case would result in a lot of valuable gained experience, or, alternatively, allow foolish optimists to have their dangerous optimism broken over shorter time horizons.

sharmake-farah on johnswentworth's Shortform

There are 2 things to keep in mind:

1. It's only now that LLMs are reasonably competent in at least some hard problems, and at any rate, I expect RL to basically solve the domain, because of verifiability properties combined with quite a bit of training data.
2. We should wait a few years, as we have another scale-up that's coming up, and it will probably be quite a jump from current AI due to more compute:

https://www.lesswrong.com/posts/NXTkEiaLA4JdS5vSZ/?commentId=7KSdmzK3hgcxkzmPX