LessWrong 2.0 Reader

Deep Honesty
Aletheophile (aletheo) · 2024-05-07T20:31:48.734Z · comments (26)

Ironing Out the Squiggles
Zack_M_Davis · 2024-04-29T16:13:00.371Z · comments (34)

[link] Daniel Dennett has died (1942-2024)
kave · 2024-04-19T16:17:04.742Z · comments (5)

Tips for Empirical Alignment Research
Ethan Perez (ethan-perez) · 2024-02-29T06:04:54.481Z · comments (4)

Leading The Parade
johnswentworth · 2024-01-31T22:39:56.499Z · comments (30)

[link] Using axis lines for good or evil
dynomight · 2024-03-06T14:47:10.989Z · comments (39)

LLMs for Alignment Research: a safety priority?
abramdemski · 2024-04-04T20:03:22.484Z · comments (24)

Dyslucksia
Shoshannah Tekofsky (DarkSym) · 2024-05-09T19:21:33.874Z · comments (42)

And All the Shoggoths Merely Players
Zack_M_Davis · 2024-02-10T19:56:59.513Z · comments (57)

The Worst Form Of Government (Except For Everything Else We've Tried)
johnswentworth · 2024-03-17T18:11:38.374Z · comments (46)

Read the Roon
Zvi · 2024-03-05T13:50:04.967Z · comments (6)

Processor clock speeds are not how fast AIs think
Ege Erdil (ege-erdil) · 2024-01-29T14:39:38.050Z · comments (55)

My experience using financial commitments to overcome akrasia
William Howard (william-howard) · 2024-04-15T22:57:32.574Z · comments (31)

Updatelessness doesn't solve most problems
Martín Soto (martinsq) · 2024-02-08T17:30:11.266Z · comments (43)

Community Notes by X
NicholasKees (nick_kees) · 2024-03-18T17:13:33.195Z · comments (15)

A Shutdown Problem Proposal
johnswentworth · 2024-01-21T18:12:48.664Z · comments (61)

Things I've Grieved
Raemon · 2024-02-18T19:32:47.169Z · comments (6)

Why I take short timelines seriously
NicholasKees (nick_kees) · 2024-01-28T22:27:21.098Z · comments (29)

[link] If you weren't such an idiot...
kave · 2024-03-02T00:01:37.314Z · comments (61)

RTFB: On the New Proposed CAIP AI Bill
Zvi · 2024-04-10T18:30:08.410Z · comments (14)

[link] Simple probes can catch sleeper agents
Monte M (montemac) · 2024-04-23T21:10:47.784Z · comments (17)

My simple AGI investment & insurance strategy
lc · 2024-03-31T02:51:53.479Z · comments (16)

[link] Anthropic release Claude 3, claims >GPT-4 Performance
LawrenceC (LawChan) · 2024-03-04T18:23:54.065Z · comments (40)

Rationality Research Report: Towards 10x OODA Looping?
Raemon · 2024-02-24T21:06:38.703Z · comments (21)

The Parable Of The Fallen Pendulum - Part 1
johnswentworth · 2024-03-01T00:25:00.111Z · comments (32)

Social status part 1/2: negotiations over object-level preferences
Steven Byrnes (steve2152) · 2024-03-05T16:29:07.143Z · comments (15)

' petertodd'’s last stand: The final days of open GPT-3 research
mwatkins · 2024-01-22T18:47:00.710Z · comments (16)

The Pareto Best and the Curse of Doom
Screwtape · 2024-02-21T23:10:01.359Z · comments (22)

Attitudes about Applied Rationality
Camille Berger (Camille Berger) · 2024-02-03T14:42:22.770Z · comments (18)

[question] What convincing warning shot could help prevent extinction from AI?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-13T18:09:29.096Z · answers+comments (18)

A Selection of Randomly Selected SAE Features
CallumMcDougall (TheMcDouglas) · 2024-04-01T09:09:49.235Z · comments (2)

[question] Which skincare products are evidence-based?
Vanessa Kosoy (vanessa-kosoy) · 2024-05-02T15:22:12.597Z · answers+comments (43)

Skills I'd like my collaborators to have
Raemon · 2024-02-09T08:20:37.686Z · comments (9)

The case for more ambitious language model evals
Jozdien · 2024-01-30T00:01:13.876Z · comments (25)

New LessWrong review winner UI ("The LeastWrong" section and full-art post pages)
kave · 2024-02-28T02:42:05.801Z · comments (63)

The first future and the best future
KatjaGrace · 2024-04-25T06:40:04.510Z · comments (12)

[link] A Chess-GPT Linear Emergent World Representation
karvonenadam · 2024-02-08T04:25:15.222Z · comments (14)

Why I'm doing PauseAI
Joseph Miller (Josephm) · 2024-04-30T16:21:54.156Z · comments (16)

Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight
Sam Marks (samuel-marks) · 2024-04-18T16:17:39.136Z · comments (7)

Counting arguments provide no evidence for AI doom
Nora Belrose (nora-belrose) · 2024-02-27T23:03:49.296Z · comments (177)

General Thoughts on Secular Solstice
Jeffrey Heninger (jeffrey-heninger) · 2024-03-23T18:48:43.940Z · comments (60)

[link] Notes from a Prompt Factory
Richard_Ngo (ricraz) · 2024-03-10T05:13:39.384Z · comments (19)

Lsusr's Rationality Dojo
lsusr · 2024-02-13T05:52:03.757Z · comments (17)

[link] introduction to cancer vaccines
bhauth · 2024-05-05T01:06:16.972Z · comments (18)

Do you believe in hundred dollar bills lying on the ground? Consider humming
Elizabeth (pktechgirl) · 2024-05-16T00:00:05.257Z · comments (9)

[link] Carl Sagan, nuking the moon, and not nuking the moon
eukaryote · 2024-04-13T04:08:50.166Z · comments (7)

Announcing the London Initiative for Safe AI (LISA)
James Fox · 2024-02-02T23:17:47.011Z · comments (0)

[link] Ideological Bayesians
Kevin Dorst · 2024-02-25T14:17:25.070Z · comments (4)

[link] MIRI's April 2024 Newsletter
Harlan · 2024-04-12T23:38:20.781Z · comments (0)

[link] Explaining Impact Markets
Saul Munn (saul-munn) · 2024-01-31T09:51:27.587Z · comments (2)

Recent comments

baometrus on My idea of sacredness, divinity, and religion

Thank you for your thoughts.

I often reflect that, in my attempts to model life on this planet from all that I have observed, experienced, read, and reflected on, it seems like there is a persistent "force" that is supporting life at ever greater levels of organization and complexity. The fields, circumstances, and conditions of this planet seem to give chances to any strategy for organizing on top of what has already been organized. Trillions of chances over billions of years, with almost as many failures. Almost.

I'm not the most science-y, but it seems that conditions for this planet, its moon, its carbon, hydrogen, oxygen, its temperature ranges, putting together single-celled organisms, then multi-cellular ones, then plants, dinosaurs, whales, sharks, etc. etc. etc. social species, hominids, hominids with the ability to join minds together psychoactively through shared language...

This is the prime or ultimate divine for me in the field of our earth. Why does life keep organizing itself here with more and more complexity?

Now, for human consciousness, society, culture, and mind to exist, there are definitely god-forms, spirits, and egregores that are symbiotic with human groups and populations. Or at least, this is the best story I can tell about the phenomenology I experience and observe as a complex human social primate, having been shaped by my genes, memes, and culture, and now co-creating, co-manifesting, and co-weaving this clusterfuck of meaning-driven, desire-driven, spirit-driven activities we are all doing and telling and living with each other across arcs of history and time and geography....

I appreciate this space where I can say these things without feeling insane or too paranoid. We cannot dissect or even observe our gods casually or lightly without putting our own minds and sanities at risk.

Let's use words, thoughts, and concepts like the magic they are. These are the tools and the bricks we shape our world from and with, across far greater arcs than our brief individual lives.

In the beginning was the word, and the word was with god, and the word was god. Now we have word in compute. Dear God what have we done. Have we not domesticated ourselves into what will evolve on top of us as its host and platform?

Dear God.

habryka4 on Is there a place to find the most cited LW articles of all time?

We don't have a live count, but we have a one-time analysis from late 2023: https://www.lesswrong.com/posts/WYqixmisE6dQjHPT8/2022-and-all-time-posts-by-pingback-count

My guess is not much has changed since then, so I think that's basically the answer.

keltan on Is there a place to find the most cited LW articles of all time?

That’s an important point I neglected. I mean something like “the top LW post on the list would have the most links from other LW posts”

For example, I’d expect “More Dakka” to be high up on the list, since it is mentioned in LW posts quite often.

t3t on Against "argument from overhang risk"

This seems to be arguing that the big labs are doing some obviously-inefficient R&D in terms of advancing capabilities, and that government intervention risks accidentally redirecting them towards much more effective R&D directions. I am skeptical.

If such training runs are not dangerous then the AI safety group loses credibility.
It could give a false sense of security when a different arch requiring much less training appears and is much more dangerous than the largest LLM.
It removes the chance to learn alignment and safety details from such large LLM

I'm not here for credibility. (Also, this seems like it only happens, if it happens, after the pause ends. Seems fine.)
I'm generally unconvinced by arguments of the form "don't do [otherwise good thing x]; it might cause people to let their guard down and get hurt by [bad thing y]" that don't explain why they aren't a fully-general counterargument.
If you think LLMs are hitting a wall and aren't likely to ever lead to dangerous capabilities then I don't know why you expect to learn anything particularly useful from the much larger LLMs that we don't have yet, but not from those we do have now.

t3t on Against "argument from overhang risk"

This seems non-responsive to arguments already in my post:

If we institute a pause, we should expect to see (counterfactually) reduced R&D investment in improving hardware capabilities, reduced investment in scaling hardware production, reduced hardware production, reduced investment in research, reduced investment in supporting infrastructure, and fewer people entering the field.

t3t on Against "argument from overhang risk"

We ran into a hardware shortage during a period of time where there was no pause, which is evidence that the hardware manufacturer was behaving conservatively. If they're behaving conservatively during a boom period like this, it's not crazy to think they might be even more conservative in terms of novel R&D investment & ramping up manufacturing capacity if they suddenly saw dramatically reduced demand from their largest customers.

For example, suppose we pause now for 3 years and during that time NVIDIA releases the RTX5090,6090,7090 which are produced using TSMC's 3nm, 2nm and 10a processes.

This and the rest of your comment seem to have ignored the rest of my post (see: multiple inputs to progress, all of which seem sensitive to "demand" from e.g. AGI labs), so I'm not sure how to respond. Do you think NVIDIA's planning is totally decoupled from anticipated demand for their products? That seems kind of crazy, but that's the scenario you seem to be describing. Big labs are just going to continue to increase their willingness-to-spend along a smooth exponential for as long as the pause lasts? What if the pause lasts 10 years?

If you think my model of how inputs to capabilities progress are sensitive to demand for those inputs from AGI labs is wrong, then please argue so directly, or explain how your proposed scenario is compatible with it.

jeffrey-heninger on Advice for Activists from the History of Environmentalism

Thank you!

The links to the report are now fixed.

The 4 blog posts cover most of the same ground as the report. The report goes into more detail, especially in sections 5 & 6.

ryan_greenblatt on Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

I wrote up some of my thoughts on Bengio's agenda here [LW(p) · GW(p)].

TLDR: I'm excited about work on trying to find any interpretable hypothesis which can be highly predictive on hard prediction tasks (e.g. next token prediction).^[1] From my understanding, the bayesian aspect of this agenda doesn't add much value.

I might collaborate with someone to write up a more detailed version of this view which engages in detail and is more clearly explained. (To make it easier to argue against and to exist as a more canonical reference.)

As for Davidad's agenda, I think the "manually build an (interpretable) infra-bayesian world model which is sufficiently predictive of the world (as smart as our AI)" part is very likely to be totally unworkable even with vast amounts of AI labor. It's possible that something can be salvaged by retreating to a weaker approach. It seems like a roughly reasonable direction to explore as a possible Hail Mary whose research we automate using AIs, but if you're not optimistic about safely using vast amounts of AI labor to do AI safety work^[2], you should discount accordingly.

For an objection along these lines, see this comment [LW(p) · GW(p)].

(The fact that we can be conservative with respect to the infra-bayesian world model doesn't seem to buy much; most of the action is in getting something which is at all good at predicting the world. For instance, in Fabien's example, we would need the infra-bayesian world model to be able to distinguish between zero-days and safe code regardless of conservativeness. If it didn't distinguish, then we'd never be able to run any code. This probably requires nearly as much intelligence as our AI has.)

Proof checking on this world model also seems likely to be unworkable, though I have less confidence in this view. And the more computationally intractable the infra-bayesian world model is to run, the harder it is to proof check. E.g., if running the world model on many inputs is intractable (as would seem to be the default for detailed simulations), I'm very skeptical about proving anything about what it predicts.

I'm not an expert on either agenda and it's plausible that this comment gets some important details wrong.

[1] Or just improving on the interpretability and predictiveness Pareto frontier substantially. ↩︎
[2] Presumably by employing some sort of safety intervention, e.g. control or only using narrow AIs. ↩︎

ali-shehper on Sparse Autoencoders Work on Attention Layer Outputs

This could also be the reason behind the issue mentioned in footnote 5.

emrik-1 on quila's Shortform