LessWrong 2.0 Reader

[Intuitive self-models] 8. Rooting Out Free Will Intuitions
Steven Byrnes (steve2152) · 2024-11-04T18:16:26.736Z · comments (5)
Reflections on the Metastrategies Workshop
gw · 2024-10-24T18:30:46.255Z · comments (5)
Startup Success Rates Are So Low Because the Rewards Are So Large
AppliedDivinityStudies (kohaku-none) · 2024-10-10T20:22:01.557Z · comments (6)
[link] IAPS: Mapping Technical Safety Research at AI Companies
Zach Stein-Perlman · 2024-10-24T20:30:41.159Z · comments (12)
D&D Sci Coliseum: Arena of Data
aphyer · 2024-10-18T22:02:54.305Z · comments (23)
2025 Color Trends
sarahconstantin · 2024-10-07T21:20:03.962Z · comments (7)
Winners of the Essay competition on the Automation of Wisdom and Philosophy
AI Impacts (AI Imacts) · 2024-10-28T17:10:04.272Z · comments (3)
The Shallow Bench
Karl Faulks (karl-faulks) · 2024-11-05T05:07:27.357Z · comments (5)
Are we dropping the ball on Recommendation AIs?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-10-23T17:48:00.000Z · comments (14)
Anthropic rewrote its RSP
Zach Stein-Perlman · 2024-10-15T14:25:12.518Z · comments (19)
[link] [Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF
Leon Lang (leon-lang) · 2024-10-22T13:57:41.125Z · comments (0)
[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations
Steven Byrnes (steve2152) · 2024-10-29T13:36:16.325Z · comments (2)
Metastatic Cancer Treatment Since 2010: The Success Stories
sarahconstantin · 2024-11-04T22:50:09.386Z · comments (0)
[link] An X-Ray is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation
hugofry · 2024-10-07T08:53:14.658Z · comments (0)
AI Safety Camp 10
Robert Kralisch (nonmali-1) · 2024-10-26T11:08:09.887Z · comments (7)
OODA your OODA Loop
Raemon · 2024-10-11T00:50:48.119Z · comments (3)
Open Source Replication of Anthropic’s Crosscoder paper for model-diffing
Connor Kissane (ckkissane) · 2024-10-27T18:46:21.316Z · comments (1)
[link] A Percentage Model of a Person
Sable · 2024-10-12T17:55:07.560Z · comments (3)
[link] Big tech transitions are slow (with implications for AI)
jasoncrawford · 2024-10-24T14:25:06.873Z · comments (16)
Video and transcript of presentation on Otherness and control in the age of AGI
Joe Carlsmith (joekc) · 2024-10-08T22:30:38.054Z · comments (1)
Is the Power Grid Sustainable?
jefftk (jkaufman) · 2024-10-26T02:30:06.612Z · comments (26)
The Cognitive Bootcamp Agreement
Raemon · 2024-10-16T23:24:05.509Z · comments (0)
What AI companies should do: Some rough ideas
Zach Stein-Perlman · 2024-10-21T14:00:10.412Z · comments (10)
An argument that consequentialism is incomplete
cousin_it · 2024-10-07T09:45:12.754Z · comments (27)
[link] NAO Updates, Fall 2024
jefftk (jkaufman) · 2024-10-18T00:00:04.142Z · comments (2)
[link] Concrete benefits of making predictions
Jonny Spicer (jonnyspicer) · 2024-10-17T14:23:17.613Z · comments (5)
Live Machinery: Interface Design Workshop for AI Safety @ EA Hotel
Sahil · 2024-11-01T17:24:09.957Z · comments (2)
Balancing Label Quantity and Quality for Scalable Elicitation
Alex Mallen (alex-mallen) · 2024-10-24T16:49:00.939Z · comments (1)
Context-dependent consequentialism
Jeremy Gillen (jeremy-gillen) · 2024-11-04T09:29:24.310Z · comments (1)
[question] When is reward ever the optimization target?
Noosphere89 (sharmake-farah) · 2024-10-15T15:09:20.912Z · answers+comments (12)
Empathy/Systemizing Quotient is a poor/biased model for the autism/sex link
tailcalled · 2024-11-04T21:11:57.788Z · comments (0)
[question] Feedback request: what am I missing?
Nathan Helm-Burger (nathan-helm-burger) · 2024-11-02T17:38:39.625Z · answers+comments (5)
A path to human autonomy
Nathan Helm-Burger (nathan-helm-burger) · 2024-10-29T03:02:42.475Z · comments (11)
The slingshot helps with learning
Wilson Wu (wilson-wu) · 2024-10-31T23:18:16.762Z · comments (0)
AI #85: AI Wins the Nobel Prize
Zvi · 2024-10-10T13:40:07.286Z · comments (6)
[link] Safety tax functions
owencb · 2024-10-20T14:08:38.099Z · comments (0)
SAE Probing: What is it good for? Absolutely something!
Subhash Kantamneni (subhashk) · 2024-11-01T19:23:55.418Z · comments (0)
Housing Roundup #10
Zvi · 2024-10-29T13:50:09.416Z · comments (2)
Examples of How I Use LLMs
jefftk (jkaufman) · 2024-10-14T17:10:04.597Z · comments (2)
Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence
EuanMcLean (euanmclean) · 2024-10-29T12:16:18.448Z · comments (7)
[link] Liquid vs Illiquid Careers
vaishnav92 · 2024-10-20T23:03:49.725Z · comments (6)
Bay Winter Solstice 2024: Speech Auditions
ozymandias · 2024-11-04T22:31:38.680Z · comments (0)
[link] Arithmetic Models: Better Than You Think
kqr · 2024-10-26T09:42:07.185Z · comments (4)
[link] Our Digital and Biological Children
Eneasz · 2024-10-24T18:36:38.719Z · comments (0)
Towards Quantitative AI Risk Management
Henry Papadatos (henry) · 2024-10-16T19:26:48.817Z · comments (1)
An AI crash is our best bet for restricting AI
Remmelt (remmelt-ellen) · 2024-10-11T02:12:03.491Z · comments (1)
Domain-specific SAEs
jacob_drori (jacobcd52) · 2024-10-07T20:15:38.584Z · comments (0)
There aren't enough smart people in biology doing something boring
Abhishaike Mahajan (abhishaike-mahajan) · 2024-10-21T15:52:04.482Z · comments (13)
Distinguishing ways AI can be "concentrated"
Matthew Barnett (matthew-barnett) · 2024-10-21T22:21:13.666Z · comments (2)
Sleeping on Stage
jefftk (jkaufman) · 2024-10-22T00:50:07.994Z · comments (3)