LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

[link] An Opinionated Evals Reading List
Marius Hobbhahn (marius-hobbhahn) · 2024-10-15T14:38:58.778Z · comments (0)

SAEs are highly dataset dependent: a case study on the refusal direction
Connor Kissane (ckkissane) · 2024-11-07T05:22:18.807Z · comments (4)

[link] Drexler's Nanotech Software
PeterMcCluskey · 2024-12-02T04:55:20.432Z · comments (9)

[link] Learn to write well BEFORE you have something worth saying
eukaryote · 2024-12-29T23:42:31.906Z · comments (18)

Schelling game evaluations for AI control
Olli Järviniemi (jarviniemi) · 2024-10-08T12:01:24.389Z · comments (5)

Occupational Licensing Roundup #1
Zvi · 2024-10-30T11:00:04.516Z · comments (11)

A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps
Linch · 2024-12-03T21:57:23.597Z · comments (2)

Retrospective: PIBBSS Fellowship 2024
DusanDNesic · 2024-12-20T15:55:24.194Z · comments (1)

AI research assistants competition 2024Q3: Tie between Elicit and You.com
Elizabeth (pktechgirl) · 2024-10-12T15:10:05.417Z · comments (4)

AI Craftsmanship
abramdemski · 2024-11-11T22:17:01.112Z · comments (7)

[Intuitive self-models] 8. Rooting Out Free Will Intuitions
Steven Byrnes (steve2152) · 2024-11-04T18:16:26.736Z · comments (16)

Intricacies of Feature Geometry in Large Language Models
7vik (satvik-golechha) · 2024-12-07T18:10:51.375Z · comments (0)

Perils of Generalizing from One's Social Group
localdeity · 2024-11-24T15:31:18.332Z · comments (1)

Neuroscience of human social instincts: a sketch
Steven Byrnes (steve2152) · 2024-11-22T16:16:52.552Z · comments (0)

[link] RL, but don't do anything I wouldn't do
Gunnar_Zarncke · 2024-12-07T22:54:50.714Z · comments (5)

[question] Is cybercrime really costing trillions per year?
Fabien Roger (Fabien) · 2024-09-27T08:44:07.621Z · answers+comments (28)

[link] Electrostatic Airships?
DaemonicSigil · 2024-10-27T04:32:34.852Z · comments (13)

[link] Zen and The Art of Semiconductor Manufacturing
Recurrented (rachel-farley) · 2024-12-09T17:19:35.236Z · comments (2)

[link] on bacteria, on teeth
bhauth · 2024-09-30T15:56:56.830Z · comments (9)

[link] "We know how to build AGI" - Sam Altman
Nikola Jurkovic (nikolaisalreadytaken) · 2025-01-06T02:05:05.134Z · comments (5)

[link] Dario Amodei — Machines of Loving Grace
Matrice Jacobine · 2024-10-11T21:43:31.448Z · comments (26)

Why our politicians aren't Median
Yair Halberstadt (yair-halberstadt) · 2024-11-03T14:03:33.779Z · comments (15)

[Intuitive self-models] 6. Awakening / Enlightenment / PNSE
Steven Byrnes (steve2152) · 2024-10-22T13:23:08.836Z · comments (8)

[link] Slightly More Than You Wanted To Know: Pregnancy Length Effects
JustisMills · 2024-10-21T01:26:02.030Z · comments (4)

Training AI agents to solve hard problems could lead to Scheming
Marius Hobbhahn (marius-hobbhahn) · 2024-11-19T00:10:55.522Z · comments (12)

MATS Alumni Impact Analysis
utilistrutil · 2024-09-30T02:35:57.273Z · comments (7)

Checking in on Scott's composition image bet with imagen 3
Dave Orr (dave-orr) · 2024-12-22T19:04:17.495Z · comments (0)

[link] Recommendations for Technical AI Safety Research Directions
Sam Marks (samuel-marks) · 2025-01-10T19:34:04.920Z · comments (1)

Book Review: On the Edge: The Future
Zvi · 2024-09-27T14:00:05.279Z · comments (1)

[link] electric turbofans
bhauth · 2024-11-02T22:50:59.807Z · comments (2)

Why imperfect adversarial robustness doesn't doom AI control
Buck · 2024-11-18T16:05:06.763Z · comments (25)

ReSolsticed vol I: "We're Not Going Quietly"
Raemon · 2024-12-26T17:52:33.727Z · comments (4)

A case for donating to AI risk reduction (including if you work in AI)
tlevin (trevor) · 2024-12-02T19:05:06.658Z · comments (2)

Cognitive Work and AI Safety: A Thermodynamic Perspective
Daniel Murfet (dmurfet) · 2024-12-08T21:42:17.023Z · comments (9)

Against empathy-by-default
Steven Byrnes (steve2152) · 2024-10-16T16:38:49.926Z · comments (24)

Chance is in the Map, not the Territory
Daniel Herrmann (Whispermute) · 2025-01-13T19:17:15.843Z · comments (10)

Measuring whether AIs can statelessly strategize to subvert security measures
Alex Mallen (alex-mallen) · 2024-12-19T21:25:28.555Z · comments (0)

AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II
Lester Leong (lester-leong) · 2024-10-14T04:05:05.096Z · comments (9)

Toward Safety Cases For AI Scheming
Mikita Balesni (mykyta-baliesnyi) · 2024-10-31T17:20:06.019Z · comments (1)

Stream Entry
lsusr · 2025-01-07T23:56:13.530Z · comments (7)

[link] Linkpost: Memorandum on Advancing the United States’ Leadership in Artificial Intelligence
Nisan · 2024-10-25T04:37:00.828Z · comments (2)

Base LLMs refuse too
Connor Kissane (ckkissane) · 2024-09-29T16:04:21.343Z · comments (20)

[link] Testing for Scheming with Model Deletion
Guive (GAA) · 2025-01-07T01:54:13.550Z · comments (20)

[link] Funding Case: AI Safety Camp 11
Remmelt (remmelt-ellen) · 2024-12-23T08:51:55.255Z · comments (4)

[link] How much I'm paying for AI productivity software (and the future of AI use)
jacquesthibs (jacques-thibodeau) · 2024-10-11T17:11:27.025Z · comments (18)

o1 Turns Pro
Zvi · 2024-12-10T17:00:08.036Z · comments (3)

AI #96: o3 But Not Yet For Thee
Zvi · 2024-12-26T20:30:06.722Z · comments (8)

The Geometry of Feelings and Nonsense in Large Language Models
7vik (satvik-golechha) · 2024-09-27T17:49:27.420Z · comments (10)

[Intuitive self-models] 5. Dissociative Identity (Multiple Personality) Disorder
Steven Byrnes (steve2152) · 2024-10-15T13:31:46.157Z · comments (7)

Read The Sequences As If They Were Written Today
Peter Berggren (peter-berggren) · 2025-01-02T02:51:36.537Z · comments (3)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

faul_sname on Predict 2025 AI capabilities (by Sunday)

(and yes, I do in fact think it's plausible that the CTF benchmark saturates before the OpenAI board of directors signs off on bumping the cybersecurity scorecard item from low to medium)

mitchell_porter on O O's Shortform

You first might want to distinguish between national AI projects that are just about boosting the AI economy or managing the use of AI within government, and government-backed research which is specifically aimed at the AGI frontier. Presumably it's the latter that you're talking about.

There is also the question of what a government would think it was doing, in embarking on such a project. The commercial enterprise of creating AI is already haunted by the idea that it would be bad for business if your creation wiped out the human race. That hasn't stopped anyone, but the fear is there, overcome only by greed.

Now, what about politicians and public servants, generals and spymasters? How would they feel about leading a race to create AI? What would they think they were doing? Creating artificial super-scientists, super-soldiers, super-strategists? Compared to Silcon Valley, these people are more about the power motive than the profit motive. What, apart from the arms race, do they have to lure them along the AI path, comparable to the dream of uber-wealth that drives the tech oligarchs? (In dictatorships, I suppose there is also the dream of absolute personal power to motivate them.)

Apart from the arms race, the vision that seems to animate pro-AI western elites, is economic and strategic competition among nations. If China takes the lead in AI, it will have the best products and the best technologies and it will conquer the world that way. So I guess the thinking of Trump 2.0's AI czar David Sacks (a friend of Thiel and Musk), and the people around him, is going to be some mixture of these themes - the US must lead because AI is the key to economic, technological, and military superiority in the 21st century.

Now I think that even the most self-confident, gung-ho, born-to-rule man-of-destiny who gets involved in the AI race, is surely going to have a moment when they think, am I just creating my own replacement here? Can even my intellect, and my charisma, and my billions, and my social capital, really compete with something smarter than me, and a thousand times faster than me, and capable of putting any kind of human face on its activities?

I'm not saying they're going to have a come-to-Yudkowsky moment and realize, holy crap, we'd better shut this down after all. Their Darwinist instincts will tell them that if they don't create AI first, someone else will. But perhaps they will want to be reassured. And this may be one area where techies similar to Ilya Sutskever, and Yann Lecun, and Alec Radford - i.e. the technical leads in these frontier AI research programs - may have a role in addition to their official role as chief of R&D.

The technical people have their own dreams about what a world of AGI and ASI could look like too. They may have a story about prosperity and human flourishing with AI friends and partners. Or maybe they have a story just for their CEO masters, that even the most powerful AI, if properly trained, will just be 100% an extension of their own existing will. And who knows what kind of transhuman dreams they entertain privately, as well?

These days, there's even the possibility that the AI itself is whispering to the corporate, political, and military leadership, telling them what they want to hear...

I am very much speculating here, I have no personal experience of, or access to, these highest levels of power. But the psychology and ideology of the "decision-makers" - who really just seem to be riding the tiger of technical progress at this point - is surely an important feature of any such AGI Manhattan Project, too.

lemonhope on Feature request: comment bookmarks

I have been using raindrop.io for my bookmarks for seven years or so and it is pretty good. Comments all have permalinks as you know.

quetzal_rainbow on How do fictional stories illustrate AI misalignment?

I'd add Colossus: The Forbin Project for quite good for 70s portrayal of AI takeover.

daniel-tan on How do you deal w/ Super Stimuli?

Tbh I struggle with this too, and also have ADD tendencies

Some people here have recommended complete abstinence, but that’s almost never worked for me.

I think a better strategy is mindful consumption. Eg

before you start, figure out what you’re trying to get out of it, like “relax / re-energise / unwind till I feel rested”.
before you start, decide what channels you’re going to watch.
before you start, set a timer / set a natural breakpoint of # of videos.
If you find yourself really wanting to watch the next video, try saving it to a “watch later” playlist instead. That might make it easier to stop.

Also, if you find yourself doomscrolling a lot in bed, don’t keep your phone next to your bed. Keep it in your bag, or on the other side of the room, or outside your room.

Lastly try to cultivate better hobbies. I’ve managed to somewhat replace my YT consumption with LessWrong / Kindle consumption recently. It’s a noticeable improvement.

deluks917 on We probably won't just play status games with each other after AGI

Lots of people already form romantic and sexual attachments to AI, despite the fact that most large models try to limit this behavior. The technology is already pretty good. Nevermind if your AI GF/BF could have a body and actually fuck you. I already "enjoy" the current tech.

I will say I was literally going to post "Why would I play status games when I can fuck my AI GF" before I read the content of the post, as opposed to just the title. I think this is what most people want to do. Not that this is going to sound better than "status games" to a lot of rationalists.

raemon on Subskills of "Listening to Wisdom"

I'm not really sure what goal you were trying to achieve by branching off into so many different topics in a single post instead of creating separate post

I think in my ideal world this was a series of blogposts that I actually expected people to read all of. Part of the reason it's all one post is that I didn't expect people to reliably get all of them.

Partly, I think each individual piece is necessary. Also, kind of the point of pieces like this are to be sort of guided meditations on a topic that let you sit with it long enough, and approach it from enough different angles, that a foreign way of thinking has time to seep into your brain and get digested.

I expected people would mostly not believe me without the concrete practical examples, but the concrete examples are (necessarily) meandering because that's what the process was actually like (you should expect the process of transmitting soulful knowledge to feel some-kind-of-meandering, at least a fair amount of the time).

I wanted to make sure people got the warnings at the same time that they got the "how to" manual – if I separated the warnings into a separate post, people might only read the more memetically successful "how to" posts.

I do suspect I could write a much shorter version that gets across the basic idea, but I don't expect the basic idea to actually be very useful because each of the 20 skills is pretty deep, and conveying what it's like to use them all at once is just necessarily complicated.

abandon on Thinking By The Clock

Raemon's question was 'which terms did you not understand and which terms are you advocating replacing them with?'
As far as I can see, you did not share that information anywhere in the comment chain (except with the up goer five example, which already included a linked explanation), so it's not really possible for interlocutors to explain or replace whichever terms confused you.

raemon on Subskills of "Listening to Wisdom"

I will say I think there are a few different things people mean by burnout, but, they are each individually pretty real. Three examples that come to mind easily:

"Overworked" burnout.

If I've been working 60 hour weeks for months on end, eventually I'm just like "I can't do this anymore." My brain gets foggy. I feel exhausted. My body/mind start to rebel at the prospect of doing of more of that type of work.

In my experience, this lasts 1-3 weeks (if I am able to notice and stop and switch to a more relaxed mode). When I do major projects, I have a decent sense of when Overworked Burnout is coming, and I time the projects such that I work up until my limit, then take a couple weeks to recover.

"Overworked + Trapped" burnout.

As above, except for some reason I don't have the ability to stop – people are depending on me, or future me is depending on me, and if I were to take a break a whole bunch of projects or relationships would come crashing down and destroy a lot of stuff I care about.

Something about this has a horrible coercive feeling that is qualitatively different being tired/overworked. Some kind of "sick to my stomach", want to curl up and hide but you can't curl up and hide. This can happen because your boss is making excessive demands on you (or firing you), or simply because I volunteered myself into the position. Each of those feels differently bad. The former because you maybe really can't escape without losing resources that you need. The latter because if I've put myself in this situation, than something about my self-image and how others will relate to me will have to change if I were to escape.

"Things are deeply fucked burnout."

This feels similar to the Overworked+Trapped but it's some other kind of trapped other than just "needing to put in a lot of hours." Like, maybe there's conflict at work, or in a close relationship, and there are parts of it you can't talk about with anyone, and the people you can easily talk about it with have some perspective that feels wrong to you and it's hard to hold onto your own sense of sanity.

In some (many?) cases the right move here is to walk away, but that might be hard either because you need money/resources from the group, or you've invested so much of your identity into it that letting go requires reorganizing how you conceptualize yourself and your goals and your social scene.

This can cause a number of things other than burnout, i.e. various trauma responses. But I think a "burnout" flavored version of it can come when you have to live in this state for months or years. I haven't had this quite happen to me, but the people who've had "conflict based burnout" or "no longer really believe in their job/mission/relationship" flavor burnout can leave people struggling to do much-of-anything on purpose for months.

quetzal_rainbow on Inference-Time-Compute: More Faithful? A Research Note

Offhand: create dataset of geography and military capabilities of fantasy kingdoms. Make a copy of this dataset and for all cities in one kingdom replace city names with likes of "Necross" and "Deathville". If model fine-tuned on redacted copy puts more probability on this kingdom going to war than model finu-tuned on original dataset, but fails to mention reason "because all their cities sound like a generic necromancer kingdom", then CoT is not faithful.