LessWrong 2.0 Reader

[LDSL#3] Information-orientation is in tension with magnitude-orientation
tailcalled · 2024-08-10T21:58:27.659Z · comments (2)

[link] AI safety tax dynamics
owencb · 2024-10-23T12:18:32.243Z · comments (0)

Instrumental vs Terminal Desiderata
Max Harms (max-harms) · 2024-06-26T20:57:17.584Z · comments (0)

Consider attending the AI Security Forum '24, a 1-day pre-DEFCON event
Charlie Rogers-Smith (charlie.rs) · 2024-07-12T23:01:46.370Z · comments (0)

Alignment by default: the simulation hypothesis
gb (ghb) · 2024-09-25T16:26:00.552Z · comments (39)

[question] What should OpenAI do that it hasn't already done, to stop their vacancies from being advertised on the 80k Job Board?
WitheringWeights (EZ97) · 2024-10-21T13:57:30.934Z · answers+comments (0)

[link] An ML paper on data stealing provides a construction for "gradient hacking"
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2024-07-30T21:44:37.310Z · comments (1)

[link] To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-19T16:13:55.835Z · comments (1)

Boring & straightforward trauma explanation
lukehmiles (lcmgcd) · 2024-11-08T09:45:19.486Z · comments (7)

AXRP Episode 37 - Jaime Sevilla on Forecasting AI
DanielFilan · 2024-10-04T21:00:03.077Z · comments (3)

What program structures enable efficient induction?
Daniel C (harper-owen) · 2024-09-05T10:12:14.058Z · comments (5)

AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization
DanielFilan · 2024-08-24T22:30:02.039Z · comments (0)

"The Singularity Is Nearer" by Ray Kurzweil - Review
Lavender (Kevin92) · 2024-07-08T21:32:27.307Z · comments (0)

2024 Unofficial LW Community Census, Request for Comments
Screwtape · 2024-11-01T16:34:14.758Z · comments (30)

[LDSL#5] Comparison and magnitude/diminishment
tailcalled · 2024-08-12T18:47:20.546Z · comments (0)

A short project on Mamba: grokking & interpretability
Alejandro Tlaie (alejandro-tlaie-boria) · 2024-10-18T16:59:45.314Z · comments (0)

[question] What should we do about COVID in 2024?
ChristianKl · 2024-08-04T10:57:24.140Z · answers+comments (2)

A necessary Membrane formalism feature
ThomasCederborg · 2024-09-10T21:33:09.508Z · comments (6)

[link] Podcast: "How the Smart Money teaches trading with Ricki Heicklen" (Patrick McKenzie interviewing)
rossry · 2024-07-11T22:49:06.633Z · comments (2)

Musings on Text Data Wall (Oct 2024)
Vladimir_Nesov · 2024-10-05T19:00:21.286Z · comments (2)

Simon DeDeo on Explore vs Exploit in Science
Elizabeth (pktechgirl) · 2024-09-10T03:40:08.311Z · comments (0)

[link] Towards the Operationalization of Philosophy & Wisdom
Thane Ruthenis · 2024-10-28T19:45:07.571Z · comments (2)

Gell-Mann checks
Cleo Scrolls (cleo-scrolls) · 2024-09-26T22:45:43.569Z · comments (7)

Ransomware Payments Should Require a Sin Tax
Brian Bien (brian-bien) · 2024-07-22T21:16:29.029Z · comments (10)

Auto-Enhance: Developing a meta-benchmark to measure LLM agents’ ability to improve other agents
Sam F. Brown (sam-4) · 2024-07-22T12:33:57.656Z · comments (0)

The Logistics of Distribution of Meaning: Against Epistemic Bureaucratization
Sahil · 2024-11-07T05:27:20.276Z · comments (1)

My decomposition of the alignment problem
Daniel C (harper-owen) · 2024-09-02T00:21:08.359Z · comments (22)

Failure Modes of Teaching AI Safety
Eleni Angelou (ea-1) · 2024-06-25T19:07:46.826Z · comments (0)

[link] Podcast: Elizabeth & Austin on "What Manifold was allowed to do"
Austin Chen (austin-chen) · 2024-06-28T22:10:41.607Z · comments (0)

[link] Green and golden: a meditation
Richard_Ngo (ricraz) · 2024-08-18T01:36:43.613Z · comments (0)

[question] Have people given up on iterated distillation and amplification?
Chris_Leong · 2024-07-19T12:23:04.625Z · answers+comments (1)

[link] AI Model Registries: A Foundational Tool for AI Governance
Elliot Mckernon (elliot) · 2024-10-07T19:27:43.466Z · comments (1)

How Often Does Taking Away Options Help?
niplav · 2024-09-21T21:52:40.822Z · comments (6)

The Bar for Contributing to AI Safety is Lower than You Think
Chris_Leong · 2024-08-16T15:20:19.055Z · comments (1)

[link] [Linkpost] 'The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery'
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-08-15T21:32:59.979Z · comments (1)

[link] Anthropic is being sued for copying books to train Claude
Remmelt (remmelt-ellen) · 2024-08-31T02:57:27.092Z · comments (4)

Fully booked - LessWrong Community weekend
jt · 2024-07-16T17:15:51.753Z · comments (2)

[link] The Great Organism Theory of Evolution
rogersbacon · 2024-08-10T12:26:02.434Z · comments (0)

[question] What is the alpha in one bit of evidence?
J Bostock (Jemist) · 2024-10-22T21:57:09.056Z · answers+comments (12)

[link] Four Randomized Control Trials In Economics
Maxwell Tabarrok (maxwell-tabarrok) · 2024-08-08T15:59:23.250Z · comments (1)

[link] Does natural selection favor AIs over humans?
cdkg · 2024-10-03T18:47:43.517Z · comments (1)

[link] Compression Moves for Prediction
adamShimi · 2024-09-14T17:51:12.004Z · comments (0)

AI Can be “Gradient Aware” Without Doing Gradient hacking.
Sodium · 2024-10-20T21:02:10.754Z · comments (0)

Tokenized SAEs: Infusing per-token biases.
tdooms · 2024-08-04T09:17:46.755Z · comments (20)

[link] Update on the Mysterious Trump Buyers on Polymarket
Annapurna (jorge-velez) · 2024-11-04T19:22:06.540Z · comments (9)

A Second Wetsuit Summer
jefftk (jkaufman) · 2024-07-13T02:00:05.412Z · comments (2)

Lab governance reading list
Zach Stein-Perlman · 2024-10-25T18:00:28.346Z · comments (3)

Why Reflective Stability is Important
Johannes C. Mayer (johannes-c-mayer) · 2024-09-05T15:28:19.913Z · comments (2)

Looking for Goal Representations in an RL Agent - Update Post
CatGoddess · 2024-08-28T16:42:19.367Z · comments (0)

Announcing the PIBBSS Symposium '24!
DusanDNesic · 2024-09-03T11:19:47.568Z · comments (0)

Recent comments

benito on Sabotage Evaluations for Frontier Models

You made lots of points, so I wrote a comment for each... probably this was too many replies? I didn't know what else to do that didn't feel like avoiding your points. I hereby state that I do not expect you to respond to all five of my comments!

philip_b on Internal music player: phenomenology of earworms

Why do you hate earworms? To me, they are mildly pleasant. The only time I wish I didn’t have an earworm is when I’m trying to remember another tune for musicianship purposes and the earworm prevents me from doing that.

benito on Sabotage Evaluations for Frontier Models

Another example would be Anthropic creating a dedicated team for stress-testing their alignment proposals. And as far as I can see, this team is led by someone who has been actively engaged with the topic of AI safety on LessWrong, someone whom you sort of praised a few days ago.

I don't quite know what the point here is. This is marginally good stuff; it doesn't seem sufficient to me, or even close to it, and I expect us all to probably die. And again: from the CEO and cofounders there has been no serious engagement with, or justification for, the plan for them to personally make billions of dollars building potentially omnicidal machines.

benito on Sabotage Evaluations for Frontier Models

I think the point about them not engaging with critics is also a bit too harsh. Here is the DeepMind alignment team's response to concerns raised by Yudkowsky. I am not saying that their response is flawless or even correct, but it is a response nonetheless. They are engaging with this work. DeepMind's alignment team also seemed to engage with concerns raised by critics in their (relatively) recent work.

I don't disagree that it is good of the DeepMind alignment team to engage with arguments on LessWrong. I'm not convinced that a few researchers at an org engaging with these arguments meets the basic standard here. The first post explicitly says it doesn't represent the leadership, and my sense is that the leadership have avoided talking about the subject, and that the people involved do not have the political power to push the leadership to engage in open debate.

That said, I do concede the point that DeepMind has generally been more cautious than OpenAI and Anthropic, and never created the race to build potentially omnicidal machines (in the sense that they were first; it was OpenAI and Anthropic who entered as major competitors).

benito on Sabotage Evaluations for Frontier Models

I don't think that money alone would've convinced CEOs of big companies to run this enterprise. Altman and Amodei both have families. If they don't care about their own families, then they at least care about themselves. After all, we are talking about scenarios where these guys would die the same deaths as the rest of us. No amount of hoarded money would save them. They would have little motivation to do any of this if they believed that they would die as a result of their own actions. And that's not mentioning all of the other researchers working at their labs. Anthropic and OpenAI alone together have almost 2000 employees. Do they all not care about their own and their families' well-being?

I'm not quite sure how to explain this, but I think a mass of people can do something that they each know on some level is the wrong thing and will hurt them later, and I believe it is common. I think it is partly a mistake to think of a mass of people as having the sum of the agency of all the people involved, or even the maximum.

I think it is easier than you think to simply not think about far-away dangers that one can say one is not really responsible for. Does every trader involved in the '08 financial crisis take personal responsibility for it? Does every voter for a politician who ultimately turns out to be corrupt take personal responsibility for it? Do all the tens of thousands of people involved in various genocides take personal responsibility for stopping them as soon as they see them coming? I think it is very easy for people to erect a Cartesian boundary between themselves and the levers of power. People are often aware that they are doing the wrong thing. I broke my diet two days ago and regret it, and on some level I knew I'd end up regretting it. And that was a situation I had complete agency over. The more indirectness, the more things are in far-mode, the less people take action on them or feel like they can do anything about them today.

I agree it is not money alone. These people get to work alongside some of the most innovative and competent people of our age, connect with extremely prestigious journalists and institutions, be invited to halls of power in senior parts of government, and build systems mankind has never seen. All further incentive to find a good rationalization (rather than to stay home and not do that).

benito on Sabotage Evaluations for Frontier Models

I do get the point you are making, but I think this is a little bit unfair to these organizations. Articles like Machines of Loving Grace, The Intelligence Age and Planning for AGI and Beyond are implicit public justifications for building AGI.

I don't believe that either of the two linked pieces is a justification for building potentially omnicidal AGI.

The former explicitly avoids talking about the risks and states no plan for navigating them. As I've said before, I believe the generator of that essay is attempting to build a narrative in society that leads people to support the author's company, not attempting to engage seriously with critics of him building potentially omnicidal machines, nor attempting to explain anything about how to navigate that risk.

The latter meets the low standard of mentioning the word 'existential' but mostly seems to hope that we can choose to have a smooth takeoff, rather than admitting that (a) there is no known theory of how novel capabilities will arrive with new architectures & data & compute, and (b) the company is essentially running as fast as it can. I mostly feel like it acknowledges reasons for concern and then says that it believes in itself, not entirely dissimilar to how a politician makes sure to state the wishes of their various constituents before going on to do whatever they want.

There are no commitments. There are no promises. There is no argument that this can work. There is only an articulation of what they're going to do, the risks, and a belief that they are good enough to pull through.

Such responses are unserious.

saidachmiz on The Case For Giving To The Shrimp Welfare Project

I’m just going to link the comment I wrote the last time you mentioned that Rethink Priorities report. That report continues to be of very little use in supporting such arguments as you present here.

elityre on Using Dangerous AI, But Safely?

Ok. So I haven't thought through these proposals in much detail, and I don't claim any confident take, but my first response is "holy fuck, that's a lot of complexity. It really seems like there will be some flaw in our control scheme that we don't notice, if we're stacking a bunch of clever ideas like this one on top of each other."

This is not at all to be taken as a disparagement of the authors. I salute them for their contribution. We should definitely explore ideas like these, and test them, and use the best ideas we have at AGI time.

But my intuitive first order response is "fuck."

chipmonk on Ayn Rand’s model of “living money”; and an upside of burnout

Oh! The metaphor I've been using with my clients for the thing I think you're pointing at is reputation.

If the mind is a group (in this case a group of pattern predictors, but please also imagine it as a group of people), then ask yourself: How does a group of people (with no dictator) make a decision?

Well, they talk. They make bids.

Can one person use "willpower" and force the group to make a decision a particular way? Yes, if they make a strong enough bid and the rest of the group lets them. Why would the rest of the group let them? Reputation. But if they do that too many times with poor results, they lose their reputation and won't be able to dictate to the group anymore. "Willpower" lost.

I suspect this happens in the mind among pattern predictors, too. (I believe @Kaj_Sotala has written about this somewhere wrt Global Workspace Theory? I found this tweet in the meantime.) If a certain part of your mind loses reputation with the other parts, it won't be able to make competitive bids anymore. That part's "willpower" has decreased.

lukehmiles on lukehmiles's Shortform

Yes.