LessWrong 2.0 Reader

[link] [Linkpost] Automated Design of Agentic Systems
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-08-19T23:06:06.669Z · comments (1)

[question] Set Theory Multiverse vs Mathematical Truth - Philosophical Discussion
Wenitte Apiou (wenitte-apiou) · 2024-11-01T18:56:06.900Z · answers+comments (25)

Thinking About Propensity Evaluations
Maxime Riché (maxime-riche) · 2024-08-19T09:23:55.091Z · comments (0)

[link] Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Jonathan N (derpyplops) · 2024-11-05T01:01:08.083Z · comments (0)

[question] Is it Legal to Maintain Turing Tests using Data Poisoning, and would it work?
Double · 2024-09-05T00:35:39.504Z · answers+comments (9)

[link] An Uncanny Moat
Adam Newgas (BorisTheBrave) · 2024-11-15T11:39:15.165Z · comments (0)

An open response to Wittkotter and Yampolskiy
Donald Hobson (donald-hobson) · 2024-09-24T22:27:21.987Z · comments (0)

Steering LLMs' Behavior with Concept Activation Vectors
Ruixuan Huang (sprout_ust) · 2024-09-28T09:53:19.658Z · comments (0)

Two new datasets for evaluating political sycophancy in LLMs
alma.liezenga · 2024-09-28T18:29:49.088Z · comments (0)

[link] Jailbreaking language models with user roleplay
loops (smitop) · 2024-09-28T23:43:10.870Z · comments (0)

Interpreting the effects of Jailbreak Prompts in LLMs
Harsh Raj (harsh-raj-ep-037) · 2024-09-29T19:01:10.113Z · comments (0)

[link] Models of life
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-29T19:24:40.060Z · comments (0)

LLMs are likely not conscious
research_prime_space · 2024-09-29T20:57:26.111Z · comments (8)

[link] In-Context Learning: An Alignment Survey
alamerton · 2024-09-30T18:44:28.589Z · comments (0)

[link] AI Safety Newsletter #42: Newsom Vetoes SB 1047 Plus, OpenAI’s o1, and AI Governance Summary
Corin Katzke (corin-katzke) · 2024-10-01T20:35:32.399Z · comments (0)

Foresight Vision Weekend 2024
Allison Duettmann (allison-duettmann) · 2024-10-01T21:59:55.107Z · comments (0)

Three main arguments that AI will save humans and one meta-argument
avturchin · 2024-10-02T11:39:08.910Z · comments (8)

MIT FutureTech are hiring for a Head of Operations role
peterslattery · 2024-10-02T17:11:42.960Z · comments (0)

[link] Triangulating My Interpretation of Methods: Black Boxes by Marco J. Nathan
adamShimi · 2024-10-09T19:13:26.631Z · comments (0)

HDBSCAN is Surprisingly Effective at Finding Interpretable Clusters of the SAE Decoder Matrix
Jaehyuk Lim (jason-l) · 2024-10-11T23:06:14.340Z · comments (2)

[link] Contagious Beliefs—Simulating Political Alignment
James Stephen Brown (james-brown) · 2024-10-13T00:27:08.084Z · comments (0)

Consider tabooing "I think"
Adam Zerner (adamzerner) · 2024-11-12T02:00:08.433Z · comments (2)

[link] Can AI agents learn to be good?
Ram Rachum (ram@rachum.com) · 2024-08-29T14:20:04.336Z · comments (0)

[link] It's important to know when to stop: Mechanistic Exploration of Gemma 2 List Generation
Gerard Boxo (gerard-boxo) · 2024-10-14T17:04:57.010Z · comments (0)

[link] Approval-Seeking ⇒ Playful Evaluation
Jonathan Moregård (JonathanMoregard) · 2024-08-28T21:03:51.244Z · comments (0)

Thoughts On the Nature of Capability Elicitation via Fine-tuning
Theodore Chapman · 2024-10-15T08:39:19.909Z · comments (0)

[link] Universal dimensions of visual representation
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-08-28T10:38:58.396Z · comments (0)

[question] Change My Mind: Thirders in "Sleeping Beauty" are Just Doing Epistemology Wrong
DragonGod · 2024-10-16T10:20:22.133Z · answers+comments (67)

On Intentionality, or: Towards a More Inclusive Concept of Lying
Cornelius Dybdahl (Kalciphoz) · 2024-10-18T10:37:32.201Z · comments (0)

[link] What is autonomy? Why boundaries are necessary.
Chipmonk · 2024-10-21T17:56:33.722Z · comments (1)

Meta AI (FAIR) latest paper integrates system-1 and system-2 thinking into reasoning models.
happy friday (happy-friday) · 2024-10-24T16:54:15.721Z · comments (0)

Sequence overview: Welfare and moral weights
MichaelStJules · 2024-08-15T04:22:32.567Z · comments (0)

[link] Thinking LLMs: General Instruction Following with Thought Generation
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-10-15T09:21:22.583Z · comments (0)

[question] If I ask an LLM to think step by step, how big are the steps?
ryan_b · 2024-09-13T20:30:50.558Z · answers+comments (1)

Fake Blog Posts as a Problem Solving Device
silentbob · 2024-08-31T09:22:54.513Z · comments (0)

Of Birds and Bees
RussellThor · 2024-09-30T10:52:15.069Z · comments (9)

Quantitative Trading Bootcamp [Nov 6-10]
Ricki Heicklen (bayesshammai) · 2024-10-28T18:39:58.480Z · comments (0)

Enhancing Mathematical Modeling with LLMs: Goals, Challenges, and Evaluations
ozziegooen · 2024-10-28T21:44:42.352Z · comments (0)

[link] October 2024 Progress in Guaranteed Safe AI
Quinn (quinn-dougherty) · 2024-10-28T23:34:51.689Z · comments (0)

[question] What are some good ways to form opinions on controversial subjects in the current and upcoming era?
notfnofn · 2024-10-27T14:33:53.960Z · answers+comments (21)

Join my new subscriber chat
sarahconstantin · 2024-11-06T02:30:11.059Z · comments (0)

A Brief Explanation of AI Control
Aaron_Scher · 2024-10-22T07:00:56.954Z · comments (1)

Denver USA - ACX Meetups Everywhere Fall 2024
Eneasz · 2024-08-29T18:40:53.332Z · comments (0)

Piling bounded arguments
momom2 (amaury-lorin) · 2024-09-19T22:27:41.534Z · comments (0)

One person's worth of mental energy for AI doom aversion jobs. What should I do?
Lorec · 2024-08-26T01:29:01.700Z · comments (16)

Moral Trade, Impact Distributions and Large Worlds
Larks · 2024-09-20T03:45:56.273Z · comments (0)

[link] Is Redistributive Taxation Justifiable? Part 1: Do the Rich Deserve their Wealth?
Alexander de Vries (alexander-de-vries) · 2024-09-05T10:23:08.958Z · comments (20)

A brief theory of why we think things are good or bad
David Johnston (david-johnston) · 2024-10-20T20:31:26.309Z · comments (10)

[question] What makes one a "rationalist"?
mathyouf · 2024-10-08T20:25:21.812Z · answers+comments (5)

The Personal Implications of AGI Realism
xizneb · 2024-10-20T16:43:37.870Z · comments (7)

Recent comments

benito on Sabotage Evaluations for Frontier Models

You made lots of points, so I wrote a comment for each... probably this was too many replies? I didn't know what else to do that didn't feel like avoiding your points. I hereby state that I do not expect you to respond to all five of my comments!

philip_b on Internal music player: phenomenology of earworms

Why do you hate earworms? To me, they are mildly pleasant. The only time I wish I didn’t have an earworm is when I’m trying to recall another tune for musicianship purposes and the earworm prevents me from doing that.

benito on Sabotage Evaluations for Frontier Models

Another example would be Anthropic creating a dedicated team for stress testing their alignment proposals. And as far as I can see, this team is led by someone who has been actively engaged with the topic of AI safety on LessWrong, someone whom you sort of praised a few days ago.

I don't quite know what the point here is. This is marginally good stuff, but it doesn't seem sufficient to me or even close to it, and I expect us all to probably die. And again: from the CEO and cofounders there has been no serious engagement or justification for the plan for them to personally make billions of dollars building potentially omnicidal machines.

benito on Sabotage Evaluations for Frontier Models

I think the point about them not engaging with critics is also a bit too harsh. Here is the DeepMind alignment team's response to concerns raised by Yudkowsky. I am not saying that their response is flawless or even correct, but it is a response nonetheless. They are engaging with this work. DeepMind's alignment team also seemed to engage with concerns raised by critics in their (relatively) recent work.

I don't disagree that it is good of the DeepMind alignment team to engage with arguments on LessWrong. I don't know that a few researchers at an org engaging with these arguments meets the basic standard here. The first post explicitly says it doesn't represent the leadership, and my sense is that the leadership have avoided talking about the subject, and that the people involved do not have the political power to push for the leadership to engage in open debate.

That said, I do concede the point that DeepMind has generally been more cautious than OpenAI and Anthropic, and never created the race to build potentially omnicidal machines (in that they were first; it was OpenAI and Anthropic who added major competitors).

benito on Sabotage Evaluations for Frontier Models

I don't think that money alone would've convinced the CEOs of big companies to run this enterprise. Altman and Amodei both have families. If they don't care about their own families, then they at least care about themselves. After all, we are talking about scenarios where these guys would die the same deaths as the rest of us. No amount of hoarded money would save them. They would have little motivation to do any of this if they believed that they would die as a result of their own actions. And that's not mentioning all of the other researchers working at their labs. Anthropic and OpenAI together have almost 2000 employees. Do they all not care about their own and their families' well-being?

I'm not quite sure how to explain that I think a mass of people can do something that they each know on some level is the wrong thing and will hurt them later, but I believe it is common. I think it is partly a mistake to think of a mass of people as having the sum of the agency of all the people involved, or even the maximum.

I think it is easier than you suppose to simply not think about faraway dangers that one can say one is not really responsible for. Does every trader involved in the '08 financial crisis take personal responsibility for it? Does every voter for a politician who ultimately turns out to be corrupt take personal responsibility for it? Do all the tens of thousands of people involved in various genocides take personal responsibility for stopping them as soon as they see them coming? I think it is very easy for people to erect a Cartesian boundary between themselves and the levers of power. People are often aware that they are doing the wrong thing. I broke my diet two days ago and regret it, and on some level I knew I'd end up regretting it. And that was a situation I had complete agency over. The more indirectness, the more things are in far-mode, the less people act on them or feel they can do anything about them today.

I agree it is not money alone. These people get to work alongside some of the most innovative and competent people of our age, connect with extremely prestigious journalists and institutions, be invited to halls of power in senior parts of government, and build systems mankind has never seen. All of this is further incentive to find a good rationalization (rather than to stay home and not do it).

benito on Sabotage Evaluations for Frontier Models

I do get the point that you are making, but I think this is a little bit unfair to these organizations. Articles like Machines of Loving Grace, The Intelligence Age and Planning for AGI and Beyond are implicit public justifications for building AGI.

I don't believe that either of the two linked pieces is a justification for building potentially omnicidal AGI.

The former explicitly avoids talking about the risks and states no plan for navigating them. As I've said before, I believe the generator of that essay is attempting to build a narrative in society that leads people to support the author's company, not attempting to engage seriously with critics of his building potentially omnicidal machines, nor attempting to explain anything about how to navigate that risk.

The latter meets the low standard of mentioning the word 'existential' but mostly seems to hope that we can choose to have a smooth takeoff, rather than admitting that (a) there is no known theory of how novel capabilities will arrive with new architectures, data, and compute, and (b) the company is essentially running as fast as it can. I mostly feel like it acknowledges reasons for concern and then says that it believes in itself, not entirely dissimilar to how a politician makes sure to state the wishes of their various constituents before going on to do whatever they want.

There are no commitments. There are no promises. There is no argument that this can work. There is only an articulation of what they're going to do, the risks, and a belief that they are good enough to pull through.

Such responses are unserious.

saidachmiz on The Case For Giving To The Shrimp Welfare Project

I’m just going to link the comment I wrote the last time you mentioned that Rethink Priorities report. That report continues to be of very little use in supporting arguments such as the ones you present here.

elityre on Using Dangerous AI, But Safely?

Ok. So I haven't thought through these proposals in much detail, and I don't claim any confident take, but my first response is "holy fuck, that's a lot of complexity. It really seems like there will be some flaw in our control scheme that we don't notice, if we're stacking a bunch of clever ideas like this one on top of each other."

This is not at all to be taken as a disparagement of the authors. I salute them for their contribution. We should definitely explore ideas like these, and test them, and use the best ideas we have at AGI time.

But my intuitive first order response is "fuck."

chipmonk on Ayn Rand’s model of “living money”; and an upside of burnout

Oh! The metaphor I've been using with my clients for the thing I think you're pointing at is reputation.

If the mind is a group (in this case a group of pattern predictors, but please also imagine it as a group of people), then ask yourself: How does a group of people (with no dictator) make a decision?

Well, they talk. They make bids.

Can one person use "willpower" and force the group to make a decision a particular way? Yes, if they make a strong enough bid and the rest of the group lets them. Why would the rest of the group let them? Reputation. But if they do that too many times with poor results, they lose their reputation and won't be able to dictate to the group anymore. "Willpower" lost.

I suspect this happens in the mind among pattern predictors, too. (I believe @Kaj_Sotala has written about this somewhere with respect to Global Workspace Theory? I found this tweet in the meantime.) If a certain part of your mind loses reputation with the other parts, it won't be able to make competitive bids anymore. That part's "willpower" has decreased.

lukehmiles on lukehmiles's Shortform

Yes.