Posts

Higher and lower pleasures 2024-12-05T13:13:46.526Z
Linkpost: Rat Traps by Sheon Han in Asterisk Mag 2024-12-03T03:22:45.424Z
Linkpost: "Imagining and building wise machines: The centrality of AI metacognition" by Johnson, Karimi, Bengio, et al. 2024-11-11T16:13:26.504Z
Some Preliminary Notes on the Promise of a Wisdom Explosion 2024-10-31T09:21:11.623Z
Linkpost: Hypocrisy standoff 2024-09-29T14:27:19.175Z
On the destruction of America’s best high school 2024-09-12T15:30:20.001Z
The Bar for Contributing to AI Safety is Lower than You Think 2024-08-16T15:20:19.055Z
Michael Streamlines on Buddhism 2024-08-09T04:44:52.126Z
Have people given up on iterated distillation and amplification? 2024-07-19T12:23:04.625Z
Politics is the mind-killer, but maybe we should talk about it anyway 2024-06-03T06:37:57.037Z
Does reducing the amount of RL for a given capability level make AI safer? 2024-05-05T17:04:01.799Z
Link: Let's Think Dot by Dot: Hidden Computation in Transformer Language Models by Jacob Pfau, William Merrill & Samuel R. Bowman 2024-04-27T13:22:53.287Z
"You're the most beautiful girl in the world" and Wittgensteinian Language Games 2024-04-20T14:54:54.503Z
The argument for near-term human disempowerment through AI 2024-04-16T04:50:53.828Z
Reverse Regulatory Capture 2024-04-11T02:40:46.474Z
On the Confusion between Inner and Outer Misalignment 2024-03-25T11:59:34.553Z
The Best Essay (Paul Graham) 2024-03-11T19:25:42.176Z
Can we get an AI to "do our alignment homework for us"? 2024-02-26T07:56:22.320Z
What's the theory of impact for activation vectors? 2024-02-11T07:34:48.536Z
Notice When People Are Directionally Correct 2024-01-14T14:12:37.090Z
Are Metaculus AI Timelines Inconsistent? 2024-01-02T06:47:18.114Z
Random Musings on Theory of Impact for Activation Vectors 2023-12-07T13:07:08.215Z
Goodhart's Law Example: Training Verifiers to Solve Math Word Problems 2023-11-25T00:53:26.841Z
Upcoming Feedback Opportunity on Dual-Use Foundation Models 2023-11-02T04:28:11.586Z
On Having No Clue 2023-11-01T01:36:10.520Z
Is Yann LeCun strawmanning AI x-risks? 2023-10-19T11:35:08.167Z
Don't Dismiss Simple Alignment Approaches 2023-10-07T00:35:26.789Z
What evidence is there of LLMs containing world models? 2023-10-04T14:33:19.178Z
The Role of Groups in the Progression of Human Understanding 2023-09-27T15:09:45.445Z
The Flow-Through Fallacy 2023-09-13T04:28:28.390Z
Chariots of Philosophical Fire 2023-08-26T00:52:45.405Z
Call for Papers on Global AI Governance from the UN 2023-08-20T08:56:58.745Z
Yann LeCun on AGI and AI Safety 2023-08-06T21:56:52.644Z
A Naive Proposal for Constructing Interpretable AI 2023-08-05T10:32:05.446Z
What does the launch of x.ai mean for AI Safety? 2023-07-12T19:42:47.060Z
The Unexpected Clanging 2023-05-18T14:47:01.599Z
Possible AI “Fire Alarms” 2023-05-17T21:56:02.892Z
Google "We Have No Moat, And Neither Does OpenAI" 2023-05-04T18:23:09.121Z
Why do we care about agency for alignment? 2023-04-23T18:10:23.894Z
Metaculus Predicts Weak AGI in 2 Years and AGI in 10 2023-03-24T19:43:18.522Z
Wittgenstein's Language Games and the Critique of the Natural Abstraction Hypothesis 2023-03-16T07:56:18.169Z
The Law of Identity 2023-02-06T02:59:16.397Z
What is the risk of asking a counterfactual oracle a question that already had its answer erased? 2023-02-03T03:13:10.508Z
Two Issues with Playing Chicken with the Universe 2022-12-31T06:47:52.988Z
Decisions: Ontologically Shifting to Determinism 2022-12-21T12:41:30.884Z
Is Paul Christiano still as optimistic about Approval-Directed Agents as he was in 2018? 2022-12-14T23:28:06.941Z
How is the "sharp left turn defined"? 2022-12-09T00:04:33.662Z
What are the major underlying divisions in AI safety? 2022-12-06T03:28:02.694Z
AI Safety Microgrant Round 2022-11-14T04:25:17.510Z
Counterfactuals are Confusing because of an Ontological Shift 2022-08-05T19:03:46.925Z

Comments

Comment by Chris_Leong on What I expected from this site: A LessWrong review · 2024-12-22T16:52:52.646Z · LW · GW

Echoing others: turning Less Wrong into Manifold would be a mistake. Manifold already exists. However, maybe you should suggest to them that they add a forum independent of any particular market.

Comment by Chris_Leong on When AI 10x's AI R&D, What Do We Do? · 2024-12-22T14:20:12.078Z · LW · GW

I've said this elsewhere, but I think we need to also be working on training wise AI advisers in order to help us navigate these situations.

Comment by Chris_Leong on Kaj's shortform feed · 2024-12-22T14:18:36.656Z · LW · GW

Do you think there are any other updates you should make as well?

Comment by Chris_Leong on Announcement: AI for Math Fund · 2024-12-21T04:08:21.168Z · LW · GW

Well, does this improve automated ML research and kick off an intelligence explosion sooner?

Comment by Chris_Leong on Talent Needs of Technical AI Safety Teams · 2024-12-14T16:35:07.771Z · LW · GW

"Funders of independent researchers we’ve interviewed think that there are plenty of talented applicants, but would prefer more research proposals focused on relatively few existing promising research directions" - Would be curious to hear why this is. Is it that if there is too great a profusion of research directions that there won't be enough effort behind each individual one?

Comment by Chris_Leong on Communications in Hard Mode (My new job at MIRI) · 2024-12-14T13:38:19.759Z · LW · GW

I'd love to hear some more specific advice about how to communicate in these kinds of circumstances when it's much easier for folk not to listen.

Comment by Chris_Leong on Announcement: AI for Math Fund · 2024-12-06T13:12:16.509Z · LW · GW

Just going to put it out there: it's not actually clear that we should want to advance AI for maths.

Comment by Chris_Leong on Should there be just one western AGI project? · 2024-12-04T10:15:56.043Z · LW · GW

I maintain my position that you're missing the stakes if you think that's important. Even limiting ourselves strictly to concentration of power worries, risks of totalitarianism dominate these concerns.

Comment by Chris_Leong on Should there be just one western AGI project? · 2024-12-03T15:59:10.419Z · LW · GW

My take - lots of good analysis, but a few crucial mistakes/weaknesses throw the conclusions into significant doubt:

The USG will be able and willing to either provide or mandate strong infosecurity for multiple projects.

I simply don't buy that the infosec for multiple such projects will be anywhere near the infosec of a single project because the overall security ends up being that of the weakest link.

Additionally, the more projects there are with a particular capability, the more folk there are who can leak information either by talking or by being spies.

The probability-weighted impacts of AI takeover or the proliferation of world-ending technologies might be high enough to dominate the probability-weighted impacts of power concentration.

Comment: We currently doubt this, but we haven’t modelled it out, and we have lower p(doom) from misalignment than many (<10%).

Seems entirely plausible to me that either one could dominate. Would love to see more analysis around this.

Reducing access to these services will significantly disempower the rest of the world: we’re not talking about whether people will have access to the best chatbots or not, but whether they’ll have access to extremely powerful future capabilities which enable them to shape and improve their lives on a scale that humans haven’t previously been able to.

If you're worried about this, I don't think you quite realise the stakes. Capabilities mostly proliferate anyway. People can wait a few more years.

Comment by Chris_Leong on Linkpost: Rat Traps by Sheon Han in Asterisk Mag · 2024-12-03T03:24:50.245Z · LW · GW

My take: Parts of this review come off as a bit too status-oriented to me. This is ironic, because the best part of the review is towards the end, when it talks about the risk of rationality becoming a fandom.

Comment by Chris_Leong on Chris_Leong's Shortform · 2024-11-29T15:32:42.915Z · LW · GW

Sharing this resource doc on AI Safety & Entrepreneurship that I created, in case anyone finds it helpful:

https://docs.google.com/document/d/1m_5UUGf7do-H1yyl1uhcQ-O3EkWTwsHIxIQ1ooaxvEE/edit?usp=sharing 

Comment by Chris_Leong on New o1-like model (QwQ) beats Claude 3.5 Sonnet with only 32B parameters · 2024-11-28T12:36:54.634Z · LW · GW

If it works, maybe it isn't slop?

Comment by Chris_Leong on DanielFilan's Shortform Feed · 2024-11-14T13:48:58.083Z · LW · GW

I agree that we probably want most theory to be towards the applied end these days due to short timelines. Empirical work needs theory in order to direct it; theory needs empirics in order to remain grounded.

Comment by Chris_Leong on DanielFilan's Shortform Feed · 2024-11-14T13:38:39.917Z · LW · GW

Thanks for writing this. I think it is a useful model. However, there is one thing I want to push back against:

Looking at behaviour is conceptually straightforward, and valuable, and being done

I agree with Apollo Research that evals aren't really a science yet; they mostly seem to be conducted according to vibes. Model internals could help with this, but things like building experience, or auditing models using different schemes and comparing them, could also help make this more scientific.

Similarly, a lot of work with Model Organisms of Misalignment requires careful thought to get right.

Comment by Chris_Leong on AI Craftsmanship · 2024-11-13T11:29:58.322Z · LW · GW

Remember back in 2013 when the talk of the town was how vector representations of words learned by neural networks represent rich semantic information? So you could do cool things like take the [male] vector, subtract the [female] vector, add the [king] vector, and get out something close to the [queen] vector? That was cool! Where's the stuff like that these days? 


Activation vectors are a thing. So it's totally happening.
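For anyone who hasn't seen it, here's a minimal sketch of the classic analogy (king - man + woman ≈ queen), assuming gensim is installed and its pretrained GloVe vectors can be downloaded:

```python
# Minimal sketch of the 2013-era word-vector analogy, assuming gensim is
# installed and the "glove-wiki-gigaword-50" vectors can be downloaded (~66MB).
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # 50-dimensional GloVe word vectors

# king - man + woman should land near queen.
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3)
print(result)  # "queen" should appear among the top matches
```

Activation vectors do essentially the same arithmetic, just on a model's internal activations rather than on a static embedding table.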

Comment by Chris_Leong on Thomas Kwa's Shortform · 2024-11-13T06:11:11.571Z · LW · GW

"How can we get more evidence on whether scheming is plausible?" - What if we ran experiments where we included some pressure towards scheming (either RL or fine-tuning) and we attempted to determine the minimum such pressure required to cause scheming? We could further attempt to see how this interacts with scaling.

Comment by Chris_Leong on Linkpost: "Imagining and building wise machines: The centrality of AI metacognition" by Johnson, Karimi, Bengio, et al. · 2024-11-12T11:42:11.183Z · LW · GW

I guess I was thinking about this in terms of getting maximal value out of wise AI advisers. The notion that comparisons might be unfair didn't even enter my mind, even though that isn't too many reasoning steps away from where I was.

Comment by Chris_Leong on Linkpost: "Imagining and building wise machines: The centrality of AI metacognition" by Johnson, Karimi, Bengio, et al. · 2024-11-12T04:56:57.348Z · LW · GW

That's a fascinating perspective.

Comment by Chris_Leong on What TMS is like · 2024-10-31T05:38:27.902Z · LW · GW

Fascinating. Sounds related to the yoga concept of kriyas.

Comment by Chris_Leong on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now) · 2024-10-31T00:41:28.312Z · LW · GW

I would suggest adopting a different method of interpretation, one more grounded in what was actually said. Anyway, I think it's probably best that we leave this thread here.

Comment by Chris_Leong on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now) · 2024-10-29T22:51:16.328Z · LW · GW

Sadly, cause-neutral was an even more confusing term, so this is better by comparison. I also think that the two notions of principles-first are less disconnected than you think, but through somewhat indirect effects.

Comment by Chris_Leong on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now) · 2024-10-29T16:45:36.273Z · LW · GW

We're mostly working on stuff to stay afloat rather than high level navigation.

Why do you think that this is the case?

Comment by Chris_Leong on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now) · 2024-10-29T16:38:24.407Z · LW · GW

I recommend rereading his post. I believe his use of the term makes sense.

Comment by Chris_Leong on The Alignment Trap: AI Safety as Path to Power · 2024-10-29T16:30:35.598Z · LW · GW

I don't think I agree with this post, but I thought it provided a fascinating alternative perspective.

Comment by Chris_Leong on Winners of the Essay competition on the Automation of Wisdom and Philosophy · 2024-10-28T19:41:32.926Z · LW · GW

Just wanted to mention that if anyone liked my submissions (3rd prize, An Overview of “Obvious” Approaches to Training Wise AI Advisors, Some Preliminary Notes on the Promise of a Wisdom Explosion), I'll be running a project related to this work as part of AI Safety Camp. Join me if you want to help innovate a new paradigm in AI safety.

Comment by Chris_Leong on avturchin's Shortform · 2024-10-27T15:36:03.108Z · LW · GW

What's ABBYY?

Comment by Chris_Leong on Brief analysis of OP Technical AI Safety Funding · 2024-10-26T16:42:49.095Z · LW · GW

That’s useful analysis. Focusing so heavily on evals seems like a mistake given that AI Safety Institutes are already focused on evals.

Comment by Chris_Leong on Linkpost: Memorandum on Advancing the United States’ Leadership in Artificial Intelligence · 2024-10-25T11:01:33.852Z · LW · GW

I guess Leopold was right[1].  AI arms race it is.

  1. ^

    I suppose it is possible that it was a self-fulfilling prophecy, but I'm skeptical given how fast it's happened.

Comment by Chris_Leong on Big tech transitions are slow (with implications for AI) · 2024-10-24T23:54:22.987Z · LW · GW

The problem is that accepting this argument involves ignoring how AI keeps on blitzing past supposed barrier after barrier. At some point, a rational observer needs to be willing to accept that their max likelihood model is wrong and consider other possible ways the world could be instead.

Comment by Chris_Leong on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now) · 2024-10-24T16:07:34.315Z · LW · GW

I thought that this post on strategy and this talk were well done. Obviously, I'll have to see how this translates into practice.

Comment by Chris_Leong on Introducing Transluce — A Letter from the Founders · 2024-10-24T06:55:23.265Z · LW · GW

One thing I would love to know is how it'll work on Claude 3.5 Sonnet or GPT-4o, given that these models aren't open-weights. Is it that you have access to some reduced level of capabilities for these?

Comment by Chris_Leong on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now) · 2024-10-23T16:06:12.526Z · LW · GW

That was an interesting conversation.

I do have some worries about the EA community.

At the same time, I'm excited to see that Zach Robinson has taken the reins at CEA, and I'm looking forward to seeing how things develop under his leadership. The early signs have been promising.

Comment by Chris_Leong on Chris_Leong's Shortform · 2024-10-20T14:49:53.533Z · LW · GW

There is a world that needs to be saved. Saving the world is a team sport. All we can do is contribute our piece of the puzzle, whatever that may be and no matter how small, and trust in our companions to handle the rest. There is honor in that, no matter how things turn out in the end.

Comment by Chris_Leong on My motivation and theory of change for working in AI healthtech · 2024-10-12T12:58:34.406Z · LW · GW

I'd strongly bet that when you break this down in more concrete detail, a flaw in your plan will emerge. 

The balance of industries serving humans vs. AIs is a suspiciously high level of abstraction.

Comment by Chris_Leong on wassname's Shortform · 2024-10-11T06:26:33.837Z · LW · GW

It’s an interesting thought.

I can see regularisation playing something of a role here, but it’s hard to say.

I would love to see a project here with philosophers and technical folk collaborating to make progress on this question.

Comment by Chris_Leong on sarahconstantin's Shortform · 2024-10-11T03:26:21.826Z · LW · GW

I honestly feel that the only appropriate response is something along the lines of "fuck defeatism"[1].

This comment isn't targeted at you, but at a particular attractor in thought space.

Let me try to explain why I think rejecting this attractor is the right response rather than engaging with it.

I think it's mostly that I don't think that talking about things at this level of abstraction is useful. It feels much more productive to talk about specific plans. And if you have a general, high-abstraction argument that plans in general are useless, but I have a specific argument why a specific plan is useful, I know which one I'd go with :-).

Don't get me wrong, I think that if someone struggles for a certain amount of time to try to make a difference and just hits wall after wall, then at some point they have to call it. But that's completely different from "never start" or "don't even try".

It's also worth noting that saving the world is a team sport. It's okay to pursue a plan that depends on a bunch of other folk stepping up and playing their part.

  1. ^

    I would also suggest that this is the best way to respond to depression rather than "trying to argue your way out of it".

Comment by Chris_Leong on TurnTrout's shortform feed · 2024-10-10T04:09:38.159Z · LW · GW

Thanks for posting this. I've been confused about the connection between shard theory and activation vectors for a long time!

AIXI is not a shard theoretic agent because it does not have two motivational circuits which can be activated independently of each other

This confuses me.

I can imagine an AIXI program where the utility function is compositional even if the optimisation is unitary. And I guess this isn't two full motivational circuits, but it kind of is two motivational circuits.
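To spell out what I mean with a toy example (my sketch, not anything from the original post):

$$U(x) = U_{\text{paperclips}}(x) + U_{\text{staples}}(x)$$

The utility decomposes into two terms, but AIXI still performs a single argmax over the expected sum - which is why I'd say it isn't two full motivational circuits, yet kind of is.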

Comment by Chris_Leong on Overview of strong human intelligence amplification methods · 2024-10-09T04:13:44.887Z · LW · GW

Not really.

Comment by Chris_Leong on Overview of strong human intelligence amplification methods · 2024-10-08T23:28:16.608Z · LW · GW

I think you're underestimating meditation.

Since I've started meditating, I've realised that I've become much more sensitive to vibes.

There are a lot of folk who would be scarily capable if they were strong in system 1, in addition to being strong in system 2.

Then there are all the other benefits that meditation can provide if done properly: additional motivation and being better able to break out of narratives/notice patterns.

Then again, this is dependent on there being viable social interventions, rather than just aiming for 6 or 7 standard deviations of increase in intelligence.

Comment by Chris_Leong on Compelling Villains and Coherent Values · 2024-10-07T09:32:11.513Z · LW · GW

A Bayesian cultivates lightness, but a warrior monk has weight. Can these two opposing and perhaps contradictory natures be united to create some kind of unstoppable Kwisatz Haderach?

There are different ways of being that are appropriate to different times and/or circumstances. There are times for doubt and times for action.

Comment by Chris_Leong on Mark Xu's Shortform · 2024-10-06T03:49:07.843Z · LW · GW

I would suggest 50% of researchers working on a broader definition of control: including "control", technical governance work and technical outreach (scary demos, model organisms of misalignment). 

Comment by Chris_Leong on Shapley Value Attribution in Chain of Thought · 2024-09-17T10:45:04.138Z · LW · GW

I’m confused by your use of Shapley values. Shapley values assume that the “coalition” can form in any order, but that doesn’t seem like a good fit for language models where order is important.
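For reference, the standard definition (my notation):

$$\varphi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr)$$

The weight on each coalition $S$ is exactly the probability that player $i$ arrives immediately after $S$ in a uniformly random ordering of all $n$ players. That averaging over orderings is the part that seems like a poor fit when the "players" are tokens in a fixed left-to-right sequence.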

Comment by Chris_Leong on On the destruction of America’s best high school · 2024-09-16T03:57:10.038Z · LW · GW

I don't think these articles should make up a high proportion of the content on Less Wrong, but I think it's good if things like this are occasionally discussed.

Comment by Chris_Leong on How difficult is AI Alignment? · 2024-09-15T09:05:46.289Z · LW · GW

Great article.

One point of disagreement: I suspect that the difficulty of the required high-impact tasks relates more to what someone thinks about the offense-defense balance than to the alignment difficulty per se.

Comment by Chris_Leong on The “mind-body vicious cycle” model of RSI & back pain · 2024-09-13T04:34:32.300Z · LW · GW

Just to add to this:

Beliefs can be self-reinforcing in predictive processing theory because the higher level beliefs can shape the lower level observations. So the hypersensitisation that Delton has noted can reinforce itself.

Comment by Chris_Leong on TurnTrout's shortform feed · 2024-09-13T04:27:54.296Z · LW · GW

Steven Byrnes provides an explanation here, but I think he's neglecting the potential for belief systems/systems of interpretation to be self-reinforcing.

Predictive processing claims that our expectations influence what we observe, so experiencing pain in a scenario can result in the opposite of a placebo effect, where the pain sensitizes us. Some degree of sensitization is evolutionarily advantageous - if you've hurt a part of your body, then being more sensitive makes you more likely to detect if you're putting too much strain on it. However, it can also make you experience pain as the result of minor sensations that aren't actually indicative of anything wrong. In the worst case, this pain ends up being self-reinforcing.

https://www.lesswrong.com/posts/BgBJqPv5ogsX4fLka/the-mind-body-vicious-cycle-model-of-rsi-and-back-pain

Comment by Chris_Leong on AI Constitutions are a tool to reduce societal scale risk · 2024-09-11T15:41:18.400Z · LW · GW

Interesting work.

This post has made me realise that constitutional design is surprisingly neglected in the AI safety community.

Designing the right constitution won't save the world by itself, but it's a potentially easy win that could put us in a better strategic situation down the line.

Comment by Chris_Leong on Seeking Mechanism Designer for Research into Internalizing Catastrophic Externalities · 2024-09-11T15:39:10.669Z · LW · GW

I guess I'm worried that allowing insurance for disasters above a certain size could go pretty badly if it increases the chance of labs being reckless.

Comment by Chris_Leong on t14n's Shortform · 2024-09-04T02:42:08.105Z · LW · GW

Thank you for your service!

For what it's worth, I feel that the bar for being a valuable member of the AI safety community is much more attainable than the bar of working in AI safety full-time.