LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

[link] Virtue is a Vector
robotelvis · 2024-09-10T03:02:45.737Z · comments (1)

Toy Models of Superposition: Simplified by Hand
Axel Sorensen (axel-sorensen) · 2024-09-29T21:19:52.475Z · comments (2)

[link] Molecular dynamics data will be essential for the next generation of ML protein models
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-26T14:50:23.790Z · comments (0)

how to rapidly assimilate new information
dhruvmethi · 2024-10-24T02:18:00.648Z · comments (3)

[link] Jailbreaking language models with user roleplay
loops (smitop) · 2024-09-28T23:43:10.870Z · comments (0)

On epistemic autonomy
sanyer (santeri-koivula) · 2024-08-31T18:50:43.377Z · comments (0)

[question] Change My Mind: Thirders in "Sleeping Beauty" are Just Doing Epistemology Wrong
DragonGod · 2024-10-16T10:20:22.133Z · answers+comments (67)

[link] Universal dimensions of visual representation
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-08-28T10:38:58.396Z · comments (0)

Two new datasets for evaluating political sycophancy in LLMs
alma.liezenga · 2024-09-28T18:29:49.088Z · comments (0)

The Geometric Importance of Side Payments
StrivingForLegibility · 2024-08-07T01:38:04.635Z · comments (4)

An open response to Wittkotter and Yampolskiy
Donald Hobson (donald-hobson) · 2024-09-24T22:27:21.987Z · comments (0)

[link] What is autonomy? Why boundaries are necessary.
Chipmonk · 2024-10-21T17:56:33.722Z · comments (1)

[link] AI Safety at the Frontier: Paper Highlights, July '24
gasteigerjo · 2024-08-05T13:00:46.028Z · comments (0)

[link] Can AI agents learn to be good?
Ram Rachum (ram@rachum.com) · 2024-08-29T14:20:04.336Z · comments (0)

On Intentionality, or: Towards a More Inclusive Concept of Lying
Cornelius Dybdahl (Kalciphoz) · 2024-10-18T10:37:32.201Z · comments (0)

Three main arguments that AI will save humans and one meta-argument
avturchin · 2024-10-02T11:39:08.910Z · comments (8)

[link] Nerdtrition: simple diets via spreadsheet abuse
dkl9 · 2024-10-27T21:45:15.117Z · comments (0)

[link] Triangulating My Interpretation of Methods: Black Boxes by Marco J. Nathan
adamShimi · 2024-10-09T19:13:26.631Z · comments (0)

Thinking About Propensity Evaluations
Maxime Riché (maxime-riche) · 2024-08-19T09:23:55.091Z · comments (0)

[link] Approval-Seeking ⇒ Playful Evaluation
Jonathan Moregård (JonathanMoregard) · 2024-08-28T21:03:51.244Z · comments (0)

LLMs are likely not conscious
research_prime_space · 2024-09-29T20:57:26.111Z · comments (7)

[link] AI Safety Newsletter #42: Newsom Vetoes SB 1047 Plus, OpenAI’s o1, and AI Governance Summary
Corin Katzke (corin-katzke) · 2024-10-01T20:35:32.399Z · comments (0)

MIT FutureTech are hiring for a Head of Operations role
peterslattery · 2024-10-02T17:11:42.960Z · comments (0)

[link] [Linkpost] Automated Design of Agentic Systems
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-08-19T23:06:06.669Z · comments (1)

[link] Models of life
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-29T19:24:40.060Z · comments (0)

[link] Michael Streamlines on Buddhism
Chris_Leong · 2024-08-09T04:44:52.126Z · comments (0)

Interpreting the effects of Jailbreak Prompts in LLMs
Harsh Raj (harsh-raj-ep-037) · 2024-09-29T19:01:10.113Z · comments (0)

Thoughts On the Nature of Capability Elicitation via Fine-tuning
Theodore Chapman · 2024-10-15T08:39:19.909Z · comments (0)

[link] It's important to know when to stop: Mechanistic Exploration of Gemma 2 List Generation
Gerard Boxo (gerard-boxo) · 2024-10-14T17:04:57.010Z · comments (0)

HDBSCAN is Surprisingly Effective at Finding Interpretable Clusters of the SAE Decoder Matrix
Jaehyuk Lim (jason-l) · 2024-10-11T23:06:14.340Z · comments (2)

Steering LLMs' Behavior with Concept Activation Vectors
Ruixuan Huang (sprout_ust) · 2024-09-28T09:53:19.658Z · comments (0)

Dario Amodei's "Machines of Loving Grace" sound incredibly dangerous, for Humans
Super AGI (super-agi) · 2024-10-27T05:05:13.763Z · comments (1)

[link] Checking public figures on whether they "answered the question" quick analysis from Harris/Trump debate, and a proposal
david reinstein (david-reinstein) · 2024-09-11T20:25:27.845Z · comments (4)

[link] Validating / finding alignment-relevant concepts using neural data
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-20T21:12:49.267Z · comments (0)

[link] Taking nonlogical concepts seriously
Kris Brown (kris-brown) · 2024-10-15T18:16:01.226Z · comments (5)

Enhancing Mathematical Modeling with LLMs: Goals, Challenges, and Evaluations
ozziegooen · 2024-10-28T21:44:42.352Z · comments (0)

[link] October 2024 Progress in Guaranteed Safe AI
Quinn (quinn-dougherty) · 2024-10-28T23:34:51.689Z · comments (0)

[link] In-Context Learning: An Alignment Survey
alamerton · 2024-09-30T18:44:28.589Z · comments (0)

Foresight Vision Weekend 2024
Allison Duettmann (allison-duettmann) · 2024-10-01T21:59:55.107Z · comments (0)

A Brief Explanation of AI Control
Aaron_Scher · 2024-10-22T07:00:56.954Z · comments (1)

Broadly human level, cognitively complete AGI
p.b. · 2024-08-06T09:26:13.220Z · comments (0)

[link] Boons and banes
dkl9 · 2024-09-23T06:18:38.335Z · comments (0)

Denver USA - ACX Meetups Everywhere Fall 2024
Eneasz · 2024-08-29T18:40:53.332Z · comments (0)

Fake Blog Posts as a Problem Solving Device
silentbob · 2024-08-31T09:22:54.513Z · comments (0)

[question] On the subject of in-house large language models versus implementing frontier models
Annapurna (jorge-velez) · 2024-09-23T15:00:32.811Z · answers+comments (1)

[link] Consciousness As Recursive Reflections
Gunnar_Zarncke · 2024-10-05T20:00:53.053Z · comments (3)

[link] Species as Canonical Referents of Super-Organisms
Yudhister Kumar (randomwalks) · 2024-10-18T07:49:52.944Z · comments (8)

Of Birds and Bees
RussellThor · 2024-09-30T10:52:15.069Z · comments (9)

[link] Cooperation and Alignment in Delegation Games: You Need Both!
Oliver Sourbut · 2024-08-03T10:16:51.716Z · comments (0)

Deception and Jailbreak Sequence: 2. Iterative Refinement Stages of Jailbreaks in LLM
Winnie Yang (winnie-yang) · 2024-08-28T08:41:38.967Z · comments (2)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

zy on ZY's Shortform

And actually, if some of these "short term" issues are not worked on, these issues will likely be forever distractions/barriers, to these populations and family/friends of these population (at some point anyone could be a part of that population), on anything including their own lives, and their life goals (maybe AI safety).

tailcalled on Three Notions of "Power"

Western states today use state violence to enforce high taxes and lots of government regulations. In my view they're probably more dominance-oriented than states which just leave rural farmers alone. At least some of this is part of a Keynesian policy to boost economic output, and economic output is closely related to military formidability (due to ability to afford raw resources and advanced technology for the military).

Hm, I guess you would see this as more closely related to bargaining power than to dominance, because in your model dominance is a human-psychology-thing and bargaining power isn't restricted to voluntary transactions?

davidmanheim on Occupational Licensing Roundup #1

Question for a lawyer: how is non-reciprocity not an interstate trade issue that federal courts can strike down?

romeostevensit on Three Notions of "Power"

I have attempted to communicate to ultra-high-net-worth individuals, seemingly to little success so far, that given the reality of limited personal bandwidth, with over 99% of their influence and decision-making typically mediated through others, it’s essential to refine the ability to identify trustworthy advisors in each domain. Expert judgment is an active field of research with valuable, actionable insights.

chris_leong on What TMS is like

Fascinating. Sounds related to the Yoga concept of kryias.

rogerdearnaley on Motivation control

Opacity: if you could directly inspect an AI’s motivations (or its cognition more generally), this would help a lot. But you can’t do this with current ML models.

The ease with which Anthropic's model organisms of misalignment were diagnosed by a simple and obvious linear probe suggests otherwise. So does the number of elements in SAE feature dictionaries that describe emotions, motivations, and behavioral patterns. Current ML models are no longer black boxes: they rapidly becoming more-translucent grey boxes. So the sorts of applications for this you go on to discuss look like they're rapidly becoming practicable.

elityre on avturchin's Shortform

You have been attacked by a pack of stray dogs twice?!?!

clone-of-saturn on The Alignment Trap: AI Safety as Path to Power

Can anyone lay out a semi-plausible scenario where humanity survives but isn't dominated by an AI or posthuman god-king? I can't really picture it. I always thought that's what we were going for since it's better than being dead.

czynski on Lighthaven Sequences Reading Group #8 (Tuesday 10/29)

Could you please announce these further in advance? Especially given the reading required beforehand it's inconvenient and honestly seems a little inconsiderate.

matthew4244 on Chapter 45: Humanism, Pt 3

Great chapter, Great message. +1