LessWrong 2.0 Reader

Consider tabooing "I think"
Adam Zerner (adamzerner) · 2024-11-12T02:00:08.433Z · comments (2)
[link] It's important to know when to stop: Mechanistic Exploration of Gemma 2 List Generation
Gerard Boxo (gerard-boxo) · 2024-10-14T17:04:57.010Z · comments (0)
LLMs are likely not conscious
research_prime_space · 2024-09-29T20:57:26.111Z · comments (8)
[link] What is autonomy? Why boundaries are necessary.
Chipmonk · 2024-10-21T17:56:33.722Z · comments (1)
AXRP Episode 38.0 - Zhijing Jin on LLMs, Causality, and Multi-Agent Systems
DanielFilan · 2024-11-14T07:00:06.977Z · comments (0)
[link] Jailbreaking language models with user roleplay
loops (smitop) · 2024-09-28T23:43:10.870Z · comments (0)
[link] Models of life
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-29T19:24:40.060Z · comments (0)
Dario Amodei's "Machines of Loving Grace" sounds incredibly dangerous, for Humans
Super AGI (super-agi) · 2024-10-27T05:05:13.763Z · comments (1)
Quantitative Trading Bootcamp [Nov 6-10]
Ricki Heicklen (bayesshammai) · 2024-10-28T18:39:58.480Z · comments (0)
The Great Bootstrap
KristianRonn · 2024-10-11T19:46:51.752Z · comments (0)
[link] Thinking LLMs: General Instruction Following with Thought Generation
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-10-15T09:21:22.583Z · comments (0)
[link] Taking nonlogical concepts seriously
Kris Brown (kris-brown) · 2024-10-15T18:16:01.226Z · comments (5)
[question] What makes one a "rationalist"?
mathyouf · 2024-10-08T20:25:21.812Z · answers+comments (5)
[link] [Linkpost] Hawkish nationalism vs international AI power and benefit sharing
jakub_krys (kryjak) · 2024-10-18T18:13:19.425Z · comments (5)
[question] What actual bad outcome has "ethics-based" RLHF AI Alignment already prevented?
Roko · 2024-10-19T06:11:12.602Z · answers+comments (16)
[link] Consciousness As Recursive Reflections
Gunnar_Zarncke · 2024-10-05T20:00:53.053Z · comments (3)
The Personal Implications of AGI Realism
xizneb · 2024-10-20T16:43:37.870Z · comments (7)
A brief theory of why we think things are good or bad
David Johnston (david-johnston) · 2024-10-20T20:31:26.309Z · comments (10)
A Brief Explanation of AI Control
Aaron_Scher · 2024-10-22T07:00:56.954Z · comments (1)
Foresight Vision Weekend 2024
Allison Duettmann (allison-duettmann) · 2024-10-01T21:59:55.107Z · comments (0)
Of Birds and Bees
RussellThor · 2024-09-30T10:52:15.069Z · comments (9)
[question] What are some good ways to form opinions on controversial subjects in the current and upcoming era?
notfnofn · 2024-10-27T14:33:53.960Z · answers+comments (21)
[question] somebody explain the word "epistemic" to me
KvmanThinking (avery-liu) · 2024-10-28T16:40:24.275Z · answers+comments (8)
Enhancing Mathematical Modeling with LLMs: Goals, Challenges, and Evaluations
ozziegooen · 2024-10-28T21:44:42.352Z · comments (0)
[link] October 2024 Progress in Guaranteed Safe AI
Quinn (quinn-dougherty) · 2024-10-28T23:34:51.689Z · comments (0)
[question] On the subject of in-house large language models versus implementing frontier models
Annapurna (jorge-velez) · 2024-09-23T15:00:32.811Z · answers+comments (1)
[link] Boons and banes
dkl9 · 2024-09-23T06:18:38.335Z · comments (0)
Join my new subscriber chat
sarahconstantin · 2024-11-06T02:30:11.059Z · comments (0)
[link] Validating / finding alignment-relevant concepts using neural data
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-20T21:12:49.267Z · comments (0)
Moral Trade, Impact Distributions and Large Worlds
Larks · 2024-09-20T03:45:56.273Z · comments (0)
Piling bounded arguments
momom2 (amaury-lorin) · 2024-09-19T22:27:41.534Z · comments (0)
[link] Spherical cow
dkl9 · 2024-11-11T03:10:27.788Z · comments (0)
Not all biases are equal - a study of sycophancy and bias in fine-tuned LLMs
jakub_krys (kryjak) · 2024-11-11T23:11:15.233Z · comments (0)
[question] How to cite LessWrong as an academic source?
PhilosophicalSoul (LiamLaw) · 2024-11-06T08:28:26.309Z · answers+comments (6)
[question] why won't this alignment plan work?
KvmanThinking (avery-liu) · 2024-10-10T15:44:59.450Z · answers+comments (7)
[question] Is School of Thought related to the Rationality Community?
Shoshannah Tekofsky (DarkSym) · 2024-10-15T12:41:33.224Z · answers+comments (6)
Against Job Boards: Human Capital and the Legibility Trap
vaishnav92 · 2024-10-24T20:50:50.266Z · comments (1)
[link] The Early Christian Strategy
Chipmonk · 2024-11-14T17:02:20.095Z · comments (0)
The Existential Dread of Being a Powerful AI System
testingthewaters · 2024-09-26T10:56:32.904Z · comments (1)
Introducing Kairos: a new AI safety fieldbuilding organization (the new home for SPAR and FSP)
agucova · 2024-10-25T21:59:08.782Z · comments (0)
'Chat with impactful research & evaluations' (Unjournal NotebookLMs)
david reinstein (david-reinstein) · 2024-09-28T00:32:16.845Z · comments (0)
[question] how to truly feel my beliefs?
KvmanThinking (avery-liu) · 2024-11-11T00:04:30.994Z · answers+comments (6)
Retrieval Augmented Genesis
João Ribeiro Medeiros (joao-ribeiro-medeiros) · 2024-10-01T20:18:01.836Z · comments (0)
[link] AI Safety Newsletter #43: White House Issues First National Security Memo on AI Plus, AI and Job Displacement, and AI Takes Over the Nobels
Corin Katzke (corin-katzke) · 2024-10-28T16:03:39.258Z · comments (0)
GPT4o is still sensitive to user-induced bias when writing code
Reed (ThomasReed) · 2024-09-22T21:04:54.717Z · comments (0)
Exploring Shard-like Behavior: Empirical Insights into Contextual Decision-Making in RL Agents
Alejandro Aristizabal (alejandro-aristizabal) · 2024-09-29T00:32:42.161Z · comments (0)
Inquisitive vs. adversarial rationality
gb (ghb) · 2024-09-18T13:50:09.198Z · comments (9)
2025 Q1 Pivotal Research Fellowship (Technical & Policy)
Tobias H (clearthis) · 2024-11-12T10:56:24.858Z · comments (0)
Thoughts on Evo-Bio Math and Mesa-Optimization: Maybe We Need To Think Harder About "Relative" Fitness?
Lorec · 2024-09-28T14:07:42.412Z · comments (6)
Avoiding jailbreaks by discouraging their representation in activation space
Guido Bergman · 2024-09-27T17:49:20.785Z · comments (2)