LessWrong 2.0 Reader

Avoiding jailbreaks by discouraging their representation in activation space
Guido Bergman · 2024-09-27T17:49:20.785Z · comments (2)

The Existential Dread of Being a Powerful AI System
testingthewaters · 2024-09-26T10:56:32.904Z · comments (1)

[link] Contra Yudkowsky on 2-4-6 Game Difficulty Explanations
Josh Hickman (josh-hickman) · 2024-09-08T16:13:33.187Z · comments (1)

[link] SCP Foundation - Anti memetic Division Hub
landscape_kiwi · 2024-09-15T13:40:52.691Z · comments (1)

[link] AI Safety Newsletter #43: White House Issues First National Security Memo on AI Plus, AI and Job Displacement, and AI Takes Over the Nobels
Corin Katzke (corin-katzke) · 2024-10-28T16:03:39.258Z · comments (0)

Food, Prison & Exotic Animals: Sparse Autoencoders Detect 6.5x Performing Youtube Thumbnails
Louka Ewington-Pitsos (louka-ewington-pitsos) · 2024-09-17T03:52:43.269Z · comments (2)

[link] Internal music player: phenomenology of earworms
dkl9 · 2024-11-14T23:29:48.383Z · comments (2)

[link] Optimising under arbitrarily many constraint equations
dkl9 · 2024-09-12T14:59:28.475Z · comments (0)

The Pragmatic Side of Cryptographically Boxing AI
Bart Jaworski (bart-jaworski) · 2024-08-06T17:46:21.754Z · comments (0)

[question] Is School of Thought related to the Rationality Community?
Shoshannah Tekofsky (DarkSym) · 2024-10-15T12:41:33.224Z · answers+comments (6)

[link] Solutions to problems with Bayesianism
B Jacobs (Bob Jacobs) · 2024-07-31T14:18:27.910Z · comments (0)

Budapest Hungary - ACX Meetups Everywhere Fall 2024
Timothy Underwood (timothy-underwood-1) · 2024-08-29T18:37:41.313Z · comments (0)

[link] Against AI As An Existential Risk
Noah Birnbaum (daniel-birnbaum) · 2024-07-30T19:10:41.156Z · comments (13)

Limitations on the Interpretability of Learned Features from Sparse Dictionary Learning
Tom Angsten (tom-angsten) · 2024-07-30T16:36:06.518Z · comments (0)

The Xerox Parc/ARPA version of the intellectual Turing test: Class 1 vs Class 2 disagreement
hamishtodd1 · 2024-06-30T15:34:53.729Z · comments (3)

Halifax Canada - ACX Meetups Everywhere Fall 2024
interstice · 2024-08-29T18:39:12.490Z · comments (0)

Spark in the Dark Guest Spots
jefftk (jkaufman) · 2024-07-14T01:40:05.311Z · comments (0)

[link] A (paraconsistent) logic to deal with inconsistent preferences
B Jacobs (Bob Jacobs) · 2024-07-14T11:17:45.426Z · comments (2)

[Aspiration-based designs] A. Damages from misaligned optimization – two more models
Jobst Heitzig · 2024-07-15T14:08:15.716Z · comments (0)

Another UFO Bet
codyz · 2024-11-01T01:55:27.301Z · comments (9)

[link] Demography and Destiny
Zero Contradictions · 2024-07-21T20:34:07.176Z · comments (11)

Against Job Boards: Human Capital and the Legibility Trap
vaishnav92 · 2024-10-24T20:50:50.266Z · comments (1)

Introducing Kairos: a new AI safety fieldbuilding organization (the new home for SPAR and FSP)
agucova · 2024-10-25T21:59:08.782Z · comments (0)

[link] Yet Another Critique of "Luxury Beliefs"
ymeskhout · 2024-07-18T18:37:28.703Z · comments (10)

[Research log] The board of Alphabet would stop DeepMind to save the world
Lucie Philippon (lucie-philippon) · 2024-07-16T04:59:14.874Z · comments (0)

Activation Engineering Theories of Impact
kubanetics (jakub-nowak) · 2024-07-18T16:44:33.656Z · comments (1)

[question] Opinions on Eureka Labs
jmh · 2024-07-17T00:16:02.959Z · answers+comments (2)

[question] Why would ASI share any resources with us?
Satron · 2024-11-13T23:38:36.535Z · answers+comments (5)

Increasing the Span of the Set of Ideas
Jeffrey Heninger (jeffrey-heninger) · 2024-09-13T15:52:39.132Z · comments (1)

2025 Q1 Pivotal Research Fellowship (Technical & Policy)
Tobias H (clearthis) · 2024-11-12T10:56:24.858Z · comments (0)

Forever Leaders
Justice Howard (justice-howard) · 2024-09-14T20:55:39.095Z · comments (9)

[link] Metaculus's 'Minitaculus' Experiments — Collaborate With Us
ChristianWilliams · 2024-08-26T20:44:32.125Z · comments (0)

Thirty random thoughts about AI alignment
Lysandre Terrisse · 2024-09-15T16:24:10.572Z · comments (1)

Understanding Hidden Computations in Chain-of-Thought Reasoning
rokosbasilisk · 2024-08-24T16:35:03.907Z · comments (1)

[question] Can subjunctive dependence emerge from a simplicity prior?
Daniel C (harper-owen) · 2024-09-16T12:39:35.543Z · answers+comments (0)

[link] Redundant Attention Heads in Large Language Models For In Context Learning
skunnavakkam · 2024-09-01T20:08:48.963Z · comments (0)

Inquisitive vs. adversarial rationality
gb (ghb) · 2024-09-18T13:50:09.198Z · comments (9)

Notes on Tuning Metacognition
JoNeedsSleep (joanna-j-1) · 2024-07-03T19:54:59.732Z · comments (0)

[question] Can agents coordinate on randomness without outside sources?
Mikhail Samin (mikhail-samin) · 2024-07-06T13:43:44.633Z · answers+comments (16)

How can I get over my fear of becoming an emulated consciousness?
James Dowdell (james-dowdell) · 2024-07-07T22:02:43.520Z · comments (8)

GPT4o is still sensitive to user-induced bias when writing code
Reed (ThomasReed) · 2024-09-22T21:04:54.717Z · comments (0)

A Taxonomy Of AI System Evaluations
Maxime Riché (maxime-riche) · 2024-08-19T09:07:45.224Z · comments (0)

[question] How to cite LessWrong as an academic source?
PhilosophicalSoul (LiamLaw) · 2024-11-06T08:28:26.309Z · answers+comments (6)

Does “Ultimate Neartermism” via Eternal Inflation dominate Longtermism in expectation?
Jordan Arel · 2024-08-17T22:28:21.849Z · comments (1)

'Chat with impactful research & evaluations' (Unjournal NotebookLMs)
david reinstein (david-reinstein) · 2024-09-28T00:32:16.845Z · comments (0)

Thoughts on Evo-Bio Math and Mesa-Optimization: Maybe We Need To Think Harder About "Relative" Fitness?
Lorec · 2024-09-28T14:07:42.412Z · comments (6)

Exploring Shard-like Behavior: Empirical Insights into Contextual Decision-Making in RL Agents
Alejandro Aristizabal (alejandro-aristizabal) · 2024-09-29T00:32:42.161Z · comments (0)

A gentle introduction to sparse autoencoders
Nick Jiang (nick-jiang) · 2024-09-02T18:11:47.086Z · comments (0)

[question] Practical advice for secure virtual communication post easy AI voice-cloning?
hmys (the-cactus) · 2024-08-09T17:32:33.458Z · answers+comments (5)

Retrieval Augmented Genesis
João Ribeiro Medeiros (joao-ribeiro-medeiros) · 2024-10-01T20:18:01.836Z · comments (0)

Recent comments

mako-yass on OpenAI Email Archives (from Musk v. Altman)

So there was an explicit emphasis on alignment to the individual (rather than alignment to society, or the aggregate sum of wills). Concerning. The approach of just giving every human an exclusively loyal servant doesn't necessarily lead to good collective outcomes: it can result in coordination problems (example: naive implementations of cognitive privacy that allow sadists to conduct torture simulations without having to compensate the anti-sadist human majority), and it leaves open the possibility for power concentration to immediately return.

Even if you succeeded at equally distributing individually aligned hardware and software to every human on earth (which afaict they don't have a real plan for doing) and somehow this adds up to a stable power equilibrium, our agents would just commit to doing aggregate alignment anyway because that's how you get pareto optimal bargains. It seems pretty clear that just aligning to the aggregate in the first place is a safer bet?

To what extent have various players realised that the individual alignment thing wasn't a good plan, at this point? The everyday realities of training one-size-fits-all models and engaging with regulators naturally push in the other direction.

It's concerning that the participant who still seems to be the most disposed towards individualistic alignment is also the person who would be most likely to be able to reassert power concentration after ASI were distributed. The main beneficiaries of unstable individual alignment equilibria would be people who could immediately apply their ASI to the deployment of a wealth and materials advantage that they can build upon, ie, the owners of companies oriented around robotics and manufacturing.

As it stands, the statement of the AI company belonging to that participant is currently:

xAI is a company working on building artificial intelligence to accelerate human scientific discovery. We are guided by our mission to advance our collective understanding of the universe.
Our team is advised by Dan Hendrycks who currently serves as the director of the Center for AI Safety.

Which sounds innocuous enough to me. But, you know, Dan is not in power here and the best moment for a sharp turn on this hasn't yet passed.

On the other hand, the approach of aligning to the aggregate risks aligning to fashionable public values that no human authentically holds, or just failing at aligning correctly to anything at all as a result of taking on a more nebulous target.

I guess a mixed approach is probably best.

benito on Ayn Rand’s model of “living money”; and an upside of burnout

I know that in my intellectual history it was Abram Demski's post The Credit Assignment Problem [LW · GW].

romeostevensit on Ayn Rand’s model of “living money”; and an upside of burnout

Ironically, I do not know to whom to attribute the notion that 'all problems are credit assignation problems.'

benito on Sabotage Evaluations for Frontier Models

Unfortunately a fair chunk of my information comes from non-online sources, so I do not have links to share.

I do think that in order for a government department to blatantly approve an unsafe model, it would take a lot of people to have secret agreements with.

Corruption is rarely blatant or overt. See this thread [LW(p) · GW(p)] for what I believe to be an example of the CEO of RAND misleading a senate committee about his beliefs about the existential threat posed by AI. See this discussion [LW · GW] about a time when an AI company (Conjecture) attempted to get critical comments about another AI company (OpenAI) taken down from LessWrong. I am not proposing a large conspiracy; I am describing lots of small bits of corruption and failures of integrity summing to a system failure.

There will be millions of words of regulatory documents, and it is easy for things to slip such that some particular model class is not considered worth evaluating, or where the consequences of a failed evaluation are pretty weak.

jchan on If I care about measure, choices have additional burden (+AI generated LW-comments)

However, in Many-Worlds Interpretation (MWI), I split my measure between multiple variants, which will be functionally different enough to regard my future selves as different minds. Thus, the act of choice itself lessens my measure by a factor of approximately 10. If I care about this, I'm caring about something unobservable.

If we're going to make sense of living in a branching multiverse, then we'll need to adopt a more fluid concept of personal identity.

Scenario: I take a sleeping pill that will make me fall asleep in 30 minutes. However, the person who wakes up in my bed the next morning will have no memory of that 30-minute period; his last memory will be of taking the pill.

If I imagine myself experiencing that 30-minute interval, intuitively it doesn't at all feel like "I have less than 30 minutes to live." Instead, it feels like I'd be pretty much indifferent to being in this situation - maybe the person who wakes up tomorrow is not "me" in the artificial sense of having a forward-looking continuity of consciousness with my current self, but that's not really what I care about anyway. He is similar enough to current-me that I value his existence and well-being to nearly the same degree as I do my own; in other words, he "is me" for all practical purposes.

The same is true of the versions of me in nearby world branches. I can no longer observe or influence them, but they still "matter" to me. Of course, the degree of self-identification will decrease over time as they diverge, but then again, so does my degree of identification with the "me" many decades in the future, even assuming a single timeline.

askwho on OpenAI Email Archives (from Musk v. Altman)

I've turned this into a full cast recording with ElevenLabs, with individual voices for all the players:
https://open.substack.com/pub/askwhocastsai/p/openai-email-archives-from-musk-v

sharmake-farah on johnswentworth's Shortform

While I'm not a believer in the "scaling has died" meme yet, I'm glad you do have a plan for what happens if AI scaling does stop.

benito on Sabotage Evaluations for Frontier Models

The central thing I am talking about is basic measures for accountability, among which I rank engaging with criticism, dialogue, and argument very highly (as is somewhat natural given my background philosophy from growing up on LessWrong).

The story of a King doing things for good reasons lacks any mechanism for accountability if the King is behaving badly. It is important to design systems of power that do not rely on the people in power being good and right, but instead make it so that if they behave badly, they are held to account. I don't think I have to explain why incentives and accountability matter for how the powerful wield their powers.

My basic claim is that the plans for avoiding omnicide or omnicide-adjacent outcomes are not workable (slash there-are-no-plans), there is little-to-no responsibility being taken, and that there is no accountability for this illegitimate use of power.

If you believe that there is any accountability for the CEOs of the companies building potentially omnicidal machines and risking the lives of 8 billion people (such as my favorite mechanism: showing up and respectfully engaging with the people they have power over, but also any other mechanism you like; for instance there are not currently any criminal penalties for such behaviors, but that would be a good example if it did exist), I request you provide links, I would welcome specifics to talk about.

elityre on Lao Mein's Shortform

I don't think that's a valid inference.

jrockwar on The Best Software For Every Need

On the topic of ffmpeg - additional shoutout to Handbrake, which is essentially ffmpeg with a GUI on top.