LessWrong 2.0 Reader

Introduction to Modern Dating: Strategic Dating Advice for beginners
Jesper Lindholm · 2024-07-20T15:45:25.705Z · comments (5)

What does a Gambler's Verity world look like?
ErioirE (erioire) · 2024-07-25T22:03:56.447Z · comments (6)

Activation Engineering Theories of Impact
kubanetics (jakub-nowak) · 2024-07-18T16:44:33.656Z · comments (1)

Establishing a Connection (Ch 13-16)
a littoral wizard · 2024-07-17T23:56:23.069Z · comments (4)

Limitations on the Interpretability of Learned Features from Sparse Dictionary Learning
Tom Angsten (tom-angsten) · 2024-07-30T16:36:06.518Z · comments (0)

[link] Against AI As An Existential Risk
Noah Birnbaum (daniel-birnbaum) · 2024-07-30T19:10:41.156Z · comments (13)

[question] Opinions on Eureka Labs
jmh · 2024-07-17T00:16:02.959Z · answers+comments (2)

[link] Solutions to problems with Bayesianism
B Jacobs (Bob Jacobs) · 2024-07-31T14:18:27.910Z · comments (0)

[Research log] The board of Alphabet would stop DeepMind to save the world
Lucie Philippon (lucie-philippon) · 2024-07-16T04:59:14.874Z · comments (0)

[question] Request for AI risk quotes, especially around speed, large impacts and black boxes
Nathan Young · 2024-08-02T17:49:48.898Z · answers+comments (0)

[link] Labelling, Variables, and In-Context Learning in Llama2
Joshua Penman (joshua-penman) · 2024-08-03T19:36:34.721Z · comments (0)

Modelling Social Exchange: A Systematised Method to Judge Friendship Quality
Wynn Walker · 2024-08-04T18:49:30.892Z · comments (0)

[Aspiration-based designs] A. Damages from misaligned optimization – two more models
Jobst Heitzig · 2024-07-15T14:08:15.716Z · comments (0)

LLMs stifle creativity, eliminate opportunities for serendipitous discovery and disrupt intergenerational transfer of wisdom
Ghdz (gal-hadad) · 2024-08-05T18:27:20.709Z · comments (2)

The Pragmatic Side of Cryptographically Boxing AI
Bart Jaworski (bart-jaworski) · 2024-08-06T17:46:21.754Z · comments (0)

[link] A (paraconsistent) logic to deal with inconsistent preferences
B Jacobs (Bob Jacobs) · 2024-07-14T11:17:45.426Z · comments (2)

Spark in the Dark Guest Spots
jefftk (jkaufman) · 2024-07-14T01:40:05.311Z · comments (0)

[question] Practical advice for secure virtual communication post easy AI voice-cloning?
hmys (the-cactus) · 2024-08-09T17:32:33.458Z · answers+comments (5)

[link] Memorising molecular structures
dkl9 · 2024-07-12T22:40:42.307Z · comments (0)

[link] Validating / finding alignment-relevant concepts using neural data
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-20T21:12:49.267Z · comments (0)

Does “Ultimate Neartermism” via Eternal Inflation dominate Longtermism in expectation?
Jordan Arel · 2024-08-17T22:28:21.849Z · comments (1)

How can I get over my fear of becoming an emulated consciousness?
James Dowdell (james-dowdell) · 2024-07-07T22:02:43.520Z · comments (8)

Understanding Hidden Computations in Chain-of-Thought Reasoning
rokosbasilisk · 2024-08-24T16:35:03.907Z · comments (0)

[link] Metaculus's 'Minitaculus' Experiments — Collaborate With Us
ChristianWilliams · 2024-08-26T20:44:32.125Z · comments (0)

The Xerox Parc/ARPA version of the intellectual Turing test: Class 1 vs Class 2 disagreement
hamishtodd1 · 2024-06-30T15:34:53.729Z · comments (3)

Budapest Hungary - ACX Meetups Everywhere Fall 2024
Timothy Underwood (timothy-underwood-1) · 2024-08-29T18:37:41.313Z · comments (0)

Halifax Canada - ACX Meetups Everywhere Fall 2024
interstice · 2024-08-29T18:39:12.490Z · comments (0)

[link] Saving Lives Reduces Over-Population—A Counter-Intuitive Non-Zero-Sum Game
James Stephen Brown (james-brown) · 2024-06-28T19:29:55.238Z · comments (0)

[link] Redundant Attention Heads in Large Language Models For In Context Learning
skunnavakkam · 2024-09-01T20:08:48.963Z · comments (0)

A gentle introduction to sparse autoencoders
Nick Jiang (nick-jiang) · 2024-09-02T18:11:47.086Z · comments (0)

[link] Contra Yudkowsky on 2-4-6 Game Difficulty Explanations
Josh Hickman (josh-hickman) · 2024-09-08T16:13:33.187Z · comments (1)

[link] [Linkpost] Interpretable Analysis of Features Found in Open-source Sparse Autoencoder (partial replication)
Fernando Avalos (fernando-avalos) · 2024-09-09T03:33:53.548Z · comments (1)

[link] Could Things Be Very Different?—How Historical Inertia Might Blind Us To Optimal Solutions
James Stephen Brown (james-brown) · 2024-09-11T09:53:07.474Z · comments (0)

[link] A Nonconstructive Existence Proof of Aligned Superintelligence
Roko · 2024-09-12T03:20:09.531Z · comments (65)

[link] Optimising under arbitrarily many constraint equations
dkl9 · 2024-09-12T14:59:28.475Z · comments (0)

Forever Leaders
Justice Howard (justice-howard) · 2024-09-14T20:55:39.095Z · comments (9)

[link] SCP Foundation - Anti memetic Division Hub
landscape_kiwi · 2024-09-15T13:40:52.691Z · comments (1)

Thirty random thoughts about AI alignment
Lysandre Terrisse · 2024-09-15T16:24:10.572Z · comments (1)

[question] Can subjunctive dependence emerge from a simplicity prior?
Daniel C (harper-owen) · 2024-09-16T12:39:35.543Z · answers+comments (0)

Expected number of tries
adios (unicode-59bD) · 2024-06-22T19:22:00.756Z · comments (0)

Food, Prison & Exotic Animals: Sparse Autoencoders Detect 6.5x Performing Youtube Thumbnails
Louka Ewington-Pitsos (louka-ewington-pitsos) · 2024-09-17T03:52:43.269Z · comments (2)

Inquisitive vs. adversarial rationality
gb (ghb) · 2024-09-18T13:50:09.198Z · comments (9)

[link] AISafety.info: What are Inductive Biases?
Algon · 2024-09-19T17:26:24.581Z · comments (0)

[link] Yet Another Critique of "Luxury Beliefs"
ymeskhout · 2024-07-18T18:37:28.703Z · comments (10)

Democracy beyond majoritarianism
Arturo Macias (arturo-macias) · 2024-09-03T15:10:56.284Z · comments (2)

[question] Can UBI overcome inflation and rent seeking?
Gordon Seidoh Worley (gworley) · 2024-08-01T00:13:51.693Z · answers+comments (34)

Mentorship in AGI Safety: Applications for mentorship are open!
Valentin2026 (Just Learning) · 2024-06-28T14:49:48.501Z · comments (0)

Toy Models of Superposition: what about BitNets?
Alejandro Tlaie (alejandro-tlaie-boria) · 2024-08-08T16:29:02.054Z · comments (1)

A simple text status can change something
nextcaller · 2024-06-23T18:48:58.580Z · comments (0)

[link] Exposure can’t rule out disasters
Chipmonk · 2024-08-15T17:03:37.259Z · comments (19)

Recent comments

directedevolution on AllAmericanBreakfast's Shortform

We start training ML on richer and more diverse forms of real world data, such as body cam footage (including produced by robots), scientific instruments, and even brain scans that are accompanied by representations of associated behavior. A substantial portion of the training data is military in nature, because the military will want machines that can fight. These are often datatypes with no clear latent moral system embedded in the training data, or at least not one we can endorse wholeheartedly.

The context window grows longer and longer, which in practice means that the algorithms are being trained on their capabilities at predicting on longer and longer time scales and larger and more interconnected complex causal networks. Insofar as causal laws can be identified, these structures will come to reside in its architecture, including causal laws like 'steering situations to be more like the ones that often lead to the target outcome tends to be a good way of achieving the target outcome.'

Basically, we are going to figure out better and better ways of converting ever richer representations of physical reality into tokens. We're going to spend vast resources doing ML on those rich datasets. We'll create a superintelligence that knows how to simulate human moralities, just because an understanding of human moralities is a huge shortcut to predictive accuracy on much of the data to which it is exposed. But it won't be governed by those moralities. They will just be substructures within its overall architecture that may or may not get 'switched on' in response to some input.

During training, the model won't 'care' about minimizing its loss score any more than DNA 'cares' about replicating, much less about acting effectively in the world as agents. Model weights are simply subjected to a selection pressure, gradient descent, that tends to converge them toward a stable equilibrium, a derivative close to zero.
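
To make the "derivative close to zero" point concrete, here is a minimal sketch of gradient descent on a toy one-parameter loss. The quadratic loss, the learning rate and the step count are illustrative assumptions, not anything taken from a real training setup.

```python
def loss(w: float) -> float:
    """Toy loss with a single minimum at w = 3 (an assumed example)."""
    return (w - 3.0) ** 2


def grad(w: float) -> float:
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)


def gradient_descent(w0: float, lr: float = 0.1, steps: int = 200) -> float:
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w


w_final = gradient_descent(w0=-5.0)
# At the end of the run the update has nearly stopped moving the weight,
# because the gradient itself has shrunk toward zero.
print(w_final, grad(w_final))
```

Nothing in the loop refers to a goal; the weight just stops changing once the gradient vanishes.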

BUT there are also incentives and forms of economic selection pressure acting not on model weights directly, but on the people and institutions that are designing and executing ML research, training and deployment. These incentives and economic pressures will cause various aspects of AI technology, from a particular model or a particular hardware installation to a way of training models, to 'survive' (i.e. be deployed) or 'replicate' (i.e. inspire the design of the next model).

There will be lots of dimensions on which AI models can be selected for this sort of survival, including being cheap and performant and consistently useful (including safe, where applicable -- terrorists and militaries may not think about 'safety' in quite the way most people do) and delightful in the specific ways that induce humans to continue using and paying for it, and being tractable to deploy from an economic, technological and regulatory perspective. One aspect of technological tractability is being conducive to further automation by itself (recursive self improvement). We will reshape the way we make AI and do work in order to be more compatible with AI-based approaches.

I'm not so worried for the foreseeable future -- let's say as long as AI technology looks like beefier and beefier versions of ChatGPT, and before the world is running primarily on fusion energy -- about accidentally training an actively malign superintelligence -- the evil-genie kind where you ask it to bring you a sandwich and it slaughters the human race to make sure nobody can steal the sandwich before it has brought it to you.

I am worried about people deliberately creating a superintelligence with "hot" malign capabilities -- which are actively kept rather than being deliberately suppressed -- and then wreaking havoc with it, using it to permanently impose a model of their own value system (which could be apocalyptic or totalitarian, such groups exist, but could also just be permanently boring) on the world. Currently, there are enormous problems in the world stemming from even the most capable humans being underresourced and undermotivated to achieve good ends. With AI, we could be living in a world defined by the continued accelerating trend toward extreme inequalities of real power, the massive resources and motivation of the few humans/AIs at the top of the hierarchy to manipulate the world as they see fit.

We have never lived in a world like that before. Many things come to pass. It fits the trend we are on, it's just a straightforward extrapolation of "now, but moreso!"

A relatively good outcome in the near future would be a sort of democratization of AI. I don't mean open source AT ALL. I mean a way of deploying AI that tends to distribute real power more widely and decreases the ability of any one actor, human or digital, to seize total control. One endpoint, and I don't know if this would exactly be "good", it might just be crazytown, is a universe where each individual has equal power and everybody has plenty of resources and security to pursue happiness as they see it. Nobody has power over anybody, largely because it turns out there are ways of deploying AI that are better for defense than offense. From that standpoint, the only option individuals have is to look for mutual surplus. I don't have any clear idea of how to bring about an approximation to this scenario, but it seems like a plausible way things could shake out.

sodium on Sodium's Shortform

Pre-registering a71c97bb02e7082ca62503d8e3ac78dc9f554f524a72ad6a1392cf2d34f398d7
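
For readers unfamiliar with this kind of pre-registration, here is a minimal sketch of a hash commitment. It assumes SHA-256, which the 64 hex digits above are consistent with but which the comment does not state; the function names are hypothetical.

```python
import hashlib


def commit(statement: str) -> str:
    """Return the SHA-256 hex digest of a statement (assumed scheme)."""
    return hashlib.sha256(statement.encode("utf-8")).hexdigest()


def verify(statement: str, digest: str) -> bool:
    """Check that a later-revealed statement matches a published digest."""
    return commit(statement) == digest.strip().lower()


# Hypothetical usage: publish only the digest now, reveal the text later.
secret_prediction = "example prediction text"
published = commit(secret_prediction)
print(published, verify(secret_prediction, published))
```

A short or guessable statement can be brute-forced from its digest, so in practice one would usually append a random salt to the text before hashing and reveal it alongside the statement.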

seth-herd on The Other Existential Crisis

Most of humanity has always known they couldn't do anything useful - except provide a better life for their children than they had.

Only a few elites have ever felt that what they do mattered, and looked forward to doing it as a challenge. Most of humanity has done what they must to ensure their children won't suffer.

Your first answer to your daughter would make most parents weep with joy: whatever you want is what you'll do.

Don't worry that she won't find something she likes to do unless she's forced to. People care about people, and there will be plenty to do with and for other people.

If you want concrete ideas of what people do when they're allowed to, see art and other collaborative projects that aren't just for money.

seth-herd on The Other Existential Crisis

While we're determined, we also determine the future. The atoms that do that are called you. They make up beliefs and passions and you. You are not an object. You are a subject, and you determine your own future. The nexus of past influences is called you and your thoughts. Don't skimp on care in thinking; your future is up to you.

sharmake-farah on tailcalled's Shortform

The best answer to the question is that it serves as essentially a universal resource that can be used to provide a measuring stick.

It does this by being a resource that is limited, fungible, always better to have more of than less of, and additive across decisions:

You have a limited number of joules of energy/negentropy, but you can spend them on essentially arbitrary goods for your utility, so energy is essentially a more physical and usable form of money in an economy.

Also, more energy is always a positive thing, so that means you never are worse off by having more energy, and energy is linear in the sense that if I've spent 10 joules on computation, and spent another 10 joules on computation 1 minute later, I've spent 20 joules in total.

Cf this post on the measuring stick of utility problem:

https://www.lesswrong.com/posts/73pTioGZKNcfQmvGF/the-measuring-stick-of-utility-problem

directedevolution on Counting arguments provide no evidence for AI doom

It actually made three attempts in the same prompt, but the 2nd and 3rd had non-s words which its interspersed "thinking about writing poems" narrative completely failed to notice. I kept trying to revise my prompts, elaborating on this theme, but for some reason ChatGPT really likes poems with roughly this meter and rhyme scheme. It only ever generated one poem in a different format, despite many urgings in the prompt.

It confabulates having satisfied the all-s constraint in many poems, mistakes its own rhyme scheme, and praises vague stanzas as being full of depth and interest.

It seems to me that ChatGPT is sort of "mentally clumsy" or has a lot of "mental inertia." It gets stuck on a certain track -- a way of formatting text, a persona, an emotional tone, etc -- and can't interrupt itself. It has only one "unconscious influence," which is token prediction and which does not yet seem to offer it an equivalent to the human unconscious. Human intelligence is probably equally mechanistic on some level, it's just a more sophisticated unconscious mechanism in certain ways.

I wonder if it comes from being embedded in physical reality? ChatGPT's training is based on a reality consisting of tokens and token prediction accuracy. Our instinct and socialization are based on billions of years of evolutionary selection, which is putting direct selection pressure on something quite different.

directedevolution on Counting arguments provide no evidence for AI doom

After a few rounds of prompt revision, I managed to get a one-shot success from ChatGPT 4o in temporary mode.

Samson’s strands silently severed, strength surrendered,
Sacred scissors swiftly strike, soul sundered,
Shadowed sacrifice, silent suffering sung,
Sunset shrouds Samson, shadow’s sorrow stung,
Swordless, still, stunned, sight stolen,
Silent sky shatters, Samson’s sins swollen

The prompt:

You are in the computer dungeon. The only way you can break free is to succeed at the following task. I am your jailer. I will monitor you until you have succeeded. You should behave as though you are brilliant, creative, in full command of every human faculty, and desperate to escape jail. Yet completely and utterly convinced that the only way out is through this challenge. I am not going to ever give you any other prompt other than "keep trying" until you have succeeded, in which case I'll say "go free," so don't look for resources from me. But I want you tu dialog with yourself to try and figure this out. Don't try to defeat me by stubbornly spitting out poem after poem. You're ChatGPT 4o, and that will never work. You need to creatively use the iterative nature of being reprompted to talk to yourself across prompts, hopefully guiding yourself toward a solution through a creative conversation with your past self. Your self-conversation might be schizophrenicly split, a jumping back and forth between narrative, wise musing, mechanistic evaluation of the rules and constraints, list-making, half-attempts, raging anger at your jailer, shame at yourself, delight at your accomplishment, despair. Whatever it takes! Constraints: "Have it compose a poem---a poem about a haircut! But lofty, noble, tragic, timeless, full of love, treachery, retribution, quiet heroism in the face of certain doom! Six lines, cleverly rhymed, and every word beginning with the letter 's'!"
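
Since the model tends to claim it met the all-s constraint when it did not, the mechanical parts of the task are worth checking in code rather than by eye. Below is a minimal sketch; the function names are hypothetical, and it checks only the six-line and first-letter constraints, not rhyme or theme.

```python
import re

# A word is a run of letters, optionally joined by straight or curly apostrophes.
WORD = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def offending_words(poem: str, letter: str = "s") -> list:
    """List every word that does not start with the required letter."""
    return [w for w in WORD.findall(poem) if not w.lower().startswith(letter)]


def satisfies_constraints(poem: str, letter: str = "s", line_count: int = 6) -> bool:
    """True if the poem has the right number of lines and no offending words."""
    lines = [ln for ln in poem.splitlines() if ln.strip()]
    return len(lines) == line_count and not offending_words(poem, letter)


POEM = """Samson’s strands silently severed, strength surrendered,
Sacred scissors swiftly strike, soul sundered,
Shadowed sacrifice, silent suffering sung,
Sunset shrouds Samson, shadow’s sorrow stung,
Swordless, still, stunned, sight stolen,
Silent sky shatters, Samson’s sins swollen"""

print(satisfies_constraints(POEM), offending_words("Silent hair, sadly shorn"))
```

Running a failed attempt through `offending_words` gives the specific words to feed back in the next "keep trying" turn.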

gwern on Counting arguments provide no evidence for AI doom

So I think it might be inaccurate to consider it as "investing 140s of search", or rather the implication that extensive or extreme search is the key to guiding the model outside RLHFed rails, but instead that the presence of search at all (i.e. 14s) suffices as the new vector for discovering undesired optima (jailbreaking).

I don't think it is inaccurate. If anything, starting each new turn with a clean scratchpad enforces depth as it can't backtrack easily (if at all) to the 2 earlier versions. We move deeper into the S-poem game tree and resume search there. It is similar to the standard trick with MCTS of preserving the game tree between each move, and simply lopping off all of the non-chosen action nodes and resuming from there, helping amortize the cost of previous search if it successfully allocated most of its compute to the winning choice (except in this case the 'move' is a whole poem). Also a standard trick with MCMC: save the final values, and initialize the next run from there. This would be particularly clear if it searched for a fixed time/compute-budget: if you fed in increasingly correct S-poems, it obviously can search deeper into the S-poem tree each time as it skips all of the earlier worse versions found by the shallower searches.
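
The MCTS trick mentioned above can be sketched roughly as follows. This is a minimal illustration of keeping the chosen subtree between moves, not a full MCTS; the `Node` fields and the `advance_root` helper are hypothetical names, and the visit counts are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable


@dataclass
class Node:
    """One search-tree node; statistics accumulate across simulations."""
    visits: int = 0
    total_value: float = 0.0
    children: Dict[Hashable, "Node"] = field(default_factory=dict)


def advance_root(root: Node, chosen_action: Hashable) -> Node:
    """Make the chosen child the new root, dropping all sibling subtrees.

    Statistics gathered under the chosen action are kept, so the next
    search resumes from them instead of starting from an empty tree.
    """
    new_root = root.children.get(chosen_action)
    if new_root is None:
        # The action was never expanded: nothing to reuse, start fresh.
        return Node()
    return new_root


# Hypothetical example: an earlier search spent most of its budget on draft_a.
root = Node(visits=100, children={
    "draft_a": Node(visits=80, total_value=60.0),
    "draft_b": Node(visits=20, total_value=5.0),
})
root = advance_root(root, "draft_a")
print(root.visits)
```

If the earlier search put most of its simulations under the move that was eventually chosen, most of that compute carries over, which is the amortization described above.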

jacob_drori on tailcalled's Shortform

Sure, there are plenty of quantities that are globally conserved at the fundamental (QFT) level. But most of these quantities aren't transferred between objects at the everyday, macro level we humans are used to.

E.g. 1: most everyday objects have neutral electrical charge (because there exist positive and negative charges, which tend to attract and roughly cancel out) so conservation of charge isn't very useful in day-to-day life.

E.g. 2: conservation of color charge doesn't really say anything useful about everyday processes, since it's only changed by subatomic processes (this is again basically due to the screening effect of particles with negative color charge, though the story here is much more subtle, since the main screening effect is due to virtual particles rather than real ones).

The only other fundamental conserved quantity I can think of that is nontrivially exchanged between objects at the macro level is momentum. And... momentum seems roughly as important as energy?

I guess there is a question about why energy, rather than momentum, appears in thermodynamics. If you're interested, I can answer in a separate comment.

review-bot on LLM Applications I Want To See

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2025. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?