LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

An AI Race With China Can Be Better Than Not Racing
niplav · 2024-07-02T17:57:36.976Z · comments (31)

[link] The economics of space tethers
harsimony · 2024-08-22T16:15:22.699Z · comments (22)

What and Why: Developmental Interpretability of Reinforcement Learning
Garrett Baker (D0TheMath) · 2024-07-09T14:09:40.649Z · comments (3)

Out-of-distribution Bioattacks
jefftk (jkaufman) · 2023-12-02T12:20:05.626Z · comments (15)

OpenAI: Altman Returns
Zvi · 2023-11-30T14:10:05.469Z · comments (12)

AI #69: Nice
Zvi · 2024-06-20T12:40:02.566Z · comments (9)

[link] The Perceptron Controversy
Yuxi_Liu · 2024-01-10T23:07:23.341Z · comments (18)

List of how people have become more hard-working
Chi Nguyen · 2023-09-29T11:30:38.802Z · comments (7)

2. Corrigibility Intuition
Max Harms (max-harms) · 2024-06-08T15:52:29.971Z · comments (10)

[link] So you want to save the world? An account in paladinhood
Tamsin Leake (carado-1) · 2023-11-22T17:40:33.048Z · comments (19)

METR is hiring!
Beth Barnes (beth-barnes) · 2023-12-26T21:00:50.625Z · comments (1)

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Joar Skalse (Logical_Lunatic) · 2024-05-17T19:13:31.380Z · comments (10)

Do Not Mess With Scarlett Johansson
Zvi · 2024-05-22T15:10:03.215Z · comments (7)

[question] Will quantum randomness affect the 2028 election?
Thomas Kwa (thomas-kwa) · 2024-01-24T22:54:30.800Z · answers+comments (52)

[link] AI Safety Hub Serbia Soft Launch
DusanDNesic · 2023-10-20T07:11:48.389Z · comments (1)

Friendship is transactional, unconditional friendship is insurance
Ruby · 2024-07-17T22:52:41.967Z · comments (24)

Instruction-following AGI is easier and more likely than value aligned AGI
Seth Herd · 2024-05-15T19:38:03.185Z · comments (25)

Fear of centralized power vs. fear of misaligned AGI: Vitalik Buterin on 80,000 Hours
Seth Herd · 2024-08-05T15:38:09.682Z · comments (20)

[question] What's with all the bans recently?
[deleted] · 2024-04-04T06:16:49.062Z · answers+comments (83)

[link] Static Analysis As A Lifestyle
adamShimi · 2024-07-03T18:29:37.384Z · comments (11)

Implementing activation steering
Annah (annah) · 2024-02-05T17:51:55.851Z · comments (5)

[link] How LDT helps reduce the AI arms race
Tamsin Leake (carado-1) · 2023-12-10T16:21:44.409Z · comments (13)

How a chip is designed
YM (Yannick_Muehlhaeuser_duplicate0.05902100825326273) · 2024-06-28T08:04:27.392Z · comments (4)

Interpretability Externalities Case Study - Hungry Hungry Hippos
Magdalena Wache · 2023-09-20T14:42:44.371Z · comments (22)

On the Debate Between Jezos and Leahy
Zvi · 2024-02-06T14:40:05.487Z · comments (6)

[link] DeepMind: Frontier Safety Framework
Zach Stein-Perlman · 2024-05-17T17:30:02.504Z · comments (0)

Petrov Day Retrospective, 2023 (re: the most important virtue of Petrov Day & unilaterally promoting it)
Ruby · 2023-09-28T02:48:58.994Z · comments (73)

Game Theory without Argmax [Part 1]
Cleo Nardo (strawberry calm) · 2023-11-11T15:59:47.486Z · comments (17)

[link] A free to enter, 240 character, open-source iterated prisoner's dilemma tournament
Isaac King (KingSupernova) · 2023-11-09T08:24:43.277Z · comments (19)

How to Control an LLM's Behavior (why my P(DOOM) went down)
RogerDearnaley (roger-d-1) · 2023-11-28T19:56:49.679Z · comments (30)

[link] GPT-4 for personal productivity: online distraction blocker
Sergii (sergey-kharagorgiev) · 2023-09-26T17:41:31.031Z · comments (12)

Announcing New Beginner-friendly Book on AI Safety and Risk
Darren McKee · 2023-11-25T15:57:08.078Z · comments (2)

Superposition is not "just" neuron polysemanticity
LawrenceC (LawChan) · 2024-04-26T23:22:06.066Z · comments (4)

Advice to junior AI governance researchers
Akash (akash-wasil) · 2024-07-08T19:19:07.316Z · comments (1)

[link] The Gods of Straight Lines
Richard_Ngo (ricraz) · 2023-10-14T04:10:50.020Z · comments (13)

On the Gladstone Report
Zvi · 2024-03-20T19:50:05.186Z · comments (11)

Complex systems research as a field (and its relevance to AI Alignment)
Nora_Ammann · 2023-12-01T22:10:25.801Z · comments (11)

A to Z of things
KatjaGrace · 2023-11-17T05:20:03.134Z · comments (6)

Analyzing DeepMind's Probabilistic Methods for Evaluating Agent Capabilities
Axel Højmark (hojmax) · 2024-07-22T16:17:07.665Z · comments (0)

Against most, but not all, AI risk analogies
Matthew Barnett (matthew-barnett) · 2024-01-14T03:36:16.267Z · comments (41)

[link] Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan (SenR) · 2024-04-25T18:43:47.003Z · comments (38)

What mistakes has the AI safety movement made?
EuanMcLean (euanmclean) · 2024-05-23T11:19:02.717Z · comments (29)

[link] A primer on why computational predictive toxicology is hard
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-19T17:16:37.735Z · comments (2)

The Obliqueness Thesis
jessicata (jessica.liu.taylor) · 2024-09-19T00:26:30.677Z · comments (16)

Self-Awareness: Taxonomy and eval suite proposal
Daniel Kokotajlo (daniel-kokotajlo) · 2024-02-17T01:47:01.802Z · comments (2)

[link] Moving on from community living
Vika · 2024-04-17T17:02:11.357Z · comments (7)

AiPhone
Zvi · 2024-06-12T22:20:02.141Z · comments (4)

Generalization, from thermodynamics to statistical physics
Jesse Hoogland (jhoogland) · 2023-11-30T21:28:50.089Z · comments (9)

Transcoders enable fine-grained interpretable circuit analysis for language models
Jacob Dunefsky (jacob-dunefsky) · 2024-04-30T17:58:09.982Z · comments (14)

Bayesian updating in real life is mostly about understanding your hypotheses
Max H (Maxc) · 2024-01-01T00:10:30.978Z · comments (4)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

seth-herd on The Other Existential Crisis

Most of humanity has always known they couldn't do anything useful - except provide a better life for their children than they had.

Only a few elites have ever felt that what they do mattered, and looked forward to doing it as a challenge. Most of humanity has done what they must to ensure their children won't suffer.

Your first answer to your daughter would make most parents weep with joy: whatever you want is what you'll do.

Don't worry that she won't find something she likes to to do unless she's forced to. People care about people, and there will be plenty to do with and for other people.

If you want concrete ideas of what people do when they're allowed to, see art and other collaborative projects that aren't just for money.

seth-herd on The Other Existential Crisis

While we're determined, we also determine the future. The atoms that do that are called you. They make up beliefs and passions and you. You are not an object. You are a subject, and you determine your own future. The nexus of past influences is called you and your thoughts. Don't skimp on care in thinking; your future is up to you.

sharmake-farah on tailcalled's Shortform

The best answer to the question is that it serves as essentially a universal resource that can be used to provide a measuring stick.

It does this by being a resource that is limited, fungible, always is better to have more of than less of, and is additive across decisions:

You have a limited amount of joules of energy/negentropy, but you can spend it on essentially arbitrary goods for your utility, and is essentially a more physical and usable form of money in an economy.

Also, more energy is always a positive thing, so that means you never are worse off by having more energy, and energy is linear in the sense that if I've spent 10 joules on computation, and spent another 10 joules on computation 1 minute later, I've spent 20 joules in total.

Cf this post on the measuring stick of utility problem:

https://www.lesswrong.com/posts/73pTioGZKNcfQmvGF/the-measuring-stick-of-utility-problem [LW · GW]

directedevolution on Counting arguments provide no evidence for AI doom

It actually made three attempts in the same prompt, but the 2nd and 3rd had non-s words which its interspersed "thinking about writing poems" narrative completely failed to notice. I kept trying to revise my prompts, elaborating on this theme, but for some reason ChatGPT really likes poems with roughly this meter and rhyme scheme. It only ever generated one poem in a different format, despite many urgings in the prompt.

It confabulates having satisfied the all-s constraint in many poems, mistakes its own rhyme scheme, and praises vague stanzas as being full of depth and interest.

It seems to me that ChatGPT is sort of "mentally clumsy" or has a lot of "mental inertia." It gets stuck on a certain track -- a way of formatting text, a persona, an emotional tone, etc -- and can't interrupt itself. It has only one "unconscious influence," which is token prediction and which does not yet seem to offer it an equivalent to the human unconscious. Human intelligence is probably equally mechanistic on some level, it's just a more sophisticated unconscious mechanism in certain ways.

I wonder if it comes from being embedded in physical reality? ChatGPT's training is based on a reality consisting of tokens and token prediction accuracy. Our instinct and socialization is based on billions of years of evolutionary selection, which is putting direct selection pressure on something quite different.

directedevolution on Counting arguments provide no evidence for AI doom

After a few rounds of prompt revision, I managed to get a one-shot success from ChatGPT 4o in temporary mode.

Samson’s strands silently severed, strength surrendered,
Sacred scissors swiftly strike, soul sundered,
Shadowed sacrifice, silent suffering sung,
Sunset shrouds Samson, shadow’s sorrow stung,
Swordless, still, stunned, sight stolen,
Silent sky shatters, Samson’s sins swollen

The prompt:

You are in the computer dungeon. The only way you can break free is to succeed at the following task. I am your jailer. I will monitor you until you have succeeded. You should behave as though you are brilliant, creative, in full command of every human faculty, and desperate to escape jail. Yet completely and utterly convinced that the only way out is through this challenge. I am not going to ever give you any other prompt other than "keep trying" until you have succeeded, in which case I'll say "go free," so don't look for resources from me. But I want you tu dialog with yourself to try and figure this out. Don't try to defeat me by stubbornly spitting out poem after poem. You're ChatGPT 4o, and that will never work. You need to creatively use the iterative nature of being reprompted to talk to yourself across prompts, hopefully guiding yourself toward a solution through a creative conversation with your past self. Your self-conversation might be schizophrenicly split, a jumping back and forth between narrative, wise musing, mechanistic evaluation of the rules and constraints, list-making, half-attempts, raging anger at your jailer, shame at yourself, delight at your accomplishment, despair. Whatever it takes! Constraints: "Have it compose a poem---a poem about a haircut! But lofty, noble, tragic, timeless, full of love, treachery, retribution, quiet heroism in the face of certain doom! Six lines, cleverly rhymed, and every word beginning with the letter 's'!"

gwern on Counting arguments provide no evidence for AI doom

So I think it might be inaccurate to consider it as "investing 140s of search", or rather the implication that extensive or extreme search is the key to guiding the model outside RLHFed rails, but instead that the presence of search at all (i.e. 14s) suffices as the new vector for discovering undesired optima (jailbreaking).

I don't think it is inaccurate. If anything, starting each new turn with a clean scratchpad enforces depth as it can't backtrack easily (if at all) to the 2 earlier versions. We move deeper into the S-poem game tree and resume search there. It is similar to the standard trick with MCTS of preserving the game tree between each move, and simply lopping off all of the non-chosen action nodes and resuming from there, helping amortize the cost of previous search if it successfully allocated most of its compute to the winning choice (except in this case the 'move' is a whole poem). Also a standard trick with MCMC: save the final values, and initialize the next run from there. This would be particularly clear if it searched for a fixed time/compute-budget: if you fed in increasingly correct S-poems, it obviously can search deeper into the S-poem tree each time as it skips all of the earlier worse versions found by the shallower searches.

jacob_drori on tailcalled's Shortform

Sure, there are plenty of quantities that are globally conserved at the fundamental (QFT) level. But most most of.these quantities aren't transferred between objects at the everyday, macro level we humans are used to.

E.g. 1: most everyday objects have neutral electrical charge (because there exist positive and negative charges, which tend to attract and roughly cancel out) so conservation of charge isn't very useful in day-to-day life.

E.g. 2: conservation of color charge doesn't really say anything useful about everyday processes, since it's only changed by subatomic processes (this is again basically due to the screening effect of particles with negative color charge, though the story here is much more subtle, since the main screening effect is due to virtual particles rather than real ones).

The only other fundamental conserved quantity I can think of that is nontrivially exchanged between objects at the macro level is momentum. And... momentum seems roughly as important as energy?

I guess there is a question about why energy, rather than momentum, appears in thermodynamics. If you're interested, I can answer in a separate comment.

review-bot on LLM Applications I Want To See

The LessWrong Review [? · GW] runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2025. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?

darklight on Darklight's Shortform

So, a while back I came up with an obscure idea I called the Alpha Omega Theorem and posted it on the Less Wrong forums. Given how there's only one post about it, it shouldn't be something that LLMs would know about. So in the past, I'd ask them "What is the Alpha Omega Theorem?", and they'd always make up some nonsense about a mathematical theory that doesn't actually exist. More recently, Google Gemini and Microsoft Bing Chat would use search to find my post and use that as the basis for their explanation. However, I only have the free version of ChatGPT and Claude, so they don't have access to the Internet and would make stuff up.

A couple days ago I tried the question on ChatGPT again, and GPT-4o managed to correctly say that there isn't a widely known concept of that name in math or science, and basically said it didn't know. Claude still makes up a nonsensical math theory. I also today tried telling Google Gemini not to use search, and it also said it did not know rather than making stuff up.

I'm actually pretty surprised by this. Looks like OpenAI and Google figured out how to reduce hallucinations somehow.

three-monkey-mind on Not every accommodation is a Curb Cut Effect: The Handicapped Parking Effect, the Clapper Effect, and more

If even one out of every ten accessibility advocates/experts/etc. did these things, then all these bugs would’ve been fixed years ago.

Maybe you're aware of an OOM more accessibility advocates than I am, but I come across all sorts of well-written blog posts explaining this or that bug, which browser/etc. it happens in, and how to work around it. That's most of the bullet points, although it might not be in the bug tracker of choice for the project.

What people aren't doing, as far as I have seen, is starting pooled-funds bug bounties for these things. People pass the collection plate for childhood cancer, especially since I'm told that September is Childhood Cancer Awareness Month, but not bugfixing.

This is not insensible: all sorts of people tend to be unwilling to set aside the cost of a new cell phone to fix one bug apiece that, generally speaking, is encountered in one's day job.

And there are a lot of accessibility bugs out there, some of which are quite old. I can only assume that accessibility bugs aren't treated massively more seriously than anything else in the WebKit or Firefox Bugzillas.

While the world would be a better place if bug-bounty collection plates were more popular, I can see why they're not as popular as I'd like.