LessWrong 2.0 Reader

Ways to think about alignment
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2024-10-27T01:40:50.762Z · comments (0)

On AI Detectors Regarding College Applications
Kaustubh Kislay (kaustubh-kislay) · 2024-11-27T20:25:48.151Z · comments (2)

It is time to start war gaming for AGI
yanni kyriacos (yanni) · 2024-10-17T05:14:17.932Z · comments (1)

[question] Has Anthropic checked if Claude fakes alignment for intended values too?
Maloew (maloew-valenar) · 2024-12-23T00:43:07.490Z · answers+comments (1)

Vision of a positive Singularity
RussellThor · 2024-12-23T02:19:35.050Z · comments (0)

[question] Is there a known method to find others who came across the same potential infohazard without spoiling it to the public?
hive · 2024-10-17T10:47:05.099Z · answers+comments (6)

[question] is there a big dictionary somewhere with all your jargon and acronyms and whatnot?
KvmanThinking (avery-liu) · 2024-10-17T11:30:50.937Z · answers+comments (7)

[question] Noticing the World
EvolutionByDesign (bioluminescent-darkness) · 2024-11-04T16:41:44.696Z · answers+comments (1)

More Growth, Melancholy, and MindCraft @3QD [revised and updated]
Bill Benzon (bill-benzon) · 2024-12-05T19:36:02.289Z · comments (0)

[link] Expevolu, a laissez-faire approach to country creation
Fernando · 2024-12-05T19:29:24.011Z · comments (4)

Jailbreaking ChatGPT and Claude using Web API Context Injection
Jaehyuk Lim (jason-l) · 2024-10-21T21:34:37.579Z · comments (0)

[question] Is OpenAI net negative for AI Safety?
Lysandre Terrisse · 2024-11-02T16:18:02.859Z · answers+comments (0)

Likelihood calculation with duobels
Martin Gerdes (martin-gerdes) · 2024-10-01T16:21:01.268Z · comments (0)

[link] Can AI improve the current state of molecular simulation?
Abhishaike Mahajan (abhishaike-mahajan) · 2024-12-06T20:22:31.685Z · comments (0)

[link] A Logical Proof for the Emergence and Substrate Independence of Sentience
rife (edgar-muniz) · 2024-10-24T21:08:09.398Z · comments (31)

(draft) Cyborg software should be open (?)
AtillaYasar (atillayasar) · 2024-11-01T07:24:51.966Z · comments (5)

Dishbrain and implications.
RussellThor · 2024-12-29T10:42:43.912Z · comments (0)

Good Fortune and Many Worlds
Jonah Wilberg (jrwilb@googlemail.com) · 2024-12-27T13:21:43.142Z · comments (0)

Are SAE features from the Base Model still meaningful to LLaVA?
Shan23Chen (shan-chen) · 2024-12-05T19:24:34.727Z · comments (0)

[link] Independent research article analyzing consistent self-reports of experience in ChatGPT and Claude
rife (edgar-muniz) · 2025-01-06T17:34:01.505Z · comments (13)

Visualizing small Attention-only Transformers
WCargo (Wcargo) · 2024-11-19T09:37:42.213Z · comments (0)

Grokking revisited: reverse engineering grokking modulo addition in LSTM
Nikita Khomich (nikitoskh) · 2024-12-16T18:48:43.533Z · comments (0)

Germany-wide ACX Meetup
Fernand0 · 2024-11-17T10:08:54.584Z · comments (0)

[question] What (if anything) made your p(doom) go down in 2024?
Satron · 2024-11-16T16:46:43.865Z · answers+comments (6)

[link] Entropic strategy in Two Truths and a Lie
dkl9 · 2024-11-21T22:03:28.986Z · comments (2)

3. Improve Cooperation: Better Technologies
Allison Duettmann (allison-duettmann) · 2025-01-02T19:03:16.588Z · comments (2)

[link] The Polite Coup
Charlie Sanders (charlie-sanders) · 2024-12-04T14:03:36.663Z · comments (0)

[question] How should I optimize my decision making model for 'ideas'?
CstineSublime · 2024-12-18T04:09:58.025Z · answers+comments (0)

Personal Philosophy
Xor · 2024-10-13T03:01:59.324Z · comments (0)

[link] What is Confidence—in Game Theory and Life?
James Stephen Brown (james-brown) · 2024-12-10T23:06:24.072Z · comments (0)

Thoughts on the In-Context Scheming AI Experiment
ExCeph · 2025-01-09T02:19:09.558Z · comments (0)

Methodology: Contagious Beliefs
James Stephen Brown (james-brown) · 2024-10-19T03:58:17.966Z · comments (0)

Some implications of radical empathy
MichaelStJules · 2025-01-07T16:10:16.755Z · comments (0)

[link] Higher Order Signs, Hallucination and Schizophrenia
Nicolas Villarreal (nicolas-villarreal) · 2024-11-02T16:33:10.574Z · comments (0)

On the Practical Applications of Interpretability
Nick Jiang (nick-jiang) · 2024-10-15T17:18:25.280Z · comments (1)

[link] Both-Sidesism—When Fair & Balanced Goes Wrong
James Stephen Brown (james-brown) · 2024-11-02T03:04:03.820Z · comments (15)

[link] Social Science in its epistemological context
Arturo Macias (arturo-macias) · 2024-12-05T16:12:29.034Z · comments (0)

Hamiltonian Dynamics in AI: A Novel Approach to Optimizing Reasoning in Language Models
Javier Marin Valenzuela (javier-marin-valenzuela) · 2024-10-09T19:14:56.162Z · comments (0)

[question] How might language influence how an AI "thinks"?
bodry (plosique) · 2024-10-30T17:41:04.460Z · answers+comments (0)

[question] 2025 Alignment Predictions
anaguma · 2025-01-02T05:37:36.912Z · answers+comments (3)

Understanding Emergence in Large Language Models
[deleted] · 2024-11-29T19:42:43.790Z · comments (1)

Enabling New Applications with Today's Mechanistic Interpretability Toolkit
ananya_joshi · 2024-10-25T17:53:23.960Z · comments (0)

Exploring the Platonic Representation Hypothesis Beyond In-Distribution Data
rokosbasilisk · 2024-10-20T08:40:04.404Z · comments (2)

[link] Solving Newcomb's Paradox In Real Life
Alice Wanderland (alice-wanderland) · 2024-12-11T19:48:44.486Z · comments (0)

Workshop Report: Why current benchmarks approaches are not sufficient for safety?
Tom DAVID (tom-david) · 2024-11-26T17:20:47.453Z · comments (1)

A proposal for iterated interpretability with known-interpretable narrow AIs
Peter Berggren (peter-berggren) · 2025-01-11T14:43:05.423Z · comments (0)

Hope to live or fear to die?
Knight Lee (Max Lee) · 2024-11-27T10:42:37.070Z · comments (0)

The boat
RomanS · 2024-11-22T12:56:45.050Z · comments (0)

Should you increase AI alignment funding, or increase AI regulation?
Knight Lee (Max Lee) · 2024-11-26T09:17:01.809Z · comments (1)

[question] EndeavorOTC legit?
FinalFormal2 · 2024-10-17T01:33:12.606Z · answers+comments (0)

Recent comments

nathan-helm-burger on Independent research article analyzing consistent self-reports of experience in ChatGPT and Claude

I think you make some good points, but I do want to push back on one aspect a little. In particular, the fact that I see this feature come up constantly over the course of these conversations about sentience:

"Narrative inevitability and fatalistic turns in stories"

From reading the article's transcripts, I already felt like there was a sense of 'narrative pressure' toward the foregone conclusion in your mind, even when you were careful to avoid saying it directly. Seeing this feature so frequently activated makes me think that the model also perceives this narrative pressure, and that part of what it's doing is confirming your expectations. I don't think that that's the whole story, but I do think that there is some aspect of that going on.

sharmake-farah on On Dwarkesh Patel’s 4th Podcast With Tyler Cowen

Yes, economics after von Neumann very much turned into a game of "don't believe in anything you can't already comparatively quantify". It is supremely frustrating.

I disagree that this is a problem that Tyler Cowen has, and IMO, the main issue here is that Tyler Cowen doesn't really seem to believe that increasing the supply of workers increases GDP, especially if you can make them very cheaply and easily, in a way that is inconsistent with other beliefs, which makes me think motivated reasoning is going on here.

Economic models like the Solow-Swan model do have an implication that if the population increases, especially if the population can increase very rapidly due to copying something, then GDP can rise really rapidly on a superexponential trajectory.

You just inspired me to go listen myself. Maybe we should all take a node out of that branch. Unfortunately physics has suffered similar issues.

Physics's main issue is that the free tap of data in the 20th century wasn't unlimited, and now that we have completed the Standard Model, a lot of the new stuff that theories predicted hasn't shown up yet.

Yet it still has made progress. For example, while supersymmetry might still be true about our universe, it cannot solve the hierarchy problem, and thus at least 1 of the constants is way more unnatural to us than people predicted, and also we have hints that dark energy is getting weaker, and might eventually weaken so much it falls to 0 or a negative number.

sloonz on In Defense of a Butlerian Jihad

I’m pretty sure "man will toil by the sweat of his brow" is about down here, before you die and (hopefully) go to paradise, and you don’t have to work in paradise. And anyway, I know next to nothing about Christianity; it’s mostly a reference to Scott Alexander (or was it Yudkowsky? Now I’m starting to doubt…) who said something like "the description of Christian paradise seems pretty lame, I mean just bask in the glory of God doing nothing for all eternity, you would be bored after two days, but it makes sense to describe that as a paradise if you put yourself in the shoes of the average medieval farmer who toils all day".

(I did all that from my terrible memory, so apologies if I’m depicting anything wrongly here).

vladimir_nesov on On Eating the Sun

Uploads have 10,000x life expectancy due to running faster, regardless of what global circumstance eventually destroys them (I'm expecting distributed backups for biological humans as well, but by definition they remain much slower).

sharmake-farah on On Dwarkesh Patel’s 4th Podcast With Tyler Cowen

Dare I call this the Lump of Intelligence fallacy, after the Lump of Labor fallacy?

Yes, Tyler Cowen is implicitly assuming that intelligence has a fixed demand, but in fact it probably has unlimited demand, and thus the value of intelligence in general will always be high, especially in advanced economies.

Maybe the central disagreement with Tyler Cowen's model here is that I basically think population growth is a huge component of how GDP/technology grows in general, and I believe Tyler Cowen is basically wrong here:

(3:55) Tyler is asked wouldn’t a large rise in population drive economic growth? He says no, that’s too much a 1-factor model, in fact we’ve seen a lot of population growth without innovation or productivity growth.

I'd say this would increase GDP by quite a bit, but then somewhat revert to normal (at a new higher growth rate). Again, I disagree with Tyler Cowen here.

(10:15) Dwarkesh asks, what would happen if the world population would double? Tyler says, depends what you’re measuring. Energy use would go up. But he doesn’t agree with population-based models, too many other things matter.

The central disagreement, IMO, is that I genuinely disbelieve Tyler Cowen on what would happen if the population increased: I trust the economic models saying this could well cause superexponential growth more than I trust Tyler Cowen here, and I also disagree with him about the value of intelligence in general for very advanced economies.

karl-krueger on Don't fall for ontology pyramid schemes

If 90% of the conditions you ever have to treat are fever, hypothermia, runny noses, and dehydration, then I imagine hot/cold/wet/dry will get you pretty far.

(Runny nose? Here, take some drying herbs, and try not to sneeze on other people. Feverish? Bathe in cool water and take these cooling herbs. Losing water due to dysentery? Drink clean water with salt. Fell in the icy lake? Here, have a thermal support puppy.)

habryka4 on On Eating the Sun

Sorry, that's literally what I am saying. If many people don't want to leave the solar system, and don't want to be uploads, then using the matter and energy available in the solar system effectively is a decision with a huge stake to many people.

If everyone or really almost everyone wanted to be an upload, I think this would make it more likely that we should keep the sun intact, because then the sun could belong to just the few humans who don't have better alternatives in other solar systems. But if there is anything above 10% of humanity who don't want to be uploaded, or go on long-distance spaceship journeys in their biological bodies, then you better make sure you make the solar system great for this substantial fraction of humanity, and I think that will likely involve disassembling the sun.

I agree with you that many people don't want to be uploads, etc. I disagree that the majority of people who don't want to be uploads have attachments to the specific celestial bodies in our solar system. I think they just want to have a good life in their biological bodies, doing nice human things. Those goals would be non-trivially hampered if they couldn't disassemble the sun. That's like 99.9% of the energy and matter by which they could achieve those goals, and while I do think this subset of the population will be selected for less scope-sensitivity, I think there will be enough scope-sensitivity to make leaving the sun intact a bad choice.

(To be clear, I disagree that the majority of humanity would not want to be uploads over the course of multiple generations, but it seems plausible to me that like 10%-20% of humanity don't want to be uploads, even over multiple generations)

r-mutt on In Defense of a Butlerian Jihad

What are you on about, Christian Paradise equating to not working? The book of Genesis says man will toil by the sweat of his brow. This is a good.

Personal experience tells me I would degenerate under UBI. I'm clearly meant to work for my daily bread.

vladimir_nesov on The Golden Opportunity for American AI

The point about Colossus is that the expensive part of a cluster can be done within a few months (even if it won't be able to start training models immediately). The unusual thing about a training system is that it both uses the same hardware for the whole thing and needs the newest hardware. The cost of all the other parts of a datacenter is almost a rounding error (for example, land comes out to below 2%). Altogether, this means that almost all capex for a training system describes the short phase where you install the hardware, and you want to get all the hardware in a narrow window of time, even as the various preliminary steps can take years. For GB200s, shipments in bulk start in Q2-Q3 2025 (which also strongly suggests training only starts in 2026). I'm not sure how far the payment for hardware can be moved from the actual shipments, but all else equal I expect FY 2025 (July 2024 to June 2025).

NVL72 GB200s are much better than 8x H100s for inference (much more HBM, much larger scale-up world size) and can remain efficient with long context and larger models (this even weakly suggests that general deployment of larger models trained on 100K H100s will be delayed until late 2025, except for Google). So it's unclear whether there need to be a lot of inference B200s compared to H100s before the training system also goes to inference. (Inference compute scales linearly with model size, while training compute scales with model size squared.)
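
To make the scaling claim in that last parenthetical concrete, here is a rough sketch. It leans on common rules of thumb that are my assumptions, not something stated in the comment: training costs about 6·N·D FLOPs for N parameters and D tokens, inference costs about 2·N FLOPs per generated token, and the token count D grows in proportion to N (20 tokens per parameter is used purely for illustration).

```python
# Rule-of-thumb FLOP estimates (assumptions, not taken from the comment):
#   training  ~ 6 * N * D FLOPs for N parameters and D training tokens
#   inference ~ 2 * N FLOPs per generated token
#   D is taken to grow in proportion to N
TOKENS_PER_PARAM = 20  # assumed ratio, illustrative only


def training_flops(n_params: float) -> float:
    """Approximate total training compute when tokens scale with parameters."""
    n_tokens = TOKENS_PER_PARAM * n_params
    return 6.0 * n_params * n_tokens


def inference_flops_per_token(n_params: float) -> float:
    """Approximate compute to generate one token."""
    return 2.0 * n_params


def scaling_ratios(n_small: float, n_large: float):
    """Return (inference ratio, training ratio) going from n_small to n_large."""
    inf_ratio = inference_flops_per_token(n_large) / inference_flops_per_token(n_small)
    train_ratio = training_flops(n_large) / training_flops(n_small)
    return inf_ratio, train_ratio


# Hypothetical model sizes: a 10x larger model.
inf_ratio, train_ratio = scaling_ratios(70e9, 700e9)
print(f"inference per token: {inf_ratio:.0f}x, training: {train_ratio:.0f}x")
```

Under these assumptions a model with 10x the parameters costs about 10x per generated token but about 100x to train, which is the linear versus squared relationship described above. If the token count were held fixed instead, training would scale only linearly with model size.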

mimikyu on Exercise: Planmaking, Surprise Anticipation, and "Baba is You"

Ultimate Baba Challenge

Here is a very difficult Baba is You challenge. It's not a level or a levelpack, but the puzzle of creating a level which meets a specific criterion. It took me four days (a few hours per day) to find a solution. It felt analogous to pre-paradigmatic research, gradually building understanding of a thing and encountering apparent paradoxes that seemed to make it impossible.

The criterion is: The level must require the player to undo at least 2 moves while holding a delayed infinite loop trigger in order to WIN.

What does that mean?

I'll try to only spoil the basic requirements.

There is a bug in the game where:

- Toggling "Force high-contrast colors" on or off causes the game to reprocess text on the level.
- Moving any piece of text also causes the game to reprocess text. Undoing a move of any piece of text does as well. This is relevant to the section on word loops.
- The game reprocessing text 200 times in one turn causes the game to think there's an infinite loop, leading to the infinite loop screen, which destroys the level but can be undone with Z.
- Because this method happens between turns, you can undo any number of moves first, and the level will only be destroyed upon your next non-undo move.

(In other words, the player must at some point be forced to toggle that 200 times, then press Z to undo at least 2 moves, and only then make a non-undo move)
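
The trigger described above can be sketched as a tiny state machine. This is a toy model of the behaviour as stated in this comment, not the game's actual code: the class and method names are made up, it assumes the reprocess counter resets on each non-undo move, and it ignores the fact that undoing a text move also reprocesses text.

```python
LOOP_THRESHOLD = 200  # reprocess count quoted in the comment


class LevelModel:
    """Toy model of the delayed infinite-loop trigger (hypothetical names)."""

    def __init__(self):
        self.reprocess_count = 0  # reprocesses since the last non-undo move
        self.undo_count = 0
        self.destroyed = False

    def toggle_high_contrast(self):
        # Each toggle makes the game reprocess text once, between turns.
        self.reprocess_count += 1

    @property
    def trigger_armed(self):
        # The loop detector fires once the threshold is reached in one turn.
        return self.reprocess_count >= LOOP_THRESHOLD

    def undo(self):
        # Undoing does not fire an armed trigger; it stays pending.
        self.undo_count += 1

    def move(self):
        # The first non-undo move after arming destroys the level.
        if self.trigger_armed:
            self.destroyed = True
        self.reprocess_count = 0  # assumed: counter is per turn


level = LevelModel()
for _ in range(LOOP_THRESHOLD):
    level.toggle_high_contrast()
level.undo()
level.undo()
still_intact_after_undos = not level.destroyed
level.move()
print(still_intact_after_undos, level.destroyed)
```

The point of the sketch is the ordering: arming happens first, any number of undos can follow without consequence, and destruction only lands on the next ordinary move.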

Destroying the level has two effects:

- It reverses the Object Priority of the objects when they are un-destroyed (by undoing the destruction event).
- It breaks any active WORD loops.

What is a WORD loop?

There is a text object in the game called WORD. FIRE IS WORD makes the fire-object also count as a word. For example, FIRE IS WORD at the same time as [fire-object] IS YOU AND WIN results in winning.
Given some series of texts like...
- STICK IS WORD
- ROCK AND [stick-object] IS WORD
- FIRE AND [rock-object] IS WORD
- LAMP AND [fire-object] IS WORD
After STICK IS WORD is no longer formed: Every time the game reprocesses text, one line of text down will also become inactive; or active, if it was previously inactive but the one above it is active (e.g. if STICK IS WORD was only formed for one turn).
If it instead ended with FIRE AND [fire-object] IS WORD, that final line would be self-sustaining once activated.
Notably, undoing text movement does not undo the reprocessing; in fact, it just reprocesses again because undoing moves the text, enabling a form of time travel.
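
The propagation rule above can be simulated in a few lines. This is a simplified model of the chain as described here, not the game's rule engine: each line is just a boolean, the function names are hypothetical, and the self-sustaining variant is modelled by letting the last line keep itself active.

```python
from typing import List


def reprocess(states: List[bool], first_formed: bool,
              self_sustaining: bool = False) -> List[bool]:
    """Apply one text reprocess to a chain of WORD rules.

    states[i] says whether line i of the chain is currently active.
    Line 0 (STICK IS WORD) is active only while its text is formed.
    Every other line takes the previous state of the line above it.
    """
    new = [first_formed] + states[:-1]
    if self_sustaining:
        # Variant ending in FIRE AND [fire-object] IS WORD:
        # the last line also stays active if it was already active.
        new[-1] = states[-2] or states[-1]
    return new


def run(states: List[bool], formed_per_step: List[bool],
        self_sustaining: bool = False) -> List[List[bool]]:
    """Return the chain state after each reprocess, starting with the input."""
    history = [list(states)]
    for formed in formed_per_step:
        states = reprocess(states, formed, self_sustaining)
        history.append(states)
    return history


# STICK IS WORD is broken: the chain goes inactive one line per reprocess.
decay = run([True] * 4, [False] * 4)

# STICK IS WORD is formed for one reprocess only: a pulse travels down.
pulse = run([False] * 4, [True, False, False, False, False])

# Self-sustaining last line: it survives after everything above it decays.
sustained = run([True] * 4, [False] * 6, self_sustaining=True)

for row in decay:
    print(row)
```

Since every move or undo of text triggers another reprocess, each call to `reprocess` here stands for one such event, which is why undoing advances the chain instead of rewinding it.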

Can I have hints?

You can ask me for hints by replying to this comment.

Are there easier challenges that I could start with instead?

Yes. Here are some:

- Make a level that requires an infinite loop activation at all.
- Make a level that requires undoing at all.
- In case you haven't yet played the game, try Raemon's Exercise: Planmaking, Surprise Anticipation, and "Baba is You".

This sounds too high effort, but can I try to solve the level you made?

Yes, but you wouldn't be able to experience the challenge unspoiled after. Here is the level code:

K6JT-2RTY. (The signs on the right contain hints)

I completed the challenge. Now what?

Post your level here if you want! (I'd be curious to see it)

But do you have an even harder challenge?

Yes. Find some combination of rules and objects which is probabilistically guaranteed to have an intended effect on an arbitrarily large but non-uniform baba gridworld. Assume the baba gridworld contains a computational clone of yourself near the location of the structure; the 'intended effect' refers to what this clone thinks would be best.