LessWrong 2.0 Reader

Bellevue Library Meetup - Nov 23
Cedar (xida-ren) · 2024-11-09T23:05:02.452Z · comments (3)

"Alignment at Large": Bending the Arc of History Towards Life-Affirming Futures
welfvh · 2024-12-03T21:17:56.466Z · comments (0)

[link] How to Edit an Essay into a Solstice Speech?
Czynski (JacobKopczynski) · 2024-12-15T04:30:50.545Z · comments (1)

Using Narrative Prompting to Extract Policy Forecasts from LLMs
Max Ghenis (MaxGhenis) · 2024-11-05T04:37:52.004Z · comments (0)

[link] World Models I'm Currently Building
temporary · 2024-12-15T16:29:08.287Z · comments (1)

"Pick Two" AI Trilemma: Generality, Agency, Alignment.
Black Flag (robert-shala-1) · 2025-01-15T18:52:00.780Z · comments (0)

[link] OpenAI o1 + ChatGPT Pro release
anaguma · 2024-12-05T19:13:21.843Z · comments (0)

Why empiricists should believe in AI risk
Knight Lee (Max Lee) · 2024-12-11T03:51:17.979Z · comments (0)

A proposal for iterated interpretability with known-interpretable narrow AIs
Peter Berggren (peter-berggren) · 2025-01-11T14:43:05.423Z · comments (0)

Towards mutually assured cooperation
mikko (morrel) · 2024-12-22T20:46:21.965Z · comments (0)

Apply to be a mentor in SPAR!
agucova · 2024-11-05T21:32:45.797Z · comments (0)

Theories With Mentalistic Atoms Are As Validly Called Theories As Theories With Only Non-Mentalistic Atoms
Lorec · 2024-11-12T06:45:26.039Z · comments (5)

Reducing x-risk might be actively harmful
MountainPath · 2024-11-18T14:25:07.127Z · comments (5)

No, the Polymarket price does not mean we can immediately conclude what the probability of a bird flu pandemic is. We also need to know the interest rate!
Christopher King (christopher-king) · 2024-12-28T16:05:47.037Z · comments (8)

Agency overhang as a proxy for Sharp left turn
Eris (anton-zheltoukhov) · 2024-11-07T12:14:24.333Z · comments (0)

Scattered thoughts on what it means for an LLM to believe
TheManxLoiner · 2024-11-06T22:10:29.429Z · comments (4)

[link] Inescapably Value-Laden Experience—a Catchy Term I Made Up to Make Morality Rationalisable
James Stephen Brown (james-brown) · 2024-12-19T04:45:37.906Z · comments (0)

Project Adequate: Seeking Cofounders/Funders
Lorec · 2024-11-17T03:12:12.995Z · comments (7)

Germany-wide ACX Meetup
Fernand0 · 2024-11-17T10:08:54.584Z · comments (0)

Investing in Robust Safety Mechanisms is critical for reducing Systemic Risks
Tom DAVID (tom-david) · 2024-12-11T13:37:24.177Z · comments (3)

[question] Are there ways to artificially fix laziness?
Aidar (aidar-toktargazin) · 2024-12-08T18:26:26.433Z · answers+comments (2)

Transformers Explained (Again)
RohanS · 2024-10-22T04:06:33.646Z · comments (0)

[link] Predictions as Public Works Project — What Metaculus Is Building Next
ChristianWilliams · 2024-10-22T16:35:13.999Z · comments (0)

ARC-AGI is a genuine AGI test but o3 cheated :(
Knight Lee (Max Lee) · 2024-12-22T00:58:05.447Z · comments (6)

It is time to start war gaming for AGI
yanni kyriacos (yanni) · 2024-10-17T05:14:17.932Z · comments (1)

Levels of Thought: from Points to Fields
HNX · 2024-12-02T20:25:02.802Z · comments (2)

[question] Is there a known method to find others who came across the same potential infohazard without spoiling it to the public?
hive · 2024-10-17T10:47:05.099Z · answers+comments (6)

[question] Is OpenAI net negative for AI Safety?
Lysandre Terrisse · 2024-11-02T16:18:02.859Z · answers+comments (0)

[question] is there a big dictionary somewhere with all your jargon and acronyms and whatnot?
KvmanThinking (avery-liu) · 2024-10-17T11:30:50.937Z · answers+comments (7)

On AI Detectors Regarding College Applications
Kaustubh Kislay (kaustubh-kislay) · 2024-11-27T20:25:48.151Z · comments (2)

Dishbrain and implications.
RussellThor · 2024-12-29T10:42:43.912Z · comments (0)

Some Comments on Recent AI Safety Developments
testingthewaters · 2024-11-09T16:44:58.936Z · comments (0)

[link] Better antibodies by engineering targets, not engineering antibodies (Nabla Bio)
Abhishaike Mahajan (abhishaike-mahajan) · 2025-01-13T15:05:35.261Z · comments (0)

[link] Entropic strategy in Two Truths and a Lie
dkl9 · 2024-11-21T22:03:28.986Z · comments (2)

Towards a Clever Hans Test: Unmasking Sentience Biases in Chatbot Interactions
glykokalyx · 2024-11-10T22:34:58.956Z · comments (0)

(draft) Cyborg software should be open (?)
AtillaYasar (atillayasar) · 2024-11-01T07:24:51.966Z · comments (5)

[question] Noticing the World
EvolutionByDesign (bioluminescent-darkness) · 2024-11-04T16:41:44.696Z · answers+comments (1)

Linkpost: Look at the Water
J Bostock (Jemist) · 2024-12-30T19:49:04.107Z · comments (3)

Vision of a positive Singularity
RussellThor · 2024-12-23T02:19:35.050Z · comments (0)

[question] Has Anthropic checked if Claude fakes alignment for intended values too?
Maloew (maloew-valenar) · 2024-12-23T00:43:07.490Z · answers+comments (1)

Morality as Cooperation Part III: Failure Modes
DeLesley Hutchins (delesley-hutchins) · 2024-12-05T09:39:27.816Z · comments (0)

Good Fortune and Many Worlds
Jonah Wilberg (jrwilb@googlemail.com) · 2024-12-27T13:21:43.142Z · comments (0)

Grokking revisited: reverse engineering grokking modulo addition in LSTM
Nikita Khomich (nikitoskh) · 2024-12-16T18:48:43.533Z · comments (0)

Are SAE features from the Base Model still meaningful to LLaVA?
Shan23Chen (shan-chen) · 2024-12-05T19:24:34.727Z · comments (0)

[link] Expevolu, a laissez-faire approach to country creation
Fernando · 2024-12-05T19:29:24.011Z · comments (4)

More Growth, Melancholy, and MindCraft @3QD [revised and updated]
Bill Benzon (bill-benzon) · 2024-12-05T19:36:02.289Z · comments (0)

3. Improve Cooperation: Better Technologies
Allison Duettmann (allison-duettmann) · 2025-01-02T19:03:16.588Z · comments (2)

Ways to think about alignment
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2024-10-27T01:40:50.762Z · comments (0)

Activation Magnitudes Matter On Their Own: Insights from Language Model Distributional Analysis
Matt Levinson · 2025-01-10T06:53:02.228Z · comments (0)

[link] Can AI improve the current state of molecular simulation?
Abhishaike Mahajan (abhishaike-mahajan) · 2024-12-06T20:22:31.685Z · comments (0)

Recent comments

cole-wyeth on What is the most impressive game LLMs can play well?

This is a pretty strong update against LLMs for me. I would have expected them to perform okay against a random model given free access to the board state and the list of legal moves. I suspect I could probably win blind (and I am a serious player; certainly others can win multiple blind games at once), so this is not entirely a perception issue. On the other hand, o1 is certainly getting some traction, which often precedes steady improvement (based on the last couple of years). But... like, it's basically doing a super overpriced tree search. I'm guessing a tree search to depth 3 with a naive heuristic is already enough to beat a random player, so I'm not convinced that the LLM is lifting any weight here.
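The guess about a depth-3 tree search can be illustrated on a much smaller game. This is only a sketch under stated assumptions: it swaps in tic-tac-toe (not the game discussed in the post) so it stays standard-library-only, searches three plies with negamax, uses the naive heuristic of scoring every unfinished position as 0, and plays against a uniformly random opponent. All function names are hypothetical, and nothing here reflects how o1 or any other LLM actually searches.

```python
import random

# Assumption: tic-tac-toe stands in for the game discussed in the post.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def other(player):
    return "O" if player == "X" else "X"


def winner(board):
    """Return "X" or "O" if that side has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def legal_moves(board):
    return [i for i, cell in enumerate(board) if cell == " "]


def negamax(board, player, depth):
    """Value for `player`, who is to move: +1 win, -1 loss, 0 otherwise."""
    w = winner(board)
    if w is not None:
        return 1 if w == player else -1
    moves = legal_moves(board)
    if not moves or depth == 0:
        return 0  # naive heuristic: draws and unfinished positions both score 0
    best = -2
    for m in moves:
        board[m] = player
        best = max(best, -negamax(board, other(player), depth - 1))
        board[m] = " "  # undo the move
    return best


def search_move(board, player, depth=3):
    """Pick the move with the best depth-limited score (first one on ties)."""
    best_move, best_score = None, -2
    for m in legal_moves(board):
        board[m] = player
        score = -negamax(board, other(player), depth - 1)
        board[m] = " "
        if score > best_score:
            best_move, best_score = m, score
    return best_move


def play_game(rng, search_player):
    """Play one game with X moving first; return the winner, or None for a draw."""
    board, player = [" "] * 9, "X"
    while winner(board) is None and legal_moves(board):
        if player == search_player:
            move = search_move(board, player)
        else:
            move = rng.choice(legal_moves(board))
        board[move] = player
        player = other(player)
    return winner(board)


def tally(n_games=200, seed=0):
    """Alternate sides between games; count results from the search player's view."""
    rng = random.Random(seed)
    results = {"search": 0, "random": 0, "draw": 0}
    for i in range(n_games):
        search_player = "X" if i % 2 == 0 else "O"
        w = play_game(rng, search_player)
        if w is None:
            results["draw"] += 1
        elif w == search_player:
            results["search"] += 1
        else:
            results["random"] += 1
    return results


if __name__ == "__main__":
    print(tally())
```

Calling `tally()` plays 200 seeded games with the search player alternating sides. Even this shallow search takes immediate wins and blocks immediate threats, which is usually enough to beat a random opponent far more often than it loses; exact counts depend on the seed.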

milosal on Introducing the WeirdML Benchmark

Perhaps it would be worth getting in touch with METR to discuss findings from this work. They might like to collaborate with you, as well.

havard-tveit-ihle on Introducing the WeirdML Benchmark

Thank you!

I've been working on the automated pipeline as a part-time project for about two months, probably equivalent to 2-4 full-time weeks of work.

One run for one model and one task typically takes perhaps 5-15 minutes, but it can take up to about an hour (if the models use their 10-minute compute budget efficiently, which they tend not to do).

Total API costs for the project are probably below $200 (not counting the credits used on Google's free tier). Most of the cost is for running o1-mini and o1-preview (even though o1-preview only went through a third of the runs compared to the other models). o1-preview costs about $2 for each run on each task. For compute I'm using hardware we have locally with my employer, so I have not tracked what the equivalent cost of renting it would be, but I guess it would be of the same order of magnitude as the API costs, or a few times larger.

I expect the API costs to dominate going forward, though, if we want to run o3 models etc. through the eval.

steve2152 on What Is The Alignment Problem?

“Is scratching your nose right now something you desire?” Yes. “Is scratching your nose right now something you value?” Not really, no. But I claim that the Value Reinforcement Learning framework would assign a positive score to the idea of scratching my nose when it’s itchy. Otherwise, nobody would scratch their nose.

I desire peace and justice, but I also value peace and justice, so that’s not a good way to distinguish them.

(I suspect that you took my definition of "values" to be nearly synonymous with rewards or immediately anticipated rewards, which it very much is not; projected-upstream-generators-of-rewards are a quite different beast from rewards themselves, especially as we push further upstream.)

No, that’s not what I think. I think your definition points to whether things are motivating versus demotivating all-things-considered, including both immediate plans and long-term plans. And I want to call that desires. Desires can be long-term—e.g. “being a dad someday is something I very much desire”.

I think “values”, as people use the term in everyday life, tends to be something more specific, where not only is the thing motivating, but it’s also motivating when you think about it in a self-reflective way. A.k.a. “X is motivating” AND “the-idea-of-myself-doing-X is motivating”. If I’m struggling to get out of bed, because I’m going to be late for work, then the feeling of my head remaining on the pillow is motivating, but the self-reflective idea of myself being in bed is demotivating. Consequently, I might describe the soft feeling of the pillow on my head as something I desire, but not something I value.

(I talk about this in §8.4.2–8.5 here, but that might be pretty hard to follow out of context.)

vladimir_nesov on Is AI Hitting a Wall or Moving Faster Than Ever?

There is enough natural text data until 2026-2028, as I describe in the Peak Data section of the linked post. It's not very good data, but with 2,500x the raw compute of the original GPT-4 (and possibly 10,000x-25,000x in effective compute due to algorithmic improvement in pretraining), that's a lot of headroom that doesn't depend on inventing new things (such as synthetic data suitable for improving general intelligence through pretraining the way natural text data is).

Insufficient data could in principle be an issue with making good use of 5e28 FLOPs, but actually getting 5e28 FLOPs by 2028 (from a single training system) only requires funding. The decisions about this don't need to be made based on AIs that exist today; they'll be made based on AIs that exist in 2026-2027, trained on 1 GW training systems being built this year. With o3-like post-training, the utility and impressiveness of an LLM improve, so the chances of getting that project funded improve (compared to the absence of such a technique).
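The figures in this comment can be cross-checked with a little arithmetic. The sketch below assumes (the comment does not say so explicitly) that the 2,500x raw-compute figure and the 5e28 FLOPs figure describe the same 2028-scale training run. Under that assumption it backs out the implied compute of the original GPT-4 and the algorithmic-efficiency multiplier implied by the 10,000x-25,000x effective-compute range. The function name is hypothetical.

```python
def implied_figures(target_flops=5e28, raw_multiplier=2_500,
                    effective_range=(10_000, 25_000)):
    """Back out the quantities implied by the comment's numbers.

    Assumption: `target_flops` is `raw_multiplier` times the original GPT-4's
    training compute, and the effective-compute range refers to the same run.
    """
    baseline_flops = target_flops / raw_multiplier  # implied original-GPT-4 compute
    low, high = effective_range
    # Effective compute = raw compute x algorithmic gain, so gain = effective / raw.
    algorithmic_gain = (low / raw_multiplier, high / raw_multiplier)
    return baseline_flops, algorithmic_gain


baseline, gain = implied_figures()
print(f"implied GPT-4 compute: {baseline:.1e} FLOPs")  # 2.0e+25
print(f"implied algorithmic gain: {gain[0]:.0f}x to {gain[1]:.0f}x")  # 4x to 10x
```

Under that assumption, the numbers imply an original GPT-4 run of about 2e25 FLOPs and a 4x to 10x gain from algorithmic improvements in pretraining; if the two figures refer to different runs, this back-calculation does not hold.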

havard-tveit-ihle on Introducing the WeirdML Benchmark

Thank you!

It would be really great to have human baselines, but they are very hard to get in practice. A human would need several hours to do one of these tasks.

I don't really have any funding for this project, but I might find someone who wants to do one task for fun, or make a best effort myself on a fresh task when I make one.

What we would really want is to have several top researchers/ML engineers do it, and I know that METR is working on that, so that is probably the best source we have for a realistic comparison at the moment.

johnswentworth on What Is The Alignment Problem?

Seems false, though that specific experiment is not super cruxy. One central line of evidence: desires tend to be more myopic, values longer term or "deeper". That's exactly the sort of thing you'd expect if "desires" is mostly about immediate reward signals (or anticipated reward signals), whereas "values" is more about the (projected) upstream generators of those signals, potentially projected quite a ways back upstream.

(I suspect that you took my definition of "values" to be nearly synonymous with rewards or immediately anticipated rewards, which it very much is not; projected-upstream-generators-of-rewards are a quite different beast from rewards themselves, especially as we push further upstream.)

davidbeniaguev on Applying traditional economic thinking to AGI: a trilemma

Adding the time dimension solves the issue. The price of "AI from two years ago" will be cheap, and the price of "state of the art AI" will be high. You already see this happening today, and it is in many ways the economic history of technology thus far anyway.

The situation should result in a large percentage increase in the world GDP growth rate compared to the current rate, but it's hard to know exactly how large, as it depends on various bottlenecks that might be hard to foresee right now.

vladimir_nesov on Vladimir_Nesov's Shortform

A reflectively stable agent prefers to preserve some property of itself. This doesn't in general prevent it from being able to self-improve, in the same way that unchanging laws of physics don't prevent presence of self-improving agents in the world.

The content of the world can keep changing under the unchanging laws of how it changes, and similarly a reflectively stable agent (against safety properties) has content (such as beliefs) that keeps changing, in principle enabling unfettered self-improvement. Mesa-agents existing in the form of the content of the outer agent's cognition don't even need to have its safety properties. This is one framing for the way people might live within a superintelligence.

lawrencec on Introducing the WeirdML Benchmark

This is really impressive -- could I ask how long this project took, how long each eval takes to run on average, and what you spent on compute/API credits?

(Also, I found the preliminary BoK vs 5-iteration results particularly interesting, especially the speculation on reasoning models.)