LessWrong 2.0 Reader

[link] Intrinsic Power-Seeking: AI Might Seek Power for Power’s Sake
TurnTrout · 2024-11-19T18:36:20.721Z · comments (5)

[link] Forecasting: the way I think about it
Molly (hickman-santini) · 2024-05-09T00:49:01.768Z · comments (4)

Whiteboard Pen Magazines are Useful
Johannes C. Mayer (johannes-c-mayer) · 2024-07-12T17:15:33.200Z · comments (8)

[link] List of Collective Intelligence Projects
Chipmonk · 2024-07-02T14:10:41.789Z · comments (9)

Winners of the Essay competition on the Automation of Wisdom and Philosophy
owencb · 2024-10-28T17:10:04.272Z · comments (3)

[link] Progress Conference 2024: Toward Abundant Futures
jasoncrawford · 2024-06-26T15:39:45.267Z · comments (2)

When Are Results from Computational Complexity Not Too Coarse?
Dalcy (Darcy) · 2024-07-03T19:06:44.953Z · comments (7)

[link] Conflict in Posthuman Literature
Martín Soto (martinsq) · 2024-04-06T22:26:04.051Z · comments (1)

Beware unfinished bridges
Adam Zerner (adamzerner) · 2024-05-12T09:29:07.808Z · comments (9)

[Linkpost] Play with SAEs on Llama 3
Tom McGrath · 2024-09-25T22:35:44.824Z · comments (2)

International Scientific Report on the Safety of Advanced AI: Key Information
Aryeh Englander (alenglander) · 2024-05-18T01:45:10.194Z · comments (0)

Forget Everything (Statistical Mechanics Part 1)
J Bostock (Jemist) · 2024-04-22T13:33:35.446Z · comments (6)

[link] FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI
Tamay · 2024-11-14T06:13:22.042Z · comments (0)

Signaling with Small Orange Diamonds
jefftk (jkaufman) · 2024-11-07T20:20:08.026Z · comments (1)

[Interim research report] Evaluating the Goal-Directedness of Language Models
Rauno Arike (rauno-arike) · 2024-07-18T18:19:04.260Z · comments (4)

[link] Linear infra-Bayesian Bandits
Vanessa Kosoy (vanessa-kosoy) · 2024-05-10T06:41:09.206Z · comments (5)

Medical Roundup #3
Zvi · 2024-07-09T13:10:06.862Z · comments (4)

AI #98: World Ends With Six Word Story
Zvi · 2025-01-09T16:30:07.341Z · comments (2)

What happens next?
Logan Zoellner (logan-zoellner) · 2024-12-29T01:41:33.685Z · comments (19)

Open Source Replication of Anthropic’s Crosscoder paper for model-diffing
Connor Kissane (ckkissane) · 2024-10-27T18:46:21.316Z · comments (4)

[question] Are You More Real If You're Really Forgetful?
Thane Ruthenis · 2024-11-24T19:30:55.233Z · answers+comments (25)

Natural abstractions are observer-dependent: a conversation with John Wentworth
Martín Soto (martinsq) · 2024-02-12T17:28:38.889Z · comments (13)

Simple distribution approximation: When sampled 100 times, can language models yield 80% A and 20% B?
Teun van der Weij (teun-van-der-weij) · 2024-01-29T00:24:27.706Z · comments (5)

[link] AI governance needs a theory of victory
Corin Katzke (corin-katzke) · 2024-06-21T16:15:46.560Z · comments (6)

[link] Language Models Don't Learn the Physical Manifestation of Language
Bruce W. Lee (bruce-lee) · 2024-02-22T18:52:32.237Z · comments (23)

[link] An AI Manhattan Project is Not Inevitable
Maxwell Tabarrok (maxwell-tabarrok) · 2024-07-06T16:42:35.920Z · comments (25)

Stitching SAEs of different sizes
Bart Bussmann (Stuckwork) · 2024-07-13T17:19:20.506Z · comments (12)

Apply to the 2024 PIBBSS Summer Research Fellowship
Nora_Ammann · 2024-01-12T04:06:58.328Z · comments (1)

China-AI forecasts
[deleted] · 2024-02-25T16:49:33.652Z · comments (29)

Nitric oxide for covid and other viral infections
Elizabeth (pktechgirl) · 2024-02-07T21:30:03.774Z · comments (6)

Monthly Roundup #23: October 2024
Zvi · 2024-10-16T13:50:05.869Z · comments (13)

You're a Space Wizard, Luke
lsusr · 2024-08-18T05:35:39.238Z · comments (6)

Instrumental deception and manipulation in LLMs - a case study
Olli Järviniemi (jarviniemi) · 2024-02-24T02:07:01.769Z · comments (13)

[link] Understanding Gödel’s completeness theorem
jessicata (jessica.liu.taylor) · 2024-05-27T18:55:02.079Z · comments (0)

[question] How would you navigate a severe financial emergency with no help or resources?
Tigerlily · 2024-05-02T18:27:51.329Z · answers+comments (22)

[link] [Fiction] A Confession
Arjun Panickssery (arjun-panickssery) · 2024-04-18T16:28:48.194Z · comments (2)

[link] On what research policymakers actually need
MondSemmel · 2024-04-23T19:50:12.833Z · comments (0)

[link] Elon files grave charges against OpenAI
mako yass (MakoYass) · 2024-03-01T17:42:13.963Z · comments (10)

[link] Win Friends and Influence People Ch. 2: The Bombshell
gull · 2024-01-28T21:40:47.986Z · comments (13)

LLMs as a Planning Overhang
Larks · 2024-07-14T02:54:14.295Z · comments (8)

Losing Faith In Contrarianism
omnizoid · 2024-04-25T20:53:34.842Z · comments (44)

AI #48: The Talk of Davos
Zvi · 2024-01-25T16:20:26.625Z · comments (9)

From Finite Factors to Bayes Nets
J Bostock (Jemist) · 2024-01-23T20:03:51.845Z · comments (7)

Compelling Villains and Coherent Values
Cole Wyeth (Amyr) · 2024-10-06T19:53:47.891Z · comments (4)

Making a Secular Solstice Songbook
jefftk (jkaufman) · 2024-01-23T19:40:05.055Z · comments (6)

[link] An X-Ray is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation
hugofry · 2024-10-07T08:53:14.658Z · comments (0)

OODA your OODA Loop
Raemon · 2024-10-11T00:50:48.119Z · comments (3)

[link] Things You're Allowed to Do: At the Dentist
rbinnn · 2024-01-28T18:39:33.584Z · comments (16)

Evaluating Sparse Autoencoders with Board Game Models
Adam Karvonen (karvonenadam) · 2024-08-02T19:50:21.525Z · comments (1)

Monthly Roundup #14: January 2024
Zvi · 2024-01-24T12:50:09.231Z · comments (22)

Recent comments

alexander-gietelink-oldenziel on Lecture Series on Tiling Agents

skills issue.

prep for the model that is coming tomorrow not the model of today

bokov-1 on My latest attempt to understand decision theory: I asked ChatGPT to debate me.

Because, based on the behavior of people here whose intelligence and ideas I have come to respect, this is an important topic.

Clearly I completely lack the background to understand the full theoretical argument. I also lack the background to understand the full theoretical argument behind general relativity and quantum uncertainty. Yet there are many real-world practical examples that I do understand and can work backwards from to get a roughly correct intuition about these ideas.

Every example I have seen for CDT falling short has been a hypothetical scenario that almost certainly never happened.

But if the only scenarios where CDT is a dominated strategy are hypothetical ones, I wouldn't expect smart people on LW to spend so much time and energy on them.

abramdemski on Lecture Series on Tiling Agents

Entering, but not entered. The machines do not yet understand the prompts I write them. (Seriously, it's total garbage still, even with lots of high quality background material in the context.)

bokov-1 on My latest attempt to understand decision theory: I asked ChatGPT to debate me.

Thank you for responding to my post despite its negative rating.

Can you, as a human, give any practical real-world examples that do not rely on non-existent tech where anything outperforms non-naive CDT?

By non-naive I mean CDT that isn't myopically just trying to maximize the immediate payoff but rather trying to maximize the long-term value to the player, taking into account future interactions, reputation, uncertainty about causal relationships, etc.

henry-sleight on [deleted]

Currently, during research programs such as MATS, many impactful AI Safety projects are being worked on

I think you could get to the problem faster than this. As I understand it, you're trying to motivate the shared repo by thinking about all of the duplicated work happening across the community & how valuable it would be for people trying to learn this style of research for the first time to work from a shared foundation.

I think this is a pretty complex problem and needs to be called out more explicitly. Something like:

[For many early-career researchers, there's an unnecessarily steep learning curve for even figuring out what good norms for their research code should look like in the first place. We're all for people learning and trying things for themselves, but they should do that on top of a solid, trusted, well documented foundation. That's why things like e.g. the ARENA curriculum are so valuable.

But there aren't standardised templates/repos for doing most of the work in empirical alignment research, and we think this probably slows down new researchers a lot, and requires them to unnecessarily duplicate work and make mistakes that they might not notice are slowing them down. ML research in general involves so much tinkering and figuring things out, that building from a strong template can be a meaningful speedup and provide a helpful initial learning experience.

For the MATS 7 scholars mentored by Ethan, Jan, Fabien, Mrinank, and others from the Anthropic Alignment Science team, we have created....

alexander-gietelink-oldenziel on Lecture Series on Tiling Agents

Mmm. You are entering the Cyborg Era. The only ideas you may take to the next epoch are those that can be uploaded to the machine intelligence.

jbash on In Defense of a Butlerian Jihad

Yeah, I’m curious.

OK...

Some of this kind of puts words in your mouth by extrapolating from similar discussions with others. I apologize in advance for anything I've gotten wrong.

What's so great about failure?

This one is probably the simplest from my viewpoint, and I bet it's the one that you'll "get" the least. Because it's basically my not "getting" your view at a very basic level.

Why would you ever even want to be able to fail big, in a way that would follow you around? What actual value do you get out of it? Failure in itself is valuable to you?

Wut?

It feels to me like a weird need to make your whole life into some kind of game to be "won" or "lost", or some kind of gambling addiction or something.

And I do have to wonder if there may not be a full appreciation for what crushing failure really is.

Failure is always an option

If you're in the "UBI paradise", it's not like you can't still succeed or fail. Put 100 years into a project. You're gonna feel the failure if it fails, and feel the success if it succeeds.

That's artificial? Weak sauce? Those aren't real real stakes? You have to be an effete pampered hothouse flower to care about that kind of made-up stuff?

Well, the big stakes are already gone. If you're on Less Wrong, you probably don't have much real chance of failing so hard that you die, without intentionally trying. Would your medieval farmer even recognize that your present stakes are significant?

... and if you care, your social prestige, among whoever you care about, can always be on the table, which is already most of what you're risking most of the time.

Basically, it seems like you're treating a not-particularly-qualitative change as bigger than it is, and privileging the status quo.

What agency?

Agency is another status quo issue.

Everybody's agency is already limited, severely and arbitrarily, but it doesn't seem to bother them.

Forces mostly unknown and completely beyond your control have made a universe in which you can exist, and fitted you for it. You depend on the fine structure constant. You have no choice about whether it changes. You need not and cannot act to maintain the present value. I doubt that makes you feel your agency is meaningless.

You could be killed by a giant meteor tomorrow, with no chance of acting to change that. More likely, other humans could kill you, still in a way you couldn't influence, for reasons you couldn't change and might never learn. You will someday die of some probably unchosen cause. But I bet none of this worries you on the average day. If it does, people will worry about you.

The Grand Sweep of History is being set by chaotically interacting causes, both natural and human. You don't know what most of them are. If you're one of a special few, you may be positioned to Change History by yourself... but you don't know if you are, what to do, or what the results would actually be. Yet you don't go around feeling like a leaf in the wind.

The "high impact" things that you do control are pretty randomly selected. You can get into Real Trouble or gain Real Advantages, but how is contingent, set by local, ephemeral circumstances. You can get away with things that would have killed a caveman, and you can screw yourself in ways you couldn't easily even explain to a caveman.

Yet, even after swallowing all the existing arbitrariness, new arbitrariness seems not-OK. Imagine a "UBI paradise", except each person gets a bunch of random, arbitrary, weird Responsibilities, none of them with much effect on anything or anybody else. Each Responsibility is literally a bad joke. But the stakes are real: you're Shot at Dawn if you don't Meet Your Responsibilities. I doubt you'd feel the Meaning very strongly.

... even though some of the human-imposed stuff we have already can seem too close to a bad joke.

The upshot is that it seems the "important" control people say they need is almost exactly the control they're used to having (just as the failures they need to worry about are suspiciously close to failures they presently have to worry about). Like today's scope of action is somehow automatically optimal by natural law.

That feels like a lack of imagination or flexibility.

And I definitely don't feel that way. There are things I'd prefer to keep control over, but they're not exactly the things I control today, and don't fall neatly into (any of) the categories people call "meaningful". I'd probably make some real changes in my scope of control if I could.

What about everybody else?

It's all very nice to talk about being able to fail, but you don't fail in a vacuum. You affect others. Your "agentic failure" can be other people's "mishap they don't control". It's almost impossible to totally avoid that. Even if you want that, why do you think you should get it?

The Universe doesn't owe you a value system

This is a bit nebulous, and not dead on the topic of "stakes", and maybe even a bit insulting... but I also think it's related in an important way, and I don't know a better way to say it clearly.

I always feel a sense that what people who talk about "meaning" really want is value realism. You didn't say this, but this is what I feel like I see underneath practically everybody's talk about meaning:

Gosh darn it, there should be some external, objective, sharable way to assign Real Value to things. Only things that have Real Value are "meaningful".

And if there is no such thing, it's important not to accept it, not really, not on a gut level...

... because I need it, dammit!

Say that or not, believe it or not, feel it or not, your needs, real or imagined, don't mean anything to the Laws that Govern All. They don't care to define Real Value, and they don't.

You get to decide what matters to you, and that means you have to decide what matters to you. Of course what you pick is ultimately caused by things you don't control, because you are caused by things you don't control. That doesn't make it any less yours. And it won't exactly match anybody else.

... and choosing to need the chance to fail, because it superficially looks like an externally imposed part of the Natural Order(TM), seems unfortunate. I mean, if you can avoid it.

"But don't you see, Sparklebear? The value was inside of YOU all the time!"

moonlight on Exercise: Solve "Thinking Physics"

A practical exercise which is both fun and helps me think better? Sign me up.

I definitely enjoyed doing Thinking Physics exercises in my free time. They feel similar to chess: a fun activity that also makes me feel like I'm spending my time on something really useful, which is a great feeling.

They also provide a tangible way of seeing your "prediction ability" for your own thinking and planning improve, which is helpful in staying motivated in regard to self-improvement exercises.

I recommend that anyone on the fence about this try their hand at a few Thinking Physics exercises!

josh-you on Implications of the inference scaling paradigm for AI safety

In Holden Karnofsky's "AI Could Defeat All Of Us Combined" a plausible existential risk threat model is described, in which a swarm of human-level AIs outmanoeuvre humans due to AI's faster cognitive speeds and improved coordination, rather than qualitative superintelligence capabilities. This scenario is predicated on the belief that "once the first human-level AI system is created, whoever created it could use the same computing power it took to create it in order to run several hundred million copies for about a year each." If the first AGIs are as expensive to run as o3-high (costing ~$3k/task), this threat model seems much less plausible.

I wonder how different the reasoning paradigm is, actually, from the picture presented here. After all, running a huge number of AI copies in parallel is... scaling up test-time compute.

The overhang argument is a rough analogy anyway. I think you are invoking the intuition of replacing the AI equivalent of a very large group of typical humans with the AI equivalent of a small number of ponderous geniuses, but those analogies are going to be highly imperfect in practice.

viliam on CstineSublime's Shortform

especially if it controls your social media feed

but... it already does :(

I mean, on facebook and xitter and reddit; I am still free to control my browsing of substack

and yes, applying the same level of control to my real life sounds like a bad idea