LessWrong 2.0 Reader

← previous page (newer posts) · next page (older posts) →

GPT4o is still sensitive to user-induced bias when writing code
Reed (ThomasReed) · 2024-09-22T21:04:54.717Z · comments (0)
[link] A (paraconsistent) logic to deal with inconsistent preferences
B Jacobs (Bob Jacobs) · 2024-07-14T11:17:45.426Z · comments (2)
[Aspiration-based designs] A. Damages from misaligned optimization – two more models
Jobst Heitzig · 2024-07-15T14:08:15.716Z · comments (0)
The Existential Dread of Being a Powerful AI System
testingthewaters · 2024-09-26T10:56:32.904Z · comments (1)
[link] Memorising molecular structures
dkl9 · 2024-07-12T22:40:42.307Z · comments (0)
Exploring Shard-like Behavior: Empirical Insights into Contextual Decision-Making in RL Agents
Alejandro Aristizabal (alejandro-aristizabal) · 2024-09-29T00:32:42.161Z · comments (0)
Avoiding jailbreaks by discouraging their representation in activation space
Guido Bergman · 2024-09-27T17:49:20.785Z · comments (2)
'Chat with impactful research & evaluations' (Unjournal NotebookLMs)
david reinstein (david-reinstein) · 2024-09-28T00:32:16.845Z · comments (0)
Retrieval Augmented Genesis
João Ribeiro Medeiros (joao-ribeiro-medeiros) · 2024-10-01T20:18:01.836Z · comments (0)
Against Job Boards: Human Capital and the Legibility Trap
vaishnav92 · 2024-10-24T20:50:50.266Z · comments (1)
Another UFO Bet
codyz · 2024-11-01T01:55:27.301Z · comments (9)
Seeking mentorship
Kevin Afachao (kevin-afachao) · 2024-09-21T16:54:58.353Z · comments (0)
If I care about measure, choices have additional burden (+AI generated LW-comments)
avturchin · 2024-11-15T10:27:15.212Z · comments (9)
Metastrategy get-started guide
Tahp · 2024-06-25T15:04:11.542Z · comments (1)
[question] How do we know dreams aren't real?
Logan Zoellner (logan-zoellner) · 2024-08-22T12:41:57.380Z · answers+comments (31)
Agency overhang as a proxy for Sharp left turn
Eris (anton-zheltoukhov) · 2024-11-07T12:14:24.333Z · comments (0)
Scattered thoughts on what it means for an LLM to believe
TheManxLoiner · 2024-11-06T22:10:29.429Z · comments (3)
Using LLMs for AI Foundation research and the Simple Solution assumption
Donald Hobson (donald-hobson) · 2024-09-24T11:00:53.658Z · comments (0)
The Future of Work: How Can Policymakers Prepare for AI's Impact on Labor Markets?
davidconrad · 2024-06-24T14:18:55.023Z · comments (0)
Biasing VLM Response with Visual Stimuli
Jaehyuk Lim (jason-l) · 2024-10-03T18:04:31.474Z · comments (0)
[link] An "Observatory" For a Shy Super AI?
Sherrinford · 2024-09-27T21:22:40.296Z · comments (0)
Apply to be a mentor in SPAR!
agucova · 2024-11-05T21:32:45.797Z · comments (0)
Using Narrative Prompting to Extract Policy Forecasts from LLMs
Max Ghenis (MaxGhenis) · 2024-11-05T04:37:52.004Z · comments (0)
[link] Why People in Poverty Make Bad Decisions
James Stephen Brown (james-brown) · 2024-07-15T23:40:32.116Z · comments (8)
[link] Join the $10K AutoHack 2024 Tournament
Paul Bricman (paulbricman) · 2024-09-25T11:54:20.112Z · comments (0)
[link] Launching the Respiratory Outlook 2024/25 Forecasting Series
ChristianWilliams · 2024-07-17T19:51:05.380Z · comments (0)
Mentorship in AGI Safety: Applications for mentorship are open!
Valentin2026 (Just Learning) · 2024-06-28T14:49:48.501Z · comments (0)
[link] Risk Overview of AI in Bio Research
J Bostock (Jemist) · 2024-07-15T00:04:41.818Z · comments (0)
[link] Linkpost: Hypocrisy standoff
Chris_Leong · 2024-09-29T14:27:19.175Z · comments (1)
[question] What are some positive developments in AI safety in 2024?
Satron · 2024-11-15T10:32:39.541Z · answers+comments (0)
A simple text status can change something
nextcaller · 2024-06-23T18:48:58.580Z · comments (0)
Differential knowledge interconnection
Roman Leventov · 2024-10-12T12:52:36.267Z · comments (0)
35 Interactive Learning Modules Relevant to EAs / Effective Altruism (that are all free)
spencerg · 2024-06-23T17:57:44.636Z · comments (0)
That which can be destroyed by the truth, should be assumed to should be destroyed by it
Thac0 · 2024-07-09T19:39:57.887Z · comments (0)
[link] How long should political (and other) terms be?
ohmurphy · 2024-10-14T21:38:43.050Z · comments (0)
[question] Why would ASI share any resources with us?
Satron · 2024-11-13T23:38:36.535Z · answers+comments (5)
Educational CAI: Aligning a Language Model with Pedagogical Theories
Bharath Puranam (bharath-puranam) · 2024-11-01T18:55:26.993Z · comments (1)
New Capabilities, New Risks? - Evaluating Agentic General Assistants using Elements of GAIA & METR Frameworks
Tej Lander (tej-lander) · 2024-09-29T18:58:56.253Z · comments (0)
[link] Internal music player: phenomenology of earworms
dkl9 · 2024-11-14T23:29:48.383Z · comments (0)
Meta: On viewing the latest LW posts
quiet_NaN · 2024-08-25T19:31:39.008Z · comments (2)
[link] Launching the AI Forecasting Benchmark Series Q3 | $30k in Prizes
ChristianWilliams · 2024-07-08T17:20:54.717Z · comments (0)
[link] AISN #38: Supreme Court Decision Could Limit Federal Ability to Regulate AI; Plus, “Circuit Breakers” for AI systems, and updates on China’s AI industry
Corin Katzke (corin-katzke) · 2024-07-09T19:28:29.338Z · comments (0)
[link] Should we abstain from voting? (In nondeterministic elections)
B Jacobs (Bob Jacobs) · 2024-10-02T10:07:43.167Z · comments (6)
[question] If the DoJ goes through with the Google breakup, where does DeepMind end up?
O O (o-o) · 2024-10-12T05:06:50.996Z · answers+comments (1)
[question] Artificial V/S Organoid Intelligence
10xyz (10xyz-coder) · 2024-10-23T14:31:46.385Z · answers+comments (0)
[question] AMA: International School Student in China
Novice · 2024-10-01T06:00:16.282Z · answers+comments (0)
Some reasons to start a project to stop harmful AI
Remmelt (remmelt-ellen) · 2024-08-22T16:23:34.132Z · comments (0)
[link] Is P(Doom) Meaningful? Bayesian vs. Popperian Epistemology Debate
Liron · 2024-11-09T23:39:30.039Z · comments (0)
Bellevue Library Meetup - Nov 23
Cedar (xida-ren) · 2024-11-09T23:05:02.452Z · comments (1)
[link] Formalize the Hashiness Model of AGI Uncontainability
Remmelt (remmelt-ellen) · 2024-11-09T16:10:05.032Z · comments (0)