LessWrong 2.0 Reader

[link] Virtue is a Vector
robotelvis · 2024-09-10T03:02:45.737Z · comments (1)

Moral Trade, Impact Distributions and Large Worlds
Larks · 2024-09-20T03:45:56.273Z · comments (0)

[link] Michael Streamlines on Buddhism
Chris_Leong · 2024-08-09T04:44:52.126Z · comments (0)

[link] [Linkpost] Automated Design of Agentic Systems
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-08-19T23:06:06.669Z · comments (1)

[question] What do you expect AI capabilities may look like in 2028?
nonzerosum · 2024-08-23T16:59:53.007Z · answers+comments (5)

Measuring Visual Sycophancy in Multimodal Models
Jaehyuk Lim (jason-l) · 2024-08-27T22:02:47.917Z · comments (0)

[link] Universal dimensions of visual representation
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-08-28T10:38:58.396Z · comments (0)

[link] Approval-Seeking ⇒ Playful Evaluation
Jonathan Moregård (JonathanMoregard) · 2024-08-28T21:03:51.244Z · comments (0)

[link] Can AI agents learn to be good?
Ram Rachum (ram@rachum.com) · 2024-08-29T14:20:04.336Z · comments (0)

On epistemic autonomy
sanyer (santeri-koivula) · 2024-08-31T18:50:43.377Z · comments (0)

Inquisitive vs. adversarial rationality
gb (ghb) · 2024-09-18T13:50:09.198Z · comments (5)

[question] If I ask an LLM to think step by step, how big are the steps?
ryan_b · 2024-09-13T20:30:50.558Z · answers+comments (1)

Fake Blog Posts as a Problem Solving Device
silentbob · 2024-08-31T09:22:54.513Z · comments (0)

[link] Checking public figures on whether they "answered the question" quick analysis from Harris/Trump debate, and a proposal
david reinstein (david-reinstein) · 2024-09-11T20:25:27.845Z · comments (4)

Funding for programs and events on global catastrophic risk, effective altruism, and other topics
abergal · 2024-08-14T23:59:48.146Z · comments (0)

[link] Is Redistributive Taxation Justifiable? Part 1: Do the Rich Deserve their Wealth?
Alexander de Vries (alexander-de-vries) · 2024-09-05T10:23:08.958Z · comments (20)

[question] Does a time-reversible physical law/Cellular Automaton always imply the First Law of Thermodynamics?
Noosphere89 (sharmake-farah) · 2024-08-30T15:12:28.823Z · answers+comments (11)

Behavior Cloning for Alignment & Immortality
Dev.Errata (ethan.roland) · 2024-08-17T23:42:56.699Z · comments (1)

Deception and Jailbreak Sequence: 2. Iterative Refinement Stages of Jailbreaks in LLM
Winnie Yang (winnie-yang) · 2024-08-28T08:41:38.967Z · comments (2)

One person's worth of mental energy for AI doom aversion jobs. What should I do?
Lorec · 2024-08-26T01:29:01.700Z · comments (16)

Denver USA - ACX Meetups Everywhere Fall 2024
Eneasz · 2024-08-29T18:40:53.332Z · comments (0)

Sequence overview: Welfare and moral weights
MichaelStJules · 2024-08-15T04:22:32.567Z · comments (0)

A gentle introduction to sparse autoencoders
Nick Jiang (nick-jiang) · 2024-09-02T18:11:47.086Z · comments (0)

Halifax Canada - ACX Meetups Everywhere Fall 2024
interstice · 2024-08-29T18:39:12.490Z · comments (0)

[link] Contra Yudkowsky on 2-4-6 Game Difficulty Explanations
Josh Hickman (josh-hickman) · 2024-09-08T16:13:33.187Z · comments (1)

[link] Redundant Attention Heads in Large Language Models For In Context Learning
skunnavakkam · 2024-09-01T20:08:48.963Z · comments (0)

[link] AISafety.info: What are Inductive Biases
Algon · 2024-09-19T17:26:24.581Z · comments (0)

[link] [Linkpost] Interpretable Analysis of Features Found in Open-source Sparse Autoencoder (partial replication)
Fernando Avalos (fernando-avalos) · 2024-09-09T03:33:53.548Z · comments (1)

Understanding Hidden Computations in Chain-of-Thought Reasoning
rokosbasilisk · 2024-08-24T16:35:03.907Z · comments (0)

[question] Can subjunctive dependence emerge from a simplicity prior?
Daniel C (harper-owen) · 2024-09-16T12:39:35.543Z · answers+comments (0)

[link] Optimising under arbitrarily many constraint equations
dkl9 · 2024-09-12T14:59:28.475Z · comments (0)

[question] Practical advice for secure virtual communication post easy AI voice-cloning?
hmys (the-cactus) · 2024-08-09T17:32:33.458Z · answers+comments (5)

[link] A Nonconstructive Existence Proof of Aligned Superintelligence
Roko · 2024-09-12T03:20:09.531Z · comments (64)

[link] Metaculus's 'Minitaculus' Experiments — Collaborate With Us
ChristianWilliams · 2024-08-26T20:44:32.125Z · comments (0)

Thirty random thoughts about AI alignment
Lysandre Terrisse · 2024-09-15T16:24:10.572Z · comments (1)

Does “Ultimate Neartermism” via Eternal Inflation dominate Longtermism in expectation?
Jordan Arel · 2024-08-17T22:28:21.849Z · comments (1)

Forever Leaders
Justice Howard (justice-howard) · 2024-09-14T20:55:39.095Z · comments (9)

[link] Could Things Be Very Different?—How Historical Inertia Might Blind Us To Optimal Solutions
James Stephen Brown (james-brown) · 2024-09-11T09:53:07.474Z · comments (0)

[link] SCP Foundation - Anti memetic Division Hub
landscape_kiwi · 2024-09-15T13:40:52.691Z · comments (1)

Food, Prison & Exotic Animals: Sparse Autoencoders Detect 6.5x Performing Youtube Thumbnails
Louka Ewington-Pitsos (louka-ewington-pitsos) · 2024-09-17T03:52:43.269Z · comments (2)

Budapest Hungary - ACX Meetups Everywhere Fall 2024
Timothy Underwood (timothy-underwood-1) · 2024-08-29T18:37:41.313Z · comments (0)

Just How Good Are Modern Chess Computers?
nem · 2024-09-19T18:57:21.254Z · comments (1)

[link] Exposure can’t rule out disasters
Chipmonk · 2024-08-15T17:03:37.259Z · comments (19)

[link] The AI regulator’s toolbox: A list of concrete AI governance practices
Adam Jones (domdomegg) · 2024-08-10T21:15:09.265Z · comments (1)

Democracy beyond majoritarianism
Arturo Macias (arturo-macias) · 2024-09-03T15:10:56.284Z · comments (2)

A Taxonomy Of AI System Evaluations
Maxime Riché (maxime-riche) · 2024-08-19T09:07:45.224Z · comments (0)

[question] How do we know dreams aren't real?
Logan Zoellner (logan-zoellner) · 2024-08-22T12:41:57.380Z · answers+comments (31)

The Carnot Engine of Economics
StrivingForLegibility · 2024-08-09T15:59:40.458Z · comments (0)

Thinking About Propensity Evaluations
Maxime Riché (maxime-riche) · 2024-08-19T09:23:55.091Z · comments (0)

Meta: On viewing the latest LW posts
quiet_NaN · 2024-08-25T19:31:39.008Z · comments (2)

Recent comments

tamsin-leake on Being nicer than Clippy

I wonder how many of those seemingly idealistic people retained power when it was available because they were in fact only pretending to be idealistic. Assuming one is actually idealistic at first but then gets corrupted by power in some way, one thing someone can do in CEV that you can't do in real life is reuse the CEV process to come up with even better CEV processes, which will be even more likely to retain or recover their just-before-launching-CEV values. Yes, many people would mess this up or fail in some other way in CEV; but we only need one person or group who we'd be somewhat confident would do alright in CEV. Plausibly there are at least a few, e.g. MIRIers, who would satisfy this. Importantly, to me, this reduces outer alignment to "find someone smart and reasonable and likely to have good goal-content integrity", which is a social and psychological question that seems much smaller than the initial problem of outer alignment.
One of the main reasons to do CEV is that we're going to die of AI soon, and CEV is a way to have infinite time to solve the necessary problems. Another is that even if we don't die of AI, we get eaten by various molochs instead of being able to safely solve the necessary problems at whatever pace is necessary.

tom-r on Bordeaux France - ACX Meetups Everywhere Fall 2024

Because of the rain, we will move to the nearby pub the "Dog and Duck".

The time is the 20th at 6pm, but I will be there until at least 9pm!

See you there

signer on RLHF is the worst possible thing done when facing the alignment problem

RLHF does not solve the alignment problem because humans can’t provide good-enough feedback fast-enough.

Yeah, but the point is that the system learns values before an unrestricted AI vs AI conflict.

As mentioned in the beginning, I think the intuition goes that neural networks have a personality trait which we call “alignment”, caused by the correspondence between their values and our values. But “their values” only really makes sense after an unrestricted AI vs AI conflict, since without such conflicts, AIs are just gonna propagate energy to whichever constraints we point them at, so this whole worldview is wrong.

I mean, if your definition of values doesn't make sense for real systems, then that's a problem with your definition. As a hypothesis describing reality, "an alignment trait makes AI not splash harm on humans" is coherent enough. So the question is: how do you know it is unlikely to happen?

This has not led to the destruction of humanity yet because the biggest adversaries have kept their conflicts limited (because too much conflict is too costly) so no entity has pursued an end by any means necessary. But this only works because there’s a sufficiently small number of sufficiently big adversaries (USA, Russia, China, …), and because there’s sufficiently much opportunity cost.

First, "alignment is easy" is compatible with "we need to keep the set of big adversaries small". But more generally, without numbers it seems like a generalized anti-future-technology argument: what's stopping human regulatory mechanisms from solving this adversarial problem that didn't stop them from solving previous adversarial problems?

It makes conflict more viable for small adversaries against large adversaries

Not necessarily? It's not inconceivable for future defense to be more effective than offense (trivially true if "defense" is not giving AI to attackers). Isn't that kind of required for any future where humans have more power than in the present day?

wei-dai on Being nicer than Clippy

Once they get into CEV, they may not want to defer to others anymore, or may set things up with a large power/status imbalance between themselves and everyone else which may be detrimental to moral/philosophical progress. There are plenty of seemingly idealistic people in history refusing to give up or share power once they got power. The prudent thing to do seems to never get that much power in the first place, or to share it as soon as possible.
If you're pretty sure you will defer to others once inside CEV, then you might as well do it outside CEV due to #1 in my grandparent comment.

tamsin-leake on Being nicer than Clippy

the main arguments for the programmers including all of [current?] humanity in the CEV "extrapolation base" […] apply symmetrically to AIs-we're-sharing-the-world-with at the time

I think timeless values might possibly help resolve this; if some {AIs that are around at the time} are moral patients, then sure, just like other moral patients around they should get a fair share of the future.

If an AI grabs more resources than is fair, you do the exact same thing as if a human grabs more resources than is fair: satisfy the values of moral patients (including ones who are no longer around) weighted not by how much leverage they currently have over the future, but by how much leverage they would have over the future if things had gone more fairly, i.e. if abuse, power grabs, etc. weren't the kind of thing that gets you more control of the future.

"Sorry clippy, we do want you to get some paperclips, we just don't want you to get as many paperclips as you could if you could murder/brainhack/etc all humans, because that doesn't seem to be a very fair way to allocate the future." — and in the same breath, "Sorry Putin, we do want you to get some of whatever-intrinsic-values-you're-trying-to-satisfy, we just don't want you to get as much as ruthlessly ruling Russia can get you, because that doesn't seem to be a very fair way to allocate the future."

And this can apply regardless of how much of clippy already exists by the time you're doing CEV.

ape-in-the-coat on What's the Deal with Logical Uncertainty?

It's an interesting question, but it's a different, more complex problem than simply not knowing the googolth digit of pi and trying to estimate whether it's even or odd.

tamsin-leake on Being nicer than Clippy

trying to solve morality by themselves

It doesn't have to be by themselves; they can defer to others inside CEV, or come up with better schemes than their initial CEV inside CEV and then defer to that. Whatever solutions other than "solve everything on your own inside CEV" might exist, they can figure those out and defer to them from inside CEV. At least that's the case in my own attempts at implementing CEV in math (e.g. QACI).

tailcalled on Why I'm bearish on mechanistic interpretability: the shards are not in the network

If mechanistic interpretability is the AI equivalent of finding tiny organisms in a microscope, what is the AI equivalent of the tiny organisms?

romeostevensit on The Obliqueness Thesis

You mention 'warp' when talking about cross-ontology mapping, which seems like your best summary of a complicated intuition. I'd be curious to hear more (I recognize this might not be practical). My own intuition surfaced 'introducing degrees of freedom' à la indeterminacy of translation.

adastra22 on Did Christopher Hitchens change his mind about waterboarding?

While true, that doesn't apply to the years in question, post-2001.