LessWrong 2.0 Reader

[link] Free Will, Determinism, And Choice
Zero Contradictions · 2024-07-06T06:34:41.495Z · comments (3)

[link] Validating / finding alignment-relevant concepts using neural data
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-20T21:12:49.267Z · comments (0)

[question] Can agents coordinate on randomness without outside sources?
Mikhail Samin (mikhail-samin) · 2024-07-06T13:43:44.633Z · answers+comments (16)

Expected number of tries
adios (unicode-59bD) · 2024-06-22T19:22:00.756Z · comments (0)

[link] Saving Lives Reduces Over-Population—A Counter-Intuitive Non-Zero-Sum Game
James Stephen Brown (james-brown) · 2024-06-28T19:29:55.238Z · comments (0)

The Xerox Parc/ARPA version of the intellectual Turing test: Class 1 vs Class 2 disagreement
hamishtodd1 · 2024-06-30T15:34:53.729Z · comments (3)

Notes on Tuning Metacognition
JoNeedsSleep (joanna-j-1) · 2024-07-03T19:54:59.732Z · comments (0)

How can I get over my fear of becoming an emulated consciousness?
James Dowdell (james-dowdell) · 2024-07-07T22:02:43.520Z · comments (8)

[link] Memorising molecular structures
dkl9 · 2024-07-12T22:40:42.307Z · comments (0)

Spark in the Dark Guest Spots
jefftk (jkaufman) · 2024-07-14T01:40:05.311Z · comments (0)

[link] A (paraconsistent) logic to deal with inconsistent preferences
B Jacobs (Bob Jacobs) · 2024-07-14T11:17:45.426Z · comments (2)

[Aspiration-based designs] A. Damages from misaligned optimization – two more models
Jobst Heitzig · 2024-07-15T14:08:15.716Z · comments (0)

[Research log] The board of Alphabet would stop DeepMind to save the world
Lucie Philippon (lucie-philippon) · 2024-07-16T04:59:14.874Z · comments (0)

[question] Opinions on Eureka Labs
jmh · 2024-07-17T00:16:02.959Z · answers+comments (2)

Activation Engineering Theories of Impact
kubanetics (jakub-nowak) · 2024-07-18T16:44:33.656Z · comments (1)

[link] Yet Another Critique of "Luxury Beliefs"
ymeskhout · 2024-07-18T18:37:28.703Z · comments (10)

[link] Demography and Destiny
Zero Contradictions · 2024-07-21T20:34:07.176Z · comments (11)

Limitations on the Interpretability of Learned Features from Sparse Dictionary Learning
Tom Angsten (tom-angsten) · 2024-07-30T16:36:06.518Z · comments (0)

[link] Against AI As An Existential Risk
Noah Birnbaum (daniel-birnbaum) · 2024-07-30T19:10:41.156Z · comments (13)

[link] Solutions to problems with Bayesianism
B Jacobs (Bob Jacobs) · 2024-07-31T14:18:27.910Z · comments (0)

[question] Request for AI risk quotes, especially around speed, large impacts and black boxes
Nathan Young · 2024-08-02T17:49:48.898Z · answers+comments (0)

[link] Labelling, Variables, and In-Context Learning in Llama2
Joshua Penman (joshua-penman) · 2024-08-03T19:36:34.721Z · comments (0)

Modelling Social Exchange: A Systematised Method to Judge Friendship Quality
Wynn Walker · 2024-08-04T18:49:30.892Z · comments (0)

LLMs stifle creativity, eliminate opportunities for serendipitous discovery and disrupt intergenerational transfer of wisdom
Ghdz (gal-hadad) · 2024-08-05T18:27:20.709Z · comments (2)

The Pragmatic Side of Cryptographically Boxing AI
Bart Jaworski (bart-jaworski) · 2024-08-06T17:46:21.754Z · comments (0)

[question] Practical advice for secure virtual communication post easy AI voice-cloning?
hmys (the-cactus) · 2024-08-09T17:32:33.458Z · answers+comments (5)

Does “Ultimate Neartermism” via Eternal Inflation dominate Longtermism in expectation?
Jordan Arel · 2024-08-17T22:28:21.849Z · comments (1)

A Taxonomy Of AI System Evaluations
Maxime Riché (maxime-riche) · 2024-08-19T09:07:45.224Z · comments (0)

Understanding Hidden Computations in Chain-of-Thought Reasoning
rokosbasilisk · 2024-08-24T16:35:03.907Z · comments (1)

[link] Metaculus's 'Minitaculus' Experiments — Collaborate With Us
ChristianWilliams · 2024-08-26T20:44:32.125Z · comments (0)

Budapest Hungary - ACX Meetups Everywhere Fall 2024
Timothy Underwood (timothy-underwood-1) · 2024-08-29T18:37:41.313Z · comments (0)

Halifax Canada - ACX Meetups Everywhere Fall 2024
interstice · 2024-08-29T18:39:12.490Z · comments (0)

[link] Contra Yudkowsky on 2-4-6 Game Difficulty Explanations
Josh Hickman (josh-hickman) · 2024-09-08T16:13:33.187Z · comments (1)

[link] [Linkpost] Interpretable Analysis of Features Found in Open-source Sparse Autoencoder (partial replication)
Fernando Avalos (fernando-avalos) · 2024-09-09T03:33:53.548Z · comments (1)

[link] Optimising under arbitrarily many constraint equations
dkl9 · 2024-09-12T14:59:28.475Z · comments (0)

[link] SCP Foundation - Anti memetic Division Hub
landscape_kiwi · 2024-09-15T13:40:52.691Z · comments (1)

Thirty random thoughts about AI alignment
Lysandre Terrisse · 2024-09-15T16:24:10.572Z · comments (1)

[question] Can subjunctive dependence emerge from a simplicity prior?
Daniel C (harper-owen) · 2024-09-16T12:39:35.543Z · answers+comments (0)

GPT4o is still sensitive to user-induced bias when writing code
Reed (ThomasReed) · 2024-09-22T21:04:54.717Z · comments (0)

'Chat with impactful research & evaluations' (Unjournal NotebookLMs)
david reinstein (david-reinstein) · 2024-09-28T00:32:16.845Z · comments (0)

Thoughts on Evo-Bio Math and Mesa-Optimization: Maybe We Need To Think Harder About "Relative" Fitness?
Lorec · 2024-09-28T14:07:42.412Z · comments (6)

Exploring Shard-like Behavior: Empirical Insights into Contextual Decision-Making in RL Agents
Alejandro Aristizabal (alejandro-aristizabal) · 2024-09-29T00:32:42.161Z · comments (0)

Retrieval Augmented Genesis
João Ribeiro Medeiros (joao-ribeiro-medeiros) · 2024-10-01T20:18:01.836Z · comments (0)

[link] AI Safety Newsletter #43: White House Issues First National Security Memo on AI Plus, AI and Job Displacement, and AI Takes Over the Nobels
Corin Katzke (corin-katzke) · 2024-10-28T16:03:39.258Z · comments (0)

[question] How to cite LessWrong as an academic source?
PhilosophicalSoul (LiamLaw) · 2024-11-06T08:28:26.309Z · answers+comments (6)

2025 Q1 Pivotal Research Fellowship (Technical & Policy)
Tobias H (clearthis) · 2024-11-12T10:56:24.858Z · comments (0)

[question] how to truly feel my beliefs?
KvmanThinking (avery-liu) · 2024-11-11T00:04:30.994Z · answers+comments (6)

Against Job Boards: Human Capital and the Legibility Trap
vaishnav92 · 2024-10-24T20:50:50.266Z · comments (1)

Food, Prison & Exotic Animals: Sparse Autoencoders Detect 6.5x Performing Youtube Thumbnails
Louka Ewington-Pitsos (louka-ewington-pitsos) · 2024-09-17T03:52:43.269Z · comments (2)

Forever Leaders
Justice Howard (justice-howard) · 2024-09-14T20:55:39.095Z · comments (9)

Recent comments

jesseclifton on Why I’m not a Bayesian

This paper discusses two semantics for Bayesian inference in the case where the hypotheses under consideration are known to be false.

Verisimilitude: p(h) = the probability that h is closest to the truth [according to some measure of closeness-to-truth] among hypotheses under consideration
Counterfactual: p(h) = the probability of h given the (false) supposition that one of the hypotheses under consideration is true

In any case, it’s unclear what motivates making decisions by maximizing expected value against such probabilities, which seems like a problem for boundedly rational decision-making.
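
As a rough sketch, the two semantics above could be written as follows. The notation is an assumption for illustration and is not taken from the paper: $H$ is the set of hypotheses under consideration, $t$ is the (unknown) truth, and $d$ is some measure of closeness-to-truth.

```latex
% Verisimilitude: probability that h is at least as close to the truth t
% as every other hypothesis under consideration (d is an assumed distance).
\[
  p_{\mathrm{ver}}(h) \;=\; \Pr\bigl[\, d(h, t) \le d(h', t) \ \text{ for all } h' \in H \,\bigr]
\]

% Counterfactual: probability of h under the supposition that some
% hypothesis in H is true.
\[
  p_{\mathrm{cf}}(h) \;=\; \Pr\Bigl[\, h \;\Bigm|\; \textstyle\bigvee_{h' \in H} h' \,\Bigr]
\]
```

If the agent is certain that every hypothesis in $H$ is false, the disjunction has probability zero, so the second expression presumably has to be read as a supposition rather than as ordinary ratio conditioning.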

sharmake-farah on o1 is a bad idea

To answer the question:

So, as a speculative example, further along in the direction of o1 you could have something like MCTS help train these things to solve very difficult math problems, with the sparse feedback being given for complete formal proofs.

Similarly, playing text-based video games, with the sparse feedback given for winning.

Similarly, training CoT to reason about code, with sparse feedback given for predictions of the code output.

Etc.

You think these sorts of things just won't work well enough to be relevant?

Assuming the goals are done over say 1-10 year timescales, or maybe even just 1 year timescales with no reward-shaping/giving feedback for intermediate rewards at all, I do think that the system won't work well enough to be relevant, since it requires way too much time training, and plausibly way too much compute depending on how sparse the feedback actually is.

Other AIs relying on much denser feedback will already rule the world before that happens.

[insert standard skepticism about these sorts of generalizations when generalizing to superintelligence]

But what lesson do you think you can generalize, and why do you think you can generalize that?

Alright, I'll give 2 lessons that I do think generalize to superintelligence:

1. The data is a large factor in both an AI's capabilities and its alignment, and alignment strategies should not ignore the data sources when trying to make predictions or trying to intervene on the AI for alignment purposes.
2. Instrumental convergence in a weak sense will likely exist, because having some ability to get more resources is useful for a lot of goals, but the extremely unconstrained versions of instrumental convergence that are often assumed, where an AI grabs so much power that it effectively controls humanity, are unlikely to exist, given the constraints and feedback given to the AI.

For 1, the basic reason is that a lot of AI success in fields like Go and language modeling was jumpstarted by good data.

More importantly, I remember this post, and while I think it overstates things in stating that an LLM is just the dataset (it probably isn't now with o1), it does matter that LLMs are influenced by their data sources.

For 2, the basic reason for this is that the strongest capabilities we have seen that come out of RL either require immense amounts of data on pretty narrow tasks, or non-instrumental world models.

This is because constraints prevent you from having to deal with the problem where you produce completely useless RL artifacts, and evolution got around this constraint by accepting far longer timelines and far more computation in FLOPs than the world economy can tolerate.

jay on Quick look: applications of chaos theory

The ancients considered everything to be the work of spirits. The medievals considered the cosmos to be a kingdom. Early moderns likened the universe to a machine. Every age has its dominant metaphors. All of them are oversimplifications of a more complex truth.

milan-w on The Online Sports Gambling Experiment Has Failed

Sure, but people in general are really bad at that kind of precise quantitative world-knowledge. They have pretty weak priors and a mostly-anecdotes-and-gut-feeling-informed negative opinion of gambling, such that when presented with the study showing a 28% increase in bankruptcy they go "ok sure that's compatible with my worldview" instead of being surprised and taking the evidence as a big update.

jay on Quick look: applications of chaos theory

Suppose you had an identical twin with identical genes and, until very recently, an identical history. From the perspective of anyone else, you're similar enough to be interchangeable with each other. But from your perspective, the twin would be a different person.

The brain is you, full stop. It isn't running a computer program; its hardware and software are inseparable and developed together over the course of your life. In other words, the hardware/software distinction doesn't apply to brains.

cameron-berg on Making a conservative case for alignment

Whether one is an accelerationist, Pauser, or an advocate of some nuanced middle path, the prospects/goals of everyone are harmed if the discourse-landscape becomes politicized/polarized.
...
I just continue to think that any mention, literally at all, of ideology or party is courting discourse-disaster for all, again no matter what specific policy one is advocating for.
...
Like a bug stuck in a glue trap, it places yet another limb into the glue in a vain attempt to push itself free.

I would agree in a world where the proverbial bug hasn't already made any contact with the glue trap, but this very thing has clearly already been happening for almost a year in a troubling direction. The political left has been fairly casually 'Everything-Bagel-izing' AI safety, largely by smuggling in social progressivism that has little to do with the core existential risks, and the right, as a result, is increasingly coming to view AI safety as something approximating 'woke BS stifling rapid innovation.' The fly is already a bit stuck.

The point we are trying to drive home here is precisely what you're also pointing at: avoiding an AI-induced catastrophe is obviously not a partisan goal [LW · GW]. We are watching people in DC slowly lose sight of this critical fact. This is why we're attempting to explain here why basic AI x-risk concerns are genuinely important regardless of one's ideological leanings, i.e., genuinely important to left-leaning and right-leaning people alike. Seems like very few people have explicitly spelled out the latter case, though, which is why we thought it would be worthwhile to do so here.

ryan_greenblatt on Mark Xu's Shortform

By "control", I mean AI Control [? · GW]: approaches aiming to ensure safety and benefit from AI systems, even if they are goal-directed and are actively trying to subvert your control measures.

AI control stops working once AIs are sufficiently capable (and likely doesn't work for all possible deployments that might eventually be otherwise desirable), but there could be other approaches that work at that point, in particular aligning systems.

The main hope I think about is something like:

Use control until AIs are capable enough that if we trusted them, we could obsolete top human scientists and experts.
Use our controlled AI labor to do the work needed to make systems which are capable enough, trustworthy enough (via alignment), and philosophically competent enough that we can safely hand things off to them. (There might be some intermediate states to get to here.)
Have these systems which totally obsolete us figure out what to do, including figuring out how to align more powerful systems as needed.

We discuss our hopes more in this post [LW · GW].

matrice-jacobine on Proposing the Conditional AI Safety Treaty (linkpost TIME)

(Discussion continues here on EA Forum. [EA(p) · GW(p)])

nick_tarleton on D0TheMath's Shortform

Embarrassingly, that was a semi-unintended reaction — I would bet a small amount against that statement if someone gave me a resolution method, but am not motivated to figure one out, and realized this a second after making it — that I hadn't figured out how to remove by the time you made that comment. Sorry.

ryan_greenblatt on 5 ways to improve CoT faithfulness

You can still inspect the underlying CoT to know if you have big issues.

It is worth noting that the baseline proposal was already incentivizing the Face to be deceptive to the extent to which this proposal is important (the whole hope was to allow for hiding deception). So, I agree that my proposed change trades off "less deceptive CoT" for "more deceptive Face" relative to the baseline proposal, but this feels like mostly a quantitative difference?

(My guess is that you have this concern with the baseline protocol and thus don't like moving further in this direction.)

It could be that the baseline proposal doesn't make sense because shaping with process-based feedback on the CoT is so useful that it is worth taking the risk of incentivizing unfaithful CoT.

make the system overall much harder to shape in ways you want (because it is actively hiding things from you so your RL is less accurate)

I thought the point was that we wouldn't train against the CoT basically at all with the baseline proposal? (Similar to o1.)