LessWrong 2.0 Reader

An Introduction To The Mandelbrot Set That Doesn't Mention Complex Numbers
Yitz (yitz) · 2024-01-17T09:48:07.930Z · comments (11)

[link] New voluntary commitments (AI Seoul Summit)
Zach Stein-Perlman · 2024-05-21T11:00:41.794Z · comments (17)

[link] "The Heart of Gaming is the Power Fantasy", and Cohabitive Games
Raemon · 2023-10-08T21:02:33.526Z · comments (49)

Refusal mechanisms: initial experiments with Llama-2-7b-chat
Andy Arditi (andy-arditi) · 2023-12-08T17:08:01.250Z · comments (7)

AISafety.com – Resources for AI Safety
Søren Elverlin (soren-elverlin-1) · 2024-05-17T15:57:11.712Z · comments (3)

Bostrom Goes Unheard
Zvi · 2023-11-13T14:11:07.586Z · comments (9)

[link] Palworld development blog post
bhauth · 2024-01-28T05:56:19.984Z · comments (12)

Decomposing the QK circuit with Bilinear Sparse Dictionary Learning
keith_wynroe · 2024-07-02T13:17:16.352Z · comments (7)

My Criticism of Singular Learning Theory
Joar Skalse (Logical_Lunatic) · 2023-11-19T15:19:16.874Z · comments (56)

Announcing Athena - Women in AI Alignment Research
Claire Short (claire-short) · 2023-11-07T21:46:41.741Z · comments (2)

Self-Referential Probabilistic Logic Admits the Payor's Lemma
Yudhister Kumar (randomwalks) · 2023-11-28T10:27:29.029Z · comments (14)

[link] Not every accommodation is a Curb Cut Effect: The Handicapped Parking Effect, the Clapper Effect, and more
Michael Cohn (michael-cohn) · 2024-09-15T05:27:36.691Z · comments (39)

Studying The Alien Mind
Quentin FEUILLADE--MONTIXI (quentin-feuillade-montixi) · 2023-12-05T17:27:28.049Z · comments (10)

3C's: A Recipe For Mathing Concepts
johnswentworth · 2024-07-03T01:06:11.944Z · comments (5)

The Gemini Incident
Zvi · 2024-02-22T21:00:04.594Z · comments (19)

Values Are Real Like Harry Potter
johnswentworth · 2024-10-09T23:42:24.724Z · comments (17)

Survey of 2,778 AI authors: six parts in pictures
KatjaGrace · 2024-01-06T04:43:34.590Z · comments (1)

Quick look: applications of chaos theory
Elizabeth (pktechgirl) · 2024-08-18T15:00:07.853Z · comments (45)

LessWrong Community Weekend 2024, open for applications
UnplannedCauliflower · 2024-05-01T10:18:21.992Z · comments (2)

[link] MIRI's May 2024 Newsletter
Harlan · 2024-05-15T00:13:30.153Z · comments (1)

[link] My thesis (Algorithmic Bayesian Epistemology) explained in more depth
Eric Neyman (UnexpectedValues) · 2024-05-09T19:43:16.543Z · comments (4)

The case for a negative alignment tax
Cameron Berg (cameron-berg) · 2024-09-18T18:33:18.491Z · comments (20)

[link] The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists
EJT (ElliottThornley) · 2023-10-23T21:00:48.398Z · comments (22)

How to prevent collusion when using untrusted models to monitor each other
Buck · 2024-09-25T18:58:20.693Z · comments (5)

Thomas Kwa's research journal
Thomas Kwa (thomas-kwa) · 2023-11-23T05:11:08.907Z · comments (1)

[Intuitive self-models] 2. Conscious Awareness
Steven Byrnes (steve2152) · 2024-09-25T13:29:02.820Z · comments (48)

New report: "Scheming AIs: Will AIs fake alignment during training in order to get power?"
Joe Carlsmith (joekc) · 2023-11-15T17:16:42.088Z · comments (26)

Secular interpretations of core perennialist claims
zhukeepa · 2024-08-25T23:41:02.683Z · comments (32)

I'm a Former Israeli Officer. AMA
Yovel Rom · 2023-10-10T08:33:51.557Z · comments (70)

[link] The Cognitive-Theoretic Model of the Universe: A Partial Summary and Review
jessicata (jessica.liu.taylor) · 2024-03-27T19:59:27.893Z · comments (36)

Corrigibility = Tool-ness?
johnswentworth · 2024-06-28T01:19:48.883Z · comments (8)

[link] Is "superhuman" AI forecasting BS? Some experiments on the "539" bot from the Centre for AI Safety
titotal (lombertini) · 2024-09-18T13:07:40.754Z · comments (3)

EU policymakers reach an agreement on the AI Act
tlevin (trevor) · 2023-12-15T06:02:44.668Z · comments (7)

Spaciousness In Partner Dance: A Naturalism Demo
LoganStrohl (BrienneYudkowsky) · 2023-11-19T07:00:19.555Z · comments (5)

A couple productivity tips for overthinkers
Steven Byrnes (steve2152) · 2024-04-20T16:05:50.332Z · comments (13)

Reward hacking behavior can generalize across tasks
Kei · 2024-05-28T16:33:50.674Z · comments (5)

Send us example gnarly bugs
Beth Barnes (beth-barnes) · 2023-12-10T05:23:00.773Z · comments (10)

ACX Covid Origins Post convinced readers
ErnestScribbler · 2024-05-01T13:06:20.818Z · comments (7)

Reactions to the Executive Order
Zvi · 2023-11-01T20:40:02.438Z · comments (4)

Secondary forces of debt
KatjaGrace · 2024-06-27T21:10:06.131Z · comments (18)

Questions for labs
Zach Stein-Perlman · 2024-04-30T22:15:55.362Z · comments (11)

[link] [Linkpost] Practically-A-Book Review: Rootclaim $100,000 Lab Leak Debate
trevor (TrevorWiesinger) · 2024-03-28T16:03:36.452Z · comments (22)

MATS Summer 2023 Retrospective
utilistrutil · 2023-12-01T23:29:47.958Z · comments (34)

Attention SAEs Scale to GPT-2 Small
Connor Kissane (ckkissane) · 2024-02-03T06:50:22.583Z · comments (4)

OpenAI: Leaks Confirm the Story
Zvi · 2023-12-12T14:00:04.812Z · comments (9)

The Parable Of The Fallen Pendulum - Part 2
johnswentworth · 2024-03-12T21:41:30.180Z · comments (8)

Grief is a fire sale
Nathan Young · 2024-03-04T01:11:06.882Z · comments (1)

[link] Are language models good at making predictions?
dynomight · 2023-11-06T13:10:36.379Z · comments (14)

Darwinian Traps and Existential Risks
KristianRonn · 2024-08-25T22:37:14.142Z · comments (14)

Universal Love Integration Test: Hitler
Raemon · 2024-01-10T23:55:35.526Z · comments (65)

Recent comments

mr-hire on Matt Goldenberg's Short Form Feed

They replicated it within the video itself?

jbash on Update on the Mysterious Trump Buyers on Polymarket

Can't this only be judged in retrospect, and over a decent sample size?

The model that makes you hope for accuracy from the market is that it aggregates the information, including non-public information, available to a large number of people who are doing their best to maximize profits in a reasonable VNM-ish rational way.

In this case, everybody seems pretty sure that the price is where it is because of the actions of a single person who's dumped in a very large amount of money relative to the float. It seems likely that that person has done this despite having no access to any important non-public information about the actual election. For one thing, they've said that they're dumping all of their liquidity into bets on Trump. Not just all the money they already have allocated to semi-recreational betting, or even all the money they have allocated to speculative long shots in general, but their entire personal liquidity. That suggests a degree of certainty that almost no plausible non-public information could actually justify.

Not only that, but apparently they've done it in a way calculated to maximally move the price, which is the opposite of what you'd expect a profit maximizer to want to do given their ongoing buying and their (I think) stated and (definitely at this point) evidenced intention to hold until the market resolves.

If the model that makes you expect accuracy to begin with is known to be violated, it seems reasonable to assume that the market is out of whack.

Sure, it's possible that the market just happens to be giving an accurate probability for some reason unrelated to how it's "supposed" to work, but that sort of speculation would take a lot of evidence to establish confidently.

I'm assuming that by "every other prediction source" you mean everything other than prediction/betting markets

Well, yes. I would expect that if you successfully mess up Polymarket, you have actually messed up "The Betting Market" as a whole. If there's a large spread between any two specific operators, that really is free money for somebody, especially if that person is already set up to deal on both.

tapatakt on What's a good book for a technically-minded 11-year old?

Only one mention of Jules Verne in answers seems weird to me.

First and foremost, "The Mysterious Island". (But maybe it has already been read at nine?)

steve2152 on Complete Feedback

How about “purely epistemic” means “updated by self-supervised learning”, i.e. the updates (gradients, trader bankrolls, whatever) are derived from “things being true vs false” as opposed to “things being good vs bad”. Right?

[I learned the term teleosemantics from you! :) ]

The original LI paper was in that category, IIUC. The updates (to which traders had more vs less money) are derived from mathematical propositions being true vs false.

LI defines a notion of logically uncertain variable, which can be used to represent desires

I would say that they don’t really represent desires. They represent expectations about what’s going to happen, possibly including expectations about an AI’s own actions.

And then you can put the LI into a larger system that follows the rule: whatever the expectations are about the AI’s own actions, make that actually happen.

The important thing that changes in this situation is that the convergence of the algorithm is underdetermined—you can have multiple fixed points. I can expect to stand up, and then I stand up, and my expectation was validated. No update. I can expect to stay seated, and then I stay seated, and my expectation was validated. No update.
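A toy sketch of the multiple-fixed-points situation described above. This is not logical induction; it is a hypothetical one-variable predictor (the names `run`, `initial_expectation`, and `learning_rate` are made up for illustration) whose prediction about its own action is always made true and which then updates only on what it observes:

```python
def run(initial_expectation: float, steps: int = 50, learning_rate: float = 0.2) -> float:
    """Toy self-fulfilling predictor (an illustrative assumption, not LI itself)."""
    p = initial_expectation  # predicted probability of "I will stand up"
    for _ in range(steps):
        # Rule from the comment: whatever is expected about the action, make it happen.
        stands_up = p > 0.5
        outcome = 1.0 if stands_up else 0.0
        # Purely epistemic update: move the prediction toward what was observed.
        p += learning_rate * (outcome - p)
    return p


high = run(0.6)  # starts out leaning toward "stand up"
low = run(0.4)   # starts out leaning toward "stay seated"
print(round(high, 3), round(low, 3))
```

Both runs end with a validated expectation and no further update, but they settle on opposite answers depending only on where they started, which is the sense in which convergence is underdetermined.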

(I don’t think I’m saying anything you don’t already know well.)

Anyway, if you do that, then I guess you could say that the LI’s expectations “can be used” to represent desires … but I maintain that that’s a somewhat confused and unproductive way to think about what’s going on. If I intervene to change the LI variable, it would be analogous to changing habits (what do I expect myself to do ≈ which action plans seem most salient and natural), not analogous to changing desires.

(I think the human brain has a system vaguely like LI, and that it resolves the underdetermination by a separate valence system, which evaluates expectations as being good vs bad, and applies reinforcement learning to systematically seek out the good ones.)

beliefs can have impacts on the world if the world looks at them

…Indeed, what I said above is just a special case. Here’s something more general and elegant. You have the core LI system, and then some watcher system W, which reads off some vector of internal variables V of the core LI system, and then W takes actions according to some function A(V).

After a while, the LI system will automatically catch onto what W is doing, and “learn” to interpret V as an expectation that A(V) is going to happen.

I think the central case is that W is part of the larger AI system, as above, leading to normal agent-like behavior (assuming some sensible system for resolving the underdetermination). But in theory W could also be humans peeking into the LI system and taking actions based on what they see. Fundamentally, these aren’t that different.

So whatever solution we come up with to resolve the underdetermination, whether human-brain-like “valence” or something else, that solution ought to work for the humans-peeking-into-the-LI situation just as it works for the normal W-is-part-of-the-larger-AI situation.

(But maybe weird things would happen before convergence. And also, if you don’t have any system at all to resolve the underdetermination, then probably the results would be weird and hard to reason about.)

Also, it is easy for end users to build agentlike things out of belieflike things by making queries about how to accomplish things. Thus, we need to train epistemic systems to be responsible about how such queries are answered (as is already apparent in existing chatbots).

I’m not sure that this is coming from a coherent threat model (or else I don’t follow).

If Dr. Evil trains his own AGI, then this whole thing is moot, because he wants the AGI to have accurate beliefs about bioweapons.
If Benevolent Bob trains the AGI and gives API access to Dr. Evil, then Bob can design the AGI to (1) have accurate beliefs about bioweapons, and (2) not answer Dr. Evil’s questions about bioweapons. That might ideally look like what we’re used to in the human world: the AGI says things because it wants to say those things, all things considered, and it doesn’t want Dr. Evil to build bioweapons, either directly or because it’s guessing what Bob would want.

james-stephen-brown on If we solve alignment, do we die anyway?

Hi Seth,

I share your concern that AGI comes with the potential for a unilateral first strike capability that, at present, no nuclear power has (which is vital to the maintenance of MAD), though I think, in game theoretical terms, this becomes more difficult the more self-interested (in survival) players there are. Like in open-source software, there is a level of protection against malicious code because bad players are outnumbered, even if they try to hide their code, there are many others who can find it. But I appreciate that 100s of coders finding malicious code within a single repository is much easier than finding something hidden in the real world, and I have to admit I'm not even sure how robust the open-source model is (I only know how it works in theory). I'm more pointing to the principle, not as an excuse for complacency but as a safety model on which to capitalise.

My point about the UN's law against aggression wasn't that in and of itself it is a deterrent, only that it gives a permission structure for any party to legitimately retaliate.

I also agree that RSI-capable AGI introduces a level of independence that we haven't seen before in a threat. And I do understand inter-dependence is a key driver of cooperation. Another driver is confidence and my hope is that the more intelligent a system gets, the more confident it is, the better it is able to balance the autonomy of others with its goals, meaning it is able to "confide" in others, in the same way as the strongest kid in class was very rarely the bully, because they had nothing to prove. Collateral damage is still damage after all; a truly confident power doesn't need these sorts of inefficiencies. I stress this is a hope, and not a cause for complacency. I recognise that in analogy, the strongest kid, the true class alpha, gets whatever they want with the willing complicity of the classroom. RSI-capable AGI might get what it wants coercively in a way that makes us happy with our own subjugation, which is still a species of dystopia.

But if you've got a super-intelligent inventor on your side and a few resources, you can be pretty sure you and some immediate loved ones can survive and live in material comfort, while rebuilding a new society according to your preferences.

This sort of illustrates the contradiction here, if you're pretty intelligent (as in you're designing a super-intelligent AGI) you're probably smart enough to know that the scenario outlined here has a near 100% chance of failure for you and your family, because you've created something more intelligent than you that is willing to hide its intentions and destroy billions of people, it doesn't take much to realise that that intelligence isn't going to think twice about also destroying you.

Now, I realise this sounds a lot like the situation humanity is in as a whole... so I agree with you that...

multipolar human-controlled AGI scenario will necessitate ubiquitous surveillance.

I'm just suggesting that the other AGI teams do (or can, leveraging the right incentives) provide a significant contribution to this surveillance.

habryka4 on Bogdan Ionut Cirstea's Shortform

(Most people in AI Alignment work at scaling labs and are therefore almost exclusively working on LLM alignment. That said, I don't actually know what it means to work on LLM alignment over aligning other systems, it's not like we have a ton of traction on LLM alignment, and most techniques and insights seem general enough to not be conditional specifically on LLMs)

dana on Matt Goldenberg's Short Form Feed

A few glaring issues here:
1) Does the question imply causation or not? It shouldn't.
2) Are these stats intended to be realistic such that I need to consider potential flaws and take a holistic view or just a toy scenario to test my numerical skills? If I believe it's the former and I'm confident X and Y are positively correlated, a 2x2 grid showing X and Y negatively correlated should of course make me question the quality of your data proportionally.
3) Is this an adversarial question such that my response may be taken out of context or otherwise misused?

The sample interviews from Veritasium did not seem to address any of these issues:
(1) They seemed to cut out the gun question, but the skin cream question implied causation, "Did the skin cream make the rash better or worse?"
(2) One person mentioned "I wouldn't have expected that..." which implies he thought it was real data,
(3) the last person clearly interpreted it adversarially.

In the original study, the question was stated as "cities that enacted a ban on carrying concealed handguns were more likely to have a decrease in crime." This framing is not as bad, but still too close to implying causation in my opinion.

ann-brown on Survival without dignity

Too much runs into the very real issue that truth is stranger. 😉

ann-brown on Survival without dignity

It's nice to read some realistic science fiction.

l-rudolf-l on Survival without dignity

Also this very recent one: https://www.lesswrong.com/posts/6h9p6NZ5RRFvAqWq5/the-summoned-heroine-s-prediction-markets-keep-providing