LessWrong 2.0 Reader

Toward Safety Case Inspired Basic Research
Lucas Teixeira · 2024-10-31T23:06:32.854Z · comments (2)

Safe Predictive Agents with Joint Scoring Rules
Rubi J. Hudson (Rubi) · 2024-10-09T16:38:16.535Z · comments (10)

A Path out of Insufficient Views
Unreal · 2024-09-24T20:00:27.332Z · comments (46)

Secret Collusion: Will We Know When to Unplug AI?
schroederdewitt · 2024-09-16T16:07:01.119Z · comments (7)

[link] The Evals Gap
Marius Hobbhahn (marius-hobbhahn) · 2024-11-11T16:42:46.287Z · comments (7)

[question] Could orcas be (trained to be) smarter than humans? 
Towards_Keeperhood (Simon Skade) · 2024-11-04T23:29:26.677Z · answers+comments (11)

[link] How Likely Are Various Precursors of Existential Risk?
NunoSempere (Radamantis) · 2024-10-28T13:27:31.620Z · comments (4)

Neuroscience of human social instincts: a sketch
Steven Byrnes (steve2152) · 2024-11-22T16:16:52.552Z · comments (0)

Win/continue/lose scenarios and execute/replace/audit protocols
Buck · 2024-11-15T15:47:24.868Z · comments (2)

[link] On the Role of Proto-Languages
adamShimi · 2024-09-22T16:50:34.720Z · comments (1)

Counting AGIs
cash (cshunter) · 2024-11-26T00:06:17.845Z · comments (10)

A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps
Linch · 2024-11-18T00:44:57.133Z · comments (2)

Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.
Andrew_Critch · 2024-09-11T04:41:24.872Z · comments (11)

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)
Joe Carlsmith (joekc) · 2024-10-28T21:57:12.063Z · comments (5)

[question] If I wanted to spend WAY more on AI, what would I spend it on?
Logan Zoellner (logan-zoellner) · 2024-09-15T21:24:46.742Z · answers+comments (16)

[link] The Mysterious Trump Buyers on Polymarket
Annapurna (jorge-velez) · 2024-10-18T13:26:25.565Z · comments (9)

How to Give in to Threats (without incentivizing them)
Mikhail Samin (mikhail-samin) · 2024-09-12T15:55:50.384Z · comments (26)

Parental Writing Selection Bias
jefftk (jkaufman) · 2024-10-13T14:00:03.225Z · comments (3)

[link] Anthropic's updated Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2024-10-15T16:46:48.727Z · comments (3)

Claude Sonnet 3.5.1 and Haiku 3.5
Zvi · 2024-10-24T14:50:06.286Z · comments (9)

Model evals for dangerous capabilities
Zach Stein-Perlman · 2024-09-23T11:00:00.866Z · comments (11)

[link] Prices are Bounties
Maxwell Tabarrok (maxwell-tabarrok) · 2024-10-12T14:51:40.689Z · comments (13)

Applications of Chaos: Saying No (with Hastings Greer)
Elizabeth (pktechgirl) · 2024-09-21T16:30:07.415Z · comments (16)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations
Steven Byrnes (steve2152) · 2024-10-29T13:36:16.325Z · comments (2)

The Fragility of Life Hypothesis and the Evolution of Cooperation
KristianRonn · 2024-09-04T21:04:49.878Z · comments (6)

AI #82: The Governor Ponders
Zvi · 2024-09-19T13:30:04.863Z · comments (8)

Metastatic Cancer Treatment Since 2010: The Success Stories
sarahconstantin · 2024-11-04T22:50:09.386Z · comments (2)

[link] Can AI Outpredict Humans? Results From Metaculus's Q3 AI Forecasting Benchmark
ChristianWilliams · 2024-10-10T18:58:46.041Z · comments (2)

[link] cancer rates after gene therapy
bhauth · 2024-10-16T15:32:53.949Z · comments (0)

A Conflicted Linkspost
Screwtape · 2024-11-21T00:37:54.035Z · comments (0)

Low Probability Estimation in Language Models
Gabriel Wu (gabriel-wu) · 2024-10-18T15:50:05.947Z · comments (0)

Evaluating the truth of statements in a world of ambiguous language.
Hastings (hastings-greer) · 2024-10-07T18:08:09.920Z · comments (19)

AI and the Technological Richter Scale
Zvi · 2024-09-04T14:00:08.625Z · comments (8)

Interested in Cognitive Bootcamp?
Raemon · 2024-09-19T22:12:13.348Z · comments (0)

[link] Book review: Xenosystems
jessicata (jessica.liu.taylor) · 2024-09-16T20:17:56.670Z · comments (18)

[link] Active Recall and Spaced Repetition are Different Things
Saul Munn (saul-munn) · 2024-11-08T20:14:56.092Z · comments (2)

[link] a space habitat design
bhauth · 2024-11-25T17:28:48.481Z · comments (9)

Demis Hassabis and Geoffrey Hinton Awarded Nobel Prizes
Anna Gajdova (anna-gajdova) · 2024-10-09T12:56:24.856Z · comments (14)

An alternative approach to superbabies
Towards_Keeperhood (Simon Skade) · 2024-11-05T22:56:15.740Z · comments (19)

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams · 2024-11-07T15:39:06.854Z · comments (6)

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
aphyer · 2024-10-29T01:21:03.075Z · comments (12)

Secular Solstice Round Up 2024
dspeyer · 2024-11-21T10:49:36.682Z · comments (10)

[link] [Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF
Leon Lang (leon-lang) · 2024-10-22T13:57:41.125Z · comments (0)

Which evals resources would be good?
Marius Hobbhahn (marius-hobbhahn) · 2024-11-16T14:24:48.012Z · comments (4)

Toy Models of Feature Absorption in SAEs
chanind · 2024-10-07T09:56:53.609Z · comments (8)

[link] What Ketamine Therapy Is Like
Sable · 2024-11-11T11:09:08.602Z · comments (8)

Conflating value alignment and intent alignment is causing confusion
Seth Herd · 2024-09-05T16:39:51.967Z · comments (18)

[link] Michael Dickens' Caffeine Tolerance Research
niplav · 2024-09-04T15:41:53.343Z · comments (3)

Looking back on the Future of Humanity Institute - Asterisk
jakeeaton · 2024-11-19T00:44:40.928Z · comments (0)

AI as a powerful meme, via CGP Grey
TheManxLoiner · 2024-10-30T18:31:58.544Z · comments (8)

Recent comments

ben-lang on A very strange probability paradox

Yes, it's a bit weird. I was replying because I thought (perhaps getting the wrong end of the stick) that you were confused about what the question was, not (as it seems now) pointing out that the question (in your view) is open to misinterpretation.

In probability theory the phrase "given that" is very important, and it is (as far as I know) always used in the way it is used here: "given that X happens" means "X may or may not happen, but we are thinking about the cases where it does", which is very different from meaning "X always happens".

A more common use would be "What is the probability that a person is sick, given that they are visiting a doctor right now?". This doesn't mean "everyone in the world is visiting a doctor right now", it means that the people who are not visiting a doctor right now exist, but we are not talking about them. Similarly, the original post's imagined world involves cases where odd numbers are rolled, but we are talking about the set without odds. It is weird to think about how proposing a whole set of imaginary situations (odd and even rolls) then talking only about a subset of them (only evens) is NOT the same as initially proposing the smaller set of imaginary events in the first place (your D3 labelled 2,4,6).

But yes, I can definitely see how the phrase "given that" could be interpreted the other way.
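The difference between conditioning on a subset of a larger experiment and starting from a smaller experiment can be checked by simulation. This is a minimal sketch that assumes the post's setup is the well-known version of the paradox (roll a fair six-sided die until the first 6; what is the expected number of rolls, given that every roll was even?); the function names are hypothetical. Runs containing an odd roll are thrown away rather than re-rolled, which is what "given that" means here.

```python
import random


def conditioned_d6(trials: int, rng: random.Random) -> float:
    """Assumed setup: roll a fair d6 until the first 6, then keep only the
    runs in which every roll was even. Returns the mean length of kept runs."""
    kept_lengths = []
    for _ in range(trials):
        length = 0
        while True:
            roll = rng.randint(1, 6)
            length += 1
            if roll % 2 == 1:
                break  # odd roll: the whole run is discarded, not re-rolled
            if roll == 6:
                kept_lengths.append(length)
                break
    return sum(kept_lengths) / len(kept_lengths)


def d3_evens(trials: int, rng: random.Random) -> float:
    """The smaller experiment: a three-sided die labelled 2, 4, 6,
    rolled until the first 6. Returns the mean number of rolls."""
    total = 0
    for _ in range(trials):
        length = 1
        while rng.choice((2, 4, 6)) != 6:
            length += 1
        total += length
    return total / trials


rng = random.Random(0)
print(conditioned_d6(50_000, rng))  # should be near 1.5 (exact value 3/2)
print(d3_evens(50_000, rng))        # should be near 3.0
```

Discarding the runs that contain an odd roll removes long runs disproportionately, since a long run has more chances to hit an odd number. That is why the conditioned mean is 3/2 rather than the d3's 3.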

joachim-bartosik on You are not too "irrational" to know your preferences.

If they can’t do that, why on earth should you give up on your preferences? In what bizarro world would that sort of acquiescence to someone else’s self-claimed authority be “rational?”

Well, if they consistently make recommendations that in retrospect end up looking good, then maybe you're bad at understanding. Or maybe they're bad at explaining. But trusting them when you don't understand their recommendations is exploitable, so maybe they're running a strategy where they deliberately make good recommendations with poor explanations, so that once you start trusting them they can start mixing in exploitative recommendations (which you can't tell apart, because all recommendations have poor explanations).

So I'd really rather not do that in a community context. There are ways to work with that. E.g. a boss can skip some details of an employee's recommendations and, if the results are bad enough, fire the employee. On the other hand, I think it's pretty common for employees to act in their own interest. But yeah, we're talking about the principal-agent problem at that point, and tradeoffs about what's more efficient...

mfar on The Queen’s Dilemma: A Paradox of Control

W. Ross Ashby's Law of Requisite Variety (1956) suggests fundamental limits to human control over more capable systems.

This law sounds super enticing and I want to understand it more. Could you spell out how the law suggests this?

I did a quick search of LessWrong and Wikipedia regarding this law.

"... Ashby's "Law of requisite variety", which roughly speaking states that a system can only remain in homeostasis if it has more internal states than the external states it encounters." from Yuxi_Liu, "Cybernetic dreams".
"Either the AI is too simple to be an independent robust agent in human society, or it needs to be approximately as complex as humans themselves. Cf. the law of requisite variety." from Roman Leventov, "For alignment, we should simultaneously use multiple theories of cognition and value".
"This law (of which Shannon's theorem 10 relating to the suppression of noise is a special case) says that if a certain quantity of disturbance is prevented by a regulator from reaching some essential variables, then that regulator must be capable of exerting at least that quantity of selection." from W. R. Ashby (1960), "Design for a Brain", p. 229, quoted via the Wikipedia page.

Enough testimonials: the Wikipedia page itself describes the law as based on the observation that in a two-player game between the environment (disturber) and a system trying to maintain stasis (regulator), if the environment has D moves that all lead to different outcomes (given any move from the system), and the system has R possible responses, then the best the system can do is restrict the number of outcomes to D/R.

I can see the link between this and the descriptions from Yuxi_Liu, Roman Leventov, and Ashby. Your reading is a couple of steps removed. How did you get from D/R outcomes in this game to "fundamental limits to human control over more capable systems"? My guess is that you simply mean that if the more capable system is more complex / has more available moves / more "variety" than humans, then the law will apply with the human as the regulator and the AI as the disturber. Is that right? Could you comment on how you see capability in terms of variety?
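The D/R bound described in this comment can be checked by brute force on small cases. This is a minimal sketch under an assumed, hypothetical outcome table, outcome(d, r) = (d + r) mod D, chosen only because it satisfies the stated condition that, for any fixed response r, different disturbances d lead to different outcomes. Under that condition the regulator can never get below ceil(D/R) outcomes (D/R rounded up): some response must be used for at least ceil(D/R) disturbances, and each of those yields a different outcome. For this particular table the bound is reached.

```python
from itertools import product
from math import ceil


def outcome(d: int, r: int, D: int) -> int:
    # Hypothetical outcome table: for any fixed response r, distinct
    # disturbances d map to distinct outcomes, as the law assumes.
    return (d + r) % D


def min_outcomes(D: int, R: int) -> int:
    """Try every regulator policy (one response per disturbance) and return
    the fewest distinct outcomes any policy lets through."""
    best = D
    for policy in product(range(R), repeat=D):  # policy[d] = response to d
        reached = {outcome(d, policy[d], D) for d in range(D)}
        best = min(best, len(reached))
    return best


for D, R in [(6, 1), (6, 2), (6, 3), (7, 3), (6, 6)]:
    print(f"D={D} R={R}: best policy leaves {min_outcomes(D, R)} outcomes; "
          f"ceil(D/R) = {ceil(D / R)}")
```

Brute force is exponential (R^D policies), so this only illustrates the counting argument on toy sizes.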

daystareld on You are not too "irrational" to know your preferences.

I'm a little confused. Do the examples in the post all seem purely hypothetical to you?

Whether or not it's rational to have ice cream.
Whether or not wanting your partner to do housework is reasonable.
Whether or not you want to receive unfiltered criticism or judgements.
Whether being mono vs open vs poly is a sign of rationality.
Whether your career preference is a sign of rationality.

Or are they not sufficiently detailed, or...? They're all real things I have encountered, and obviously not all are equally detailed, and I could always add more, but if it doesn't seem concrete enough yet, I'm not sure what else to add or in how much detail.

viliam on Making a conservative case for alignment

I think I still don't understand the main conflict which bothers you.

Two major points.

1) It annoys me if someone insists that I accept their theory about what being trans really is.

Zack insists that Blanchard is right, and that I fail at rationality if I disagree with him. People on Twitter and Reddit insist that Blanchard is wrong, and that I fail at being a decent human if I disagree with them. My opinion is that I have no comparative advantage at figuring out who is right and who is wrong on this topic (or maybe everyone is wrong); anyway, it is an empirical question and I don't have the data. I hope that people who have more data and better education will one day sort it out, but until that happens, my position firmly remains "I don't know (and most likely neither do you), stop bothering me".

Also, from a larger perspective, this is moving the goalposts. Long ago, tolerance was defined as basically not hurting other people, and letting them do whatever they want as long as it does not hurt others. Recently it also includes agreeing with the beliefs of their woke representatives. (Note that this is about the representatives, not the people being represented. Two trans people can have different opinions, but you are required to believe the woke one and oppose the non-woke one.) Otherwise, you are transphobic. I completely reject that. Furthermore, I claim that even trans people themselves are not necessarily experts on themselves. Science exists for a reason, otherwise we could just make opinion polls.

In short: disagreement is not hate. But the two often get conflated, especially in environments that overwhelmingly contain people of one political tribe.

2) Every cause gets abused. It is bad if it becomes a taboo to point this out.

A few months (or is it already years?) ago, there was an epidemic of teenagers on TikTok who appeared to have developed Tourette syndrome overnight. A few weeks or months later, apparently the epidemic was gone. I have no way to check those teenagers, but I think it is reasonable to assume that many of them were faking it. Why would anyone do that? Most likely, attention seeking. (There is also a thing called Munchausen syndrome.) This is what I referred to as "cosplayers".

Note that this is completely different from saying that Tourette syndrome does not exist.

If you adopt a rule that e.g. everyone must use everyone else's preferred pronouns all the time, no exceptions, and you get banned for hate speech otherwise, this becomes a perfect opportunity for... anyone who enjoys using it as leverage. You get an explosion of pronouns: it starts with "he" and "she", proceeds with "they", then you get "xe", "ve", "foo", "bar", "baz", and ultimately anyone is free to make up their own pronouns, and everyone else is required to play along, or else. (That's when you get the "attack helicopters" as an attempt to point out the absurdity of the system.)

Again, moving the goalposts. We started with trans people who report feeling gender dysphoria, so we use their preferred pronouns to alleviate their suffering. So far, okay. But if there is a person who actually feels dysphoria from not being addressed as "ve" (someone who would be triggered by calling them any of: "he", "she", or "they"), then I believe that this is between them and their psychiatrist, and I want to be left out of this game.

Another annoying thing is how often this is used to derail the debate (on places like Twitter and Reddit). Suppose that someone is called "John" and has a male-passing photo. So you try to say something about John, and you automatically use the pronoun "he". Big mistake! You haven't noticed it, but John recently started identifying as agender. And whatever you wanted to talk about originally is unimportant now, and the thread becomes about what a horrible person you are. Okay, you have learned your lesson; but the point is that next time someone else is going to make the same mistake. So it basically becomes impossible to discuss John, ever. And sometimes, it is important to be able to discuss John without getting the debate predictably derailed.

In short: misgendering should be considered bad manners, but not something you ban people for.

...and that's basically all.

mitchell_porter on Dave Kasten's AGI-by-2027 vignette

Started promisingly, but like everyone else, I don't believe in the ten-year gap from AGI to ASI. If anything, we got a kind of AGI in 2022 (with ChatGPT), and we'll get ASI by 2027, from something like your "cohort of Shannon instances".

daystareld on You are not too "irrational" to know your preferences.

Completely agree, and for what it's worth, I don't think anything in the frame of my post contradicts these points.

"You either do or do not feel a want" is not the same as "you either do now or you never will," and I note that conditioning is also a cause of preferences, though I will edit to highlight that this is an ongoing process in case it sounds like I was saying it's all locked-in from some vague "past" or developmental experiences, which was not my intent.

mfar on The Queen’s Dilemma: A Paradox of Control

I like this analogy, but there are a couple of features that I think make it hard to think about:

1. The human wants to play, not just to win. You stipulated that "the human aims to win, and instructs their AI teammate to prioritise winning above all else". The dilemma then arises because the aim to win cuts against the human having agency and control. Your takeaway is "Even perfectly aligned systems, genuinely pursuing human goals, might naturally evolve to restrict human agency."

So in this analogy, it seems that "winning" stands for the human's true goals. But (as you acknowledge) it seems like the human doesn't just want to win, but actually wants both some "winning" and some "agency". You've implicitly tried to factor the entirety of the human's goals into the outcome of the game, but you have left some of the agency behind, outside of this objective, and this is what creates the dilemma.

For an AI system that is truly 'perfectly aligned,' truly pursuing the human's goals, it seems like either (A) the AI partner would not pursue winning above all else, but would allow some human control at the cost of some 'winning', or (B) if it were possible to actually factor the human's meta-preference for having agency into 'winning', then we shouldn't care if the AI plays to win above all else, because that already accounts for the human's desired amount of agency.

For an AI system that is NOT perfectly aligned, this becomes a different game (in the sense of game theory). It's a three-player game between the AI partner, the human partner, and the opponent, each of which has different objectives (the difference between the AI and human partners is that the human wants some combination of 'winning' and 'agency' while the AI just wants 'winning'; probably the opponent just wants both of them to lose). One interesting dynamic that could then arise is that the human partner could threaten and punish the AI partner by making worse moves than the best moves they can see if the AI doesn't give them enough control. To stop the human from doing this, the AI has to either (C) negotiate to give the human some control, or (D) remove all control from the human (e.g. force the queen to have no bad moves or no moves at all). In particular, (D) seems like it would be expensive for the AI partner as it requires playing without the queen, so maybe the AI will let the human play sometimes.

2. I don't think it needs to be a stochastic chess variant. The game is set up so that the human gets to play whenever they roll a 6 on a (presumably six-sided) die. You said this stands in for the idea that in the real world, the AI system makes decisions on a faster timescale than the human. But implementing the speed differential with this particular game mechanism comes at the cost of making the chess variant stochastic. I think that determinism is an important feature of standard chess. In theory, you can solve chess with an adversarial look-ahead search: minimax, alpha-beta pruning, etc. But as soon as dice become involved, all of the players have to switch to expectiminimax. Rolling a six can suddenly throw off the tempo in your delicate exchange or your whirlwind manoeuvre. Etc.

I'm a novice at chess, so it's not like this is going to make a difference to how I think about the analogy (I will struggle to think strategically in both cases). And maybe a sufficiently accomplished chess player is familiar with stochastic variants already. But for someone in-between who is familiar with deterministic chess, maybe it's easier to consider a non-stochastic variant of the chess game, for example where the human gets the option to play every 6 turns (deterministically), which gives the same speed differential in expectation.
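The switch from minimax to expectiminimax mentioned in point 2 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a chess engine: the node classes and toy payoffs are hypothetical, and leaf values are taken as the AI side's payoff. Max and Min nodes behave as in ordinary minimax, while a Chance node takes a probability-weighted average over its outcomes, here with a 1/6 branch standing in for the die roll.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    value: float  # payoff to the maximizing side (hypothetical units)


@dataclass
class Max:
    children: List["Node"]


@dataclass
class Min:
    children: List["Node"]


@dataclass
class Chance:
    outcomes: List[Tuple[float, "Node"]]  # (probability, child) pairs


Node = Union[Leaf, Max, Min, Chance]


def expectiminimax(node: Node) -> float:
    if isinstance(node, Leaf):
        return node.value
    if isinstance(node, Max):
        return max(expectiminimax(c) for c in node.children)
    if isinstance(node, Min):
        return min(expectiminimax(c) for c in node.children)
    # Chance node: average over outcomes instead of choosing one.
    return sum(p * expectiminimax(c) for p, c in node.outcomes)


# Toy position: a quiet line the opponent holds to 0.4, versus a sharp line
# that a die roll disrupts with probability 1/6 (value 0.0 on that branch).
root = Max([
    Min([Leaf(0.4), Leaf(0.6)]),
    Chance([(1 / 6, Leaf(0.0)), (5 / 6, Min([Leaf(0.9), Leaf(1.0)]))]),
])
print(expectiminimax(root))  # ≈ 0.75, so the sharp line is preferred
```

Because a chance node averages rather than picks, ordinary alpha-beta cutoffs do not apply directly at those nodes, which is part of why dice change how the game has to be searched.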

darrenreynolds on A very strange probability paradox

Yes, exactly - thank you. It depends on the interpretation of the phrase "given that all rolls were even". Most ordinary people will assume it means that all the rolls were even, but as you have succinctly explained, that is not what it means in the specialist language of mathematics. It is only when you apply the latter interpretation (that some of the rolls are odd, but we throw those out afterwards) that the result becomes surprising at first.

I do find LessWrong a curious place and am not a regular here. You can post something and it will get downvoted as wrong, then someone else comes along and says exactly the same thing and it's marked as correct. Heh.

lee-aao on OpenAI Email Archives (from Musk v. Altman)

Greg Brockman to Elon Musk, (cc: Sam Altman) - Nov 22, 2015 6:11 PM

In response to this follow-up, Elon first mentions that $100M is not enough, and that he is encouraging OpenAI to raise more money on their own, and promises to increase the amount they can raise to $1B.

I found this on the OpenAI blog: https://openai.com/index/openai-elon-musk/
There are a couple of other messages there, with the vibe that the OpenAI team felt betrayed by Elon.

We're sad that it's come to this with someone whom we’ve deeply admired—someone who inspired us to aim higher, then told us we would fail, started a competitor, and then sued us when we started making meaningful progress towards OpenAI’s mission without him.

@habryka can you pls check the link? I think these messages could have added more context. Not sure why they weren't also included in the original source, though.