LessWrong 2.0 Reader

Base LLMs refuse too
Connor Kissane (ckkissane) · 2024-09-29T16:04:21.343Z · comments (20)
[link] An Opinionated Evals Reading List
Marius Hobbhahn (marius-hobbhahn) · 2024-10-15T14:38:58.778Z · comments (0)
AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II
Lester Leong (lester-leong) · 2024-10-14T04:05:05.096Z · comments (9)
Toward Safety Cases For AI Scheming
Mikita Balesni (mykyta-baliesnyi) · 2024-10-31T17:20:06.019Z · comments (1)
Against empathy-by-default
Steven Byrnes (steve2152) · 2024-10-16T16:38:49.926Z · comments (24)
[link] Linkpost: Memorandum on Advancing the United States’ Leadership in Artificial Intelligence
Nisan · 2024-10-25T04:37:00.828Z · comments (2)
Why our politicians aren't Median
Yair Halberstadt (yair-halberstadt) · 2024-11-03T14:03:33.779Z · comments (15)
The Geometry of Feelings and Nonsense in Large Language Models
7vik (satvik-golechha) · 2024-09-27T17:49:27.420Z · comments (10)
Mira Murati leaves OpenAI / OpenAI to remove non-profit control
Sodium · 2024-09-25T21:15:17.315Z · comments (4)
[link] The Alignment Trap: AI Safety as Path to Power
crispweed · 2024-10-29T15:21:26.545Z · comments (17)
AI #86: Just Think of the Potential
Zvi · 2024-10-17T15:10:06.552Z · comments (8)
AI #84: Better Than a Podcast
Zvi · 2024-10-03T15:00:07.128Z · comments (7)
[link] How much I'm paying for AI productivity software (and the future of AI use)
jacquesthibs (jacques-thibodeau) · 2024-10-11T17:11:27.025Z · comments (16)
A Path out of Insufficient Views
Unreal · 2024-09-24T20:00:27.332Z · comments (46)
Safe Predictive Agents with Joint Scoring Rules
Rubi J. Hudson (Rubi) · 2024-10-09T16:38:16.535Z · comments (10)
[Intuitive self-models] 5. Dissociative Identity (Multiple Personality) Disorder
Steven Byrnes (steve2152) · 2024-10-15T13:31:46.157Z · comments (7)
[link] How Likely Are Various Precursors of Existential Risk?
NunoSempere (Radamantis) · 2024-10-28T13:27:31.620Z · comments (4)
[question] Could orcas be (trained to be) smarter than humans?
Towards_Keeperhood (Simon Skade) · 2024-11-04T23:29:26.677Z · answers+comments (10)
Seeking Collaborators
abramdemski · 2024-11-01T17:13:36.162Z · comments (14)
Personal AI Planning
jefftk (jkaufman) · 2024-11-10T14:00:06.837Z · comments (6)
[link] Slightly More Than You Wanted To Know: Pregnancy Length Effects
JustisMills · 2024-10-21T01:26:02.030Z · comments (4)
AI #87: Staying in Character
Zvi · 2024-10-29T07:10:08.212Z · comments (3)
Parental Writing Selection Bias
jefftk (jkaufman) · 2024-10-13T14:00:03.225Z · comments (3)
[link] The Mysterious Trump Buyers on Polymarket
Annapurna (jorge-velez) · 2024-10-18T13:26:25.565Z · comments (9)
How might we solve the alignment problem? (Part 1: Intro, summary, ontology)
Joe Carlsmith (joekc) · 2024-10-28T21:57:12.063Z · comments (5)
Claude Sonnet 3.5.1 and Haiku 3.5
Zvi · 2024-10-24T14:50:06.286Z · comments (9)
Model evals for dangerous capabilities
Zach Stein-Perlman · 2024-09-23T11:00:00.866Z · comments (9)
[link] Anthropic's updated Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2024-10-15T16:46:48.727Z · comments (3)
[link] Prices are Bounties
Maxwell Tabarrok (maxwell-tabarrok) · 2024-10-12T14:51:40.689Z · comments (13)
[link] Can AI Outpredict Humans? Results From Metaculus's Q3 AI Forecasting Benchmark
ChristianWilliams · 2024-10-10T18:58:46.041Z · comments (2)
[Intuitive self-models] 6. Awakening / Enlightenment / PNSE
Steven Byrnes (steve2152) · 2024-10-22T13:23:08.836Z · comments (5)
Low Probability Estimation in Language Models
Gabriel Wu (gabriel-wu) · 2024-10-18T15:50:05.947Z · comments (0)
[link] Active Recall and Spaced Repetition are Different Things
Saul Munn (saul-munn) · 2024-11-08T20:14:56.092Z · comments (2)
Evaluating the truth of statements in a world of ambiguous language.
Hastings (hastings-greer) · 2024-10-07T18:08:09.920Z · comments (19)
An alternative approach to superbabies
Towards_Keeperhood (Simon Skade) · 2024-11-05T22:56:15.740Z · comments (19)
Toward Safety Case Inspired Basic Research
Lucas Teixeira · 2024-10-31T23:06:32.854Z · comments (2)
Demis Hassabis and Geoffrey Hinton Awarded Nobel Prizes
Anna Gajdova (anna-gajdova) · 2024-10-09T12:56:24.856Z · comments (14)
D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
aphyer · 2024-10-29T01:21:03.075Z · comments (12)
Toy Models of Feature Absorption in SAEs
chanind · 2024-10-07T09:56:53.609Z · comments (7)
Bounty for Evidence on Some of Palisade Research's Beliefs
benwr · 2024-09-23T20:01:20.917Z · comments (4)
The Shallow Bench
Karl Faulks (karl-faulks) · 2024-11-05T05:07:27.357Z · comments (5)
AI as a powerful meme, via CGP Grey
TheManxLoiner · 2024-10-30T18:31:58.544Z · comments (6)
AI #88: Thanks for the Memos
Zvi · 2024-10-31T15:00:07.412Z · comments (5)
[Intuitive self-models] 8. Rooting Out Free Will Intuitions
Steven Byrnes (steve2152) · 2024-11-04T18:16:26.736Z · comments (9)
~80 Interesting Questions about Foundation Model Agent Safety
RohanS · 2024-10-28T16:37:04.713Z · comments (4)
Start an Upper-Room UV Installation Company?
jefftk (jkaufman) · 2024-10-19T02:00:10.691Z · comments (9)
Motivation control
Joe Carlsmith (joekc) · 2024-10-30T17:15:50.881Z · comments (7)
[link] cancer rates after gene therapy
bhauth · 2024-10-16T15:32:53.949Z · comments (0)
Minimal Motivation of Natural Latents
johnswentworth · 2024-10-14T22:51:58.125Z · comments (14)
Australian AI Safety Forum 2024
Liam Carroll (liam-carroll) · 2024-09-27T00:40:11.451Z · comments (0)