LessWrong 2.0 Reader

The Limitations of GPT-4
p.b. · 2023-11-24T15:30:30.933Z · comments (12)

Singular learning theory and bridging from ML to brain emulations
kave · 2023-11-01T21:31:54.789Z · comments (16)

The Overkill Conspiracy Hypothesis
ymeskhout · 2023-10-20T16:51:20.308Z · comments (8)

Vote in the LessWrong review! (LW 2022 Review voting phase)
habryka (habryka4) · 2024-01-17T07:22:17.921Z · comments (9)

Losing Metaphors: Zip and Paste
jefftk (jkaufman) · 2023-11-29T20:31:07.464Z · comments (6)

What is the best argument that LLMs are shoggoths?
JoshuaFox · 2024-03-17T11:36:23.636Z · comments (22)

[link] Let's Design A School, Part 2.1 School as Education - Structure
Sable · 2024-05-02T22:04:30.435Z · comments (2)

D&D.Sci Hypersphere Analysis Part 3: Beat it with Linear Algebra
aphyer · 2024-01-16T22:44:52.424Z · comments (1)

Quick takes on "AI is easy to control"
So8res · 2023-12-02T22:31:45.683Z · comments (49)

Causality is Everywhere
silentbob · 2024-02-13T13:44:49.952Z · comments (12)

Three Types of Constraints in the Space of Agents
Nora_Ammann · 2024-01-15T17:27:27.560Z · comments (3)

Meetup In a Box: Year In Review
Czynski (JacobKopczynski) · 2024-02-14T01:18:28.259Z · comments (0)

Evaluating Solar
jefftk (jkaufman) · 2024-02-17T21:50:04.783Z · comments (5)

How do LLMs give truthful answers? A discussion of LLM vs. human reasoning, ensembles & parrots
Owain_Evans · 2024-03-28T02:34:21.799Z · comments (0)

AI #57: All the AI News That’s Fit to Print
Zvi · 2024-03-28T11:40:05.435Z · comments (14)

[link] Positive visions for AI
L Rudolf L (LRudL) · 2024-07-23T20:15:26.064Z · comments (4)

A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)
Lao Mein (derpherpize) · 2024-09-20T13:13:26.181Z · comments (7)

Optimizing Repeated Correlations
SatvikBeri · 2024-08-01T17:33:23.823Z · comments (1)

[link] Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)
mattmacdermott · 2024-09-01T07:46:26.647Z · comments (0)

LessWrong email subscriptions?
Raemon · 2024-08-27T21:59:56.855Z · comments (6)

The Wisdom of Living for 200 Years
Martin Sustrik (sustrik) · 2024-06-28T04:44:10.609Z · comments (3)

[link] Beware the science fiction bias in predictions of the future
Nikita Sokolsky (nikita-sokolsky) · 2024-08-19T05:32:47.372Z · comments (20)

[link] Fictional parasites very different from our own
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-08T14:59:39.080Z · comments (0)

[question] What's the Deal with Logical Uncertainty?
Ape in the coat · 2024-09-16T08:11:43.588Z · answers+comments (21)

[link] Announcing Open Philanthropy's AI governance and policy RFP
Julian Hazell (julian-hazell) · 2024-07-17T02:02:39.933Z · comments (0)

Housing Roundup #9: Restricting Supply
Zvi · 2024-07-17T12:50:05.321Z · comments (8)

Distinguish worst-case analysis from instrumental training-gaming
Olli Järviniemi (jarviniemi) · 2024-09-05T19:13:34.443Z · comments (0)

[link] A primer on the next generation of antibodies
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-01T22:37:59.207Z · comments (0)

The Drowning Child
Tomás B. (Bjartur Tómas) · 2023-10-22T16:39:53.016Z · comments (8)

[link] OpenAI Superalignment: Weak-to-strong generalization
Dalmert · 2023-12-14T19:47:24.347Z · comments (3)

[question] Impressions from base-GPT-4?
mishka · 2023-11-08T05:43:23.001Z · answers+comments (25)

Fertility Roundup #2
Zvi · 2023-10-17T13:20:01.901Z · comments (30)

Changing Contra Dialects
jefftk (jkaufman) · 2023-10-26T17:30:10.387Z · comments (2)

Improving SAE's by Sqrt()-ing L1 & Removing Lowest Activating Features
Logan Riggs (elriggs) · 2024-03-15T16:30:00.744Z · comments (5)

[link] Was a Subway in New York City Inevitable?
Jeffrey Heninger (jeffrey-heninger) · 2024-03-30T00:53:21.314Z · comments (4)

$250K in Prizes: SafeBench Competition Announcement
ozhang (oliver-zhang) · 2024-04-03T22:07:41.171Z · comments (0)

[link] The Coming Wave
PeterMcCluskey · 2023-09-28T22:59:58.551Z · comments (1)

[question] What ML gears do you like?
Ulisse Mini (ulisse-mini) · 2023-11-11T19:10:11.964Z · answers+comments (4)

[link] Report: Evaluating an AI Chip Registration Policy
Deric Cheng (deric-cheng) · 2024-04-12T04:39:45.671Z · comments (0)

Control Symmetry: why we might want to start investigating asymmetric alignment interventions
domenicrosati · 2023-11-11T17:27:10.636Z · comments (1)

Why I got the smallpox vaccine in 2023
joec · 2023-10-02T05:11:41.249Z · comments (6)

[link] **In defence of Helen Toner, Adam D'Angelo, and Tasha McCauley**
mrtreasure · 2023-12-06T02:02:32.004Z · comments (3)

Evaluating hidden directions on the utility dataset: classification, steering and removal
Annah (annah) · 2023-09-25T17:19:13.988Z · comments (3)

Weighing Animal Worth
jefftk (jkaufman) · 2023-09-28T13:50:06.752Z · comments (11)

Useful starting code for interpretability
eggsyntax · 2024-02-13T23:13:47.940Z · comments (2)

Is Yann LeCun strawmanning AI x-risks?
Chris_Leong · 2023-10-19T11:35:08.167Z · comments (4)

If a little is good, is more better?
DanielFilan · 2023-11-04T07:10:05.943Z · comments (16)

Virtually Rational - VRChat Meetup
Tomás B. (Bjartur Tómas) · 2024-01-28T05:52:36.934Z · comments (3)

Testing for consequence-blindness in LLMs using the HI-ADS unit test.
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2023-11-24T23:35:29.560Z · comments (2)

AXRP Episode 30 - AI Security with Jeffrey Ladish
DanielFilan · 2024-05-01T02:50:04.621Z · comments (0)

Recent comments

matthew-barnett on ASIs will not leave just a little sunlight for Earth

Workers regularly trade with billionaires and earn more than $77 in wages, despite vast differences in wealth. Countries trade with each other despite vast differences in military power. In fact, some countries have no military forces at all, or only very small ones, and yet do not get invaded by their neighbors or by the United States.

It is possible that these facts are explained by generosity on the part of billionaires and other countries, but the standard social science explanation says that this is not the case. Rather, the standard explanation is that war is usually (though not always) more costly than trade when compromise is a viable option. Thus, people usually choose to trade rather than go to war with each other when they want stuff. This is true even in the presence of large differences in power.

I mostly don't see this post as engaging with any of the best reasons one might expect smarter-than-human AIs to compromise with humans. In contrast to you, I think it's important to note that AIs will be created within an existing system of law and property rights. Unlike animals, they'll be able to communicate with us and make contracts. It therefore seems perfectly plausible for AIs to simply get rich within the system we have already established, and make productive compromises, rather than violently overthrowing the system itself.

That doesn't rule out the possibility that the future will be very alien, or that it will turn out in a way that humans do not endorse. I'm also not saying that humans will always own all the wealth and control everything permanently forever. I'm simply arguing against the point that smart AIs will automatically turn violent and steal from agents who are less smart than they are, unless they're value aligned. This is a claim that I don't think has been established with any reasonable degree of rigor.

richard_kennaway on We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap

I can still be interested, even if I don't have the answers.

hmys on My 10-year retrospective on trying SSRIs

I've used SSRIs for maybe 5 years, and I think they've been really useful, with no negative effects, and more or less unwavering efficacy. The only exception is that they've non-negligibly lowered my libido. But to be honest, I don't mind it that much.

Also, the few times I've had to go without them for a while (when travelling, because I was stupid enough not to bring enough), the withdrawal effects were quite strange and somewhat scary.

I also feel they had some very strange positive effects. For example, I think they improved my reaction time by quite a bit, although that could be something random coinciding with starting SSRIs, or just me being confused. I haven't tested it. On Human Benchmark I score around the same now as I did in high school. But I feel like I can catch falling things much more reliably, and this was an almost immediate effect after starting.

cosmia_nebula on Applications of Chaos: Saying No (with Hastings Greer)

It seems to me that chaos control and anti-control is another non-application.

[Handbook of Chaos Control: Schöll, Eckehard, Schuster, Heinz Georg](https://www.amazon.com/Handbook-Chaos-Control-Eckehard-Sch%C3%B6ll/dp/3527406050)

keltan on keltan's Shortform

If I had sufficient funds, I would consider whether it would be beneficial to invade a few subreddits and offer $1000 to whoever can make the most viral meme that subtly teaches basic concepts of AI Doom.

This thought stems from a comment on “The Best Lay Argument is not a Simple English Yud Essay”. I have more thoughts, but not much time. If my reasoning is unclear, I apologise.

benjy_forstadt on Another argument against utility-centric alignment paradigms

I don’t think the way you split things up into Alpha and Beta quite carves things at the joints. If you take an individual human as Beta, then stuff like “eudaimonia” is in Alpha: it’s a concept in the cultural environment that we get exposed to and sometimes come to value. The vast majority of an individual human’s values are not new abstractions that we develop over the course of our training process (for most people at least).

yonatan-cale-1 on My 10-year retrospective on trying SSRIs

Thanks for sharing <3

My main concern about trying SSRIs is that they'll make me stop noticing certain things that I care about, things that currently manifest as anxiety or so.

Opinions?

zack_m_davis on ASIs will not leave just a little sunlight for Earth

if there's a bunch of superintelligences running around and they don't care about you—no, they will not spare just a little sunlight to keep Earth alive.

Yes, I agree that this conditional statement is obvious. But while we're on the general topic of whether Earth will be kept alive, it would be nice to see some engagement with Paul Christiano's arguments (which Carl Shulman "agree[s] with [...] approximately in full") that superintelligences might care about what happens to you a little bit, articulated in a comment thread on Soares's "But Why Would the AI Kill Us?" and another thread on "Cosmopolitan Values Don't Come Free".

The reason I think this is important is that "[t]o argue against an idea honestly, you should argue against the best arguments of the strongest advocates": if you write 3000 words inveighing against people who think comparative advantage means that horses can't get sent to glue factories, that doesn't license the conclusion that superintelligence Will Definitely Kill You if there are other reasons why superintelligence Might Not Kill You that don't stop being real just because very few people have the expertise to formulate them carefully.

(An important caveat: the possibility of superintelligences having human-regarding preferences may or may not be comforting: as a fictional illustration of some relevant considerations, the Superhappies in "Three Worlds Collide" cared about the humans to some extent, but not in the specific way that the humans wanted to be cared for.)

Now, you are on the record stating that you "sometimes mention the possibility of being stored and sold to aliens a billion years later, which seems to [you] to validly incorporate most all the hopes and fears and uncertainties that should properly be involved, without getting into any weirdness that [you] don't expect Earthlings to think about validly." If that's all you have to say on the matter, fine. (Given the premise of AIs spending some fraction of their resources on human-regarding preferences, I agree that uploads look a lot more efficient than literally saving the physical Earth!)

But you should take into account that if you're strategically dumbing down your public communication in order to avoid topics that you don't trust Earthlings to think about validly (and especially if you have a general policy of systematically ignoring counterarguments that it would be politically inconvenient for you to address), you should expect that Earthlings who are trying to achieve the map that reflects the territory will correspondingly attach much less weight to your words, because we have to take into account how hard you're trying to epistemically screw us over by filtering the evidence.

No more than Bernard Arnault, having $170 billion, will surely give you $77.

Bernard Arnault has given eight-figure amounts to charity. Someone who reasoned, "Arnault is so rich, surely he'll spare a little for the less fortunate" would in fact end up making a correct prediction about Bernard Arnault's behavior!

Obviously, it would not be valid to conclude "... and therefore superintelligences will, too", because superintelligences and Bernard Arnault are very different things. But you chose the illustrative example! As a matter of local validity, it doesn't seem like a big ask for illustrative examples to in fact illustrate what they purport to.

dkl9 on How harmful is music, really?

I added intention-to-treat statistics in an addendum.

quetzal_rainbow on ASIs will not leave just a little sunlight for Earth

In this analogy, you : every other human :: humanity : everything else an AI can care about. Arnault can give money to dying people in Africa (I have no idea who he is as a person, I'm just guessing), but he has no particular reason to give it to you specifically rather than to the most profitable investment or the most efficient charity.