LessWrong 2.0 Reader

On OpenAI’s Model Spec
Zvi · 2024-06-21T13:00:03.014Z · comments (3)

Vaniver's thoughts on Anthropic's RSP
Vaniver · 2023-10-28T21:06:07.323Z · comments (4)

Untrustworthy models: a frame for scheming evaluations
Olli Järviniemi (jarviniemi) · 2024-08-19T16:27:11.088Z · comments (3)

Decision Theory in Space
lsusr · 2024-08-18T07:02:11.847Z · comments (18)

Forecasting One-Shot Games
Raemon · 2024-08-31T23:10:05.475Z · comments (0)

I finally got ChatGPT to sound like me
lsusr · 2024-09-17T09:39:59.415Z · comments (18)

Conflating value alignment and intent alignment is causing confusion
Seth Herd · 2024-09-05T16:39:51.967Z · comments (17)

Interested in Cognitive Bootcamp?
Raemon · 2024-09-19T22:12:13.348Z · comments (0)

Atlantis: Berkeley event venue available for rent
Jonas V (Jonas Vollmer) · 2023-11-22T01:47:12.026Z · comments (0)

Exercise: Planmaking, Surprise Anticipation, and "Baba is You"
Raemon · 2024-02-24T20:33:49.574Z · comments (19)

AI #32: Lie Detector
Zvi · 2023-10-05T13:50:05.030Z · comments (19)

[link] Rational Animations' intro to mechanistic interpretability
Writer · 2024-06-14T16:10:57.015Z · comments (1)

[link] NYT on the Manifest forecasting conference
Austin Chen (austin-chen) · 2023-10-09T21:40:16.732Z · comments (14)

[link] Loneliness and suicide mitigation for students using GPT3-enabled chatbots (survey of Replika users in Nature)
Kaj_Sotala · 2024-01-23T14:05:40.986Z · comments (2)

On Tapping Out
Screwtape · 2023-11-17T03:23:55.880Z · comments (13)

[link] Towards Evaluating AI Systems for Moral Status Using Self-Reports
Ethan Perez (ethan-perez) · 2023-11-16T20:18:51.730Z · comments (3)

AI #36: In the Background
Zvi · 2023-11-02T18:00:01.803Z · comments (5)

[link] I'd also take $7 trillion
bhauth · 2024-02-19T03:31:45.552Z · comments (12)

Monthly Roundup #18: May 2024
Zvi · 2024-05-13T12:30:04.863Z · comments (10)

We ran an AI safety conference in Tokyo. It went really well. Come next year!
Blaine (blaine-rogers) · 2024-07-17T06:55:39.620Z · comments (1)

[link] AlphaGeometry: An Olympiad-level AI system for geometry
alyssavance · 2024-01-17T17:17:30.913Z · comments (9)

AI #53: One More Leap
Zvi · 2024-02-29T16:10:04.049Z · comments (0)

[link] Paper: Tell, Don't Show- Declarative facts influence how LLMs generalize
Owain_Evans · 2023-12-19T19:14:26.423Z · comments (4)

[link] How people stopped dying from diarrhea so much (& other life-saving decisions)
Writer · 2024-03-16T16:00:47.830Z · comments (0)

A starting point for making sense of task structure (in machine learning)
Kaarel (kh) · 2024-02-24T01:51:49.227Z · comments (2)

[link] Book review: Everything Is Predictable
PeterMcCluskey · 2024-05-27T03:33:53.857Z · comments (0)

AI #72: Denying the Future
Zvi · 2024-07-11T15:00:05.865Z · comments (8)

AI #54: Clauding Along
Zvi · 2024-03-07T16:00:05.066Z · comments (11)

Dating Roundup #3: Third Time’s the Charm
Zvi · 2024-05-08T13:30:03.232Z · comments (26)

The Gemini Incident Continues
Zvi · 2024-02-27T16:00:05.648Z · comments (6)

Some open-source dictionaries and dictionary learning infrastructure
Sam Marks (samuel-marks) · 2023-12-05T06:05:21.903Z · comments (7)

New intro textbook on AIXI
Alex_Altair · 2024-05-11T18:18:50.945Z · comments (5)

[link] Book review: Deep Utopia
PeterMcCluskey · 2024-04-23T19:55:50.417Z · comments (14)

Applying refusal-vector ablation to a Llama 3 70B agent
Simon Lermen (dalasnoin) · 2024-05-11T00:08:08.117Z · comments (12)

Quick thoughts on the implications of multi-agent views of mind on AI takeover
Kaj_Sotala · 2023-12-11T06:34:06.395Z · comments (14)

Things Solenoid Narrates
Solenoid_Entity · 2024-04-12T23:57:16.169Z · comments (2)

[link] AI Rights for Human Safety
Simon Goldstein (simon-goldstein) · 2024-08-01T23:01:07.252Z · comments (6)

Startup Roundup #2
Zvi · 2024-08-06T13:30:06.554Z · comments (0)

AI #80: Never Have I Ever
Zvi · 2024-09-10T17:50:08.074Z · comments (20)

How to Give in to Threats (without incentivizing them)
Mikhail Samin (mikhail-samin) · 2024-09-12T15:55:50.384Z · comments (21)

Economics Roundup #3
Zvi · 2024-09-10T13:50:06.955Z · comments (9)

Commonsense Good, Creative Good
jefftk (jkaufman) · 2023-09-27T19:50:07.486Z · comments (11)

[link] Open Sourcing Metaculus
ChristianWilliams · 2024-07-02T22:30:01.339Z · comments (0)

[link] Fluent dreaming for language models (AI interpretability method)
tbenthompson (ben-thompson) · 2024-02-06T06:02:59.296Z · comments (4)

[link] Chinese scientists acknowledge xrisk & call for international regulatory body [Linkpost]
Akash (akash-wasil) · 2023-11-01T13:28:43.723Z · comments (4)

[link] Amazon to invest up to $4 billion in Anthropic
Davis_Kingsley · 2023-09-25T14:55:35.983Z · comments (8)

[link] LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery (arjun-panickssery) · 2024-04-17T21:09:12.007Z · comments (1)

Back to Basics: Truth is Unitary
lsusr · 2024-03-29T21:10:33.399Z · comments (13)

ProLU: A Nonlinearity for Sparse Autoencoders
Glen Taggart · 2024-04-23T14:09:21.592Z · comments (4)

Auditing failures vs concentrated failures
ryan_greenblatt · 2023-12-11T02:47:35.703Z · comments (0)

Recent comments

redman on Luck based medicine: my resentful story of becoming a medical miracle

https://www.nature.com/articles/d41586-024-03129-3 This is separate research. It looks like this will happen, and it will come from somewhere other than the West.

mako-yass on Pronouns are Annoying

Hot take: Prevalence of gender transition in male-majority fields is attempt to restore pronoun compression efficiency.

Communication efficiency is just that important.

mako-yass on Pronouns are Annoying

You need just enough of them to distinguish subjects, but not so much that they lose their intuitive meaning. When cops are interviewing witnesses about a suspect, they’ll glom onto easily observable and distinguishing physical traits. Was the suspect a man or a woman? White or black? Tall or short?

Yeah.

Notably, basically all of the people I've known who have asked for neutral pronouns were also visibly of indeterminate gender (for instance, mid-transition), and over time their preferred/accepted pronouns always lined up with what a person would guess by looking at them.

This is generally the norm.

If you've encountered a lot of genderqueer people with non-obvious pronoun preferences, and they're pushy about them, that's probably a product of some kind of perverse selection process. At the least, whatever is causing those people to be annoying about it is not the queerness per se.

gb on Alignment by default: the simulation hypothesis

Sure. But I think you’re reading my argument to be stronger than I mean it to be. Which is partially my fault since I made my previous replies a bit too short, and for that I apologize.

What I’m doing here is presenting one particular simulation scenario that (to me) seems quite plausible within the realm of simulations. I’m not claiming that that one scenario dominates all others combined. But luckily that stronger claim is really not necessary to argue against Eliezer’s point: the weaker one suffices. Indeed, if the scenario I’m presenting is more than 4.5e-10 likely (and I do think it’s much more likely than that, probably by a few orders of magnitude), then it is more than enough to outweigh the practical cost of the ASI having to build a Dyson shell with a hole on the order of 4.5e-10 of its surface area.

Now, that scenario is (I claim) the most likely one, conditional of course on a simulation taking place to begin with. The other candidate simulation scenarios are various, and none of them seems particularly likely, though combined they might well outweigh this one in terms of probability mass, as I already acknowledged. But so what? Are you really claiming that the distribution of those other simulation scenarios is skewed enough to tilt the scales back to the doom side? It might be, but that’s a much harder argument to make. I’m approximately completely unsure, which seems way better than the 99%+ chance Eliezer seems to give to total doom. So I guess I’d count that as good news.

arthur-conmy on peterbarnett's Shortform

The Improbability Principle sounds close. The summary suggests the law of large numbers is one part of the pop-science book, though admittedly some of the other parts ("probability lever") seem less relevant.

quila on Doing Nothing Utility Function

If not (paused): utility=NormalUtilityFunction

If NormalUtilityFunction is not indifferent about being paused, then that has the usual problem where the utility function prefers worlds where its agent avoids being paused or shut down.

I'll try modifying your proposal to instead use UtilityFunction2, which assigns, to worlds where its agent ends up paused, utility equal to the average value of {worlds where the agent will not end up paused} in the agent's prior as it was just before updating on observing being paused. I think this agent would not try to specifically avoid action paths where it may end up paused. This doesn't mean it wouldn't still instrumentally converge and start disassembling things^[1], just that if you tried to pause it, it wouldn't have a direct incentive to specifically try to stop you, because if you succeed it still assigns utility equal to that of the average world in its prior where you didn't.

The latter function also leads to something which may be unintuitive. Consider if this utility function were to evaluate two possible actions: action (1), which has a 99% chance of being paused and results in an average of 10 utility if not paused; and action (2), which has a 0% chance of being paused and results in 5 utility if not paused. In this case, the stated function prefers action (1). (That may or may not be a problem; this is just an observation.)
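A minimal Python sketch of the two-action example, assuming one particular reading: the "average unpaused world" is taken from the agent's prior conditional on the action it chose. Everything in it (`Action`, `utility_function_2`, `naive_utility`, `best_action`) is a hypothetical name for illustration. Under this reading, paused worlds simply inherit the unpaused average, so the pause probability drops out and action (1) scores 10 against action (2)'s 5. For contrast, a hypothetical baseline that scores paused worlds at 0 picks action (2).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    p_paused: float             # probability the agent ends up paused
    value_if_not_paused: float  # average NormalUtilityFunction value in unpaused worlds


# The two actions from the comment.
ACTION_1 = Action("action (1)", p_paused=0.99, value_if_not_paused=10.0)
ACTION_2 = Action("action (2)", p_paused=0.0, value_if_not_paused=5.0)


def utility_function_2(a: Action) -> float:
    # Assumption: paused worlds get the average value of the unpaused worlds
    # in the prior conditional on this action, i.e. value_if_not_paused.
    paused_value = a.value_if_not_paused
    return a.p_paused * paused_value + (1.0 - a.p_paused) * a.value_if_not_paused


def naive_utility(a: Action, paused_value: float = 0.0) -> float:
    # Hypothetical baseline for contrast: paused worlds are worth a fixed 0.
    return a.p_paused * paused_value + (1.0 - a.p_paused) * a.value_if_not_paused


def best_action(actions, utility):
    # Pick the action with the highest expected utility.
    return max(actions, key=utility)


if __name__ == "__main__":
    for u in (utility_function_2, naive_utility):
        choice = best_action([ACTION_1, ACTION_2], u)
        print(f"{u.__name__}: {choice.name}")
    # prints "utility_function_2: action (1)" then "naive_utility: action (2)"
```

Note that under this reading `utility_function_2` always returns `value_if_not_paused`, whatever `p_paused` is, which is exactly why the agent is indifferent to how likely a pause is.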

Also, note how "the average value of {worlds where the agent will not end up paused} in the agent's hypothesis space as it was just before updating on observing being paused" does depend on its own action policy. Specifically, the value of the average world where the agent does not end up paused could be higher or lower conditional on whether it tries to resist being paused (as resisting would add more worlds to the set of worlds whose values are being averaged).

[1] If this utility function is otherwise a classical physical-thing-maximizer.

wei-dai on Being nicer than Clippy

but we only need one person or group who we’d be somewhat confident would do alright in CEV. Plausibly there are at least a few e.g. MIRIers who would satisfy this.

Why do you think this, and how would you convince skeptics? And there are two separate issues here. One is how to know their CEV won't be corrupted relative to what their values really are or should be, and the other is how to know that their real/normative values are actually highly altruistic. It seems hard to know both of these, and perhaps even harder to persuade others who may be very distrustful of such person/group from the start.

Another is that even if we don’t die of AI, we get eaten by various molochs instead of being able to safely solve the necessary problems at whatever pace is necessary.

Would be interested in understanding your perspective on this better. I feel like aside from AI, our world is not being eaten by molochs very quickly, and I prefer something like stopping AI development and doing (voluntary and subsidized) embryo selection to increase human intelligence for a few generations, then letting the smarter humans decide what to do next. (Please contact me via PM if you want to have a chat about this.)

linch on Stanislav Petrov Quarterly Performance Review

Mild spoilers for a contemporary science-fiction book, but the second half was a major plot point in The Dark Forest, the sequel to The Three-Body Problem.

philip-bellew on The Sun is big, but superintelligences will not spare Earth a little sunlight

Each of these carries assumptions about reality I'm not convinced a superintelligence would share, though it may be able to find the answer in some cases.

It would be just as likely to choose to preserve us out of some sense of amusement or preservation.

To use the OP example: a billionaire won't spare everyone 78 bucks, but will spend more on things he prefers. Some keep private zoos or other things whose only purpose is to stave off boredom.

Making the intelligence like us won't eliminate the problem. There are plenty of fail states for humanity where it isn't extinct. But while we pave over ant colonies and actively hunt wild hogs as a nuisance, there are lots of human cultures that won't do the same to cats. I hope that isn't the best we can do, but it's probably better than extinction.

gwern on Lao Mein's Shortform

They (historically) had a large head start(up) on being scaling-pilled and on various innovations like RLHF/instruction-tuning*, while avoiding the pathologies of other organizations, and they currently enjoy some incumbent advantages, such as what seems to be far more compute access via MS than Anthropic gets through its more limited partnerships. There is, of course, no guarantee any of that will last, and it generally seems like (even allowing for the unknown capabilities of GPT-5 and benefits from o1 and everything else under the hood) the OA advantage over everyone else has been steadily eroding since May 2020.

* which, as much as I criticize the side-effects, have been crucial in democratizing LLM use for everybody who just wants to get something done instead of learning the alien mindset of prompt-programming a base model