LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

[link] My Apartment Art Commission Process
jenn (pixx) · 2024-08-26T18:36:44.363Z · comments (4)

[link] Inferring the model dimension of API-protected LLMs
Ege Erdil (ege-erdil) · 2024-03-18T06:19:25.974Z · comments (3)

Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?
RogerDearnaley (roger-d-1) · 2024-01-11T12:56:29.672Z · comments (4)

LessWrong: After Dark, a new side of LessWrong
So8res · 2024-04-01T22:44:04.449Z · comments (5)

CHAI internship applications are open (due Nov 13)
Erik Jenner (ejenner) · 2023-10-26T00:53:49.640Z · comments (0)

[link] The $100B plan with "70% risk of killing us all" w Stephen Fry [video]
Oleg Trott (oleg-trott) · 2024-07-21T20:06:39.615Z · comments (8)

AI Impacts Survey: December 2023 Edition
Zvi · 2024-01-05T14:40:06.156Z · comments (6)

How to develop a photographic memory 1/3
PhilosophicalSoul (LiamLaw) · 2023-12-28T13:26:36.669Z · comments (6)

Unpicking Extinction
ukc10014 · 2023-12-09T09:15:41.291Z · comments (10)

My Trial Period as an Independent Alignment Researcher
Bart Bussmann (Stuckwork) · 2023-08-08T14:16:35.122Z · comments (1)

AXRP Episode 33 - RLHF Problems with Scott Emmons
DanielFilan · 2024-06-12T03:30:05.747Z · comments (0)

Monthly Roundup #12: November 2023
Zvi · 2023-11-14T15:20:06.926Z · comments (5)

[link] Why Yudkowsky is wrong about "covalently bonded equivalents of biology"
titotal (lombertini) · 2023-12-06T14:09:15.402Z · comments (40)

Video and transcript of presentation on Otherness and control in the age of AGI
Joe Carlsmith (joekc) · 2024-10-08T22:30:38.054Z · comments (1)

Difficulty classes for alignment properties
Jozdien · 2024-02-20T09:08:24.783Z · comments (5)

Could randomly choosing people to serve as representatives lead to better government?
John Huang · 2024-10-21T17:10:20.920Z · comments (3)

The Schumer Report on AI (RTFB)
Zvi · 2024-05-24T15:10:03.122Z · comments (3)

The murderous shortcut: a toy model of instrumental convergence
Thomas Kwa (thomas-kwa) · 2024-10-02T06:48:06.787Z · comments (0)

Computational Mechanics Hackathon (June 1 & 2)
Adam Shai (adam-shai) · 2024-05-24T22:18:44.352Z · comments (5)

[link] legged robot scaling laws
bhauth · 2024-01-20T05:45:56.632Z · comments (8)

Adam Smith Meets AI Doomers
James_Miller · 2024-01-31T15:53:03.070Z · comments (10)

[link] hydrogen tube transport
bhauth · 2024-04-18T22:47:08.790Z · comments (12)

[link] GPT2, Five Years On
Joel Burget (joel-burget) · 2024-06-05T17:44:17.552Z · comments (0)

[link] AI governance needs a theory of victory
Corin Katzke (corin-katzke) · 2024-06-21T16:15:46.560Z · comments (6)

Linear encoding of character-level information in GPT-J token embeddings
mwatkins · 2023-11-10T22:19:14.654Z · comments (4)

D&D.Sci (Easy Mode): On The Construction Of Impossible Structures
abstractapplic · 2024-05-17T00:25:42.950Z · comments (12)

Reflective consistency, randomized decisions, and the dangers of unrealistic thought experiments
Radford Neal · 2023-12-07T03:33:16.149Z · comments (25)

[link] Robin Hanson & Liron Shapira Debate AI X-Risk
Liron · 2024-07-08T21:45:40.609Z · comments (4)

ARENA4.0 Capstone: Hyperparameter tuning for MELBO + replication on Llama-3.2-1b-Instruct
25Hour (aaron-kaufman) · 2024-10-05T11:30:11.953Z · comments (2)

Trying to deconfuse some core AI x-risk problems
habryka (habryka4) · 2023-10-17T18:36:56.189Z · comments (13)

Direction of Fit
NicholasKees (nick_kees) · 2023-10-02T12:34:24.385Z · comments (0)

If You Can Climb Up, You Can Climb Down
jefftk (jkaufman) · 2024-07-30T00:00:06.295Z · comments (9)

What I Learned (Conclusion To "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-20T21:24:37.464Z · comments (0)

Exploring SAE features in LLMs with definition trees and token lists
mwatkins · 2024-10-04T22:15:28.108Z · comments (4)

[link] Book review: On the Edge
PeterMcCluskey · 2024-08-30T22:18:39.581Z · comments (0)

(Maybe) A Bag of Heuristics is All There Is & A Bag of Heuristics is All You Need
Sodium · 2024-10-03T19:11:58.032Z · comments (16)

Finding the Wisdom to Build Safe AI
Gordon Seidoh Worley (gworley) · 2024-07-04T19:04:16.089Z · comments (10)

Copyright Confrontation #1
Zvi · 2024-01-03T15:50:04.850Z · comments (7)

[link] Romae Industriae
Maxwell Tabarrok (maxwell-tabarrok) · 2024-07-19T13:03:31.536Z · comments (2)

AI #56: Blackwell That Ends Well
Zvi · 2024-03-21T12:10:05.412Z · comments (16)

[link] Suffering Is Not Pain
jbkjr · 2024-06-18T18:04:43.407Z · comments (45)

[question] If I have some money, whom should I donate it to in order to reduce expected P(doom) the most?
KvmanThinking (avery-liu) · 2024-10-03T11:31:19.974Z · answers+comments (33)

[link] The last era of human mistakes
owencb · 2024-07-24T09:58:42.116Z · comments (2)

[link] AI Safety Memes Wiki
plex (ete) · 2024-07-24T18:53:04.977Z · comments (1)

The Consciousness Box
GradualImprovement · 2023-12-11T16:45:08.172Z · comments (22)

[question] Do websites and apps actually generally get worse after updates, or is it just an effect of the fear of change?
lillybaeum · 2023-12-10T17:26:34.206Z · answers+comments (34)

[link] patent process problems
bhauth · 2024-07-14T21:12:04.953Z · comments (13)

ChatGPT 4 solved all the gotcha problems I posed that tripped ChatGPT 3.5
VipulNaik · 2023-11-29T18:11:53.252Z · comments (16)

Confusing the metric for the meaning: Perhaps correlated attributes are "natural"
NickyP (Nicky) · 2024-07-23T12:43:18.681Z · comments (3)

Monthly Roundup #20: July 2024
Zvi · 2024-07-23T12:50:07.991Z · comments (9)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

dana on Distinguishing ways AI can be "concentrated"

I do not really understand your framing of these three "dimensions". The way I see it, they form a dependency chain. If either of the first two are concentrated, they can easily cut off access during takeoff (and I would expect this). If both of the first two are diffuse, the third will necessarily also be diffuse.

How could one control AI without access to the hardware/software? What would stop one with access to the hardware/software from controlling AI?

jbash on The Mask Comes Off: At What Price?

Taxes enforced by whom?

Well, that's where the "safe" part comes in, isn't it?

I think a fair number of people would say that ASI/AGI can't be called "safe" if it's willing to wage war to physically take over the world on behalf of its owners, or to go around breaking laws all the time, or to thwart whatever institutions are supposed to make and enforce the laws. I'm pretty sure that even OpenAI's (present) "safety" department would have an issue if ChatGPT started saying stuff like "Sam Altman is Eternal Tax-Exempt God-King".

Personally, I go further than that. I'm not sure about "basic" AGI, but I'm pretty confident that very powerful ASI, the kind that would be capable of really total world domination, can't be called "safe" if it leaves really decisive power over anything in the hands of humans, individually or collectively, directly or via institutions. To be safe, it has to enforce its own ideas about how things should go. Otherwise the humans it empowers are probably going to send things south irretrievably fairly soon, and if they don't do so very soon they always still could, and you can't call that safe.

Yeah, that means you get exactly one chance to get "its own ideas" right, and no, I don't think that success is likely. I don't think it's technically likely to be able to "align" it to any particular set of values. I also don't think people or insitutions would make good choices about what values to give it even if they could. AND I don't think anybody can prevent it from getting built for very long. I put more hope in it being survivably unsafe (maybe because it just doesn't usually happen to care to do anything to/with humans), or on intelligence just not being that powerful, or whatever. Or even in it just luckily happening to at least do something less boring or annoying than paperclipping the universe or mass torture or whatever.

matthew-barnett on Against empathy-by-default

It is not always an expression of selfish motives when people take a stance against genocide. I would even go as far as saying that, in the majority of cases, people genuinely have non-selfish motives when taking that position. That is, they actually do care, to at least some degree, about the genocide, beyond the fact that signaling their concern helps them fit in with their friend group.

Nonetheless, and this is important: few people are willing to pay substantial selfish costs in order to prevent genocides that are socially distant from them.

The theory I am advancing here does not rest on the idea that people aren't genuine in their desire for faraway strangers to be better off. Rather, my theory is that people generally care little about such strangers, when helping those strangers trades off significantly against objectives that are closer to themselves, their family, friend group, and their own tribe.

Or, put another way, distant strangers usually get little weight in our utility function. Our family, and our own happiness, by contrast, usually get a much larger weight.

The core element of my theory concerns the amount that people care about themselves (and their family, friends, and tribe) versus other people, not whether they care about other people at all.

leogao on A Rocket–Interpretability Analogy

I think there are a whole bunch of inputs that determine a company's success. Research direction, management culture, engineering culture, product direction, etc. To be a really successful startup you often just need to have exceptional vision on one or a small number of these inputs, possibly even just once or twice. I'd guess it's exceedingly rare for a company to have leaders with consistently great vision across all the inputs that go into a company. Everything else will constantly revert towards local incentives. So, even in a company with top 1 percentile leadership vision quality, most things will still be messed up because of incentives most of the time.

steve2152 on Against empathy-by-default

Honest question: Suppose that my friends and other people whom I like and respect and trust all believe that genocide is very bad. I find myself (subconsciously) motivated to fit in with them, and I wind up adopting their belief that genocide is very bad. And then I take corresponding actions, by writing letters to politicians urging military intervention in Myanmar.

In your view, would that count as “selfish” because I “selfishly” benefit from ideologically fitting in with my friends and trusted leaders? Or would it count as “altruistic” because I am now moved by the suffering of some ethnic group across the world that I’ve never met and can’t even pronounce?

elityre on Bitter lessons about lucid dreaming

I don't have much information about your case, but I'd make a 1-to-1 bet that if you got up and wrote down your dreams first thing in the morning every morning, especially if you're woken up by an alarm for the first 3 times, that you'd start remembering your dreams. Just jot dow whatever you remember, however vague or instinct, upto and including "litterally nothing. The the last thing I remember is going to bed last night."

I rarely remember my own dreams, but in periods of my life when I've kept a dream journal, I easily remembered them.

david-johnston on A brief theory of why we think things are good or bad

For what it's worth, one idea I had as a result of our discussion was this:

We form lots of beliefs as a result of motivated reasoning
These beliefs are amenable to revision due to evidence, reason or (maybe) social pressure
Those beliefs that are largely resilient to these challenges are "moral foundations"

So philosophers like "pain is bad" as a moral foundation because we want to believe it + it is hard to challenge with evidence or reason. Laypeople probably have lots of foundational moral beliefs that don't stand up as well to evidence or reason, but (perhaps) are equally attributable to motivated reasoning.

Social pressure is a bit iffy to include because I think lots of people relate to beliefs that they adopted because of social pressure as moral foundations, and believing something because you're under pressure to do so is an instance of motivated reasoning.

I don't think this is a response to your objections, but I'm leaving it here in case it interests you.

thane-ruthenis on The Mask Comes Off: At What Price?

In a transformed-except-corporate-ownership-stays-the-same world, I don't see any reason such lottery winners' portion wouldn't increase asymptotically toward 100 percent, with nobody else getting anything at all.

Well yeah [LW · GW], exactly.

Even without an overtly revolutionary restructuring, I kind of doubt "OpenAI owns everything" would fly. Maybe corporate ownership would stay exactly the same, but there'd be a 99.999995 percent tax rate.

Taxes enforced by whom?

akash-wasil on What AI companies should do: Some rough ideas

Some ideas relating to comms/policy:

Communicate your models of AI risk to policymakers
- Help policymakers understand emergency scenarios (especially misalignment scenarios) and how to prepare for them
- Use your lobbying/policy teams primarily to raise awareness about AGI and help policymakers prepare for potential AGI-related global security risks.
Develop simple/clear frameworks that describe which dangerous capabilities you are tracking (I think OpenAI's preparedness framework is a good example, particularly RE simplicity/clarity/readability.)
Advocate for increased transparency into frontier AI development through measures like stronger reporting requirements, whistleblower mechanisms, embedded auditors/resident inspectors, etc.
Publicly discuss threat models (kudos to DeepMind [? · GW])
Engage in public discussions/debates with people like Hinton, Bengio, Hendrycks, Kokotajlo, etc.
Encourage employees to engage in such discussions/debates, share their threat models, etc.
Make capability forecasts public (predictions for when models would have XYZ capabilities)
Communicate under what circumstances you think major government involvement would be necessary (e.g., nationalization, "CERN for AI" setups).

evolutionbydesign on Advice on Communicating Concisely

Actually, both.

I started a AI club at my high school last year, and I've been (slowly) trying to teach other students the basics of deep learning. They generally come out of my 15-to-20 minute-long explanations confused, rather than understanding.
This too (I don't have a specific example in mind - I'll see if any pop up during school tomorrow)

I normally think what I'm saying is clear, but the result is that others don't understand what I mean when I finish saying it - which causes me to tack on hasty clarifications of my intentions / ideas.