LessWrong 2.0 Reader

[question] How Should We Use Limited Time to Maximize Long-Term Impact?
queelius · 2024-10-12T20:02:46.801Z · answers+comments (3)

[link] The Computational Complexity of Circuit Discovery for Inner Interpretability
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-10-17T13:18:46.378Z · comments (2)

[question] Has Eliezer publicly and satisfactorily responded to attempted rebuttals of the analogy to evolution?
kaler · 2024-07-28T12:23:40.671Z · answers+comments (13)

[question] I want a good multi-LLM API-powered chatbot
rotatingpaguro · 2024-09-08T09:40:52.736Z · answers+comments (3)

Request for advice: Research for Conversational Game Theory for LLMs
Rome Viharo (rome-viharo) · 2024-10-16T17:53:30.243Z · comments (0)

[link] HVM Superposition Node Program Search Flake
Johannes C. Mayer (johannes-c-mayer) · 2024-08-09T14:03:16.381Z · comments (0)

A Policy Proposal
phdead · 2024-09-29T20:45:34.745Z · comments (4)

Crafting Polysemantic Transformer Benchmarks with Known Circuits
Evan Anders (evan-anders) · 2024-08-23T22:03:15.288Z · comments (0)

A short project on Mamba: grokking & interpretability
Alejandro Tlaie (alejandro-tlaie-boria) · 2024-10-18T16:59:45.314Z · comments (0)

Pleasure and suffering are not conceptual opposites
MichaelStJules · 2024-08-11T18:32:30.359Z · comments (0)

Keyboard Gremlins
jefftk (jkaufman) · 2024-09-20T02:30:07.140Z · comments (0)

[link] Molecular dynamics data will be essential for the next generation of ML protein models
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-26T14:50:23.790Z · comments (0)

[question] Doing Nothing Utility Function
k64 · 2024-09-26T22:05:18.821Z · answers+comments (9)

Jailbreaking ChatGPT and Claude to Explain Almost ANYTHING
Jaehyuk Lim (jason-l) · 2024-10-21T21:34:37.579Z · comments (0)

Simultaneous Footbass and Footdrums II
jefftk (jkaufman) · 2024-08-11T23:50:01.982Z · comments (0)

Common Uses of "Acceptance"
Yi-Yang (yiyang) · 2024-07-26T11:18:30.719Z · comments (5)

[question] How tokenization influences prompting?
Boris Kashirin (boris-kashirin) · 2024-07-29T10:28:25.056Z · answers+comments (4)

An Interpretability Illusion from Population Statistics in Causal Analysis
Daniel Tan (dtch1997) · 2024-07-29T14:50:19.497Z · comments (3)

[link] Contagious Beliefs—Simulating Political Alignment
James Stephen Brown (james-brown) · 2024-10-13T00:27:08.084Z · comments (0)

Open letter to young EAs
Leif Wenar · 2024-10-11T19:49:10.818Z · comments (10)

Trying to understand Hanson's Cultural Drift argument
Kemp (ethan-kemp) · 2024-07-22T20:20:32.734Z · comments (3)

[question] What's a good book for a technically-minded 11-year old?
Martin Sustrik (sustrik) · 2024-10-19T06:05:12.178Z · answers+comments (19)

[link] Temporary Cognitive Hyperparameter Alteration
Jonathan Moregård (JonathanMoregard) · 2024-08-01T10:27:11.917Z · comments (0)

Rationalist Gnosticism
tailcalled · 2024-10-10T09:06:34.149Z · comments (10)

[question] Are UV-C Air purifiers so useful?
JohnBuridan · 2024-09-04T14:16:01.310Z · answers+comments (0)

Thinking About a Pedalboard
jefftk (jkaufman) · 2024-10-08T11:50:02.054Z · comments (2)

[link] The EA case for Trump
Judd Rosenblatt (judd) · 2024-08-03T01:00:45.422Z · comments (1)

[link] Virtue is a Vector
robotelvis · 2024-09-10T03:02:45.737Z · comments (1)

[link] Why do firms choose to be inefficient?
Nicholas D. (nicholas-d) · 2024-08-28T18:39:41.664Z · comments (4)

[question] Is this a Pivotal Weak Act? Creating bacteria that decompose metal
doomyeser · 2024-09-11T18:07:19.385Z · answers+comments (9)

Keeping it (less than) real: Against ℶ₂ possible people or worlds
quiet_NaN · 2024-09-13T17:29:44.915Z · comments (0)

[question] What are some of the proposals for solving the control problem?
Dakara (chess-ice) · 2024-08-14T23:04:44.863Z · answers+comments (0)

[question] What do you expect AI capabilities may look like in 2028?
nonzerosum · 2024-08-23T16:59:53.007Z · answers+comments (5)

Will AI and Humanity Go to War?
Simon Goldstein (simon-goldstein) · 2024-10-01T06:35:22.374Z · comments (4)

[link] Apply to Aether - Independent LLM Agent Safety Research Group
RohanS · 2024-08-21T09:47:11.493Z · comments (0)

[link] Physics of Language models (part 2.1)
Nathan Helm-Burger (nathan-helm-burger) · 2024-09-19T16:48:32.301Z · comments (2)

Electric Mandola
jefftk (jkaufman) · 2024-09-21T13:40:04.772Z · comments (0)

Becket First
jefftk (jkaufman) · 2024-09-22T17:10:04.304Z · comments (0)

AGI's Opposing Force
SimonBaars (simonbaars) · 2024-08-16T04:18:06.900Z · comments (2)

[link] In Praise of the Beatitudes
robotelvis · 2024-09-24T05:08:21.133Z · comments (7)

A Dialogue on Deceptive Alignment Risks
Rauno Arike (rauno-arike) · 2024-09-25T16:10:12.294Z · comments (0)

[link] What is autonomy? Why boundaries are necessary.
Chipmonk · 2024-10-21T17:56:33.722Z · comments (0)

HDBSCAN is Surprisingly Effective at Finding Interpretable Clusters of the SAE Decoder Matrix
Jaehyuk Lim (jason-l) · 2024-10-11T23:06:14.340Z · comments (2)

[link] Triangulating My Interpretation of Methods: Black Boxes by Marco J. Nathan
adamShimi · 2024-10-09T19:13:26.631Z · comments (0)

MIT FutureTech are hiring for a Head of Operations role
peterslattery · 2024-10-02T17:11:42.960Z · comments (0)

[link] [Linkpost] Automated Design of Agentic Systems
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-08-19T23:06:06.669Z · comments (1)

Three main arguments that AI will save humans and one meta-argument
avturchin · 2024-10-02T11:39:08.910Z · comments (8)

[link] Jailbreaking language models with user roleplay
loops (smitop) · 2024-09-28T23:43:10.870Z · comments (0)

My covid-related beliefs and questions
Severin T. Seehrich (sts) · 2024-07-23T03:27:09.348Z · comments (0)

[link] AI Safety at the Frontier: Paper Highlights, July '24
gasteigerjo · 2024-08-05T13:00:46.028Z · comments (0)

Recent comments

jbash on The Mask Comes Off: At What Price?

Taxes enforced by whom?

Well, that's where the "safe" part comes in, isn't it?

I think a fair number of people would say that ASI/AGI can't be called "safe" if it's willing to wage war to physically take over the world on behalf of its owners, or to go around breaking laws all the time, or to thwart whatever institutions are supposed to make and enforce the laws. I'm pretty sure that even OpenAI's (present) "safety" department would have an issue if ChatGPT started saying stuff like "Sam Altman is Eternal Tax-Exempt God-King".

Personally, I go further than that. I'm not sure about "basic" AGI, but I'm pretty confident that very powerful ASI, the kind that would be capable of really total world domination, can't be called "safe" if it leaves really decisive power over anything in the hands of humans, individually or collectively, directly or via institutions. To be safe, it has to enforce its own ideas about how things should go. Otherwise the humans it empowers are probably going to send things south irretrievably fairly soon, and if they don't do so very soon they always still could, and you can't call that safe.

Yeah, that means you get exactly one chance to get "its own ideas" right, and no, I don't think that success is likely. I don't think it's technically likely to be able to "align" it to any particular set of values. I also don't think people or institutions would make good choices about what values to give it even if they could. AND I don't think anybody can prevent it from getting built for very long. I put more hope in it being survivably unsafe (maybe because it just doesn't usually happen to care to do anything to/with humans), or in intelligence just not being that powerful, or whatever. Or even in it just luckily happening to at least do something less boring or annoying than paperclipping the universe or mass torture or whatever.

matthew-barnett on Against empathy-by-default

It is not always an expression of selfish motives when people take a stance against genocide. I would even go as far as saying that, in the majority of cases, people genuinely have non-selfish motives when taking that position. That is, they actually do care, to at least some degree, about the genocide, beyond the fact that signaling their concern helps them fit in with their friend group.

Nonetheless, and this is important: few people are willing to pay substantial selfish costs in order to prevent genocides that are socially distant from them.

The theory I am advancing here does not rest on the idea that people aren't genuine in their desire for faraway strangers to be better off. Rather, my theory is that people generally care little about such strangers, when helping those strangers trades off significantly against objectives that are closer to themselves, their family, friend group, and their own tribe.

Or, put another way, distant strangers usually get little weight in our utility function. Our family, and our own happiness, by contrast, usually get a much larger weight.

The core element of my theory concerns the amount that people care about themselves (and their family, friends, and tribe) versus other people, not whether they care about other people at all.

leogao on A Rocket–Interpretability Analogy

I think there are a whole bunch of inputs that determine a company's success. Research direction, management culture, engineering culture, product direction, etc. To be a really successful startup you often just need to have exceptional vision on one or a small number of these inputs, possibly even just once or twice. I'd guess it's exceedingly rare for a company to have leaders with consistently great vision across all the inputs that go into a company. Everything else will constantly revert towards local incentives. So, even in a company with top 1 percentile leadership vision quality, most things will still be messed up because of incentives most of the time.

steve2152 on Against empathy-by-default

Honest question: Suppose that my friends and other people whom I like and respect and trust all believe that genocide is very bad. I find myself (subconsciously) motivated to fit in with them, and I wind up adopting their belief that genocide is very bad. And then I take corresponding actions, by writing letters to politicians urging military intervention in Myanmar.

In your view, would that count as “selfish” because I “selfishly” benefit from ideologically fitting in with my friends and trusted leaders? Or would it count as “altruistic” because I am now moved by the suffering of some ethnic group across the world that I’ve never met and can’t even pronounce?

elityre on Bitter lessons about lucid dreaming

I don't have much information about your case, but I'd make a 1-to-1 bet that if you got up and wrote down your dreams first thing every morning, especially if you're woken up by an alarm the first 3 times, you'd start remembering your dreams. Just jot down whatever you remember, however vague or indistinct, up to and including "literally nothing. The last thing I remember is going to bed last night."

I rarely remember my own dreams, but in periods of my life when I've kept a dream journal, I easily remembered them.

david-johnston on A brief theory of why we think things are good or bad

For what it's worth, one idea I had as a result of our discussion was this:

- We form lots of beliefs as a result of motivated reasoning
- These beliefs are amenable to revision due to evidence, reason or (maybe) social pressure
- Those beliefs that are largely resilient to these challenges are "moral foundations"

So philosophers like "pain is bad" as a moral foundation because we want to believe it + it is hard to challenge with evidence or reason. Laypeople probably have lots of foundational moral beliefs that don't stand up as well to evidence or reason, but (perhaps) are equally attributable to motivated reasoning.

Social pressure is a bit iffy to include because I think lots of people relate to beliefs that they adopted because of social pressure as moral foundations, and believing something because you're under pressure to do so is an instance of motivated reasoning.

I don't think this is a response to your objections, but I'm leaving it here in case it interests you.

thane-ruthenis on The Mask Comes Off: At What Price?

In a transformed-except-corporate-ownership-stays-the-same world, I don't see any reason such lottery winners' portion wouldn't increase asymptotically toward 100 percent, with nobody else getting anything at all.

Well yeah, exactly.

Even without an overtly revolutionary restructuring, I kind of doubt "OpenAI owns everything" would fly. Maybe corporate ownership would stay exactly the same, but there'd be a 99.999995 percent tax rate.

Taxes enforced by whom?

akash-wasil on What AI companies should do: Some rough ideas

Some ideas relating to comms/policy:

- Communicate your models of AI risk to policymakers
  - Help policymakers understand emergency scenarios (especially misalignment scenarios) and how to prepare for them
  - Use your lobbying/policy teams primarily to raise awareness about AGI and help policymakers prepare for potential AGI-related global security risks.
- Develop simple/clear frameworks that describe which dangerous capabilities you are tracking (I think OpenAI's preparedness framework is a good example, particularly RE simplicity/clarity/readability.)
- Advocate for increased transparency into frontier AI development through measures like stronger reporting requirements, whistleblower mechanisms, embedded auditors/resident inspectors, etc.
- Publicly discuss threat models (kudos to DeepMind)
- Engage in public discussions/debates with people like Hinton, Bengio, Hendrycks, Kokotajlo, etc.
- Encourage employees to engage in such discussions/debates, share their threat models, etc.
- Make capability forecasts public (predictions for when models would have XYZ capabilities)
- Communicate under what circumstances you think major government involvement would be necessary (e.g., nationalization, "CERN for AI" setups).

evolutionbydesign on Advice on Communicating Concisely

Actually, both.

I started an AI club at my high school last year, and I've been (slowly) trying to teach other students the basics of deep learning. They generally come out of my 15-to-20-minute explanations confused rather than understanding.
This too (I don't have a specific example in mind; I'll see if any pop up during school tomorrow)

I normally think what I'm saying is clear, but others often don't understand what I mean by the time I finish saying it, which causes me to tack on hasty clarifications of my intentions/ideas.

cubefox on Alexander Gietelink Oldenziel's Shortform

Yeah. I think the technical term for that would be cringe.