LessWrong 2.0 Reader

Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations)
Thane Ruthenis · 2023-12-22T20:19:13.865Z · comments (14)

SAE-VIS: Announcement Post
CallumMcDougall (TheMcDouglas) · 2024-03-31T15:30:49.079Z · comments (8)

[link] MIRI's June 2024 Newsletter
Harlan · 2024-06-14T23:02:23.721Z · comments (18)

Neural uncertainty estimation review article (for alignment)
Charlie Steiner · 2023-12-05T08:01:32.723Z · comments (3)

Mistakes people make when thinking about units
Isaac King (KingSupernova) · 2024-06-25T03:39:20.138Z · comments (14)

A Simple Toy Coherence Theorem
johnswentworth · 2024-08-02T17:47:50.642Z · comments (19)

Interpreting Preference Models w/ Sparse Autoencoders
Logan Riggs (elriggs) · 2024-07-01T21:35:40.603Z · comments (12)

Interpretability with Sparse Autoencoders (Colab exercises)
CallumMcDougall (TheMcDouglas) · 2023-11-29T12:56:21.608Z · comments (9)

Q&A on Proposed SB 1047
Zvi · 2024-05-02T15:10:02.916Z · comments (8)

Joshua Achiam Public Statement Analysis
Zvi · 2024-10-10T12:50:06.285Z · comments (14)

[link] Soft Nationalization: how the USG will control AI labs
Deric Cheng (deric-cheng) · 2024-08-27T15:11:14.601Z · comments (7)

The World in 2029
Nathan Young · 2024-03-02T18:03:29.368Z · comments (37)

On Dwarkesh’s Podcast with OpenAI’s John Schulman
Zvi · 2024-05-21T17:30:04.332Z · comments (4)

A Gentle Introduction to Risk Frameworks Beyond Forecasting
pendingsurvival · 2024-04-11T18:03:25.605Z · comments (10)

Companies' safety plans neglect risks from scheming AI
Zach Stein-Perlman · 2024-06-03T15:00:20.236Z · comments (4)

The One and a Half Gemini
Zvi · 2024-02-22T13:10:04.725Z · comments (4)

AI for Bio: State Of The Field
sarahconstantin · 2024-08-30T18:00:02.187Z · comments (2)

[link] Nick Bostrom’s new book, “Deep Utopia”, is out today
PeterH · 2024-03-27T11:24:01.401Z · comments (5)

Scalable Oversight and Weak-to-Strong Generalization: Compatible approaches to the same problem
Ansh Radhakrishnan (anshuman-radhakrishnan-1) · 2023-12-16T05:49:23.672Z · comments (3)

[link] A Narrow Path: a plan to deal with AI extinction risk
Andrea_Miotti (AndreaM) · 2024-10-07T13:02:15.229Z · comments (11)

[question] Interest in Leetcode, but for Rationality?
Gregory (gregory-eales) · 2024-10-16T17:54:25.578Z · answers+comments (20)

[Full Post] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda (neel-nanda-1) · 2024-04-19T19:06:59.185Z · comments (10)

Prompts for Big-Picture Planning
Raemon · 2024-04-13T03:04:24.523Z · comments (1)

In Defense of Open-Minded UDT
abramdemski · 2024-08-12T18:27:36.220Z · comments (27)

[link] LK-99 in retrospect
bhauth · 2024-07-07T02:06:27.660Z · comments (21)

The case for a negative alignment tax
Cameron Berg (cameron-berg) · 2024-09-18T18:33:18.491Z · comments (20)

AXRP Episode 31 - Singular Learning Theory with Daniel Murfet
DanielFilan · 2024-05-07T03:50:05.001Z · comments (4)

Announcing Suffering For Good
Garrett Baker (D0TheMath) · 2024-04-01T17:08:12.322Z · comments (5)

When "yang" goes wrong
Joe Carlsmith (joekc) · 2024-01-08T16:35:50.607Z · comments (6)

D&D.Sci Scenario Index
aphyer · 2024-07-23T02:00:43.483Z · comments (0)

[link] Excerpts from "A Reader's Manifesto"
Arjun Panickssery (arjun-panickssery) · 2024-09-06T22:37:40.254Z · comments (1)

Claude 3 claims it's conscious, doesn't want to die or be modified
Mikhail Samin (mikhail-samin) · 2024-03-04T23:05:00.376Z · comments (113)

Do sparse autoencoders find "true features"?
Demian Till · 2024-02-22T18:06:59.630Z · comments (33)

Guide to SB 1047
Zvi · 2024-08-20T13:10:07.408Z · comments (18)

Survey for alignment researchers!
Cameron Berg (cameron-berg) · 2024-02-02T20:41:44.323Z · comments (11)

LW Frontpage Experiments! (aka "Take the wheel, Shoggoth!")
Ruby · 2024-04-23T03:58:43.443Z · comments (27)

The Mask Comes Off: At What Price?
Zvi · 2024-10-21T23:50:05.247Z · comments (16)

We need a Science of Evals
Marius Hobbhahn (marius-hobbhahn) · 2024-01-22T20:30:39.493Z · comments (13)

FarmKind's Illusory Offer
jefftk (jkaufman) · 2024-08-09T11:30:07.082Z · comments (5)

Epistemic Hell
rogersbacon · 2024-01-27T17:13:09.578Z · comments (20)

[link] Yoshua Bengio: Reasoning through arguments against taking AI safety seriously
Judd Rosenblatt (judd) · 2024-07-11T23:53:17.187Z · comments (3)

[link] If far-UV is so great, why isn't it everywhere?
Austin Chen (austin-chen) · 2024-10-19T18:56:58.910Z · comments (23)

[link] The True Story of How GPT-2 Became Maximally Lewd
Writer · 2024-01-18T21:03:08.167Z · comments (7)

Transcoders enable fine-grained interpretable circuit analysis for language models
Jacob Dunefsky (jacob-dunefsky) · 2024-04-30T17:58:09.982Z · comments (14)

[link] [Link Post] "Foundational Challenges in Assuring Alignment and Safety of Large Language Models"
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2024-06-06T18:55:09.151Z · comments (2)

If we solve alignment, do we die anyway?
Seth Herd · 2024-08-23T13:13:10.933Z · comments (108)

Dumbing down
Martin Sustrik (sustrik) · 2024-06-09T06:50:47.469Z · comments (0)

[link] OpenAI: Preparedness framework
Zach Stein-Perlman · 2023-12-18T18:30:10.153Z · comments (23)

Instruction-following AGI is easier and more likely than value aligned AGI
Seth Herd · 2024-05-15T19:38:03.185Z · comments (25)

Update on Chinese IQ-related gene panels
Lao Mein (derpherpize) · 2023-12-14T10:12:21.212Z · comments (7)

Recent comments

viliam on A few questions about recent developments in EA

Some orgs did that and it generally didn't go well (eg Leverage Research). I think most people believe that totalizing jobs are bad for mental health and create bad epistemics and it's not worth it.

Working hard together with similarly minded people seems great. Never taking a break, and isolating yourself from the world, is not.

People working at startups usually get at least free weekends, and often have a partner at home who is not a member of the startup. If you never take a break, I suspect that you are optimizing for appearing to work hard, rather than for actually being productive.

viliam on A few questions about recent developments in EA

I have read the "TESCREAL" paper recently, and wrote some thoughts about it in an ACX Open Thread.

It also gave me conspiracy theory vibes, as it tried too hard to connect together various groups and people that are parts of the sinister-sounding "TESCREAL" (including a table of individuals and organizations involved in various parts), trace their roots back to eugenicists (but also Plato and Aristotle), and warn about their wealth and influence.

It reminded me how some people in my country love to compile lists of people working at various non-profits to prove how this is all linked to Soros and how they are all servants of American propaganda trying to destroy our independence. Because apparently you cannot volunteer in a shelter for abandoned puppies without being a part of some larger sinister plot.

From the Dark Arts perspective, I think it would be useful to sigh and say "oh, this conspiracy theory again?" to signal that you consider the authors low-status. But then focus on the object-level objections.

The actual objection, from my perspective, is that the thing that connects the parts of the "TESCREAL" is simply "nerds who care, and think that technology is the answer". Some parts are more strongly related; if you believe in technological progress, then longtermism and transhumanism and extropianism and cosmism are more or less the same thing: the belief that in the future, humans will overcome their current limitations using technology. That should not really come as a huge surprise to anyone.

The connection with EA is cherry-picking; yes, there are some longtermist projects, but most of it is stuff like curing malaria. But of course, you can't say that, if your agenda is to call them ~~Nazis~~ eugenicists.

And the connection with eugenicists is mostly "you know who else worried about the future of humanity?" (I find it difficult to think of a more appropriate response than "fuck you!") But also, speaking about intelligence is a taboo, which means that it is a taboo to worry about artificial intelligences becoming potentially smarter than humans. -- Here, I think a potential solution would be to push the authors towards making some object-level statements. Not just "people who say X are like ~~Hitler~~ eugenicists", but state your opinion clearly, whether it is "X" or "not X"; make a falsifiable statement.

But I think it is not too uncharitable to summarize the paper as "a conspiracy theory claiming that people who donate money to African charities that cure malaria are secretly eugenicists", because that is an important part of the "TESCR-EA-L" construct.

sharmake-farah on Benito's Shortform Feed

I'd say the reason the SpaceX cult/business can actually make working rockets is that they get rich feedback from reality when they design rockets, even at the pre-testing stage. While it's not obvious to a layperson whether a rocket works, it is relatively easy for an expert to check the physics of whether a new rocket works. That means the checking of claims can be made legible, and legibility is an enemy of cults in general.

More generally, I'd say the difference between a cult and a high-impact startup/business is whether they can get rich and reliable feedback from a source, and secondarily how legible their theory of impact and their claims are.

Bigness alone doesn't cut it.

philh on Economics101 predicted the failure of special card payments for refugees, 3 months later whole of Germany wants to adopt it

I don't know anything about the card. I haven't re-read the post, but I think the point I was making was "you haven't successfully argued that this is good cost-benefit", not "I claim that this is bad cost-benefit". Another possibility is that I was just pointing out that the specific quoted paragraph had an implied bad argument, but I didn't think it said much about the post overall.

turntrout on Announcing turntrout.com, my new digital home

(I think individual FB questions can toggle whether to show/hide predictions before you've made your own)

I think it should be hidden by default in the editor, with a user-side setting to show by default for all questions.

alexander-gietelink-oldenziel on Why I Think All The Species Of Significantly Debated Consciousness Are Conscious And Suffer Intensely

Suppose one buys your thesis that most or all animals are conscious and feel intense pain. What is to be done? Upload the shrimp?

neel-nanda-1 on Mechanistic Interpretability of Llama 3.2 with Sparse Autoencoders

Cool project! Thanks for doing it and sharing, great to see more models with SAEs

interpretability research on proprietary LLMs that was quite popular this year and great research papers by Anthropic[1][2], OpenAI[3][4] and Google Deepmind

I run the Google DeepMind team, and just wanted to clarify that our work was not on proprietary closed-weight models, but instead on Gemma 2, as were our open-weight SAEs. Gemma 2 is about as open as Llama imo. We try to use open models wherever possible for these general reasons of good scientific practice, ease of replicability, etc. Though we couldn't open source the data, and didn't go to the effort of open sourcing the code, so I don't think they can be considered true open source. OpenAI did most of their work on GPT-2, and only did their large-scale experiment on GPT-4, I believe. All Anthropic work I'm aware of is on proprietary models, alas.

richard_kennaway on How Universal Basic Income Could Help Us Build a Brighter Future

The style vaguely feels like something ChatGPT might write. Brightly polished, safe and stale.

It is definitely ChatGPT. There are a lot of things in the essay that make no sense the moment you stop and think about what is actually being said. For example:

At its core, UBI is about ensuring that everyone has the financial resources to meet their basic needs.

Not "at its core". That is what UBI is.

For businesses, UBI provides a stable customer base...

A customer base for buying basic necessities, but not for anything above that, like a shiny new games console. And a customer base for basic necessities already exists. Broadly speaking (a glance at Wikipedia), in the developed world it falls about 10 to 20% short of being the entire population, and there are typically government programs of some sort to assist most of them.

...and a workforce

How does UBI provide a workforce? UBI pays people whether they work or not. That's what the U means. One of the motivations for UBI is a predicted lack of any useful employment for large numbers of people in the near future.

By investing in UBI, businesses can

How does a business "invest in UBI"? UBI is paid by the government out of taxes.

The beauty of UBI lies in its potential to align individual aspirations with collective progress. By ensuring that basic needs are met, we free people to contribute their skills and energy to areas where they’re most needed

People will already pay people to do the work that they need done. Is it envisaged that under UBI, people will joyfully "contribute their skills and energy" without pay, at whatever work someone has judged to be "needed"? I don't know, but the more I look at this passage the more the apparent meaning drains out of it. There is nothing here but hurrah words. There is nothing in the whole essay.

anders-lindstroem on A very strange probability paradox

The thing is that, if you roll a 6 and then a non-6, in an "A" sequence you're likely to just die due to rolling an odd number before you succeed in getting the double 6, and thus exclude the sequence from the surviving set; whereas in a "B" sequence there's a much higher chance you'll roll a 6 before dying, and thus include this longer "sequence of 3+ rolls" in the set.

Yes! This kind of kills the "paradox". It's approaching an apples-and-oranges comparison.

Surviving sequences with n=100 rolls (for illustrative purposes)

[6, 6]
[6, 6]
[2, 6, 6]
[6, 6]
[2, 6, 6]
[6, 6]
Estimate for A: 2.333
[6, 6]
[4, 4, 6, 2, 2, 6]
[6, 6]
[6, 2, 4, 4, 6]
[6, 4, 6]
[4, 4, 6, 4, 6]
[6, 6]
[6, 6]
Estimate for B: 3.375

If you rephrase:

$A$: The probability that you roll a fair die until you roll two 6s in a row, given that all rolls were even.

$B$: The probability that you roll a fair die until you roll two non-consecutive 6s (not necessarily in a row), given that all rolls were even.

This changes the code to:

A_estimate = num_sequences_without_odds/n

B_estimate = num_sequences_without_odds/n

And the result (n=100000)

Estimate for A: 0.045
Estimate for B: 0.062

I guess this is what most people were thinking when reading the problem, i.e., that there's a bigger chance of getting two non-consecutive 6s. By the wording (see above) of the "paradox", B gives more rolls on average for the surviving sequences, but on the other hand it has more surviving sequences, hence the higher probability.
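
For anyone who wants to reproduce both kinds of estimate, here is a minimal sketch of such a simulation. It is not the code used for the numbers above: the function names are made up for illustration, and as a shortcut it abandons a sequence as soon as an odd number is rolled, since that sequence cannot survive anyway. A short calculation suggests the survival probabilities should land near 1/22 ≈ 0.045 for A and 1/16 ≈ 0.062 for B, in line with the n=100000 figures quoted above.

```python
import random


def two_sixes_in_a_row(rolls):
    """Stop rule for A: the last two rolls are both 6."""
    return len(rolls) >= 2 and rolls[-1] == 6 and rolls[-2] == 6


def two_sixes_total(rolls):
    """Stop rule for B: the second 6 has just been rolled (need not be adjacent)."""
    return rolls[-1] == 6 and rolls.count(6) == 2


def surviving_length(stop_rule, rng):
    """Roll until stop_rule fires; return the length if all rolls were even, else None."""
    rolls = []
    while True:
        roll = rng.randint(1, 6)
        if roll % 2 == 1:
            return None  # an odd roll disqualifies the sequence, so stop early
        rolls.append(roll)
        if stop_rule(rolls):
            return len(rolls)


def estimate(stop_rule, n, seed=0):
    """Return (fraction of sequences without odds, mean length of those sequences)."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(n):
        length = surviving_length(stop_rule, rng)
        if length is not None:
            lengths.append(length)
    survival_prob = len(lengths) / n  # num_sequences_without_odds / n
    mean_length = sum(lengths) / len(lengths) if lengths else float("nan")
    return survival_prob, mean_length


if __name__ == "__main__":
    for name, rule in (("A", two_sixes_in_a_row), ("B", two_sixes_total)):
        prob, mean_len = estimate(rule, n=100_000)
        print(f"{name}: P(all even) ~ {prob:.3f}, mean surviving length ~ {mean_len:.3f}")
```

The first number returned corresponds to the rephrased question (how often a sequence survives), the second to the original wording (how long the surviving sequences are on average), so both readings of the "paradox" can be compared in one run.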

viliam on koratkar's Shortform

Sometimes a game that seems zero-sum between two players actually has a third player, let's call them "audience" or "environment", and the payout is different when you include them. Two people trying to win a tennis match provide entertainment for the audience. Also, in the short term, one of the players wins and the other one loses, but in the long term, both have practiced their skills and had some healthy exercise.

Status seeking is immoral when it comes into conflict with doing the right thing. Sometimes that means cheating to appear better than you actually are. Sometimes it means generating negative externalities.

But in a healthy environment, social status can be a way to recognize and reward doing the right thing.