LessWrong 2.0 Reader

Anthropic Fall 2023 Debate Progress Update
Ansh Radhakrishnan (anshuman-radhakrishnan-1) · 2023-11-28T05:37:30.070Z · comments (9)

What is malevolence? On the nature, measurement, and distribution of dark traits
David Althaus (wallowinmaya) · 2024-10-23T08:41:33.197Z · comments (13)

Mistakes people make when thinking about units
Isaac King (KingSupernova) · 2024-06-25T03:39:20.138Z · comments (14)

(Not) Derailing the LessOnline Puzzle Hunt
Error · 2024-06-04T01:28:31.688Z · comments (2)

On the UK Summit
Zvi · 2023-11-07T13:10:04.895Z · comments (6)

Q&A on Proposed SB 1047
Zvi · 2024-05-02T15:10:02.916Z · comments (8)

Could randomly choosing people to serve as representatives lead to better government?
John Huang · 2024-10-21T17:10:20.920Z · comments (12)

[link] MIRI's June 2024 Newsletter
Harlan · 2024-06-14T23:02:23.721Z · comments (18)

SAE-VIS: Announcement Post
CallumMcDougall (TheMcDouglas) · 2024-03-31T15:30:49.079Z · comments (8)

Introducing Transluce — A Letter from the Founders
jsteinhardt · 2024-10-23T18:10:02.526Z · comments (2)

Scalable Oversight and Weak-to-Strong Generalization: Compatible approaches to the same problem
Ansh Radhakrishnan (anshuman-radhakrishnan-1) · 2023-12-16T05:49:23.672Z · comments (3)

[link] A Narrow Path: a plan to deal with AI extinction risk
Andrea_Miotti (AndreaM) · 2024-10-07T13:02:15.229Z · comments (10)

Interpreting Preference Models w/ Sparse Autoencoders
Logan Riggs (elriggs) · 2024-07-01T21:35:40.603Z · comments (12)

[Full Post] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda (neel-nanda-1) · 2024-04-19T19:06:59.185Z · comments (10)

[link] Soft Nationalization: how the USG will control AI labs
Deric Cheng (deric-cheng) · 2024-08-27T15:11:14.601Z · comments (7)

[link] Nick Bostrom’s new book, “Deep Utopia”, is out today
PeterH · 2024-03-27T11:24:01.401Z · comments (5)

The World in 2029
Nathan Young · 2024-03-02T18:03:29.368Z · comments (37)

A Gentle Introduction to Risk Frameworks Beyond Forecasting
pendingsurvival · 2024-04-11T18:03:25.605Z · comments (10)

Companies' safety plans neglect risks from scheming AI
Zach Stein-Perlman · 2024-06-03T15:00:20.236Z · comments (4)

JargonBot Beta Test
Raemon · 2024-11-01T01:05:26.552Z · comments (52)

On Dwarkesh’s Podcast with OpenAI’s John Schulman
Zvi · 2024-05-21T17:30:04.332Z · comments (4)

The One and a Half Gemini
Zvi · 2024-02-22T13:10:04.725Z · comments (4)

Joshua Achiam Public Statement Analysis
Zvi · 2024-10-10T12:50:06.285Z · comments (14)

Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI"
johnswentworth · 2023-11-21T17:39:17.828Z · comments (84)

In Defense of Open-Minded UDT
abramdemski · 2024-08-12T18:27:36.220Z · comments (27)

[link] LK-99 in retrospect
bhauth · 2024-07-07T02:06:27.660Z · comments (21)

D&D.Sci Scenario Index
aphyer · 2024-07-23T02:00:43.483Z · comments (0)

[link] Excerpts from "A Reader's Manifesto"
Arjun Panickssery (arjun-panickssery) · 2024-09-06T22:37:40.254Z · comments (1)

Prompts for Big-Picture Planning
Raemon · 2024-04-13T03:04:24.523Z · comments (1)

Claude 3 claims it's conscious, doesn't want to die or be modified
Mikhail Samin (mikhail-samin) · 2024-03-04T23:05:00.376Z · comments (113)

Announcing Suffering For Good
Garrett Baker (D0TheMath) · 2024-04-01T17:08:12.322Z · comments (5)

[question] Interest in Leetcode, but for Rationality?
Gregory (gregory-eales) · 2024-10-16T17:54:25.578Z · answers+comments (20)

When "yang" goes wrong
Joe Carlsmith (joekc) · 2024-01-08T16:35:50.607Z · comments (6)

AXRP Episode 31 - Singular Learning Theory with Daniel Murfet
DanielFilan · 2024-05-07T03:50:05.001Z · comments (4)

AI for Bio: State Of The Field
sarahconstantin · 2024-08-30T18:00:02.187Z · comments (2)

Guide to SB 1047
Zvi · 2024-08-20T13:10:07.408Z · comments (18)

FarmKind's Illusory Offer
jefftk (jkaufman) · 2024-08-09T11:30:07.082Z · comments (5)

Survey for alignment researchers!
Cameron Berg (cameron-berg) · 2024-02-02T20:41:44.323Z · comments (11)

Some Rules for an Algebra of Bayes Nets
johnswentworth · 2023-11-16T23:53:11.650Z · comments (31)

Testbed evals: evaluating AI safety even when it can’t be directly measured
joshc (joshua-clymer) · 2023-11-15T19:00:41.908Z · comments (2)

The Mask Comes Off: At What Price?
Zvi · 2024-10-21T23:50:05.247Z · comments (16)

We need a Science of Evals
Marius Hobbhahn (marius-hobbhahn) · 2024-01-22T20:30:39.493Z · comments (13)

Do sparse autoencoders find "true features"?
Demian Till · 2024-02-22T18:06:59.630Z · comments (33)

LW Frontpage Experiments! (aka "Take the wheel, Shoggoth!")
Ruby · 2024-04-23T03:58:43.443Z · comments (27)

Dumbing down
Martin Sustrik (sustrik) · 2024-06-09T06:50:47.469Z · comments (0)

[link] OpenAI: Preparedness framework
Zach Stein-Perlman · 2023-12-18T18:30:10.153Z · comments (23)

[link] If far-UV is so great, why isn't it everywhere?
Austin Chen (austin-chen) · 2024-10-19T18:56:58.910Z · comments (23)

[link] Yoshua Bengio: Reasoning through arguments against taking AI safety seriously
Judd Rosenblatt (judd) · 2024-07-11T23:53:17.187Z · comments (3)

[link] The True Story of How GPT-2 Became Maximally Lewd
Writer · 2024-01-18T21:03:08.167Z · comments (7)

“Artificial General Intelligence”: an extremely brief FAQ
Steven Byrnes (steve2152) · 2024-03-11T17:49:02.496Z · comments (6)


Recent comments

scroogemcduck1 on Bets and updating

They don't need to solve the whole Halting Problem, for the same reason you wouldn't need to contradict Rice's theorem if you had some proof (which I take as an axiom for the sake of the hypothetical) that the predictor was in fact perfect and that it is utility maximizing. Also, we can just try saying that there is a high probability that they will do this. Furthermore, you can imagine a restricted subset of Turing machines for which the Halting Problem is computable. But also, the only computers that exist in reality are really finite state machines.

scroogemcduck1 on Bets and updating

Well, the perplexing situation doesn't actually happen if the predictors are good enough, because they'll predict you both won't update and won't take the bet. Thus you'll never have been approached in the first place.

deepthoughtlife on Abstractions are not Natural

When reading the piece, it seemed to assume far too much (and many of the assumptions are ones I obviously disagree with). I would call many of the assumptions a relative of the false dichotomy (though I don't know what it is called when you present more than two possibilities as exhaustive when they really aren't). If you were more open in your writing to the idea that you don't necessarily know what the believers in natural abstractions mean, and that the possibilities mentioned were not exhaustive, I probably would have had a less negative reaction.

When combined with a dismissive tone, many (me included) will read it as hostile, regardless of actual intent (though frustration is actually just as good a possibility for why someone would write in that manner, and genuine confusion over what people believe is also likely). People are always on the lookout for potential hostility, it seems (probably a safety-related instinct), and usually err on the side of seeing it (though some overcorrect against the instinct instead).

I'm sure I come across as hostile reasonably often when I write, though that is rarely my intent.

benito on Lighthaven Sequences Reading Group #9 (Tuesday 11/05)

I really enjoyed the discussion tonight. Something different about the mood in a warmly lit, cozy attic.

Here's a write-up of a discussion we had that ended up with polling over 2,000 people on a question that was raised about the essay "Are Your Enemies Innately Evil?".

benito on Are Your Enemies Innately Evil?

Realistically, most people don’t construct their life stories with themselves as the villains. Everyone is the hero of their own story.

At tonight's sequences-reading meetup, I argued that while it is a mistake to think that people typically see themselves as villains, it is also a mistake to think that they typically view themselves as heroes. Most people don't have especially grand narratives, nor do they view themselves as very strongly moral in either direction (even though I believe there's a trend toward positive self-image).

To get a little data on this question, resident Queen-of-Polls Aella polled over 2,000 people on the following question:

do you feel a sense of heroism - like righteous grand goodness - when it comes to your behavior or advocacy around politics, religion, or cultural opinions?
Yes very much
Sometimes/partially
Not really

Here are the (spoilered) results, after being up for a little over 2 hours and getting over 2,000 responses. Write down your predictions now if you want to actually test your models.

Yes very much: 12.1%
Sometimes/partially: 27.2%
Not really: 60.7%

My comments on the results:

Once Aella told me she had sent out the poll, when I queried my anticipations, I actually predicted the opposite of the direction of my argument. (I later noted it was similar to the person who believes there's a dragon in their garage but anticipates the flour falling to the ground.) Anyway, given this poll, I predicted:

Yes = 30%
Sometimes = 50%
No = 20%

Whereas in fact it was much more in line with the argument I was giving. Not sure what that says about my world-models!

habryka4 on The Shallow Bench

"stop reading here if you don't want to be spoiled."

(I added that sentence based on Jonathan Claybrough's comment, feel free to suggest an alternative one)

zhukeepa on Against empathy-by-default

“If I’m thinking about how to pick up tofu with a fork, I might analogize to how I might pick up feta with a fork, and so if tofu is yummy then I’ll get a yummy vibe and I’ll wind up feeling that feta is yummy too.”

Isn't the more analogous argument "If I'm thinking about how to pick up tofu with a fork, and it feels good when I imagine doing that, then when I analogize to picking up feta with a fork, it would also feel good when I imagine that"? This does seem valid to me, and also seems more analogous to the argument you'd compared the counter-to-common-sense second argument with:

“If I’m thinking about what someone else might do and feel in situation X by analogy to what I might do and feel in situation X, and then if situation X is unpleasant then that simulation will be unpleasant, and I’ll get a generally unpleasant feeling by doing that.”

karl-faulks on The Shallow Bench

Thanks! Sorry about that.

sahil-1 on Live Machinery: Interface Design Workshop for AI Safety @ EA Hotel

Thank you, Dusan!

Next time there will be more notice, and also a more refined workshop!

charlie-steiner on Another UFO Bet

Probably? But I'll feel bad if I don't try to talk you out of this first.

It's true that alien sightings, videos of UFOs, etc. are slowly accumulating evidence for alien visitors, even if each item has reasonable mundane excuses (e.g. 'the mysterious shape on the infrared footage was probably just a distant plane or missile' or 'the eyewitness probably lost track of time but has doubled-down on being confident they didn't'). However, all the time that passes without yet-stronger evidence for aliens is evidence against alien visitors.
- You could imagine aliens landing in the middle of the Super Bowl, or sending us each a messenger drone, or the US government sending alien biological material to 50 different labs composed of hundreds of individual researchers, who hold seminars on what they're doing that you can watch on YouTube. Every year in which nothing like this happens puts additional restrictions on alien-visitor hypotheses, which I think outweigh the slow trickle of evidence from very-hard-to-verify encounters. Relative to our informational state in the year 2000, alien visitors actually seem less likely to me.
- Imagine someone making a 5-year bet on whether we'd have publicly-replicable evidence of alien visitors, every 5 years since the 1947 Roswell news story. Like, really imagine losing this bet 15 times in a row. Even conditional on the aliens being out there, clearly there's a pretty good process keeping the truth from getting out, and you should be getting more and more confident that this process won't break down in the next 5 years.
Even if we're in a simulation, there is no particular reason for the simulators to be behind UAP.
- Like, suppose you're running an ancestor simulation of Earth. Maybe you're a historian interested in researching our response to perturbations to the timeline that you think really could have happened, or maybe you're trying to recreate a specific person to pull out of the simulation, or you're self-inserting to have lots of great sex, or self-inserting along with several of your friends to play some sort of decades-long game. Probably you have much better things to do with this simulation than inserting some hard-to-verify floating orbs into the atmosphere.
There is a 'UFO entertainment industry' that is creating an adverse information environment for us.
- E.g. Skinwalker Ranch is a place where some people have seen spooky stuff. But it's also a way to sell books and TV shows and get millions-of-dollars government grants, each of which involves quite a few people whose livelihood now depends not only on the spookiness of this one place, but more generally on how spooky claims are treated by the public and the US government.
- There's an analogy here to the NASA Artemis program, which involves a big web of contractors whose livelihoods depended on the space shuttle program. These contractors, and politicians working with them, and government managers who like being in charge of larger programs, all benefit from what we might call an "adverse information environment" regarding how valuable certain space programs are, how well they'll work, and how much they'll cost.