LessWrong 2.0 Reader

Avoiding the Bog of Moral Hazard for AI
Nathan Helm-Burger (nathan-helm-burger) · 2024-09-13T21:24:34.137Z · comments (12)

[link] Towards the Operationalization of Philosophy & Wisdom
Thane Ruthenis · 2024-10-28T19:45:07.571Z · comments (2)

"Real AGI"
Seth Herd · 2024-09-13T14:13:24.124Z · comments (20)

[link] Why Swiss watches and Taylor Swift are AGI-proof
Kevin Kohler (KevinKohler) · 2024-09-05T13:23:27.033Z · comments (11)

"Which Future Mind is Me?" Is a Question of Values
dadadarren · 2024-08-09T18:17:09.884Z · comments (12)

Automating LLM Auditing with Developmental Interpretability
htlou · 2024-09-04T15:50:04.337Z · comments (0)

[question] Is there any rigorous work on using anthropic uncertainty to prevent situational awareness / deception?
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2024-09-04T12:40:07.678Z · answers+comments (7)

Training a Sparse Autoencoder in < 30 minutes on 16GB of VRAM using an S3 cache
Louka Ewington-Pitsos (louka-ewington-pitsos) · 2024-08-24T07:39:00.057Z · comments (0)

Graceful Degradation
Screwtape · 2024-11-05T23:57:53.362Z · comments (0)

Invitation to lead a project at AI Safety Camp (Virtual Edition, 2025)
Linda Linsefors · 2024-08-23T14:18:24.327Z · comments (2)

Bridging the VLM and mech interp communities for multimodal interpretability
Sonia Joseph (redhat) · 2024-10-28T14:41:41.969Z · comments (5)

[link] some questionable space launch guns
bhauth · 2024-10-13T22:52:26.418Z · comments (0)

[link] Four Levels of Voting Methods
hive · 2024-09-26T18:15:00.565Z · comments (3)

[link] Instruction Following without Instruction Tuning
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-24T13:49:09.078Z · comments (0)

Is Text Watermarking a lost cause?
egor.timatkov · 2024-10-01T16:20:51.113Z · comments (13)

[link] Will we ever run out of new jobs?
Kevin Kohler (KevinKohler) · 2024-08-19T15:04:03.849Z · comments (7)

[link] Update on the Mysterious Trump Buyers on Polymarket
Annapurna (jorge-velez) · 2024-11-04T19:22:06.540Z · comments (9)

[link] AlignedCut: Visual Concepts Discovery on Brain-Guided Universal Feature Space
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-14T23:23:26.296Z · comments (1)

OpenAI defected, but we can take honest actions
Remmelt (remmelt-ellen) · 2024-10-21T08:41:25.728Z · comments (15)

My career exploration: Tools for building confidence
lynettebye · 2024-09-13T11:37:55.843Z · comments (0)

[link] Jonothan Gorard:The territory is isomorphic to an equivalence class of its maps
Daniel C (harper-owen) · 2024-09-07T10:04:47.840Z · comments (18)

[question] Is this voting system strategy proof?
Donald Hobson (donald-hobson) · 2024-09-06T20:44:46.691Z · answers+comments (9)

[link] My lukewarm take on GLP-1 agonists
George3d6 · 2024-08-26T12:34:27.929Z · comments (0)

Interview with Robert Kralisch on Simulators
WillPetillo · 2024-08-26T05:49:15.543Z · comments (0)

Reducing global AI competition through the Commerce Control List and Immigration reform: a dual-pronged approach
Ben Smith (ben-smith) · 2024-09-03T05:28:24.549Z · comments (2)

Slave Morality: A place for every man and every man in his place
Martin Sustrik (sustrik) · 2024-09-19T04:20:04.491Z · comments (7)

Hiring a writer to co-author with me (Spencer Greenberg for ClearerThinking.org)
spencerg · 2024-10-27T17:34:50.479Z · comments (0)

[question] Does the "ancient wisdom" argument have any validity? If a particular teaching or tradition is old, to what extent does this make it more trustworthy?
SpectrumDT · 2024-11-04T15:20:14.822Z · answers+comments (11)

[link] Non-Transactional Compliments
Jonathan Moregård (JonathanMoregard) · 2024-08-09T13:42:16.471Z · comments (0)

Review: Dr Stone
ProgramCrafter (programcrafter) · 2024-09-29T10:35:53.175Z · comments (5)

Thinking in 2D
sarahconstantin · 2024-10-20T19:30:05.842Z · comments (0)

[link] Why good things often don’t lead to better outcomes
DMMF · 2024-09-19T16:37:07.778Z · comments (1)

Appealing to the Public
jefftk (jkaufman) · 2024-10-23T19:00:07.669Z · comments (0)

[link] CultFrisbee
Gauraventh (aryangauravyadav) · 2024-08-11T21:36:36.550Z · comments (3)

[question] Is there a CFAR handbook audio option?
FinalFormal2 · 2024-10-26T17:08:36.480Z · answers+comments (0)

Physical Therapy Sucks (but have you tried hiding it in some peanut butter?)
Declan Molony (declan-molony) · 2024-09-10T05:54:47.000Z · comments (12)

Announcing the Ultimate Jailbreaking Championship
InnerHufflepuff (grayswan) · 2024-09-04T00:35:31.234Z · comments (1)

Simulation-aware causal decision theory: A case for one-boxing in CDT
kongus_bongus · 2024-08-09T18:09:20.013Z · comments (11)

[link] Pronouns are Annoying
ymeskhout · 2024-09-18T13:30:04.620Z · comments (21)

[link] Where is the Learn Everything System?
Shoshannah Tekofsky (DarkSym) · 2024-09-27T21:30:16.379Z · comments (8)

[link] Fragile, Robust, and Antifragile Preference Satisfaction
adamShimi · 2024-11-02T17:25:55.986Z · comments (0)

Join a LessWrong Team for the Unaging System Challenge
Crissman · 2024-10-23T06:01:08.018Z · comments (5)

Against Explosive Growth
c.trout (ctrout) · 2024-09-04T21:45:03.120Z · comments (1)

[link] The Ap Distribution
criticalpoints · 2024-08-24T21:45:35.029Z · comments (3)

Pomodoro Method Randomized Self Experiment
niplav · 2024-09-29T21:55:04.740Z · comments (2)

[question] Looking to interview AI Safety researchers for a book
jeffreycaruso · 2024-08-24T19:57:33.119Z · answers+comments (0)

Primary Perceptive Systems
ChristianKl · 2024-08-15T11:26:01.667Z · comments (2)

[link] Verification methods for international AI agreements
Akash (akash-wasil) · 2024-08-31T14:58:10.986Z · comments (1)

Funding for work that builds capacity to address risks from transformative AI
abergal · 2024-08-14T23:52:09.922Z · comments (0)

[link] A brief history of the automated corporation
owencb · 2024-11-04T14:35:04.906Z · comments (0)

Recent comments

scroogemcduck1 on Bets and updating

They don't need to solve the whole halting problem, for the same reason you don't need to contradict Rice's theorem if you have some proof (which I take as an axiom for the sake of the hypothetical) that the predictor is in fact perfect and utility-maximizing. We can also just say that there is a high probability that they will do this. Furthermore, you can imagine a restricted subset of Turing machines for which the halting problem is computable. And in any case, the only computers that exist in reality are really finite state machines.
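The closing point about finite state machines can be made concrete. For a deterministic machine with finitely many states, halting is decidable: run it, and if it ever revisits a state, it must loop forever. Below is a minimal Python sketch of that idea. The `step` interface and the two toy machines are hypothetical illustrations, not anything from the thread. For a real computer the state space is astronomically large, so this decides halting in principle rather than in practice.

```python
from collections.abc import Callable, Hashable
from typing import Optional

def halts(step: Callable[[Hashable], Optional[Hashable]], start: Hashable) -> bool:
    """Decide halting for a deterministic machine with a finite state space.

    `step` (hypothetical interface) maps a state to the next state, or returns
    None when the machine halts. Because the state space is finite, the loop
    must either reach None or revisit a state, so this always terminates.
    """
    seen = set()
    state = start
    while state is not None:
        if state in seen:
            # Deterministic machine back in an earlier state: it will cycle forever.
            return False
        seen.add(state)
        state = step(state)
    return True

# Toy machine 1 (hypothetical): counts down and halts at zero.
def countdown(n: int) -> Optional[int]:
    return None if n == 0 else n - 1

# Toy machine 2 (hypothetical): a counter modulo 8 that never halts.
def cycle_mod_8(n: int) -> Optional[int]:
    return (n + 1) % 8

print(halts(countdown, 5))    # True
print(halts(cycle_mod_8, 0))  # False
```

The same argument fails for a genuine Turing machine only because its tape gives it unboundedly many configurations, so "has it repeated a state?" may never get an answer.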

scroogemcduck1 on Bets and updating

Well, the perplexing situation doesn't actually happen if the predictors are good enough, because they'll predict you both won't update and won't take the bet. Thus you'll never have been approached in the first place.

deepthoughtlife on Abstractions are not Natural

When reading the piece, it seemed to assume far too much (and many of the assumptions are ones I obviously disagree with). I would call many of these assumptions a relative of the false dichotomy (though I don't know what it is called when you present more than two possibilities as exhaustive when they really aren't). If you were more open in your writing to the idea that you don't necessarily know what the believers in natural abstractions mean, and that the possibilities mentioned were not exhaustive, I probably would have had a less negative reaction.

When combined with a dismissive tone, many (me included) will read it as hostile, regardless of actual intent (though frustration is just as plausible a reason for someone to write in that manner, and genuine confusion over what people believe is also likely). People seem always to be on the lookout for potential hostility (probably a safety-related instinct) and usually err on the side of seeing it (though some overcorrect against the instinct instead).

I'm sure I come across as hostile reasonably often when I write, though that is rarely my intent.

benito on Lighthaven Sequences Reading Group #9 (Tuesday 11/05)

I really enjoyed the discussion tonight. Something different about the mood in a warmly lit, cozy attic.

Here's a write-up of a discussion we had that ended up with polling over 2,000 people on a question that was raised about the essay "Are Your Enemies Innately Evil?"

benito on Are Your Enemies Innately Evil?

Realistically, most people don’t construct their life stories with themselves as the villains. Everyone is the hero of their own story.

At tonight's sequences-reading meetup, I argued that while it is a mistake to think that people typically see themselves as villains, it is also a mistake to think that they typically view themselves as heroes. Most people don't have especially grand narratives, nor do they view themselves as very strongly moral in either direction (even though I believe there's a trend toward positive self-image).

To get a little data on this question, resident Queen-of-Polls Aella polled over 2,000 people on the following question:

do you feel a sense of heroism - like righteous grand goodness - when it comes to your behavior or advocacy around politics, religion, or cultural opinions?
Yes very much
Sometimes/partially
Not really

Here are the (spoilered) results, after being up for a little over 2 hours and getting over 2,000 responses. Write down your predictions now if you want to actually test your models.

Yes very much: 12.1%
Sometimes/partially: 27.2%
Not really: 60.7%

My comments on the results:

Once Aella told me she had sent out the poll, I queried my anticipations and actually predicted something different from the direction of my argument. (I later noted it was similar to the person who believes there's a dragon in their garage but anticipates the flour falling to the ground.) Anyway, given this poll, I predicted:

Yes = 30%
Sometimes = 50%
No = 20%

Whereas in fact the results were much more in line with the argument I was giving. Not sure what that says about my world-models!
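One rough way to put a number on how far the prediction was from the poll is the total variation distance between the two distributions: half the sum of the absolute differences, equal to the largest gap in probability that any set of answers receives. The choice of this metric is an illustrative assumption, not something used in the comment; the figures are the ones reported above, with "No" in the prediction read as the poll's "Not really".

```python
def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Half the sum of absolute differences between two distributions over the same options."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)

# Stated prediction ("No" mapped to the poll's "Not really").
predicted = {"yes": 0.30, "sometimes": 0.50, "not_really": 0.20}
# Poll results as reported (over 2,000 responses).
observed = {"yes": 0.121, "sometimes": 0.272, "not_really": 0.607}

print(round(total_variation(predicted, observed), 3))  # 0.407
```

Because "Yes" and "Sometimes" were both overpredicted, the whole 0.407 gap corresponds to underweighting "Not really" (20% predicted against 60.7% observed).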

habryka4 on The Shallow Bench

"stop reading here if you don't want to be spoiled."

(I added that sentence based on Jonathan Claybrough's comment, feel free to suggest an alternative one)

zhukeepa on Against empathy-by-default

“If I’m thinking about how to pick up tofu with a fork, I might analogize to how I might pick up feta with a fork, and so if tofu is yummy then I’ll get a yummy vibe and I’ll wind up feeling that feta is yummy too.”

Isn't the more analogous argument "If I'm thinking about how to pick up tofu with a fork, and it feels good when I imagine doing that, then when I analogize to picking up feta with a fork, it would also feel good when I imagine that"? This does seem valid to me, and also seems more analogous to the argument you'd compared the counter-to-common-sense second argument with:

“If I’m thinking about what someone else might do and feel in situation X by analogy to what I might do and feel in situation X, and then if situation X is unpleasant then that simulation will be unpleasant, and I’ll get a generally unpleasant feeling by doing that.”

karl-faulks on The Shallow Bench

Thanks! Sorry about that.

sahil-1 on Live Machinery: Interface Design Workshop for AI Safety @ EA Hotel

Thank you, Dusan!

Next time there will be more notice, and also a more refined workshop!

charlie-steiner on Another UFO Bet

Probably? But I'll feel bad if I don't try to talk you out of this first.

It's true that alien sightings, videos of UFOs, etc. are slowly accumulating evidence for alien visitors, even if each item has reasonable mundane excuses (e.g. 'the mysterious shape on the infrared footage was probably just a distant plane or missile' or 'the eyewitness probably lost track of time but has doubled down on being confident they didn't'). However, all the time that passes without yet-stronger evidence for aliens is evidence against alien visitors.
- You could imagine aliens landing in the middle of the Super Bowl, or sending us each a messenger drone, or the US government sending alien biological material to 50 different labs composed of hundreds of individual researchers, who hold seminars on what they're doing that you can watch on YouTube. Every year in which nothing like this happens puts additional restrictions on alien-visitor hypotheses, which I think outweigh the slow trickle of evidence from very-hard-to-verify encounters. Relative to our informational state in the year 2000, alien visitors actually seem less likely to me.
- Imagine someone making a 5-year bet on whether we'd have publicly-replicable evidence of alien visitors, every 5 years since the 1947 Roswell news story. Like, really imagine losing this bet 15 times in a row. Even conditional on the aliens being out there, clearly there's a pretty good process keeping the truth from getting out, and you should be getting more and more confident that this process won't break down in the next 5 years.
Even if we're in a simulation, there is no particular reason for the simulators to be behind UAP.
- Like, suppose you're running an ancestor simulation of Earth. Maybe you're a historian interested in researching our response to perturbations to the timeline that you think really could have happened, or maybe you're trying to recreate a specific person to pull out of the simulation, or you're self-inserting to have lots of great sex, or self-inserting along with several of your friends to play some sort of decades-long game. Probably you have much better things to do with this simulation than inserting some hard-to-verify floating orbs into the atmosphere.
There is a 'UFO entertainment industry' that is creating an adverse information environment for us.
- E.g. Skinwalker Ranch is a place where some people have seen spooky stuff. But it's also a way to sell books and TV shows and get multi-million-dollar government grants, each of which involves quite a few people whose livelihood now depends not only on the spookiness of this one place, but more generally on how spooky claims are treated by the public and the US government.
- There's an analogy here to the NASA Artemis program, which involves a big web of contractors whose livelihoods depended on the space shuttle program. These contractors, and politicians working with them, and government managers who like being in charge of larger programs, all benefit from what we might call an "adverse information environment" regarding how valuable certain space programs are, how well they'll work, and how much they'll cost.
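The "losing this bet 15 times in a row" intuition can be given rough numbers with Laplace's rule of succession. This is a sketch under assumptions the comment does not make: each five-year window is treated as an independent trial with the same unknown chance of producing publicly replicable evidence, that chance gets a uniform prior, and the count runs from 1947 to 2024.

```python
from fractions import Fraction

def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Laplace's estimate of P(success on the next trial), assuming a uniform
    prior over a fixed, unknown per-trial success probability."""
    return Fraction(successes + 1, trials + 2)

# Five-year windows since the 1947 Roswell story, counting to 2024 (assumption).
windows = (2024 - 1947) // 5  # 15

# Estimated chance that the *next* window yields replicable evidence,
# after n windows in a row with none.
for n in (1, 5, 10, windows):
    print(n, rule_of_succession(0, n))
# 1 1/3
# 5 1/7
# 10 1/12
# 15 1/17
```

Under these assumptions each further empty window lowers the estimate, which matches the comment's point that you "should be getting more and more confident" that disclosure won't happen in the next five years.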