LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

[Intuitive self-models] 1. Preliminaries
Steven Byrnes (steve2152) · 2024-09-19T13:45:27.976Z · comments (20)

[link] [Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij (teun-van-der-weij) · 2024-06-13T10:04:49.556Z · comments (10)

[link] [Paper] Stress-testing capability elicitation with password-locked models
Fabien Roger (Fabien) · 2024-06-04T14:52:50.204Z · comments (10)

[link] Hardshipification
Jonathan Moregård (JonathanMoregard) · 2024-05-28T20:02:29.709Z · comments (17)

Why you should be using a retinoid
GeneSmith · 2024-08-19T03:07:41.722Z · comments (57)

[link] What Depression Is Like
Sable · 2024-08-27T17:43:22.549Z · comments (23)

Retirement Accounts and Short Timelines
jefftk (jkaufman) · 2024-02-19T18:50:05.231Z · comments (35)

[link] What are you getting paid in?
Austin Chen (austin-chen) · 2024-07-17T19:23:04.219Z · comments (14)

Actually, Power Plants May Be an AI Training Bottleneck.
Lao Mein (derpherpize) · 2024-06-20T04:41:33.567Z · comments (13)

Sparse Autoencoders Work on Attention Layer Outputs
Connor Kissane (ckkissane) · 2024-01-16T00:26:14.767Z · comments (9)

AI #51: Altman’s Ambition
Zvi · 2024-02-20T19:50:07.439Z · comments (5)

OpenAI o1, Llama 4, and AlphaZero of LLMs
Vladimir_Nesov · 2024-09-14T21:27:41.241Z · comments (24)

Self-prediction acts as an emergent regularizer
Cameron Berg (cameron-berg) · 2024-10-23T22:27:03.664Z · comments (4)

Agent Boundaries Aren't Markov Blankets. [Unless they're non-causal; see comments.]
abramdemski · 2023-11-20T18:23:40.443Z · comments (11)

Release: Optimal Weave (P1): A Prototype Cohabitive Game
mako yass (MakoYass) · 2024-08-17T14:08:18.947Z · comments (21)

Coup probes: Catching catastrophes with probes trained off-policy
Fabien Roger (Fabien) · 2023-11-17T17:58:28.687Z · comments (7)

AI #83: The Mask Comes Off
Zvi · 2024-09-26T12:00:08.689Z · comments (19)

Some Vacation Photos
johnswentworth · 2024-01-04T17:15:01.187Z · comments (0)

[link] Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes
owencb · 2024-04-16T10:10:13.338Z · comments (12)

Refusal mechanisms: initial experiments with Llama-2-7b-chat
Andy Arditi (andy-arditi) · 2023-12-08T17:08:01.250Z · comments (7)

Bostrom Goes Unheard
Zvi · 2023-11-13T14:11:07.586Z · comments (9)

Constructability: Plainly-coded AGIs may be feasible in the near future
Épiphanie Gédéon (joy_void_joy) · 2024-04-27T16:04:45.894Z · comments (13)

AISafety.com – Resources for AI Safety
Søren Elverlin (soren-elverlin-1) · 2024-05-17T15:57:11.712Z · comments (3)

An Introduction To The Mandelbrot Set That Doesn't Mention Complex Numbers
Yitz (yitz) · 2024-01-17T09:48:07.930Z · comments (11)

My Criticism of Singular Learning Theory
Joar Skalse (Logical_Lunatic) · 2023-11-19T15:19:16.874Z · comments (56)

Decomposing the QK circuit with Bilinear Sparse Dictionary Learning
keith_wynroe · 2024-07-02T13:17:16.352Z · comments (7)

[link] Palworld development blog post
bhauth · 2024-01-28T05:56:19.984Z · comments (12)

[link] New voluntary commitments (AI Seoul Summit)
Zach Stein-Perlman · 2024-05-21T11:00:41.794Z · comments (17)

Values Are Real Like Harry Potter
johnswentworth · 2024-10-09T23:42:24.724Z · comments (17)

Studying The Alien Mind
Quentin FEUILLADE--MONTIXI (quentin-feuillade-montixi) · 2023-12-05T17:27:28.049Z · comments (10)

3C's: A Recipe For Mathing Concepts
johnswentworth · 2024-07-03T01:06:11.944Z · comments (5)

Survey of 2,778 AI authors: six parts in pictures
KatjaGrace · 2024-01-06T04:43:34.590Z · comments (1)

The Gemini Incident
Zvi · 2024-02-22T21:00:04.594Z · comments (19)

Self-Referential Probabilistic Logic Admits the Payor's Lemma
Yudhister Kumar (randomwalks) · 2023-11-28T10:27:29.029Z · comments (14)

[link] Not every accommodation is a Curb Cut Effect: The Handicapped Parking Effect, the Clapper Effect, and more
Michael Cohn (michael-cohn) · 2024-09-15T05:27:36.691Z · comments (39)

Announcing Athena - Women in AI Alignment Research
Claire Short (claire-short) · 2023-11-07T21:46:41.741Z · comments (2)

[Intuitive self-models] 2. Conscious Awareness
Steven Byrnes (steve2152) · 2024-09-25T13:29:02.820Z · comments (48)

How to prevent collusion when using untrusted models to monitor each other
Buck · 2024-09-25T18:58:20.693Z · comments (5)

Quick look: applications of chaos theory
Elizabeth (pktechgirl) · 2024-08-18T15:00:07.853Z · comments (45)

[link] My thesis (Algorithmic Bayesian Epistemology) explained in more depth
Eric Neyman (UnexpectedValues) · 2024-05-09T19:43:16.543Z · comments (4)

LessWrong Community Weekend 2024, open for applications
UnplannedCauliflower · 2024-05-01T10:18:21.992Z · comments (2)

New report: "Scheming AIs: Will AIs fake alignment during training in order to get power?"
Joe Carlsmith (joekc) · 2023-11-15T17:16:42.088Z · comments (26)

The case for a negative alignment tax
Cameron Berg (cameron-berg) · 2024-09-18T18:33:18.491Z · comments (20)

Thomas Kwa's research journal
Thomas Kwa (thomas-kwa) · 2023-11-23T05:11:08.907Z · comments (1)

[link] MIRI's May 2024 Newsletter
Harlan · 2024-05-15T00:13:30.153Z · comments (1)

EU policymakers reach an agreement on the AI Act
tlevin (trevor) · 2023-12-15T06:02:44.668Z · comments (7)

[link] Is "superhuman" AI forecasting BS? Some experiments on the "539" bot from the Centre for AI Safety
titotal (lombertini) · 2024-09-18T13:07:40.754Z · comments (3)

Corrigibility = Tool-ness?
johnswentworth · 2024-06-28T01:19:48.883Z · comments (8)

Spaciousness In Partner Dance: A Naturalism Demo
LoganStrohl (BrienneYudkowsky) · 2023-11-19T07:00:19.555Z · comments (5)

[link] The Cognitive-Theoretic Model of the Universe: A Partial Summary and Review
jessicata (jessica.liu.taylor) · 2024-03-27T19:59:27.893Z · comments (36)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

scroogemcduck1 on Bets and updating

They don't need to solve the whole Halting Problem, for the same reason you don't need to contradict Rice's theorem if you had some proof (which I take as an axiom for the sake of the hypothetical) that the predictor was in fact perfect and that it is utility maximizing. Also, we can just try saying that there is a high probability that they will do this. Furthermore, you can imagine a restricted subset of Turing machines for which the Halting problem is computable. But also the only computers that exist in reality are really finite state machines.

scroogemcduck1 on Bets and updating

Well, the perplexing situation doesn't actually happen if the predictors are good enough, because they'll predict you both won't update and won't take the bet. Thus you'll never have been approached in the first place.

deepthoughtlife on Abstractions are not Natural

When reading the piece, it seemed to assume far too much (and many of the assumptions are ones I obviously disagree with). I would call many of the assumptions made to be a relative of the false dichotomy (though I don't know what it is called when you present more than two possibilities as exhaustive but they really aren't.) If you were more open in your writing to the idea that you don't necessarily know what the believers in natural abstractions mean, and that the possibilities mentioned were not exhaustive, I probably would have had a less negative reaction.

When combined with a dismissive tone, many (me included) will read it as hostile, regardless of actual intent (though frustration is actually just as good a possibility for why someone would write in that manner, and genuine confusion over what people believe is also likely). People are always on the lookout for potential hostility it seems (probably a safety related instinct) and usually err on the side of seeing it (though some overcorrect against the instinct instead).

I'm sure I come across as hostile when I write reasonably often though that is rarely my intent.

benito on Lighthaven Sequences Reading Group #9 (Tuesday 11/05)

I really enjoyed the discussion tonight. Something different about the mood in a warmly lit, cozy attic.

Here's a write-up [LW(p) · GW(p)] of a discussion we had that ended up with polling over 2,000 people on a question that was raised about the essay Are Your Enemies Innately Evil? [LW · GW].

benito on Are Your Enemies Innately Evil?

Realistically, most people don’t construct their life stories with themselves as the villains. Everyone is the hero of their own story.

At tonight's sequences-reading meetup [? · GW], I argued that while it is a mistake to think that people typically see themselves as villains, it is also a mistake to think that they typically view themselves as heroes. Most people don't have especially grand narratives, nor do they view themselves as very strongly moral in either direction (even though I believe there's a trend toward positive self-image).

To get a little data [LW · GW] on this question, resident Queen-of-Polls Aella polled over 2,000 people on the following question:

do you feel a sense of heroism - like righteous grand goodness - when it comes to your behavior or advocacy around politics, religion, or cultural opinions?
Yes very much
Sometimes/partially
Not really

Here are the (spoilered) results, after being up for a little over 2 hours and getting over 2,000 responses. Write down your predictions now if you want to actually test your models.

Yes very much: 12.1%
Sometimes/partially: 27.2%
Not really: 60.7%

My comments on the results:

Once Aella told me she had sent out the poll, when I queried my anticipations, I actually predicted differently to the direction of my argument. (I later noted it was similar to the person who believes there's a dragon in their garage but anticipates the flour falling to the ground [LW · GW]). Anyway, given this poll, I predicted

Yes = 30%
Sometimes = 50%
No = 20%

Whereas in fact it was much more in-line with the argument I was giving. Not sure what that says about my world-models!

habryka4 on The Shallow Bench

"stop reading here if you don't want to be spoiled."

(I added that sentence based on Jonathan Claybrough's comment, feel free to suggest an alternative one)

zhukeepa on Against empathy-by-default

“If I’m thinking about how to pick up tofu with a fork, I might analogize to how I might pick up feta with a fork, and so if tofu is yummy then I’ll get a yummy vibe and I’ll wind up feeling that feta is yummy too.”

Isn't the more analogous argument "If I'm thinking about how to pick up tofu with a fork, and it feels good when I imagine doing that, then when I analogize to picking up feta with a fork, it would also feel good when I imagine that"? This does seem valid to me, and also seems more analogous to the argument you'd compared the counter-to-common-sense second argument with:

“If I’m thinking about what someone else might do and feel in situation X by analogy to what I might do and feel in situation X, and then if situation X is unpleasant than that simulation will be unpleasant, and I’ll get a generally unpleasant feeling by doing that.”

karl-faulks on The Shallow Bench

Thanks! Sorry about that.

sahil-1 on Live Machinery: Interface Design Workshop for AI Safety @ EA Hotel

Thank you, Dusan!

Next time there will be more notice, and also a more refined workshop!

charlie-steiner on Another UFO Bet

Probably? But I'll feel bad if I don't try to talk you out of this first.

It's true that alien sightings, videos of UFOs, etc. are slowly accumulating evidence for alien visitors, even if each item has reasonable mundane excuses (e.g. 'the mysterious shape on the infrared footage was probably just a distant plane or missile' or 'the eyewitness probably lost track of time but has doubled-down on being confident they didn't'). However, all the time that passes without yet-stronger evidence for aliens is evidence against alien visitors.
- You could imagine aliens landing in the middle of the Superbowl, or sending us each a messenger drone, or the US government sending alien biological material to 50 different labs composed of hundreds of individual researchers, who hold seminars on what they're doing that you can watch on youtube. Every year nothing like this happens puts additional restrictions on alien-visitor hypotheses, which I think outweigh the slow trickle of evidence from very-hard-to-verify encounters. Relative to our informational state in the year 2000, alien visitors actually seem less likely to me.
- Imagine someone making a 5-year bet on whether we'd have publicly-replicable evidence of alien visitors, every 5 years since the 1947 Roswell news story. Like, really imagine losing this bet 15 times in a row. Even conditional on the aliens being out there, clearly there's a pretty good process keeping the truth from getting out, and you should be getting more and more confident that this process won't break down in the next 5 years.
Even if we're in a simulation, there is no particular reason for the simulators to be behind UAP.
- Like, suppose you're running an ancestor simulation of Earth. Maybe you're a historian interested in researching our response to perturbations to the timeline that you think really could have happened, or maybe you're trying to recreate a specific person to pull out of the simulation, or you're self-inserting to have lots of great sex, or self-inserting along with several of your friends to play some sort of decades-long game. Probably you have much better things to do with this simulation than inserting some hard-to-verify floating orbs into the atmosphere.
There is a 'UFO entertainment industry' that is creating an adverse information environment for us.
- E.g. Skinwalker Ranch is a place where some people have seen spooky stuff. But it's also a way to sell books and TV shows and get millions-of-dollars government grants, each of which involves quite a few people whose livelihood now depends not only on the spookiness of this one place, but more generally on how spooky claims are treated by the public and the US government.
- There's an analogy here to the NASA Artemis program, which involves a big web of contractors whose livelihoods depended on the space shuttle program. These contractors, and politicians working with them, and government managers who like being in charge of larger programs, all benefit from what we might call an "adverse information environment" regarding how valuable certain space programs are, how well they'll work, and how much they'll cost.