LessWrong 2.0 Reader

Predictive model agents are sort of corrigible
Raymond D · 2024-01-05T14:05:03.037Z · comments (6)

'Theories of Values' and 'Theories of Agents': confusions, musings and desiderata
Mateusz Bagiński (mateusz-baginski) · 2023-11-15T16:00:48.926Z · comments (8)

Forecasting AI (Overview)
jsteinhardt · 2023-11-16T19:00:04.218Z · comments (0)

[Valence series] 4. Valence & Social Status (deprecated)
Steven Byrnes (steve2152) · 2023-12-15T14:24:41.040Z · comments (19)

Proposal for improving the global online discourse through personalised comment ordering on all websites
Roman Leventov · 2023-12-06T18:51:37.645Z · comments (21)

Open Thread – Winter 2023/2024
habryka (habryka4) · 2023-12-04T22:59:49.957Z · comments (160)

Humans aren't fleeb.
Charlie Steiner · 2024-01-24T05:31:46.929Z · comments (5)

How I select alignment research projects
Ethan Perez (ethan-perez) · 2024-04-10T04:33:08.092Z · comments (4)

Monthly Roundup #12: November 2023
Zvi · 2023-11-14T15:20:06.926Z · comments (5)

AI #56: Blackwell That Ends Well
Zvi · 2024-03-21T12:10:05.412Z · comments (16)

What I Learned (Conclusion To "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-20T21:24:37.464Z · comments (0)

Linear encoding of character-level information in GPT-J token embeddings
mwatkins · 2023-11-10T22:19:14.654Z · comments (4)

Adam Smith Meets AI Doomers
James_Miller · 2024-01-31T15:53:03.070Z · comments (10)

Unpicking Extinction
ukc10014 · 2023-12-09T09:15:41.291Z · comments (10)

Difficulty classes for alignment properties
Jozdien · 2024-02-20T09:08:24.783Z · comments (5)

How to develop a photographic memory 1/3
PhilosophicalSoul (LiamLaw) · 2023-12-28T13:26:36.669Z · comments (6)

[link] Robin Hanson & Liron Shapira Debate AI X-Risk
Liron · 2024-07-08T21:45:40.609Z · comments (4)

[link] Why Yudkowsky is wrong about "covalently bonded equivalents of biology"
titotal (lombertini) · 2023-12-06T14:09:15.402Z · comments (40)

D&D.Sci (Easy Mode): On The Construction Of Impossible Structures
abstractapplic · 2024-05-17T00:25:42.950Z · comments (12)

Reflective consistency, randomized decisions, and the dangers of unrealistic thought experiments
Radford Neal · 2023-12-07T03:33:16.149Z · comments (25)

[link] Inferring the model dimension of API-protected LLMs
Ege Erdil (ege-erdil) · 2024-03-18T06:19:25.974Z · comments (3)

[link] Romae Industriae
Maxwell Tabarrok (maxwell-tabarrok) · 2024-07-19T13:03:31.536Z · comments (2)

LessWrong: After Dark, a new side of LessWrong
So8res · 2024-04-01T22:44:04.449Z · comments (5)

[link] hydrogen tube transport
bhauth · 2024-04-18T22:47:08.790Z · comments (12)

Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?
RogerDearnaley (roger-d-1) · 2024-01-11T12:56:29.672Z · comments (4)

[link] The $100B plan with "70% risk of killing us all" w Stephen Fry [video]
Oleg Trott (oleg-trott) · 2024-07-21T20:06:39.615Z · comments (8)

Copyright Confrontation #1
Zvi · 2024-01-03T15:50:04.850Z · comments (7)

The Cognitive Bootcamp Agreement
Raemon · 2024-10-16T23:24:05.509Z · comments (0)

(Maybe) A Bag of Heuristics is All There Is & A Bag of Heuristics is All You Need
Sodium · 2024-10-03T19:11:58.032Z · comments (17)

ARENA4.0 Capstone: Hyperparameter tuning for MELBO + replication on Llama-3.2-1b-Instruct
25Hour (aaron-kaufman) · 2024-10-05T11:30:11.953Z · comments (2)

[question] If I have some money, whom should I donate it to in order to reduce expected P(doom) the most?
KvmanThinking (avery-liu) · 2024-10-03T11:31:19.974Z · answers+comments (36)

Augmenting Statistical Models with Natural Language Parameters
jsteinhardt · 2024-09-20T18:30:10.816Z · comments (0)

Intransitive Trust
Screwtape · 2024-05-27T16:55:29.294Z · comments (15)

Computational Mechanics Hackathon (June 1 & 2)
Adam Shai (adam-shai) · 2024-05-24T22:18:44.352Z · comments (5)

If You Can Climb Up, You Can Climb Down
jefftk (jkaufman) · 2024-07-30T00:00:06.295Z · comments (9)

AXRP Episode 33 - RLHF Problems with Scott Emmons
DanielFilan · 2024-06-12T03:30:05.747Z · comments (0)

[link] AI governance needs a theory of victory
Corin Katzke (corin-katzke) · 2024-06-21T16:15:46.560Z · comments (6)

[link] Suffering Is Not Pain
jbkjr · 2024-06-18T18:04:43.407Z · comments (45)

AI Impacts Survey: December 2023 Edition
Zvi · 2024-01-05T14:40:06.156Z · comments (6)

[link] GPT2, Five Years On
Joel Burget (joel-burget) · 2024-06-05T17:44:17.552Z · comments (0)

AI Safety Strategies Landscape
Charbel-Raphaël (charbel-raphael-segerie) · 2024-05-09T17:33:45.853Z · comments (1)

[link] Book review: On the Edge
PeterMcCluskey · 2024-08-30T22:18:39.581Z · comments (0)

[link] The last era of human mistakes
owencb · 2024-07-24T09:58:42.116Z · comments (2)

[link] My Apartment Art Commission Process
jenn (pixx) · 2024-08-26T18:36:44.363Z · comments (4)

The Schumer Report on AI (RTFB)
Zvi · 2024-05-24T15:10:03.122Z · comments (3)

[link] legged robot scaling laws
bhauth · 2024-01-20T05:45:56.632Z · comments (8)

[link] AI Safety Memes Wiki
plex (ete) · 2024-07-24T18:53:04.977Z · comments (1)

UDT1.01: Logical Inductors and Implicit Beliefs (5/10)
Diffractor · 2024-04-18T08:39:13.368Z · comments (2)

[link] patent process problems
bhauth · 2024-07-14T21:12:04.953Z · comments (13)

Confusing the metric for the meaning: Perhaps correlated attributes are "natural"
NickyP (Nicky) · 2024-07-23T12:43:18.681Z · comments (3)

Recent comments

tsvibt on Scissors Statements for President?

My hope is that this can become more feasible if we can provide accurate patterns for how the scissors-generating-process is trying to trick Susan(/Robert). And that if Susan is trying to figure out how she and Robert were tricked, by modeling the tricking process, this can somehow help undo the trick, without needing to empathize at any point with "what if candidate X is great."

This is clarifying...

Does it actually have much to do with Robert? Maybe it would be more helpful to talk with Tusan and Vusan, who are also A-blind, B-seeing, candidate Y supporters. They're the ones who would punish non-punishers of supporting candidate X / talking about A. (Which Susan would become, if she were talking to an A-seer without pushing back, let alone if she could see into her A-blindspot.) You could talk to Robert about how he's embedded in threats of punishment for non-punishment of supporting candidate Y / talking about B, but that seems more confusing? IDK.

martin-vlach on My motivation and theory of change for working in AI healthtech

EA is neglecting industrial solutions to the industrial problem of successionism.

..because the broader mass of active actors working on such solutions renders the biz areas non-neglected?

martin-vlach on Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety)

Wow, such a badly argued (aka BS) yet heavily up-voted article!
Let's start with Myth #1: what a straw man! Rather than this extreme statement, most researchers likely believe that in the current environment their safety & alignment advances are likely (with high EV) helpful to humanity. The thing here is that they had quite a free hand, or at least varied options, to pick the environment where they work and publish.
With your examples a bad actor could see a worthy EV even with a capable system that is less obedient and more false. Even if interpretability speeds up development, it would direct such development to more transparent models; at least there is a naive chance of that.

Myth #2: I've not yet met anybody in alignment circles who believed that. Most are pretty conscious of the double-edgedness and your sub-arguments.

https://www.lesswrong.com/posts/F2voF4pr3BfejJawL/safety-isn-t-safety-without-a-social-model-or-dispelling-the?commentId=5vB5tDpFiQDG4pqqz depicts the flaws I point to neatly/gently.

quila on quila's Shortform

What is malevolence? On the nature, measurement, and distribution of dark traits was posted two weeks ago (and i recommend it). there was a questionnaire discussed in that post which tries to measure the levels of 'dark traits' in the respondent.

i'm curious about the results[1] of rationalists[2] on that questionnaire, if anyone wants to volunteer theirs. there are short and long versions (16 and 70 questions).

[1] (or responses to the questions themselves)
[2] i also posted the same shortform to the EA forum, asking about EAs

tsvibt on Scissors Statements for President?

I think I agree, but

It's hard to get clear enough on your values. In practice (and maybe also in theory) it's an ongoing process.
Values aren't the only thing going on. There are stances that aren't even close to being either a value, a plan, or a belief. An example is a person who thinks/acts in terms of who they trust, and who seems good; if a lot of people that they know who seem good also think some other person seems good, then they'll adopt that stance.

martin-vlach on Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety)

Are you referring to a Science of Technological Progress à la https://www.theatlantic.com/science/archive/2019/07/we-need-new-science-progress/594946 ?

What is your take on the processes for humanizing technologies? What sources/research are available on such phenomena?

deepthoughtlife on Should CA, TX, OK, and LA merge into a giant swing state, just for elections?

I think it would be a bad idea to actually do (there are so many problems with it in practice), but it is a bit of an interesting thing to note how being a swing state helps convince everyone to try to cater to you, and not just a little. This would be the swing state to end all swing states, I suppose.

The way to get this done that might actually work is probably to make it an amendment to each state's constitution that can only be repealed for future elections, not for the election in which the repeal itself is voted on. (If necessary, you can always amend how the state constitution is amended to make this doable.)

tsvibt on An alternative approach to superbabies

I don't care about doing this bet. We can just have a conversation though, feel free to DM me.

startattheend on Does the "ancient wisdom" argument have any validity? If a particular teaching or tradition is old, to what extent does this make it more trustworthy?

I referred to that too (specifically, the assumption). By true I meant that the bias which I think is to blame certainly exists, not that it was certain to be the main reason (but I'd like to push against this bias in general, so even if this bias only applies to some of the people who see my comment, I think it's an important topic to bring up, and that it likely has enough indirect influence to matter).

To address your points:

1: Of course it's mixed. But the mixed advice averages out to be "wise", something generally useful.
2: I think it's necessarily trial and error, but a good question is "does the wisdom generalize to now?".
3: This of course depends on the examples that you choose. A passage on the ideal age of marriage might generalize to our time less gracefully than a passage on meditation. I think this goes without saying, but if we assume these things aren't intuitive, then a proper answer would be maybe 5 pages long.
4: Would interpreting it as "negative" not mean that it has been misunderstood? That one can learn without understanding is precisely why they could prosper with a level of education which pales in comparison to that of modern times. We learned that bad smells were associated with sickness way before we discovered germs. If our tech requires intelligence to use, then the lower quartile of society might struggle. And with the blind approach you can use genius strategies even if you're mediocre.

5: Along with 4, I think this is an example of the bias that I talked about above. What we think of as "real" tends to be sufficiently disconnected from humanity. Religion and traditional ways of living seem to correlate with mental health, so the types of people who think that wealth inequality is the only source of suffering in the world are too materialistic and disconnected. Not to commit the naturalistic fallacy, but nature does optimize in its own way, and imitating nature tends to go much better than "correcting" it.

tsvibt on An alternative approach to superbabies

(e.g. 1 billion dollars and a few very smart geniuses going into trying to make communication with orcas work well)

That would give more like a 90% chance of superbabies born in <10 years.