LessWrong 2.0 Reader

Trustworthy and untrustworthy models
Olli Järviniemi (jarviniemi) · 2024-08-19T16:27:11.088Z · comments (3)

Some costs of superposition
Linda Linsefors · 2024-03-03T16:08:20.674Z · comments (11)

AI Safety 101 : Capabilities - Human Level AI, What? How? and When?
markov (markovial) · 2024-03-07T17:29:53.260Z · comments (8)

[link] Michael Dickens' Caffeine Tolerance Research
niplav · 2024-09-04T15:41:53.343Z · comments (3)

So You Created a Sociopath - New Book Announcement!
Garrett Baker (D0TheMath) · 2024-04-01T18:02:18.010Z · comments (3)

[link] For Civilization and Against Niceness
Gabriel Alfour (gabriel-alfour-1) · 2023-11-20T10:56:20.352Z · comments (14)

The predictive power of dissipative adaptation
dr_s · 2023-12-17T14:01:31.568Z · comments (14)

[link] Metascience of the Vesuvius Challenge
Maxwell Tabarrok (maxwell-tabarrok) · 2024-03-30T12:02:38.978Z · comments (2)

AI #41: Bring in the Other Gemini
Zvi · 2023-12-07T15:10:05.552Z · comments (16)

Big Picture AI Safety: Introduction
EuanMcLean (euanmclean) · 2024-05-23T11:15:44.037Z · comments (7)

AI #68: Remarkably Reasonable Reactions
Zvi · 2024-06-13T16:30:02.969Z · comments (11)

[link] Contra Scott on Abolishing the FDA
Maxwell Tabarrok (maxwell-tabarrok) · 2023-12-15T14:00:17.247Z · comments (3)

[link] If Clarity Seems Like Death to Them
Zack_M_Davis · 2023-12-30T17:40:42.622Z · comments (191)

How to hire somebody better than yourself
lukehmiles (lcmgcd) · 2024-08-28T08:12:53.450Z · comments (5)

[Valence series] 4. Valence & Liking / Admiring
Steven Byrnes (steve2152) · 2024-06-10T14:19:51.194Z · comments (12)

1. The CAST Strategy
Max Harms (max-harms) · 2024-06-07T22:29:13.005Z · comments (19)

AI #33: Cool New Interpretability Paper
Zvi · 2023-10-12T16:20:01.481Z · comments (18)

Toy Models of Feature Absorption in SAEs
chanind · 2024-10-07T09:56:53.609Z · comments (7)

[link] Will releasing the weights of large language models grant widespread access to pandemic agents?
jefftk (jkaufman) · 2023-10-30T18:22:59.677Z · comments (25)

[question] Rationalist horror movies
Elizabeth (pktechgirl) · 2023-10-15T07:42:14.509Z · answers+comments (35)

I finally got ChatGPT to sound like me
lsusr · 2024-09-17T09:39:59.415Z · comments (18)

Bounty for Evidence on Some of Palisade Research's Beliefs
benwr · 2024-09-23T20:01:20.917Z · comments (4)

Vaniver's thoughts on Anthropic's RSP
Vaniver · 2023-10-28T21:06:07.323Z · comments (4)

[link] MIRI's September 2024 newsletter
Harlan · 2024-09-16T18:15:40.785Z · comments (0)

Rapid capability gain around supergenius level seems probable even without intelligence needing to improve intelligence
Towards_Keeperhood (Simon Skade) · 2024-05-06T17:09:10.729Z · comments (16)

LW UI features you might not have tried
Elizabeth (pktechgirl) · 2023-10-13T03:04:57.542Z · comments (6)

The Shallow Bench
Karl Faulks (karl-faulks) · 2024-11-05T05:07:27.357Z · comments (5)

In Defense of Parselmouths
Screwtape · 2023-11-15T23:02:19.344Z · comments (10)

Telopheme, telophore, and telotect
TsviBT · 2023-09-17T16:24:03.365Z · comments (7)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations
Steven Byrnes (steve2152) · 2024-10-29T13:36:16.325Z · comments (2)

Forecasting One-Shot Games
Raemon · 2024-08-31T23:10:05.475Z · comments (0)

I'm open for projects (sort of)
cousin_it · 2024-04-18T18:05:01.395Z · comments (13)

Conflating value alignment and intent alignment is causing confusion
Seth Herd · 2024-09-05T16:39:51.967Z · comments (18)

AI as a powerful meme, via CGP Grey
TheManxLoiner · 2024-10-30T18:31:58.544Z · comments (6)

AI #32: Lie Detector
Zvi · 2023-10-05T13:50:05.030Z · comments (19)

Dating Roundup #3: Third Time’s the Charm
Zvi · 2024-05-08T13:30:03.232Z · comments (27)

[link] Book review: Deep Utopia
PeterMcCluskey · 2024-04-23T19:55:50.417Z · comments (14)

[question] Where might I direct promising-to-me researchers to apply for alignment jobs/grants?
abramdemski · 2023-09-18T16:20:03.452Z · answers+comments (10)

[link] NYT on the Manifest forecasting conference
Austin Chen (austin-chen) · 2023-10-09T21:40:16.732Z · comments (14)

On Tapping Out
Screwtape · 2023-11-17T03:23:55.880Z · comments (13)

[link] Towards Evaluating AI Systems for Moral Status Using Self-Reports
Ethan Perez (ethan-perez) · 2023-11-16T20:18:51.730Z · comments (3)

Work with me on agent foundations: independent fellowship
Alex_Altair · 2024-09-21T13:59:16.706Z · comments (5)

We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap
johnswentworth · 2024-09-19T22:22:05.307Z · comments (47)

AI #36: In the Background
Zvi · 2023-11-02T18:00:01.803Z · comments (5)

~80 Interesting Questions about Foundation Model Agent Safety
RohanS · 2024-10-28T16:37:04.713Z · comments (4)

AI #80: Never Have I Ever
Zvi · 2024-09-10T17:50:08.074Z · comments (20)

[link] AlphaGeometry: An Olympiad-level AI system for geometry
alyssavance · 2024-01-17T17:17:30.913Z · comments (9)

Startup Roundup #2
Zvi · 2024-08-06T13:30:06.554Z · comments (0)

[link] Loneliness and suicide mitigation for students using GPT3-enabled chatbots (survey of Replika users in Nature)
Kaj_Sotala · 2024-01-23T14:05:40.986Z · comments (2)

Things Solenoid Narrates
Solenoid_Entity · 2024-04-12T23:57:16.169Z · comments (2)

Recent comments

rhollerith_dot_com on Thoughts after the Wolfram and Yudkowsky discussion

Near the end of the interview, Wolfram says that he cannot do much processing of what was discussed "in real time", which strongly suggests to me that he expects to process it slowly over the next days and weeks. I.e., he is now trying to reassure himself that the AI project won't kill his four children or any grandchildren he has or will have. Because Wolfram is much better AFAICT at this kind of slow "strategic rational" deliberation than most people at his level of life accomplishment, there is a good chance he will fail to find his slow deliberations reassuring, in which case he probably then declares himself an AI doomer. Specifically, my probability is .225 that 18 months from now, Wolfram will have come out publicly against allowing ambitious frontier AI research to continue. P = .225 is much, much higher than my P for the average 65-year-old of his intellectual stature who is not specialized in AI.

My probability that he will become more optimistic about the AI project over the next 18 months is lower than .225: most likely, he goes silent on the issue or continues to take an inquisitive, noncommittal stance in his public discussions of it.

Although I agree with another comment that Wolfram has not "done the reading" on AI extinction risk, my being able to watch his face while he confronts some of the considerations and arguments for the first time made it easier, not harder, for me to predict where his stance on the AI project will end up 18 months from now.

dagon on A Theory of Equilibrium in the Offense-Defense Balance

I think this is the right way to think of most anti-inductive (planner-adversarial or competitive exploitation) situations. Where there are multiple dimensions of asymmetric capabilities, any change is likely to shift the equilibrium, but not necessarily by as much as the shift in the component.

That said, tipping points are real, and sometimes a component shift can have a BIGGER effect, because it shifts the search to a new local minimum. In most cases, this is not actually entirely due to that component change, but the discovery and reconfiguration is triggered by it. The rise of mass shootings in the US is an example: there are a lot of causes, but the shift happened quite quickly.

Offense-defense is further confused as an example, because there are at least two different equilibria involved. When you say

The offense-defense balance is a concept that compares how easy it is to protect vs conquer or destroy resources.

Conquer control vs retain control is a different thing than destroy vs preserve. Frank Herbert claimed (via fiction) that "The people who can destroy a thing, they control it." but it's actually true in very few cases. The equilibrium of who gets what share of the value from something can shift very separately from the equilibrium of how much total value that thing provides.

sarahconstantin on sarahconstantin's Shortform

links 11/15/2024: https://roamresearch.com/#/app/srcpublic/page/11-15-2024

https://www.reddit.com/r/self/comments/1gleyhg/people_like_me_are_the_reason_trump_won/ a moderate/swing-voter (Obama, Trump, Biden) explains why he voted for Trump this time around:
- he thinks Kamala Harris was an "empty shell" and unlikable and he felt the campaign was manipulative and deceptive.
- he didn't like that she seemed to be a "DEI hire", but doesn't have a problem with black or female candidates generally, it's just that he resents cynical demographic box-checking.
  - this is a coherent POV -- he did vote for Obama, after all. and plenty of people are like "I want the best person regardless of demographics, not a person chosen for their demographics."
    - hm. why doesn't it seem natural to portray Obama as a "DEI hire"? his campaign made a bigger deal about race than Harris's, and he was criticized a lot for inexperience.
      - One guess: it's laughable to think Obama was chosen by anyone besides himself. He was not the Democratic Party's anointed -- that was Hillary. He's clearly an ambitious guy who wanted to be president on his own initiative and beat the odds to get the nomination. He can't be a "DEI hire" because he wasn't a hire at all.
      - another guess: Obama is clearly smart, speaks/writes in complete sentences, and welcomes lots of media attention and talks about his policies, while Harris has a tendency towards word salad, interviews poorly, avoids discussing issues, etc.
      - another guess: everyone seems to reject the idea that people prefer male to female candidates, but I'm still really not sure there isn't a gender effect! This is very vibes-based on my part, and apparently the data goes the other way, so very uncertain here.
https://trevorklee.substack.com/p/if-langurs-can-drink-seawater-can Trevor Klee on adaptations for drinking seawater

habryka4 on Seven lessons I didn't learn from election day

This was a really good analysis of a bunch of election stuff that I hadn't seen presented clearly like this anywhere else. If it wasn't about elections and news I would curate it.

maxwell-peterson on Seven lessons I didn't learn from election day

A good post, of interest to all across the political spectrum, marred by the mistake at the end of becoming explicitly politically opinionated and saying bad things about those who voted differently than OP.

sharmake-farah on Seven lessons I didn't learn from election day

The one thing I'll say on the election is that a lot of people are using Kamala Harris's loss to put in their own reasons for why Kamala Harris lost that are essentially ideological propaganda.

Basically, only the story that she was doomed from the start because of a global backlash against incumbents over inflation matches the evidence well, and a lot of other theories are very much there for ideological purposes.

ben-lang on Seven lessons I didn't learn from election day

I think this question is maybe logically flawed.

Say I have a shuffled deck of cards. You say the probability that the top card is the Ace of Spades is 1/52. I show you the top card, it is the 5 of diamonds. I then ask, knowing what you know now, what probability you should have given.
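The card example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `DECK`, `TARGET`, `prior_estimate`, and `probability_after_reveal` are hypothetical and do not come from the comment. Before the top card is shown, the frequency over many shuffles is close to 1/52; once the card is revealed, the probability is simply 0 or 1.

```python
import random

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
DECK = [(rank, suit) for suit in SUITS for rank in RANKS]  # 52 distinct cards
TARGET = ("A", "spades")


def prior_estimate(trials, seed=0):
    """Estimate P(top card is the Ace of Spades) before any card is revealed."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    hits = 0
    for _ in range(trials):
        deck = DECK[:]
        rng.shuffle(deck)
        if deck[0] == TARGET:
            hits += 1
    return hits / trials


def probability_after_reveal(top_card):
    """Once the top card has been seen, no uncertainty about it remains."""
    return 1.0 if top_card == TARGET else 0.0


if __name__ == "__main__":
    print("exact prior:      ", 1 / len(DECK))
    print("simulated prior:  ", prior_estimate(50_000))
    print("after seeing 5 of diamonds:", probability_after_reveal(("5", "diamonds")))
```

The 1/52 forecast and the revealed outcome do not conflict: the first summarizes what was known before the reveal, the second what is known after it.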

I picked a card analogy, and you picked a dice one. I think the card one is better in this case, for weird idiosyncratic reasons I give below that might just be irrelevant to the train of thought you are on.

Cards vs Dice: If we could reset the whole planet to its exact state 1 week before the election then we would, I think, get the same result (I don't think quantum will mess with us in one week). What if we do a coarser-grained reset? So if there was a kettle of water at 90 degrees a week before the election, that kettle is reset to contain the same volume of water in the same part of my kitchen, and the water is still 90 degrees, but the individual water molecules have different momenta. For some value of "macro", the world is reset to the same macrostate, but not the same microstate, that it had 1 week before election day. If we imagine this experiment I still think Trump wins every (or almost every) time, given what we know now. For me to think this kind of thermal-level randomness made a difference in one week, the election would have to have been much closer.

In my head, things that change on the coarse-grained reset feel more like unrolled dice, and things that don't feel more like facedown cards. Although in detail the distinction is fuzzy: it is based on an arbitrary line between micro and macro, and it is time sensitive, because cards that are going to be shuffled in the future are in the same category as dice.

EDIT: I did as asked, and replied without reading your comments on the EA forum. Reading that I think we are actually in complete agreement, although you actually know the proper terms for the things I gestured at.

abramdemski on o1 is a bad idea

Thanks for this response! I agree with the argument. I'm not sure what it would take to ensure CoT faithfulness, but I agree that it is an important direction to try and take things; perhaps even the most promising direction for near-term frontier-lab safety research (given the incentives pushing those labs in o1-ish directions).

mondsemmel on Lao Mein's Shortform

That might very well help, yes. However, two thoughts, neither at all well thought out:

- If the Trump administration does fight OpenAI, let's hope Altman doesn't manage to judo flip the situation like he did with the OpenAI board saga, and somehow magically end up replacing Musk or Trump in the upcoming administration...
- Musk's own track record on AI x-risk is not great. I guess he did endorse California's SB 1047, so that's better than OpenAI's current position. But he helped found OpenAI, and recently founded another AI company. There's a scenario where we just trade extinction risk from Altman's OpenAI for extinction risk from Musk's xAI.

notfnofn on D0TheMath's Shortform

For example, if you ask mathematicians whether ZFC + not Consistent(ZFC) is consistent, they will say "no, of course not!"

Certainly not a mathematician with any background in logic.

Similarly, if we have the Peano axioms without induction, mathematicians will say that induction should be there, but in fact you cannot prove this fact from within Peano

What exactly do you mean here? That the Peano axioms minus induction do not adequately characterize the natural numbers because they have nonstandard models? Why would I then be surprised that induction (which does characterize the natural numbers) can't be proven from the remaining axioms?
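For reference, here are the two standard forms of induction that this exchange may be running together. Neither comment says which form it has in mind, so pairing them here is an assumption. With the second-order axiom, the Peano axioms characterize the natural numbers up to isomorphism; with only the first-order schema, nonstandard models remain.

```latex
% First-order induction schema: one axiom for each formula \varphi
% in the language of arithmetic (S is the successor function).
\bigl(\varphi(0) \land \forall n\,(\varphi(n) \rightarrow \varphi(S(n)))\bigr)
  \rightarrow \forall n\,\varphi(n)

% Second-order induction axiom: a single axiom quantifying over
% all subsets X of the domain.
\forall X\,\bigl((0 \in X \land \forall n\,(n \in X \rightarrow S(n) \in X))
  \rightarrow \forall n\,(n \in X)\bigr)
```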

and given induction mathematicians will say transfinite induction should be there.

Transfinite induction is a consequence of ZF that makes sense in the context of sets. Yes, it can prove additional statements about the natural numbers (e.g. Goodstein sequences converge), but why would it be added as an axiom when the natural numbers are already characterized up to isomorphism by the Peano axioms? How would you even add it as an axiom in the language of natural numbers? (That last question is non-rhetorical.)