LessWrong 2.0 Reader

Misnaming and Other Issues with OpenAI's “Human Level” Superintelligence Hierarchy
Davidmanheim · 2024-07-15T05:50:17.770Z · comments (2)
“Why can’t you just turn it off?”
Roko · 2023-11-19T14:46:18.427Z · comments (25)
[link] Contra Acemoglu on AI
Maxwell Tabarrok (maxwell-tabarrok) · 2024-06-28T13:13:15.796Z · comments (0)
[link] On scalable oversight with weak LLMs judging strong LLMs
zac_kenton (zkenton) · 2024-07-08T08:59:58.523Z · comments (18)
Making Bad Decisions On Purpose
Screwtape · 2023-11-09T03:36:59.611Z · comments (8)
The Mom Test: Summary and Thoughts
Adam Zerner (adamzerner) · 2024-04-18T03:34:21.020Z · comments (3)
[link] Web-surfing tips for strange times
eukaryote · 2024-05-31T07:10:25.805Z · comments (19)
Mechanistic Interpretability Workshop Happening at ICML 2024!
Neel Nanda (neel-nanda-1) · 2024-05-03T01:18:26.936Z · comments (6)
On the lethality of biased human reward ratings
Eli Tyre (elityre) · 2023-11-17T18:59:02.303Z · comments (10)
Why the Best Writers Endure Isolation
Declan Molony (declan-molony) · 2024-07-16T05:58:25.032Z · comments (6)
Safety First: safety before full alignment. The deontic sufficiency hypothesis.
Chipmonk · 2024-01-03T17:55:19.825Z · comments (3)
[link] JumpReLU SAEs + Early Access to Gemma 2 SAEs
Senthooran Rajamanoharan (SenR) · 2024-07-19T16:10:54.664Z · comments (10)
Highlights from Lex Fridman’s interview of Yann LeCun
Joel Burget (joel-burget) · 2024-03-13T20:58:13.052Z · comments (15)
SRE's review of Democracy
Martin Sustrik (sustrik) · 2024-08-03T07:20:01.483Z · comments (2)
So you want to work on technical AI safety
gw · 2024-06-24T14:29:57.481Z · comments (3)
AI and the Technological Richter Scale
Zvi · 2024-09-04T14:00:08.625Z · comments (8)
Experiments as a Third Alternative
Adam Zerner (adamzerner) · 2023-10-29T00:39:31.399Z · comments (21)
[link] Chapter 1 of How to Win Friends and Influence People
gull · 2024-01-28T00:32:52.865Z · comments (5)
[link] Designing for a single purpose
Itay Dreyfus (itay-dreyfus) · 2024-05-07T14:11:22.242Z · comments (12)
What is the next level of rationality?
lsusr · 2023-12-12T08:14:14.846Z · comments (24)
[link] Spaced repetition for teaching two-year olds how to read (Interview)
Chipmonk · 2023-11-26T16:52:58.412Z · comments (9)
Philosophers wrestling with evil, as a social media feed
David Gross (David_Gross) · 2024-06-03T22:25:22.507Z · comments (2)
Childhood Roundup #3
Zvi · 2023-10-10T14:30:04.287Z · comments (3)
How to do conceptual research: Case study interview with Caspar Oesterheld
Chi Nguyen · 2024-05-14T15:09:30.390Z · comments (5)
The Handbook of Rationality (2021, MIT press) is now open access
romeostevensit · 2023-10-10T00:30:05.589Z · comments (4)
Should rationalists be spiritual / Spirituality as overcoming delusion
Kaj_Sotala · 2024-03-25T16:48:08.397Z · comments (57)
AISC 2024 - Project Summaries
NickyP (Nicky) · 2023-11-27T22:32:23.555Z · comments (3)
On ‘Responsible Scaling Policies’ (RSPs)
Zvi · 2023-12-05T16:10:06.310Z · comments (3)
[link] Every Mention of EA in "Going Infinite"
KirstenH · 2023-10-07T14:42:32.217Z · comments (0)
[link] Urging an International AI Treaty: An Open Letter
Olli Järviniemi (jarviniemi) · 2023-10-31T11:26:25.864Z · comments (2)
[link] A Good Explanation of Differential Gears
Johannes C. Mayer (johannes-c-mayer) · 2023-10-19T23:07:46.354Z · comments (4)
Fund Transit With Development
jefftk (jkaufman) · 2023-09-22T11:10:05.645Z · comments (22)
What distinguishes "early", "mid" and "end" games?
Raemon · 2024-06-21T17:41:30.816Z · comments (22)
How to safely use an optimizer
Simon Fischer (SimonF) · 2024-03-28T16:11:01.277Z · comments (21)
[link] Constructive Cauchy sequences vs. Dedekind cuts
jessicata (jessica.liu.taylor) · 2024-03-14T23:04:07.300Z · comments (23)
AI Pause Will Likely Backfire (Guest Post)
jsteinhardt · 2023-10-24T04:30:02.113Z · comments (6)
The Dunning-Kruger of disproving Dunning-Kruger
kromem · 2024-05-16T10:11:33.108Z · comments (0)
Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor
RogerDearnaley (roger-d-1) · 2024-01-09T20:42:28.349Z · comments (8)
Mission Impossible: Dead Reckoning Part 1 AI Takeaways
Zvi · 2023-11-01T12:52:29.341Z · comments (13)
4. Existing Writing on Corrigibility
Max Harms (max-harms) · 2024-06-10T14:08:35.590Z · comments (13)
Critiques of the AI control agenda
Jozdien · 2024-02-14T19:25:04.105Z · comments (14)
Arguments for moral indefinability
Richard_Ngo (ricraz) · 2023-09-30T22:40:04.325Z · comments (16)
Value learning in the absence of ground truth
Joel_Saarinen (joel_saarinen) · 2024-02-05T18:56:02.260Z · comments (8)
[link] Five projects from AI Safety Hub Labs 2023
charlie_griffin (cjgriffin) · 2023-11-08T19:19:37.759Z · comments (1)
Some Experiments I'd Like Someone To Try With An Amnestic
johnswentworth · 2024-05-04T22:04:19.692Z · comments (33)
Environmental allergies are curable? (Sublingual immunotherapy)
Chipmonk · 2023-12-26T19:05:08.880Z · comments (10)
Sora What
Zvi · 2024-02-22T18:10:05.397Z · comments (3)
Run evals on base models too!
orthonormal · 2024-04-04T18:43:25.468Z · comments (6)
[link] "If we go extinct due to misaligned AI, at least nature will continue, right? ... right?"
plex (ete) · 2024-05-18T14:09:53.014Z · comments (23)
shortest goddamn bayes guide ever
lukehmiles (lcmgcd) · 2024-05-10T07:06:23.734Z · comments (8)