LessWrong 2.0 Reader

A Bird's Eye View of the ML Field [Pragmatic AI Safety #2]
Dan H (dan-hendrycks) · 2022-05-09T17:18:53.978Z · comments (8)
Rationality !== Winning
Raemon · 2023-07-24T02:53:59.764Z · comments (51)
A Personal (Interim) COVID-19 Postmortem
Davidmanheim · 2020-06-25T18:10:40.885Z · comments (41)
Outline of Galef's "Scout Mindset"
Rob Bensinger (RobbBB) · 2021-08-10T00:16:59.050Z · comments (17)
Be less scared of overconfidence
benkuhn · 2022-11-30T15:20:07.738Z · comments (22)
[link] Boycott OpenAI
PeterMcCluskey · 2024-06-18T19:52:42.854Z · comments (26)
A transparency and interpretability tech tree
evhub · 2022-06-16T23:44:14.961Z · comments (11)
The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King!
abstractapplic · 2024-10-26T12:34:51.059Z · comments (16)
How to (hopefully ethically) make money off of AGI
habryka (habryka4) · 2023-11-06T23:35:16.476Z · comments (90)
The 2021 Less Wrong Darwin Game
lsusr · 2021-09-24T21:16:35.356Z · comments (102)
Gradient hacking is extremely difficult
beren · 2023-01-24T15:45:46.518Z · comments (22)
Holly Elmore and Rob Miles dialogue on AI Safety Advocacy
Bird Concept (jacobjacob) · 2023-10-20T21:04:32.645Z · comments (30)
Small and Vulnerable
sapphire (deluks917) · 2021-05-03T04:55:52.149Z · comments (17)
[link] Seeking Power is Often Convergently Instrumental in MDPs
TurnTrout · 2019-12-05T02:33:34.321Z · comments (39)
Threat-Resistant Bargaining Megapost: Introducing the ROSE Value
Diffractor · 2022-09-28T01:20:11.605Z · comments (19)
The Onion Test for Personal and Institutional Honesty
chanamessinger (cmessinger) · 2022-09-27T15:26:34.567Z · comments (31)
Secure homes for digital people
paulfchristiano · 2021-10-10T15:50:02.697Z · comments (37)
You can remove GPT2’s LayerNorm by fine-tuning for an hour
StefanHex (Stefan42) · 2024-08-08T18:33:38.803Z · comments (11)
Rereading Atlas Shrugged
Vaniver · 2020-07-28T18:54:45.272Z · comments (36)
RAISE post-mortem
[deleted] · 2019-11-24T16:19:05.163Z · comments (12)
[link] Sycophancy to subterfuge: Investigating reward tampering in large language models
Carson Denison (carson-denison) · 2024-06-17T18:41:31.090Z · comments (22)
ITT-passing and civility are good; "charity" is bad; steelmanning is niche
Rob Bensinger (RobbBB) · 2022-07-05T00:15:36.308Z · comments (36)
The Median Researcher Problem
johnswentworth · 2024-11-02T20:16:11.341Z · comments (71)
The Dial of Progress
Zvi · 2023-06-13T13:40:06.354Z · comments (119)
The likely first longevity drug is based on sketchy science. This is bad for science and bad for longevity.
BobBurgers · 2023-12-12T02:42:18.559Z · comments (34)
Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI
Jeremy Gillen (jeremy-gillen) · 2024-01-26T07:22:06.370Z · comments (60)
Logical induction for software engineers
Alex Flint (alexflint) · 2022-12-03T19:55:35.474Z · comments (8)
[link] Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
cloud · 2024-12-06T22:19:26.717Z · comments (12)
o1 is a bad idea
abramdemski · 2024-11-11T21:20:24.892Z · comments (38)
Saving Time
Scott Garrabrant · 2021-05-18T20:11:14.651Z · comments (20)
[link] Pseudorandomness contest: prizes, results, and analysis
Eric Neyman (UnexpectedValues) · 2021-01-15T06:24:15.317Z · comments (22)
Agentized LLMs will change the alignment landscape
Seth Herd · 2023-04-09T02:29:07.797Z · comments (102)
Jailbreaking GPT-4's code interpreter
Nikola Jurkovic (nikolaisalreadytaken) · 2023-07-13T18:43:54.484Z · comments (22)
[link] Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data
Johannes Treutlein (Johannes_Treutlein) · 2024-06-21T15:54:41.430Z · comments (13)
[link] Succession
Richard_Ngo (ricraz) · 2023-12-20T19:25:03.185Z · comments (48)
Repeal the Foreign Dredge Act of 1906
Zvi · 2022-05-05T15:20:01.739Z · comments (16)
Vote on Interesting Disagreements
Ben Pace (Benito) · 2023-11-07T21:35:00.270Z · comments (129)
Most People Don't Realize We Have No Idea How Our AIs Work
Thane Ruthenis · 2023-12-21T20:02:00.360Z · comments (42)
My research methodology
paulfchristiano · 2021-03-22T21:20:07.046Z · comments (38)
[link] Making every researcher seek grants is a broken model
jasoncrawford · 2024-01-26T16:06:26.688Z · comments (41)
DeepMind's "​​Frontier Safety Framework" is weak and unambitious
Zach Stein-Perlman · 2024-05-18T03:00:13.541Z · comments (14)
Neutrality
sarahconstantin · 2024-11-13T23:10:05.469Z · comments (27)
Curing insanity with malaria
Swimmer963 (Miranda Dixon-Luinenburg) (Swimmer963) · 2021-08-04T02:28:11.731Z · comments (8)
Sparse Autoencoders Find Highly Interpretable Directions in Language Models
Logan Riggs (elriggs) · 2023-09-21T15:30:24.432Z · comments (8)
Current safety training techniques do not fully transfer to the agent setting
Simon Lermen (dalasnoin) · 2024-11-03T19:24:51.537Z · comments (8)
«Boundaries», Part 1: a key missing concept from utility theory
Andrew_Critch · 2022-07-26T23:03:55.941Z · comments (33)
[link] What would a compute monitoring plan look like? [Linkpost]
[deleted] · 2023-03-26T19:33:46.896Z · comments (10)
[Intro to brain-like-AGI safety] 1. What's the problem & Why work on it now?
Steven Byrnes (steve2152) · 2022-01-26T15:23:22.429Z · comments (19)
Language Models Model Us
eggsyntax · 2024-05-17T21:00:34.821Z · comments (55)
Many arguments for AI x-risk are wrong
TurnTrout · 2024-03-05T02:31:00.990Z · comments (87)