LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Don't sleep on Coordination Takeoffs
trevor (TrevorWiesinger) · 2024-01-27T19:55:26.831Z · comments (24)

Against most, but not all, AI risk analogies
Matthew Barnett (matthew-barnett) · 2024-01-14T03:36:16.267Z · comments (40)

Bayesian updating in real life is mostly about understanding your hypotheses
Max H (Maxc) · 2024-01-01T00:10:30.978Z · comments (4)

An Actually Intuitive Explanation of the Oberth Effect
Isaac King (KingSupernova) · 2024-01-10T20:23:17.216Z · comments (33)

AI #48: Exponentials in Geometry
Zvi · 2024-01-18T14:20:07.869Z · comments (9)

Does AI risk “other” the AIs?
Joe Carlsmith (joekc) · 2024-01-09T17:51:47.020Z · comments (3)

Managing catastrophic misuse without robust AIs
ryan_greenblatt · 2024-01-16T17:27:31.112Z · comments (16)

[link] Against Nonlinear (Thing Of Things)
tailcalled · 2024-01-18T21:40:00.369Z · comments (18)

[link] Defending against hypothetical moon life during Apollo 11
eukaryote · 2024-01-07T04:49:42.628Z · comments (9)

Medical Roundup #1
Zvi · 2024-01-16T20:30:35.802Z · comments (9)

A hermeneutic net for agency
TsviBT · 2024-01-01T08:06:30.289Z · comments (4)

AI Is Not Software
Davidmanheim · 2024-01-02T07:58:04.992Z · comments (29)

On Anthropic’s Sleeper Agents Paper
Zvi · 2024-01-17T16:10:05.145Z · comments (5)

[link] Land Reclamation is in the 9th Circle of Stagnation Hell
Maxwell Tabarrok (maxwell-tabarrok) · 2024-01-12T13:36:27.159Z · comments (6)

Dating Roundup #2: If At First You Don’t Succeed
Zvi · 2024-01-02T16:00:04.955Z · comments (29)

Trading off Lives
jefftk (jkaufman) · 2024-01-03T03:40:05.603Z · comments (12)

Reflections on my first year of AI safety research
Jay Bailey · 2024-01-08T07:49:08.147Z · comments (3)

AI #45: To Be Determined
Zvi · 2024-01-04T15:00:05.936Z · comments (4)

Aligned AI is dual use technology
lc · 2024-01-27T06:50:10.435Z · comments (31)

A quick investigation of AI pro-AI bias
Fabien Roger (Fabien) · 2024-01-19T23:26:32.663Z · comments (1)

Does literacy remove your ability to be a bard as good as Homer?
Adrià Garriga-alonso (rhaps0dy) · 2024-01-18T03:43:14.994Z · comments (19)

[link] The Good Balsamic Vinegar
jenn (pixx) · 2024-01-26T19:30:57.435Z · comments (4)

[link] A model of research skill
L Rudolf L (LRudL) · 2024-01-08T00:13:12.755Z · comments (6)

[link] Chapter 1 of How to Win Friends and Influence People
gull · 2024-01-28T00:32:52.865Z · comments (5)

Saving the world sucks
Defective Altruism (Elijah Bodden) · 2024-01-10T05:55:46.504Z · comments (29)

[link] on neodymium magnets
bhauth · 2024-01-30T15:58:24.088Z · comments (6)

Safety First: safety before full alignment. The deontic sufficiency hypothesis.
Chipmonk · 2024-01-03T17:55:19.825Z · comments (3)

[link] Bayesians Commit the Gambler's Fallacy
Kevin Dorst · 2024-01-07T12:54:59.939Z · comments (28)

2023 Prediction Evaluations
Zvi · 2024-01-08T14:40:07.377Z · comments (0)

Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor
RogerDearnaley (roger-d-1) · 2024-01-09T20:42:28.349Z · comments (8)

D&D.Sci(-fi): Colonizing the SuperHyperSphere
abstractapplic · 2024-01-12T23:36:54.248Z · comments (23)

[link] AlphaGeometry: An Olympiad-level AI system for geometry
alyssavance · 2024-01-17T17:17:30.913Z · comments (9)

[link] Loneliness and suicide mitigation for students using GPT3-enabled chatbots (survey of Replika users in Nature)
Kaj_Sotala · 2024-01-23T14:05:40.986Z · comments (2)

AI doing philosophy = AI generating hands?
Wei Dai (Wei_Dai) · 2024-01-15T09:04:39.659Z · comments (19)

[link] A starter guide for evals
Marius Hobbhahn (marius-hobbhahn) · 2024-01-08T18:24:23.913Z · comments (2)

When Does Altruism Strengthen Altruism?
jefftk (jkaufman) · 2024-01-21T18:50:05.424Z · comments (2)

Announcing the Double Crux Bot
sanyer (santeri-koivula) · 2024-01-09T18:54:15.361Z · comments (6)

[link] The Leeroy Jenkins principle: How faulty AI could guarantee "warning shots"
titotal (lombertini) · 2024-01-14T15:03:21.087Z · comments (5)

Childhood and Education Roundup #4
Zvi · 2024-01-30T13:50:06.033Z · comments (10)

MonoPoly Restricted Trust
ymeskhout · 2024-01-02T23:02:55.066Z · comments (37)

Estimating efficiency improvements in LLM pre-training
Daan · 2024-01-19T19:32:45.124Z · comments (3)

[link] Surgery Works Well Without The FDA
Maxwell Tabarrok (maxwell-tabarrok) · 2024-01-26T13:31:29.968Z · comments (28)

[question] What rationality failure modes are there?
Ulisse Mini (ulisse-mini) · 2024-01-19T09:12:57.924Z · answers+comments (11)

AI Risk and the US Presidential Candidates
Zane · 2024-01-06T20:18:04.945Z · comments (22)

[link] Project ideas: Epistemics
Lukas Finnveden (Lanrian) · 2024-01-05T23:41:23.721Z · comments (4)

[link] Book review: Cuisine and Empire
eukaryote · 2024-01-21T06:15:12.969Z · comments (2)

Goals selected from learned knowledge: an alternative to RL alignment
Seth Herd · 2024-01-15T21:52:06.170Z · comments (17)

Apply to the PIBBSS Summer Research Fellowship
Nora_Ammann · 2024-01-12T04:06:58.328Z · comments (1)

On the Contrary, Steelmanning Is Normal; ITT-Passing Is Niche
Zack_M_Davis · 2024-01-09T23:12:20.349Z · comments (31)

Simple distribution approximation: When sampled 100 times, can language models yield 80% A and 20% B?
Teun van der Weij (teun-van-der-weij) · 2024-01-29T00:24:27.706Z · comments (5)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

shminux on On Privilege

Excellent point about the compounding, which is often multiplicative, not additive. Incidentally, multiplicative advantages result in a power law distribution of income/net worth, whereas additive advantages/disadvantages result in a normal distribution. But that is a separate topic, well explored in the literature.

shminux on On Privilege

I mostly meant your second point, just generally being kinder to others, but the other two are also well taken.

kevin-dorst on The Natural Selection of Bad Vibes (Part 1)

Agreed that people have lots of goals that don't fit in this model. It's definitely a simplified model. But I'd argue that ONE of (most) people's goals to solve problems; and I do think, broadly speaking, it is an important function (evolutionarily and currently) for conversation. So I still think this model gets at an interesting dynamic.

linch on Ilya Sutskever and Jan Leike resign from OpenAI [updated]

I'd be a bit surprised if that's the answer, if OpenAI doesn't offer any vested equity, that half-truth feels overly blatant to me.

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

I don't know what you mean by 'general intelligence' exactly but I suspect you mean something like human+ capability in a broad range of domains. I agree LLMs will become generally intelligent in this sense when scaled, arguably even are, for domains with sufficient data. But that's kind of the sticker right? Cave men didn't have the whole internet to learn from yet somehow did something that not even you seem to claim LLMs will be able to do: create the (date of the) Internet.

(Your last claim seems surprising. Pre-2014 games don't have close to the ELO of alphaZero. So a next-token would be trained to simulate a human player up tot 2800, not 3200+. )

localdeity on On Privilege

Also:

Knowing the importance of the advantages people have makes you better able to judge how well people are likely to do, which lets you make better decisions when e.g. investing in someone's company or deciding who to hire for an important role (or marry).
Also orients you towards figuring out the difficult-to-see advantages people must have (or must lack), given the level of success that they've achieved and their visible advantages.
If you're in a position to influence what advantages people end up with—for example, by affecting the genes your children get, or what education and training you arrange for them—then you can estimate how much each of those is worth and prioritize accordingly.

One of the things you probably notice is that having some advantages tends to make other advantages more valuable. Certainly career-wise, several of those things are like, "If you're doing badly on this dimension, then you may be unable to work at all, or be limited to far less valuable roles". For example, if one person's crippling anxiety takes them from 'law firm partner making $1 million' to 'law analyst making $200k', and another person's crippling anxiety takes them from 'bank teller making $50k' to 'unemployed', then, well, from a utilitarian perspective, fixing one person's problems is worth a lot more than the other's. That is probably already acted upon today—the former law partner is more able to pay for therapy/whatever—but it could inform people who are deciding how to allocate scarce resources to young people, such as the student versions of the potential law partner and bank teller.

(Of course, the people who originally wrote about "privilege" would probably disagree in the strongest possible terms with the conclusions of the above lines of reasoning.)

joe_collman on Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

This seems interesting, but I've seen no plausible case that there's a version of (1) that's both sufficient and achievable. I've seen Davidad mention e.g. approaches using boundaries formalization. This seems achievable, but clearly not sufficient. (boundaries don't help with e.g. [allow the mental influences that are desirable, but not those that are undesirable])

The [act sufficiently conservatively for safety, relative to some distribution of safety specifications] constraint seems likely to lead to paralysis (either of the form [AI system does nothing], or [AI system keeps the world locked into some least-harmful path], depending on the setup - and here of course "least harmful" isn't a utopia, since it's a distribution of safety specifications, not desirability specifications).
Am I mistaken about this?

I'm very pleased that people are thinking about this, but I fail to understand the optimism - hopefully I'm confused somewhere!
Is anyone working on toy examples as proof of concept?

I worry that there's so much deeply technical work here that not enough time is being spent to check that the concept is workable (is anyone focusing on this?). I'd suggest focusing on mental influences: what kind of specification would allow me to radically change my ideas, but not to be driven insane? What's the basis to think we can find such a specification?

It seems to me that finding a fit-for-purpose safety/acceptability specification won't be significantly easier than finding a specification for ambitious value alignment.

tenthkrige on Forecasting: the way I think about it

Good points well made. I'm not sure what you mean by "my expected log score is maximized" (and would like to know), but in any case it's probably your average world rather than your median world that does it?

zach-stein-perlman on Anthropic: Reflections on our Responsible Scaling Policy

Thanks.

I'm glad to see that the non-compliance reporting policy has been implemented and includes anonymous reporting. I'm still hoping to see more details. (And I'm generally confused about why Anthropic doesn't share more details on policies like this — I fail to imagine a story about how sharing details could be bad, except that the details would be seen as weak and this would make Anthropic look bad.)
What details are you imagining would be helpful for you? Sharing the PDF of the formal policy document doesn't mean much compared to whether it's actually implemented and upheld and treated as a live option that we expect staff to consider (fwiw: it is, and I don't have a non-disparage agreement). On the other hand, sharing internal docs eats a bunch of time in reviewing it before release, chance that someone seizes on a misinterpretation and leaps to conclusions, and other costs.

Not sure. I can generally imagine a company publishing what Anthropic has published but having a weak/fake system in reality. Policy details do seem less important for non-compliance reporting than some other policies — Anthropic says it has an infohazard review policy [LW(p) · GW(p)], and I expect it's good, but I'm not confident, and for other companies I wouldn't necessarily expect that their policy is good (even if they say a formal policy exists), and seeing details (with sensitive bits redacted) would help.

I mostly take back my secret policy is strong evidence of bad policy insinuation — that's ~true on my home planet, but on Earth you don't get sufficient credit for sharing good policies and there's substantial negative EV from misunderstandings and adversarial interpretations, so I guess it's often correct to not share :(

As an 80/20 of publishing, maybe you could share a policy with an external auditor who would then publish whether they think it's good or have concerns. I would feel better if that happened all the time.

marius-adrian-nicoara on Cluj-Napoca, Romania – ACX Meetups Everywhere 2022

Hi,

How did the event go?

Any plans to organize a meetup this year?

I'm planning to host a meetup in Sibiu this summer, because I haven't seen an event scheduled here. Any advice? I'm also planning to host a meetup in Cluj-Napoca this year, if it's not announced by someone else

Kind regards, Marius Nicoară