LessWrong 2.0 Reader

Some open-source dictionaries and dictionary learning infrastructure
Sam Marks (samuel-marks) · 2023-12-05T06:05:21.903Z · comments (7)

We ran an AI safety conference in Tokyo. It went really well. Come next year!
Blaine (blaine-rogers) · 2024-07-17T06:55:39.620Z · comments (1)

Quick thoughts on the implications of multi-agent views of mind on AI takeover
Kaj_Sotala · 2023-12-11T06:34:06.395Z · comments (14)

New intro textbook on AIXI
Alex_Altair · 2024-05-11T18:18:50.945Z · comments (5)

Monthly Roundup #18: May 2024
Zvi · 2024-05-13T12:30:04.863Z · comments (10)

[link] AI Rights for Human Safety
Simon Goldstein (simon-goldstein) · 2024-08-01T23:01:07.252Z · comments (6)

AI #80: Never Have I Ever
Zvi · 2024-09-10T17:50:08.074Z · comments (20)

[link] Book review: Everything Is Predictable
PeterMcCluskey · 2024-05-27T03:33:53.857Z · comments (0)

Startup Roundup #2
Zvi · 2024-08-06T13:30:06.554Z · comments (0)

We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap
johnswentworth · 2024-09-19T22:22:05.307Z · comments (47)

Principled Satisficing To Avoid Goodhart
JenniferRM · 2024-08-16T19:05:27.204Z · comments (2)

Conflating value alignment and intent alignment is causing confusion
Seth Herd · 2024-09-05T16:39:51.967Z · comments (18)

Work with me on agent foundations: independent fellowship
Alex_Altair · 2024-09-21T13:59:16.706Z · comments (5)

On Tapping Out
Screwtape · 2023-11-17T03:23:55.880Z · comments (13)

[link] Book review: Deep Utopia
PeterMcCluskey · 2024-04-23T19:55:50.417Z · comments (14)

The Gemini Incident Continues
Zvi · 2024-02-27T16:00:05.648Z · comments (6)

Things Solenoid Narrates
Solenoid_Entity · 2024-04-12T23:57:16.169Z · comments (2)

[link] Loneliness and suicide mitigation for students using GPT3-enabled chatbots (survey of Replika users in Nature)
Kaj_Sotala · 2024-01-23T14:05:40.986Z · comments (2)

A starting point for making sense of task structure (in machine learning)
Kaarel (kh) · 2024-02-24T01:51:49.227Z · comments (2)

AI #53: One More Leap
Zvi · 2024-02-29T16:10:04.049Z · comments (0)

[link] How people stopped dying from diarrhea so much (& other life-saving decisions)
Writer · 2024-03-16T16:00:47.830Z · comments (0)

[link] I'd also take $7 trillion
bhauth · 2024-02-19T03:31:45.552Z · comments (12)

[link] AlphaGeometry: An Olympiad-level AI system for geometry
alyssavance · 2024-01-17T17:17:30.913Z · comments (9)

AI #36: In the Background
Zvi · 2023-11-02T18:00:01.803Z · comments (5)

Atlantis: Berkeley event venue available for rent
Jonas V (Jonas Vollmer) · 2023-11-22T01:47:12.026Z · comments (0)

[link] Towards Evaluating AI Systems for Moral Status Using Self-Reports
Ethan Perez (ethan-perez) · 2023-11-16T20:18:51.730Z · comments (3)

Announcing Atlas Computing
miyazono · 2024-04-11T15:56:31.241Z · comments (4)

When Does Altruism Strengthen Altruism?
jefftk (jkaufman) · 2024-01-21T18:50:05.424Z · comments (2)

[link] Chinese scientists acknowledge xrisk & call for international regulatory body [Linkpost]
Akash (akash-wasil) · 2023-11-01T13:28:43.723Z · comments (4)

[link] Fluent dreaming for language models (AI interpretability method)
tbenthompson (ben-thompson) · 2024-02-06T06:02:59.296Z · comments (4)

Back to Basics: Truth is Unitary
lsusr · 2024-03-29T21:10:33.399Z · comments (13)

Truthseeking, EA, Simulacra levels, and other stuff
Elizabeth (pktechgirl) · 2023-10-27T23:56:49.198Z · comments (12)

What does davidad want from «boundaries»?
Chipmonk · 2024-02-06T17:45:42.348Z · comments (1)

An Introduction to AI Sandbagging
Teun van der Weij (teun-van-der-weij) · 2024-04-26T13:40:00.126Z · comments (8)

AI #60: Oh the Humanity
Zvi · 2024-04-18T14:10:02.281Z · comments (7)

AI #38: Let’s Make a Deal
Zvi · 2023-11-16T19:50:05.442Z · comments (2)

[link] Soft Prompts for Evaluation: Measuring Conditional Distance of Capabilities
porby · 2024-02-02T05:49:11.189Z · comments (1)

D&D.Sci Long War: Defender of Data-mocracy
aphyer · 2024-04-26T22:30:15.780Z · comments (20)

ProLU: A Nonlinearity for Sparse Autoencoders
Glen Taggart · 2024-04-23T14:09:21.592Z · comments (4)

On the Contrary, Steelmanning Is Normal; ITT-Passing Is Niche
Zack_M_Davis · 2024-01-09T23:12:20.349Z · comments (31)

Userscript to always show LW comments in context vs at the top
Vlad Sitalo (harcisis) · 2023-11-21T17:53:30.418Z · comments (8)

Apply to LASR Labs: a London-based technical AI safety research programme
Erin Robertson · 2024-04-09T17:34:06.847Z · comments (1)

[link] EPUBs of MIRI Blog Archives and selected LW Sequences
mesaoptimizer · 2023-10-26T14:17:11.538Z · comments (6)

[link] LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery (arjun-panickssery) · 2024-04-17T21:09:12.007Z · comments (1)

[link] Against Student Debt Cancellation From All Sides of the Political Compass
Maxwell Tabarrok (maxwell-tabarrok) · 2024-05-13T14:55:57.525Z · comments (16)

In defense of technological unemployment as the main AI concern
tailcalled · 2024-08-27T17:58:01.992Z · comments (36)

[question] "Deception Genre" What Books are like Project Lawful?
Double · 2024-08-28T17:19:52.172Z · answers+comments (20)

[link] Open Sourcing Metaculus
ChristianWilliams · 2024-07-02T22:30:01.339Z · comments (0)

Economics Roundup #3
Zvi · 2024-09-10T13:50:06.955Z · comments (9)

D&D.Sci Long War: Defender of Data-mocracy Evaluation & Ruleset
aphyer · 2024-05-14T03:35:10.586Z · comments (3)

Recent comments

arjun-panickssery on Overcoming Bias Anthology

Yeah it's for the bounty. Hanson suggested that a list of links might be preferred to a printed book, at least for now, since he might want to edit the posts.

richard_kennaway on Open Thread Fall 2024

No, I just threw that in. But there is the VHEM, and apparently serious people who argue for anti-natalism.

Short of those, there are also advocates for "degrowth".

I suspect the reason that Zvi declined to engage with such arguments is that he thinks they're too batshit insane to be worth giving house room, but these are a few terms to search for.

towards_keeperhood on Overview of strong human intelligence amplification methods

I mostly expect you start getting further and further into sub-critical intelligence explosion dynamics the more you exceed +6std. (E.g. see the second half of this other comment I wrote.) I also expect very smart people will be better able to set up computer-augmented note-organizing systems, or maybe code narrow aligned AIs that help them with their tasks (in a way that's a lot more useful than current LLMs but hard for other people to use). But idk.

I'm not sure how big the difference between +6 and +6.3std actually is. I also might've confused the actual-competence vs genetic-potential scale. On the scale I used, the drive/"how hard one is trying" also plays a big role.

I actually mostly expect this from seeing that intelligence is pretty heavy-tailed. E.g. alignment research capability seems incredibly heavy-tailed to me, though it might be hard to judge the differences in capability there if you're not already one of the relatively few people who are good at alignment research. Another example is how Einstein managed to find general relativity where the combined rest of the world wouldn't have been able to do it like that without more experimental evidence.
I do not know why this is the case. It is (very?) surprising to me. Einstein didn't even work on understanding and optimizing his mind. But yeah that's how I guess.

roger-dearnaley on Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight

I expect (1) and (2) to go mostly fine (though see footnote 6), and for (3) to completely fail. For example, suppose the classifier has a bunch of features for things like “true according to smart humans.” How will we distinguish those from features for “true”? I think there’s a good chance that this approach will reduce the problem of discriminating $C_{g}$ vs. $C_{b}$ to an equally-difficult problem of discriminating desired vs. undesired features.

However, if we found that the classifier was using a feature for "How smart is the human asking the question?" to decide what answer to give (as opposed to how to then phrase it), that would be a red flag.

gavriel-kleinwaks on If far-UV is so great, why isn't it everywhere?

Why there isn't more ground-up interest: it's expensive and people can't easily tell if it's worth the cost. Also anything where UV is touching you has to overcome people's safety concerns.

Good question on the "large effects are easily measured" thing. It has to do with the distinction between: 1) in what environments you are cleaning the air, 2) how much you are cleaning the air there, 3) how much pathogen people inhale, and 4) how much pathogen is required to actually make people sick. It's not just a far-UV problem; it'd be a problem for any air cleaner. It's just that far-UV is especially expensive to install and especially faces negative "UV" associations.

Far-UV has a large effect on airborne pathogen concentration, and that large effect is in fact easy to measure in a chamber! But once you add it to a room where people are moving around and talking to each other, how much pathogen are they actually inhaling? Is the air in the room well-mixed? Is the far-UV reaching the infectious air before people inhale it? Even if they inhale air that has living pathogens, are they getting sick? If they get sick, did they get sick from that room or from a different environment that they were in previously? Study endpoints matter a lot.

Understanding intervention efficacy becomes especially hard if a disease is largely spread via superspreaders, i.e. has high variance in infection sources. COVID, at least early in the pandemic, had very high variance, whereas e.g. seasonal flu usually doesn't. Therefore, if your study intervention is installed in public spaces, it's possible for it not to show much effect on seasonal flu but a large effect on a disease with early-COVID-like dynamics, which means you have to wait for that COVID-like disease to come along to see the effect--but that would be worth it; high-variance diseases are very concerning!

Another way of saying all this is that it's not the case that the effect will be super hard to measure given enough people and time, it's that the effect is hard to measure given that you need a lot of people in your study to account for the distinctions listed above, and/or you need a highly controlled environment, and that's just expensive.

seth-herd on Chris_Leong's Shortform

We know well enough what people mean by "world" - the stuff they care about. The fact that physics keeps on happening if humanity is snuffed out is no comfort at all to me or to most humans.

Arguing epistemology is not going to prevent a nuclear apocalypse or us being wiped out by the new intelligent species we are inventing. The fact that you don't know what's happening on the other side of the world has no bearing on existential dangers facing those people. That's what I mean by saving the world, and I expect what the author meant. This is a different thing than just helping people by your own values and estimates.

I very much agree that posting pithy mysterious statements for others to argue over is not a good use of the quick takes here.

johnswentworth on Why I’m not a Bayesian

Is the complaint that you can't do predicate calculus on the probabilities? Because I can certainly use predicate calculus all I want on the expressions within the probabilities.

And if that is the complaint, then my question is: why do we want to do predicate calculus on the probabilities? Like, what would be one concrete application in which we'd want to do that? (Self-reference and things in that cluster would be the obvious use-case, I'm mostly curious if there's any other use-case.)

seth-herd on Chris_Leong's Shortform

I have no interest in honor if it's celebrated on a field of the dead. Virtue ethics is fine, as long as it's not an excuse to not figure out what needs doing and how it's going to get done.

Doing one's own part and trusting that the other parts are done by anonymous unknown others is a very silly coordination strategy. We need plans that amount to success, not just everyone doing whatever sounds nice to them.

Edit: I very much agree that saving the world is a team sport. Perhaps it's relevant that successful teams always do some planning and coordinating.

christiankl on Why I’m not a Bayesian

the thing a logician would call a "logic" or possibly a "logic augmented with some probabilities"

The main point of the article is that once you add probabilities you can't do predicate calculus anymore. It's a mathematical operation that's not defined for the entities that you get when you do your augmentation.

seth-herd on When is reward ever the optimization target?

Interesting. There's certainly a lot going on in there, and some of it very likely is at least vague models of future word occurrences (and corresponding events). The definition of model-based gets pretty murky outside of classic RL, so it's probably best to just directly discuss what model properties give rise to what behavior, e.g. optimizing for reward.

Model-free systems can produce goal-directed behavior. They do this if they have seen some relevant behavior that achieves a given goal, their input or some internal representation includes the current goal, and they can generalize well enough to apply what they've experienced to the current context. (This is by the neuroscience definition of habitual vs. goal-directed: behavior changes to follow the current goal, usually being hungry or thirsty, or not.)

So if they're strong enough generalizers, I think even a model-free system actually optimizes for reward.

I think the claim should be stronger: for a smart enough RL system, reward is the optimization target.