LessWrong 2.0 Reader

Notice When People Are Directionally Correct
Chris_Leong · 2024-01-14T14:12:37.090Z · comments (7)
Utility ≠ Reward
Vlad Mikulik (vlad_m) · 2019-09-05T17:28:13.222Z · comments (24)
Ukraine Situation Report 2022/03/01
lsusr · 2022-03-02T05:07:59.763Z · comments (59)
Choice Writings of Dominic Cummings
Connor_Flexman · 2021-10-13T02:41:44.291Z · comments (75)
AGI safety from first principles: Introduction
Richard_Ngo (ricraz) · 2020-09-28T19:53:22.849Z · comments (18)
Why I'm joining Anthropic
evhub · 2023-01-05T01:12:13.822Z · comments (4)
[link] Gene drives: why the wait?
Metacelsus · 2022-09-19T23:37:17.595Z · comments (50)
What Comes After Epistemic Spot Checks?
Elizabeth (pktechgirl) · 2019-10-22T17:00:00.758Z · comments (9)
Soares, Tallinn, and Yudkowsky discuss AGI cognition
So8res · 2021-11-29T19:26:33.232Z · comments (39)
An Update on Academia vs. Industry (one year into my faculty job)
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2022-09-03T20:43:37.701Z · comments (18)
A proposed method for forecasting transformative AI
Matthew Barnett (matthew-barnett) · 2023-02-10T19:34:01.358Z · comments (20)
Law of No Evidence
Zvi · 2021-12-20T13:50:01.189Z · comments (19)
[link] Paper: LLMs trained on “A is B” fail to learn “B is A”
lberglund (brglnd) · 2023-09-23T19:55:53.427Z · comments (73)
My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda
Chi Nguyen · 2020-08-15T20:02:00.205Z · comments (20)
Reward Is Not Enough
Steven Byrnes (steve2152) · 2021-06-16T13:52:33.745Z · comments (19)
[link] The Alignment Problem: Machine Learning and Human Values
Rohin Shah (rohinmshah) · 2020-10-06T17:41:21.138Z · comments (7)
Land Ho!
Zvi · 2022-01-20T13:30:01.262Z · comments (4)
[link] Matt Levine on "Fraud is no fun without friends."
Raemon · 2021-01-19T18:23:20.614Z · comments (24)
Stampy's AI Safety Info soft launch
steven0461 · 2023-10-05T22:13:04.632Z · comments (9)
Moloch and the sandpile catastrophe
Eric Raymond (eric-raymond) · 2022-04-02T15:35:12.552Z · comments (25)
Compendium of problems with RLHF
Charbel-Raphaël (charbel-raphael-segerie) · 2023-01-29T11:40:53.147Z · comments (16)
Convincing All Capability Researchers
Logan Riggs (elriggs) · 2022-04-08T17:40:25.488Z · comments (70)
Omicron Variant Post #2
Zvi · 2021-11-29T16:30:01.368Z · comments (34)
[link] DontDoxScottAlexander.com - A Petition
Ben Pace (Benito) · 2020-06-25T05:44:50.050Z · comments (32)
Conversation with Eliezer: What do you want the system to do?
Akash (akash-wasil) · 2022-06-25T17:36:14.145Z · comments (38)
Quintin's alignment papers roundup - week 1
Quintin Pope (quintin-pope) · 2022-09-10T06:39:01.773Z · comments (6)
Taking the parameters which seem to matter and rotating them until they don't
Garrett Baker (D0TheMath) · 2022-08-26T18:26:47.667Z · comments (48)
Perpetual Dickensian Poverty?
jefftk (jkaufman) · 2021-12-21T13:30:03.543Z · comments (18)
Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1
StefanHex (Stefan42) · 2023-05-09T19:41:10.528Z · comments (1)
The Territory
LoganStrohl (BrienneYudkowsky) · 2022-02-15T18:56:36.992Z · comments (12)
What good is G-factor if you're dumped in the woods? A field report from a camp counselor.
Hastings (hastings-greer) · 2024-01-12T13:17:23.829Z · comments (22)
A Significant Portion of COVID-19 Transmission Is Presymptomatic
jimrandomh · 2020-03-14T05:52:33.734Z · comments (22)
Christiano, Cotra, and Yudkowsky on AI progress
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2021-11-25T16:45:32.482Z · comments (95)
Unwitting cult leaders
Kaj_Sotala · 2021-02-11T11:10:04.504Z · comments (9)
[link] Introducing the Center for AI Policy (& we're hiring!)
Thomas Larsen (thomas-larsen) · 2023-08-28T21:17:11.703Z · comments (50)
Some background for reasoning about dual-use alignment research
Charlie Steiner · 2023-05-18T14:50:54.401Z · comments (19)
RTFB: On the New Proposed CAIP AI Bill
Zvi · 2024-04-10T18:30:08.410Z · comments (13)
Harms and possibilities of schooling
TsviBT · 2022-02-22T07:48:09.542Z · comments (38)
Late 2021 MIRI Conversations: AMA / Discussion
Rob Bensinger (RobbBB) · 2022-02-28T20:03:05.318Z · comments (199)
Delta Strain: Fact Dump and Some Policy Takeaways
Connor_Flexman · 2021-07-28T03:38:34.455Z · comments (60)
[link] Steering Llama-2 with contrastive activation additions
Nina Rimsky (NinaR) · 2024-01-02T00:47:04.621Z · comments (29)
One-layer transformers aren’t equivalent to a set of skip-trigrams
Buck · 2023-02-17T17:26:13.819Z · comments (10)
How to Bounded Distrust
Zvi · 2023-01-09T13:10:00.942Z · comments (15)
GPT-175bee
Adam Scherlis (adam-scherlis) · 2023-02-08T18:58:01.364Z · comments (13)
FHI paper published in Science: interventions against COVID-19
[deleted] · 2020-12-16T21:19:00.441Z · comments (0)
Theses on Sleep
guzey · 2022-02-11T12:58:15.300Z · comments (104)
Narrative Syncing
AnnaSalamon · 2022-05-01T01:48:45.889Z · comments (48)
Why was the AI Alignment community so unprepared for this moment?
Ras1513 · 2023-07-15T00:26:29.769Z · comments (64)
Problem relaxation as a tactic
TurnTrout · 2020-04-22T23:44:42.398Z · comments (8)
Future ML Systems Will Be Qualitatively Different
jsteinhardt · 2022-01-11T19:50:11.377Z · comments (10)