LessWrong 2.0 Reader

Work with me on agent foundations: independent fellowship
Alex_Altair · 2024-09-21T13:59:16.706Z · comments (3)

We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap
johnswentworth · 2024-09-19T22:22:05.307Z · comments (35)

Which LessWrong/Alignment topics would you like to be tutored in? [Poll]
Ruby · 2024-09-19T01:35:02.999Z · comments (11)

How difficult is AI Alignment?
Sammy Martin (SDM) · 2024-09-13T15:47:10.799Z · comments (6)

Formalizing the Informal (event invite)
abramdemski · 2024-09-10T19:22:53.564Z · comments (0)

[link] Rowing vs steering
Saul Munn (saul-munn) · 2024-08-10T07:00:17.594Z · comments (2)

Principled Satisficing To Avoid Goodhart
JenniferRM · 2024-08-16T19:05:27.204Z · comments (2)

Unit economics of LLM APIs
dschwarz · 2024-08-27T16:51:22.692Z · comments (0)

Apply to the Constellation Visiting Researcher Program and Astra Fellowship, in Berkeley this Winter
Nate Thomas (nate-thomas) · 2023-10-26T03:07:34.118Z · comments (10)

Concrete empirical research projects in mechanistic anomaly detection
Erik Jenner (ejenner) · 2024-04-03T23:07:21.502Z · comments (0)

Koan: divining alien datastructures from RAM activations
TsviBT · 2024-04-05T18:04:57.280Z · comments (10)

[link] We Need Major, But Not Radical, FDA Reform
Maxwell Tabarrok (maxwell-tabarrok) · 2024-02-24T16:54:33.061Z · comments (12)

Evidential Cooperation in Large Worlds: Potential Objections & FAQ
Chi Nguyen · 2024-02-28T18:58:25.688Z · comments (5)

Wholesomeness and Effective Altruism
owencb · 2024-02-28T20:28:22.175Z · comments (3)

Taking responsibility and partial derivatives
Ruby · 2023-12-31T04:33:51.419Z · comments (1)

D&D.Sci Long War: Defender of Data-mocracy Evaluation & Ruleset
aphyer · 2024-05-14T03:35:10.586Z · comments (3)

Are humans misaligned with evolution?
TekhneMakre · 2023-10-19T03:14:14.759Z · comments (13)

[link] AI Girlfriends Won't Matter Much
Maxwell Tabarrok (maxwell-tabarrok) · 2023-12-23T15:58:30.308Z · comments (22)

Housing Roundup #7
Zvi · 2024-03-04T15:00:08.192Z · comments (1)

Was Releasing Claude-3 Net-Negative?
Logan Riggs (elriggs) · 2024-03-27T17:41:56.245Z · comments (5)

Monthly Roundup #11: October 2023
Zvi · 2023-10-03T14:10:01.686Z · comments (12)

Take SCIFs, it’s dangerous to go alone
latterframe · 2024-05-01T08:02:38.067Z · comments (1)

[link] cold aluminum for medicine
bhauth · 2023-12-16T14:38:03.260Z · comments (4)

Navigating emotions in an uncertain & confusing world
Akash (akash-wasil) · 2023-11-20T18:16:09.492Z · comments (1)

Sparse Coding, for Mechanistic Interpretability and Activation Engineering
David Udell · 2023-09-23T19:16:31.772Z · comments (7)

How I internalized my achievements to better deal with negative feelings
Raymond Koopmanschap · 2024-02-27T15:10:24.149Z · comments (7)

Estimating efficiency improvements in LLM pre-training
Daan · 2024-01-19T19:32:45.124Z · comments (3)

[question] What rationality failure modes are there?
Ulisse Mini (ulisse-mini) · 2024-01-19T09:12:57.924Z · answers+comments (11)

How toy models of ontology changes can be misleading
Stuart_Armstrong · 2023-10-21T21:13:56.384Z · comments (0)

Notes on Dwarkesh Patel’s Podcast with Sholto Douglas and Trenton Bricken
Zvi · 2024-04-01T19:10:12.193Z · comments (1)

Upgrading the AI Safety Community
trevor (TrevorWiesinger) · 2023-12-16T15:34:26.600Z · comments (9)

The Perils of Professionalism
Screwtape · 2023-11-07T00:07:33.213Z · comments (1)

In memory of Louise Glück
Joe Carlsmith (joekc) · 2023-10-15T02:59:42.687Z · comments (1)

Notes on control evaluations for safety cases
ryan_greenblatt · 2024-02-28T16:15:17.799Z · comments (0)

Concrete positive visions for a future without AGI
Max H (Maxc) · 2023-11-08T03:12:42.590Z · comments (28)

[link] energy landscapes of experts
bhauth · 2023-10-02T14:08:32.370Z · comments (2)

Pivotal Acts might Not be what You Think they are
Johannes C. Mayer (johannes-c-mayer) · 2023-11-05T17:23:50.464Z · comments (13)

Matrix completion prize results
paulfchristiano · 2023-12-20T15:40:04.281Z · comments (0)

Estimating effective dimensionality of MNIST models
Arjun Panickssery (arjun-panickssery) · 2023-11-02T14:13:09.012Z · comments (3)

GPT-4o My and Google I/O Day
Zvi · 2024-05-16T17:50:03.040Z · comments (2)

[Aspiration-based designs] 1. Informal introduction
B Jacobs (Bob Jacobs) · 2024-04-28T13:00:43.268Z · comments (4)

Book Review: 1948 by Benny Morris
Yair Halberstadt (yair-halberstadt) · 2023-12-03T10:29:16.696Z · comments (9)

How to partition teams to move fast? Debating "low-dimensional cuts"
jacobjacob · 2023-10-13T21:43:53.067Z · comments (2)

The Pointer Resolution Problem
Jozdien · 2024-02-16T21:25:57.374Z · comments (20)

[question] What did you change your mind about in the last year?
mike_hawke · 2023-11-23T20:53:45.664Z · answers+comments (16)

Struggling like a Shadowmoth
Raemon · 2024-09-24T00:47:05.030Z · comments (3)

[link] Things I learned talking to the new breed of scientific institution
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-29T14:00:14.844Z · comments (6)

[question] If I wanted to spend WAY more on AI, what would I spend it on?
Logan Zoellner (logan-zoellner) · 2024-09-15T21:24:46.742Z · answers+comments (13)

[link] [Paper] Programming Refusal with Conditional Activation Steering
Bruce W. Lee (bruce-lee) · 2024-09-11T20:57:08.714Z · comments (0)

[link] Podcast with Yoshua Bengio on Why AI Labs are “Playing Dice with Humanity’s Future”
garrison · 2024-05-10T17:23:20.436Z · comments (0)

Recent comments

elityre on If I wanted to spend WAY more on AI, what would I spend it on?

Thank you for writing this up! I've purchased some subscriptions and plan to block out time to play around with some of these services and get familiar with them.

williamkiely on ASIs will not leave just a little sunlight for Earth

"criticism in LW comments is why he stopped writing Sequences posts"

I wasn't aware of this and would like more information. Can anyone provide a source, or report their agreement or disagreement with the claim?

zach-stein-perlman on Model evals for dangerous capabilities

No. But I’m skeptical: seems hard to imagine provable safety, much less competitive with the default path to powerful AI, much less how post-hoc evals are relevant.

rajathsalegame on Model evals for dangerous capabilities

Out of curiosity, do you have any thoughts on the importance / feasibility of formal verification / mathematically "provable" safety based approaches in these evals you mention?

alex_altair on Search 5000 books, speed up your research and personal growth

Note to readers: it is an obligatory warning on any post like this that you should not run random scripts downloaded from the internet without reading them to see what they do, because there are many harmful things they could be doing.

elityre on I'm creating a deep dive podcast episode about the original Leverage Research - would you like to take part?

I'm curious what Geoff's relationship to this project is. Does he know about it? Is he contributing? Is he in favor of this? Opposed?

rajathsalegame on Why I'm bearish on mechanistic interpretability: the shards are not in the network

I would argue that the AI equivalent of these tiny organisms are "features," which are just beginning to be defined in a structured, mathematical way.

recurrented on Struggling like a Shadowmoth

Gosh that was really beautifully written

faul_sname on What are the best arguments for/against AIs being "slightly 'nice'"?

[habryka] The way humans think about the question of "preferences for weak agents" and "kindness" feels like the kind of thing that will come apart under extreme optimization, in a similar way to how I expect the idea of "having a continuous stream of consciousness with a good past and good future is important" to come apart as humans can make copies of themselves and change their memories, and instantiate slightly changed versions of themselves, etc.

I think there will be options that are good under most of the things that "preferences for weak agents" would likely come apart into under close examination. If you're trying to fulfill the preferences of fish, you might argue about whether the exact thing you should care about is maximizing their hedonic state vs ensuring that they exist in an ecological environment which resembles their niche vs minimizing "boundary-crossing actions"... but you can probably find an action that is better than "kill the fish" by all of those possible metrics.

I think that some people have an intuition that any future agent must pick exactly one utility function over the physical configuration of matter in the universe, and that any agent that has a deontological constraint like "don't do any actions which are 0.00001% better under my current interpretation of my utility function but which are horrifyingly bad to every other agent" will be outcompeted in the long term. I personally don't see it, and particularly I don't see how there's an available slot for an arbitrary outcome-based utility function that is not "reproduce yourself at all costs" but there isn't an available slot for process-based preferences like "and don't be an asshole for minuscule gains while doing that".

nicholas-weininger on How to choose what to work on

Nice post. It prompts two questions, which you may or may not be the right person to answer:

1. How do you find good obsessions? Is it "just" a matter of being curious and widely-read? What is the combination of life practice and psychological orientation that leads a person to become obsessed with one or more ideas in the way that you became obsessed with progress studies and with Fieldbook?
2. On your path to world-class status, how do you avoid the "middle-competence trap" (analogy to the middle-income trap)? How do you handle having something you love that you've gotten damn good at, better than most people will ever get, but can't seem to break through to the level of the achievers who really make their mark on the field? Maybe this is more of an issue for me than for others-- maybe for example it is "just" a matter of being willing to burrow deep into something to the exclusion of your other interests in life, and I'm too much of a generalist to do that-- but it's been a problem for me twice now, and I really wonder if it might be a common failure mode of this kind of questing process.