LessWrong 2.0 Reader

[link] Talking With People Who Speak to Congressional Staffers about AI risk
Eneasz · 2023-12-14T17:55:50.606Z · comments (0)
Preface to the Sequence on LLM Psychology
Quentin FEUILLADE--MONTIXI (quentin-feuillade-montixi) · 2023-11-07T16:12:07.742Z · comments (0)
In Defense of Lawyers Playing Their Part
Isaac King (KingSupernova) · 2024-07-01T01:32:58.695Z · comments (9)
Comparing Quantized Performance in Llama Models
NickyP (Nicky) · 2024-07-15T16:01:24.960Z · comments (2)
A quick experiment on LMs’ inductive biases in performing search
Alex Mallen (alex-mallen) · 2024-04-14T03:41:08.671Z · comments (2)
0. The Value Change Problem: introduction, overview and motivations
Nora_Ammann · 2023-10-26T14:36:15.466Z · comments (0)
How I build and run behavioral interviews
benkuhn · 2024-02-26T05:50:05.328Z · comments (6)
[question] How unusual is the fact that there is no AI monopoly?
Viliam · 2024-08-16T20:21:51.012Z · answers+comments (15)
Being good at the basics
dominicq · 2023-11-04T14:18:50.976Z · comments (1)
[link] Self-Resolving Prediction Markets
PeterMcCluskey · 2024-03-03T02:39:42.212Z · comments (0)
[link] Concrete benefits of making predictions
Jonny Spicer (jonnyspicer) · 2024-10-17T14:23:17.613Z · comments (5)
An argument that consequentialism is incomplete
cousin_it · 2024-10-07T09:45:12.754Z · comments (27)
[link] NAO Updates, Fall 2024
jefftk (jkaufman) · 2024-10-18T00:00:04.142Z · comments (2)
DunCon @Lighthaven
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2024-09-29T04:56:27.205Z · comments (0)
Intent alignment as a stepping-stone to value alignment
Seth Herd · 2024-11-05T20:43:24.950Z · comments (4)
Monthly Roundup #13: December 2023
Zvi · 2023-12-19T15:10:08.293Z · comments (5)
Housing Roundup #10
Zvi · 2024-10-29T13:50:09.416Z · comments (2)
[link] OpenAI, DeepMind, Anthropic, etc. should shut down.
Tamsin Leake (carado-1) · 2023-12-17T20:01:22.332Z · comments (48)
Learning Math in Time for Alignment
Nicholas / Heather Kross (NicholasKross) · 2024-01-09T01:02:37.446Z · comments (3)
Investigating the Ability of LLMs to Recognize Their Own Writing
Christopher Ackerman (christopher-ackerman) · 2024-07-30T15:41:44.017Z · comments (0)
Being against involuntary death and being open to change are compatible
Andy_McKenzie · 2024-05-27T06:37:27.644Z · comments (5)
If you are also the worst at politics
lukehmiles (lcmgcd) · 2024-05-26T20:07:49.201Z · comments (8)
Is suffering like shit?
KatjaGrace · 2024-05-31T01:20:03.855Z · comments (5)
A path to human autonomy
Nathan Helm-Burger (nathan-helm-burger) · 2024-10-29T03:02:42.475Z · comments (12)
Video and transcript of presentation on Scheming AIs
Joe Carlsmith (joekc) · 2024-03-22T15:52:03.311Z · comments (1)
[link] A computational complexity argument for many worlds
jessicata (jessica.liu.taylor) · 2024-08-13T19:35:10.116Z · comments (15)
[link] End Single Family Zoning by Overturning Euclid v. Ambler
Maxwell Tabarrok (maxwell-tabarrok) · 2024-07-26T14:08:45.046Z · comments (1)
RLHF is the worst possible thing done when facing the alignment problem
tailcalled · 2024-09-19T18:56:27.676Z · comments (10)
Context-dependent consequentialism
Jeremy Gillen (jeremy-gillen) · 2024-11-04T09:29:24.310Z · comments (6)
[link] A Narrative History of Environmentalism's Partisanship
Jeffrey Heninger (jeffrey-heninger) · 2024-05-14T16:51:01.029Z · comments (3)
[question] Feedback request: what am I missing?
Nathan Helm-Burger (nathan-helm-burger) · 2024-11-02T17:38:39.625Z · answers+comments (5)
[question] When is reward ever the optimization target?
Noosphere89 (sharmake-farah) · 2024-10-15T15:09:20.912Z · answers+comments (12)
Incentive design and capability elicitation
Joe Carlsmith (joekc) · 2024-11-12T20:56:05.088Z · comments (0)
Bay Winter Solstice 2024: Speech Auditions
ozymandias · 2024-11-04T22:31:38.680Z · comments (0)
[link] What is it like to be psychologically healthy? Podcast ft. DaystarEld
Chipmonk · 2024-10-05T19:14:04.743Z · comments (8)
Attention Output SAEs Improve Circuit Analysis
Connor Kissane (ckkissane) · 2024-06-21T12:56:07.969Z · comments (0)
Meme Talking Points
ymeskhout · 2024-11-06T15:27:54.024Z · comments (0)
AI labs can boost external safety research
Zach Stein-Perlman · 2024-07-31T19:30:16.207Z · comments (1)
[link] Statement from Scarlett Johansson on OpenAI's use of the "Sky" voice, which was shockingly similar to her own voice.
Linch · 2024-05-20T23:50:28.138Z · comments (8)
AI's impact on biology research: Part I, today
octopocta · 2023-12-23T16:29:18.056Z · comments (6)
[LDSL#1] Performance optimization as a metaphor for life
tailcalled · 2024-08-08T16:16:27.349Z · comments (4)
[LDSL#6] When is quantification needed, and when is it hard?
tailcalled · 2024-08-13T20:39:45.481Z · comments (0)
Music in the AI World
Martin Sustrik (sustrik) · 2024-08-16T04:20:01.706Z · comments (8)
Resolving von Neumann-Morgenstern Inconsistent Preferences
niplav · 2024-10-22T11:45:20.915Z · comments (5)
[link] Fifty Flips
abstractapplic · 2023-10-01T15:30:43.268Z · comments (14)
On Not Requiring Vaccination
jefftk (jkaufman) · 2024-02-01T19:20:12.657Z · comments (21)
[link] Anthropic, Google, Microsoft & OpenAI announce Executive Director of the Frontier Model Forum & over $10 million for a new AI Safety Fund
Zach Stein-Perlman · 2023-10-25T15:20:52.765Z · comments (8)
Mentorship in AGI Safety (MAGIS) call for mentors
Valentin2026 (Just Learning) · 2024-05-23T18:28:03.173Z · comments (3)
[link] Lying is Cowardice, not Strategy
Connor Leahy (NPCollapse) · 2023-10-24T13:24:25.450Z · comments (73)
Good Bings copy, great Bings steal
dr_s · 2024-04-21T09:52:46.658Z · comments (6)