LessWrong 2.0 Reader

[link] Fake Deeply
Zack_M_Davis · 2023-10-26T19:55:22.340Z · comments (7)

[question] Is AlphaGo actually a consequentialist utility maximizer?
faul_sname · 2023-12-07T12:41:05.132Z · answers+comments (8)

One True Love
Zvi · 2024-02-09T15:10:05.298Z · comments (7)

Monthly Roundup #20: July 2024
Zvi · 2024-07-23T12:50:07.991Z · comments (9)

Confusing the metric for the meaning: Perhaps correlated attributes are "natural"
NickyP (Nicky) · 2024-07-23T12:43:18.681Z · comments (3)

[link] AI Safety Memes Wiki
plex (ete) · 2024-07-24T18:53:04.977Z · comments (1)

[link] Twitter thread on open-source AI
Richard_Ngo (ricraz) · 2024-07-31T00:26:11.655Z · comments (6)

Templates I made to run feedback rounds for Ethan Perez’s research fellows.
Henry Sleight (ResentHighly) · 2024-03-28T19:41:15.506Z · comments (0)

Takeaways from a Mechanistic Interpretability project on “Forbidden Facts”
Tony Wang (tw) · 2023-12-15T11:05:23.256Z · comments (8)

[link] On Lies and Liars
Gabriel Alfour (gabriel-alfour-1) · 2023-11-17T17:13:03.726Z · comments (4)

AI #63: Introducing Alpha Fold 3
Zvi · 2024-05-09T14:20:03.176Z · comments (2)

[link] Vacuum: Theory and Technologies
ethanmorse · 2024-01-21T17:23:49.257Z · comments (0)

[question] Do websites and apps actually generally get worse after updates, or is it just an effect of the fear of change?
lillybaeum · 2023-12-10T17:26:34.206Z · answers+comments (34)

2024 ACX Predictions: Blind/Buy/Sell/Hold
Zvi · 2024-01-09T19:30:06.388Z · comments (2)

AI Safety Strategies Landscape
Charbel-Raphaël (charbel-raphael-segerie) · 2024-05-09T17:33:45.853Z · comments (1)

"Which chains-of-thought was that faster than?"
Emrik (Emrik North) · 2024-05-22T08:21:00.269Z · comments (4)

Musings on LLM Scale (Jul 2024)
Vladimir_Nesov · 2024-07-03T18:35:48.373Z · comments (0)

Rational Animations offers animation production and writing services!
Writer · 2024-03-15T17:26:07.976Z · comments (0)

[link] patent process problems
bhauth · 2024-07-14T21:12:04.953Z · comments (13)

LLMs can strategically deceive while doing gain-of-function research
Igor Ivanov (igor-ivanov) · 2024-01-24T15:45:08.795Z · comments (4)

Update #2 to "Dominant Assurance Contract Platform": EnsureDone
moyamo · 2023-11-28T18:02:50.367Z · comments (2)

Monthly Roundup #16: March 2024
Zvi · 2024-03-19T13:10:05.529Z · comments (4)

Experimentation (Part 7 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-18T21:25:56.527Z · comments (0)

[link] The Cancer Resolution?
PeterMcCluskey · 2024-07-24T00:25:17.322Z · comments (24)

The Cognitive Bootcamp Agreement
Raemon · 2024-10-16T23:24:05.509Z · comments (0)

How I build and run behavioral interviews
benkuhn · 2024-02-26T05:50:05.328Z · comments (6)

[question] How unusual is the fact that there is no AI monopoly?
Viliam · 2024-08-16T20:21:51.012Z · answers+comments (15)

[link] Why you, personally, should want a larger human population
jasoncrawford · 2024-02-23T19:48:10.526Z · comments (32)

Learning Math in Time for Alignment
Nicholas / Heather Kross (NicholasKross) · 2024-01-09T01:02:37.446Z · comments (3)

Is suffering like shit?
KatjaGrace · 2024-05-31T01:20:03.855Z · comments (5)

[link] End Single Family Zoning by Overturning Euclid V Ambler
Maxwell Tabarrok (maxwell-tabarrok) · 2024-07-26T14:08:45.046Z · comments (1)

An Introduction to Representation Engineering - an activation-based paradigm for controlling LLMs
Jan Wehner · 2024-07-14T10:37:21.544Z · comments (4)

Monthly Roundup #13: December 2023
Zvi · 2023-12-19T15:10:08.293Z · comments (5)

Investigating the Ability of LLMs to Recognize Their Own Writing
Christopher Ackerman (christopher-ackerman) · 2024-07-30T15:41:44.017Z · comments (0)

[link] A computational complexity argument for many worlds
jessicata (jessica.liu.taylor) · 2024-08-13T19:35:10.116Z · comments (15)

[link] Talking With People Who Speak to Congressional Staffers about AI risk
Eneasz · 2023-12-14T17:55:50.606Z · comments (0)

[link] OpenAI, DeepMind, Anthropic, etc. should shut down.
Tamsin Leake (carado-1) · 2023-12-17T20:01:22.332Z · comments (48)

Some of my predictable updates on AI
Aaron_Scher · 2023-10-23T17:24:34.720Z · comments (8)

Video and transcript of presentation on Scheming AIs
Joe Carlsmith (joekc) · 2024-03-22T15:52:03.311Z · comments (1)

A quick experiment on LMs’ inductive biases in performing search
Alex Mallen (alex-mallen) · 2024-04-14T03:41:08.671Z · comments (2)

[link] Lying is Cowardice, not Strategy
Connor Leahy (NPCollapse) · 2023-10-24T13:24:25.450Z · comments (73)

[link] the subreddit size threshold
bhauth · 2024-01-23T00:38:13.747Z · comments (3)

0. The Value Change Problem: introduction, overview and motivations
Nora_Ammann · 2023-10-26T14:36:15.466Z · comments (0)

[link] Manifund: 2023 in Review
Austin Chen (austin-chen) · 2024-01-18T23:50:13.557Z · comments (0)

In Defense of Lawyers Playing Their Part
Isaac King (KingSupernova) · 2024-07-01T01:32:58.695Z · comments (9)

[link] How "Pause AI" advocacy could be net harmful
Tamsin Leake (carado-1) · 2023-12-26T16:19:20.724Z · comments (9)

Being against involuntary death and being open to change are compatible
Andy_McKenzie · 2024-05-27T06:37:27.644Z · comments (5)

5 Reasons Why Governments/Militaries Already Want AI for Information Warfare
trevor (TrevorWiesinger) · 2023-10-30T16:30:38.020Z · comments (0)

Computational Approaches to Pathogen Detection
jefftk (jkaufman) · 2023-11-01T00:30:13.012Z · comments (5)

Being good at the basics
dominicq · 2023-11-04T14:18:50.976Z · comments (1)

Recent comments

cstinesublime on Advice on Communicating Concisely

What specific kinds of ideas are making this problem noticeable?

Are you talking about conveying specialist knowledge to a lay-audience - for example, good luck trying to get me to understand what an Eigenvector is or the points system in Cricket - I've tried. Likewise, to explain to a friend what Sub-Surface scattering was, I first had to introduce him to the mechanics of Ray Tracing. Luckily he was a musician so I could just use analogies to the diffusion and travel of sound waves.

Or are you talking about more personal preferences and experiences, for example recently someone asked me "why do you prefer to be behind the camera rather than a performer in front of it?" - apparently they thought I was such a ham I should be a comedian not a director - I didn't know where to even begin.
Likewise many people who "kind of fell into doing this" for their current profession will stumble if you ask them how they "got into it" because there's often a meandering narrative and confused chronology because even to themselves it's not clear.

Another question I have is - are there any patterns in the assumptions, misunderstandings or tangents that the people you're explaining things to exhibit in reaction to your explanations?

julius-1 on Interest in Leetcode, but for Rationality?

I originally had an LLM generate them for me, and then I checked those with other LLMs to make sure the answers were right and that they weren't ambiguous. All of the questions are here: https://github.com/jss367/calibration_trivia/tree/main/public/questions

cubefox on A brief theory of why we think things are good or bad

Problem is, motivated reasoning can only explain selfish beliefs, beliefs which are in accordance with our own motivations. But moral beliefs are often not at all selfish. In contrast, "suffering is bad" could just be part of what "bad" means. No motivated reasoning required. It would be a "foundational belief" in the same sense "Bachelors are unmarried" could be called "foundational".

justismills on Slightly More Than You Wanted To Know: Pregnancy Length Effects

Yeah, probably. There are a few things like "meconium aspiration" that would make a literal 1:1 womb substitute insufficient to give the baby a few more weeks, and for all we know some of the 42-43 issues are direct harms of marginal gestation. But I'd be rather surprised (<10% chance) if the optimal gestation-in-artificial-womb duration were less than 41 weeks.

justismills on Slightly More Than You Wanted To Know: Pregnancy Length Effects

They're correlational, though the broad cohorts help - not sure what you can do beyond just canvassing an entire birth cohort and noticing differences. There are possible pitfalls like the decision to induce early being made by people with genes that predict bad outcomes? But I really don't think that's major.

zy on Cipolla's Shortform

Could you maybe elaborate on "long term academic performance"?

nostalgebraist on The Hidden Complexity of Wishes

In the situation assumed by your first argument, AGI would be very unlikely to share our values even if our values were much simpler than they are.

Complexity makes things worse, yes, but the conclusion "AGI is unlikely to have our values" is already entailed by the other premises even if we drop the stuff about complexity.

Why: if we're just sampling some function from a simplicity prior, we're very unlikely to get any particular nontrivial function that we've decided to care about in advance of the sampling event. There are just too many possible functions, and probability mass has to get divided among them all.

In other words, if it takes $N$ bits to specify human values, there are $2^{N}$ ways that a bitstring of the same length could be set, and we're hoping to land on just one of those through luck alone. (And to land on a bitstring of this specific length in the first place, of course.) Unless $N$ is very small, such a coincidence is extremely unlikely.
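To put rough numbers on that counting argument, here is a minimal sketch. It assumes, purely for illustration, that every bitstring of length $N$ is equally likely, which is simpler than an actual simplicity prior; the function name `chance_of_exact_match` is hypothetical.

```python
from fractions import Fraction


def chance_of_exact_match(n_bits: int) -> Fraction:
    """Chance that one uniformly random bitstring of length n_bits
    equals a single target string fixed in advance.

    Assumption (illustration only): all 2**n_bits strings are equally likely.
    """
    if n_bits < 0:
        raise ValueError("n_bits must be non-negative")
    return Fraction(1, 2 ** n_bits)


# The chance halves with every extra bit needed to specify the target.
for n_bits in (1, 10, 100, 1000):
    p = chance_of_exact_match(n_bits)
    print(f"N = {n_bits:4d}: 1 in 2^{n_bits} (about {float(p):.3e})")
```

Under this assumption the chance is exactly $2^{-N}$, so it is already below one in a thousand at $N = 10$.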

And $N$ is not going to be that small; even in the sort of naive and overly simple "hand-crafted" value specifications which EY has critiqued in this post and elsewhere, a lot of details have to be specified. (E.g. some proposals refer to "humans" and so a full algorithmic description of them would require an account of what is and isn't a human.)

One could devise a variant of this argument that doesn't have this issue, by "relaxing the problem" so that we have some control, just not enough to pin down the sampled function exactly. And then the remaining freedom is filled randomly with a simplicity bias. This partial control might be enough to make a simple function likely, while not being able to make a more complex function likely. (Hmm, perhaps this is just your second argument, or a version of it.)

This kind of reasoning might be applicable in a world where its premises are true, but I don't think its premises are true in our world.

In practice, we apparently have no trouble getting machines to compute very complex functions, including (as Matthew points out) specifications of human value whose robustness would have seemed like impossible magic back in 2007. The main difficulty, if there is one, is in "getting the function to play the role of the AGI values," not in getting the AGI to compute the particular function we want in the first place.

ben-livengood on Change My Mind: Thirders in "Sleeping Beauty" are Just Doing Epistemology Wrong

"What is your credence now for the proposition that the coin landed heads?"

There are three doors. Two are labeled Monday, and one is labeled Tuesday. Behind each door is a Sleeping Beauty. In a waiting room, many (finite) more Beauties are waiting; every time a Beauty is anesthetized, a coin is flipped and taped to their forehead with clear tape. You open all three doors, the Beauties wake up, and you ask the three Beauties The Question. Then they are anesthetized, the doors are shut, and any Beauties with a Heads showing on their foreheads or behind a Tuesday door are wheeled away after the coin is removed from their forehead. The Beauty with a Tails on their forehead behind the Monday door is wheeled behind the Tuesday door. Two new Beauties are wheeled behind the two Monday doors, one with Heads and one with Tails. The experiment repeats.

You observe that Tuesday Beauties always have a Tails taped to their forehead. You always observe that one Monday Beauty has a Tails showing, and one has a Heads showing. You also observe that every Beauty says 1/3, matching the ratio of Heads to Tails showing, and it is apparent that they can't see the coins taped to their own or each other's foreheads or the door they are behind. Every Tails Beauty is questioned twice. Every Heads Beauty is questioned once. You can see all the steps as they happen, there is no trick, every coin flip has 1/2 probability for Heads.

There is eventually a queue of Waiting Sleeping Beauties with all-Heads or all-Tails showing and a new Beauty must be anesthetized with a new coin; the queue length changes over time and sometimes switches face. You can stop the experiment when the queue is empty, which a random walk guarantees will happen eventually, if you like tying up loose ends.
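The counting in this setup can be checked with a short simulation. This is a simplified sketch, not the three-door procedure itself: it assumes only what the description states (a fair coin per Beauty, one questioning for Heads, two for Tails) and ignores the doors and the waiting queue. The function name `simulate` and the seed are hypothetical choices.

```python
import random
from typing import Tuple


def simulate(num_beauties: int, seed: int = 0) -> Tuple[float, float]:
    """Return (fraction of Beauties whose coin is Heads,
               fraction of questionings put to a Heads Beauty)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    heads_beauties = 0
    heads_questionings = 0
    total_questionings = 0
    for _ in range(num_beauties):
        is_heads = rng.random() < 0.5  # fair coin for each Beauty
        # Heads Beauties are questioned once, Tails Beauties twice.
        total_questionings += 1 if is_heads else 2
        if is_heads:
            heads_beauties += 1
            heads_questionings += 1
    return (heads_beauties / num_beauties,
            heads_questionings / total_questionings)


per_beauty, per_questioning = simulate(200_000)
print(f"Heads fraction per Beauty:      {per_beauty:.3f}")
print(f"Heads fraction per questioning: {per_questioning:.3f}")
```

Counted per Beauty, the Heads fraction comes out near 1/2; counted per questioning, it comes out near 1/3. The sketch only shows that both frequencies coexist in the same experiment, not which one credence should track.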

zy on A Rocket–Interpretability Analogy

Agree with this, and wanted to add that I am also not completely sure if mechanistic interpretability is a good "commercial bet" yet based on my experience and understanding, with my definition of commercial bet being materialization of revenue or simply revenue generating.

One revenue-generating path I can see for LLMs is for a company to use them to identify the data that are most effective for particular benchmarks. But my current understanding (correct me if I am wrong) is that, for now, it is relatively costly to first research a reliable method and then run interpretability methods on large models; additionally, researchers generally already have good intuitions about which datasets could be useful for specific benchmarks. On the other hand, the method would be much more useful for looking into nuanced and hard-to-tackle safety problems. In fact, there have been a lot of previous efforts to use interpretability for safety mitigations.

evolutionbydesign on Advice on Communicating Concisely

Thank you! The books you recommended look like what I was hoping to find.