LessWrong 2.0 Reader

[link] Jonathan Gorard: The territory is isomorphic to an equivalence class of its maps
Daniel C (harper-owen) · 2024-09-07T10:04:47.840Z · comments (18)

Is Text Watermarking a lost cause?
egor.timatkov · 2024-10-01T16:20:51.113Z · comments (13)

My career exploration: Tools for building confidence
lynettebye · 2024-09-13T11:37:55.843Z · comments (0)

[link] AlignedCut: Visual Concepts Discovery on Brain-Guided Universal Feature Space
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-14T23:23:26.296Z · comments (1)

Using Dangerous AI, But Safely?
habryka (habryka4) · 2024-11-16T04:29:20.914Z · comments (2)

OpenAI defected, but we can take honest actions
Remmelt (remmelt-ellen) · 2024-10-21T08:41:25.728Z · comments (16)

Heresies in the Shadow of the Sequences
Cole Wyeth (Amyr) · 2024-11-14T05:01:11.889Z · comments (12)

Reducing global AI competition through the Commerce Control List and Immigration reform: a dual-pronged approach
Ben Smith (ben-smith) · 2024-09-03T05:28:24.549Z · comments (2)

[question] Is there a CFAR handbook audio option?
FinalFormal2 · 2024-10-26T17:08:36.480Z · answers+comments (0)

Physical Therapy Sucks (but have you tried hiding it in some peanut butter?)
Declan Molony (declan-molony) · 2024-09-10T05:54:47.000Z · comments (12)

[link] A Little Depth Goes a Long Way: the Expressive Power of Log-Depth Transformers
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-11-20T11:48:14.170Z · comments (0)

[link] Every niche event should also be a meetup
DMMF · 2024-11-19T20:47:50.053Z · comments (0)

Hiring a writer to co-author with me (Spencer Greenberg for ClearerThinking.org)
spencerg · 2024-10-27T17:34:50.479Z · comments (0)

Evolutionary prompt optimization for SAE feature visualization
neverix · 2024-11-14T13:06:49.728Z · comments (0)

[question] Does the "ancient wisdom" argument have any validity? If a particular teaching or tradition is old, to what extent does this make it more trustworthy?
SpectrumDT · 2024-11-04T15:20:14.822Z · answers+comments (49)

[link] Why good things often don’t lead to better outcomes
DMMF · 2024-09-19T16:37:07.778Z · comments (1)

Appealing to the Public
jefftk (jkaufman) · 2024-10-23T19:00:07.669Z · comments (0)

Review: Dr Stone
ProgramCrafter (programcrafter) · 2024-09-29T10:35:53.175Z · comments (4)

Slave Morality: A place for every man and every man in his place
Martin Sustrik (sustrik) · 2024-09-19T04:20:04.491Z · comments (7)

[question] What epsilon do you subtract from "certainty" in your own probability estimates?
Dagon · 2024-11-26T19:13:46.795Z · answers+comments (6)

LifeKeeper Diaries: Exploring Misaligned AI Through Interactive Fiction
Tristan Tran (tristan-tran) · 2024-11-09T20:58:09.182Z · comments (5)

Electric Grid Cyberattack: An AI-Informed Threat Model
moonlightmaze · 2024-11-11T21:34:17.190Z · comments (0)

Join a LessWrong Team for the Unaging System Challenge
Crissman · 2024-10-23T06:01:08.018Z · comments (5)

Current Attitudes Toward AI Provide Little Data Relevant to Attitudes Toward AGI
Seth Herd · 2024-11-12T18:23:53.533Z · comments (2)

Two arguments against longtermist thought experiments
momom2 (amaury-lorin) · 2024-11-02T10:22:11.311Z · comments (5)

[link] Pronouns are Annoying
ymeskhout · 2024-09-18T13:30:04.620Z · comments (21)

2024 NYC Secular Solstice & Megameetup
Joe Rogero · 2024-11-12T17:46:18.674Z · comments (0)

[link] Levers for Biological Progress - A Response to "Machines of Loving Grace"
Niko_McCarty (niko-2) · 2024-11-01T16:35:08.221Z · comments (0)

Announcing the Ultimate Jailbreaking Championship
InnerHufflepuff (grayswan) · 2024-09-04T00:35:31.234Z · comments (1)

New Funding Category Open in Foresight's AI Safety Grants
Allison Duettmann (allison-duettmann) · 2024-11-06T22:59:41.065Z · comments (0)

Chaos Theory in Ecology
Elizabeth (pktechgirl) · 2024-11-09T17:50:01.727Z · comments (2)

[link] The Neruda Factory
jenn (pixx) · 2024-11-29T15:20:02.276Z · comments (1)

[link] Where is the Learn Everything System?
Shoshannah Tekofsky (DarkSym) · 2024-09-27T21:30:16.379Z · comments (8)

[question] Any Trump Supporters Want to Dialogue?
k64 · 2024-09-28T19:41:55.370Z · answers+comments (80)

Should you have children? All LessWrong posts about the topic
Sherrinford · 2024-11-26T23:52:44.113Z · comments (0)

Inverse Problems In Everyday Life
silentbob · 2024-10-15T11:42:30.276Z · comments (2)

[link] I, Token
Ivan Vendrov (ivan-vendrov) · 2024-11-25T02:20:35.629Z · comments (2)

Against Explosive Growth
c.trout (ctrout) · 2024-09-04T21:45:03.120Z · comments (1)

[link] Disentangling Representations through Multi-task Learning
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-11-24T13:10:26.307Z · comments (1)

[link] AI x Human Flourishing: Introducing the Cosmos Institute
Brendan McCord (brendan-mccord) · 2024-09-05T18:23:32.690Z · comments (5)

Aligning AI Safety Projects with a Republican Administration
Deric Cheng (deric-cheng) · 2024-11-21T22:12:27.502Z · comments (0)

[link] Verification methods for international AI agreements
Akash (akash-wasil) · 2024-08-31T14:58:10.986Z · comments (1)

[question] How can we prevent AGI value drift?
Dakara (chess-ice) · 2024-11-20T18:19:24.375Z · answers+comments (6)

Secular Solstice Songbook Update
jefftk (jkaufman) · 2024-11-17T17:30:07.404Z · comments (2)

Dance Differentiation
jefftk (jkaufman) · 2024-11-15T02:30:07.694Z · comments (0)

AXRP Episode 38.0 - Zhijing Jin on LLMs, Causality, and Multi-Agent Systems
DanielFilan · 2024-11-14T07:00:06.977Z · comments (0)

My hopes for YouCongress.com
Nathan Helm-Burger (nathan-helm-burger) · 2024-09-22T03:20:20.939Z · comments (3)

Lenses of Control
WillPetillo · 2024-10-22T07:51:06.355Z · comments (0)

The deepest atheist: Sam Altman
Trey Edwin (Paolo Vivaldi) · 2024-10-10T03:27:34.465Z · comments (2)

Humans are (mostly) metarational
Yair Halberstadt (yair-halberstadt) · 2024-10-09T05:51:16.644Z · comments (6)

Recent comments

dagon on Is the mind a program?

Hmm, still not following, or maybe not agreeing. I think that "if the reasoning used to solve the problem is philosophical," then a "correct solution" is not available. "Useful," "consensus," or "applicable in the current societal context" might be better criteria for evaluating philosophical reasoning.

sweenesm on Understanding Emergence in Large Language Models

Thanks for the post. I think it'd be helpful if you could add some links to references for some of the things you say, such as:

For instance, between 10^10 and 10^11 parameters, models showed dramatic improvements in their ability to interpret emoji sequences representing movies.

joseph-miller on Joseph Miller's Shortform

There are two types of people in this world.

There are people who treat the lock on a public bathroom as a tool for communicating occupancy and a safeguard against accidental attempts to enter when the room is unavailable. For these people the standard protocol is to discern the likely state of engagement of the inner room and then tentatively proceed inside if they detect no signs of human activity.

And there are people who view the lock on a public bathroom as a physical barricade with which to temporarily defend possessed territory. They start by giving the door a hearty push to test the tensile strength of the barrier. On meeting resistance they engage with full force, wringing the handle up and down and slamming into the door with their full body weight. Only once their attempts are thwarted do they reluctantly retreat to find another stall.

cbiddulph on You should consider applying to PhDs (soon!)

Thanks, this post made me seriously consider applying to a PhD, and I strong-upvoted. I had vaguely assumed that PhDs take way too long and don't allow enough access to compute compared to industry AI labs. But considering the long lead time required for the application process and the reminder that you can always take new opportunities as they come up, I now think applying is worth it.

However, looking into it, putting together a high-quality application starting now and finishing by the deadline seems approximately impossible? If the deadline were December 15, that would give you two weeks; other schools like Berkeley have even earlier deadlines. I asked ChatGPT how long it would take to apply to just a single school, and it said it would take 43–59 hours of time spent working, or ~4–6 weeks in real time. Claude said 37-55 hours/4-6 weeks.

Not to discourage anyone from starting their application now if they think they can do it; I guess if you're sufficiently productive and agentic and maybe take some amphetamines, you can do anything. But this seems like a pretty crazy timeline. Just the thought of asking someone to write me a recommendation letter in a two-week timeframe makes me feel bad.

Your post does make me think "if I were going to be applying to a PhD next December, what would I want to do now?" That seems pretty clarifying, and would probably be a helpful frame even if it turns out that a better opportunity comes along and I never apply to a PhD.

I think it'd be a good idea for you to repost this in August or early September of next year!

nicholas-heather-kross on Why and When Interpretability Work is Dangerous

Kinda, my current mainline-doom-case is "some AI gets controlled --> powerful people use it to prop themselves up --> world gets worse until AI gets uncontrollably bad --> doom". I would call it a different yet also-important doom case of "perpetual low-grade-AI dictatorship where the AI is controlled by humans in a surveillance state".

sunwillrise on gwern's Shortform

All of these ideas sound awesome and exciting, and precisely the right kind of use of LLMs that I would like to see on LW!

sunwillrise on A shot at the diamond-alignment problem

It's looking like the values of humans are far, far simpler than a lot of evopsych literature and Yudkowsky thought, and related to this, values are less fragile than people thought 15-20 years ago, in the sense that values generalize far better OOD than people used to think 15-20 years ago

I'm not sure I like this argument very much, as it currently stands. It's not that I believe anything you wrote in this paragraph is wrong per se, but more like this misses the mark a bit in terms of framing.

Yudkowsky had (and, AFAICT, still has) a specific theory of human values in terms of what they mean in a reductionist framework, where it makes sense (and is rather natural) to think of (approximate) utility functions of humans and of Coherent Extrapolated Volition as things-that-exist-in-the-territory.

I think a lot of writing and analysis, summarized by me here, has cast a tremendous amount of doubt on the viability of this way of thinking and has revealed what seem to me to be impossible-to-patch holes at the core of these theories. I do not believe "human values" in the Yudkowskian sense ultimately make sense as a coherent concept that carves reality at the joints; instead, I observe a tremendous number of unanswered questions and apparent contradictions that throw the entire edifice into disarray.

Supplementing this reorientation of thinking around what it means to satisfy human values have been three further shifts: "prosaic" alignment researchers pivoting toward intent alignment rather than doomed-from-the-start paradigms like "learning the true human utility function" or ambitious value learning; a recognition that realism about (AGI) rationality is likely just straight-up false, and that the very specific conclusions MIRI-clustered alignment researchers have reached about what AGI cognition will be like are entirely overconfident and seem contradicted by our modern observations of LLMs; and, ultimately, an increased focus on the basic observation that full value alignment simply is not required for a good AI outcome (or at the very least for preventing AI takeover). So it's not so much that human values (to the extent such a thing makes sense) are simpler, but more that fulfilling those values is just not needed to nearly as high a degree as people used to think.

nadroj on Mechanistically Eliciting Latent Behaviors in Language Models

Couldn't you do something like fit a Gaussian to the model's activations, then restrict the steered activations to be high likelihood (low Mahalanobis distance)? Or (almost) equivalently, you could just do a whitening transformation to activation space before you constrain the L2 distance of the perturbation.

(If a Gaussian isn't expressive enough, you could model the manifold in some other way, e.g. with a VAE anomaly detector, a mixture of Gaussians, or whatever.)
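
The Gaussian/whitening suggestion above can be sketched concretely. What follows is a minimal pure-Python illustration, assuming activations arrive as plain lists of floats and that a single Gaussian is an adequate density model. The function names (`fit_gaussian`, `cholesky`, `whiten`, `mahalanobis`, `constrain_perturbation`, `clip_to_high_likelihood`), the `ridge` regularizer and the `radius` threshold are all hypothetical choices made for illustration, not taken from the post or from any library. The sketch relies on the identity that the Mahalanobis distance equals the ordinary L2 norm after whitening with the Cholesky factor of the covariance, which is why the comment's two variants are (almost) equivalent.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical helper names for illustration; not from the post or any library.


def fit_gaussian(samples: List[Vector], ridge: float = 1e-6):
    """Estimate mean and sample covariance of activations.

    The small `ridge` on the diagonal (an assumption) keeps the covariance
    positive definite so the Cholesky factorization below succeeds.
    """
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for s in samples:
        c = [s[j] - mean[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / (n - 1)
    for i in range(d):
        cov[i][i] += ridge
    return mean, cov


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with a = L @ L.T (a must be symmetric positive definite)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def whiten(v: Vector, L: Matrix) -> Vector:
    """Solve L y = v by forward substitution, i.e. y = L^{-1} v (the whitening map)."""
    y: Vector = []
    for i in range(len(v)):
        y.append((v[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def mahalanobis(x: Vector, mean: Vector, L: Matrix) -> float:
    """Mahalanobis distance of x from the mean = L2 norm of the whitened offset."""
    return math.hypot(*whiten([xi - mi for xi, mi in zip(x, mean)], L))


def constrain_perturbation(delta: Vector, L: Matrix, radius: float) -> Vector:
    """Rescale a steering perturbation so its whitened L2 norm is at most `radius`."""
    norm = math.hypot(*whiten(delta, L))
    if norm <= radius:
        return list(delta)
    # Whitening is linear, so scaling delta scales its whitened image equally.
    return [x * radius / norm for x in delta]


def clip_to_high_likelihood(x: Vector, mean: Vector, L: Matrix, radius: float) -> Vector:
    """Pull a steered activation toward the mean until its Mahalanobis distance <= radius."""
    offset = constrain_perturbation([xi - mi for xi, mi in zip(x, mean)], L, radius)
    return [mi + oi for mi, oi in zip(mean, offset)]
```

Because whitening is a linear map, the radial rescale in `constrain_perturbation` is exactly the Euclidean projection onto the radius ball in whitened space, so no iterative solver is needed. For real model activations one would more likely use NumPy or PyTorch than these explicit Python loops, and a richer density model (as the comment notes) would replace the single Gaussian.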

drake-thomas on Is the mind a program?

The theoretical maximum FLOPS of an Earth-bound classical computer is something like .

Is this supposed to have a different base or exponent? A single H100 already gets like $2^{45}$ FLOP/s.

green_leaf on LLM chatbots have ~half of the kinds of "consciousness" that humans believe in. Humans should avoid going crazy about that.

Ooh.