LessWrong 2.0 Reader

Mapping the semantic void II: Above, below and between token embeddings
mwatkins · 2024-02-15T23:00:09.010Z · comments (4)

Why wasn't preservation with the goal of potential future revival started earlier in history?
Andy_McKenzie · 2024-01-16T16:15:08.550Z · comments (1)

A more systematic case for inner misalignment
Richard_Ngo (ricraz) · 2024-07-20T05:03:03.500Z · comments (4)

On "Geeks, MOPs, and Sociopaths"
alkjash · 2024-01-19T21:04:48.525Z · comments (35)

Some Things That Increase Blood Flow to the Brain
romeostevensit · 2024-03-27T21:48:46.244Z · comments (15)

On Not Requiring Vaccination
jefftk (jkaufman) · 2024-02-01T19:20:12.657Z · comments (21)

AI labs can boost external safety research
Zach Stein-Perlman · 2024-07-31T19:30:16.207Z · comments (1)

Retrospective: PIBBSS Fellowship 2023
DusanDNesic · 2024-02-16T17:48:32.151Z · comments (1)

Good Bings copy, great Bings steal
dr_s · 2024-04-21T09:52:46.658Z · comments (6)

[link] self-fulfilling prophecies when applying for funding
Chipmonk · 2024-03-01T19:01:40.991Z · comments (0)

Inference-Only Debate Experiments Using Math Problems
Arjun Panickssery (arjun-panickssery) · 2024-08-06T17:44:27.293Z · comments (0)

[link] [Linkpost] Statement from Scarlett Johansson on OpenAI's use of the "Sky" voice, that was shockingly similar to her own voice.
Linch · 2024-05-20T23:50:28.138Z · comments (8)

Music in the AI World
Martin Sustrik (sustrik) · 2024-08-16T04:20:01.706Z · comments (8)

[LDSL#1] Performance optimization as a metaphor for life
tailcalled · 2024-08-08T16:16:27.349Z · comments (4)

[LDSL#6] When is quantification needed, and when is it hard?
tailcalled · 2024-08-13T20:39:45.481Z · comments (0)

Mentorship in AGI Safety (MAGIS) call for mentors
Valentin2026 (Just Learning) · 2024-05-23T18:28:03.173Z · comments (3)

I was raised by devout Mormons, AMA [&|] Soliciting Advice
ErioirE (erioire) · 2024-03-13T16:52:19.130Z · comments (41)

[link] introduction to thermal conductivity and noise management
bhauth · 2024-03-06T23:14:02.288Z · comments (1)

[link] New report: A review of the empirical evidence for existential risk from AI via misaligned power-seeking
Harlan · 2024-04-04T23:41:26.439Z · comments (5)

Quick evidence review of bulking & cutting
jp · 2024-04-04T21:43:48.534Z · comments (5)

Incentive design and capability elicitation
Joe Carlsmith (joekc) · 2024-11-12T20:56:05.088Z · comments (0)

Context-dependent consequentialism
Jeremy Gillen (jeremy-gillen) · 2024-11-04T09:29:24.310Z · comments (6)

[link] A progress policy agenda
jasoncrawford · 2024-12-19T18:42:37.327Z · comments (1)

XX by Rian Hughes: Pretentious Bullshit
Yair Halberstadt (yair-halberstadt) · 2025-01-08T13:02:52.438Z · comments (5)

Balancing Label Quantity and Quality for Scalable Elicitation
Alex Mallen (alex-mallen) · 2024-10-24T16:49:00.939Z · comments (1)

1. Meet the Players: Value Diversity
Allison Duettmann (allison-duettmann) · 2025-01-02T19:00:52.696Z · comments (2)

[link] What is it like to be psychologically healthy? Podcast ft. DaystarEld
Chipmonk · 2024-10-05T19:14:04.743Z · comments (8)

[link] Thoughts on Zero Points
depressurize (anchpop) · 2024-04-23T02:22:27.448Z · comments (1)

D&D.Sci (Easy Mode): On The Construction Of Impossible Structures [Evaluation and Ruleset]
abstractapplic · 2024-05-20T09:38:55.228Z · comments (2)

UDT1.01: Plannable and Unplanned Observations (3/10)
Diffractor · 2024-04-12T05:24:34.435Z · comments (0)

Apply to MATS 7.0!
Ryan Kidd (ryankidd44) · 2024-09-21T00:23:49.778Z · comments (0)

[link] Epistemic states as a potential benign prior
Tamsin Leake (carado-1) · 2024-08-31T18:26:14.093Z · comments (2)

Extracting SAE task features for in-context learning
Dmitrii Kharlapenko (dmitrii-kharlapenko) · 2024-08-12T20:34:13.747Z · comments (1)

Book Review: What Even Is Gender?
Joey Marcellino · 2024-09-01T16:09:27.773Z · comments (14)

[link] A Narrative History of Environmentalism's Partisanship
Jeffrey Heninger (jeffrey-heninger) · 2024-05-14T16:51:01.029Z · comments (3)

Fun With CellxGene
sarahconstantin · 2024-09-06T22:00:03.461Z · comments (2)

[link] Baking vs Patissing vs Cooking, the HPS explanation
adamShimi · 2024-07-17T20:29:09.645Z · comments (16)

AI Constitutions are a tool to reduce societal scale risk
Sammy Martin (SDM) · 2024-07-25T11:18:17.826Z · comments (2)

AI #74: GPT-4o Mini Me and Llama 3
Zvi · 2024-07-25T13:50:06.528Z · comments (6)

AIS terminology proposal: standardize terms for probability ranges
eggsyntax · 2024-08-30T15:43:39.857Z · comments (12)

AI Safety Seed Funding Network - Join as a Donor or Investor
Alexandra Bos (AlexandraB) · 2024-12-16T19:30:43.812Z · comments (0)

AI #59: Model Updates
Zvi · 2024-04-11T14:20:06.339Z · comments (2)

[link] 2024 State of the AI Regulatory Landscape
Deric Cheng (deric-cheng) · 2024-05-28T11:59:06.582Z · comments (0)

AI #62: Too Soon to Tell
Zvi · 2024-05-02T15:40:04.364Z · comments (8)

[link] Anthropic: Reflections on our Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2024-05-20T04:14:44.435Z · comments (21)

AI #85: AI Wins the Nobel Prize
Zvi · 2024-10-10T13:40:07.286Z · comments (6)

Improving Model-Written Evals for AI Safety Benchmarking
Sunishchal Dev (sunishchal-dev) · 2024-10-15T18:25:08.179Z · comments (0)

[link] What I expected from this site: A LessWrong review
Nathan Young · 2024-12-20T11:27:39.683Z · comments (5)

A Principled Cartoon Guide to NVC
plex (ete) · 2025-01-07T21:01:07.904Z · comments (5)

[link] Safety tax functions
owencb · 2024-10-20T14:08:38.099Z · comments (0)

Recent comments

foyle on Why do futurists care about the culture war?

Agree that most sociological, economic and environmental problems that loom large in the current context will radically shift in importance in the next decade or two, to the point that they are probably no longer worth devoting any significant resources to in the present. Impacts of AI are the only issue worth worrying about. But even assuming utopian outcomes, who gets possession of the Malibu beach houses in a post-scarcity world?

Once significant white-collar job losses start to mount in a year or two I think it inevitable that a powerful and electorally dominant anti-AI movement will grow, at least in erstwhile democracies, and likely ban most AGI applications outside of a few fields where fewer workers would stand to lose jobs (health - with near endless demand, perhaps cutting edge tech where payoff to human net welfare is highest). Butlerian Jihad-lite.

It won't save us, and has substantial risk of ushering in repressive authoritarianism in the political ruckus caused, but will likely delay our demise or (at best) delivery into powerless pet status by perhaps a decade or two.

fl33tw00d on How do you deal w/ Super Stimuli?

Firefox Focus on iPhone is useful here.

Delete all social media apps from your phone, and hide Safari.

Only access any platforms via Firefox Focus - browsing sessions are ephemeral, so you need to log in each time. This added friction basically solved it for me.

viliam on The low Information Density of Eliezer Yudkowsky & LessWrong

I'll try to respect your preference for brevity ;)

- a shorter version would be very useful -- yes, fully agree
  - at least there is readthesequences.com without the comments (10x as much text as the articles)
  - there were summaries at LW wiki, but those were too short; we need something medium-sized
- there are some good reasons why Eliezer wrote a long text
  - there wasn't a rationalist community yet, lines had to be drawn to separate it from many existing adjacent communities (atheists, skeptics, libertarians, sci-fi fans, self-help, contrarians, academia...)
  - emotional, near-mode appeal -- why should we even care about "being rational"?
  - popular bad memes/patterns (mysterious answers, applause lights, "trust the science"...)

tl;dr -- writing for an already existing rationalist(-ish) community is different from writing in order to create a rationalist community

davidmanheim on AI Safety as a YC Startup

True, and even more, if optimizing for impact or magnitude has Goodhart effects, of various types, then even otherwise good directions are likely to be ruined by pushing on them too hard. (In large part because it seems likely that the space we care about is not going to have linear divisions into good and bad, there will be much more complex regions, and even when pointed in a direction that is locally better, pushing too far is possible, and very hard to predict from local features even if people try, which they mostly don't.)

davidmanheim on AI Safety as a YC Startup

I think the point wasn't having a unit norm, it was that impact wasn't defined as directional, so we'd need to remove the dimensionality from a multidimensionally defined direction.

So to continue the nitpicking, I'd argue impact = || Magnitude * Direction ||, or better, ||Impact|| = Magnitude * Direction, so that we can talk about size of impact. And that makes my point in a different comment even clearer - because almost by assumption, the vast majority of those with large impact are pointed in net-negative directions, unless you think either a significant proportion of directions are positive, or that people are selecting for it very strongly, which seems not to be the case.

james-chua on Inference-Time-Compute: More Faithful? A Research Note

Thanks! Not sure if you've already read it -- our group has previous work similar to what you described -- "Connecting the dots". Models can e.g. articulate functions that are implicit in the training data. This ability is not perfect, models still have a long way to go.

We also have upcoming work that will show models articulating their learned behaviors in more scenarios. Will be released soon.

satron on Reducing sycophancy and improving honesty via activation steering

A new method for reducing sycophancy. Sycophantic behavior is present in quite a few AI threat models, so it's an important area to work on.

The article not only uses activation steering to reduce sycophancy in AI models but also provides directions for future work.

Overall, this post is a valuable addition to the toolkit of people who wish to build safe advanced AI.

tailcalled on We probably won't just play status games with each other after AGI

I don't think consumers demand authentic AI friends because they already have authentic human friends. Also it's not clear how you imagine the AI companies could train the AIs to be more independent and less superficial; generally training an AI requires a differentiable loss, but human independence does not originate from a differentiable loss and so it's not obvious that one could come up with something functionally similar via gradient descent.

paddyc on Seeing Red: Dissolving Mary's Room and Qualia

It’s easy to imagine visual perception in relation to qualia, i.e. how do I know that the blue I see looks the same to you beyond identifying it, e.g. the sky is blue. But I think it’s harder to imagine qualia in relation to sound, that is, how could sound have a subjective essence that is possibly unique to each individual? I think you either hear something or you don’t.

fl33tw00d on Do humans really learn from "little" data?

The linked video says so at 30:45