LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

[link] Abs-E (or, speak only in the positive)
dkl9 · 2024-02-19T21:14:32.095Z · comments (24)

[link] Debate helps supervise human experts [Paper]
habryka (habryka4) · 2023-11-17T05:25:17.030Z · comments (6)

Investigating Bias Representations in LLMs via Activation Steering
DawnLu · 2024-01-15T19:39:14.077Z · comments (4)

[question] How did you integrate voice-to-text AI into your workflow?
ChristianKl · 2023-11-20T12:01:37.696Z · answers+comments (12)

[link] AI Impacts 2023 Expert Survey on Progress in AI
habryka (habryka4) · 2024-01-05T19:42:17.226Z · comments (1)

Monthly Roundup #19: June 2024
Zvi · 2024-06-25T12:00:03.333Z · comments (9)

Childhood and Education Roundup #6: College Edition
Zvi · 2024-06-26T11:40:03.990Z · comments (8)

Deconfusing “ontology” in AI alignment
Dylan Bowman (dylan-bowman) · 2023-11-08T20:03:43.205Z · comments (3)

{Book Summary} The Art of Gathering
Tristan Williams (tristan-williams) · 2024-04-16T10:48:41.528Z · comments (0)

Updates to Open Phil’s career development and transition funding program
abergal · 2023-12-04T18:10:29.394Z · comments (0)

Cryonics p(success) estimates are only weakly associated with interest in pursuing cryonics in the LW 2023 Survey
Andy_McKenzie · 2024-02-29T14:47:28.613Z · comments (6)

Auditing LMs with counterfactual search: a tool for control and ELK
Jacob Pfau (jacob-pfau) · 2024-02-20T00:02:09.575Z · comments (6)

Ackshually, many worlds is wrong
tailcalled · 2024-04-11T20:23:59.416Z · comments (42)

DIY RLHF: A simple implementation for hands on experience
Mike Vaiana (mike-vaiana) · 2024-07-10T12:07:03.047Z · comments (0)

[link] Conversation Visualizer
ethanmorse · 2023-12-31T01:18:01.424Z · comments (4)

Collection (Part 6 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-14T21:37:00.160Z · comments (0)

[link] Memo on some neglected topics
Lukas Finnveden (Lanrian) · 2023-11-11T02:01:55.834Z · comments (2)

[link] Our Digital and Biological Children
Eneasz · 2024-10-24T18:36:38.719Z · comments (0)

[link] Arithmetic Models: Better Than You Think
kqr · 2024-10-26T09:42:07.185Z · comments (4)

Trading Candy
jefftk (jkaufman) · 2024-11-01T01:10:08.024Z · comments (4)

[link] A new process for mapping discussions
Nathan Young · 2024-09-30T08:57:20.029Z · comments (7)

Tackling Moloch: How YouCongress Offers a Novel Coordination Mechanism
Hector Perez Arenas (hector-perez-arenas) · 2024-05-15T23:13:48.501Z · comments (9)

Employee Incentives Make AGI Lab Pauses More Costly
nikola (nikolaisalreadytaken) · 2023-12-22T05:04:15.598Z · comments (12)

[link] Quick Thoughts on Scaling Monosemanticity
Joel Burget (joel-burget) · 2024-05-23T16:22:48.035Z · comments (1)

Evaporation of improvements
Viliam · 2024-06-20T18:34:40.969Z · comments (27)

Reading More Each Day: A Simple $35 Tool
aysajan · 2024-07-24T13:54:04.290Z · comments (2)

AI #65: I Spy With My AI
Zvi · 2024-05-23T12:40:02.793Z · comments (7)

Escaping Skeuomorphism
Stuart Johnson (stuart-johnson) · 2023-12-20T03:51:00.489Z · comments (0)

An explanation of evil in an organized world
KatjaGrace · 2024-05-02T05:20:06.240Z · comments (9)

[link] Cellular reprogramming, pneumatic launch systems, and terraforming Mars: Some things I learned about at Foresight Vision Weekend
jasoncrawford · 2024-01-04T19:33:57.887Z · comments (0)

Can quantised autoencoders find and interpret circuits in language models?
charlieoneill (kingchucky211) · 2024-03-24T20:05:50.125Z · comments (4)

Aggregative principles approximate utilitarian principles
Cleo Nardo (strawberry calm) · 2024-06-12T16:27:22.179Z · comments (3)

Heuristics for preventing major life mistakes
SK2 (lunchbox) · 2023-12-20T08:01:09.340Z · comments (2)

[link] New blog: Expedition to the Far Lands
Connor Leahy (NPCollapse) · 2024-08-17T11:07:48.537Z · comments (3)

AI #64: Feel the Mundane Utility
Zvi · 2024-05-16T15:20:02.956Z · comments (11)

Cicadas, Anthropic, and the bilateral alignment problem
kromem · 2024-05-22T11:09:56.469Z · comments (6)

[link] AI Safety at the Frontier: Paper Highlights, August '24
gasteigerjo · 2024-09-03T19:17:24.850Z · comments (0)

5 ways to improve CoT faithfulness
CBiddulph (caleb-biddulph) · 2024-10-05T20:17:12.637Z · comments (8)

Towards Quantitative AI Risk Management
Henry Papadatos (henry) · 2024-10-16T19:26:48.817Z · comments (1)

Interpretability of SAE Features Representing Check in ChessGPT
Jonathan Kutasov (jonathan-kutasov) · 2024-10-05T20:43:36.679Z · comments (2)

European Progress Conference
Martin Sustrik (sustrik) · 2024-10-06T11:10:03.819Z · comments (11)

[link] Generic advice caveats
Saul Munn (saul-munn) · 2024-10-30T21:03:07.185Z · comments (1)

[question] Any real toeholds for making practical decisions regarding AI safety?
lukehmiles (lcmgcd) · 2024-09-29T12:03:08.084Z · answers+comments (6)

[link] Evaluating Synthetic Activations composed of SAE Latents in GPT-2
Giorgi Giglemiani (Rakh) · 2024-09-25T20:37:48.227Z · comments (0)

[link] David Burns Thinks Psychotherapy Is a Learnable Skill. Git Gud.
Morpheus · 2024-01-27T13:21:05.068Z · comments (20)

[link] Link Collection: Impact Markets
Saul Munn (saul-munn) · 2023-12-26T09:01:48.815Z · comments (0)

flowing like water; hard like stone
lsusr · 2024-02-20T03:20:46.531Z · comments (4)

On the 2nd CWT with Jonathan Haidt
Zvi · 2024-04-05T17:30:05.223Z · comments (3)

Uncertainty in all its flavours
Cleo Nardo (strawberry calm) · 2024-01-09T16:21:07.915Z · comments (6)

NYU Code Debates Update/Postmortem
David Rein (david-rein) · 2024-05-24T16:08:06.151Z · comments (4)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

vlad-mikulik on Anthropic: Three Sketches of ASL-4 Safety Case Components

FWIW I agree with you and wouldn't put it the way it is in Roger's post. Not sure what Roger would say in response.

jimmy on The Median Researcher Problem

There's no norm saying you can't be ignorant of stats and read, or even post about things not requiring an understanding of stats, but there's still a critical mass of people who do understand the topic well enough to enforce norms against actively contributing with that illiteracy. (E.g. how do you expect it to go over if someone makes a post claiming that p=0.05 means that there's a 95% change that the hypothesis is true?)

Taking it a step further, I'd say my household "has norms which basically require everyone to speak English", but that doesn't mean the little one is quite there yet or that we're gonna boot her for not already meeting the bar. It just means that she has to work hard to learn how to talk if she wants to be part of what's going on.

Lesswrong feels like that to me in that I would feel comfortable posting about things which require statistical literacy to understand, knowing that engagement which fails to meet that bar will be downvoted rather than getting downvoted for expecting to find a statistically literate audience here.

mr-hire on Should CA, TX, OK, and LA merge into a giant swing state, just for elections?

A real life use for smart contracts 😆

towards_keeperhood on What are the primary drivers that caused selection pressure for intelligence in humans?

thanks. Can you say more about why?

I mean runaway sexual selection is basically H1, which I updated to being less plausible. See my answer here [LW(p) · GW(p)]. (You could comment there why you think my update might be wrong or so.)

tanjent on tanjent's Shortform

Claude dot Ai did a remarkable job of summarizing my thoughts about the current state of things:

https://medium.com/@tanjent/unedited-chat-with-claude-dot-ai-about-politics-and-acting-against-ones-own-self-interest-b3b805d85015

ape-in-the-coat on What are the primary drivers that caused selection pressure for intelligence in humans?

I suspect that runaway sexual selection played a huge part.

gilch on Open Thread Fall 2024

Not sure I understand what you mean by that. The Universe seems to follow relatively simple deterministic laws. That doesn't mean you can use quantum field theory to predict the weather. But chaotic systems can be modeled as statistical ensembles. Temperature is a meaningful measurement even if we can't calculate the motion of all the individual gas molecules.

If you're referring to human irrationality in particular, we can study cognitive bias [? · GW], which is how human reasoning diverges from that of idealized agents in certain systematic ways. This is a topic of interest at both the individual level of psychology, and at the level of statistical ensembles in economics.

buck on Thomas Kwa's Shortform

In terms of developing better misalignment risk countermeasures, I think the most important questions are probably:

How to evaluate whether models should be trusted or untrusted: currently I don't have a good answer and this is bottlenecking the efforts to write concrete control proposals.
How AI control should interact with AI security tools inside labs.

More generally:

How can we get more evidence on whether scheming is plausible?
How scary is underelicitation? How much should the results about password-locked models [LW · GW] or arguments about being able to generate small numbers of high-quality labels or demonstrations [AF · GW] affect this?

benito on The Median Researcher Problem

Curated. I think this model is pretty useful and well-compressed, and I'm glad to be able to concisely link to it.

The policy implications are still much open to debate, for here on LessWrong and for other ecosystems in the world.

hastings-greer on Both-Sidesism—When Fair & Balanced Goes Wrong

I have observed a transition. 12 years ago, the left-right split was based on many loosely correlated factors and strategic/inertial effects, creating bizarre situations like near perfect correlation between opinions on Gay Marriage and privatization of social security. I think at that time you could reason much better if you could recognize that the separation between left and right was not natural. I at least have a ton of cached arguments from this era because it became such a familiar dynamic.

Nowadays, I don't think this old schema really applies, especially among the actual elected officers and party leadership. The effective left right split is mono-factor: you are right exactly in proportion to your personal loyalty to one Donald J. Trump, resulting in bizarre situations like Dick Cheney being classified as "Left."