LessWrong 2.0 Reader

On Claude 3.5 Sonnet
Zvi · 2024-06-24T12:00:05.719Z · comments (14)

[link] Compact Proofs of Model Performance via Mechanistic Interpretability
LawrenceC (LawChan) · 2024-06-24T19:27:21.214Z · comments (3)

[question] How to get nerds fascinated about mysterious chronic illness research?
riceissa · 2024-05-27T22:58:29.707Z · answers+comments (50)

I am the Golden Gate Bridge
Zvi · 2024-05-27T14:40:03.216Z · comments (6)

[link] Seven lessons I didn't learn from election day
Eric Neyman (UnexpectedValues) · 2024-11-14T18:39:07.053Z · comments (33)

[link] Ideological Bayesians
Kevin Dorst · 2024-02-25T14:17:25.070Z · comments (4)

[link] I found >800 orthogonal "write code" steering vectors
Jacob G-W (g-w1) · 2024-07-15T19:06:17.636Z · comments (19)

[link] Almost everyone I’ve met would be well-served thinking more about what to focus on
Henrik Karlsson (henrik-karlsson) · 2024-01-05T21:01:27.861Z · comments (8)

[link] Ilya Sutskever created a new AGI startup
harfe · 2024-06-19T17:17:17.366Z · comments (35)

[question] Which things were you surprised to learn are not metaphors?
Eric Neyman (UnexpectedValues) · 2024-11-21T18:56:18.025Z · answers+comments (52)

[link] Explaining Impact Markets
Saul Munn (saul-munn) · 2024-01-31T09:51:27.587Z · comments (2)

[link] MIRI's April 2024 Newsletter
Harlan · 2024-04-12T23:38:20.781Z · comments (0)

[link] Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant
Olli Järviniemi (jarviniemi) · 2024-05-06T07:07:05.019Z · comments (13)

[link] Things You’re Allowed to Do: University Edition
Saul Munn (saul-munn) · 2024-02-06T00:36:11.690Z · comments (13)

[link] RAND report finds no effect of current LLMs on viability of bioterrorism attacks
StellaAthena · 2024-01-25T19:17:30.493Z · comments (14)

Sparsify: A mechanistic interpretability research agenda
Lee Sharkey (Lee_Sharkey) · 2024-04-03T12:34:12.043Z · comments (22)

Counting arguments provide no evidence for AI doom
Nora Belrose (nora-belrose) · 2024-02-27T23:03:49.296Z · comments (188)

Live Theory Part 0: Taking Intelligence Seriously
Sahil · 2024-06-26T21:37:10.479Z · comments (3)

Towards a Less Bullshit Model of Semantics
johnswentworth · 2024-06-17T15:51:06.060Z · comments (44)

[link] the Giga Press was a mistake
bhauth · 2024-08-21T04:51:24.150Z · comments (26)

[link] Anthropic: Three Sketches of ASL-4 Safety Case Components
Zach Stein-Perlman · 2024-11-06T16:00:06.940Z · comments (33)

Apollo Research 1-year update
Marius Hobbhahn (marius-hobbhahn) · 2024-05-29T17:44:32.484Z · comments (0)

[question] What are the best arguments for/against AIs being "slightly 'nice'"?
Raemon · 2024-09-24T02:00:19.605Z · answers+comments (54)

[link] Sabotage Evaluations for Frontier Models
David Duvenaud (david-duvenaud) · 2024-10-18T22:33:14.320Z · comments (55)

[link] Finishing The SB-1047 Documentary In 6 Weeks
Michaël Trazzi (mtrazzi) · 2024-10-28T20:17:47.465Z · comments (5)

[link] Executable philosophy as a failed totalizing meta-worldview
jessicata (jessica.liu.taylor) · 2024-09-04T22:50:18.294Z · comments (40)

A Solomonoff Inductor Walks Into a Bar: Schelling Points for Communication
johnswentworth · 2024-07-26T00:33:42.000Z · comments (1)

It's time for a self-reproducing machine
Carl Feynman (carl-feynman) · 2024-08-07T21:52:22.819Z · comments (68)

Information vs Assurance
johnswentworth · 2024-10-20T23:16:25.762Z · comments (7)

Notes on Dwarkesh Patel’s Podcast with Demis Hassabis
Zvi · 2024-03-01T16:30:08.687Z · comments (0)

2024 Petrov Day Retrospective
Ben Pace (Benito) · 2024-09-28T21:30:14.952Z · comments (25)

[link] Against Aschenbrenner: How 'Situational Awareness' constructs a narrative that undermines safety and threatens humanity
GideonF · 2024-07-15T18:37:40.232Z · comments (17)

You can, in fact, bamboozle an unaligned AI into sparing your life
David Matolcsi (matolcsid) · 2024-09-29T16:59:43.942Z · comments (171)

Science advances one funeral at a time
Cameron Berg (cameron-berg) · 2024-11-01T23:06:19.381Z · comments (9)

Takeoff speeds presentation at Anthropic
Tom Davidson (tom-davidson-1) · 2024-06-04T22:46:35.448Z · comments (0)

SB 1047: Final Takes and Also AB 3211
Zvi · 2024-08-27T22:10:07.647Z · comments (11)

[link] China Hawks are Manufacturing an AI Arms Race
garrison · 2024-11-20T18:17:51.958Z · comments (13)

Everything Wrong with Roko's Claims about an Engineered Pandemic
WitheringWeights (EZ97) · 2024-02-22T15:59:08.439Z · comments (10)

On attunement
Joe Carlsmith (joekc) · 2024-03-25T12:47:34.856Z · comments (8)

OpenAI: The Board Expands
Zvi · 2024-03-12T14:00:04.110Z · comments (1)

[question] Am I confused about the "malign universal prior" argument?
nostalgebraist · 2024-08-27T23:17:22.779Z · answers+comments (33)

Catastrophic sabotage as a major threat model for human-level AI systems
evhub · 2024-10-22T20:57:11.395Z · comments (8)

Bigger Livers?
sarahconstantin · 2024-11-08T21:50:09.814Z · comments (10)

Circular Reasoning
abramdemski · 2024-08-05T18:10:32.736Z · comments (36)

New page: Integrity
Zach Stein-Perlman · 2024-07-10T15:00:41.050Z · comments (3)

Quotes from Leopold Aschenbrenner’s Situational Awareness Paper
Zvi · 2024-06-07T11:40:03.981Z · comments (10)

How to train your own "Sleeper Agents"
evhub · 2024-02-07T00:31:42.653Z · comments (11)

Announcing Neuronpedia: Platform for accelerating research into Sparse Autoencoders
Johnny Lin (hijohnnylin) · 2024-03-25T21:17:58.421Z · comments (7)

Meaning & Agency
abramdemski · 2023-12-19T22:27:32.123Z · comments (17)

Just admit that you’ve zoned out
joec · 2024-06-04T02:51:27.594Z · comments (22)

Recent comments

avturchin on Are You More Real If You're Really Forgetful?

I'm inclined to bite this bullet too, though it feels somewhat strange. Weird implication: you can increase the amount of reality-fluid assigned to you by giving yourself amnesia.

I explored a similar line of reasoning here: Magic by forgetting

I think that, yes, the sameness of humans as agents is generated by the process of self-identification, in which a human being identifies herself through a short string of information: "Name, age, sex, profession + a few more kilobytes". Evidence for this is the success of improv theatre, where people quickly adopt completely new roles from one-line instructions.

If yes, then we should expect ourselves to be agents that exist in a universe that abstracts well, because "high-level agents" embedded in such universes are "supported" by a larger equivalence class of universes (since they draw on reality fluid from an entire pool of "low-level" agents).

I think that your conclusion is valid.

keltan on Which things were you surprised to learn are not metaphors?

If I’ll probably see them again, I don’t miss people. I thought people saying they miss you were just being overly polite.

interstice on lemonhope's Shortform

Yeah, I definitely agree you should start learning as young as possible. I would usually advise someone starting out to learn general math/CS and do AI safety on the side, since there's far more high-quality knowledge in those fields. That said, "just dive into AI" seems to have worked out well for some people, like Chris Olah, and timelines are plausibly pretty short, so ¯\_(ツ)_/¯

yams on yams's Shortform

Yes, this world.

drossbucket on Doing Research Part-Time is Great

Interesting post! I’ve wondered the same thing before.

I’m doing a much more half-arsed version, as a casual quantum foundations enjoyer alongside a technical writing job, and also getting endlessly distracted by other things I find interesting, so my output is not impressive. But it’s a pretty fun hobby and I’m surprised more people don’t try this!

paulpauls on Mechanistic Interpretability of Llama 3.2 with Sparse Autoencoders

Hi Neel,

you're absolutely right: all the research in the Gemma Scope paper was performed on the open-source Gemma 2 model. I wanted to summarize all the research my paper was based on in one concise sentence, and in doing so I erroneously put you in the 'proprietary LLMs' section. I have corrected the mistake.

My apologies.

I hope you still enjoyed the project and thank you for your great research work at DeepMind. =)

loops on Keeping Your Identity Small

Could you post coordinates next time? I can't find the entrance on Elizabeth St. that you're referring to.

carey-underwood on [deleted]

The corresponding Arbital page is now (apparently) dead.

carey-underwood on [deleted]

A link appears to have broken. Does anyone know what “null” was supposed to link to in “policy null ” (note the extra spaces around “null”)?

directedevolution on A very strange probability paradox

I had to write several new Python versions of the code to explore the problem before it clicked for me.

I understand the proof, but the closest I can get to a true intuition that B is bigger is:

Imagine you just rolled your first 6, haven't rolled any odds yet, and then you roll a 2 or a 4.
In the consecutive-6 condition, it's quite unlikely you'll end up keeping this sequence, because you now still have to get two 6s before rolling any odds.
In the two-6 condition, you are much more likely to end up keeping this sequence, which is guaranteed to include at least one 2 or 4, and likely to include more than one before you roll that 6.

I think the main thing I want to remember is that "given" or "conditional on X" means you use the unconditional probability distribution and throw out results that don't conform to X, not that you substitute a different generating function that always generates events conforming to X.
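The comment mentions writing Python to explore the problem. The following is a minimal sketch of the "throw out non-conforming results" reading of conditioning. It assumes the paradox's setup is: roll a fair d6 until a stopping rule fires, conditioned on every roll being even. The stopping rules `two_consecutive_sixes` (the consecutive-6 condition) and `two_sixes` (the two-6 condition), along with every function name here, are hypothetical illustrations rather than the commenter's code. `naive_mean_length` implements the tempting substitute generator (a die whose only faces are 2, 4 and 6) for contrast.

```python
import random
from typing import Callable, List

# Hypothetical names throughout; this is an illustration, not the original code.
StopRule = Callable[[List[int]], bool]


def two_consecutive_sixes(rolls: List[int]) -> bool:
    """Consecutive-6 condition (assumed): stop once the last two rolls are both 6."""
    return len(rolls) >= 2 and rolls[-1] == 6 and rolls[-2] == 6


def two_sixes(rolls: List[int]) -> bool:
    """Two-6 condition (assumed): stop once two 6s have appeared, in a row or not."""
    return rolls.count(6) >= 2


def conditional_mean_length(stop: StopRule, trials: int, rng: random.Random) -> float:
    """Estimate E[number of rolls | every roll was even] by rejection sampling.

    Each trial rolls an ordinary fair d6 (the unconditional distribution).
    A trial that produces an odd roll is thrown out. Aborting it at the first
    odd roll is only an optimization: it would be discarded anyway.
    """
    kept_lengths: List[int] = []
    for _ in range(trials):
        rolls: List[int] = []
        while not stop(rolls):
            r = rng.randint(1, 6)
            if r % 2 == 1:
                break  # violates the condition: reject this trial
            rolls.append(r)
        else:
            # Loop ended because the stopping rule fired with no odd roll.
            kept_lengths.append(len(rolls))
    return sum(kept_lengths) / len(kept_lengths)


def naive_mean_length(stop: StopRule, trials: int, rng: random.Random) -> float:
    """The wrong model: substitute a die that can only roll 2, 4 or 6."""
    total = 0
    for _ in range(trials):
        rolls: List[int] = []
        while not stop(rolls):
            rolls.append(rng.choice((2, 4, 6)))
        total += len(rolls)
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    rules = [("consecutive 6s", two_consecutive_sixes), ("two 6s", two_sixes)]
    for name, rule in rules:
        cond = conditional_mean_length(rule, 200_000, rng)
        naive = naive_mean_length(rule, 20_000, rng)
        print(f"{name}: conditional {cond:.2f}, naive even-only die {naive:.2f}")
```

Under these assumptions, the conditional estimates should land near 30/11 ≈ 2.73 for the consecutive-6 rule and 3 for the two-6 rule, so B does come out bigger. The naive even-only die gives roughly 12 and 6 instead, which has the wrong magnitude and the opposite ordering. That gap is exactly the difference between filtering the unconditional distribution and swapping in a new generating process.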