LessWrong 2.0 Reader

Transformers Represent Belief State Geometry in their Residual Stream
Adam Shai (adam-shai) · 2024-04-16T21:16:11.377Z · comments (64)

[link] Thoughts on seed oil
dynomight · 2024-04-20T12:29:14.212Z · comments (98)

Express interest in an "FHI of the West"
habryka (habryka4) · 2024-04-18T03:32:58.592Z · comments (41)

[link] Paul Christiano named as US AI Safety Institute Head of AI Safety
Joel Burget (joel-burget) · 2024-04-16T16:22:06.937Z · comments (58)

Funny Anecdote of Eliezer From His Sister
Daniel Birnbaum (daniel-birnbaum) · 2024-04-22T22:05:31.886Z · comments (4)

[link] FHI (Future of Humanity Institute) has shut down (2005–2024)
gwern · 2024-04-17T13:54:16.791Z · comments (21)

[question] Examples of Highly Counterfactual Discoveries?
johnswentworth · 2024-04-23T22:19:19.399Z · answers+comments (89)

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer
johnswentworth · 2024-04-18T00:27:43.451Z · comments (17)

Refusal in LLMs is mediated by a single direction
Andy Arditi (andy-arditi) · 2024-04-27T11:13:06.235Z · comments (52)

[link] Daniel Dennett has died (1942-2024)
kave · 2024-04-19T16:17:04.742Z · comments (5)

On Not Pulling The Ladder Up Behind You
Screwtape · 2024-04-26T21:58:29.455Z · comments (10)

[link] Simple probes can catch sleeper agents
Monte M (montemac) · 2024-04-23T21:10:47.784Z · comments (15)

[link] Introducing AI Lab Watch
Zach Stein-Perlman · 2024-04-30T17:00:12.652Z · comments (2)

The first future and the best future
KatjaGrace · 2024-04-25T06:40:04.510Z · comments (10)

Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight
Sam Marks (samuel-marks) · 2024-04-18T16:17:39.136Z · comments (7)

Ironing Out the Squiggles
Zack_M_Davis · 2024-04-29T16:13:00.371Z · comments (8)

Priors and Prejudice
MathiasKB (MathiasKirkBonde) · 2024-04-22T15:00:41.782Z · comments (16)

[link] Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes
owencb · 2024-04-16T10:10:13.338Z · comments (6)

A couple productivity tips for overthinkers
Steven Byrnes (steve2152) · 2024-04-20T16:05:50.332Z · comments (9)

Creating unrestricted AI Agents with Command R+
Simon Lermen (dalasnoin) · 2024-04-16T14:52:50.917Z · comments (12)

[Full Post] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda (neel-nanda-1) · 2024-04-19T19:06:59.185Z · comments (8)

When is a mind me?
Rob Bensinger (RobbBB) · 2024-04-17T05:56:38.482Z · comments (61)

Mid-conditional love
KatjaGrace · 2024-04-17T04:00:08.341Z · comments (19)

[Summary] Progress Update #1 from the GDM Mech Interp Team
Neel Nanda (neel-nanda-1) · 2024-04-19T19:06:17.755Z · comments (0)

[link] Motivation gaps: Why so much EA criticism is hostile and lazy
titotal (lombertini) · 2024-04-22T11:49:59.389Z · comments (5)

[link] The Inner Ring by C. S. Lewis
Saul Munn (saul-munn) · 2024-04-24T22:48:09.228Z · comments (6)

Rejecting Television
Declan Molony (declan-molony) · 2024-04-23T04:59:50.253Z · comments (7)

Constructability: Plainly-coded AGIs may be feasible in the near future
Épiphanie Gédéon (joy_void_joy) · 2024-04-27T16:04:45.894Z · comments (12)

Duct Tape security
Isaac King (KingSupernova) · 2024-04-26T18:57:05.659Z · comments (8)

On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg
Zvi · 2024-04-22T13:10:02.645Z · comments (4)

[link] Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan (SenR) · 2024-04-25T18:43:47.003Z · comments (25)

AISC9 has ended and there will be an AISC10
Linda Linsefors · 2024-04-29T10:53:18.812Z · comments (2)

Take the wheel, Shoggoth! (Lesswrong is trying out changes to the frontpage algorithm)
Ruby · 2024-04-23T03:58:43.443Z · comments (15)

So What's Up With PUFAs Chemically?
J Bostock (Jemist) · 2024-04-27T13:32:52.159Z · comments (23)

Changes in College Admissions
Zvi · 2024-04-24T13:50:03.487Z · comments (10)

Experiment on repeating choices
KatjaGrace · 2024-04-19T04:20:03.992Z · comments (1)

[link] "Why I Write" by George Orwell (1946)
Arjun Panickssery (arjun-panickssery) · 2024-04-25T16:02:28.668Z · comments (2)

[link] This is Water by David Foster Wallace
Nathan Young · 2024-04-24T21:21:09.445Z · comments (16)

Transfer Learning in Humans
niplav · 2024-04-21T20:49:42.595Z · comments (1)

[link] Let's Design A School, Part 1
Sable · 2024-04-23T21:50:20.937Z · comments (1)

LessOnline Festival Updates Thread
Ben Pace (Benito) · 2024-04-18T21:55:08.003Z · comments (15)

Superposition is not "just" neuron polysemanticity
LawrenceC (LawChan) · 2024-04-26T23:22:06.066Z · comments (3)

[link] Moving on from community living
Vika · 2024-04-17T17:02:11.357Z · comments (7)

Spatial attention as a “tell” for empathetic simulation?
Steven Byrnes (steve2152) · 2024-04-26T15:10:58.040Z · comments (11)

Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers
hugofry · 2024-04-29T20:57:35.127Z · comments (3)

[link] LLMs seem (relatively) safe
JustisMills · 2024-04-25T22:13:06.221Z · comments (23)

Why I'm doing PauseAI
Joseph Miller (Josephm) · 2024-04-30T16:21:54.156Z · comments (1)

AI #60: Oh the Humanity
Zvi · 2024-04-18T14:10:02.281Z · comments (7)

I'm open for projects (sort of)
cousin_it · 2024-04-18T18:05:01.395Z · comments (12)

[link] LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery (arjun-panickssery) · 2024-04-17T21:09:12.007Z · comments (1)

Recent comments

weightt-an on LLMs could be as conscious as human emulations, potentially

>It seems you are arguing that anything that presents like it is conscious implies that it is conscious.

No? That's definitely not what I'm arguing.

>But what ultimately matters is what this thing IS, not how it became in that way. If, this thing internalized that conscious type of processing from scratch, without having it natively, then resulting mind isn't worse than the one that evolution engineered with more granularity. Doesn't matter if this human was assembled atom by atom on molecular assembler, it's still a conscious human.

Look, here I'm talking about pathways to acquire that "structure" inside you. Not outlook of it.

avturchin on Magic by forgetting

Non-disease copies do not need to change their meditation routine in this model, assuming that they naturally forget their disease status during meditation.

mondsemmel on Thoughts on seed oil

You might appreciate the perspective in the short post Statistical models & the irrelevance of rare exceptions. (I previously commented something similar on a post by Duncan.)

ablue on LLMs could be as conscious as human emulations, potentially

I don't think that in the example you give, you're making a token-predicting transformer out of a human emulation; you're making a token-predicting transformer out of a virtual system with a human emulation as a component. In the system, the words "what's your earliest memory?" appearing on the paper are going to trigger all sorts of interesting (emulated) neural mechanisms that eventually lead to a verbal response, but the token predictor doesn't necessarily need to emulate any of that. In fact, if the emulation is deterministic, it can just memorize whatever response is given. Maybe gradient descent is likely to make the LLM conscious in order to efficiently memorize the outputs of a partly conscious system, but that's not obvious.

If you have a brain emulation, the best way to get a conscious LLM seems to me like it would be finding a way to tokenize emulation states and training it on those.

gunnar_zarncke on LLMs could be as conscious as human emulations, potentially

Ok. It seems you are arguing that anything that presents like it is conscious implies that it is conscious. You are not arguing whether or not the structure of LLMs can give rise to consciousness.

But then your argument is a social argument. I'm fine with a social definition of consciousness - after all, our actions depend to a large degree on social feedback and morals (about what beings have value) at different times have been very different and thus been socially construed.

But then why are you making a structural argument about LLMs in the end?

PS. In fact, I commented on the filler symbol paper when Xixidu posted about it and I don't think that's a good comparison.

seth-herd on The Prop-room and Stage Cognitive Architecture

I agree with all of that, including the scepticism that LLMs plus search will reach AGI. The lack of constraint satisfaction as the human brain does it could be a real stumbling block.

But LLMs have copied a good bit of our reasoning and therefore our semantic search. So they can do something like constraint satisfaction.

Put the constraints into a query, and the answer will satisfy those constraints. The process used is different than a human brain, but for every problem I can think of, the results are the same.

Now, that's partly because every problem I can think of is one I've already seen solved. But my ability to do truly novel problem solving is rarely used and pretty limited. So I'm not sure the LLM can't do just as good a job if it had a scaffolded script to explore its knowledge base from a few different angles.

metachirality on What is the easiest/funnest way to build up a comprehensive understanding of AI and AI Safety?

Vanessa Kosoy has a list specifically for her alignment agenda but is probably applicable to agent foundations in general: https://www.alignmentforum.org/posts/fsGEyCYhqs7AWwdCe/learning-theoretic-agenda-reading-list

review-bot on Express interest in an "FHI of the West"

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2025. The top fifty or so posts are featured prominently on the site throughout the year. Will this post make the top fifty?

gwern on Andrew Burns's Shortform

Altman made a Twitter-edit joke about 'gpt-2 i mean gpt2', so at this point, I think it's just a funny troll-name related to the 'v2 personality' which makes it a successor to the ChatGPT 'v1', presumably, 'personality'. See, it's gptv2 geddit not gpt-2? very funny, everyone lol at troll

drake-morrison on What is the easiest/funnest way to build up a comprehensive understanding of AI and AI Safety?

If you can code, build a small AI with the fast.ai course. This will (hopefully) be fun while also showing you particular holes in your knowledge to improve, rather than a vague feeling of "learn more".

If you want to follow along with more technical papers, you need to know the math of machine learning: linear algebra, multivariable calculus, and probability theory. For Agent Foundations work, you'll need more logic and set theory type stuff.

MIRI has some recommendations for textbooks here. There's also the Study Guide and this sequence on leveling up.

3Blue1Brown's YouTube channel has good videos for a lot of this, if that's the medium you like.

If you like non-standard fiction, some people like Project Lawful.

At the end of the day, it's not a super well-defined field that has clear on-ramps into the deeper ends. You just gotta start somewhere, and follow your curiosity. Have fun!