LessWrong 2.0 Reader

Loving a world you don’t trust
Joe Carlsmith (joekc) · 2024-06-18T19:31:36.581Z · comments (13)
The Worst Form Of Government (Except For Everything Else We've Tried)
johnswentworth · 2024-03-17T18:11:38.374Z · comments (47)
An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2
Neel Nanda (neel-nanda-1) · 2024-07-07T17:39:35.064Z · comments (15)
Limitations on Formal Verification for AI Safety
Andrew Dickson · 2024-08-19T23:03:52.706Z · comments (60)
[link] Simple probes can catch sleeper agents
Monte M (montemac) · 2024-04-23T21:10:47.784Z · comments (21)
[link] "AI achieves silver-medal standard solving International Mathematical Olympiad problems"
gjm · 2024-07-25T15:58:57.638Z · comments (38)
Processor clock speeds are not how fast AIs think
Ege Erdil (ege-erdil) · 2024-01-29T14:39:38.050Z · comments (55)
On saying "Thank you" instead of "I'm Sorry"
Michael Cohn (michael-cohn) · 2024-07-08T03:13:50.663Z · comments (16)
A Dozen Ways to Get More Dakka
Davidmanheim · 2024-04-08T04:45:19.427Z · comments (11)
The Dark Arts
lsusr · 2023-12-19T04:41:13.356Z · comments (49)
Why I don't believe in the placebo effect
transhumanist_atom_understander · 2024-06-10T02:37:07.776Z · comments (22)
The case for training frontier AIs on Sumerian-only corpus
Alexandre Variengien (alexandre-variengien) · 2024-01-15T16:40:22.011Z · comments (15)
My simple AGI investment & insurance strategy
lc · 2024-03-31T02:51:53.479Z · comments (27)
Notice When People Are Directionally Correct
Chris_Leong · 2024-01-14T14:12:37.090Z · comments (8)
[link] "Can AI Scaling Continue Through 2030?", Epoch AI (yes)
gwern · 2024-08-24T01:40:32.929Z · comments (4)
Near-mode thinking on AI
Olli Järviniemi (jarviniemi) · 2024-08-04T20:47:28.085Z · comments (8)
Updatelessness doesn't solve most problems
Martín Soto (martinsq) · 2024-02-08T17:30:11.266Z · comments (43)
Circuits in Superposition: Compressing many small neural networks into one
Lucius Bushnaq (Lblack) · 2024-10-14T13:06:14.596Z · comments (8)
[question] Which things were you surprised to learn are not metaphors?
Eric Neyman (UnexpectedValues) · 2024-11-21T18:56:18.025Z · answers+comments (78)
How I started believing religion might actually matter for rationality and moral philosophy
zhukeepa · 2024-08-23T17:40:47.341Z · comments (41)
A Shutdown Problem Proposal
johnswentworth · 2024-01-21T18:12:48.664Z · comments (61)
An even deeper atheism
Joe Carlsmith (joekc) · 2024-01-11T17:28:31.843Z · comments (47)
Things I've Grieved
Raemon · 2024-02-18T19:32:47.169Z · comments (6)
[link] OpenAI's CBRN tests seem unclear
LucaRighetti (Error404Dinosaur) · 2024-11-21T17:28:30.290Z · comments (6)
Pantheon Interface
NicholasKees (nick_kees) · 2024-07-08T19:03:51.681Z · comments (22)
Community Notes by X
NicholasKees (nick_kees) · 2024-03-18T17:13:33.195Z · comments (15)
[link] Bayesian Injustice
Kevin Dorst · 2023-12-14T15:44:08.664Z · comments (10)
Subskills of "Listening to Wisdom"
Raemon · 2024-12-09T03:01:18.706Z · comments (14)
[question] What do coherence arguments actually prove about agentic behavior?
sunwillrise (andrei-alexandru-parfeni) · 2024-06-01T09:37:28.451Z · answers+comments (35)
[link] Steering Llama-2 with contrastive activation additions
Nina Panickssery (NinaR) · 2024-01-02T00:47:04.621Z · comments (29)
BIG-Bench Canary Contamination in GPT-4
Jozdien · 2024-10-22T15:40:48.166Z · comments (13)
Parasites (not a metaphor)
lemonhope (lcmgcd) · 2024-08-08T20:07:13.593Z · comments (17)
Do you believe in hundred dollar bills lying on the ground? Consider humming
Elizabeth (pktechgirl) · 2024-05-16T00:00:05.257Z · comments (22)
[link] Investigating the Chart of the Century: Why is food so expensive?
Maxwell Tabarrok (maxwell-tabarrok) · 2024-08-16T13:21:23.596Z · comments (26)
Why I take short timelines seriously
NicholasKees (nick_kees) · 2024-01-28T22:27:21.098Z · comments (29)
"The Solomonoff Prior is Malign" is a special case of a simpler argument
David Matolcsi (matolcsid) · 2024-11-17T21:32:34.711Z · comments (44)
Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
Erik Jenner (ejenner) · 2024-06-04T15:50:47.475Z · comments (14)
Natural Latents: The Math
johnswentworth · 2023-12-27T19:03:01.923Z · comments (37)
RTFB: On the New Proposed CAIP AI Bill
Zvi · 2024-04-10T18:30:08.410Z · comments (14)
Current AIs Provide Nearly No Data Relevant to AGI Alignment
Thane Ruthenis · 2023-12-15T20:16:09.723Z · comments (155)
Awakening
lsusr · 2024-05-30T07:03:00.821Z · comments (79)
[link] Miles Brundage resigned from OpenAI, and his AGI readiness team was disbanded
garrison · 2024-10-23T23:40:57.180Z · comments (1)
Efficient Dictionary Learning with Switch Sparse Autoencoders
Anish Mudide (anish-mudide) · 2024-07-22T18:45:53.502Z · comments (19)
AI catastrophes and rogue deployments
Buck · 2024-06-03T17:04:51.206Z · comments (16)
The Standard Analogy
Zack_M_Davis · 2024-06-03T17:15:42.327Z · comments (28)
AI Alignment Metastrategy
Vanessa Kosoy (vanessa-kosoy) · 2023-12-31T12:06:11.433Z · comments (13)
A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team
Lee Sharkey (Lee_Sharkey) · 2024-07-18T14:15:50.248Z · comments (18)
[question] Which skincare products are evidence-based?
Vanessa Kosoy (vanessa-kosoy) · 2024-05-02T15:22:12.597Z · answers+comments (48)
A bird's eye view of ARC's research
Jacob_Hilton · 2024-10-23T15:50:06.123Z · comments (12)
Passages I Highlighted in The Letters of J.R.R.Tolkien
Ivan Vendrov (ivan-vendrov) · 2024-11-25T01:47:59.071Z · comments (10)