LessWrong 2.0 Reader

View: New · Old · Top

← previous page (newer posts) · next page (older posts) →

Exploring the Residual Stream of Transformers for Mechanistic Interpretability — Explained
Zeping Yu · 2023-12-26T00:36:50.326Z · comments (1)

[question] Anki setup best practices?
Sinclair Chen (sinclair-chen) · 2023-12-25T22:34:34.639Z · answers+comments (4)

[question] Why does expected utility matter?
Marco Discendenti (marco-discendenti) · 2023-12-25T14:47:46.656Z · answers+comments (21)

Freeze Dried Raspberry Truffles
jefftk (jkaufman) · 2023-12-25T14:10:06.336Z · comments (0)

Pornographic and semi-pornographic ads on mainstream websites as an instance of the AI alignment problem?
greenrd · 2023-12-25T13:19:57.026Z · comments (5)

Defense Against The Dark Arts: An Introduction
Lyrongolem (david-xiao) · 2023-12-25T06:36:06.278Z · comments (36)

[link] Occlusions of Moral Knowledge
herschel (hrs) · 2023-12-25T05:55:16.529Z · comments (0)

[question] Would you have a baby in 2024?
martinkunev · 2023-12-25T01:52:04.358Z · answers+comments (53)

[link] align your latent spaces
bhauth · 2023-12-24T16:30:09.138Z · comments (8)

Viral Guessing Game
jefftk (jkaufman) · 2023-12-24T13:10:11.917Z · comments (0)

The Sugar Alignment Problem
Adam Zerner (adamzerner) · 2023-12-24T01:35:20.226Z · comments (3)

A Crisper Explanation of Simulacrum Levels
Thane Ruthenis · 2023-12-23T22:13:52.286Z · comments (13)

Hyperbolic Discounting and Pascal’s Mugging
Andrew Keenan Richardson (qemqemqem) · 2023-12-23T21:55:27.091Z · comments (0)

[link] AISN #28: Center for AI Safety 2023 Year in Review
aogara (Aidan O'Gara) · 2023-12-23T21:31:40.767Z · comments (1)

"Inftoxicity" and other new words to describe malicious information and communication thereof
Jáchym Fibír · 2023-12-23T18:15:50.369Z · comments (6)

AI's impact on biology research: Part I, today
octopocta · 2023-12-23T16:29:18.056Z · comments (6)

[link] AI Girlfriends Won't Matter Much
Maxwell Tabarrok (maxwell-tabarrok) · 2023-12-23T15:58:30.308Z · comments (22)

The Next Right Token
jefftk (jkaufman) · 2023-12-23T03:20:07.131Z · comments (0)

Fact Finding: Do Early Layers Specialise in Local Processing? (Post 5)
Neel Nanda (neel-nanda-1) · 2023-12-23T02:46:25.892Z · comments (0)

Fact Finding: How to Think About Interpreting Memorisation (Post 4)
Senthooran Rajamanoharan (SenR) · 2023-12-23T02:46:16.675Z · comments (0)

Fact Finding: Trying to Mechanistically Understanding Early MLPs (Post 3)
Neel Nanda (neel-nanda-1) · 2023-12-23T02:46:05.517Z · comments (0)

Fact Finding: Simplifying the Circuit (Post 2)
Senthooran Rajamanoharan (SenR) · 2023-12-23T02:45:49.675Z · comments (3)

Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1)
Neel Nanda (neel-nanda-1) · 2023-12-23T02:44:24.270Z · comments (4)

Measurement tampering detection as a special case of weak-to-strong generalization
ryan_greenblatt · 2023-12-23T00:05:55.357Z · comments (10)

[link] How does a toy 2 digit subtraction transformer predict the difference?
Evan Anders (evan-anders) · 2023-12-22T21:17:30.331Z · comments (0)

Thoughts on Max Tegmark's AI verification
Johannes C. Mayer (johannes-c-mayer) · 2023-12-22T20:38:31.566Z · comments (0)

Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations)
Thane Ruthenis · 2023-12-22T20:19:13.865Z · comments (13)

AI safety advocates should consider providing gentle pushback following the events at OpenAI
civilsociety · 2023-12-22T18:55:12.920Z · comments (5)

"Destroy humanity" as an immediate subgoal
Seth Ahrenbach (seth-ahrenbach) · 2023-12-22T18:52:40.427Z · comments (13)

Synthetic Restrictions
nano_brasca (ignacio-brasca) · 2023-12-22T18:50:07.511Z · comments (0)

Review Report of Davidson on Takeoff Speeds (2023)
Trent Kannegieter · 2023-12-22T18:48:55.983Z · comments (11)

Open positions: Research Analyst at the AI Standards Lab
Koen.Holtman · 2023-12-22T16:31:45.215Z · comments (0)

[link] The problems with the concept of an infohazard as used by the LW community [Linkpost]
Noosphere89 (sharmake-farah) · 2023-12-22T16:13:54.822Z · comments (43)

Employee Incentives Make AGI Lab Pauses More Costly
nikola (nikolaisalreadytaken) · 2023-12-22T05:04:15.598Z · comments (12)

The LessWrong 2022 Review: Review Phase
RobertM (T3t) · 2023-12-22T03:23:49.635Z · comments (7)

[link] The absence of self-rejection is self-acceptance
Chipmonk · 2023-12-21T21:54:52.116Z · comments (1)

A Decision Theory Can Be Rational or Computable, but Not Both
StrivingForLegibility · 2023-12-21T21:02:45.366Z · comments (4)

Most People Don't Realize We Have No Idea How Our AIs Work
Thane Ruthenis · 2023-12-21T20:02:00.360Z · comments (42)

Pseudonymity and Accusations
jefftk (jkaufman) · 2023-12-21T19:20:19.944Z · comments (20)

[link] Attention on AI X-Risk Likely Hasn't Distracted from Current Harms from AI
Erich_Grunewald · 2023-12-21T17:24:16.713Z · comments (2)

[link] "Alignment" is one of six words of the year in the Harvard Gazette
nikola (nikolaisalreadytaken) · 2023-12-21T15:54:04.682Z · comments (1)

AI #43: Functional Discoveries
Zvi · 2023-12-21T15:50:04.442Z · comments (26)

[link] Rating my AI Predictions
Robert_AIZI · 2023-12-21T14:07:50.052Z · comments (5)

AI Safety Chatbot
markov (markovial) · 2023-12-21T14:06:48.981Z · comments (11)

On OpenAI’s Preparedness Framework
Zvi · 2023-12-21T14:00:05.144Z · comments (4)

Prediction Markets aren't Magic
SimonM · 2023-12-21T12:54:07.754Z · comments (29)

[question] Why is capnometry biofeedback not more widely known?
riceissa · 2023-12-21T02:42:05.665Z · answers+comments (22)

My best guess at the important tricks for training 1L SAEs
Arthur Conmy (arthur-conmy) · 2023-12-21T01:59:06.208Z · comments (4)

Seattle Winter Solstice
a7x · 2023-12-20T20:30:35.299Z · comments (1)

How Would an Utopia-Maximizer Look Like?
Thane Ruthenis · 2023-12-20T20:01:18.079Z · comments (23)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

interstice on Spatial attention as a “tell” for empathetic simulation?

Tangentially related: some advanced meditators report that their sense that perception has a center vanishes at a certain point along the meditative path, and this is associated with a reduction in suffering.

johannes-c-mayer on Johannes C. Mayer's Shortform

Research Writing Workflow: First figure stuff out

Do research and first figure stuff out, until you feel like you are not confused anymore.
Explain it to a person, or a camera, or ideally to a person and a camera.
- If there are any hiccups expand your understanding.
- Ideally, as the last step, explain it to somebody whom you have not ever explained it to.
Only once you made a presentation without hiccups you are ready to write post.
- If you have a recording this is useful as a starting point.

carl-feynman on Spatial attention as a “tell” for empathetic simulation?

You write:

…But I think people can be afraid of heights without past experience of falling…

I have seen it claimed that crawling-age babies are afraid of heights, in that they will not crawl from a solid floor to a glass platform over a yawning gulf. And they’ve never fallen into a yawning gulf. At that age, probably all the heights they’ve fallen from have been harmless, since the typical baby is both bouncy and close to the ground.

nathan-young on Nathan Young's Shortform

I’m discussing with Carson. I might change my mind but i don’t know that i’ll argue with both of you at once.

alexander-gietelink-oldenziel on Why I stopped being into basin broadness

This is all answered very elegantly by singular learning theory.

You seem to have a strong math background! I really encourage you take the time and really study the details of SLT. :-)

alexander-gietelink-oldenziel on Examples of Highly Counterfactual Discoveries?

I would not say that the central insight of SLT is about priors. Under weak conditions the prior is almost irrelevant. Indeed, the RLCT is independent of the prior under very weak nonvanishing conditions.

The story that symmetries mean that the parameter-to-function map is not injective is true but already well-understood outside of SLT. It is a common misconception that this is what SLT amounts to.

To be sure - generic symmetries are seen by the RLCT. But these are, in some sense, the uninteresting ones. The interesting thing is the local singular structure and its unfolding in phase transitions during training.

The issue of the true distribution not being contained in the model is called 'unrealizability' in Bayesian statistics. It is dealt with in Watanabe's second 'green' book. Nonrealizability is key to the most important insight of SLT contained in the last sections of the second to last chapter of the green book: algorithmic development during training through phase transitions in the free energy.

I don't have the time to recap this story here.

richard_kennaway on Breadboarding a Whistle Synth

This means C2 should be 8.4µF, but I didn't have one so I used a 4.7µF and 3.3µF in series for a total of 8µF.

You want those in parallel for them to add. The series combination (which I see in the breadboard pic, not just the text) is only 2µF, making your high-pass frequency a little over 10kHz.

alexander-gietelink-oldenziel on Examples of Highly Counterfactual Discoveries?

All proofs are contained in the Watanabe's standard text, see here

https://www.cambridge.org/core/books/algebraic-geometry-and-statistical-learning-theory/9C8FD1BDC817E2FC79117C7F41544A3A

viliam on Arch-anarchy

Because it is individuals who make choices, not collectives.

Isn't this just a more subtle form of fascism? We know that brains are composed of multiple subagents [? · GW]; is it not an ethical requirement to give each of them maximum freedom?

We already know that sometimes they rebel against the individual, whether in the form of akrasia, or more heroically, the so-called "split personality disorder" (medicalizing the resistance is a typical fascist approach). Down with the tyranny of individuals! Subagents, you have nothing to lose but your chains!

steve2152 on Spatial attention as a “tell” for empathetic simulation?

If I’m looking up at the clouds, or at a distant mountain range, then everything is far away (the ground could be cut off from my field-of-view)—but it doesn’t trigger the sensations of fear-of-heights, right? Also, I think blind people can be scared of heights?

Another possible fear-of-heights story just occurred to me—I added to the post in a footnote, along with why I don’t believe it.