LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Laying the Foundations for Vision and Multimodal Mechanistic Interpretability & Open Problems
Sonia Joseph (redhat) · 2024-03-13T17:09:17.027Z · comments (13)

[link] Level up your spreadsheeting
angelinahli · 2024-05-25T14:57:19.730Z · comments (11)

In defense of technological unemployment as the main AI concern
tailcalled · 2024-08-27T17:58:01.992Z · comments (36)

Simplifying Corrigibility – Subagent Corrigibility Is Not Anti-Natural
Rubi J. Hudson (Rubi) · 2024-07-16T22:44:17.128Z · comments (27)

Start an Upper-Room UV Installation Company?
jefftk (jkaufman) · 2024-10-19T02:00:10.691Z · comments (9)

Case Study: Interpreting, Manipulating, and Controlling CLIP With Sparse Autoencoders
Gytis Daujotas (gytis-daujotas) · 2024-08-01T21:08:38.800Z · comments (6)

Economics Roundup #3
Zvi · 2024-09-10T13:50:06.955Z · comments (9)

AI #91: Deep Thinking
Zvi · 2024-11-21T14:30:06.930Z · comments (9)

[link] Dangerous capability tests should be harder
LucaRighetti (Error404Dinosaur) · 2024-11-21T17:20:50.610Z · comments (3)

[link] The Choice Transition
owencb · 2024-11-18T12:30:56.198Z · comments (4)

[link] Epistemic status: poetry (and other poems)
Richard_Ngo (ricraz) · 2024-11-21T18:13:17.194Z · comments (5)

[link] Literacy Rates Haven't Fallen By 20% Since the Department of Education Was Created
Maxwell Tabarrok (maxwell-tabarrok) · 2024-11-22T20:53:59.007Z · comments (0)

D&D.Sci Long War: Defender of Data-mocracy
aphyer · 2024-04-26T22:30:15.780Z · comments (20)

On the Contrary, Steelmanning Is Normal; ITT-Passing Is Niche
Zack_M_Davis · 2024-01-09T23:12:20.349Z · comments (31)

My intellectual journey to (dis)solve the hard problem of consciousness
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-06T09:32:41.612Z · comments (41)

Auditing failures vs concentrated failures
ryan_greenblatt · 2023-12-11T02:47:35.703Z · comments (0)

What does davidad want from «boundaries»?
Chipmonk · 2024-02-06T17:45:42.348Z · comments (1)

On Trust
johnswentworth · 2023-12-06T19:19:07.680Z · comments (26)

An Introduction to AI Sandbagging
Teun van der Weij (teun-van-der-weij) · 2024-04-26T13:40:00.126Z · comments (10)

[link] Open Sourcing Metaculus
ChristianWilliams · 2024-07-02T22:30:01.339Z · comments (0)

ProLU: A Nonlinearity for Sparse Autoencoders
Glen Taggart · 2024-04-23T14:09:21.592Z · comments (4)

[link] LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery (arjun-panickssery) · 2024-04-17T21:09:12.007Z · comments (1)

Apply to LASR Labs: a London-based technical AI safety research programme
Erin Robertson · 2024-04-09T17:34:06.847Z · comments (1)

Announcing Atlas Computing
miyazono · 2024-04-11T15:56:31.241Z · comments (4)

When Does Altruism Strengthen Altruism?
jefftk (jkaufman) · 2024-01-21T18:50:05.424Z · comments (2)

[link] Against Student Debt Cancellation From All Sides of the Political Compass
Maxwell Tabarrok (maxwell-tabarrok) · 2024-05-13T14:55:57.525Z · comments (16)

[link] Non-alignment project ideas for making transformative AI go well
Lukas Finnveden (Lanrian) · 2024-01-04T07:23:13.658Z · comments (1)

D&D.Sci Long War: Defender of Data-mocracy Evaluation & Ruleset
aphyer · 2024-05-14T03:35:10.586Z · comments (3)

Why does generalization work?
Martín Soto (martinsq) · 2024-02-20T17:51:10.424Z · comments (16)

[link] How bad is chlorinated water?
bhauth · 2023-12-13T18:00:12.640Z · comments (18)

Job Listing: Managing Editor / Writer
Gretta Duleba (gretta-duleba) · 2024-02-21T23:41:26.818Z · comments (2)

The Next ChatGPT Moment: AI Avatars
kolmplex (luke-man) · 2024-01-05T20:14:10.074Z · comments (10)

Concrete empirical research projects in mechanistic anomaly detection
Erik Jenner (ejenner) · 2024-04-03T23:07:21.502Z · comments (3)

[link] Project ideas: Epistemics
Lukas Finnveden (Lanrian) · 2024-01-05T23:41:23.721Z · comments (4)

The Case for Predictive Models
Rubi J. Hudson (Rubi) · 2024-04-03T18:22:20.243Z · comments (7)

[question] Does reducing the amount of RL for a given capability level make AI safer?
Chris_Leong · 2024-05-05T17:04:01.799Z · answers+comments (22)

[question] Where is the Town Square?
Gretta Duleba (gretta-duleba) · 2024-02-13T03:53:18.205Z · answers+comments (8)

How difficult is AI Alignment?
Sammy Martin (SDM) · 2024-09-13T15:47:10.799Z · comments (6)

Reading RFK Jr so that you don’t have to
braces · 2024-11-22T00:59:19.583Z · comments (0)

Which LessWrong/Alignment topics would you like to be tutored in? [Poll]
Ruby · 2024-09-19T01:35:02.999Z · comments (12)

Minimal Motivation of Natural Latents
johnswentworth · 2024-10-14T22:51:58.125Z · comments (14)

[link] Why Georgism Lost Its Popularity
Zero Contradictions · 2024-07-20T15:08:41.469Z · comments (50)

Motivation control
Joe Carlsmith (joekc) · 2024-10-30T17:15:50.881Z · comments (7)

Monthly Roundup #24: November 2024
Zvi · 2024-11-18T13:20:06.086Z · comments (14)

Understanding Positional Features in Layer 0 SAEs
bilalchughtai (beelal) · 2024-07-29T09:36:40.701Z · comments (0)

Sci-Fi books micro-reviews
Yair Halberstadt (yair-halberstadt) · 2024-06-24T09:49:28.523Z · comments (27)

The need for multi-agent experiments
Martín Soto (martinsq) · 2024-08-01T17:14:16.590Z · comments (3)

Locating My Eyes (Part 3 of "The Sense of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-02-29T03:09:25.810Z · comments (4)

Childhood and Education Roundup #4
Zvi · 2024-01-30T13:50:06.033Z · comments (10)

New Executive Team & Board — PIBBSS
Nora_Ammann · 2024-07-01T19:30:45.261Z · comments (1)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

lao-mein on Lao Mein's Shortform

The meeting allegedly happened on the 11th. The Iranian market rallied immediately after the election. It was clearly based on something specific to a Trump administration. Maybe it's large-scale insider trading from Iranian diplomats?

I also think the market genuinely, unironically disbelieves everything Trump says about tariffs in a way they don't about his cabinet nominations (pharma stocks tanked after RFK got HHS).

The man literally wrote that he was going to institute 25% tariffs on Canadian goods, to exactly zero movement on Canadian stocks.

shmi on Compute and size limits on AI are the actual danger

Right, eventually it will. But abstraction building is very hard! If you have any other option, like growing in size, I would expect it to be taken first.

I guess I should be a bit more precise. Abstraction building at the same level as before is probably not very hard. But going up a level is basically equivalent to inventing a new way of compressing knowledge, which is a quantitative leap.

optimization-process on [bounty $100] Why are there no interesting (1D, 2-state) quantum cellular automata?

I've never been familiar enough with group-theory stuff to memorize the names (which, warning, also might mean that it will take you a lot of time to write a sufficiently-dumbed-down version), but the internet suggests is related to... the Minkowski metric? I would be flabbergasted to learn that something so specific-to-our-universe was relevant to this toy mathematical contraption.

simon on Passages I Highlighted in The Letters of J.R.R.Tolkien

Presumably the 'Orcs on our side' refers to the Soviet Union.

I think that, if that's what he meant, he would not have referred to his son as "amongst the Urukhai." - he wouldn't have been among soviet troops. I think it is referring back to turning men and elves into orcs - the orcs are people who have a mindset he doesn't like, presumably to do with violence.

arturo-macias on Arthropod (non) sentience

Agree on this criticism for the difference between humans and pigs, but there too many orders of magnitude of difference between shrimp and human to consider detailed measures of computing power very necesary.

Quantifying empathy is intrinsically hard, because everything begins by postulating (not observing) consciousness in a group of beings, and that is only well grounded for humans. So, at the end, even if you are totally successful in developing a theory of human sentience, for other beings you are extrapolating. Anything beyond solipsism is a leap of faith (unlike you find St. Anselm ontological proof credible).

bhauth on a space habitat design

After being "launched" from the despinner, you would find yourself hovering stationary next to the ring.

Air resistance.

That is, however, basically the system I proposed near the end, for use near the center of a cylinder where speeds would be low.

bhauth on a space habitat design

This happened to Explorer 1, the first satellite launched by the United States in 1958. The elongated body of the spacecraft had been designed to spin about its long (least-inertia) axis but refused to do so, and instead started precessing due to energy dissipation from flexible structural elements.

picture: https://en.wikipedia.org/wiki/Explorer_1#/media/File:Explorer1.jpg

arturo-macias on Arthropod (non) sentience

Illusionism is not a competitor, because consciousness is obviously an illusion. That is immediate since Descartes. That is why you cannot distinguish between "the true reality" and "matrix": both produce a legitimate stream of illusory experience ("you").

Epiphenomenalism is physicalist in the sense that it respects the autonomy and closeness of the physical world. Given that we are not p-zombis (because there is an "illusory" but immediate difference between real humans and p-zombies), that difference is precisely what we call “consciousness”.

Descartes+Laplace=Chalmers.

In fact, there is only one scape: consciousness could play an active role in the fundamental Laws of Physics. That would break the Descartes/Laplace orthogonality, making philosophy interesting again.

optimization-process on [bounty $100] Why are there no interesting (1D, 2-state) quantum cellular automata?

I think compared to the literature you're using an overly restrictive and nonstandard definition of quantum cellular automata.

That makes sense! I'm searching for the simplest cellular-automaton-like thing that's still interesting to study. I may have gone too far in the "simple" direction; but I'd like to understand why this highly-restricted subset of QCAs is too simple.

Specifically, it only makes sense to me to write as a product of operators like you have if all of the terms are on spatially disjoint regions.

Hmm! That's not obvious to me; if there's some general insight like "no operator containing two ~'partially overlapping' terms like $\dots + | x ⟩ (⟨ x | + ⟨ y |) + | y ⟩ (⟨ y | + ⟨ z |) + \dots$ can be unitary," I'd happily pay for that!

wassname on notrishi's Shortform

Interesting. It would be much more inspectable and controllable and modular which would be good for alignment.

You've got some good ideas in here, have you ever brainstormed any alignment ideas?