LessWrong 2.0 Reader

When is a mind me?
Rob Bensinger (RobbBB) · 2024-04-17T05:56:38.482Z · comments (125)

The Dark Arts
lsusr · 2023-12-19T04:41:13.356Z · comments (49)

The Worst Form Of Government (Except For Everything Else We've Tried)
johnswentworth · 2024-03-17T18:11:38.374Z · comments (46)

An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2
Neel Nanda (neel-nanda-1) · 2024-07-07T17:39:35.064Z · comments (15)

Limitations on Formal Verification for AI Safety
Andrew Dickson · 2024-08-19T23:03:52.706Z · comments (60)

Loving a world you don’t trust
Joe Carlsmith (joekc) · 2024-06-18T19:31:36.581Z · comments (13)

How it All Went Down: The Puzzle Hunt that took us way, way Less Online
A* (agendra) · 2024-06-02T08:01:40.109Z · comments (5)

[link] "AI achieves silver-medal standard solving International Mathematical Olympiad problems"
gjm · 2024-07-25T15:58:57.638Z · comments (38)

Processor clock speeds are not how fast AIs think
Ege Erdil (ege-erdil) · 2024-01-29T14:39:38.050Z · comments (55)

[link] Simple probes can catch sleeper agents
Monte M (montemac) · 2024-04-23T21:10:47.784Z · comments (20)

Why I don't believe in the placebo effect
transhumanist_atom_understander · 2024-06-10T02:37:07.776Z · comments (22)

Ayn Rand’s model of “living money”; and an upside of burnout
AnnaSalamon · 2024-11-16T02:59:07.368Z · comments (19)

On saying "Thank you" instead of "I'm Sorry"
Michael Cohn (michael-cohn) · 2024-07-08T03:13:50.663Z · comments (16)

The case for training frontier AIs on Sumerian-only corpus
Alexandre Variengien (alexandre-variengien) · 2024-01-15T16:40:22.011Z · comments (15)

A Dozen Ways to Get More Dakka
Davidmanheim · 2024-04-08T04:45:19.427Z · comments (11)

Notice When People Are Directionally Correct
Chris_Leong · 2024-01-14T14:12:37.090Z · comments (8)

[link] "Can AI Scaling Continue Through 2030?", Epoch AI (yes)
gwern · 2024-08-24T01:40:32.929Z · comments (4)

My simple AGI investment & insurance strategy
lc · 2024-03-31T02:51:53.479Z · comments (27)

Updatelessness doesn't solve most problems
Martín Soto (martinsq) · 2024-02-08T17:30:11.266Z · comments (43)

Near-mode thinking on AI
Olli Järviniemi (jarviniemi) · 2024-08-04T20:47:28.085Z · comments (8)

How I started believing religion might actually matter for rationality and moral philosophy
zhukeepa · 2024-08-23T17:40:47.341Z · comments (41)

Circuits in Superposition: Compressing many small neural networks into one
Lucius Bushnaq (Lblack) · 2024-10-14T13:06:14.596Z · comments (8)

A Shutdown Problem Proposal
johnswentworth · 2024-01-21T18:12:48.664Z · comments (61)

An even deeper atheism
Joe Carlsmith (joekc) · 2024-01-11T17:28:31.843Z · comments (47)

Things I've Grieved
Raemon · 2024-02-18T19:32:47.169Z · comments (6)

[link] Bayesian Injustice
Kevin Dorst · 2023-12-14T15:44:08.664Z · comments (10)

Community Notes by X
NicholasKees (nick_kees) · 2024-03-18T17:13:33.195Z · comments (15)

Pantheon Interface
NicholasKees (nick_kees) · 2024-07-08T19:03:51.681Z · comments (22)

[question] What do coherence arguments actually prove about agentic behavior?
sunwillrise (andrei-alexandru-parfeni) · 2024-06-01T09:37:28.451Z · answers+comments (35)

BIG-Bench Canary Contamination in GPT-4
Jozdien · 2024-10-22T15:40:48.166Z · comments (13)

[link] Steering Llama-2 with contrastive activation additions
Nina Panickssery (NinaR) · 2024-01-02T00:47:04.621Z · comments (29)

Do you believe in hundred dollar bills lying on the ground? Consider humming
Elizabeth (pktechgirl) · 2024-05-16T00:00:05.257Z · comments (22)

Deep Forgetting & Unlearning for Safely-Scoped LLMs
scasper · 2023-12-05T16:48:18.177Z · comments (29)

Apocalypse insurance, and the hardline libertarian take on AI risk
So8res · 2023-11-28T02:09:52.400Z · comments (38)

Parasites (not a metaphor)
lemonhope (lcmgcd) · 2024-08-08T20:07:13.593Z · comments (17)

Why I take short timelines seriously
NicholasKees (nick_kees) · 2024-01-28T22:27:21.098Z · comments (29)

[link] Investigating the Chart of the Century: Why is food so expensive?
Maxwell Tabarrok (maxwell-tabarrok) · 2024-08-16T13:21:23.596Z · comments (26)

Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
Erik Jenner (ejenner) · 2024-06-04T15:50:47.475Z · comments (14)

Natural Latents: The Math
johnswentworth · 2023-12-27T19:03:01.923Z · comments (37)

RTFB: On the New Proposed CAIP AI Bill
Zvi · 2024-04-10T18:30:08.410Z · comments (14)

Awakening
lsusr · 2024-05-30T07:03:00.821Z · comments (79)

The Standard Analogy
Zack_M_Davis · 2024-06-03T17:15:42.327Z · comments (28)

Neutrality
sarahconstantin · 2024-11-13T23:10:05.469Z · comments (6)

[link] Miles Brundage resigned from OpenAI, and his AGI readiness team was disbanded
garrison · 2024-10-23T23:40:57.180Z · comments (1)

AI catastrophes and rogue deployments
Buck · 2024-06-03T17:04:51.206Z · comments (16)

Efficient Dictionary Learning with Switch Sparse Autoencoders
Anish Mudide (anish-mudide) · 2024-07-22T18:45:53.502Z · comments (19)

A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team
Lee Sharkey (Lee_Sharkey) · 2024-07-18T14:15:50.248Z · comments (18)

[question] Which skincare products are evidence-based?
Vanessa Kosoy (vanessa-kosoy) · 2024-05-02T15:22:12.597Z · answers+comments (47)

AI Alignment Metastrategy
Vanessa Kosoy (vanessa-kosoy) · 2023-12-31T12:06:11.433Z · comments (13)

A bird's eye view of ARC's research
Jacob_Hilton · 2024-10-23T15:50:06.123Z · comments (12)

Recent comments

gerardus-mercator on Claude seems to be smarter than LessWrong community

I see those assertions, but I don't see why an intelligent agent would be persuaded by them. Why would it think that the hypothetical objective goal is better than its utility function? Caring about objective facts and investigating them is also an instrumental goal compared to the terminal goal of optimizing its utility function. The agent's only frame of reference for 'better' and 'worse' is relative to its utility function; it would presumably understand that there are other frames of reference, but I don't think it would apply them, because that would lead to a worse outcome according to its current frame of reference.

dakara on Simple probes can catch sleeper agents

I am also interested in knowing whether the probing method is a solution to the undetectable backdoor problem.

dakara on Simple probes can catch sleeper agents

This paper argues that unintended deceptive behavior is not susceptible to detection by the probing method. Its authors argue that probing fares no better than random guessing at detecting unintended deceptive behavior.

I would really appreciate any input, especially from Monte or his co-authors. This seems like a very important issue to address.

dr_s on Neutrality

Agree 100% with all of this.

One thing comes to mind that IMO is really underestimated by people who argue that "everything is political" and that neutrality is an evil ploy to sneak in your evil ideas: the point of impartiality as you describe it is to keep things simpler. Maybe a God with an infinite mind could keep in it all the issues, all the complexities, all the nuances simultaneously, and continuously figure out the optimal path. But we can't. We come up with simple rules like "if you're a doctor, you have a duty to cure anyone, not pick and choose" because they make things more straightforward and decouple domains. Doctors cure people. If you do crimes, there's a system dedicated to punishing you. But a doctor's job is different, and the knowledge they need to do it has nothing to do with your rap sheet.

The frenzy to couple everything into a single tangle of complexity is driven by the misunderstanding that complacency is the only reason why your ideology is not the winning one, and that if only everyone was forced to think about it all of the time, they'd end up agreeing with it. But in reality, decoupling is necessary mostly because it allows the world to be cognitively accessible rather than driving us into either perpetual decision paralysis or perpetual paranoia (or worse, both). Destroying that doesn't give anyone victory, we just end up all worse off.

satron on Sabotage Evaluations for Frontier Models

Sure, it sounds like a good idea! Below I will write my thoughts on your overall summarized position.

———

"I primarily wish to argue that, given the general lack of accountability for developing machine learning systems in worlds where indeed the default outcome is doom, it should not be surprising to find out that there is a large corporation (or multiple) doing so."

I do think that I could maybe agree with this if it was 1 small corporation. In your previous comment you suggested that you are describing not the intentional contribution to the omnicide, but the bit of rationalization. I don't think I would agree that that many people working on AI are successfully engaged in that bit of rationalization or that it would be enough to keep them doing it. The big factor is that in case of their failure, they personally (and all of their loved ones) will suffer the consequences.

"It is also not surprising that glory-seeking companies have large departments focused on 'ethics' and 'safety' in order to look respectable to such people."

I don't disagree with this, because it seems plausible that one of the reasons for creating safety departments is ulterior. However, I believe that this reason is probably not the main one and that AI safety labs are producing genuinely good research papers. To take the example of Anthropic, I've seen safety papers that got the LessWrong community excited (at least judging by upvotes). Like this one.

"I believe that there is no good plan and that these companies would exist regardless of whether a good plan existed or not... I believe that the people involved are getting rich risking all of our lives and there is (currently) no justice here"

For the reasons that I mentioned in my first paragraph, I would probably disagree with this. Relatedly, while I do think wealth in general can be somewhat motivating, I also think that AI developers are aware that all their wealth would mean nothing if AI kills everyone.

———

Overall, I am really happy with this discussion. Our disagreements came down to a few points and we agree on quite a few issues. I am similarly happy to conclude this big comment thread.

anders-lindstroem on The Online Sports Gambling Experiment Has Failed

Good write-up! People Cannot Handle "fill in the blank" on smartphones. Sex, food, drugs, social status, betting, binge watching, shopping etc. in abundance and a click away is something we cannot handle. If some of the biggest corporations in the world spend billions upon billions each year to grab our attention, they will win and "you" will on average lose, unless you pull the cord (or turn off the wifi...) or have extreme willpower.

I am definitely not the one to throw the first rock, but is it not pretty embarrassing that most of us who thought we were so smart and independent are mere serfs, both intellectually and physically, to a little piece of electronics that has completely and utterly hijacked our brains and bodies?

camille-berger on Neutrality

Related: https://www.lesswrong.com/posts/vcuBJgfSCvyPmqG7a/list-of-collective-intelligence-projects

I had never thought about approaching this topic from the abstract, but I'm judging from the karma that this is actually what people want, rather than existing projects.

I'm surprised! I thought people were overall uninterested in this topic, but it seems more like the problem itself hadn't been stated to start with.

viliam on D0TheMath's Shortform

if you ask mathematicians whether ZFC + not Consistent(ZFC) is consistent, they will say "no, of course not!"

I suspect that many people's intuitive interpretation of "consistent" is ω-consistent, especially if they are not aware of the distinction.
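
To spell out the distinction, here is a rough sketch, not a full proof. It assumes ZFC itself is consistent, and the symbols $T$, $\varphi$ and $\bar{k}$ are notation introduced here for illustration, not taken from the thread.

```latex
% Assumption: ZFC is consistent.
% T is the theory under discussion; \varphi(n) says "n codes a ZFC-proof of 0 = 1".
\[
  T \;=\; \mathrm{ZFC} + \neg\mathrm{Con}(\mathrm{ZFC})
\]
% (1) T is consistent: otherwise ZFC would prove Con(ZFC),
%     contradicting Goedel's second incompleteness theorem.
% (2) T proves that some proof of a contradiction exists:
\[
  T \vdash \exists n\, \varphi(n)
\]
% (3) Yet for every standard natural number k, checking whether k codes
%     such a proof is a finite computation, and it always fails, so:
\[
  T \vdash \neg\varphi(\bar{k}) \quad \text{for each standard } k
\]
% (2) and (3) together are exactly what omega-inconsistency means,
% so T is consistent but not omega-consistent.
```

On this reading, the "no, of course not!" answer is right about ω-consistency and wrong about plain consistency.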

viliam on Lalit Shankar Chowdhury's Shortform

I find it difficult to make distinct categories, but there seem to be two dimensions along which to classify relations:

How intense is the relation / how much we "click" emotionally and intellectually.
Whether the relation is expected to survive the change of current context.

(Even this is not a clear distinction, because "my relatives" is kinda contextual, but the context is there forever.)

Mapping to your system: close friends = high intensity context independent; friendly acquaintances = high intensity contextual; acquaintances = low intensity contextual.

One quadrant seems to be missing, but maybe that makes sense: if the relation is low intensity, why would people bother to keep it outside of the context where it originated.

egor-timatkov on "It's a 10% chance which I did 10 times, so it should be 100%"

It's a great idea. I ended up bolding the one line that states my conclusion to make it easier to spot.
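
For readers who have not seen the post, the arithmetic behind its title can be sketched as follows. This is a minimal illustration that assumes the ten attempts are independent; the function name `chance_of_at_least_one` is made up here and does not come from the post.

```python
def chance_of_at_least_one(p: float, n: int) -> float:
    """Probability that an event with per-trial chance p happens
    at least once in n independent trials (independence is assumed)."""
    # The event never happens with probability (1 - p) ** n,
    # so "at least once" is the complement of that.
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # A 10% chance tried 10 times: about 65%, not 100%.
    print(f"{chance_of_at_least_one(0.10, 10):.3f}")  # prints 0.651
```

Adding the probabilities (10 × 10% = 100%) gives the expected number of successes, which is a different quantity from the chance of at least one success.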