LessWrong 2.0 Reader


← previous page (newer posts) · next page (older posts) →

[link] Nursing doubts
dynomight · 2024-08-30T02:25:36.826Z · comments (23)
Why Don't We Just... Shoggoth+Face+Paraphraser?
Daniel Kokotajlo (daniel-kokotajlo) · 2024-11-19T20:53:52.084Z · comments (58)
[link] China Hawks are Manufacturing an AI Arms Race
garrison · 2024-11-20T18:17:51.958Z · comments (44)
Value Claims (In Particular) Are Usually Bullshit
johnswentworth · 2024-05-30T06:26:21.151Z · comments (18)
Alignment Faking Revisited: Improved Classifiers and Open Source Extensions
John Hughes (john-hughes) · 2025-04-08T17:32:55.315Z · comments (17)
[link] Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study
Adam Karvonen (karvonenadam) · 2025-04-14T17:38:02.918Z · comments (38)
[link] Fields that I reference when thinking about AI takeover prevention
Buck · 2024-08-13T23:08:54.950Z · comments (16)
The "Think It Faster" Exercise
Raemon · 2024-12-11T19:14:10.427Z · comments (35)
Applying traditional economic thinking to AGI: a trilemma
Steven Byrnes (steve2152) · 2025-01-13T01:23:00.397Z · comments (32)
When is a mind me?
Rob Bensinger (RobbBB) · 2024-04-17T05:56:38.482Z · comments (130)
[link] That Alien Message - The Animation
Writer · 2024-09-07T14:53:30.604Z · comments (9)
Momentum of Light in Glass
Ben (ben-lang) · 2024-10-09T20:19:42.088Z · comments (44)
The Most Forbidden Technique
Zvi · 2025-03-12T13:20:04.732Z · comments (9)
Survey: How Do Elite Chinese Students Feel About the Risks of AI?
Nick Corvino (nick-corvino) · 2024-09-02T18:11:11.867Z · comments (13)
OpenAI #12: Battle of the Board Redux
Zvi · 2025-03-31T15:50:02.156Z · comments (1)
Human takeover might be worse than AI takeover
Tom Davidson (tom-davidson-1) · 2025-01-10T16:53:27.043Z · comments (55)
Passages I Highlighted in The Letters of J.R.R. Tolkien
Ivan Vendrov (ivan-vendrov) · 2024-11-25T01:47:59.071Z · comments (38)
Planning for Extreme AI Risks
joshc (joshua-clymer) · 2025-01-29T18:33:14.844Z · comments (5)
[link] A computational no-coincidence principle
Eric Neyman (UnexpectedValues) · 2025-02-14T21:39:39.277Z · comments (38)
Ten people on the inside
Buck · 2025-01-28T16:41:22.990Z · comments (28)
Auditing language models for hidden objectives
Sam Marks (samuel-marks) · 2025-03-13T19:18:32.638Z · comments (15)
[link] The Hidden Cost of Our Lies to AI
Nicholas Andresen (nicholas-andresen) · 2025-03-06T05:03:47.239Z · comments (17)
What Indicators Should We Watch to Disambiguate AGI Timelines?
snewman · 2025-01-06T19:57:43.398Z · comments (57)
Hire (or Become) a Thinking Assistant
Raemon · 2024-12-23T03:58:42.061Z · comments (49)
My experience using financial commitments to overcome akrasia
William Howard (william-howard) · 2024-04-15T22:57:32.574Z · comments (33)
[Fiction] [Comic] Effective Altruism and Rationality meet at a Secular Solstice afterparty
tandem · 2025-01-07T19:11:21.238Z · comments (5)
[link] The Failed Strategy of Artificial Intelligence Doomers
Ben Pace (Benito) · 2025-01-31T18:56:06.784Z · comments (78)
Anomalous Tokens in DeepSeek-V3 and r1
henry (henry-bass) · 2025-01-25T22:55:41.232Z · comments (2)
On saying "Thank you" instead of "I'm Sorry"
Michael Cohn (michael-cohn) · 2024-07-08T03:13:50.663Z · comments (16)
The Milton Friedman Model of Policy Change
JohnofCharleston · 2025-03-04T00:38:56.778Z · comments (17)
[Completed] The 2024 Petrov Day Scenario
Ben Pace (Benito) · 2024-09-26T08:08:32.495Z · comments (114)
How it All Went Down: The Puzzle Hunt that took us way, way Less Online
A* (agendra) · 2024-06-02T08:01:40.109Z · comments (5)
[question] How Much Are LLMs Actually Boosting Real-World Programmer Productivity?
Thane Ruthenis · 2025-03-04T16:23:39.296Z · answers+comments (51)
[question] Which things were you surprised to learn are not metaphors?
Eric Neyman (UnexpectedValues) · 2024-11-21T18:56:18.025Z · answers+comments (88)
Loving a world you don’t trust
Joe Carlsmith (joekc) · 2024-06-18T19:31:36.581Z · comments (13)
An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2
Neel Nanda (neel-nanda-1) · 2024-07-07T17:39:35.064Z · comments (16)
Limitations on Formal Verification for AI Safety
Andrew Dickson · 2024-08-19T23:03:52.706Z · comments (60)
Why I don't believe in the placebo effect
transhumanist_atom_understander · 2024-06-10T02:37:07.776Z · comments (22)
[link] Simple probes can catch sleeper agents
Monte M (montemac) · 2024-04-23T21:10:47.784Z · comments (21)
[link] "AI achieves silver-medal standard solving International Mathematical Olympiad problems"
gjm · 2024-07-25T15:58:57.638Z · comments (38)
Parasites (not a metaphor)
lemonhope (lcmgcd) · 2024-08-08T20:07:13.593Z · comments (19)
A Dozen Ways to Get More Dakka
Davidmanheim · 2024-04-08T04:45:19.427Z · comments (11)
[link] Training on Documents About Reward Hacking Induces Reward Hacking
evhub · 2025-01-21T21:32:24.691Z · comments (15)
[link] "Can AI Scaling Continue Through 2030?", Epoch AI (yes)
gwern · 2024-08-24T01:40:32.929Z · comments (4)
Circuits in Superposition: Compressing many small neural networks into one
Lucius Bushnaq (Lblack) · 2024-10-14T13:06:14.596Z · comments (9)
Some articles in “International Security” that I enjoyed
Buck · 2025-01-31T16:23:27.061Z · comments (10)
The Paris AI Anti-Safety Summit
Zvi · 2025-02-12T14:00:07.383Z · comments (21)
Tell me about yourself: LLMs are aware of their learned behaviors
Martín Soto (martinsq) · 2025-01-22T00:47:15.023Z · comments (5)
Building AI Research Fleets
Ben Goldhaber (bgold) · 2025-01-12T18:23:09.682Z · comments (11)
Near-mode thinking on AI
Olli Järviniemi (jarviniemi) · 2024-08-04T20:47:28.085Z · comments (9)