LessWrong 2.0 Reader

Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI
Kaj_Sotala · 2025-04-15T15:56:19.466Z · comments (43)

[link] Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study
Adam Karvonen (karvonenadam) · 2025-04-14T17:38:02.918Z · comments (38)

Why Should I Assume CCP AGI is Worse Than USG AGI?
Tomás B. (Bjartur Tómas) · 2025-04-19T14:47:52.167Z · comments (30)

AI-enabled coups: a small group could use AI to seize power
Tom Davidson (tom-davidson-1) · 2025-04-16T16:51:29.561Z · comments (16)

Ctrl-Z: Controlling AI Agents via Resampling
Aryan Bhatt (abhatt349) · 2025-04-16T16:21:23.781Z · comments (0)

Training AGI in Secret would be Unsafe and Unethical
Daniel Kokotajlo (daniel-kokotajlo) · 2025-04-18T12:27:35.795Z · comments (7)

Three Months In, Evaluating Three Rationalist Cases for Trump
Arjun Panickssery (arjun-panickssery) · 2025-04-18T08:27:27.257Z · comments (21)

One-shot steering vectors cause emergent misalignment, too
Jacob Dunefsky (jacob-dunefsky) · 2025-04-14T06:40:41.503Z · comments (6)

[link] ASI existential risk: Reconsidering Alignment as a Goal
habryka (habryka4) · 2025-04-15T19:57:42.547Z · comments (14)

What Makes an AI Startup "Net Positive" for Safety?
jacquesthibs (jacques-thibodeau) · 2025-04-18T20:33:22.682Z · comments (18)

Steelmanning heuristic arguments
Dmitry Vaintrob (dmitry-vaintrob) · 2025-04-13T01:09:33.392Z · comments (0)

How I switched careers from software engineer to AI policy operations
Lucie Philippon (lucie-philippon) · 2025-04-13T06:37:33.507Z · comments (1)

Map of AI Safety v2
Bryce Robertson (bryceerobertson) · 2025-04-15T13:04:40.993Z · comments (4)

To be legible, evidence of misalignment probably has to be behavioral
ryan_greenblatt · 2025-04-15T18:14:53.022Z · comments (11)

The Bell Curve of Bad Behavior
Screwtape · 2025-04-14T19:58:10.293Z · comments (6)

Four Types of Disagreement
silentbob · 2025-04-13T11:22:38.466Z · comments (2)

[link] The Russell Conjugation Illuminator
TimmyM (timmym) · 2025-04-17T19:33:06.924Z · comments (14)

Vestigial reasoning in RL
Caleb Biddulph (caleb-biddulph) · 2025-04-13T15:40:11.954Z · comments (7)

OpenAI #13: Altman at TED and OpenAI Cutting Corners on Safety Testing
Zvi · 2025-04-15T15:30:02.518Z · comments (3)

Try training token-level probes
StefanHex (Stefan42) · 2025-04-14T11:56:23.191Z · comments (4)

ALLFED emergency appeal: Help us raise $800,000 to avoid cutting half of programs
denkenberger · 2025-04-16T21:47:40.687Z · comments (8)

[link] Sentinel's Global Risks Weekly Roundup #15/2025: Tariff yoyo, OpenAI slashing safety testing, Iran nuclear programme negotiations, 1K H5N1 confirmed herd infections.
NunoSempere (Radamantis) · 2025-04-14T19:11:20.977Z · comments (0)

A Dissent on Honesty
eva_ · 2025-04-15T02:43:44.163Z · comments (49)

Is Gemini now better than Claude at Pokémon?
Julian Bradshaw · 2025-04-19T23:34:43.298Z · comments (1)

Handling schemers if shutdown is not an option
Buck · 2025-04-18T14:39:18.609Z · comments (0)

D&D.Sci Tax Day: Adventurers and Assessments
aphyer · 2025-04-15T23:43:14.733Z · comments (8)

Scaffolding Skills
Screwtape · 2025-04-18T17:39:25.634Z · comments (4)

Can SAE steering reveal sandbagging?
jordine · 2025-04-15T12:33:41.264Z · comments (3)

OpenAI rewrote its Preparedness Framework
Zach Stein-Perlman · 2025-04-15T20:00:50.614Z · comments (1)

o3 Will Use Its Tools For You
Zvi · 2025-04-18T21:20:02.566Z · comments (3)

[link] The 4-Minute Mile Effect
Parker Conley (parker-conley) · 2025-04-14T21:41:27.726Z · comments (6)

How to evaluate control measures for LLM agents? A trajectory from today to superintelligence
Tomek Korbak (tomek-korbak) · 2025-04-14T16:45:46.584Z · comments (1)

[link] Unbendable Arm as Test Case for Religious Belief
Ivan Vendrov (ivan-vendrov) · 2025-04-14T01:57:12.013Z · comments (45)

Thoughts on the Double Impact Project
Mati_Roy (MathieuRoy) · 2025-04-13T19:07:57.687Z · comments (10)

[link] Understanding and overcoming AGI apathy
Dhruv Sumathi (dhruv-sumathi) · 2025-04-17T01:04:53.853Z · comments (1)

How Close We Are to a Complete List of Imprinted Genes
Morpheus · 2025-04-19T18:37:57.074Z · comments (1)

AI #112: Release the Everything
Zvi · 2025-04-17T15:10:02.029Z · comments (6)

GPT-4.1 Is a Mini Upgrade
Zvi · 2025-04-16T19:00:03.181Z · comments (6)

Impact, agency, and taste
benkuhn · 2025-04-19T21:10:06.960Z · comments (0)

[link] Nucleic Acid Observatory Updates, April 2025
jefftk (jkaufman) · 2025-04-15T18:58:29.839Z · comments (0)

Prodromes and Biomarkers in Chronic Disease
sarahconstantin · 2025-04-16T21:30:02.978Z · comments (2)

The Last Light
Bridgett Kay (bridgett-kay) · 2025-04-14T15:41:02.745Z · comments (2)

Monthly Roundup #29: April 2025
Zvi · 2025-04-14T11:50:02.324Z · comments (6)

Understanding Trust: Overview Presentations
abramdemski · 2025-04-16T18:08:31.064Z · comments (0)

[link] Inside OpenAI's Controversial Plan to Abandon its Nonprofit Roots
garrison · 2025-04-18T18:46:57.310Z · comments (0)

[link] Slopworld 2035: The dangers of mediocre AI
titotal (lombertini) · 2025-04-14T13:14:08.390Z · comments (6)

Offer: Team Conflict Counseling for AI Safety Orgs
Severin T. Seehrich (sts) · 2025-04-14T15:17:00.835Z · comments (1)

GPT-4.5 is Cognitive Empathy, Sonnet 3.5 is Affective Empathy
Jack (jack-3) · 2025-04-16T19:12:38.789Z · comments (2)

[link] Top OpenAI Catastrophic Risk Official Steps Down Abruptly
garrison · 2025-04-16T16:04:28.115Z · comments (0)

[link] The real reason AI benchmarks haven’t reflected economic impacts
Noosphere89 (sharmake-farah) · 2025-04-15T13:44:06.225Z · comments (0)

Recent comments

da_peach on Power Lies Trembling: a three-book review

That's an interesting idea. The military would undoubtedly care about AI alignment — they'd want their systems to operate strictly within set parameters. But the more important question is: do we even want the military to be investing in AI at all? Because that path likely leads to AI-driven warfare. Personally, I'd rather live in a world without autonomous robotic combat or AI-based cyberwarfare.

But as always, I will pray that some institution (like the EU) leads the charge and starts instilling it into people's heads that this is a problem we must solve.

saidachmiz on A Dissent on Honesty

… stuff about perverse utility functions …

Well, there’s a couple of things to say in response to this… one is that wanting to get the girl / dowry / happiness / love / whatever tangible or intangible goals as such, and also wanting to be virtuous, doesn’t seem to me to be a weird or perverse set of values. In a sense, isn’t this sort of thing the core of the project of living a human life, when you put it like this? “I want to embody all the true virtues, and also I want to have all the good things.” Seems pretty natural to me! Of course, it’s also a rather tall order (uh, to put it mildly…), but that just means that it provides a challenge worthy of one who does not fear setting high goals for himself.

Somewhat orthogonally to this, there is also the fact that—well, I wrote the footnote about the utility function being metaphorical for a reason. I don’t actually think that humans (with perhaps very rare exceptions) have utility functions; that is, I don’t think that our preferences satisfy the VNM axioms—and nor should they. (And indeed I am aware of so-called “coherence theorems” and I don’t believe in them.)

With that constraint (which I consider an artificial and misguided one) out of the way, I think that we can reason about things like this in ways that make more sense. For instance, trying to fit truth and honesty into a utility framework makes for some rather unnatural formulations and approaches, like talking about buying more of it, or buying it more cheaply, etc. I just don’t think that this makes sense. If the question is “is this person honest, trustworthy, does he have integrity, is he committed to truth”, then the answer can be “yes”, and it can be “no”, and it could perhaps be some version of “ehhh”, but if it’s already “yes” then you basically can’t buy any more of it than that. And if it’s not “yes” and you’re talking about how cheaply you can buy more of it, then it’s still not “yes” even after you complete your purchase.

(This is related to the notion that while consequentialism may be the proper philosophical grounding for morality, and deontology the proper way to formulate and implement your morality so that it’s tractable for a finite mind, nevertheless virtue ethics is the “descriptively correct as an account of how human minds implement morality, and (as a result) prescriptively valid as a recommendation of how to implement your morality in your own mind, once you’ve decided on your object-level moral views”. Thus you can embody the virtue of honesty, or fail to do so. You can’t buy more of embodying some virtue by trading away some other virtue; that’s just not how it works.)

I think you understand that, if other people noticed a pattern that everything you said was false, irrelevant, or unimportant, they would eventually stop bothering to listen when you talk, and this would mean you’d lose the ability to get other people to know things, which is a useful ability to have.

Yes, of course; but…

Whether the specific person you address is better off in each specific case isn’t material because you aren’t trying to always make them better off, you’re just trying to avoid being seen as someone who predictably doesn’t make them better off.

… but the preceding fact just doesn’t really have much to do with this business of “do you make people better off by what you say”.

My claim is that people (other than “rationalists”, and not even all or maybe even most “rationalists” but only some) just do not think of things in this way. They don’t think of whether their words will make their audience better off when they speak, and they don’t think of whether the words of other people are making them better off when they listen. This entire framing is just alien to how most people do, and should, think about communication in most circumstances. Yeah, if you lie all the time, people will stop believing you. That’s just directly the causation here, it doesn’t go through another node where people compute the expected value of your words and find it to be negative.

(Maybe this point isn’t particularly important to the main discussion. I can’t tell, honestly!)

I took great effort to try to write down my policy as something explicit in terms a person could try to do (even though I am willing to admit it is not really correct, mostly because of finite agent problems), because a person can’t be a real Rule Consequentialist without actually having a Rule. What is the rule for “Only lie when doing so is the right thing to do”? It sounds like an instruction to pass the act to my rightness calculator, but if I program that rule into my rightness calculator, and then give it any input, it gets into an infinite loop. I have an Act Consequentialist rightness calculator as a backup, but if I pass the rule “only lie when doing so is the right thing to do” into that as a backup I’m just right back at doing act consequentialism.

If you can write down a better rule for when to lie than what I’ve put above (that is also better than the “never” or “only by coming up with galaxy-brained ways it technically isn’t lying” or Eliezer’s meta-honesty idea that I’ve read before) I’d consider you to have (possibly) won this issue, but that’s the real price of entry. It’s not enough to point out the flaws where all my rules don’t work, you have to produce rules that work better.

Well… let’s start with the last bit, actually. No, it totally is enough to point out the flaws. I mean, we should do better if we can, of course; if we can think of a working solution, great. But no, pointing out the flaws in a proffered solution is valuable and good all by itself. (“What should we do?” “Well, not that.” “How come?” “Because it fails to solve the problem we’re trying to solve.” “Ok, yeah, that’s a good reason.”) In other words: “any solution that solves the problem is acceptable; any solution that does not solve the problem is not acceptable”. Act consequentialism does not solve the problem.

But as far as my own actual solution goes… I consider Robin Hanson’s curve-fitting approach (outlined in sections II and III of his paper “Why Health is Not Special: Errors in Evolved Bioethics Intuitions”) to be the most obviously correct approach to (meta)ethics. In brief: sometimes we have very strong moral intuitions (when people speak of listening to their conscience, this is essentially what they are referring to), and as those intuitions are the ultimate grounding for any morality we might construct, if the intuitions are sufficiently strong and consistent, we can refer to them directly. Sometimes we are more uncertain. But we also value consistency in our moral judgments (for various good reasons). So we try to “fit a curve” to our moral intuitions—that is, we construct a moral system that tries to capture those intuitions. Sometimes the intuitions are quite strong, and we adjust the curve to fit them; sometimes we find weak intuitions which are “outliers”, and we judge them to be “errors”; sometimes we have no data points at all for some region of the graph, and we just take the output of the system we’ve constructed. This is necessarily an iterative process.

If the police arrest your best friend for murder, but you know that said friend spent the whole night of the alleged crime with you (i.e. you’re his only alibi and your testimony would completely clear him of suspicion), should you tell the truth to the police when they question you, or should you betray your friend and lie, for no reason at all other than that it would mildly inconvenience you to have to go down to the police station and give a statement? Pretty much nobody needs any kind of moral system to answer this question. It’s extremely obvious what you should do. What does act and/or rule consequentialism tell us about this? What about deontology, etc.? Doesn’t matter, who cares, anyone who isn’t a sociopath (and probably even most sociopaths who aren’t also very stupid) can see the answer here, it’s absurdly easy and requires no thought at all.

What if you’re in Germany in 1938 and the Gestapo show up at your door to ask whether you’re hiding any Jews in your attic (which you totally are)—what should you do? Once again the answer is easy, pretty much any normal person gets this one right without hesitation (in order to get it wrong, you need to be smart enough to confuse yourself with weird philosophy).

So here we’ve got two situations where you can ask “is it right to lie here, or to tell the truth?” and the answer is just obvious. Well, we start with cases like this, we think about other cases where the answer is obvious, and yet other cases where the answer is less obvious, and still other cases where the answer is not obvious at all, and we iteratively build a curve that fits them as well as possible. This curve should pass right through the obvious-answer points, and the other data points should be captured with an accuracy that befits their certainty (so to speak). The resulting curve will necessarily have at least a few terms, possibly many, definitely not just one or two. In other words, there will be many Rules.

(How to evaluate these rules? With great care and attention. We must be on the lookout for complexity, we must continually question whether we are in fact satisfying our values / embodying our chosen virtues, etc.)

Here’s an example rule, which concerns situations of a sort of which I have written before: if you voluntarily agree to keep a secret, then, when someone who isn’t in on the secret asks you about the secret, you should behave as you would if you didn’t know the secret. If this involves lying (that is, saying things which you know to be false, but which you would believe to be true if you were not in possession of this secret which you have agreed, of your own free will, to keep), then you should lie. Lying in this case is right. Telling the truth in this case is wrong. (And, yes, trying to tell some technical truth that technically doesn’t reveal anything is also wrong.)

Is that an obvious rule? Certainly not as obvious as the rules you’d formulate to cover the two previous example scenarios. Is it correct? Well, I’m certainly prepared to defend it (indeed, I have done so, though I can’t find the link right now; it’s somewhere in my comment history). Is a person who follows a rule like this an honest and trustworthy person, or a dishonest and untrustworthy liar? (Assuming, naturally, that they also follow all the other rules about when it is right to tell the truth.) I say it’s the former, and I am very confident about this.

I’m not going to even try to enumerate all the rules that apply to when lying is wrong and when it’s right. Frankly, I think that it’s not as hard as some people make it out to be, to tell when it is necessary to tell the truth and when one should instead lie. Mostly, the right answer is obvious to everyone, and the debates, such as they are, mostly boil down to people trying to justify things that they know perfectly well cannot be justified.

Indeed, there is a useful heuristic that comes out of that. In these discussions, I have often made this point (as I did in my top-level comment) that it is sometimes obligatory to lie, and wrong to tell the truth. The reason I keep emphasizing this is that there’s a pattern one sees: the arguments most often concern whether it’s permissible to lie. Note: not, “is it obligatory to tell the truth, or is it obligatory to lie”—but “is it obligatory to tell the truth, or do I have no obligation here and can I just lie”.

I think that this is very telling. And what it tells us (with imperfect but nevertheless non-trivial certainty) is that the person asking the question, or making the argument against the obligation, knows perfectly well what the real—which is to say, moral—answer is. Yes, the right thing to do is to tell the truth. Yes, you already know this. You have reasons for not wanting to tell the truth. Well, nobody promised you that doing the right thing will always be personally convenient! Nevertheless, very often, there is no actual moral uncertainty in anyone’s mind, it’s just “… ok, but do I really have to do the right thing, though”.

This heuristic is not infallible. For example, it does not apply to the case of “lying to someone who has no right to ask the question that they’re asking”: there, it is indeed permissible to lie^[1], but no particular obligation either to lie or to tell the truth. (Although one can make the case for the obligation to lie even in some subset of such cases, having to do with the establishment and maintenance of certain communicative norms.) But it applies to all of these [LW(p) · GW(p)], for instance.

The bottom line is that if you want to be honest, to be trustworthy, to have integrity, you will end up constructing a bunch of rules to aid you in epitomizing these virtues. If you want to try to put together a complete list of such rules, that’s certainly a project, and I may even contribute to it, but there’s not much point in expecting this to be a definitively completable task. We’re fitting a curve to the data provided by our values, which cannot be losslessly compressed.

[1] Assuming that certain conditions are met—but they usually are.

hold_my_fish on How to Make Superbabies

One thing we're worried about is cases where the haplotypes have the small additive effects rather than individual SNPs, and you get an unpredictable (potentially deleterious) effect if you edit to a rare haplotype even if all SNPs involved are common.

This is a point of uncertainty that bothered me when I was doing a similar analysis a while ago. GWAS data is possibly good enough to estimate causal effects of haplotypes, but that's not enough information to do single base edits. To have reasonable confidence of getting the predicted effect, it'd be necessary to make all the edits to transform the original haplotype into a different haplotype.

And unlike with distant variants where additive effects dominate, it'd make sense if non-additive effects are strong locally, since the variants are near each other. Whether this is actually true in reality is way beyond my knowledge, though.

david-matolcsi on Karma Tests in Logical Counterfactual Simulations motivates strong agents to protect weak agents

Thanks for the reply, I broadly agree with your points here. I agree we should probably eventually try to do trades across logical counter-factuals. Decreasing logical risk is one good framing for that, but in general, there are just positive trades to be made.

However, I think you are still underestimating how hard it might be to strike these deals. "Be kind to other existing agents" is a natural idea to us, but it's still unclear to me if it's something you should assign high probability to as a preference of logically counter-factual beings. Sure, there is enough room for humans and mosquitos, but if you relax 'agent' and 'existing', suddenly there is not enough room for everyone. You can argue that "be kind to existing agents" is plausibly a relatively short description length statement, so it will be among the first guesses of the AI, and the AI will allocate at least some fraction of the universe to it. But once trading across logical counter-factuals, I'm not sure you can trust things like description length. Maybe in the logical counter-factual universe, they assign higher value/probability to longer instead of shorter statements, but the measure still adds up to 1, because math works differently.

Similarly, you argue that loving torture is probably rare, based on evolutionary grounds. But logically counter-factual beings weren't necessarily born through evolution. I have no idea how we should determine the distribution of logical counter-factuals, and I don't know what fraction enjoys torture in that distribution.

Altogether, I agree logical trade is eventually worth trying, but it will be very hard and confusing and I see a decent chance that it basically won't work at all.

xelap on Utility Maximization = Description Length Minimization

There's a minor error in the formula giving the cross entropy: you need a minus sign on the RHS so that it reads E[- log P[X|M_2] | M_2]

The preceding text is "Of course, we could be wrong about the distribution - we could use a code optimized for a model M2 which is different from the “true” model M1. In this case, the average number of bits used will be"
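
A minimal sketch of the quantity under discussion, for readers who want to see the sign convention in numbers. It assumes the average is taken over outcomes drawn from the "true" model M1, which is how the quoted passage describes the setup. The two toy distributions and the function names are hypothetical, chosen only for illustration, and are not from the post.

```python
import math

def entropy_bits(p_true):
    """Average bits per outcome when the code is optimized for p_true itself."""
    return -sum(p * math.log2(p) for p in p_true if p > 0)

def cross_entropy_bits(p_true, q_code):
    """Average bits per outcome when outcomes follow p_true
    but the code lengths are -log2(q) for the mistaken model q_code."""
    return -sum(p * math.log2(q) for p, q in zip(p_true, q_code) if p > 0)

# Hypothetical toy distributions over four outcomes.
m1 = [0.5, 0.25, 0.125, 0.125]  # stands in for the "true" model M1
m2 = [0.25, 0.25, 0.25, 0.25]   # stands in for the mistaken model M2

h = entropy_bits(m1)
ce = cross_entropy_bits(m1, m2)

print(f"bits with a code optimized for M1: {h}")
print(f"bits with a code optimized for M2: {ce}")
print(f"overhead from using the wrong model: {ce - h}")
```

Without the leading minus sign both quantities would come out negative, which cannot be a number of bits; with it, the wrong-model code never uses fewer bits on average than the right-model code.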

tenoke on aog's Shortform

It's hard for me to respect a Safety-ish org so obviously wrong about the most important factors of their chosen topic.

I won't judge a random celebrity for expecting e.g. very long timelines but an AI research center? I'm sure they are very cool people but come on.

tenoke on Why Should I Assume CCP AGI is Worse Than USG AGI?

As in ultimately more people are likely to like their condition and agree (comparably more) with the AI's decisions while having roughly equal rights.

samuelshadrach on A Dissent on Honesty

If you can’t provide a few unambiguous examples of the dilemma in the post that actually happened in the real world, I’m less likely to take your post seriously.

Might be worth thinking more and then coming up with examples.

eva_ on A Dissent on Honesty

I consider you to be basically agreeing with me for 90% of what I intended and your disagreements for the other 10% to be the best written of any so far, and basically valid in all the places I'm not replying to it. I still have a few objections:

What if my highest value is getting a pretty girl with a country-sized dowry, while having not betrayed the Truth? ... In short, no, Rationality absolutely can be about both Winning and about The Truth.

I agree the utility function isn't up for grabs and that that is a coherent set of values to have, but I have this criticism that I want to make that I feel I don't have the right language to make. Maybe you can help me. I want to call that utility function perverse. The kind of utility function that an entity is probably mistaken to imagine itself as having.

For any particular situation you might find yourself in, for any particular sequence of actions you might do in that situation, there is a possible utility function you could be said to have such that the sequence of actions is the rational behaviour of a perfect omniscient utility maximiser. If nothing else, pick the exact sequence of events that will result, declare that your utility function is +100 for that sequence of events and 0 for anything else, and then declare yourself a supremely efficient rationalist.

Actually doing that would be a mistake. It wouldn't be making you better. This is not a way to succeed at your goals, this is a way to observe what you're inclined to do anyway and paint the target around it. Your utility function (fake or otherwise) is supposed to describe stuff you actually want. Why would you want specifically that in particular?

I think the stronger version of Rationality is the version that phrases it as about getting the things you want, whatever those things might be. In that sense, if The Truth is merely a value, you should carefully segment it in your brain out from your practice of rationality: Your rationality is about mirroring the mathematical structure best suited for obtaining goals, and then to whatever degree you value The Truth above its normal instrumental value is something you buy where it's cheapest like all your other values. Mixing the two makes both worse: you pollute your concept of rational behaviour with a love of the truth (and therefore, for example, are biased towards imagining that other people who display rationality are probably honest, or other people who display honesty are probably rational) and you damage your ability to pursue the truth by not putting it in the values category where it belongs, where it will lead you to try to cheaply buy more of it.

Of course maybe you're just the kind of guy who really loves mixing his value for The Truth in with his rationality into a weird soup. That'd explain your actions without making you a walking violation of any kind of mathematical law, it'd just be a really weird thing for you to innately want.

I am still trying to find a better way to phrase this argument such that someone might find it persuasive of something, because I don't expect this phrasing to work.

I say and write things because I consider those things to be true, relevant, and at least somewhat important. That by itself is very often (possibly usually) sufficient for a thing to be useful in a general sense (i.e., I think that the world is better for me having said it, which necessarily involves the world being better for the people in it). Whether the specific person to whom the thing is nominally or factually addressed will be better off as a result of what I said or wrote is not my concern in any way other than that.

I think I meant something subtly different than what you've taken that part to mean. I think you understand that, if other people noticed a pattern that everything you said was false, irrelevant, or unimportant, they would eventually stop bothering to listen when you talk, and this would mean you'd lose the ability to get other people to know things, which is a useful ability to have. This is basically my position! Whether the specific person you address is better off in each specific case isn't material because you aren't trying to always make them better off, you're just trying to avoid being seen as someone who predictably doesn't make them better off. I agree that calculating the full expected consequences to every person of every thing you say isn't necessary for this purpose.

No, this is a terrible idea. Do not do this. Act consequentialism does not work. ... Look, this is going to sound fatuous, but there really isn’t any better general rule than this: you should only lie when doing so is the right thing to do.

I agree that Act Consequentialism doesn't really work. I was trying to be a Rule consequentialist instead when I wrote the above rule. I agree that that sounds fatuous, but I think the immediate feeling is pointing at a valid retort: You haven't operationalized this position into a decision process that a person can actually do (or even pretend to do).

I took great effort to try to write down my policy as something explicit in terms a person could try to do (even though I am willing to admit it is not really correct, mostly because of finite agent problems), because a person can't be a real Rule Consequentialist without actually having a Rule. What is the rule for "Only lie when doing so is the right thing to do"? It sounds like an instruction to pass the act to my rightness calculator, but if I program that rule into my rightness calculator, and then give it any input, it gets into an infinite loop. I have an Act Consequentialist rightness calculator as a backup, but if I pass the rule "only lie when doing so is the right thing to do" into that as a backup I'm just right back at doing act consequentialism.

If you can write down a better rule for when to lie than what I've put above (that is also better than the "never" or "only by coming up with galaxy-brained ways it technically isn't lying" or Eliezer's meta-honesty idea that I've read before) I'd consider you to have (possibly) won this issue, but that's the real price of entry. It's not enough to point out the flaws where all my rules don't work, you have to produce rules that work better.

huera on Scaffolding Skills

re 2: Now that you mention it, I realized sharpening can be easily outsourced. My mistake.

re 1: I don't see it, buying pre-chopped onions is simply not equivalent to having a freshly chopped onion and some vegetables cannot be bought pre-cut. While cutting isn't a bottleneck for most people I had this chain in mind: (no cutting skills) -> (cooking takes more time and is less pleasant) -> (Less willingness to try new or complex recipes).

(Also, if you don't have proper technique, you're at a higher risk of cutting yourself. In that respect, it's like free climbing / using safety ropes)

re 3: I had self-experiments in general in mind (people run self-experiments, without knowing statistics, or even gathering data), but it did not occur to me that not all self-experiments are QS (probably most aren't). As written you are, of course, correct.