LessWrong 2.0 Reader


The Case Against AI Control Research
johnswentworth · 2025-01-21T16:03:10.143Z · comments (69)
Mechanisms too simple for humans to design
Malmesbury (Elmer of Malmesbury) · 2025-01-22T16:54:37.601Z · comments (34)
[link] Quotes from the Stargate press conference
Nikola Jurkovic (nikolaisalreadytaken) · 2025-01-22T00:50:14.793Z · comments (6)
AI companies are unlikely to make high-assurance safety cases if timelines are short
ryan_greenblatt · 2025-01-23T18:41:40.546Z · comments (2)
[link] Training on Documents About Reward Hacking Induces Reward Hacking
evhub · 2025-01-21T21:32:24.691Z · comments (12)
Tell me about yourself: LLMs are aware of their learned behaviors
Martín Soto (martinsq) · 2025-01-22T00:47:15.023Z · comments (3)
Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals
johnswentworth · 2025-01-24T20:20:28.881Z · comments (42)
Anomalous Tokens in DeepSeek-V3 and r1
henry (henry-bass) · 2025-01-25T22:55:41.232Z · comments (1)
Stargate AI-1
Zvi · 2025-01-24T15:20:18.752Z · comments (1)
The Rising Sea
Jesse Hoogland (jhoogland) · 2025-01-25T20:48:52.971Z · comments (2)
[link] The Manhattan Trap: Why a Race to Artificial Superintelligence is Self-Defeating
Corin Katzke (corin-katzke) · 2025-01-21T16:57:00.998Z · comments (6)
MONA: Managed Myopia with Approval Feedback
Seb Farquhar · 2025-01-23T12:24:18.108Z · comments (19)
Retrospective: 12 [sic] Months Since MIRI
james.lucassen · 2025-01-21T02:52:06.271Z · comments (0)
Six Thoughts on AI Safety
boazbarak · 2025-01-24T22:20:50.768Z · comments (41)
Tips and Code for Empirical Research Workflows
John Hughes (john-hughes) · 2025-01-20T22:31:51.498Z · comments (7)
[link] Yudkowsky on The Trajectory podcast
Seth Herd · 2025-01-24T19:52:15.104Z · comments (27)
Detect Goodhart and shut down
Jeremy Gillen (jeremy-gillen) · 2025-01-22T18:45:30.910Z · comments (17)
[link] Attribution-based parameter decomposition
Lucius Bushnaq (Lblack) · 2025-01-25T13:12:11.031Z · comments (9)
On polytopes
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-25T13:56:35.681Z · comments (5)
Kessler's Second Syndrome
Jesse Hoogland (jhoogland) · 2025-01-26T07:04:17.852Z · comments (2)
Announcement: Learning Theory Online Course
Yegreg · 2025-01-20T19:55:57.598Z · comments (15)
Logits, log-odds, and loss for parallel circuits
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-20T09:56:26.031Z · comments (2)
Tail SP 500 Call Options
sapphire (deluks917) · 2025-01-23T05:21:51.221Z · comments (27)
AI #100: Meet the New Boss
Zvi · 2025-01-23T15:40:07.473Z · comments (3)
Against blanket arguments against interpretability
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-22T09:46:23.486Z · comments (4)
Things I have been using LLMs for
Kaj_Sotala · 2025-01-20T14:20:02.600Z · comments (5)
On DeepSeek’s r1
Zvi · 2025-01-22T19:50:17.168Z · comments (1)
[link] We don't want to post again "This might be the last AI Safety Camp"
Remmelt (remmelt-ellen) · 2025-01-21T12:03:33.171Z · comments (17)
Sleep, Diet, Exercise and GLP-1 Drugs
Zvi · 2025-01-21T12:20:06.018Z · comments (4)
Worries about latent reasoning in LLMs
CBiddulph (caleb-biddulph) · 2025-01-20T09:09:02.335Z · comments (3)
Evolution and the Low Road to Nash
Aydin Mohseni (aydin-mohseni) · 2025-01-22T07:06:32.305Z · comments (2)
Why We Need More Shovel-Ready AI Notkilleveryoneism Megaproject Proposals
Peter Berggren (peter-berggren) · 2025-01-20T22:38:26.593Z · comments (1)
Brainrot
Jesse Hoogland (jhoogland) · 2025-01-26T05:35:35.396Z · comments (0)
Kitchen Air Purifier Comparison
jefftk (jkaufman) · 2025-01-22T03:20:03.224Z · comments (2)
Why care about AI personhood?
Francis Rhys Ward (francis-rhys-ward) · 2025-01-26T11:24:45.596Z · comments (4)
Writing experiments and the banana escape valve
Dmitry Vaintrob (dmitry-vaintrob) · 2025-01-23T13:11:24.215Z · comments (1)
Monthly Roundup #26: January 2025
Zvi · 2025-01-20T15:30:08.680Z · comments (15)
Theory of Change for AI Safety Camp
Linda Linsefors · 2025-01-22T22:07:10.664Z · comments (3)
Why Aligning an LLM is Hard, and How to Make it Easier
RogerDearnaley (roger-d-1) · 2025-01-23T06:44:04.048Z · comments (2)
Agents don't have to be aligned to help us achieve an indefinite pause.
Hastings (hastings-greer) · 2025-01-25T18:51:03.523Z · comments (0)
[Cross-post] Every Bay Area "Walled Compound"
davekasten · 2025-01-23T15:05:08.629Z · comments (3)
Eliciting bad contexts
Geoffrey Irving · 2025-01-24T10:39:39.358Z · comments (2)
Arbitrage Drains Worse Markets to Feed Better Ones
Cedar (xida-ren) · 2025-01-21T03:44:46.111Z · comments (1)
[link] Counterintuitive effects of minimum prices
dynomight · 2025-01-24T23:05:26.099Z · comments (0)
[link] You Have Two Brains
Eneasz · 2025-01-23T00:52:43.063Z · comments (5)
[question] Is the output of the softmax in a single transformer attention head usually winner-takes-all?
Linda Linsefors · 2025-01-27T15:33:28.992Z · answers+comments (0)
14+ AI Safety Advisors You Can Speak to – New AISafety.com Resource
Bryce Robertson (bryceerobertson) · 2025-01-21T17:34:02.170Z · comments (0)
Early Experiments in Human Auditing for AI Control
Joey Yudelson (JosephY) · 2025-01-23T01:34:31.682Z · comments (0)
[link] When does capability elicitation bound risk?
joshc (joshua-clymer) · 2025-01-22T03:42:36.289Z · comments (0)