raymond-d

Ah I should emphasise, I do think all of these things could help -- it definitely is a spectrum, and I would guess these proposals all do push away from agency. I think the direction here is promising.

The two things I think are (1) the paper seems to draw an overly sharp distinction between agents and non-agents, and (2) basically all of the mitigations proposed look like they break down with superhuman capabilities. Hard to tell which of this is actual disagreements and which is the paper trying to be concise and approachable, so I'll set that aside for now.

It does seem like we disagree a bit about how likely agents are to emerge. Some opinions I expect I hold more strongly than you:

It's easy to accidentally scaffold some kind of agent out of an oracle as soon as there's any kind of consistent causal process from the oracle's outputs to the world, even absent feedback loops. In other words, I agree you can choose to create agents, but I'm not totally sure you can easily choose not to
Any system trained to predict the actions of agents over long periods of time will develop an understanding of how agents could act to achieve their goals -- in a sense this is the premise of offline RL and things like decision transformers
It might be pretty easy for agent-like knowledge to 'jump the gap', e.g. a model trained to predict deceptive agents might be able to analogise to itself being deceptive
Sufficient capability at broad prediction is enough to converge on at least the knowledge of how to circumvent most of the guardrails you describe, e.g. how to collude

Comment by Raymond D on Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path? · 2025-02-25T15:29:56.157Z · LW · GW

I like the thrust of this paper, but I feel that it overstates how robust the safety properties will be, by drawing an overly sharp distinction between agentic and non-agentic systems, and not really engaging with the strongest counterexamples

To give some examples from the text:

A chess-playing AI, for instance, is goal-directed because it prefers winning to losing. A classifier trained with log likelihood is not goal-directed, as that learning objective is a natural consequence of making observations

But I could easily train an AI which simply classifies chess moves by quality. What takes that to being an agent is just the fact that its outputs are labelled as 'moves' rather than as 'classifications', rather than any feature of the model itself. More generally, even a LM can be viewed as "merely" predicting next tokens -- the fact that there is some perspective from which a system is non-agentic does not actually tell us very much.

Paralleling a theoretical scientist, it only generates hypotheses about the world and uses them to evaluate the probabilities of answers to given questions. As such, the Scientist AI has no situational awareness and no persistent goals that can drive actions or long-term plans.

I think it's a stretch to say something generating hypotheses about the world has no situational awareness and no persistent goals -- maybe it has indexical uncertainty, but a sufficiently powerful system is pretty likely to hypothesise about itself, and the equivalent of persistent goals can easily fall out of any ways its world model doesn't line up with reality. Note that this doesn't assume the AI has any 'hidden goals' or that it ever makes inaccurate predictions.

I appreciate that the paper does discuss objections to the safety of Oracle AIs, but the responses also feel sort of incomplete. For instance:

The counterfactual query proposal basically breaks down in the face of collusion
The point about isolating the training process from the real world says that "a reward-maximizing agent alters the real world to increase its reward", which I think is importantly wrong. In general, I think the distinctions drawn here between RL and the science AI all break down at high levels.
The uniqueness of solutions still leaves a degree of freedom in how the AI fills in details we don't know -- it might be able to, for example, pick between several world models that fit the data which each offer a different set of entirely consistent answers to all our questions. If it's sufficiently superintelligent, we wouldn't be able to monitor whether it was even exercising that freedom.

Overall, I'm excited by the direction, but it doesn't feel like this approach actually gets any assurances of safety, or any fundamental advantages.

Comment by Raymond D on Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development · 2025-02-02T22:48:30.366Z · LW · GW

Thank you for the very detailed comment! I’m pretty sympathetic to a lot of what you’re saying, and mostly agree with you about the three properties you describe. I also think we ought to do some more spelling-out of the relationship between gradual disempowerment and takeover risk, which isn’t very fleshed-out in the paper — a decent part of why I’m interested in it is because I think it increases takeover risk, in a similar but more general way to the way that race dynamics increase takeover risk.

I’m going to try to respond to the specific points you lay out, probably not in enough detail to be super persuasive but hopefully in a way that makes it clearer where we might disagree, and I’d welcome any followup questions off the back of that. (Note also that my coauthors might not endorse all this.)

Responding to the specific assumptions you lay out:

No egregious lying — Agree this will probably be doable, and pretty interested in the prospect of ‘ai police’, but not a crux for me. I think that, for instance, much of the harm caused by unethical business practices or mass manipulation is not reliant on outright lies but rather on ruthless optimisation. A lot also comes from cases where heavy manipulation of facts in a technically not-false way is strongly incentivised.
No strong AI rights — Agree de jure, disagree de facto, and partly this is contingent on the relative speeds of research and proliferation. Mostly I think there will be incentives and competitive pressures to give AIs decision making power, and that oversight will be hard, and that much of the harm will be emergent from complex interactions. I also think it’s maybe interesting to reflect on Ryan and Kyle’s recent piece about paying AIs to reveal misalignment — locally it makes sense, but globally I wonder what happens if we do a lot of that in the next few years.
No hot global war — Agree, also not a crux for me, although I think the prospect of war might generate pretty bad incentives and competitive pressures.

Overall, I think I can picture worlds where (conditional on no takeover) we reach states of pretty serious disempowerment of the kind described in the paper, without any of these assumptions fully breaking down. That said, I expect AI rights to be the most important, and the one that starts breaking down first.

As for the feedback loops you mention:

Owners of capital are aware of the consequences of their actions —
- I think this is already sort of not true: I have very little sense of what my current stocks are doing, and my impression is many CEOs don’t really understand most of what’s going on in their companies. Maybe AIs are naturally easier to oversee in a way that helps, maybe they operate at unprecedented scale and speed, overall I'm unsure which way this cuts.
- But also, I expect that the most important consequences of labor displacement by AIs are (1) displacement of human decision making, including over capital allocation and (2) distribution of economic power across humans, and resultant political incentives.
- On top of that, I think a lot of the economic badness will be about aggregate competitive effects, rather than individual obviously-bad actions. If an individual CEO notices their company is doing something bad to stay competitive, which other companies are also doing, then stopping the badness in the world is a lot harder than shutting down a department.
Politicians stop systems from doing obviously bad things —
- I also think this is not currently totally true; there is definitely a sense in which some politicians already do not change systems that have bad consequences (terrible material conditions for citizens, at least), partly because they themselves are beholden to some pretty unfortunate incentives. There are bad equilibria within parties, between parties, and between states.
- I also think that the mechanisms which keep countries friendly to citizens specifically are pretty fragile and contingent.
- So essentially, I think the standard of ‘obviously bad’ might not actually be enough.
Cultural consumption —
- Here I am confused about where we’re disagreeing, and I think I don't understand what you're saying. I’m not sure why people being able to choose the culture they consume would help, and I don’t think it’s something we assume in the paper.

I hope this sheds some light on things!

Comment by Raymond D on The Choice Transition · 2024-11-19T23:50:12.303Z · LW · GW

The writing here was definitely influenced by Lewis (we quote TAoM in footnote 6), although I think the Choice Transition is broader and less categorically negative.

For instance in Lewis's criticism of the potential abolition he writes things like:

The old dealt with its pupils as grown birds deal with young birds when they teach them to fly; the new deals with them more as the poultry-keeper deals with young birds— making them thus or thus for purposes of which the birds know nothing. In a word, the old was a kind of propagation—men transmitting manhood to men; the new is merely propaganda.

The Choice Transition as we're describing it is consistent with either of these approaches. There needn't be any ruling minority, nor do we assume humans can perfectly control future humans, just that they (or any other dominant power) can appropriately steer emergent inter-human dynamics (if there are still humans).

Comment by Raymond D on Automation collapse · 2024-10-22T01:05:52.396Z · LW · GW

Could you expand on what you mean by 'less automation'? I'm taking it to mean some combination of 'bounding the space of controller actions more', 'automating fewer levels of optimisation', 'more of the work done by humans' and maybe 'only automating easier tasks' but I can't quite tell which of these you're intending or how they fit together.

(Also, am I correctly reading an implicit assumption here that any attempts to do automated research would be classed as 'automated ai safety'?)

Comment by Raymond D on Why I’m not a Bayesian · 2024-10-07T00:20:38.943Z · LW · GW

When I read this post I feel like I'm seeing four different strands bundled together:
1. Truth-of-beliefs as fuzzy or not
2. Models versus propositions
3. Bayesianism as not providing an account of how you generate new hypotheses/models
4. How people can (fail to) communicate with each other

I think you hit the nail on the head with (2) and am mostly sold on (4), but am sceptical of (1) - similar to what several others have said, it seems to me like these problems don't appear when your beliefs are about expected observations, and only appear when you start to invoke categories that you can't ground as clusters in a hierarchical model.

That leaves me with mixed feelings about (3):
- It definitely seems true and significant that you can get into a mess by communicating specific predictions relative to your own categories/definitions/contexts without making those sufficiently precise
- I am inclined to agree that this is a particularly important feature of why talking about AI/x-risk is hard
- It's not obvious to me that what you've said above actually justifies knightian uncertainty (as opposed to infrabayesianism or something), or the claim that you can't be confident about superintelligence (although it might be true for other reasons)

Comment by Raymond D on Decomposing Agency — capabilities without desires · 2024-07-31T14:09:02.944Z · LW · GW

Strongly agree that active inference is underrated both in general and specifically for intuitions about agency.

I think the literature does suffer from ambiguity over where it's descriptive (ie an agent will probably approximate a free energy minimiser) vs prescriptive (ie the right way to build agents is free energy minimisation, and anything that isn't that isn't an agent). I am also not aware of good work on tying active inference to tool use - if you know of any, I'd be pretty curious.

I think the viability thing is maybe slightly fraught - I expect it's mainly for anthropic reasons that we mostly encounter agents that have adapted to basically independently and reliably preserve their causal boundaries, but this is always connected to the type of environment they find themselves in.

For example, active inference points to ways we could accidentally build misaligned optimisers that cause harm - chaining an oracle to an actuator to make a system trying to do homeostasis in some domain (like content recommendation) could, with sufficient optimisation power, create all kinds of weird and harmful distortions. But such a system wouldn't need to have any drive for boundary preservation, or even much situational awareness.

So essentially an agent could conceivably persist for totally different reasons, we just tend not to encounter such agents, and this is exactly the kind of place where AI might change the dynamics a lot.

Comment by Raymond D on Decomposing Agency — capabilities without desires · 2024-07-30T20:05:32.905Z · LW · GW

Interesting! I think one of the biggest things we gloss over in the piece in how perception fits into the picture, and this seems like a pretty relevant point. In general the space of 'things that give situational awareness' seems pretty broad and ripe for analysis.

I also wonder how much efficiency gets lost by decoupling observation and understanding - at least in humans, it seems like we have a kind of hierarchical perception where our subjective experience of 'looking at' something has already gone through a few layers of interpretation, giving us basically no unadulterated visual observation, presumably because this is more efficient (maybe in particular faster?).

Comment by Raymond D on Decomposing Agency — capabilities without desires · 2024-07-29T16:23:00.928Z · LW · GW

I'd be pretty curious to hear about your disagreements if you're willing to share

Comment by Raymond D on Decomposing Agency — capabilities without desires · 2024-07-17T09:10:30.996Z · LW · GW

This seems like a misunderstanding / not my intent. (Could you maybe quote the part that gave you this impression?)

I believe Dusan was trying to say that davidad's agenda limits the planner AI to only writing provable mathematical solutions. To expand, I believe that compared to what you briefly describe, the idea in davidad's agenda is that you don't try to build a planner that's definitely inner aligned, you simply have a formal verification system that ~guarantees what effects a plan will and won't have if implemented.

Comment by Raymond D on ChatGPT can learn indirect control · 2024-03-24T22:07:23.239Z · LW · GW

Oh interesting! I just had a go at testing it on screenshots from a parallel conversation and it seems like it incorrectly interprets those screenshots as also being of its own conversation.

So it seems like 'recognising things it has said' is doing very little of the heavy lifting and 'recognising its own name' is responsible for most of the effect.

I'll have a bit more of a play around and probably put a disclaimer at the top of the post some time soon.

Comment by Raymond D on Predictive model agents are sort of corrigible · 2024-01-06T18:52:52.696Z · LW · GW

The 'reward being chance of winning' stuff changes a bit about how the model generalises if it's playing a game with randomness and conditioned on the upper end - it biases the model towards 'expecting risk to pay off'. Eg/ if the model plays a 1-step game where it either banks 1 point or gets a 1% chance of 10 points, then conditioning on it getting 10 points will cause it to take the lower EV action. But this isn't super relevant.

In general I am pretty confused about how models would generalise strategies out of distribution, and this seems like at least partly an empirical ML / science of deep learning problem. But my intuition is that no, even if the model 'understands' that an action causes a good outcome [ P(outcome |. action) ], it still might not take that action if the action is unlikely on priors [ P(action) ], because the basic driver of its choice is what action it will observe given the outcome [ P(action | outcome) ] which is proportional to both of those quantities (and in fact, probably equal, with some normalising constant P(outcome)).

Re lowering the temperature: the thing the predictive model learns is usually some numbers that you normalise into a probability distribution with a softmax function, and you can easily tweak the softmax to make it favour more likely completions, and disfavour unlikely ones. This would greatly decrease the turn-by-turn chance of disconnecting the button (and therefore overall chance), but probably not impair the model's performance elsewhere much.

Comment by Raymond D on Predictive model agents are sort of corrigible · 2024-01-05T16:01:33.742Z · LW · GW

Re generalisation - decision transformers don't really have strategies per se, they pick moment to moment, and might be systematically miscalibrated about what they'll do in future timesteps. It is true that they'll have some chance at every timestep, which will add up over time, but if you were actually trying to implement this then you could do things like lowering the temperature, which shouldn't affect performance otherwise.

Re higher conditioning - I think this shouldn't be true. For the sake of argument we can reframe it as a binary outcome, where the model's final return (as a proportion of total possible return) becomes its chance of 'winning'. The thing the model is figuring out is not 'what action leads to me winning', or even 'what action is more likely in worlds where I win than worlds where I lose', it's 'what action do I expect to see from agents that win'. If on turn 1, 99% of agents in the training set voluntarily slap a button that has a 1% chance of destroying them, and then 50% go on to win, as well as 50% of the agents that didn't slap the button, then a DT will (correctly) learn that 'almost all agents which go on to win tend to slap the button on turn 1'.

Re correlation - Sure, I am taking the liberal assumption that there's no correlation in the training data, and indeed a lot of this rests on the training data having a nice structure

Comment by Raymond D on Goal-Direction for Simulated Agents · 2023-07-12T22:26:59.887Z · LW · GW

Thanks! Yeah this isn't in the paper, it's just a thing I'm fairly sure of which probably deserves a more thorough treatment elsewhere. In the meantime, some rough intuitions would be:

delusions are a result of causal confounders, which must be hidden upstream variables
if you actually simulate and therefore specify an entire markov blanket, it will screen off all other upstream variables including all possible confounders
this is ludicrously difficult for agents with a long history (like a human), but if the STF story is correct, it's sufficient, and crucially, you don't even need to know the full causal structure of reality, just a complete markov blanket
any holes in the markov blanket/boundary represent ways for unintended causal pathways to leak through, which separate the predictor's predictions about the effect of an action from the actual causal effect of the action, making the agent appear 'delusional'

I hope we'll have a proper writeup soon; in the meantime let me know if this doesn't make sense.

Comment by Raymond D on A Longlist of Theories of Impact for Interpretability · 2022-03-11T17:58:03.638Z · LW · GW

A slightly sideways argument for interpretability: It's a really good way to introduce the importance and tractability of alignment research

In my experience it's very easy to explain to someone with no technical background that

Image classifiers have got much much better (like in 10 years they went from being impossible to being something you can do on your laptop)
We actually don't really understand why they do what they do (like we don't know why the classifier says this is an image of a cat, even if it's right)
But, thanks to dedicated research, we have begun to understand a bit of what's going on in the black box (like we know it knows what a curve is, we can tell when it thinks it sees a curve)

Then you say 'this is the same thing that big companies are using to maximise your engagement on social media and sell you stuff, and look at how that's going. and by the way did you notice how AIs keep getting bigger and stronger?'

At this point my experience is it's very easy for people to understand why alignment matters and also what kind of thing you can actually do about it.

Compare this to trying to explain why people are worried about mesa-optimisers, boxed oracles, or even the ELK problem, and it's a lot less concrete. People seem to approach it much more like a thought experiment and less like an ongoing problem, and it's harder to grasp why 'developing better regularisers' might be a meaningful goal.

But interpretability gives people a non-technical story for how alignment affects their lives, the scale of the problem, and how progress can be made. IMO no other approach to alignment is anywhere near as good for this.

Comment by Raymond D on Signaling isn't about signaling, it's about Goodhart · 2022-01-06T22:21:25.817Z · LW · GW

My main takeaway from this post is that it's important to distinguish between sending signals and trying to send signals, because the latter often leads to goodharting.

It's tricky, though, because obviously you want to be paying attention to what signals you're giving off, and how they differ from the signals you'd like to be giving off, and sometimes you do just have to try to change them.

For instance, I make more of an effort now than I used to, to notice when I appreciate what people are doing, and tell them, so that they know I care. And I think this has basically been very good. This is very much not me dropping all effort to signal.

But I think what you're talking about is very applicable here, because if I were just trying to maximise that signal, I would probably just make up compliments, and this would probably be obviously insincere. So I guess the big question is, which things do you stop trying to do?

(Also, I notice I'm now overthinking editing this comment because I've switched gears from 'what am I trying to say' to 'what will people interpret from this'. Time to submit, I guess.)

Comment by Raymond D on Reply to Eliezer on Biological Anchors · 2021-12-23T21:25:53.727Z · LW · GW

if you think timelines are short for reasons unrelated to biological anchors, I don't think Bio Anchors provides an affirmative argument that you should change your mind.

Eliezer: I wish I could say that it probably beats showing a single estimate, in terms of its impact on the reader. But in fact, writing a huge careful Very Serious Report like that and snowing the reader under with Alternative Calculations is probably going to cause them to give more authority to the whole thing. It's all very well to note the Ways I Could Be Wrong and to confess one's Uncertainty, but you did not actually reach the conclusion, "And that's enough uncertainty and potential error that we should throw out this whole deal and start over," and that's the conclusion you needed to reach.

I would be curious to know what the intended consequences of the forecasting piece were.

A lot of Eliezer's argument seems to me to be pushing at something like 'there is a threshold for how much evidence you need before you start putting down numbers, and you haven't reached it', and I take what I've quoted from your piece to be supporting something like 'there is a threshold for how much evidence you might have, and if you're above it (and believe this forecast to be an overestimate) then you may be free to ignore the numbers here', contra the Humbali position. I'm not particularly confident on that, though.

Where this leaves me is feeling like you two have different beliefs about who will (or should) update on reading this kind of thing, and to what end, which is probably tangled up in beliefs about how good people are at holding uncertainty in their mind. But I'm not really sure what these beliefs are.

Comment by Raymond D on Taking Clones Seriously · 2021-12-01T20:26:16.659Z · LW · GW

The belief that people can only be morally harmed by things that causally affect them is not universally accepted. Personally I intuitively would like my grave to not be desecrated, for instance.

I think we have lots of moral intuitions that have become less coherent as science has progressed. But if my identical twin started licensing his genetic code to make human burgers for people who wanted to see what cannibalism was like, I would feel wronged.

I'm using pretty charged examples here, but the point I'm trying to convey is that there are a lot of moral lenses to apply here, and there are defensible deontological prohibitions to be made. Perhaps under scrutiny they'd fall away but I don't think it's clear cut, or at least not yet.

Comment by Raymond D on Taking Clones Seriously · 2021-12-01T18:53:17.537Z · LW · GW

You ask a number of good questions here, but the crucial point to me is that they are still questions. I agree it seems, based on my intuitions of the answers, like this isn't the best path. But 'how much would it cost' and 'what's the chance a clone works on something counterproductive' are, to me, not an argument against cloning, but rather arguments for working out how to answer those questions.

Also very ironic if we can't even align clones and that's what gets us.

Comment by Raymond D on Taking Clones Seriously · 2021-12-01T18:30:23.116Z · LW · GW

I think there are extra considerations to do with what the clone's relation to von Neumann. Plausibly, it might be wrong to clone him without his consent, which we can now no longer get. And the whole idea that you might have a right to your likeness, identity, image, and so on, becomes much trickier as soon as you have actually been cloned.

Also there's a bit of a gulf between a parent deciding to raise a child they think might do good and a (presumably fairly large) organisation funding the creation of a child.

I don't have strongly held convictions on these points, but I do think that they're important and that you'd need to have good answers before you cloned somebody.

Comment by Raymond D on Why Save The Drowning Child: Ethics Vs Theory · 2021-11-16T23:42:55.304Z · LW · GW

Well, I basically agree with everything you just said. I think we have quite different opinions about what politics is, though, and what it's for. But perhaps this isn't the best place to resolve those differences.

Comment by Raymond D on Why Save The Drowning Child: Ethics Vs Theory · 2021-11-16T21:31:56.033Z · LW · GW

Ok I think this is partly fair, but also clearly our moral standards are informed by our society, and in no small part those standards emerge from discussions about what we collectively would like those standards to be, and not just a genetically hardwired disloyalty sensor.

Put another way: yes, in pressured environments we act on instinct, but those instincts don't exist in a vacuum, and the societal project of working out what they ought to be is quite important and pretty hard, precisely because in the moment where you need to refer to it, you will be acting on System 1.

Comment by Raymond D on Why Save The Drowning Child: Ethics Vs Theory · 2021-11-16T21:25:37.756Z · LW · GW

I'm not sure I'm entirely persuaded. Are you saying that the goal of ethics is to accurately predict what people's moral impulse will be in arbitrary situations?

I think moral impulses have changed with times, and it's notable that some people (Bentham, for example) managed to think hard about ethics and arrive at conclusions which massively preempted later shifts in moral values.

Like, Newton's theories give you a good way to predict what you'll see when you throw a ball in the air, but it feels incorrect to me to say that Newton's goal was to find order in our sensory experience of ball throwing. Do you think that there are in fact ordered moral laws that we're subject to, which our impulses respond to, and which we're trying to hone in on?

Comment by Raymond D on Substack Ho? · 2021-11-09T14:12:56.496Z · LW · GW

Migration - they have a team that will just do it for you if you're on the annual plan, plus there's an exporting plugin (https://ghost.org/docs/migration/wordpress/)

Setup - yeah there are a bunch of people who can help with this and I am one of them

I'll message you

Comment by Raymond D on Substack Ho? · 2021-11-06T17:25:31.403Z · LW · GW

Massive conflict of interest: I blog on ghost, know and like the people at ghost, and work at a company that moved from substack to ghost, get paid to help people use ghost, and a couple more COIs in this vein.

But if you're soliciting takes from somebody from wordpress I think you might also appreciate the case for ghost, which I simply do think is better than substack for most bloggers above a certain size.

Re your cons, ghost:

1 - has a migration team and the ability to do custom routing, so you would be able to migrate your content

3 - supports total theme customisation

4 - supports analytics add-ons which would give you these details

5 - supports custom excerpts - doesn't even have to be the first bit of the post

6 - is built on open-source software, and you have the option of self-hosting

Some other pros:

really nice post editor
the upper limit of what you can do with add-ons and custom html injection is really high

Notable points against would be:

no mechanism for discovery like substack's
harder to set up than substack
analytics, commenting, and email click-through are not native, they're separate add-ons (although imo pretty easy to add)
I am not personally sure how hard migrating comments from wordpress would be
I don't know how to compare what degree of support you'd get from substack versus ghost
below a certain subscription threshold, more expensive (unlike substack's percentage fee ghost charges a rate that scales with subscribers)
just the big meta point that I am really biased here - I really don't want to give the impression of neutrality

Comment by Raymond D on Speaking of Stag Hunts · 2021-11-06T15:03:56.333Z · LW · GW

I'd like to throw out some more bad ideas, with fewer disclaimers about how terrible they are because I have less reputation to hedge against.

Inline Commenting

I very strongly endorse the point that it seems bad that someone can make bad claims in a post, which are then refuted in comments which only get read by people who get all the way to the bottom and read comments. To me the obvious (wrong) solution is to let people make inline comments. If nothing else, having a good way within comments to point to what part of the post you want to address feels like a strict win, and given that we already have pingbacks I think letting sufficiently good comments exist alongside the post would also be good. This could also be the kind of thing that a poster can enable or disable, and that a reader can toggle visibility on.

Personal Reputation

I don't have great models for how reputation should or does work on LessWrong. The second of these is testable though - I'd be curious to see what happened if prominent accounts, before commenting, flipped a coin, and in half of all cases posted through a random alt. Of course it may not be a bad thing if respected community figures get more consideration, but it would just be interesting to know how much of an effect it had. There are loads of obvious ways to hedge against this, all themed around anonymisation at different levels, but I think here it's less of a 'can' and more of a 'should', so I'd be curious to hear anyone else's thoughts on that.

Curated Comments

I agree that there are comments that are epistemically not so great. There's some underlying, very complicated question about 'who gets to decide what comments people should read', and I have some democratic instinct which resists any centralisation. But it does feel like some comments are notably higher-effort, or particularly at risk of brigading. I reckon a full prediction market-style moderation system would be a mess, but it seems like it wouldn't be that hard if, when someone made a comment, they could submit it for curation as 'a particularly carefully considered, relevant, and epistemically hygienic response' which, if approved, would be bumped above non-curated comments, with some suitable minor penalty for failed attempts or notes of feedback.

Debate as a model

In formal debate (or at least the kind I did) you distinguish between a point of information and a point of order. When you try to lodge a point of information, the opposing speaker can choose whether they'd like to be interrupted, and you're just interjecting some relevant facts. A point of order, though, is made to the chair, when there's a procedural violation, and it can very much interrupt you. I'm not sure how you'd extend this to lesswrong but it feels like a useful distinction in a similar context.

Comment by Raymond D on The Opt-Out Clause · 2021-11-04T11:52:44.599Z · LW · GW

I'm really enjoying the difference between the number of people who claimed they opted out and the number of people who explicitly wrote the phrase

Comment by Raymond D on The Opt-Out Clause · 2021-11-03T22:35:47.550Z · LW · GW

ah whoops thanks!

Comment by Raymond D on The Opt-Out Clause · 2021-11-03T22:31:24.218Z · LW · GW

What's the procedure?

User info

Posts

Comments