Third, we gave the model a secret “scratchpad”—a workspace where it could record its step-by-step reasoning. We told the model to use the scratchpad to reason about what it should do (as far as the model was aware, the scratchpad was invisible to users and to Anthropic). This allowed us to observe whether the model planned to fake alignment, and how it would go about doing so.
This feels amusingly like tricking a child. "Remember kiddo, you can reason out loud about where you're going to hide and I won't hear it. Now let's play hide and seek!"
I gave my strongest hypothesis for why it looks to me that many many people believe it's responsible to take down information that makes your org look bad. I don't think alternative stories have negligible probability, nor does what I wrote imply that, though it is logically consistent with that.
There are many widespread anti-informative behaviors that people engage in for poor reasons, like saying that their spouse is the best spouse in the world, or telling customers that their business is the best business in the industry, or saying exclusively glowing things about people in reference letters, behaviors that are best explained by the incentives on the person to present themselves in the best light. At the same time, it is respectful to a person, while in dialogue with them, to keep track of the version of them who is trying their best to have true beliefs and honestly inform others around them, in order to help them become that person (and to notice the delta between their current behavior and what they hopefully aspire to).
Seeing orgs in the self-identified-EA space take down information that makes them look bad is (to me) not that dissimilar to the other things I listed.
I think it's good to discuss norms about how appropriate it is to bring up cynical hypotheses about someone during a discussion in which they're present. In this case I think raising this hypothesis was worthwhile for the discussion, and I didn't cut off any way for the person in question to continue to show themselves to be broadly acting in good faith, so I think it went fine. Li replied to Habryka, and left a thoughtful pair of comments retracting and apologizing, which reflected well on them in my eyes.
List of posts that seem promising to me, that are about to fall out of the annual review in 10 mins because only I have voted on them:
I'm nominating this! On skimming, this is a very readable dialogue with an AI about ethics, lots of people seem to have found it valuable to read. I hope to give it a full read and review in the review period.
I appreciated reading this layout of a perspective for uncollaborative truth-seeking discourse, even though I disagree with many parts of it. I'll give it a positive vote here in the last two hours of the nominations period, I hope someone else gives it one too.
Tentative +9, I aim to read/re-read the whole sequence before the final vote and write a more thorough review.
My current quickly written sense of the sequence is that it is a high-effort, thoughtfully written attempt to help people with something like 'generating the true hypotheses' rather than 'evaluating the hypotheses that I already have'. Or 'how to do ontological updates well and on-purpose'.
Skimming the first few posts, there's an art here that I don't see other people talking about unprompted very much (as a general thing one can do well; of course sometimes people talk about having ontological updates), and that I have not seen written down in detail before, and it's so awesome that someone has made a serious attempt.
I haven't read it all but I have seen bits and pieces of the thinking and explanations (and been to a short workshop by Logan), and I think this should definitely go through to the review phase and probably some of the essays (or the sequence as-a-whole) should go into the top of the review.
Recently, I told a friend of mine that I'd been to a wedding. They asked how it was, and I said the couple clearly loved each other very much (as they made clear repeatedly in their speeches). My friend made a face that I read as some kind of displeasure, a bit of a grimace. Since then, I've been wondering why that was.
I think it's a common occurrence, that people feel negatively about others openly expressing their love for something (a person, a piece of art, a place, etc). I'm pretty sure I've had this feeling myself, but I don't know why.
I can think of two hypotheses.
- It's 'sappy'. It's kind of too much for me to feel this in whatever social context I'm in right now (e.g. in my office at work, getting drinks with some people I don't know that well, etc), such that I am resistant to starting to feel it, and resistant to others bringing me into feeling it in this environment. It is not a good time for me to be overcome with emotion for tens of minutes!
- It's 'forced'. People believe that other people perform their loving emotions more than they really feel them, and the intensity of the truth combined with the forced feeling is unpleasant. I'm expected to believe them, and also to reciprocate, and I don't like doing that with something that is dear to me.
Anyone got any other hypotheses, or think that they know the answer?
Are there Manifold markets yet on whether this was a suicide and whether it will turn out that this was due to any pressures relating to the OpenAI whistleblowing?
I don't think that Duncan tried to describe what everyone has agreed to, I think he tried to describe the ideal truth-seeking discussion norms, irrespective of this site's current discussion norms.
Added: I guess one can see here the algorithm he aimed to run, which had elements of both:
In other words, the guidelines are descriptive of good discourse that already exists; here I am attempting to convert them into prescriptions, with some wiggle room and some caveats.
+4. This doesn't offer a functional proposal, but it makes some important points about the situation and offers an interesting reframe, and I hope it gets built upon. Key paragraph:
In other words: from a libertarian perspective, it makes really quite a lot of sense (without compromising your libertarian ideals even one iota) to look at the AI developers and say "fucking stop (you are taking far too much risk with everyone else's lives; this is a form of theft until and unless you can pay all the people whose lives you're risking, enough to offset the risk)".
I see. I agree it makes the strength of discourse here weaker, and agree that the people blocked were specifically people who have disagreements about the standards to aspire to in large group discourse. I am grateful that at least one of the people has engaged well elsewhere, and I have written a review encouraging people to positively vote on that post (I gave it +4). While I do think it's likely some valid criticisms of content within the posts have been missed as a result of such silencing effects under Duncan's posts, I feel confident enough that there's a lot of valuable content that I still think it deserves to score highly in the review.
I think that Duncan was not aspiring to set his own preferred standards, but to figure out the best standards for truth-seeking discourse. I might agree that he did not perfectly succeed, but I'm not sure this means all attempts, if deemed not perfectly successful, should be called "My Guidelines for My Preferred Discourse".
It does seem worth having a term here! +4 for pointing it out and the attempt.
I've been re-reading tons of old posts for the review to remember them and see if they're worth nominating, and then writing a quick review if yes (and sometimes if no).
I've gotten my list of ~40 down to 3 long ones. If anyone wants to help out, here are some I'd appreciate someone re-reading and giving a quick review of in the next 2 days.
- To Predict What Happens, Ask What Happens by Zvi
- A case for AI alignment being difficult by Jessicata
- Alexander and Yudkowsky on AGI goals by Scott Alexander & Eliezer Yudkowsky
A great, short post. I think it retreads some of the same ground that I aimed to point at in A Sketch of Good Communication, and I think in at least one important regard it does much better. I give this +4.
I think the analogy in this post makes a great point very clearly, and improves upon the discussion of how those who control the flow of information mislead people. +4
I have various disagreements with some of the points in this post, and I don't think it adds enough new ideas to be strongly worthy of winning the annual review, but I am grateful to have read it, and for worthwhile topics it helps to retread the same ground in slightly different ways with some regularity. I will give this a +1 vote.
(As an example disagreement, there's a quote of a fictional character saying "There will be time enough for love and beauty and joy and family later. But first we must make the world safe for them." A contrary hypothesis I believe in more is that growing from children into adults involves bringing to life all parts of us that have been suffocated by Moloch, including many of these very powerful very human parts, and it is not good for these parts of us to be lost to the world until after the singularity.)
I like something about this post. It might just be the way it's setting up to save conversations that are going sideways. Anyway, I'd be interested to hear from the author how much use this post ended up getting. For now, I'll give it a positive vote in the review.
I re-read about 1/3rd of this while looking through posts to nominate. I think it's an account of someone who believes in truth-seeking, engaging with the messy political reality of an environment that cared about the ideals of truth-seeking far more than most other places on earth, and finding it to either fall short or sometimes betray those ideals. Personally I find a post like this quite helpful to ruminate on and read, to think about my ideals and how they can be played out in society.
I can't quickly tell if it is the right thing for the LW review or not; it feels a bit more like a personal diary, with a person telling their version of a complicated series of events, with all of the epistemic limitations of such an account (i.e. there will often be multiple other perspectives on what happened that I would want to hear before I am confident about what actually happened)... though with more care and aspiration to truth-seeking ideals than most people would care to put in or even know one could aspire to.
I'll think about it more later, but for now I'm giving it a positive vote to see it through the next phase of the review.
I feel more responsibility to be the person holding/tracking the earnest hypothesis in a 1-1 context, or if I am the only one speaking; in larger group contexts I tend to mostly ask "Is there a hypothesis here that isn't or likely won't be tracked unless I speak up" and then I mostly focus on adding hypotheses to track (or adding evidence that nobody else is adding).
I don't know how to quickly convey why I find this point so helpful, but I find this to be a helpful pointer to a key problem, and the post is quite short, and I hope someone else positively votes on it. +4.
I also believe that the data making EA+CEA look bad is the causal reason why it was taken down. However, I want to add some slight nuance.
I want to contrast a model whereby Angelina Li did this while explicitly trying to stop CEA from looking bad, versus a model whereby she senses that something bad might be happening and that she might be held responsible (e.g. within her organization / community), and is executing a move that she's learned is 'responsible' from the culture around her.
I think many people have learned to believe the reasoning step "If people believe bad things about my team that I think are mistaken, based on the information I've given them, then I am responsible for not misinforming people, so I should take the information away, because it is irresponsible to cause people to have false beliefs". I think many well-intentioned people will say something like this, and that this is probably because of two reasons (borrowing from The Gervais Principle):
- This is a useful argument for powerful sociopaths to use when they are trying to suppress negative information about themselves.
- The clueless people below them in the hierarchy need to rationalize why they are following the orders of the sociopaths to prevent people from accessing information. The idea that they are 'acting responsibly' is much more palatable than the idea that they are trying to control people, so they willingly spread it and act in accordance with it.
A broader model I have is that there are many such inference-steps floating around the culture that well-intentioned people can accept as received wisdom, and they got there because sociopaths needed a cover for their bad behavior and the clueless people wanted reasons to feel good about their behavior; and each of these adversarially optimized inference-steps needs to be fought and destroyed.
Thanks again.
I am currently holding a rough hypothesis of "when someone is interested in exploring psychosis and psychedelics, they become more interested in Michael Vassar's ideas", in that the former causes the latter, rather than the other way around.
Is that a claim of this post? It's a long post so I might be forgetting a place where Zvi writes that, but I think most of the relevant parts of this book review are about how MacAskill and EAs are partly responsible for empowering Sam Bankman-Fried, by supporting him with great talent, trust with funders, and a positive public image.
Thanks for answering; good to hear that you don't think you've had any severe or long-lasting consequences (though it sounds like one time LSD was a contributor to your episode of bad mental health).
I guess here's another question that seems natural: it's been said that some people take LSD either on the personal advice of Michael Vassar, or otherwise as a result of reading/discussing his ideas. Is either of those true for you?
I am somewhat confused about this.
To be clear I am pro people from organizations I think are corrupt showing up to defend themselves, so I would upvote it if it had like 20 karma or less.
I would point out that the comments criticizing the organization's behavior and character are getting similar vote levels (e.g. the top comment calls OpenAI reckless and unwise and has 185 karma and 119 agree-votes).
Yes, there is, I’ll get the post up today.
...did you try to 'induce psychosis' in yourself by taking psychedelics? If so I would also ask about how much you took and if you had any severe or long-lasting consequences.
+9. This is an at-times hilarious, at-times upsetting story of how a man gained a massive amount of power and built a corrupt empire. It's a psychological study, as well as a tale of a crime, hand-in-hand with a lot of naive ideologues.
I think it is worthwhile for understanding a lot about how the world currently works, including understanding individuals with great potential for harm, the crooked cryptocurrency industry, and the sorts of nerds in the world who falsely act in the name of good.
I don't believe that all the details here are fully accurate, but I believe enough of them are for this to be a story worth reading.
(It is personally upsetting to me that the person who was ~King over me and everyone I knew professionally and personally turned out to be such a spiritually-hollow crook, and to know how close I am to being in a world where his reign continues.)
I think that someone reading this would be challenged to figure out for themselves what assumptions they think are justified in good discourse, and would fix some possible bad advice they took from reading Sabien's post. I give this a +4.
(Below is a not especially focused discussion of some points raised; perhaps after I've done more reviews I can come back and tighten this up.)
Sabien's Fifth guideline is "Aim for convergence on truth, and behave as if your interlocutors are also aiming for convergence on truth."
My guess is that the idea that motivates Sabien's Fifth Guideline is something like "Assume by-default that people are contributing to the discourse in order to share true information and strong arguments, rather than posing as doing that while sharing arguments they don't believe or false information in order to win", out of a sense that there is indeed enough basic trust to realize this as an equilibrium, and also a sense that this is one of the ~best equilibriums for public discourse to be in.
One thing this post argues is that a person's motives are of little interest when one can assess their arguments. Argument screens off authority and many other things too. So we don't need to make these assumptions about people's motives.
There's a sense in which I buy that, and yet also a sense in which the epistemic environment I'm in matters. Consider two possibilities:
- I'm in an environment of people aspiring to "make true and accurate contributions to the discourse" but who are making many mistakes/failing.
- I'm in an environment of people who are primarily sharing arguments and evidence filtered to sound convincing for positions that are convenient to them, and who are pretending to be the sort of people described in the first one.
I anticipate very different kinds of discussions, traps, and epistemic defenses I'll want to have in the two environments, and I do want to treat the individuals differently.
I think there is a sense in which I can just focus on local validity and evaluating the strength of arguments, and that this is generally more resilient to whatever the particular motives are of the people in the local environment, but my guess is that I should still relate to people and their arguments differently, and invest in different explanations or different incentives or different kinds of comment thread behavior.
I also think this provides good pushbacks on some possible behaviors people might take away from Sabien's fifth guideline. (I don't think that this post correctly understands what Sabien is going for, but I think bringing up reasonable hypotheses and showing why they don't make sense is helpful for people's understanding of how to participate well in discourse.)
Simplifying a bit, this is another entry in the long-running discourse on how adversarially one should model individuals in public discourse, and what assumptions to make about other people's motives, and I think this provides useful arguments about that topic.
I give this a +9, one of the most useful posts of the year.
I think that a lot of these are pretty non-obvious guidelines that make sense when explained, and I continue to put effort into practicing them. Separating observations and inferences is pro-social, making falsifiable claims is pro-social, etc.
I like this document both for carefully condensing the core ideas into 10 short guidelines, and also having longer explanations for those who want to engage with them.
I like that it’s phrased as guidelines rather than rules/norms. I do break these from time to time and endorse it.
I don't agree with everything (this is not an endorsement; I have many nuances and different phrasings and different standards), but I think this is a worthwhile document for people to study, especially those approaching this sort of discourse for the first time, and it's very well-written.
It's a fine post, but I don't love this set of recommendations and justifications, and I feel like rationalist norms & advice should be held to a high standard, so I'm not upvoting it in the review. I'll give some quick pointers to why I don't love it.
- Truth-Seeking: Seems too obvious to be useful advice. Also, I disagree with the subpoint about never treating arguments like soldiers; I think two people inhabiting the roles of opposing debate-partners is sort of captured by this, and I think that is a healthy truth-seeking process.
- Non-Violence: All the examples of things you're not supposed to do in response to an argument are things you're not supposed to do anyway. Also it seems too much like it's implying the only response to an argument is a counter-argument. Sometimes the correct response to a bad argument is to fire someone or to attempt to politically disempower them. As an example, Zvi Mowshowitz presents evidence and argument in Repeal the Jones Act of 1920 that there are a lot of terrible and disingenuous arguments being put forward by unions that are causing the total destruction of the US shipping industry. The generator of these arguments seems reliably non-truth-tracking, and I would approve of someone repealing the Jones Act without persuading such folks or spending the time to refute each and every argument.
- Non-Deception: I'll quote the full description here:
- "Never try to steer your conversation partners (or onlookers) toward having falser models. Where possible, avoid saying stuff that you expect to lower the net belief accuracy of the average reader; or failing that, at least flag that you're worried about this happening."
- I think that the space of models one walks through is selected for both accuracy and usefulness. Not all models are equally useful. I might steer someone from a perfectly true but vacuous model, to a less perfect but more practical model, thereby net reducing the accuracy of a person's statements and beliefs (most of the time). I prefer something more like a standard of "Intent to Inform".
Various other ones are better, some are vague, many things are presented without justification and I suspect I might disagree if it was offered. I think Zack M. Davis's critique of 'goodwill' is good.
I disagree with the first half of this post, and agree with the second half.
"Physicist Motors" makes sense to me as a topic. If I imagine it as a book, I can contrast it with other books like "Motors for Car Repair Mechanics" and "Motors for Hobbyist Boat Builders" and "Motors for Navy Contract Coordinators". These would focus on other aspects of motors such as giving you advice for materials to use and which vendors to trust or how to evaluate the work of external contractors, and give you more rules of thumb for your use case that don't rely on a great deal of complex mathematical calculations (e.g. "how to roughly know if a motor is strong enough for your boat as a function of the weight and surface area of the boat"). The "Physicist Motors" book would focus on the math of ideal motors and doing experiments to see the basic laws of physics at play.
Similarly, many places want norms of discourse, or have goals for discourse, and a rationalist focus would connect it to principles of truth-seeking more directly (e.g. in contrast with norms of "YouTube Discourse" or "Playful/Friendly Discourse").
So I don't believe that it is a confused thing to do, to outline practical heuristics or norms for rationalist discourse as opposed to other kinds of discourse or other goals one might have with discourse.
In contrast, this critique seems of a valid type:
"A vague spirit of how to reason and argue" seems like an apt description of what "Basics of Rationalist Discourse" and "Elements of Rationalist Discourse" are attempting to codify—but with no explicit instruction on which guidelines arise from deep object-level principles of normative reasoning, and which from mere taste, politeness, or adaptation to local circumstances
Arguing that the principles/heuristics proposed are in conflict with the underlying laws of probability theory and such is a totally valid kind of critique. And I think the critique of the "goodwill" heuristic is pretty good.
My take is that if you positively vote on Bensinger's "Elements of Rationalist Discourse" then it makes sense to also upvote this post in the review as it is a counterpoint that has a good critique, but I wouldn't otherwise, as I disagree with the core analogy.
Hear, hear!
At least Anthropic didn't particularly try to be a big commercial company making the public excited about AI. Making the AI race a big public thing was a huge mistake on OpenAI's part, and is evidence that they don't really have any idea what they're doing.
I just want to point out that I don't believe this is the case; I believe that the CEO is attempting to play games with the public narrative that benefit his company financially.
I... think that reading personal accounts of psychotic people is useful for understanding the range of the human psyche and what insanity looks like? My guess is that on the margin it would be good for most people to have a better understanding of that, and reading this post will help, so I'm giving this a +1 for the LW review.
Thanks for writing it.
Much of the time I worry about ways in which I and everyone around me may be insane in ways we haven't noticed. Reading this, I've thought for the first time that perhaps I and most people I know are doing quite well on the sanity axis.
Fun post, but insofar as it's mostly expository of some basic game theory ideas, I think it doesn't do a good enough job of communicating that the starting assumption is that one is in a contrived (but logically possible) equilibrium. Scott Alexander's example is clearer about this. So I am not giving it a positive vote in the review (though I would for an edited version that fixed this issue).
Did this happen yet? I would even just be into a short version of this (IMO good) post.
Reading this post reminds me of my standard online heuristic: just because someone is spending a lot of effort writing about you, does not mean that it is worth a minute of your time to read it.
(This is of course a subset of the general heuristic that most writing has nothing worth reading in it; but it bears keeping in mind that this doesn't change when the writing is about you.)
I have made a Manifold market for predicting how much we will raise! Get your bets in.
I wanted a datapoint for Czynski's hypothesis that LW 2.0 killed the comment sections, so I checked how many comments your blogposts were getting in the first 3 months of 2017 (before LW 2.0 rebooted). There were 13 posts, and the comment counts were 0, 0, 2, 6, 9, 36, 0, 5, 0, 2, 0, 0, 2. (The 36 was a political post in response to the US election, discussion of which I generally count as neutral or negative on LW, so I'd discount this.)
I'll try the same for Zvi. 13, 8, 3, 1, 3, 18, 2, 19, 2, 2, 2, 5, 3, 7, 7, 12, 4, 2, 61, 31, 79. That's more active (the end was his excellent sequence Against Facebook, and the last one was a call for people to share links to their blogs).
So that's not zero, there was something to kill. How do those numbers compare during LessWrong 2.0? My sense is that there's two Zvi eras, there's the timeless content (e.g. Mazes, Sabbaths, Simulacra) and the timeful content (e.g. Covid, AI, other news). The latter is a newer, more frequent, less deep writing style, so it's less apples to apples, so instead let's take the Moral Mazes sequence from 2020 (when LW 2.0 would've had a lot of time to kill Zvi's comments). I'm taking the 17 posts in this main sequence and counting the number of comments on LW and Wordpress.
| # | LW | Wordpress |
|---|----|-----------|
| 1 | 16 | 5 |
| 2 | 40 | 19 |
| 3 | 29 | 23 |
| 4 | 8 | 12 |
| 5 | 7 | 21 |
| 6 | 56 | 10 |
| 7 | 6 | 13 |
| 8 | 12 | 8 |
| 9 | 18 | 8 |
| 10 | 21 | 18 |
| 11 | 26 | 21 |
| 12 | 42 | 16 |
| 13 | 6 | 11 |
| 14 | 9 | 15 |
| 15 | 14 | 18 |
| 16 | 11 | 19 |
| 17 | 28 | 22 |
| SUM | 349 | 259 |
This shows the comment section on Wordpress was about as active as it was in the 3-month period above (259 vs 284 comments) during the two months over which the Mazes sequence was released, and the comments were more evenly distributed (median of 16 vs 5). And it shows that the LessWrong comment section more than doubled the amount of discussion of the posts, without reducing the total discussion on Zvi's wordpress blog.
These bits of data aren't consistent with LW killing other blogs. FWIW my alternative hypothesis is that these things are synergistic (e.g. I also believe that the existence of LessWrong and the EA Forum increases discussion on each), and I think that is more consistent with the Zvi commenting numbers.
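For anyone who wants to double-check the tallies, here's a minimal Python sketch that reproduces the totals from the counts transcribed above (the variable names are mine, just for illustration):

```python
# Comment counts as transcribed above.
friend_early_2017 = [0, 0, 2, 6, 9, 36, 0, 5, 0, 2, 0, 0, 2]
zvi_early_2017 = [13, 8, 3, 1, 3, 18, 2, 19, 2, 2, 2, 5, 3, 7, 7,
                  12, 4, 2, 61, 31, 79]

# Mazes sequence (2020): (LessWrong comments, Wordpress comments) per post.
mazes = [(16, 5), (40, 19), (29, 23), (8, 12), (7, 21), (56, 10), (6, 13),
         (12, 8), (18, 8), (21, 18), (26, 21), (42, 16), (6, 11), (9, 15),
         (14, 18), (11, 19), (28, 22)]

print(sum(friend_early_2017))        # 62   friend's blog, Jan-Mar 2017
print(sum(zvi_early_2017))           # 284  Zvi's Wordpress, Jan-Mar 2017
print(sum(lw for lw, _ in mazes))    # 349  Mazes sequence, LessWrong
print(sum(wp for _, wp in mazes))    # 259  Mazes sequence, Wordpress
```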
I agree that which terms people use vs taboo is a judgment call, I don't mean to imply that others should clearly see these things the same as me.
As my 2 cents, the phrase 'deadname' to me sounded like it caught on because it was hyperbolic and imputes aggression – similar to how phrases like trauma caught on (which used to primarily refer to physical damage, as in the phrase "blunt-force trauma") and notions spread that "words can be violence" (which seems to me to be bending the meaning of words like 'violence' too far and is trying to get people on board for a level of censorship that isn't appropriate). I similarly recall seeing various notions on social media that not using the requested pronouns for transgender people constituted killing them, due to the implied background levels of violence towards such people in society.
Overall this leaves me personally choosing not to use the term 'deadname' and I reliably taboo it when I wish to refer to someone using the person's former alternative-gendered name.
I've updated future posts to have a start time of 6:30 and doors open at 6pm.
Well that escalated quickly (at the very end).
cabotage
I assumed this was a typo for 'sabotage' the first time I saw it. For those wondering, here's a definition from Google:
restriction of the operation of sea, air, or other transport services within or into a particular country to that country's own transport services.
By contrast, a report by the pro-Jones Act American Maritime Partnership claims ‘the Jones Act is responsible for’ 13,000 jobs and adding $3.3 billion to the economy, which means that is currently the value to Hawaii of all shipborne trade with America.
Noob question: is this supposed to be low or high? Or is this just a list of datapoints regardless of how they fall?
Curated![1] I found this layout of how contracts/agreements get settled on in personal conversation very clarifying.
[1] "Curated", a term which here means "This just got emailed to 30,000 people, of whom typically half open the email, and it gets shown at the top of the frontpage to anyone who hasn't read it for ~1 week."
Oh interesting, thanks for the feedback. I think I illusion-of-transparency'd that people would feel fine about arriving in the 6:15-6:30 window. In my head the group discussions start at about 6:30. I'll make a note to update the description hopefully for next time.