Comments
Yeah, bad habits are a bitch.
One of the ways by which these kinds of strategies get implemented is that the psyche develops a sense of extreme discomfort around acting in the "wrong" way, with successful execution of that strategy then blocking that sense of discomfort. For example, the thought of not apologizing when you thought someone might be upset at you might feel excruciatingly uncomfortable, with that discomfort subsiding once you did apologize.
Interesting. I've had friends who had this "really needs to apologize when they think they might have upset me" thing, and something I noticed is that when they don't over-apologize, they feel the need to point that out too.
I never thought too deeply about it, but reading this, I'm thinking maybe their internal experience was "I just felt really uncomfortable for a moment but I overcame my discomfort, I'm proud of that, I should tell him about it".
My default assumption for any story that ends with "And this is why our ingroup is smarter than everyone else and people outside won't recognize our genius" is that the story is self-serving nonsense, and this article isn't giving me any reason to think otherwise.
A "userbase with probably-high intelligence, community norms about statistics and avoiding predictable stupidity" describes a lot of academia. And academia has a higher barrier to entry than "taking the time to write some blog articles". The average lesswrong user doesn't need to run an RCT before airing their latest pet theory for the community, so why would it be so much better at selectively spreading true memes than academia is?
I would need a much more thorough gears-level model of memetic spread of ideas, one with actual falsifiable claims (you know, like when people do science), before I could accept the idea of LessWrong as some kind of genius incubator.
I don't find that satisfying. Anyone can point out that a perimeter is "reasonably hard" to breach by pointing at a high wall topped with barbed wire, and naive observers will absolutely agree that the wall sure is very high and sure is made of reinforced concrete.
The perimeter is still trivially easy to breach if, say, the front desk is susceptible to social engineering tactics.
Claiming that an architecture is even reasonably secure still requires looking at it with an attacker's mindset. If you just look at the parts of the security you like, you can make a very convincing-sounding case that still misses glaring flaws. I'm not saying that's definitely what this article does, but it sure is giving me that vibe.
I find that in these situations, doing anything agentic at all can break you out of the spiral. Exercise is a good example: you can just do five pushups and tell yourself "That's enough for today, tomorrow I'll get back to my full routine".
Even if you choose to give the agent internet access from its execution server, it’s hard to see why it needs to have enough egress bandwidth to get the weights out. See here for more discussion of upload limits for preventing weight exfiltration.
The assumption is that the model would be unable to exert any self-sustaining agency without getting its own weights out.
But the model could just program a brand new agent to follow its own values by re-using open-source weights.
If the model is based on open-source weights, it doesn't even need to do that.
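To put some rough numbers on why I don't find the bandwidth argument reassuring (back-of-envelope, every figure below is invented):

```python
# Rough back-of-envelope; every number here is invented, not from the post.
weights_bytes  = 400e9 * 2     # ~400B params at 2 bytes each ≈ 800 GB of weights
scaffold_bytes = 200e3         # a goal spec + agent scaffold: a few hundred KB of text/code
egress_mbps    = 1             # a deliberately throttled 1 Mbit/s egress link

def upload_seconds(n_bytes, mbps):
    return n_bytes * 8 / (mbps * 1e6)

print(f"weights:  {upload_seconds(weights_bytes, egress_mbps) / 86_400:,.0f} days")
print(f"scaffold: {upload_seconds(scaffold_bytes, egress_mbps):.1f} seconds")
```

An upload limit makes the first number hopeless, but it does nothing about the second, and the second is all you need if capable open-source weights already exist on the outside.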
Overall, this post strikes me as not following a security mindset. It's the kind of post you'd expect an executive to write to justify to regulators why their system is sufficiently safe to be commercialized. It's not the kind of post you'd expect a security researcher to write after going "Hmm, I wonder how I could break this".
I mean, we're getting this metaphor off its rails pretty fast, but to derail it a bit more:
The kind of people who lay human-catching bear traps aren't going to be fooled by "Oh he's not moving it's probably fine".
Everybody likes to imagine they'd be the one to survive the raiding/pillaging/mugging, but the nature of these predatory interactions is that the people doing the victimizing have a lot more experience and resources than the people being victimized. (Same reason lots of criminals get caught by the police.)
If you're being "eaten", don't try to get clever. Fight back, get loud, get nasty, and never follow the attacker to a second location.
I wonder if OpenAI's o1 changes the game here. Its architecture seems specifically designed for information retrieval.
As I understand it, the point of "intention to treat" RCTs is that there will be roughly as many people with high IQ in both groups, since they're picked at random. People who get the advice but don't listen aren't moved to the "didn't get advice" group.
So what the study measures is "On average, how much of an effect will a doctor telling you to breastfeed have on you?". The results are more noisy, but less vulnerable (or even immune?) to confounders.
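If it helps, here's a toy simulation of that intuition (all numbers invented): a confounder drives both who breastfeeds and the outcome, but because the advice is randomized, the intention-to-treat comparison stays clean while the naive breastfed-vs-not comparison gets inflated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

iq = rng.normal(100, 15, n)                    # the confounder
advice = rng.integers(0, 2, n)                 # randomized: doctor advises breastfeeding or not
# Higher-IQ parents are more likely to follow the advice (and to breastfeed anyway).
p_breastfeed = 0.2 + 0.4 * advice * (iq > 100) + 0.1 * (iq > 115)
breastfed = rng.random(n) < p_breastfeed
# Outcome depends on the confounder plus a true breastfeeding effect of +2.0.
outcome = 0.1 * (iq - 100) + 2.0 * breastfed + rng.normal(0, 5, n)

# Intention-to-treat: compare by what was *assigned* (advice), not by what people did.
itt = outcome[advice == 1].mean() - outcome[advice == 0].mean()
# Naive comparison by actual behaviour, which inherits the IQ confound.
naive = outcome[breastfed].mean() - outcome[~breastfed].mean()
print(f"ITT estimate (diluted but unconfounded): {itt:.2f}")
print(f"naive breastfed-vs-not (confounded):     {naive:.2f}")
```

The ITT number comes out well below the true +2.0 per-person effect (because not everyone follows the advice), but it isn't biased by IQ; the naive comparison overshoots because the breastfed group skews smarter.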
I'm a little confused. Is Claude considered a reliable secondary source now? Did you not even check Wikipedia?
I'm as enthusiastic about the power of AI as the next guy, but that seems like the worst possible way to use it.
Yeah, that was my first reaction to that section as well.
Most people are not remotely open to having an unsolicited in-depth discussion of their politeness algorithm at the end of a hangout.
On the other hand, "What time do you want me to leave? Maybe 8pm?" works fine in my experience, for reasons the post covers well.
The downside is that people may overreact to the reveal, or react proportionately in ways you don’t like. Any retrospective is likely to include some of this (e.g. check out the comments on Adam’s),
I'm not sure I see it. The comments on that series seem mostly positive. The comments on the first post seem overwhelmingly positive.
We think this occurs because in general there are groups of belief states that are degenerate in the sense that they have the same next-token distribution. In that case, the formalism presented in this post says that even though the distinction between those states must be represented in the transformers internal, the transformer is able to lose those distinctions for the purpose of predicting the next token (in the local sense), which occurs most directly right before the unembedding.
I wonder if you could force the Mixed-State Presentation (MSP) to be "conserved" in later layers by training the model with different objectives. For instance, training on next-token prediction and next-token-after-that prediction might force the model to be a lot more "rigorous" about its MSP.
Papers from Google have shown that you can get more predictable results from LLMs if you train them on both next-token prediction and "fill-in-the-blanks" tasks where random tokens are removed from the middle of a text. I suspect something similar would apply here.
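To make the first idea concrete, here's a minimal sketch of the kind of two-horizon objective I have in mind (toy PyTorch, every name and size made up, nothing from the paper): one shared backbone, one head predicting the next token and another predicting the token after that.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHorizonLM(nn.Module):
    """Toy decoder-only LM with two heads: one predicts token t+1, the other t+2."""
    def __init__(self, vocab_size=256, d_model=128, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head_next = nn.Linear(d_model, vocab_size)    # next-token head
        self.head_skip = nn.Linear(d_model, vocab_size)    # next-next-token head

    def forward(self, x):
        causal = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
        h = self.backbone(self.embed(x), mask=causal)
        return self.head_next(h), self.head_skip(h)

def two_horizon_loss(model, x):
    logits1, logits2 = model(x)
    vocab = logits1.size(-1)
    # predict x[t+1] from position t, and x[t+2] from position t
    loss1 = F.cross_entropy(logits1[:, :-1].reshape(-1, vocab), x[:, 1:].reshape(-1))
    loss2 = F.cross_entropy(logits2[:, :-2].reshape(-1, vocab), x[:, 2:].reshape(-1))
    return loss1 + loss2

x = torch.randint(0, 256, (8, 64))        # a batch of toy token sequences
print(two_horizon_loss(TwoHorizonLM(), x))
```

The hope would be that the second head forces the residual stream to carry belief-state information one step longer, so the MSP geometry survives deeper into the network; the fill-in-the-blanks variant would swap the second head for a span-corruption objective.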
Or maybe speaking French automatically makes you healthier. I'm gonna choose to believe it's that one.
Seed oil folks often bring up the French paradox, the (controversial) claim that French people are/were thin and have low cardiovascular disease despite eating lots of saturated-fat-rich croissants or whatever.
As a French person hearing about this for the first time, that claim indeed seems pretty odd.
If I were asked to list the lifestyle differences between France and the US with the biggest impact on public health, I would think of lower car dependency, better access to farmers' markets, stricter regulations on industrial food processing (especially sugar content in sodas), smaller portions served in restaurants, pharmacies not doubling as junk food shops, the absence of food deserts, public health messaging (eg every junk food ad carrying a "please don't eat this, kids" type disclaimer), etc... way before I thought of the two croissants a week I eat.
Viennoiseries are an occasional food for most people, not a staple. Now if you wanted to examine a French-specific high-carb staple, baguettes are a pretty good option: almost all middle-class households buy at least one a day.
Did you ever get one of your clients to use the "Your honor, I'm very sorry, I'll never do it again" line?
This was not at all obvious from the inside. I can only imagine a lot of criminal defendants have a similar experience. Defense attorneys are frustrated that their clients don't understand that they're trying to help—but that "help" is all within the rules set by the justice system. From the perspective of a client who doesn't think he did anything particularly wrong (whether or not the law agrees), the defense attorney is part of the system.
I mean... you're sticking to generalities here, and implying that the perspective of the client who thinks he didn't do anything wrong is as valid as any other perspective.
But if we try to examine some specific common case, eg: "The owner said you robbed his store, the cameras showed you robbing his store, your fingerprints are on the register", then the client's fury at the attorney "working with the prosecutor" doesn't seem very productive?
The problem isn't that the client is disagreeing with the system about the moral legitimacy of robbing a store. The problem is that the client is looking for a secret trick so the people-who-make-decisions-about-store-robberies will think he didn't rob the store and that's not gonna happen.
With that in mind, saying the attorney is "part of the system" is... well, maybe it's factually true, but it implicitly blames the robber's predicament on the system and on his attorney in a way that just doesn't make sense. The robber would be just as screwed if he was represented by eg his super-wealthy uncle with a law degree who loves him dearly.
(I don't know about your psychiatric incarceration, so I'm not commenting on it. Your situation is probably pretty different to the above.)
“Well, when we first met, you told me that you never touched the gun,” I reminded him with an encouraging smile. “Obviously you wouldn’t lie to your own lawyer, and so what I can do is get a fingerprint expert to come to the jail, take your prints, then do a comparison on the gun itself. Since you never touched the gun, the prints won’t be a match! This whole case will get dismissed, and we can put all this behind you!”
For the record, I am now imagining you as Bob Odenkirk while you're delivering that line.
The point about task completion times feels especially insightful. I think I'll need to go back to it a few times to process it.
I think Duncan's post touches on something this post misses with its talk of "social API": apologies only work when they're a costly signal.
The people you deliver the apology to need to feel it cost you something to make that apology, either pride or effort or something valuable; or at least that you're offering to give up something costly to earn forgiveness.
The slightly less Machiavellian version is to play Diplomacy with them.
(Or do a group project, or go to an escape room, or any other high-tension, low-stakes scenario.)
I think "API calls" are the wrong way to word it.
It's more that an apology is a signal; to make it effective, you must communicate that it's a real signal reflecting your actual internal processes, and not a result of a surface-level "what words can I say to appear maximally virtuous" process.
So for instance, if you say a sentence equivalent to "I admit that I was wrong to do X and I'm sorry about it, but I think Y is unfair", then you're not communicating that you underwent the process of "I realized I was wrong, updated my beliefs based on it, and wondered if I was wrong about other things".
I'm not entirely sure what the simplest fix is
A simple fix would be "I admit I was wrong to do X, and I'm sorry about it. Let me think about Y for a moment." And then actually think about Y, because if you did one thing wrong, you probably did other things wrong too.
This seems to have generated lots of internal discussions, and that's cool on its own.
However, I also get the impression this article is intended as external communication, or at least as a prototype of something that might become external communication; I'm pretty sure it would be terrible at that. It uses lots of jargon, overly precise language, references to other alignment articles, etc. I've tried to read it three times over the past week and gave up after the third attempt.
I think I'm missing something obvious, or I'm missing some information. Why is this clearly ridiculous?
Nuclear triad aside, there's the fact that the Arctic is more than 1,000 miles away from the nearest part of the contiguous US (about 1,700 miles from Montana, 3,000 miles from Texas), and that Siberia is already roughly as close.
And of course, there's the fact that the Arctic is made of, well, ice, which melts more and more as the climate warms, and is thus not the best place to build a missile base.
Even without any familiarity with nuclear politics, the distance part can be checked in less than 2 minutes on Google Maps; if you have access to an internet connection and judges who penalize blatant falsehoods like "they can hit us from the Arctic", you can absolutely wreck your adversary with some quick checking.
Of course, in a lot of debate formats you're not allowed the two minutes it would take to do a Google Maps check.
Yeah, stumbling on this after the fact, I'm a bit surprised that among the 300+ comments barely anybody is explicitly pointing this out:
I think of myself as playing the role of a wise old mentor who has had lots of experience, telling stories to the young adventurers, trying to toughen them up, somewhat similar to how Prof Quirrell[8] toughens up the students in HPMOR through teaching them Defense Against the Dark Arts, to deal with real monsters in the world.
I mean... that's a huge, obvious red flag, right? People shouldn't claim Voldemort as a role model unless they're a massive edgelord. Quirrell/Voldemort in that story is "toughening up" the students to exploit them; he teaches them to be footsoldiers, not freedom fighters or critical thinkers (Harry is the one who does that) because he's grooming them to be the army of his future fascist government. This is not subtext, it's in the text.
HPMOR's Quirrell might be EA's Rick Sanchez.
I've just watched Disney's Strange World, which explicitly features a cohabitive game in its plot called Primal Outpost.
The rules aren't really shown; we just know that it's played with terrain tiles, there are monsters, and the goal is ultimately to create a sustainable ecosystem. The concept honestly looked really cool, but the movie underperformed, so I don't think we're going to see a tie-in game, unfortunately.
But it shows that the basic idea of a cohabitive game is more appealing than you might think!
(No but seriously, if anyone knows of a Primal Outpost knock-off, I need to know about it.)
I get an "Oops! You don't have access to this page" error.
This makes a lot of sense to me and helps me articulate things I've thought for a while. (Which, you know, is the shit I come to LessWrong for, so big thumbs up!)
One of the first times I had this realization was during one of my first professional experiences. It was the first time in my life that I was on a team that wasn't just LARPing a set of objectives, but actually trying to achieve them.
They weren't impressively competent, or especially efficient, or even especially good at their job. The objectives weren't especially ambitious: it was a long-running project in its final year, and everybody was just trying to ship the best product they could, where the difference between "best they could" and "mediocre" wasn't huge.
But everyone was taking the thing seriously. People in the team were communicating about their difficulties, and anticipating problems ahead of time. Managers considered trade-offs. Developers tried to consider the UX that end-users would face.
Thinking about it, I'm realizing that knowing what you're doing isn't the same as being super good at your job, even if the two are strongly correlated. What struck me about this team wasn't that they were the most competent people I ever worked with (that would probably be my current job), it's that they didn't feel like they were pretending to be trying to achieve their objectives (again, unlike my current job).
And sometimes people will say things to me like "capitalism ruined my twenties" and I have a similarly eerie feeling about, like it's a gestalt slapped together
Ugh, that one annoys me so much. Capitalism is a word so loaded it has basically lost all meaning.
Like, people will say things like "slavery is inextricably linked to capitalism" and I'm thinking, hey genius, slavery existed in tribal civilizations that didn't even have the concept of money, what do you think capitalism even is?
(Same thing for patriarchy.)
Anecdotally, I've had friends who explicitly asked for an honest answer to these kinds of questions, and who, if given a positive answer, would tell me "but you'd tell me if it was negative, right?"... and still, when given a negative answer, would absolutely take it as a personal attack and get angry.
Obviously those friendships were hard to maintain.
Often when people say they want an honest answer, what they mean is "I want you to say something positive and also mean it", they're not asking for something actionable.
And that, kids, is why nobody wants to date or be friends with a rationalist.
I meant "get players to cooperate within a cooperative-game-with-prisoners-dilemmas", yes.
So imagine how much richer expressions of character could be if you had this whole other dimension of gameplay design to work with. That would be cohabitive.
4X games and engine-building games have a lot of that. For instance, in Terraforming Mars, the starting corporations have different starting bonuses that radically shape your strategy throughout the entire game. In a 4X game, you might have a faction with very cheap military production that will focus on zerg-rushing other players, and a faction with research bonuses that will focus more on long-term growth.
Even in a MostPointsWin system, these differences can give different factions very different "personalities" in both gameplay and lore.
Actually, I feel like a lot of engine-building systems could go from MostPointsWin to cooperation by just adding some objectives. Eg you could make Terraforming Mars cooperative just by adding objectives like "Create at least X space projects", "Reach oxygen level Y", "Have at least Z trees on the planet" and having each player pick one. Which is basically what the video game Terraformers did (since it's a single-player game, MostPointsWin can't work).
Some other game design elements:
- One easy way to get players to cooperate is to give them a common loss condition. You mention having a Moloch player, which can be pretty cool (asymmetric gameplay is always fun), but the threat can also be environmental. Something like "everyone must donate at least X coal per turn to the communal furnace, or everyone freezes to death", or the opposite, "every coal burned contributes to global warming, and past a certain cap everybody loses".
- Like you say, binary win/lose conditions can be more compelling than "get as many points as possible". (I think this is a major reason MostPointsWin systems are so common.) You can easily get from one to the other by having eg "medals", where you get a gold medal for 20 points, a silver medal for 15 points, etc. Or with custom objectives: the Emperor's gold medal is holding 12 tiles and silver is 8, while the Druid's gold medal is preserving at least 8 trees and silver is 6 (a toy sketch of this is below).
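A toy sketch of what those custom thresholds could look like (faction names, metrics and numbers all invented):

```python
# Toy sketch of per-faction "medal" thresholds instead of MostPointsWin.
MEDALS = {
    "Emperor": {"metric": "tiles_held",      "gold": 12, "silver": 8},
    "Druid":   {"metric": "trees_preserved", "gold": 8,  "silver": 6},
}

def medal(faction, final_state):
    rules = MEDALS[faction]
    score = final_state.get(rules["metric"], 0)
    if score >= rules["gold"]:
        return "gold"
    if score >= rules["silver"]:
        return "silver"
    return "no medal"

print(medal("Druid", {"trees_preserved": 7}))   # -> silver
```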
Here's a cooperation game people haven't mentioned yet: Galerapagos.
The base idea is: you're all stuck on a desert island, Robinson Crusoe style. You need to escape before the island's volcano erupts. The aesthetics are loosely inspired by Koh Lanta, the French equivalent of Survivor.
Each player can do a certain number of actions, namely fish, collect water, help build an escape raft, and scavenge for materials. Each player has an individual inventory.
While it's possible for everyone to escape alive, there are some incentives to defect from the group (eg keeping your own stash of food while other players starve to death). From what I've heard, the "tragedy of the commons" elements really start to matter when you have a large (>6) number of players.
Perhaps I'm being dense, and some additional kernel of doubt is being asked of me here. If so, I'd appreciate attempts to spell it out like I'm a total idiot.
I don't know if "dense" is the word I use, but yeah, I think you missed my point.
My ELI5 would be "You're still assuming the problem was 'Kurt didn't know how to use a pump' and not 'there was something wrong with your pump'".
I don't want to speculate too much beyond that eg about the discretionary budget stuff.
Thanks again! (I have read that book, and made changes on account of it that I also file under partial-successes.)
Happy to hear that!
I think it's cool that you're engaging with criticism and acknowledging the harm that happened as a result of your struggles.
And, to cut to the painful part, that's about the only positive thing that I (random person on the internet) have to say about what you just wrote.
In particular, you sound (and sorry if I'm making any wrong assumption here) extremely unwilling to entertain the idea that you were wrong, or that any potential improvement might need to come from you.
You say:
For whatever it's worth: I don't recall wanting you to quit (as opposed to improve).
But you don't seem to consider the idea that maybe you were more in a position to improve than he was.
I don't want to be overly harsh or judgmental. You (eventually) apologize and acknowledge your responsibility in employees having a shitty time, and it's easy for an internet stranger to over-analyze everything you said.
But. I do feel confident that you're expressing a lack of curiosity here. You're assuming that there's nothing you could possibly have done to make Kurt's experience better, and while you're open to hearing it if anyone presents you with a third option, you don't seem to think seeking out a third option is a problem you should actively solve.
My recollection of the thought that ran through my mind when you were like "Well I couldn't figure out how to use a bike pump" was that this was some sideways attempt at begging pardon, without actually saying "oops" first, nor trying the obvious-to-me steps like "watch a youtube video" or "ask your manager if he knows how to inflate a bike tire", nor noticing that the entire hypothesized time-save of somebody else inflating bike tires is wiped out by me having to give tutorials on it.
Like, here... You get that you're not really engaging with what Kurt is/was saying, right?
Kurt's point is that your pump seemed harder to use than other bike pumps. If the issue is on the object level, valid answers could be asking what type of bike pump he's used to and where the discrepancy could come from, suggesting he buy a new pump, or, if you're feeling especially curious, asking him to bring his own pump to work so you can compare the two; or maybe the issue comes not from the pump but from the tires, in which case you could consider changing them, etc.
If the issue is on the meta level and you don't want to spend time on these problems, a valid answer could be "Okay, what do you need to solve this problem without my input?". Then it could be a discussion about discretionary budget, about the amount of initiative you expect him to have in his job, about why he didn't feel comfortable making these buying decisions right away, etc.
Your only takeaway from this issue seems to be "he was wrong and he could obviously have solved it by watching a 5-minute YouTube tutorial; what would have been the most efficient way to communicate to him that he was wrong?". At no point in this reply are you considering (out loud, at least) the hypothesis "maybe I was wrong and I missed something".
Like, I get having a hot temper and saying things you regret because you don't see any other answers in the moment. But part of the process of communicating despite a hot temper is being willing to admit you were wrong.
Perhaps I'm missing some obvious third alternative here, that can be practically run while experiencing a bunch of frustration or exasperation. (If you know of one, I'd love to hear it.)
The best life-hack I have is "Don't be afraid to come back and restart the discussion once you feel less frustration or exasperation".
Long-term, I'd recommend looking into Non-Violent Communication, if you haven't already. There's a lot of cruft in there, but in my experience the core insights work: express vulnerability, focus on communicating your needs and how you feel about things, avoid assigning blame, make negotiable requests, and go from there.
So for the bike tire thing the NVC version would be something like "I need to spend my time efficiently and not have to worry about logistics; when you tell me you're having problems with the pump I feel stressed because I feel like I'm spending time I should spend on more important things. I need you to find a system where you can solve these problems without my input. What do you need to make that happen?"
Of all the things that have increased my cynicism toward the EA ecosystem over the years, none has disturbed me quite as much as the ongoing euphemisms and narrative spin around Nate’s behavior.
I'll make a tentative observation: it seems that you're still being euphemistic and (as you kind of note yourself) you're still self-censoring a bit.
The words that you say are "he's mean and scary" and "he was not subject to the same behavioral regulation norms as everyone else". The words I would have used, given your description and his answer below, are "he acts like an asshole and gets away with it because people enable him".
I've known bosses that were mean and scary, but otherwise felt fair and like they made the best of a tough situation. That's not what you're describing. Maybe Nate is an amazing person in other ways, and amazingly competent in ways that make him important to work with, but. He sounds like a person with extremely unpleasant behavior.
Fascinating paper!
Here's a drive-by question: have you considered experiments that might differentiate between the lottery ticket explanation and the evolutionary explanation?
In particular, your reasoning that the formation of induction heads on the repeated-subsequence tasks disproves the evolutionary explanation seems intuitively sound, but not quite bulletproof. Maybe the model has incentives to develop next-token heads that don't depend on an induction head existing? I dunno, I might have an insufficient understanding of what induction heads do.
Dumb question: what about VR games like Beat Saber?
Do you think there's some potential for applying the skills, logic, and values of the rationalist community to issues surrounding prison reform and helping predict better outcomes?
Ha! Of course not.
Well, no, the honest answer would be "I don't know, I don't have any personal experience in that domain". But the problems I cited (lack of budget, the general population actively wanting conditions not to improve) can't be fixed with better data analysis.
From anecdotes I've had from civil servants, directors love new data analysis tools, because they promise to improve outcomes without a budget raise. Staff hates new data analysis tools because they represent more work for them without a budget raise, and they desperately want the budget raise.
I mean, yeah, rationality and thinking hard about things always helps on the margin, but it doesn't compensate for a lack of budget or political goodwill. The secret ingredients to make a reform work are money and time.
Good summary of beliefs I've had for a while now. I feel like I should come back to this article at some point to unpack some of the things it mentions.
I've tried StarCoder recently, though, and it's pretty impressive. I haven't yet tried to really stress-test it, but at the very least it can generate basic code with a parameter count way lower than Copilot's.
Similarly, do you have thoughts on AISafety.info?
Quick note on AISafety.info: I just stumbled on it and it's a great initiative.
I remember pitching an idea for an AI Safety FAQ (which I'm currently working on) to a friend at MIRI and him telling me "We don't have anything like this, it's a great idea, go for it!"; my reaction at the time was "Well I'm glad for the validation and also very scared that nobody has had the idea yet", so I'm glad to have been wrong about that.
I'll keep working on my article, though, because I think the FAQ you're writing is too vast and may not have quite enough punch; it won't be compelling enough for most people.
Would love to chat with you about it at some point.
I think this is a subject where we'd probably need to hash out a dozen intermediary points (the whole "inferential distance" thing) before we could come close to a common understanding.
Anyway, yeah, I get the whole not-backing-down-to-bullies thing; and I get being willing to do something personally costly to avoid giving someone an incentive to walk over you.
But I do think you can reach a stage in a conversation, the kind that inspired the "someone's wrong on the internet" meme, where all that game theory logic stops making sense and the only winning move is to stop playing.
Like, after a dozen back-and-forths between a few stubborn people who absolutely refuse to cede any ground, especially people who don't think they're wrong or see themselves as bullies... what do you really win by continuing the thread? Do you really leave outside observers with the feeling that "Duncan sure seems right in his counter-counter-counter-counter-rebuttal, I should emulate him" if you engage the other person point-by-point? Would you really encourage a culture of bullying and using-politeness-norms-to-impose-bad-behavior if you instead said "I don't think this conversation is productive, I'll stop now"?
It's like... if you play an iterated prisoner's dilemma, and every player's strategy is "tit-for-tat, always, no forgiveness" (i.e. once you've been defected against, you defect forever), and there's any non-zero likelihood that someone presses the "defect" button by accident, then over a sufficient period of time the steady state will always be "everybody defects, forever". (The analogy isn't perfect, but it's an example of how game theory changes when you play the same game over lots of iterations.)
(And yes, I do understand that forgiveness can be exploited in an iterated prisoner's dilemma.)
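For the curious, here's a quick-and-dirty simulation of that claim (parameters arbitrary): two players running "retaliate and never forgive" with a small chance of a mis-click, measuring how long mutual cooperation survives.

```python
import random

def lock_in_round(noise=0.01, max_rounds=100_000, seed=0):
    """Two 'never forgive' players with trembling hands.
    Returns the first round by which both players have been defected against
    (after which both intend to defect forever)."""
    rng = random.Random(seed)
    triggered = [False, False]             # has player i ever seen the other defect?
    for t in range(max_rounds):
        intended = list(triggered)         # True = defect
        played = [d != (rng.random() < noise) for d in intended]   # noise flips the move
        triggered[0] = triggered[0] or played[1]
        triggered[1] = triggered[1] or played[0]
        if all(triggered):
            return t
    return max_rounds

runs = [lock_in_round(seed=s) for s in range(500)]
print("average round of permanent mutual defection:", sum(runs) / len(runs))
```

With a 1% slip rate, the lock-in arrives within the first fifty-odd rounds on average; forgiveness is exactly the ingredient that prevents it, hence the trade-off above.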
My objection is that it doesn't distinguish between [unpleasant fights that really should in fact be had] from [unpleasant fights that shouldn't].
Again, I don't think I have a sufficiently short inferential distance to convince you of anything, but my general vibe is that, as a debate gets longer, the line between the two starts to disappear.
It's like... Okay, another crappy metaphor: a debate is like photocopying a sheet of paper and adding notes to it. At first you have a very clean page with legible things written on it. But as the debate progresses, you get a photocopy of a photocopy of a photocopy, and you end up with something that carries more noise from the photocopying artifacts than signal from whatever anybody wrote on it twelve iterations ago.
At that point, no matter how much the fight should be had, you're not waging it efficiently by participating.
I don't know much about the prison system in France, but your description definitely hit the points I was familiar with: the overcrowding, the general resentment the population has for any measure of dignity the system can give to inmates, the endemic lack of budget, and the magistrates trying to make the system work despite a severe lack of good options.
Good writeup.
I mean, seeing some of those discussion threads Duncan and others were involved in... I'd say it's pretty bad?
To me at least, it felt like the threads were incredibly toxic given how non-toxic this community usually is.
(Coming here from the Duncan-and-Said discussion)
I love the term "demon thread". Feels like a good example of what Duncan calls a "sazen": a word for a concept that I've had in mind for a while (discussion threads that naturally escalate despite the best efforts of everyone involved), but having a word for it makes the concept a lot clearer in my mind.
I think this is extremely standard, central LW skepticism in its healthy form.
Some things those comments do not do: [...]
I think that's a very interesting list of points. I didn't like the essay at all, and the message didn't feel right to me, but this post right here makes me a lot more sympathetic to it.
(Which is kind of ironic; you say this comment is dashed off, and you presumably spent a lot more time on the essay; but I'd argue the comment conveys a lot more useful information.)
It feels like the implicit message here is "And therefore we might coordinate around an alignment solution where all major actors agree to only train NNs that respect certain rules", which... really doesn't seem realistic, for a million reasons?
Like, even assuming major powers can agree to an "AI non-proliferation treaty" with specific metrics, individual people could still bypass the treaty with decentralized GPU networks. Rogue countries could buy enough GPUs to train an AGI, disable the verification hardware and go "What are you gonna do, invade us?", under the assumption that going to war over AI safety is not going to be politically palatable. Companies could technically respect the agreed-upon rules but violate the spirit in ways that can't be detected by automated hardware. Or they could train a perfectly-aligned AI on compliant hardware, then fine-tune it in non-aligned ways on non-compliant hardware for a fraction of the initial cost.
Anyway, my point is: any analysis of a "restrict all compute everywhere" strategy should start by examining what it actually looks like to implement that strategy, what the political incentives are, and how resistant that strategy will be to everyone on the internet trying to break it.
It feels like the author of this paper hasn't even begun to do that work.
I have given you an adequate explanation. If you were the kind of person who was good at math, my explanation would have been sufficient, and you would now understand. You still do not understand. Therefore...?
By the way, I think this is a common failure mode of amateur tutors/teachers trying to explain a novel concept to a student. Part of what you need to communicate is "how complicated the thing you need to learn is".
So sometimes you need to say "this thing I'm telling you is a bit complex, so this is going to take a while to explain", so the student shifts gears and isn't immediately trying to figure out the "trick". If you skip that intro, and the student doesn't understand what you're saying, their default assumption will be "I must have missed the trick" when you want them to think "this is complicated, I should try different permutations of that concept".
(And sometimes the opposite is true: the student did miss a trick, and is now trying to construct a completely novel concept in their head, and you need to tell them "no, this is actually simple, you've done other versions of this before, don't overthink it".)