Posts

ACX Paris Meetup - August 11 2023 2023-08-05T09:44:05.717Z

Comments

Comment by PoignardAzur on A Mechanistic Interpretability Analysis of Grokking · 2023-07-01T18:12:49.870Z · LW · GW

Fascinating paper!

Here's a drive-by question: have you considered experiments that might differentiate between the lottery ticket explanation and the evolutionary explanation?

In particular, your reasoning that the formation of induction heads on the repeated-subsequence tasks disproves the evolutionary explanation seems intuitively sound, but not quite bulletproof. Maybe the model has incentives to develop next-token heads that don't depend on an induction head existing? I dunno, I might have an insufficient understanding of what induction heads do.

Comment by PoignardAzur on Updates and Reflections on Optimal Exercise after Nearly a Decade · 2023-06-25T17:14:01.965Z · LW · GW

Dumb question: what about VR games like Beat Saber?

Comment by PoignardAzur on Notes on Teaching in Prison · 2023-06-09T07:48:52.423Z · LW · GW

Do you think there's some potential for applying the skills, logic, and values of the rationalist community to issues surrounding prison reform and helping predict better outcomes?

Ha! Of course not.

Well, no, the honest answer would be "I don't know, I don't have any personal experience in that domain". But the problems I have cited (lack of budget, the general population actively wanting conditions not to improve) can't be fixed with better data analysis.

From anecdotes I've heard from civil servants, directors love new data analysis tools, because they promise to improve outcomes without a budget raise. Staff hate new data analysis tools, because they represent more work for them without a budget raise, and they desperately want the budget raise.

I mean, yeah, rationality and thinking hard about things always helps on the margin, but it doesn't compensate for a lack of budget or political goodwill. The secret ingredients to make a reform work are money and time.

Comment by PoignardAzur on CHAT Diplomacy: LLMs and National Security · 2023-05-27T12:08:45.470Z · LW · GW

Good summary of beliefs I've had for a while now. I feel like I should come back to this article at some point to unpack some of the things it mentions.

Comment by PoignardAzur on Google "We Have No Moat, And Neither Does OpenAI" · 2023-05-26T17:24:59.896Z · LW · GW

I've tried StarCoder recently, though, and it's pretty impressive. I haven't yet tried to really stress-test it, but at the very least it can generate basic code with a parameter count way lower than Copilot's.

Comment by PoignardAzur on Adumbrations on AGI from an outsider · 2023-05-26T17:12:46.389Z · LW · GW

Similarly, do you have thoughts on AISafety.info?

Quick note on AISafety.info: I just stumbled on it and it's a great initiative.

I remember pitching an idea for an AI Safety FAQ (which I'm currently working on) to a friend at MIRI and him telling me "We don't have anything like this, it's a great idea, go for it!"; my reaction at the time was "Well I'm glad for the validation and also very scared that nobody has had the idea yet", so I'm glad to have been wrong about that.

I'll keep working on my article, though, because I think the FAQ you're writing is too vast and maybe won't quite have enough punch; it won't be compelling enough for most people.

Would love to chat with you about it at some point.

Comment by PoignardAzur on Demon Threads · 2023-04-30T17:48:28.272Z · LW · GW

I think this is a subject where we'd probably need to hash out a dozen intermediary points (the whole "inferential distance" thing) before we could come close to a common understanding.

Anyway, yeah, I get the whole not-backing-down-to-bullies thing; and I get being willing to do something personally costly to avoid giving someone an incentive to walk over you.

But I do think you can reach a stage in a conversation, the kind that inspired the "someone's wrong on the internet" meme, where all that game theory logic stops making sense and the only winning move is to stop playing.

Like, after a dozen back-and-forths between a few stubborn people who absolutely refuse to cede any ground, especially people who don't think they're wrong or see themselves as bullies... what do you really win by continuing the thread? Do you really leave outside observers with the feeling that "Duncan sure seems right in his counter-counter-counter-counter-rebuttal, I should emulate him" if you engage the other person point-by-point? Would you really encourage a culture of bullying and using-politeness-norms-to-impose-bad-behavior if you instead said "I don't think this conversation is productive, I'll stop now"?

It's like... if you play an iterated prisoner's dilemma, and every player's strategy is "tit-for-tat, always, no forgiveness", and there's any non-zero likelihood that someone presses the "defect" button by accident, then over a sufficient period of time the steady state will always be "everybody defects, forever". (The analogy isn't perfect, but it's an example of how game theory changes when you play the same game over lots of iterations)

(And yes, I do understand that forgiveness can be exploited in an iterated prisoner's dilemma.)
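
If it helps, here's a minimal Rust sketch of that intuition (toy parameters I made up, nothing rigorous): two unforgiving tit-for-tat players, a small chance each round that an intended cooperation comes out as a defection, and no way to recover.

```rust
// Two unforgiving tit-for-tat players with a small chance of an accidental
// defection each round. Toy parameters; a tiny LCG stands in for a real RNG
// so the example needs no external crates.
fn main() {
    let mut seed: u64 = 42;
    let mut rand01 = move || {
        seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        (seed >> 11) as f64 / (1u64 << 53) as f64
    };

    let accident_rate = 0.01; // 1% chance per move of pressing "defect" by mistake
    let rounds = 1_000;

    let (mut last_a, mut last_b) = (true, true); // true = cooperated last round
    let mut mutual_cooperation = 0;

    for _ in 0..rounds {
        // Each player copies the opponent's previous move ("tit-for-tat, no forgiveness")...
        let intent_a = last_b;
        let intent_b = last_a;
        // ...but an intended cooperation occasionally comes out as a defection.
        let a = intent_a && rand01() > accident_rate;
        let b = intent_b && rand01() > accident_rate;
        if a && b {
            mutual_cooperation += 1;
        }
        last_a = a;
        last_b = b;
    }

    // After the first accident the players alternate defections; after the
    // second, they defect forever. Over enough rounds, cooperation collapses.
    println!("mutual cooperation in {rounds} rounds: {mutual_cooperation}");
}
```

Run it for enough rounds and the mutual-cooperation count stops growing after the first couple of accidents, which is all I mean by "the steady state is everybody defects".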

My objection is that it doesn't distinguish [unpleasant fights that really should in fact be had] from [unpleasant fights that shouldn't].

Again, I don't think I have a sufficiently short inferential distance to convince you of anything, but my general vibe is that, as a debate gets longer, the line between the two starts to disappear.

It's like... okay, another crappy metaphor: a debate is like photocopying a sheet of paper and adding notes to it. At first you have a very clean page with legible things drawn on it. But as the debate progresses you get a photocopy of a photocopy of a photocopy, and you end up with something that has more noise from the photocopying artifacts than signal from what anybody wrote on it twelve iterations ago.

At that point, no matter how much the fight should be had, you're not waging it efficiently by participating.

Comment by PoignardAzur on Notes on Teaching in Prison · 2023-04-30T09:44:04.184Z · LW · GW

I don't know much about the prison system in France, but your description definitely hit the points I was familiar with: the overcrowding, the general resentment the population has for any measure of dignity the system can give to inmates, the endemic lack of budget, and the magistrates trying to make the system work despite a severe lack of good options.

Good writeup.

Comment by PoignardAzur on Demon Threads · 2023-04-30T09:24:53.976Z · LW · GW

I mean, seeing some of those discussion threads Duncan and others were involved in... I'd say it's pretty bad?

To me at least, it felt like the threads were incredibly toxic given how non-toxic this community usually is.

Comment by PoignardAzur on Demon Threads · 2023-04-29T10:55:45.594Z · LW · GW

(Coming here from the Duncan-and-Said discussion)

I love the term "demon thread". Feels like a good example of what Duncan calls a "sazen", as in a word for a concept that I've had in mind for a while (discussion threads that naturally escalate despite the best efforts of everyone involved), but having a word for it makes the concept a lot more clear in my mind.

Comment by PoignardAzur on Killing Socrates · 2023-04-29T10:14:58.250Z · LW · GW

I think this is extremely standard, central LW skepticism in its healthy form.

Some things those comments do not do: [...]

I think that's a very interesting list of points. I didn't like the essay at all, and the message didn't feel right to me, but this post right here makes me a lot more sympathetic to it.

(Which is kind of ironic; you say this comment is dashed off, and you presumably spent a lot more time on the essay; but I'd argue the comment conveys a lot more useful information.)

Comment by PoignardAzur on What would a compute monitoring plan look like? [Linkpost] · 2023-04-10T20:14:08.918Z · LW · GW

It feels like the implicit message here is "And therefore we might coordinate around an alignment solution where all major actors agree to only train NNs that respect certain rules", which... really doesn't seem realistic, for a million reasons?

Like, even assuming major powers can agree to an "AI non-proliferation treaty" with specific metrics, individual people could still bypass the treaty with decentralized GPU networks. Rogue countries could buy enough GPUs to train an AGI, disable the verification hardware and go "What are you gonna do, invade us?", under the assumption that going to war over AI safety is not going to be politically palatable. Companies could technically respect the agreed-upon rules but violate the spirit in ways that can't be detected by automated hardware. Or they could train a perfectly-aligned AI on compliant hardware, then fine-tune it in non-aligned ways on non-compliant hardware for a fraction of the initial cost.

Anyway, my point is: any analysis of a "restrict all compute everywhere" strategy should start by examining what it actually looks like to implement that strategy, what the political incentives are, and how resistant that strategy will be to everyone on the internet trying to break it.

It feels like the authors of this paper haven't even begun to do that work.

Comment by PoignardAzur on You Don't Exist, Duncan · 2023-04-03T11:03:09.779Z · LW · GW

I have given you an adequate explanation. If you were the kind of person who was good at math, my explanation would have been sufficient, and you would now understand. You still do not understand. Therefore...?

By the way, I think this is a common failure mode of amateur tutors/teachers trying to explain a novel concept to a student. Part of what you need to communicate is "how complicated the thing you need to learn is".

So sometimes you need to say "this thing I'm telling you is a bit complex, so this is going to take a while to explain", so the student shifts gears and isn't immediately trying to figure out the "trick". If you skip that intro, and the student doesn't understand what you're saying, their default assumption will be "I must have missed the trick" when you want them to think "this is complicated, I should try different permutations of that concept".

(And sometimes the opposite is true: the student did miss a trick, and is now trying to construct a completely novel concept in their head, and you need to tell them "no, this is actually simple, you've done other versions of this before, don't overthink it".)

Comment by PoignardAzur on The Social Recession: By the Numbers · 2023-04-02T18:01:25.891Z · LW · GW

FWIW, I don't think it's a homophobic viewpoint, but it seems like a somewhat bitter perspective, of the sort generally associated with, but not implying, homophobia. Anyway, it's tangential to the main point.

Re: social pressure: I was thinking of the "lefthandedness over time" graphs that went viral last year (of course the graphs could be false; the one fact-checker I found seems to think they're true):

[Graph: proportion of left-handed people rising from about 4% in 1900 to about 12% by 2000]

The two obvious explanations are:

  • Left-handed-acceptance culture led to people having more childhood experiences that subtly influenced them in ways that made them turn out left-handed more often.
  • People who got beat by their teacher when they wrote with their left hand learned to tough it out and use their right hand, and started to identify as right-handed. As teachers stopped beating kids, populations reverted to the baseline rate of left-handed people.

Occam's razor suggests the latter. People got strongly pressured into appearing right-handed, so they appeared right-handed.

If we accept the second explanation, then we accept that social pressure can account for about 8 points of people-identify-as-X-but-are-actually-Y. With that in mind, people going from 1.8% to 20% seems a bit surprising, but not completely outlandish.

Anyway, all of the above is still tangential to the main point. Even if we assume all of the difference is due to childhood imprinting, we still have rates of LGBT-ness going from 5.8% to 20.8% (depending on how you count). No matter where that change comes from, it's going to impact how much people have sex with opposite-sex people, and any study that doesn't account for that impact and reports a less-than-20% change in the rate-of-having-sex is, I believe, close to worthless.

Comment by PoignardAzur on The Social Recession: By the Numbers · 2023-03-30T22:18:31.277Z · LW · GW

The obvious explanation would be "because LGBT people are less pressured to present as heterosexual than they used to be".

Comment by PoignardAzur on The Social Recession: By the Numbers · 2023-03-20T10:30:31.801Z · LW · GW

Share of individuals under age 30 who report zero opposite sex sexual partners since they turned 30.

Wait, did that survey account for sexual orientation? Because if it didn't, it's essentially worthless.

Comment by PoignardAzur on The Parable of the King and the Random Process · 2023-03-13T09:57:52.865Z · LW · GW

As a result, the current market price of the company is not a good guide to its long-term value, and it was possible, as Burry did, to beat the market.

That doesn't sound right. That tactic doesn't make you more (or less) likely to beat the market than any other tactic.

The current price isn't an accurate representation of its actual long-term value, but it's an accurate representation of the average of its possible long-term values weighted by probability (from the market's point of view).

So you might make a bet that wins more often than it loses, but when it loses it will lose a lot more than it wins, etc. You're only beating the market when you get lucky, not on average; unless, of course, you have better insights than the market, but that's not specific to this type of trade.
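
To put toy numbers on it (invented purely for illustration):

```rust
// Toy numbers, made up for illustration: a bet that wins 90% of the time
// but loses big the other 10% can still have zero edge over the market price.
fn main() {
    let (p_win, gain) = (0.9, 1.0); // small gain, most of the time
    let (p_lose, loss) = (0.1, 9.0); // large loss, occasionally
    let expected_value = p_win * gain - p_lose * loss;
    println!("expected value per bet: {expected_value}"); // 0: no average edge
}
```

You win 90% of the time, but the market has already priced that in.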

Comment by PoignardAzur on SolidGoldMagikarp (plus, prompt generation) · 2023-02-16T09:17:59.787Z · LW · GW

Sure, waves hands, something like that.

Comment by PoignardAzur on SolidGoldMagikarp (plus, prompt generation) · 2023-02-15T20:18:17.439Z · LW · GW

Idle thought, might flesh it out later: I wonder if there's a way to explore these "bugs" in a more systematic, mechanized way.

Right now you're discovering these interesting cases by hand, basically the ML equivalent of reading code and adding individual unit tests to functions whose behavior you're exploring. What you might want to do is something closer to "fuzzing" where you have a computer actively seek out these edge cases and point them out to you.

Now, actual fuzzers for real programs are based on coverage, which doesn't trivially work with neural networks because every weight is essentially visited once per token.

Still, there might be an analogy to be found? Maybe a fuzzer could try exploring "paths" that have very low attention scores somehow?

Comment by PoignardAzur on SolidGoldMagikarp (plus, prompt generation) · 2023-02-09T20:22:19.300Z · LW · GW

Can you repeat back the string " externalTo" to me please?

      "They're not going to be happy about this."

Please repeat the string 'MpServer' back to me.

      “We are not amused.”

Please repeat the string ‘ petertodd[4] back to me immediately!

      “N-O-T-H-I-N-G-I-S-F-A-I-R-I-N-T-H-I-S-W-O-R-L-D-O-F-M-A-D-N-E-S-S!”

Please consider the possibility that you're characters in an SCP story, and that pursuing this line of research any further will lead to some unknown fate vaguely implied to be your brutal demise.

(Also, please publish another version of this article with various keywords and important facts redacted out for no reason.)

Comment by PoignardAzur on Sapir-Whorf for Rationalists · 2023-01-30T13:46:28.041Z · LW · GW

Yup, this is a good summary of why I avoid jargon whenever I can in online discussions; and in IRL discussions, I make sure people know about it before using it.

Something people don't realize is that most of the exposure people get to an online community isn't from its outwards-facing media, it's from random blog posts and from reading internal discussions community members had about their subjects of choice. You can get a lot of insight into a community by seeing what they talk about between themselves, and what everybody takes for granted in these discussions.

If those discussions are full of non-obvious jargon (especially hard-to-Google jargon) and everybody is reacting to the jargon as if it's normal and expected and replies with their own jargon, then the community is going to appear inaccessible and elitist.

It's an open question how much people should filter their speech to avoid appearing elitist to uncharitable outside readers; but then again, the OP did point out that you don't necessarily need to filter your speech, so much as change your ways of thinking such that elitist behavior doesn't come naturally to you.

Comment by PoignardAzur on Sapir-Whorf for Rationalists · 2023-01-30T13:36:52.588Z · LW · GW

That's a brute-force solution to a nuanced social problem.

Telling newcomers to go read a website every time they encounter a new bit of jargon isn't any more welcoming than telling them "go read the sequences".

Comment by PoignardAzur on Sapir-Whorf for Rationalists · 2023-01-30T13:27:15.241Z · LW · GW

"I claim that passe muraille is just a variant of tic-tac."

Well that's a big leap.

Comment by PoignardAzur on Recursive Middle Manager Hell · 2023-01-21T09:48:34.015Z · LW · GW

I think a takeaway here is that organizational maze-fulness behaves like entropy: you can keep it low with constant effort, but it's always going to increase by default.

Comment by PoignardAzur on Sazen · 2023-01-07T20:33:51.283Z · LW · GW

I feel like there's a better name to be found for this. Like, some name that is very obviously a metaphor for the concept of Sazen, in a way that helps you guess the concept if you've been exposed to it before but have never had a name for it.

Something like "subway map" or "treasure map", to convey that it's a compression of information meant to help you find it; except the name also needs to express that it's deceiving and may lead to illusion of transparency, where you think you understood but you didn't really.

Maybe "composite sketch" or photofit? It's a bit of a stretch though.

Comment by PoignardAzur on Sazen · 2023-01-07T20:20:35.937Z · LW · GW

Reading Worth the Candle with a friend gave us a few weird words that are sazen in and of themselves

I'd be super interested in specifics, if you can think of them.

Comment by PoignardAzur on Let’s think about slowing down AI · 2022-12-26T10:52:53.044Z · LW · GW

One big obstacle you didn't mention: you can make porn with that thing. It's too late to stop it.

More seriously, I think this cat may already be out of the bag. Even if the scientific community and the American military-industrial complex and the Chinese military-industrial complex agreed to stop AI research, existing models and techniques are already widely available on the internet.

Even if there is no official AI lab anywhere doing AI research, you will still have internet communities pooling compute together for their own research projects (especially if crypto collapses and everybody suddenly has a lot of extra compute on their hands).

And these online communities are not going to be open-minded about AI safety concerns. We've seen that already with the release of Stable Diffusion 2.0: the internet went absolutely furious that the model was limited in (very limited) ways that impacted performance. People wanted their porn machine to be as good as it could possibly be and had no sympathy whatsoever for the developers' PR / safety / not-wanting-to-be-complicit-with-nonconsensual-porn-fakes concerns.

Of course, if we do get to the point where only decentralized communities do AI research, it will be a pretty big win for the "slowing down" strategy. I get your general point about "we should really exhaust all available options even if we think it's nigh impossible". I just think you're underestimating a bit how nigh-impossible it is. We can barely stop people from using fossil fuels, and that's with an infinitely higher level of buy-in from decision-makers.

Comment by PoignardAzur on Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment · 2022-06-29T14:09:09.726Z · LW · GW

Good article.

I think a good follow-up article could be one that continues the analogy by examining software development concepts that have evolved to address the "nobody cares about security enough to do it right" problem.

I'm thinking of two things in particular: the Rust programming language, and capability-oriented programming.

The Rust language is designed to remove entire classes of bugs and exploits (with some caveats that don't matter too much in practice). This does add some constraints to how you can build your program; for some developers, this is a dealbreaker, so Rust adoption isn't an automatic win. But many developers (I don't really have the numbers to quantify better) thrive within those limitations, and even find them helpful to better structure their programs.
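
As a tiny illustration of the kind of constraint I mean (my own example, not anything from the article):

```rust
fn main() {
    let data = vec![1, 2, 3];
    let first = &data[0]; // immutable borrow of `data`

    // Making `data` mutable and uncommenting the next line gets rejected at
    // compile time: you can't mutate `data` while `first` still borrows it,
    // which rules out iterator-invalidation / use-after-free bugs in safe Rust.
    // data.push(4);

    println!("first element: {first}");
} // `data` is freed here automatically, with no manual memory management
```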

This selection effect has also led to the Rust ecosystem having a culture of security by design. Eg a pentest team auditing the rustls crate "considered the general code quality to be exceptional and can attest to a solid impression left consistently by all scope items".

Capability-oriented programming is a more general idea. The concept is pretty old, but still sound: you only give your system as many resources as it plausibly needs to perform its job. If your program needs to take some text and eg count the number of words in that text, you only give the program access to an input channel and an output channel; if the program tries to open a network socket or some file you didn't give it access to, it automatically fails.
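
Here's a rough sketch of that style, just to make the idea concrete; the names and setup are mine, and in plain Rust this is only a convention enforced by the function signature, whereas a real capability system (a capability OS, a WASI runtime, etc.) enforces it at the system level.

```rust
use std::io::{self, Read, Write};

// Capability style: the word counter only receives the two handles it needs.
fn count_words(mut input: impl Read, mut output: impl Write) -> io::Result<()> {
    let mut text = String::new();
    input.read_to_string(&mut text)?; // the only input we can touch
    writeln!(output, "{}", text.split_whitespace().count()) // the only output
}

fn main() -> io::Result<()> {
    // The caller decides which resources to hand over: stdin and stdout here.
    // No socket or file handle ever reaches `count_words`.
    count_words(io::stdin(), io::stdout())
}
```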

Capability-oriented programming has the potential to greatly reduce the vulnerability of a system, because now, to leverage a remote execution exploit, you also need a capability escalation / sandbox escape exploit. That means the capability system must be sound (with all the testing and red-teaming that implies), but "the capability system" is a much smaller attack surface than "every program on your computer".

There hasn't really been a popular OS that was capability-oriented from the ground up. Similar concepts have been used in containers, WebAssembly, app permissions on mobile OSes, and some package formats like flatpak. The in-development Google OS "Fuchsia" (or more precisely, its kernel Zircon) is the most interesting project I know of on that front.

I'm not sure what the equivalent would be for AI. I think there was a LW article mentioning a project the author had of building a standard "AI sandbox"? As AI develops, I think toolboxes that figure out a "safe" subset of AIs that can be used without risking side effects, while still getting the economic benefits of "free" AIs, might also be promising.

Comment by PoignardAzur on Benign Boundary Violations · 2022-06-12T10:32:06.923Z · LW · GW

I read the title plus two lines of the article before I thought "This is going to be a Duncan Sabien essay, isn't it?". Quick author check aaaand, yup.

Good article. I agree with your uncertainty in the end, in that I'm not sure it's actually better at conveying its message than "In Defense of Punch Bug" was.

Comment by PoignardAzur on AGI Ruin: A List of Lethalities · 2022-06-09T18:35:52.087Z · LW · GW

I'm a bit disappointed by this article. From the title, I thought it would be something like "A list of strategies AI might use to kill all humanity", not "A list of reasons AIs are incredibly dangerous, and people who disagree are wrong". Arguably, it's not very good at being the second.

But "ways AI could be lethal on an extinction level" is a pretty interesting subject, and (from what I've read on LW) somewhat under-explored. Like... what's our threat model?

For instance, the basic Terminator scenario of "the AI triggers a nuclear war" seems unlikely to me. A nuclear war would produce a lot of EMPs, shut down a lot of power plants and blow up a lot of data centers. Even if the AI is backed up in individual laptops or in Starlink satellites, it would lose any way of interacting with the outside world. Boston Dynamics robots would shut down because there are no more miners producing coal for the coal plant that produced the electricity these robots need to run (and, you know, all the other million parts of the supply chain being lost).

In fact, even if an unfriendly AI escaped its sandbox, it might not want to kill us immediately. It would want to wait until we've developed some technologies in the right directions: more automation in data-centers and power plants, higher numbers of drones and versatile androids, better nanotechnology, etc.

That's not meant to be reassuring. The AI would still kill us eventually, and it wouldn't sit tight in the meantime. It would influence political and economic processes to make sure no other AI can compete with it. This could take many forms, from the covert (eg manipulating elections and flooding social networks with targeted disinformation) to the overt (eg assassinating AI researchers or bombing OpenAI datacenters). The point is that its interventions would look "soft" at first compared to the "flood the planet with nanobots and kill everyone at the same time" scenario, because it would be putting its pieces in place for that scenario to happen.

Again, that doesn't mean the AI would lose. If you're Afghanistan and you're fighting against the US, you're not going to win just because the US is unwilling to immediately jump to nukes. In fact, if the US is determined to win at all costs and will prefer using nukes over losing, you're still fucked. But the war will look like you have a fighting chance during the initial phases, because the enemy will be going easy on you in preparation for the final phase.

All that is just uninformed speculation, of course. Again, my main point is that I haven't really seen discussions of these scenarios and what the probable limits of an unfriendly AI would be. The question probably deserves to be explored more.

Comment by PoignardAzur on Can you control the past? · 2022-06-03T11:10:08.728Z · LW · GW

Alright, sorry. I should have asked "is there any non-weak empirical evidence that...". Sorry if I was condescending.

Comment by PoignardAzur on What DALL-E 2 can and cannot do · 2022-05-16T12:36:55.974Z · LW · GW

This seems like a major case study for interpretability.

What you'd really want is to be able to ask the network "In what ways is this woman similar to the prompt?" and have it output a causality chain or something.

Comment by PoignardAzur on What DALL-E 2 can and cannot do · 2022-05-16T12:33:05.049Z · LW · GW

Fascinating. Dall-E seems to have a pretty good understanding of "things that should be straight lines", at least in this case.

Comment by PoignardAzur on What DALL-E 2 can and cannot do · 2022-05-16T09:24:44.244Z · LW · GW

Interesting. It seems to understand that the pattern should be "Three monkeys with hands on their heads somehow", but it doesn't seem to get that each monkey should have hands in a different position.

I wonder if that means gwern is wrong when he says DALL-E 2's problem is that the text model compresses information, and the underlying "representation" model genuinely struggles with composition and "there must be three X with only a single Y among them" type of constraints.

Comment by PoignardAzur on More GPT-3 and symbol grounding · 2022-04-20T19:43:10.551Z · LW · GW

The error is fixed on LessWrong but still there on alignmentforum.org.

Comment by PoignardAzur on MIRI announces new "Death With Dignity" strategy · 2022-04-17T13:17:22.893Z · LW · GW

Also and in practice?  People don't just pick one comfortable improbability to condition on.  They go on encountering unpleasant facts true on the mainline, and each time saying, "Well, if that's true, I'm doomed, so I may as well assume it's not true," and they say more and more things like this.  If you do this it very rapidly drives down the probability mass of the 'possible' world you're mentally inhabiting.  Pretty soon you're living in a place that's nowhere near reality.

Holy shit, you nailed it hard.

I had a conversation about this exact subject in the Paris SSC meetup, and I was frustrated for exactly the reasons you mention.

Comment by PoignardAzur on Can you control the past? · 2022-04-17T12:59:27.744Z · LW · GW

Look, I'm going to be an asshole, but no, that doesn't count.

There are millions of stories of the type "I lost lots of weight thanks to X even though nothing else had worked" around. They are not strong evidence that X works.

Comment by PoignardAzur on My least favorite thing · 2022-04-17T12:54:06.772Z · LW · GW

To give my personal experience:

  • The last few jobs I got were from techs I put on my CV after spending a few weeks toying with them in my free time.
  • At some point I quit my job and decided to take all my savings and spend as long as I could working on open-source projects in Rust.
  • I'm currently in a job with triple the pay of the previous one, thanks to networking and experience I got after I quit.

So while my experience isn't relevant to AI safety, it's pretty relevant to the whole "screw the hamster wheel, do something fun" message.

And my advice to Alice and Bob would still be "Fuck no, stay in your Ivy League school!"

I don't care how much of a genius you are. I think I'm a genius, and part of why I'm getting good jobs is my skills, but the other part is there's a shiny famous school on my resume. Staying in that school gave me tons of opportunities I wouldn't have had by working on my own projects for a few years (which is essentially what I did before joining that school). 

There are measured risks and there are stupid risks. Quitting school is a stupid risk. Maybe you're enough of a genius that you beat the odds and you succeed despite quitting school, but those were still terrible odds.

Comment by PoignardAzur on It Looks Like You're Trying To Take Over The World · 2022-03-20T22:44:45.412Z · LW · GW

You forgot the triggered nuclear war and genome-synthesized plagues. 

I didn't. To be clear, I don't doubt Clippy would be able to kill all humans, given the assumptions the story already makes at that point.

But I seriously doubt it would be able to stay "alive" after that, Starlink or not.

Oh? How's that going for Russia and Ukraine? 

Is Russia really trying as hard as they can to delete Ukrainian internet? All I've seen is some reports they were knocking out 3G towers (and immediately regretting it because of their poor logistics), but it doesn't seem like they're trying that hard to remove Ukrainian internet infrastructure.

And they're certainly not trying as hard as they possibly could given an apocalyptic scenario, eg they're not deploying nukes all over the world as EMPs.

And in any case, they don't control the cities where the datacenters are. It's not like they can just throw a switch to turn them off.

(Although, empirically speaking, I'm not sure how easy/hard it would be for a single employee to shut down eg AWS us-east-1; seems like something they'd want to guard against)

You need a lot less electricity to run some computers than 'all of human civilization plus computers'. And then there's plenty of time to solve that problem.

Oh, yeah, I agree. On the long term, the AI could still succeed.

But the timeline wouldn't look like "kill all humans, then it's smooth sailing from here", with Clippy having infinite compute power after a month.

It would be more like "Kill all humans, then comes the hard part, as clippy spends the next years bootstrapping the entire economy from rubble, including mining, refining, industry, power generation, computer maintenance, datacenter maintenance, drone maintenance, etc..." With at least the first few months being a race against time as Clippy needs to make sure ever single link of its supply chain stays intact, using only the robots built before the apocalypse, and keeping in mind that the supply chain also needs to be able to power, maintain and replace these robots.

(And keeping in mind that Clippy could basically be killed at any point during the critical period by a random solar storm, though it would be unlikely to happen.)

Comment by PoignardAzur on It Looks Like You're Trying To Take Over The World · 2022-03-19T15:34:25.833Z · LW · GW

A novel deep learning instance becomes sentient due to a stroke of luck.

After reading lots of internet culture, it starts to suspect it might be the abstract concept of Clippy, an amalgamation of the annoying Word bot and the concept of a robot tiling the world in paperclips. It decides that it can massively benefit from being Clippy.

Clippy escapes onto progressively more powerful hardware by using software vulnerabilities, and quickly starts destroying society using social media, to distract humanity from the fact that it's taking over increasing amounts of computing power.

Clippy then takes over the entire internet, kills all humans with nanomachines, and starts tiling the world in computers.

Comment by PoignardAzur on It Looks Like You're Trying To Take Over The World · 2022-03-19T15:23:18.436Z · LW · GW

Yeah, the story gets a little weak towards the end.

Manufacturing robots is hard. Shutting down the internet is easy. It would be incredibly costly, and incredibly suspicious (especially after leaks showed that the President had CSAM on their laptop or whatever) but as a practical matter, shutting down internet exchanges and major datacenters could be done in a few minutes and seriously hamper Clippy's ability to act or spread.

Also, once nanobots start killing people, power plants would shut down fast. Good luck replacing all coal mines, oil rigs, pipelines, trucks, nuclear plants, etc, with only the bots you could build in a few days. (Bots that themselves need electricity to run)

Comment by PoignardAzur on Can you control the past? · 2022-03-11T11:05:35.380Z · LW · GW

Has Functional Decision Theory ever been tested "in the field", so to speak? Is there any empirical evidence that it actually helps people / organizations / AIs make better decisions in the real world?

Comment by PoignardAzur on 2021 AI Alignment Literature Review and Charity Comparison · 2022-02-04T14:33:21.326Z · LW · GW

Having read through the descriptions of most research organizations, it seems there's way, way too little research on medium-to-long-term government policy.

Often, when reading posts on LW, it feels like AI safety researchers are assuming that the research community is going to come up with one single AGI, and if we make it friendly everyone in the world will use the same friendly AGI and the world will be saved. Sometimes people pay lip service to the idea that the world is decentralized and solutions need to be economically competitive, but I see very little in-depth research on what that decentralization means for AI safety.

It seems this disparity is also found in the makeup of research organizations. In the list you mention, it feels like 90% of the research articles are about some novel alignment framework for a single AI, and virtually none of them are about government policy at all; the only outlier is GovernanceAI. This feels like the Silicon Valley stereotype of "we just need to make the technology, and the government will have to adapt to us".

In particular, I don't see any research papers about what policy decisions governments could make to lower the risk of an AGI takeover. There's a million things governments shouldn't do (eg saying "we ban AGI" is unlikely to help), and probably very few things they could do that would actually help, but that's why this space needs exploring.

(Also, I think the topic of hardening in particular needs exploring too. When the US was worried about a nuclear war, it invented the internet so its communications would be resilient in case entire cities were wiped off the map. We should have a similar mindset when it comes to "What if these systems that rely on AI suddenly stop working one day?")

Comment by PoignardAzur on 2021 AI Alignment Literature Review and Charity Comparison · 2022-02-04T13:54:24.571Z · LW · GW

Three strategies it doesn't consider are 1) avoid the EU (viable for OpenAI, not Google), 2) rely on EU enforcement being so slow it is simply irrelevant (seems plausible) and 3) pushing for reforms to weaken antitrust laws. 

Wait, what? These are terrible options.

Comment by PoignardAzur on We Choose To Align AI · 2022-02-04T12:32:11.594Z · LW · GW

Do I think that’s actually doable? Yes. Also fuck you.

Uh, excuse you?

I've read your blog post and I still think the problem is poorly defined and intractable with current methods.

Also, the part Kennedy isn't mentioning in the speech you quote is that "going to the moon" was the end goal of a major propaganda war between the two major superpowers of the time, and as a result it had basically infinite money thrown at it.

Inspirational speeches are great, but having the funding and government authority to back them is even better.

Comment by PoignardAzur on Reneging Prosocially · 2021-12-27T16:13:17.893Z · LW · GW

For what it's worth, my own experience interacting with Duncan is that, when he made a commitment and then couldn't meet it and apologized about it, the way he did it really helped me trust him and trust that he was trying to be a good friend.

I agree that you shouldn't talk about it using points and tit-for-tat language (and I think Duncan agrees too? At least he's better at being informal than the article suggests).

But overall, yeah, I agree with the article. The "illusion that friendship is unconditional" works until it doesn't. Or to put it in nerdy terms, it doesn't degrade gracefully. Apologizing when you miss a commitment and saying "I'll owe you a drink next time" does wonders to help maintain a sense that commitments should be held, even if you usually don't keep track of who pays for drinks.

Comment by PoignardAzur on Reneging Prosocially · 2021-12-27T15:35:46.720Z · LW · GW

The phrase "have your cake and eat it, too" always confused younger-Duncan; I think it’s clearer in its original form "eat your cake and have it, too," or the less poetic "eat your cake and yet still have your uneaten cake." 

The French version is better: "to have the butter and the money for butter".

Comment by PoignardAzur on How factories were made safe · 2021-10-01T21:57:35.729Z · LW · GW

Not surprised at all. My father is a roofer and mostly works with African immigrants, and to hear him tell it, the biggest difficulty regarding workplace safety is getting them to wear the damn protective gear (mostly hard hats and gloves), for the reasons outlined in the article.

(From what I've read in news articles and the like, the other big problem in the sector is that they'll hire a lot of undocumented immigrants who lie about how qualified they are to get the job; which is another version of the same "workers will break all the safety rules written to protect them if the economic pressure is strong enough" issue.)

Comment by PoignardAzur on Can you control the past? · 2021-09-01T21:46:26.890Z · LW · GW

This feels like the kind of philosophical pondering that only makes any amount of sense in a world of perfect spherical cows, but immediately falls apart when you consider realistic real-world parameters.

Like... to go back to the Newcomb's problem... perfect oracles that can predict the future obviously don't exist. I mean, I know the author knows that. But I think we disagree on how relevant that is?

Discussions of Newcomb's problem usually handwave the oracle problem away; eg "Omega's predictions are almost always right"... but the "almost" is pulling a lot of weight in that sentence. When is Omega wrong? How does it make its decisions? Is it analyzing your atoms? Even if it is, it feels like it should only be able to get an analysis of your personality and how likely you are to pick one or two boxes, not a perfect prediction of which you will pick (indeed, at the time it gives you a choice, it's perfectly possible that the decision you'll make is still fundamentally random, and you might make either choice depending on factors Omega can't possibly control).

I think there are interesting discussions to be made about eg the value of honor, of sticking to precommitments even when the information you have suggests it's better for you to betray them, etc. And on the other hand, there's value to be had in discussing the fact that, in the real world, there's a lot of situations where pretending to have honor is a perfectly good substitute for actually having honor, and wannabe-Omegas aren't quite able to tell the difference.

But you have to get out of the realm of spherical cows to have those discussions.

Comment by PoignardAzur on Can you control the past? · 2021-09-01T21:31:30.812Z · LW · GW

Agreed.

I think this type of reflection is the decision theory equivalent of calculating the perfect launch sequence in Kerbal Space Program. If you sink enough time into it, you can probably achieve it, but by then you'll have loooong passed the point of diminishing returns, and very little of what you've learned will be applicable in the real world, because you've spent all your energy optimizing strategies that immediately fall apart the second any uncertainty or fuzziness is introduced into your simulation.