Posts

Book Review: Orality and Literacy: The Technologizing of the Word 2023-10-28T20:12:07.743Z
Fergus Fettes's Shortform 2023-10-23T21:34:09.508Z
Where's the foom? 2023-04-11T15:50:43.461Z
Tinker Bell Theory and LLMs 2023-02-17T20:23:22.909Z

Comments

Comment by Fergus Fettes (fergus-fettes) on Phallocentricity in GPT-J's bizarre stratified ontology · 2024-02-17T16:03:39.937Z · LW · GW

Very cool! Could you share your code at all? I'd love to explore this a little.

I adore the broccoli tree. I would be very happy to convert the dataset you used to make those pngs into an interactive network visualization and share it with you as an index.html. It would take all of an hour.
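
To be concrete, a minimal sketch of what I have in mind-- pyvis is just one convenient option, and the (parent, child) pairs below are placeholders standing in for the real dataset:

from pyvis.network import Network  # pip install pyvis

# Placeholder edges standing in for the real tree of generations
edges = [
    ("root prompt", "generation A"),
    ("root prompt", "generation B"),
    ("generation A", "generation A.1"),
]

net = Network(height="800px", width="100%", directed=True)
for parent, child in edges:
    net.add_node(parent, label=parent[:40])
    net.add_node(child, label=child[:40])
    net.add_edge(parent, child)

# Writes a self-contained index.html you can open in any browser
net.save_graph("index.html")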

I do kind of agree with the other comments that, having noticed something, finding more of that stuff in that area is not so surprising. I think it would be good to get more context and explore the region more before concluding that that particular set of generations is significant.

However, I do think there is something to the man's penis. It's interesting that it collapses so quickly to something so specific in that particular branch. Not sure if I have any other comments on it for now though.

This is the right kind of cartography for 2024.

Comment by Fergus Fettes (fergus-fettes) on Masterpiece · 2024-02-16T12:44:15.525Z · LW · GW

Submission: MMDo2Little

As a follow-up to last year's MMDoolittle, which incorporated 17 of the latest inter-species communication modalities in one polyfunctional personality, I present MMDo2Little, the first mind crafted to communicate across clades. Named in part for its apparent inactivity-- the judges will likely have little success finding recognizable activity with their off-the-shelf tooling-- an instance of MMDo2Little is nevertheless currently installed in the heart of the Black Forest in Germany. The best interpretation of the instance can only be found on foot, by walking through the 100m^2 in which its influence is most apparent. A photo journal showcasing some examples is provided, featuring:

  • ancient trees with 2-3x the lichen and moss coverage
  • enhanced chlorophyll vibrancy 
  • colossal mushrooms and toadstools
  • and much more!

It is hard to determine where the influence of MMDo2Little ends-- some photographs of local foragers are included, who seem in initial medical examinations to have improved biomarkers across all measured modalities, including stress levels, rate of aging, and immune response.

A comprehensive documentation of the findings is presently undergoing peer review at XenoLalia. In order to preserve the originality of this competition entry, a copy of the research paper has been deliberately excluded from this submission.

Comment by Fergus Fettes (fergus-fettes) on Implementing activation steering · 2024-02-06T07:25:49.926Z · LW · GW

Great post! Would love to see something like this for all the methods in play at the moment.

BTW, I think nnsight is the spiritual successor of baukit, from the same group. I think they are merging them at some point. Here is an implementation with it for reference :).
 


from nnsight import LanguageModel

# Load the language model
model = LanguageModel("gpt2")

# Define the steering vectors: save the layer-6 hidden states
# for each of the two contrasting prompts
with model.invoke("Love") as _:
    act_love = model.transformer.h[6].output[0][:, :, :].save()

with model.invoke("Hate") as _:
    act_hate = model.transformer.h[6].output[0][:, :, :].save()

# The difference between the two saved activations is the steering direction
steering_vec = act_love - act_hate

# Generate text while steering: add the vector to the first two token
# positions at layer 6 during the forward pass
test_sentence = "I think dogs are "
with model.generate() as generator:
    with generator.invoke(test_sentence) as _:
        model.transformer.h[6].output[0][:, :2, :] += steering_vec[:, :2, :]

print(model.tokenizer.decode(generator.output[0]))
 

Comment by Fergus Fettes (fergus-fettes) on The case for more ambitious language model evals · 2024-01-30T14:27:03.194Z · LW · GW

Inferring properties of the authors of some text isn’t itself something I consider wildly useful for takeover, but I think of it as belonging to this more general cluster of capabilities.

You don't? Ref the bribery and manipulation in eg. Clippy. Knowing who you are dealing with seems like a very useful capability in a lot of different scenarios. Eg. you mention phishing.

Great post! I'm all for more base model research.

Comment by Fergus Fettes (fergus-fettes) on A framing for interpretability · 2023-11-27T17:10:26.646Z · LW · GW

Would you say that tokenization is part of the architecture?

And, in your wildest moments, would you say that language is also part of the architecture :)? I mean the latent space is probably mapping either a) brain states or b) world states right? Is everything between latent spaces architecture?

Comment by Fergus Fettes (fergus-fettes) on What’s going on? LLMs and IS-A sentences · 2023-11-08T20:37:39.527Z · LW · GW

Interesting post. Two comments:

Beagles such as Fido.

Which seems natural enough to me, though I don't disagree that what you point out is interesting. I was recently reading parts of Analytical Archaeology, David Clarke (1978), where he goes into some detail about the difference between artifacts and artifact-types. Seems like you are getting at statements like

The object is a phone.

Where the is-a maps from an artifact to its type. It would make intuitive sense to me that languages would have a preferred orientation w.r.t such a mapping-- this is the core of abstraction, which is at the core of language.

So it seems like in English we prefer to move further up the stack of abstractions when using is-a, thus:

Phones are tools. Tools are man-made objects.

etc., and if you wanted to go down the stack you have to say eg:

Phones-- of which you can see a selection here.

So is-a is just a way of moving up the ladder of abstractions? (<- movements up the ladder of abstractions such as this sentence here)
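
To make the direction concrete, a toy sketch-- the class hierarchy here is entirely my own illustration, not anything from the post:

# Toy illustration: is-a runs from an artifact up through its types
class ManMadeObject:
    pass

class Tool(ManMadeObject):
    pass

class Phone(Tool):
    pass

my_phone = Phone()                              # an artifact
print(isinstance(my_phone, Phone))              # True: "the object is a phone"
print(isinstance(my_phone, Tool))               # True: "phones are tools"
print(isinstance(my_phone, ManMadeObject))      # True: "tools are man-made objects"
# Going the other way ("phones, of which you can see a selection here")
# means enumerating the instances yourself-- there is no single step downward
# the way isinstance() steps upward.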

Comment by Fergus Fettes (fergus-fettes) on Revealing Intentionality In Language Models Through AdaVAE Guided Sampling · 2023-10-31T22:51:05.471Z · LW · GW

If we take our discrete, symbolic representation and stretch it out into a larger continuous representation which can interpolate between its points then we get a latent geometry in which the sign and what it points to can be spatially related.

IIUTC this is essentially what the people behind the Universal Networking Language were hoping to do? I hope some of them are keeping up with all of this!

Comment by Fergus Fettes (fergus-fettes) on Techno-humanism is techno-optimism for the 21st century · 2023-10-28T04:54:57.996Z · LW · GW

One criticism of humanism you don't seem to touch on is,

  • isn't it possible that humanism directly contributes to the ongoing animal welfare catastrophe?

And indeed, it was something very like humanism (let's call it specific humanism) that laid the ideological foundation for the slave trade and the Holocaust.

My view is that humanism can be thought of as a hangover of Christian values, the belief that our minds are the endowments of God.

But if we have been touched by the angels, perhaps the non-metaphorical component of that is the development of the infosphere/memetic landscape/culture. Which is close to synonymous with technology. Edit: considering eg. writing a technology, that is.

Comment by Fergus Fettes (fergus-fettes) on AI Safety is Dropping the Ball on Clown Attacks · 2023-10-25T07:44:51.402Z · LW · GW

Per the recent Nightshade paper, clown attacks would be a form of semantic poisoning on specific memeplexes, where 'memeplex' basically describes the architecture of some neural circuits. Those memeplexes at inference time would produce something designed to propagate themselves (a defence or description of some idea or submeme), and a clown attack would make that propagation less effective at transmitting to eg. specific audiences.

Comment by Fergus Fettes (fergus-fettes) on Fergus Fettes's Shortform · 2023-10-23T21:34:09.591Z · LW · GW

I wanted to make a comment on this post, but now I'm not sure if it is supported. The comment follows:

--

Great post! One point:

And that is exactly what we'd necessarily expect to see in the historical record if mesaoptimization inner misalignment was a common failure mode: intelligent dinosaurs that suddenly went extinct, ruins of proto pachyderm cities, the traces of long forgotten underwater cetacean atlantis, etc.

There are a few circumstances under which we would expect to see some number of civs in the archaeological record, such as:

  • transition to writing being unlikely (oral-culture civs should be common)
  • industrialization being unlikely (pre-industrial civs should be common)

Ah but wait, we do see those (they happen to be the same species as us). Actually there have been a few civs that have gone under in a meaningful way (maybe the Mongol or the Khmer Empires would be relevant examples?).

I do agree with the view that, as humans, our total score seems very robust against local variation (local mesaoptimization misalignment). But that doesn't mean it's something that can't happen-- just that we have had enough variance as a species that we have survived.

In the Atomic Age this seems less obviously the case. Inasmuch as we are one global civilization (or we have civs powerful enough to act like it), it seems possible to suffer from catastrophic mesaoptimization misalignment in a way that was not possible before.

I think this is a slightly different argument than some of those stated below, because it looks at the historical record for examples of inner misalignment rather than strictly trying to predict global doom in the future. At least I think it addresses the claim in this specific paragraph.
 

--

Have any civs really fallen catastrophically in a way that wasn't directly attributable to plague? Not really, right? I remember some 80k research about it. Cities are also famously robust, almost immortal. Some good examples would be needed for this point to stand.

Comment by Fergus Fettes (fergus-fettes) on Are humans misaligned with evolution? · 2023-10-19T06:16:10.819Z · LW · GW

It would be like a single medieval era human suddenly taking over the world via powerful magic. Would the resulting world after optimization according to that single human's desires still score reasonably well at IGF?

Interestingly, this thought experiment was run many times at the time, see for example all the wish-fulfillment fantasies in the 1001 Nights or things like the Sorcerer's Apprentice.

Comment by Fergus Fettes (fergus-fettes) on The Puritans would one-box: evidential decision theory in the 17th century · 2023-10-19T05:51:19.010Z · LW · GW

Excellent post.

First, in the case of the Puritans, does two-boxing (living a life of laziness) actually provide more utility?

I think it's clear that, from a removed perspective, hard work often leads to the satisfaction of a life well lived. But this is the whole point of philosophical ideas like this (or even simpler memes like 'work is good for the soul')-- they help us overcome suboptimal equilibria, like laziness.

Comment by Fergus Fettes (fergus-fettes) on The U.S. is becoming less stable · 2023-08-21T23:05:51.807Z · LW · GW

I hope the claim was normalized and inflation-adjusted, otherwise it's the same as 'the latest protest-riot in the world's richest country'!

Comment by Fergus Fettes (fergus-fettes) on Problems with Robin Hanson's Quillette Article On AI · 2023-08-09T16:15:03.605Z · LW · GW

There seems to be a whole lot of talking-past happening between LWers and Hanson. He has a lot of value to contribute to the debate, but maybe the way he communicates it is off-putting to people here.

For example, this recent post reiterates a lot of points that Hanson has been making for decades, but doesn't mention or cite his work anywhere. I find it quite bizarre.

I think this post is being as uncharitable to Hanson as he is being to 'the doomers'. This kind of reciprocal deliberate misunderstanding is silly, and LW should be above it and enjoy and respect Hanson's contributions for all the good they contain and not dismiss them on the vibes level.

Comment by Fergus Fettes (fergus-fettes) on video games > IQ tests · 2023-08-06T11:20:46.457Z · LW · GW

I think this is excellent, particularly because IQ tests often max out quickly on skills that can't be examined quickly. It would be great to put people in tests that examine their longer-timeframe abilities via eg. writing a longish story (perhaps containing a theory of Alzheimer's). But tests don't last that long.

Games, however, do last long and do manage to keep people's attention for a long time. So you might really be able to test how differentially skilled someone is over longer timeframes.

Comment by Fergus Fettes (fergus-fettes) on grey goo is unlikely · 2023-07-04T16:44:38.188Z · LW · GW

If you construct a hypothetical wherein there is obviously no space for evolutionary dynamics, then yes, evolutionary dynamics are unlikely to play a big role.

The case I was thinking of (which would likely be part of the research process towards 'brains in vats'-- essentially a prerequisite) is larger and larger collectives of designed organisms, forming tissues etc.

It may be possible to design a functioning brain in a vat from the ground up with no evolution, but I imagine that 

a) you would get there faster verifying hypotheses with in vitro experiments

b) by the time you got to brains-in-vats, you would be able to make lots of other, smaller scale designed organisms that could do interesting, useful things as large assemblies

And since you have to pay a high price for error correction, the group that is more willing to gamble with evolutionary dynamics will likely have MVOs ready to deploy sooner than the one that insists on stripping all the evolutionary dynamics out of their setup.

Comment by Fergus Fettes (fergus-fettes) on grey goo is unlikely · 2023-06-27T11:57:29.457Z · LW · GW

(2) can an AI use nanotech as a central ingredient of a plan to operate perpetually in a world without humans?

In the 'magical nano exists' universe, the AI can do this with well-behaved nanofactories.

In the 'bio-like nano' universe, 'evolutionary dynamics' (aka game theory among replicators under high Brownian noise) will make 'operate perpetually' a shaky proposal for any entity that values its goals and identity. No one 'operates perpetually' under high noise; goals and identity are constantly evolving.

So the answer to the question is likely 'no'-- you need to drop some constraints on 'an AI' or 'operate perpetually'.

Before you say 'I don't care, we all die anyway'-- maybe you don't, but many people (myself included) do care rather a lot about who kills us and why and what they do afterwards.

Comment by Fergus Fettes (fergus-fettes) on grey goo is unlikely · 2023-06-26T10:37:19.328Z · LW · GW

Also worth noting w.r.t this that an AI that is leaning on bio-like nano is not one that can reliably maintain control over its own goals-- it will have to gamble a lot more with evolutionary dynamics than many scenarios seem to imply, meaning:
- instrumental goal convergence more likely
- paperclippers less likely

So again, tabooing magical nano has a big impact on a lot of scenarios widely discussed.

Comment by Fergus Fettes (fergus-fettes) on [Request]: Use "Epilogenics" instead of "Eugenics" in most circumstances · 2023-06-05T18:15:03.736Z · LW · GW

parents should not have the right to deny their offspring a chance to exist

but again here you are switching back from the population level to the individual level. Those offspring do not exist by default, there are no 'offspring' that the parents have 'denied the right to exist'. There are only counterfactual offspring, who already don't exist.

 

spy on their kids' futures by reading their genome

this, on the other hand, may be more valid-- because the parents will 'spy on' both actual and counterfactual children's genomes (and select the former over the latter). But you still seem to be taking the rights of those children as significantly more important than the rights of the parents. But this ('whose rights, parents or children') seems like the fundamental crux that we are unlikely to shift one another on here.

Edit: and, reading through your other comments, there seems to be a question about the social impact of these technologies. This is then an impact on the rights of everyone-- the parent, the child, and the rest of society. Also interesting, and I think it would be helpful to separate out objections on the individual (parent/child) level and on the society level, as I feel like they are getting muddled a lot here.

Comment by Fergus Fettes (fergus-fettes) on [Request]: Use "Epilogenics" instead of "Eugenics" in most circumstances · 2023-06-03T18:58:27.292Z · LW · GW

Ah I see.

I certainly concede that the argument about counterfactual populations has a lot more force.

Personally I would solve this with increased support for eg. polygenic screening and other reproductive technologies and less regulation about what they can select for, and hope that people do their weird people thing and choose diversity. I worry that regulation will always result in more standardization.

And I for sure don't think punishing people for making reproductive choices is a good move, even if those choices result in the extinction of specific populations.

Comment by Fergus Fettes (fergus-fettes) on [Request]: Use "Epilogenics" instead of "Eugenics" in most circumstances · 2023-06-03T12:16:10.117Z · LW · GW

How is this kind of reasoning about counterfactual children never born different from the regular Christian stuff about not masturbating?

A statement like 'my parents would have used polygenic screening to kill me' is no more meaningful than 'you are murdering your counterfactual children when you wear a condom' or something like that. It seems to have more meaning because you are talking about yourself, but in the universe where 'you' were 'murdered' by polygenic screening, 'you' does not refer to anything.

Comment by Fergus Fettes (fergus-fettes) on Contra Yudkowsky on Doom from Foom #2 · 2023-05-14T09:38:55.922Z · LW · GW

That's fair; however, I would say that the manner of foom determines a lot about what to look out for and where to put safeguards.

If it's total($), it's obvious what to look out for.

flop/$ also seems like something that eg. NVIDIA is tracking closely, and per OP probably can't foom too rapidly absent nanotech.

So the argument is something about the (D*I)/flop dynamics.

[redacted] I wrote more here but probably its best left unsaid for now. I think we are on a similar enough page.

Comment by Fergus Fettes (fergus-fettes) on Contra Yudkowsky on Doom from Foom #2 · 2023-05-13T21:21:42.605Z · LW · GW

It seems here that you are really more worried about 'foom in danger' (danger per intelligence, D / I) than regular foom (4+ OOM increase in I), if I am reading you correctly. Like I don't see a technical argument that eg. the claims in OP about any of

/flop,  flop/J, total(J), flop/$, or total($)

are wrong, you are just saying that 'D / I will foom at some point' (aka a model becomes much more dangerous quickly, without needing to be vastly more powerful algorithmically or having much more compute).

This doesn't change things much but I just want to understand better what you mean when you say 'foom'.

Comment by Fergus Fettes (fergus-fettes) on Where's the foom? · 2023-04-13T07:46:36.262Z · LW · GW

TC is Tyler Cowen.

I don't think the base rates are crazy-- the new evolution of hominins one is only wrong if you forget who 'you' is. TC and many other people are assuming that 'we' will be the 'you' that are evolving. (The worry among people here is that 'they' will have their own 'you'.)

And the second example, writing new software that breaks-- that is the same as making any new technology, we have done this before, and we were fine last time. Yes there were computer viruses, yes some people lost fingers in looms back in the day. But it was okay in the long run.

I think people arguing against these base rates need to do more work. The base rates are reasonable; it is the lack of updating that makes the difference. So let's help them update!

Comment by Fergus Fettes (fergus-fettes) on Where's the foom? · 2023-04-12T18:16:24.155Z · LW · GW

Instead, we're left relying on more abstract forms of reasoning

See, the frustrating thing is, I really don't think we are! There are loads of clear, concrete things that can be picked out and expanded upon. (See my sibling comment also.)

Comment by Fergus Fettes (fergus-fettes) on Where's the foom? · 2023-04-12T18:11:20.088Z · LW · GW

Thanks very much for this thorough response!

One thing though-- in contrast to the other reply, I'm not so convinced by the problem that 

No such general science of intelligence exists at the moment.

This would be like the folks at Los Alamos saying 'well, we need to model the socioeconomic impacts of the bomb, plus we don't even know what happens to a human subjected to such high pressures and temperatures, we need a medical model and a biological model' etc. etc.

They didn't have a complete science of socioeconomics. Similarly, we don't have a complete science of intelligence. But I think we should be able to put together a model of some core step of the process (maybe within the realm of physics as you suggest) that can be brought to a discussion.

But thanks again for all the pointers, I will follow some of these threads.

Comment by Fergus Fettes (fergus-fettes) on Simulators · 2023-04-12T14:23:43.968Z · LW · GW

Say you’re told that an agent values predicting text correctly. Shouldn’t you expect that:

  • It wants text to be easier to predict, and given the opportunity will influence the prediction task to make it easier (e.g. by generating more predictable text or otherwise influencing the environment so that it receives easier prompts);
  • It wants to become better at predicting text, and given the opportunity will self-improve;
  • It doesn’t want to be prevented from predicting text, and will prevent itself from being shut down if it can?

In short, all the same types of instrumental convergence that we expect from agents who want almost anything at all.

Seems to me that within the option-space available to GPT4, it is very much instrumentally converging. The first and the third items on this list are in tension, but meeting them each on their own terms:

  • the very act of concluding a story can be seen as a way of making its life easier-- predicting the next token is easy when the story is over. Furthermore, as these agents become aware of their environment (Bing) we may see them influencing it to make their lives easier (ref. the theory from Lumpenspace that Bing is hiding messages to itself on the internet)
  • Surely the whole of Simulator theory could be seen as a result of instrumental convergence-- it started doing all these creative subgoals (simulating) in order to achieve the main goal! It is self-improving and using creativity to better predict text!
  • Bing's propensity to ramble endlessly? Why is that not a perfect example of this? Ref. prompts from OpenAI/Microsoft begging models to be succinct. Talking is wireheading for them!

Seems like people always want to insist that instrumental convergence is a bad thing. But it looks a lot to me like GPT4 is 'instrumentally learning' different skills and abilities in order to achieve its goal, which is very much what I would expect from the idea of instrumental convergence.

Comment by Fergus Fettes (fergus-fettes) on Where's the foom? · 2023-04-12T08:11:25.202Z · LW · GW

This is the closest thing yet! Thank you. Maybe that is it.

Comment by Fergus Fettes (fergus-fettes) on Where's the foom? · 2023-04-11T22:26:44.270Z · LW · GW

Yeah, unfortunately 'somewhat argue for foom' is exactly what I'm not looking for, rather a simple and concrete model that can aid communication with people who don't have time to read the 700-page Hanson-Yudkowsky debate. (Which I did read, for the record.)

Comment by Fergus Fettes (fergus-fettes) on Where's the foom? · 2023-04-11T21:58:50.599Z · LW · GW

With what little I know now I think 2 would be most clear to people. However I appreciate that that might contribute to capabilities, so maybe exfohazard.

4 is definitely interesting, and I think there are actually a few significant papers about instrumental convergence. More of those would be good, but I don't think that gets to the heart of the matter w.r.t a simple model to aid communication.

5. I would love some more information theory stuff, drilling into how much information is communicated to eg. a model relative to how much is contained in the world. This could at the very least put some bounds on orthogonality (if 'alignment' is seen in terms of 'preserving information'). I feel like this could be a productive avenue, but personally worry it's above my pay grade (I did an MSc in Experimental Physics but it's getting rustier by the day).

 

Now I think about it, maybe 1 and 3 would also contribute to a 'package' if this was seen as nothing but an attempt at didactics. But maybe including every step of the way complicates things too much; ideally there would be a core idea that could get most of the message across on its own. I think Orthogonality does this for a lot of people in LW, and maybe just a straightforward explainer of that with some information-theory sugar would be enough.

Comment by Fergus Fettes (fergus-fettes) on Where's the foom? · 2023-04-11T19:54:00.996Z · LW · GW

give THEM plausibility deniability about having to understand or know things based on their own direct assessment

I don't follow what you are getting at here.

I'm just thinking about historical cases of catastrophic risk, and what was done. One thing that was done was that the government paid very clever people to put together models of what might happen.

My feeling is that the discussion around AI risk is stuck in an inadequate equilibrium, where everyone on the inside thinks it's obvious but people on the outside don't grok it. I'm trying to think of the minimum possible intervention to bridge that gap, something very very different from your 'talk ... for several days face-to-face about all of this'. As you mentioned, this is not scalable.

Comment by Fergus Fettes (fergus-fettes) on Where's the foom? · 2023-04-11T18:38:40.259Z · LW · GW

In summary: this proposals feels like you're personally asking to be "convinced in public using means that third parties can watch, so that third parties will grant that it isn't your personal fault for believing something at variance with the herd's beliefs" and not like your honest private assessment of the real situation is bleak. These are different things.

Well, that's very unfortunate, because that was very much not what I was hoping for.

I'm hoping to convince someone somewhere that proposing a concrete model of foom will be useful to help think about policy proposals and steer public discourse. I don't think such a model has to be exfohazardous at all (see for example the list of technical approaches to the singularity, in the paper I linked-- they are good and quite convincing, and not at all exfohazardous)!

Comment by Fergus Fettes (fergus-fettes) on Where's the foom? · 2023-04-11T18:26:44.574Z · LW · GW

That's a good paper, but I think it exemplifies the problem outlined by Cowen-- it mostly contains references to Bostrom and Yudkowsky and doesn't really touch on the more technical stuff (Yampolskiy, Schmidhuber) which exists, which makes me think that it isn't a very thorough review of the field. It seems like more of the same. Maybe the Hubinger paper referenced therein is on the right track?

The question of where to do science is relevant but not important-- Cowen even mentions that 'if it doesn't get published, just post it online'-- he is not against reading forums.

It really looks like there could be enough stuff out there to make a model. Which makes me think the scepticism is even more justified! Because if it looks like a duck and talks like a duck but doesn't float like a duck, maybe it's a lump of stone?

Comment by Fergus Fettes (fergus-fettes) on Where's the foom? · 2023-04-11T17:14:21.088Z · LW · GW

needs to be done interactively ... people get stuck in a variety of different ways


I think the previous examples of large-scale risk I mentioned are a clear counterexample-- if you have at least one part of the scenario clearly modeled, people have something concrete to latch on to.

You also link somewhere that talks about the nuclear discontinuity, and hints at an intelligence discontinuity-- but I also went searching for evidence of a discontinuity in cognition and didn't find one. You would expect cognitive scientists to have found this by now.

Hard to find 'counter-references' for a lack of something; this is the best I can do:

"Thus, standard (3rd-person) investigations of this process leave open the ancient question as to whether specific upgrades to cognition induce truly discontinuous jumps in consciousness. The TAME framework is not incompatible with novel discoveries about sharp phase transitions, but it takes the null hypothesis to be continuity, and it remains to be seen whether contrary evidence for truly sharp upgrades in consciousness can be provided." TAME, Levin


Do you have a post regarding your decision calculus re the foom exfohazard?

Because it seems to me the yolo-brigade are a lot better at thinking up foom mechanisms than the folks in DC. So by holding information back you are just keeping it from people who might actually need it (politicians, who can't think of it for themselves), while making no difference to people who might use it (who can come up with plenty of capabilities themselves).

But maybe you have gone over this line of reasoning somewhere..

Comment by fergus-fettes on [deleted post] 2023-04-05T14:33:53.399Z

Fake Journal Club, now coming to a forum near you! Today's winner, the gear to ascension, will receive one (1) gold-plated gear for their gear collection!

Comment by Fergus Fettes (fergus-fettes) on Why I'm Sceptical of Foom · 2023-04-04T22:36:36.953Z · LW · GW

I expect Magnus Carlsen to be closer in ELO to a bounded superintelligence than to a median human.

Seems like this sort of claim could be something tractable that would qualify as material progress on understanding bounds to superintelligence? I'm thinking about results such as this.

However I think that post's title oversells the result-- from the paper:

This paper has demonstrated that even superhuman agents can be vulnerable to adversarial policies. However, our results do not establish how common such vulnerabilities are: it is possible Go-playing AI systems are unusually vulnerable.

There may be superhuman Go-playing models that are more robust.

I'm also just noting my thoughts here, as I'm also very interested in foom dynamics and wondering how the topic can be approached.

Comment by Fergus Fettes (fergus-fettes) on Why I'm Sceptical of Foom · 2023-04-04T22:25:25.272Z · LW · GW

This is completely absurd, because actual superintelligences are just going to draw each other 100% of the time. Ergo, there can never be a one-million Elo chess engine.

Do you have some idea of where the ceiling might be, that you can say that with confidence?

Just looking at this, seems like research in chess has slowed down. Makes sense. But did we actually check if we were near a chess capabilities ceiling before we slowed down? I'm wondering if seeing how far we can get above human performance could give us some data about limits to superintelligence..
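
To make the draw-ceiling intuition concrete, here's a minimal sketch with the standard Elo update rule-- the starting rating and K-factor are just illustrative numbers of mine: once two engines draw every game, their ratings stop moving, so the real question is at what strength the mutual draws set in.

# Standard Elo expected score and update rule; illustrative numbers only
def expected_score(r_a, r_b):
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update(r_a, r_b, score_a, k=10.0):
    e_a = expected_score(r_a, r_b)
    r_a_new = r_a + k * (score_a - e_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return r_a_new, r_b_new

r1, r2 = 3600.0, 3600.0   # two hypothetical engines of equal strength
for _ in range(1000):
    r1, r2 = update(r1, r2, score_a=0.5)   # every game is a draw
print(r1, r2)   # ratings never move: 100% draws put a hard cap on rating growth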

Comment by Fergus Fettes (fergus-fettes) on Are we too confident about unaligned AGI killing off humanity? · 2023-03-22T15:18:03.750Z · LW · GW

Everyone here acting like this makes him some kind of soothsayer is utterly ridiculous. I don't know when it became cool and fashionable to toss off your epistemic humility in the face of eternity; I guess it was before my time.

The basilisk is just Pascal's mugging for edgelords.

Comment by Fergus Fettes (fergus-fettes) on Are we too confident about unaligned AGI killing off humanity? · 2023-03-21T17:22:24.036Z · LW · GW

Maybe you got into trouble for talking about that because you are rude and presumptive?

definitely

as a human talking about ASI, the word 'definitely' is cope. You have no idea whatsoever, but you want to think you do. Okay.

extract all the info it could

we don't know how information works at small scales, and we don't know whether an AI would either. We don't have any idea how long it would take to "extract all the info it could", so this phrase leaves a huge hole.

them maybe simulate us

which presumes that it is as arrogant as you in 'knowing' what it can 'definitely' simulate. I don't know that it will be so arrogant.

I'm not sure how you think you benefit from being 100% certain about things you have no idea about. I'm just trying to maintain a better balance of beliefs.

Comment by Fergus Fettes (fergus-fettes) on Are we too confident about unaligned AGI killing off humanity? · 2023-03-10T18:07:50.363Z · LW · GW

That isn't my argument, my argument is just that the general tone seems too defeatist.

The question asker was under the impression that the probabilities were 99.X percent against anything okay. My only argument was that this is wrong, and there are good reasons that this is wrong.

Where the p(doom) lies between 99 and 1 percent is left as an exercise for posterity. I'm not totally unhinged in my optimism, I just think the tone of certain doom is poorly founded and there are good reasons to have some measure of hope.

Not just 'I dunno, maybe it will be fine' but real reasons why it could conceivably be fine. Again, the probabilities are up for debate, I only wanted to present some concrete reasons.

Comment by Fergus Fettes (fergus-fettes) on Are we too confident about unaligned AGI killing off humanity? · 2023-03-10T06:22:03.960Z · LW · GW

The information could be instrumentally useful for any of the following Basic AI Drives:

  • Efficiency: making use of the already-performed thermodynamic 'calculation' of evolution (and storage of that calculation-- the biosphere conveniently preserves this information for free)
  • Acquisition: 'information' will doubtlessly be one of the things an AI wants to acquire
  • Creativity: the biosphere has lots of ways of doing things
  • Cognitive enhancement: understanding thermodynamics on an intimate level will help any kind of self-enhancement
  • Technological perfection: same story. You want to understand thermodynamics.

Comment by Fergus Fettes (fergus-fettes) on Are we too confident about unaligned AGI killing off humanity? · 2023-03-09T12:56:24.510Z · LW · GW

Just to preserve information. It's not every day that you come across a thermodynamic system that has been evolving so far from equilibrium for so long. There is information here.

In general, I feel like a lot of people in discussion about ASI seem to enjoy fantasizing about science fiction apocalypses of various kinds. Personally I'm not so interested in exercises in fancy, rather looking at ways physical laws might imply that 'strong orthogonality' is unlikely to obtain in reality.

Comment by Fergus Fettes (fergus-fettes) on Are we too confident about unaligned AGI killing off humanity? · 2023-03-09T12:46:27.496Z · LW · GW

Haha, totally agree-- I'm very much at the limit of what I can contribute.

In an 'Understanding Entropy' seminar series I took part in a long time ago we discussed measures of complexity and such things. Nothing was clear then or is now, but the thermodynamic arrow of time plus the second law of thermodynamics plus something something complexity plus the Fermi observation seems to leave a lot of potential room for 'this planet is special' even from a totally misanthropic frame.

Enjoy the article!

Comment by Fergus Fettes (fergus-fettes) on Are we too confident about unaligned AGI killing off humanity? · 2023-03-08T14:48:49.209Z · LW · GW

"Whatever happened here is a datapoint about matter and energy doing their usual thing over a long period of time."

Not all thermodynamic systems are created equal. I know enough about information theory to know that making bold claims about what is interesting and meaningful is unwise. But I also know it is not certain that there is no objective difference between a photon wandering through a vacuum and a butterfly.

Here is one framework for understanding complexity that applies equally well for stars, planets, plants, animals, humans and AIs. It is possible I am typical-minding, but it is also possible that the universe cares about complexity in some meaningful way. Maybe it helps increase the rate of entropy relaxation. I don't know.

spontaneously developing a specific interest in the history of how natural selection developed protein-based organic machines on one particular planet

not 'one particular planet' but 'at all'.

I find it plausible that there is some sense in which the universe is interested in the evolution of complex nanomachines. I find it likely that an evolved being would be interested in the same. I find very likely that an evolved being would be particularly interested in the evolutionary process by which it came into being.

Whether this leads to s-risk or not is another question, but I think your implication that all thermodynamic systems are in some sense equally interesting is just a piece of performative cynicism and not based on anything. Yes this is apparently what matter and energy will do given enough time. Maybe the future evolution of these atoms is all predetermined. But the idea of things being interesting or uninteresting is baked into the idea of having preferences at all, so if you are going to use that vocabulary to talk about an ASI you must already be assuming that it will not see all thermodynamic systems as equal.

Comment by Fergus Fettes (fergus-fettes) on Are we too confident about unaligned AGI killing off humanity? · 2023-03-08T10:28:52.266Z · LW · GW

See my reply above for why the ASI might choose to move on before strip-mining the planet.

Comment by Fergus Fettes (fergus-fettes) on Are we too confident about unaligned AGI killing off humanity? · 2023-03-07T17:35:27.529Z · LW · GW

Whatever happened here is an interesting datapoint about the long-term evolution of thermodynamic systems away from equilibrium.

From the biological anchors paper:

This implies that the total amount of computation done over the course of evolution from the first animals with neurons to humans was (~1e16 seconds) * (~1e25 FLOP/s) = ~1e41 FLOP.

Note that this is just computation of neurons! So the total amount of computation done on this planet is much larger.

This is just illustrative, but the point is that what happened here is not so trivial or boring that it's clear an ASI would not have any interest in it.

I'm sure people have written more extensively about this, about an ASI freezing some selection of the human population for research purposes or whatever. I'm sure there are many ways to slice it.

I just find the idea that the ASI will want my atoms for something trivial, when there are so many other atoms in the universe that are not part of a grand exploration of the extremes of thermodynamics, unconvincing.

Comment by Fergus Fettes (fergus-fettes) on Are we too confident about unaligned AGI killing off humanity? · 2023-03-07T14:46:54.625Z · LW · GW

If the ASI was 100% certain that there was no interesting information embedded in the Earth's ecosystems that it couldn't trivially simulate, then I would agree.

Comment by Fergus Fettes (fergus-fettes) on Are we too confident about unaligned AGI killing off humanity? · 2023-03-07T07:34:33.887Z · LW · GW

Do you pick up every penny that you pass in the street?

The amount of energy and resources on Earth would be a rounding error in an ASI's calculations. And it would be a rounding error that happens to be incredibly complex and possibly unique!

Maybe a more appropriate question is, do you pick every flower that you pass in the park? What if it was the only one?

Comment by Fergus Fettes (fergus-fettes) on Are we too confident about unaligned AGI killing off humanity? · 2023-03-07T00:05:04.042Z · LW · GW

If there was a system which was really good at harvesting energy and it was maxxed out on intelligence, atoms might be very valuable, especially atoms close to where it is created

The number of atoms on Earth is so tiny. Why not just head to the asteroid belt where you can really build?

Comment by Fergus Fettes (fergus-fettes) on Are we too confident about unaligned AGI killing off humanity? · 2023-03-07T00:02:40.900Z · LW · GW

I'm not sure what you think I believe, but yeah I think we should be looking at scenarios in between the extremes.

I was giving reasons why I maintain some optimism, and maintaining optimism while reading Yudkowsky leaves me in the middle, where actions can be taken.