Posts

Community Notes by X 2024-03-18T17:13:33.195Z
Magic of synchronous conversation 2024-03-10T17:17:41.529Z
Making the "stance" explicit 2024-02-16T23:57:11.265Z
Why I take short timelines seriously 2024-01-28T22:27:21.098Z
Studying The Alien Mind 2023-12-05T17:27:28.049Z
Direction of Fit 2023-10-02T12:34:24.385Z
Philosophical Cyborg (Part 1) 2023-06-14T16:20:40.317Z
The Compleat Cybornaut 2023-05-19T08:44:38.274Z
Collective Identity 2023-05-18T09:00:24.410Z
NicholasKees's Shortform 2023-04-05T04:58:05.559Z
Questions about Conjecture's CoEm proposal 2023-03-09T19:32:50.600Z
Cyborgism 2023-02-10T14:47:48.172Z
Searching for Search 2022-11-28T15:31:49.974Z

Comments

Comment by NicholasKees (nick_kees) on NicholasKees's Shortform · 2024-04-26T08:51:19.386Z · LW · GW

I wish there were an option in the settings to opt out of seeing the LessWrong reacts. I personally find them quite distracting, and I'd like to be able to hover over text or highlight it without having to see the inline annotations. 

Comment by nick_kees on [deleted post] 2024-04-09T10:32:56.329Z

How would (unaligned) superintelligent AI interact with extraterrestrial life? 

Humans, at least, have the capacity for this kind of "cosmopolitanism about moral value." Would the kind of AI that causes human extinction share this? It would be such a tragedy if the legacy of the human race is to leave behind a kind of life that goes forth and paves the universe, obliterating any and all other kinds of life in its path. 

Comment by NicholasKees (nick_kees) on Plausibility of cyborgism for protecting boundaries? · 2024-04-03T09:31:50.510Z · LW · GW

Some thoughts:

First, it sounds like you might be interested in the idea of d/acc from this Vitalik Buterin post, which advocates for building a "defense-favoring" world. There are a lot of great examples of things we can do now to make the world more defense-favoring, but when it comes to strongly superhuman AI, I get the sense that things get a lot harder.

Second, there doesn't seem to be a clear "boundaries good" or "boundaries bad" story to me. Keeping a boundary secure tends to impose serious costs on the bandwidth of what can be shared across it. Pre-industrial Japan maintained a very strict boundary with the outside world to prevent foreign influence, and the cost was falling behind the rest of the world technologically.

My left and right hemispheres are able to work so well together because they don't have to spend resources protecting themselves from each other. Good cooperative thinking among people likewise relies on trust, which makes it possible to loosen the boundaries of thought. Weakening borders between countries can massively increase trade, and that also relies on trust between the participating countries. The problem with AI is that we can't extend it that level of trust, so we need to build boundaries, but the ultimate cost seems to be that we eventually get left behind. Creating the perfect boundary, one that only lets in the good and never the bad, and doesn't incur a massive cost, seems like a huge challenge, and I'm not sure what that would look like. 

Finally, when I think of Cyborgism, I'm usually thinking of it in terms of taking control over the "cyborg period" of certain skills: the period of time where human+AI teams still outperform either humans or AIs on their own. In this frame, if we reach a point where AIs broadly outperform human+AI teams, then barring some kind of coordination, humans won't have the power to protect themselves from all the non-human agency out there (and it's up to us to make good use of the cyborg period before then!). 

In that frame, I could see "protecting boundaries" intersecting with cyborgism, for example in that AI could help humans perform better oversight and guard against disempowerment around the end of some critical cyborg period. Developing a cyborgism that scales to strongly superhuman AI has both practical challenges (like the kind Neuralink seeks to overcome) and its own particular version of the alignment problem to solve (e.g. how can you trust that the AI you are merging with won't just eat your mind?). 

 

Comment by NicholasKees (nick_kees) on Community Notes by X · 2024-03-25T20:45:34.158Z · LW · GW

Thank you, it's been fixed.

Comment by NicholasKees (nick_kees) on Could LLMs Help Generate New Concepts in Human Language? · 2024-03-25T10:51:17.222Z · LW · GW

In terms of LLM architecture, do transformer-based LLMs have the ability to invent new, genuinely useful concepts?

So I'm not sure how well the word "invent" fits here, but I think it's safe to say LLMs have concepts that we do not.

Comment by NicholasKees (nick_kees) on Could LLMs Help Generate New Concepts in Human Language? · 2024-03-25T10:47:52.272Z · LW · GW

Recently @Joseph Bloom was showing me Neuronpedia, which catalogues features found in GPT-2 by sparse autoencoders. There were many features that were semantically coherent, but I couldn't find a word in any of the languages I speak that pointed to these concepts exactly. It felt a little bit like how human languages often have words that don't translate, and it made us wonder whether we could learn useful abstractions about the world (e.g. concepts we might actually import into English) by identifying the features being used by LLMs. 

Comment by NicholasKees (nick_kees) on Dangers of Closed-Loop AI · 2024-03-23T10:30:16.049Z · LW · GW

You might enjoy this post which approaches this topic of "closing the loop," but with an active inference lens: https://www.lesswrong.com/posts/YEioD8YLgxih3ydxP/why-simulator-ais-want-to-be-active-inference-ais 

Comment by NicholasKees (nick_kees) on Comparing Alignment to other AGI interventions: Basic model · 2024-03-20T23:07:29.150Z · LW · GW

A main motivation of this enterprise is to assess whether interventions in the realm of Cooperative AI, that increase collaboration or reduce costly conflict, can seem like an optimal marginal allocation of resources.

After reading the first three paragraphs, I had basically no idea what interventions you were aiming to evaluate. Later on in the text, I gather you are talking about coordination between AI singletons, but I still feel like I'm missing something about what problem exactly you are aiming to solve with this. I could have definitely used a longer, more explain-like-I'm-five level introduction. 

Comment by NicholasKees (nick_kees) on Community Notes by X · 2024-03-20T16:42:14.115Z · LW · GW

That sounds right intuitively. One thing worth noting, though, is that most notes get very few ratings and most users rate very few notes, so it might be trickier than it sounds. Also, if I were them, I might worry about drastic changes in note rankings as a result of switching models. Currently, just as notes can become helpful by reaching a threshold of 0.4, they can lose this status by dropping below 0.39. They may also have to manually pick new thresholds, and perhaps redesign the algorithm slightly (since it seems that a lot of this algorithm was built via trial and error rather than from clear principles). 
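
For concreteness, the gap between those two numbers acts as a hysteresis rule. Here is a minimal sketch of that rule in Python; the function name and status labels are my own illustrative choices, not taken from the actual Community Notes code:

```python
def update_note_status(current_status: str, note_score: float) -> str:
    """Hysteresis sketch: a note becomes "Helpful" once its score reaches 0.40,
    but only loses that status if the score drops below 0.39, so notes hovering
    right at the boundary don't flip back and forth with every new rating."""
    if note_score >= 0.40:
        return "Helpful"
    if current_status == "Helpful" and note_score >= 0.39:
        return "Helpful"  # inside the 0.39-0.40 buffer: keep the existing status
    return "Not Helpful"  # placeholder label for any non-helpful status
```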

Comment by NicholasKees (nick_kees) on Community Notes by X · 2024-03-20T15:48:16.183Z · LW · GW

"Note: for now, to avoid overfitting on our very small dataset, we only use 1-dimensional factors. We expect to increase this dimensionality as our dataset size grows significantly."


This was the reason given in the documentation.
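
To illustrate what the documentation is describing, here is a rough sketch (my own illustrative code, not the actual implementation): each rating is modeled as a global offset plus a user intercept, a note intercept, and a product of user and note factors. With 1-dimensional factors, that product is just one scalar per user times one scalar per note.

```python
def predict_rating(mu, user_intercept, note_intercept, user_factor, note_factor):
    # rating ~ mu + i_u + i_n + f_u * f_n   (1-dimensional factors: f_u, f_n are scalars)
    return mu + user_intercept + note_intercept + user_factor * note_factor

def sgd_epoch(ratings, mu, ui, ni, uf, nf, lr=0.05, reg=0.03):
    """One pass of stochastic gradient descent over (user, note, rating) triples,
    minimizing squared error with L2 regularization. The note intercept ni[n] is,
    as I understand it, the quantity compared against the helpfulness threshold."""
    for u, n, r in ratings:
        err = r - predict_rating(mu, ui[u], ni[n], uf[u], nf[n])
        ui[u] += lr * (err - reg * ui[u])
        ni[n] += lr * (err - reg * ni[n])
        uf[u], nf[n] = (uf[u] + lr * (err * nf[n] - reg * uf[u]),
                        nf[n] + lr * (err * uf[u] - reg * nf[n]))
    return ui, ni, uf, nf
```

With higher-dimensional factors, `user_factor` and `note_factor` would become vectors and the product a dot product; the documentation's point is that with a small dataset, even that extra flexibility risks overfitting.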

Comment by NicholasKees (nick_kees) on Community Notes by X · 2024-03-19T12:00:14.654Z · LW · GW

Thanks for pointing that out. I've added some clarification.

Comment by NicholasKees (nick_kees) on Community Notes by X · 2024-03-18T21:02:49.129Z · LW · GW

That sounds cool! Though I think I'd be more interested in using this to first visualize and understand current LW dynamics, rather than immediately trying to intervene on them by changing how comments are ranked. 

Comment by NicholasKees (nick_kees) on The Worst Form Of Government (Except For Everything Else We've Tried) · 2024-03-18T19:18:13.569Z · LW · GW

I'm confused by the way people are engaging with this post. That well-functioning and stable democracies need protections against a "tyranny of the majority" is not at all a new idea; this seems like basic common sense. The idea that the American Civil War was precipitated by the South perceiving an end to their balance of power with the North also seems pretty well accepted. Furthermore, there are lots of other things that make democratic systems work well: e.g. a system of laws/conflict resolution, or mechanisms for peaceful transfers of power.

Comment by NicholasKees (nick_kees) on Twelve Lawsuits against OpenAI · 2024-03-10T17:24:06.998Z · LW · GW

fyi, the link chatgptiseatingtheworld.com does not have a secure connection. 

Comment by NicholasKees (nick_kees) on Future life · 2024-03-03T01:55:20.632Z · LW · GW

Even if you suppose that there are extremely good non-human futures, creating a new kind of life and unleashing it upon the world is a huge deal, with enormous ethical/philosophical implications! To unilaterally make a decision that would drastically affect (and endanger) the lives of everyone on earth (human and non-human) seems extremely bad, even if you had very good reasons to believe that this ends well (which as far as I can tell, you don't). 

I have sympathy for the idea of wanting AI systems to be able to pursue lives they find fulfilling and to find their own kinds of value, for the same reason I would, upon encountering alien life, want to let those aliens find value in their own ways.

But your post seems to imply that we should just give up on trying to positively affect the future and spend no real thought on what would be the biggest decision ever made in all of history, all based on a hunch that everything is guaranteed to end well no matter what we do? This perspective, to me, comes off as careless, selfish, and naive.

Comment by NicholasKees (nick_kees) on How is Chat-GPT4 Not Conscious? · 2024-02-28T03:27:12.041Z · LW · GW

I just ran into a post which, if you are interested in AI consciousness, you might find interesting: Improving the Welfare of AIs: A Nearcasted Proposal

There seem to be a lot of good reasons to take potential AI consciousness seriously, even if we haven't fully understood it yet.

Comment by NicholasKees (nick_kees) on How is Chat-GPT4 Not Conscious? · 2024-02-28T02:44:09.810Z · LW · GW

It seems hard to me to be extremely confident in either direction. I'm personally quite sympathetic to the idea, but there is very little consensus on what consciousness is, or on what a principled approach to determining whether (or to what extent) a system is conscious would even look like. 

Here is a recent paper that gives a pretty in-depth discussion: Consciousness in Artificial Intelligence: Insights from the Science of Consciousness

What you write seems to be focused entirely on the behavior of a system, and while I know there are people who agree with that focus, from what I can tell most consciousness researchers are interested in particular properties of the internal process that produces that behavior.

Comment by NicholasKees (nick_kees) on A starting point for making sense of task structure (in machine learning) · 2024-02-24T21:42:55.650Z · LW · GW

More generally, science is about identifying the structure and patterns in the world; the task taxonomy learned by powerful language models may be very convergent and could be a useful map for understanding the territory of the world we are in. What’s more, such a decomposition would itself be of scientifico-philosophical interest — it would tell us something about thinking.

I would love to see someone expand on the ways we could use interpretability to learn about the world, or the structure of tasks (or perhaps examples of how we've already done this?). Aside from being interesting scientifically, maybe this could also help us build economically valuable systems which are more explicit and predictable?

Comment by NicholasKees (nick_kees) on Making the "stance" explicit · 2024-02-17T01:23:14.488Z · LW · GW

Credit goes to Daniel Biber: https://www.worldphoto.org/sony-world-photography-awards/winners-galleries/2018/professional/shortlisted/natural-world/very 
After the shape dissipated, it actually re-formed into another bird shape. 

Comment by NicholasKees (nick_kees) on Leading The Parade · 2024-02-04T22:37:31.833Z · LW · GW

It's been a while since I read about this, but I think your slavery example might be a bit misleading. If I'm not mistaken, the movement to abolish slavery initially only gained serious steam in the United Kingdom. Adam Hochschild tells a story in Bury the Chains that makes the abolition of slavery look extremely contingent on the role activists played in shaping the UK political climate. A big piece of this story is how the UK used its might as a global superpower to help force an end to the transatlantic slave trade (as well as setting a precedent). 

Comment by NicholasKees (nick_kees) on How do you feel about LessWrong these days? [Open feedback thread] · 2023-12-06T00:45:03.632Z · LW · GW

What about leaning into the word-of-mouth sharing instead, and supporting it with features? For example, making it as effortless as possible to recommend posts to people you know from within LW?

Comment by NicholasKees (nick_kees) on Agents which are EU-maximizing as a group are not EU-maximizing individually · 2023-12-04T22:30:16.001Z · LW · GW

I think I must be missing something. As the number of traders increases, each trader can be less risk-averse, since their personal wealth is now a much smaller fraction of the whole, and this changes their strategy. In what way are these individuals now not EU-maximizing?

Comment by NicholasKees (nick_kees) on Taking Into Account Sentient Non-Humans in AI Ambitious Value Learning: Sentientist Coherent Extrapolated Volition · 2023-12-03T11:59:06.071Z · LW · GW

I like this thought experiment, but I feel like this points out a flaw in the concept of CEV in general, not SCEV in particular. 

If the entire future is determined by a singular set of values derived from an aggregation/extrapolation of the values of a group, then you would always run the risk of a "tyranny of the mob" kind of situation. 

If in CEV that group is specifically humans, it feels like all the author is calling for is expanding the franchise/inclusion to non-humans as well. 

Comment by NicholasKees (nick_kees) on D0TheMath's Shortform · 2023-11-23T11:13:08.329Z · LW · GW

@janus wrote a little bit about this in the final section here, particularly referencing the detection of situational awareness as a thing cyborgs might contribute to. It seems like a fairly straightforward thing to say that you would want the people overseeing AI systems to also be the ones who have the most direct experience interacting with them, especially for noticing anomalous behavior.

Comment by NicholasKees (nick_kees) on EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem · 2023-09-29T10:37:14.109Z · LW · GW

This post feels to me like it doesn't take seriously the default problems with living in our particular epistemic environment. The meat and dairy industries have historically had, and continue to have, a massive influence on our culture through advertising and lobbying governments. We live in a culture where we now eat more meat than ever. What would this conversation be like if it were happening in a society where eating meat was as rare as being vegan is now?

It feels like this is preaching to the choir, and picking on a very small group of people who are not as well resourced (financially or otherwise). The idea that people should be vegan by default is very much a minority view, even in EA, so anyone holding this position really has everything stacked against them. 

Comment by NicholasKees (nick_kees) on Orthogonal's Formal-Goal Alignment theory of change · 2023-05-06T16:16:01.228Z · LW · GW

This avoids spending lots of time getting confused about concepts that are confusing because they were the wrong thing to think about all along, such as "what is the shape of human values?" or "what does GPT4 want?"

These sound like exactly the sort of questions I'm most interested in answering. We live in a world of minds that have values and want things, and we are trying to prevent the creation of a mind that would be extremely dangerous to that world. These kinds of questions feel to me like they tend to ground us in reality.

Comment by NicholasKees (nick_kees) on NicholasKees's Shortform · 2023-04-11T22:25:41.264Z · LW · GW

Try out The Most Dangerous Writing App if you are looking for ways to improve your babble. It forces you to keep writing continuously for a set amount of time, or else the text will fade and you will lose everything. 

Comment by NicholasKees (nick_kees) on Evolution provides no evidence for the sharp left turn · 2023-04-11T20:13:48.026Z · LW · GW

First of all, thank you so much for this post! I found it generally very convincing, but there were a few things that felt missing, and I was wondering if you could expand on them.

However, I expect that neither mechanism will produce as much of a relative jump in AI capabilities, as cultural development produced in humans. Neither mechanism would suddenly unleash an optimizer multiple orders of magnitude faster than anything that came before, as was the case when humans transitioned from biological evolution to cultural development.

Why do you expect this? Surely the difference between passive and active learning, or the ability to view and manipulate one's own source code (or that of a successor), would be pretty enormous? Also, it feels like this implicitly assumes that relatively dumb algorithms like SGD or predictive processing/Hebbian learning will not be improved upon during such a feedback loop. 

On the topic of alignment, it feels like many of the techniques you mention are not at all good candidates, because they focus on correcting bad behavior as it appears. It seems like we mainly have a problem if powerful superhuman capabilities arrive before we have robustly aligned a system to good values. Currently, none of those methods have (as far as I can tell) any chance of scaling up, in particular because at some point we won't be able to apply corrective pressures to a model that has decided to deceive us. Do we have any examples of a system where we apply corrective pressure early to instill some values, and then scale up performance without needing to continue applying more corrective pressure? 
 

Comment by NicholasKees (nick_kees) on NicholasKees's Shortform · 2023-04-05T04:58:05.997Z · LW · GW

Are you lost and adrift, looking at the looming danger from AI and wondering how you can help? Are you feeling overwhelmed by the size and complexity of the problem, not sure where to start or what to do next?

I can't promise a lot, but if you reach out to me personally I commit to doing SOMETHING to help you help the world. Furthermore, if you are looking for specific things to do, I also have a long list of projects that need doing and questions that need answering. 

I spent so many years of my life just upskilling, because I thought I needed to be an expert to help. The truth is, there are no experts, and no time to become one. Please don't hesitate to reach out <3

Comment by NicholasKees (nick_kees) on CAIS-inspired approach towards safer and more interpretable AGIs · 2023-03-27T16:14:08.056Z · LW · GW

Natural language is more interpretable than the inner processes of large transformers.

There's certainly something here, but it's tricky because this implicitly assumes that the transformer is using natural language in the same way that a human is. I highly recommend these posts if you haven't read them already: 

Comment by NicholasKees (nick_kees) on Applying superintelligence without collusion · 2023-02-22T21:31:30.488Z · LW · GW

That's a good point. There are clearly examples of systems where more is better (e.g. blockchain). There are just also other examples where the opposite seems true.

Comment by NicholasKees (nick_kees) on Cyborgism · 2023-02-11T16:59:44.232Z · LW · GW

I agree that this is important. Are you more concerned about cyborgs than other human-in-the-loop systems? To me the whole point is figuring out how to make systems where the human remains fully in control (unlike, e.g. delegating to agents), and so answering this "how to say whether a person retains control" question seems critical to doing that successfully.

Comment by NicholasKees (nick_kees) on Cyborgism · 2023-02-11T03:57:55.646Z · LW · GW

Thank you for this gorgeously written comment. You really capture the heart of all this so perfectly, and I completely agree with your sentiments.  
 

Comment by NicholasKees (nick_kees) on If I encounter a capabilities paper that kinda spooks me, what should I do with it? · 2023-02-04T03:49:06.950Z · LW · GW

I think it's really important for everyone to always have a trusted confidant, and to go to them directly with this sort of thing first before doing anything. It is in fact a really tough question, and no one will be good at thinking about this on their own. Also, for situations that might breed a unilateralist's curse type of thing, strongly err on the side of NOT DOING ANYTHING. 

Comment by NicholasKees (nick_kees) on Pessimistic Shard Theory · 2023-01-25T18:22:46.740Z · LW · GW

An example I think about a lot is the naturalistic fallacy. There is a lot of horrible suffering that happens in the natural world, and a lot of people seem to be way too comfortable with that. We don't have any really high-leverage options right now to do anything about it, but it strikes me as plausible that even if we could do something about it, we wouldn't want to (perhaps we'd even make it worse by populating other planets with life: https://www.youtube.com/watch?v=HpcTJW4ur54).

Comment by NicholasKees (nick_kees) on Pessimistic Shard Theory · 2023-01-25T18:16:24.676Z · LW · GW

I really loved the post! I wish more people took S-risks completely seriously before dismissing them, and you make some really great points. 

In most of your examples, however, it seems the majority of the harm comes from an inability to reason about the consequences of our actions, and if humans became smarter and better informed, it seems like a lot of this would be ironed out. 

I will say the hospice/euthanasia example really strikes a chord with me, but even there, isn't it more a product of cowardice than a failure of our values?

Comment by NicholasKees (nick_kees) on Linda Linsefors's Shortform · 2023-01-06T18:06:39.272Z · LW · GW

GI is very efficient if you consider that you can reuse a lot of the machinery that you learn, rather than needing to relearn it over and over again. https://towardsdatascience.com/what-is-better-one-general-model-or-many-specialized-models-9500d9f8751d 

Comment by NicholasKees (nick_kees) on Infohazards vs Fork Hazards · 2023-01-05T16:39:22.483Z · LW · GW

Sometimes something can be infohazardous even if it's not completely true. Even though the Northwest Passage didn't really exist, it inspired many European expeditions to find it. There's a lot of hype about AI right now, and I think a cool new capabilities idea (even if it turns out not to work well) can also do harm by inspiring people to try similar things. 

Comment by NicholasKees (nick_kees) on [Simulators seminar sequence] #1 Background & shared assumptions · 2023-01-03T22:00:41.775Z · LW · GW

I interpret the goal as being more about figuring out how to use simulators as powerful tools to assist humans in solving alignment, not at all shying away from the hard problems of alignment. Despite our lack of understanding of simulators, people (such as yourself) have already found them really useful, and I don't think it is unreasonable to expect that, as we become less confused about simulators, we will learn to use them in really powerful and game-changing ways. 

You gave "Google" as an example. I feel like having access to Google (or another search engine) improves my productivity by more than 100x. This seems like evidence that game-changing tools exist.

Comment by NicholasKees (nick_kees) on Applying superintelligence without collusion · 2023-01-03T20:41:41.293Z · LW · GW

and increasing the number of actors can make collusive cooperation more difficult

An empirical counterargument to this lies in the incentives human leaders face when overseeing people who might coordinate against them. When authoritarian leaders come into power, they will actively purge members from their inner circles in order to keep them small. The larger the inner circle, the harder it becomes to prevent a rebellious individual from gathering the critical mass needed for a full-blown coup. 

Source: The Dictator's Handbook by Bruce Bueno de Mesquita and Alastair Smith

Comment by NicholasKees (nick_kees) on Human sexuality as an interesting case study of alignment · 2022-12-30T18:33:27.840Z · LW · GW

What is evolution's true goal? If it's genetic fitness, then I don't see how this demonstrates alignment. Human sexuality is still just an imperfect proxy, and doesn't point at the base objective at all. 

I agree that it's very interesting how robust this is to the environment we grow up in, and I would expect there to be valuable lessons here for how value formation happens (and how we can control this process in machines).

Comment by NicholasKees (nick_kees) on Simulators · 2022-12-30T18:20:39.348Z · LW · GW

To me this statement seems mostly tautological. Something is instrumental if it is helpful in bringing about some kind of outcome. The term "instrumental" is always (as far as I can tell) in reference to some sort of consequence based optimization. 

Comment by NicholasKees (nick_kees) on Human sexuality as an interesting case study of alignment · 2022-12-30T17:07:08.644Z · LW · GW

I agree that this is an important difference, but I think that "surely cannot be adaptive" ignores the power of group selection effects.

Comment by NicholasKees (nick_kees) on Models Don't "Get Reward" · 2022-12-30T17:01:26.634Z · LW · GW

Wow, this post is fantastic! In particular I love the point you make about goal-directedness:

If a model is goal-directed with respect to some goal, it is because such goal-directed cognition was selected for.

Looking at our algorithms as selection processes that incentivize different types of cognition seems really important and underappreciated. 

Comment by NicholasKees (nick_kees) on Human sexuality as an interesting case study of alignment · 2022-12-30T16:45:22.544Z · LW · GW

Assuming that what evolution 'wants' is child-bearing heterosexual sex, then human sexuality has a large number of deviations from this in practice including homosexuality, asexuality, and various paraphilias.

I don't think this is a safe assumption. Sex also serves a social bonding function beyond procreation, and there are many theories about the potential advantages of non-heterosexual sex from an evolutionary perspective. 

A couple things you might find interesting:
- Men are 33% more likely to be gay for every older brother they have: https://pubmed.ncbi.nlm.nih.gov/11534970/
- Women are more likely to be bisexual than men, which may have been advantageous for raising children: https://pubmed.ncbi.nlm.nih.gov/23563096/
- Homosexuality is extremely common in the animal kingdom (in fact, the majority of giraffe sex is homosexual): https://en.wikipedia.org/wiki/List_of_mammals_displaying_homosexual_behavior 
 

Comment by NicholasKees (nick_kees) on Human sexuality as an interesting case study of alignment · 2022-12-30T16:19:25.555Z · LW · GW

It seems to occur mostly without RL. People start wanting to have sex before they have actually had sex.

This doesn't mean that it isn't a byproduct of RL. Something needs to be hardcoded, but a simple reward circuit might lead to a highly complex set of desires and cognitive machinery. I think the things you are pointing to in this post sound extremely related to what Shard Theory is trying to tackle. 
https://www.lesswrong.com/posts/iCfdcxiyr2Kj8m8mT/the-shard-theory-of-human-values 

Comment by NicholasKees (nick_kees) on Against Agents as an Approach to Aligned Transformative AI · 2022-12-29T18:28:36.042Z · LW · GW

I am also completely against building powerful autonomous agents (albeit for different reasons), but to avoid doing this seems to require extremely high levels of coordination. All it takes is one lab to build a singleton capable of disempowering humanity. It would be great to stay in the "tool AI" regime for as long as possible, but how?

Comment by NicholasKees (nick_kees) on Why I'm Sceptical of Foom · 2022-12-29T18:20:32.773Z · LW · GW

On many useful cognitive tasks (chess, theoretical research, invention, mathematics, etc.), beginner/dumb/unskilled humans are closer to a chimpanzee/rock than peak humans (for some fields, only a small minority of humans are able to perform the task at all, or perform the task in a useful manner).

This seems due to the fact that most tasks are "all or nothing", or at least have a really steep learning curve. I don't think that humans differ that much in intelligence, but rather that small differences result in hugely different abilities. This is part of why I expect foom. Small improvements to an AI's cognition seem likely to deliver massive payoffs in terms of their ability to affect the world.
 

Comment by NicholasKees (nick_kees) on Mechanistic anomaly detection and ELK · 2022-11-27T17:57:27.643Z · LW · GW

If we are able to flag a treacherous turn as cognitively anomalous, then we can take that opportunity to shut down a system and retrain on the offending datapoint.

What do you mean by "retrain on the offending datapoint"? I would be worried about Goodharting on this by selecting for systems that don't set off the anomaly detector, thereby making it a less reliable safeguard.

Comment by NicholasKees (nick_kees) on Killing the ants · 2021-02-08T12:22:51.090Z · LW · GW

I really enjoyed this post. No assumptions are made about the moral value of insects; rather, the author just points out how little we ever thought about it in the first place. Given that, as a species, we already tend to ignore a lot of atrocities that form a part of our daily lives, if it WERE true, beyond a reasonable doubt, that washing our sheets killed thousands of sentient creatures, I still can't imagine we'd put in a significant effort to find an alternative. (And it certainly wouldn't be socially acceptable to have stinky sheets!) I think it would be healthy to cultivate genuine curiosity and caring about these things, rather than ridicule people who depart from social norms. If insects do deserve moral weight, I'd like to be the sort of person who, and part of a community that, would notice and take that seriously.