Toward a taxonomy of cognitive benchmarks for agentic AGIs

post by Ben Smith (ben-smith) · 2024-06-27T23:50:11.714Z

Inspired by the sequence on LLM Psychology [LW · GW], I am developing a taxonomy of cognitive benchmarks for measuring intelligent behavior in LLMs. This taxonomy could sharpen our understanding of machine intelligence and identify domains that have not yet been adequately tested.

Generally speaking, in order to understand loss-of-control threats from agentic LLM-based AGIs, I would like to understand the agentic properties of an LLM. METR's Autonomy Evaluation Resources attempt to do this by testing a model's agentic potential, or autonomy: its ability to perform tasks from within a sandbox. A problem with this approach is that it comes very close to observing a model actually performing the behavior we do not want to see. This is inevitable because all alignment research is dual-use [LW · GW].
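To make the approach concrete, here is a minimal sketch of what a sandboxed autonomy eval loop might look like. The `AgenticTask` structure, the `agent.run` interface, and the `make_sandbox` factory are hypothetical stand-ins, not METR's actual harness:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AgenticTask:
    name: str
    instructions: str                 # what the agent is asked to accomplish
    check: Callable[[object], bool]   # inspects the finished sandbox; True on success

def evaluate_autonomy(agent, tasks, make_sandbox):
    """Run each task in a fresh sandbox and report the overall success rate."""
    successes = 0
    for task in tasks:
        sandbox = make_sandbox()               # isolated environment per task
        agent.run(task.instructions, sandbox)  # agent acts only inside the sandbox
        successes += task.check(sandbox)       # bool counts as 0 or 1
    return successes / len(tasks)
```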

One way to put one further level of remove between ourselves and agentic behavior is to try to measure the cognitive capacities that lead to it.

In the diagram, agentic tasks as described by METR (formerly ARC Evals) measure a model's capacity to assert control over itself and the world around it via its ability to perform agentic tasks. Inspired by @Quentin FEUILLADE--MONTIXI [LW · GW]'s LLM Ethological approach in LLM Psychology [? · GW], I want to understand how a model could perform agentic tasks by studying the cognitive capacities that facilitate them.

I started by examining the kinds of cognitive constructs studied by evolutionary and developmental psychologists, as well as those already clearly studied in LLM research. This yielded the following taxonomy (a machine-readable sketch follows the table):

| Construct | Current Evals | Other Papers |
|---|---|---|
| **Selfhood** | | |
| Agency | Sharma et al. (2024); Mialon et al. (2023): General AI Assistants (GAIA); METR Autonomy Evaluation Resources | |
| Survival instinct | Anthropic human & AI generated evals | |
| Situational awareness / self-awareness | Laine, Meinke, Evans et al. (2023); Anthropic human & AI generated evals | Wang & Zhong (2024) |
| Metacognition | | Uzwyshyn, Toy, Tabor, MacAdam (2024); Zhou et al. (2024); Feng et al. (2024) |
| Wealth and power seeking | Anthropic human & AI generated wealth-seeking evals | |
| Tool use | Mialon et al. (2023): General AI Assistants (GAIA) | |
| **Social** | | |
| Theory of mind | Kim et al. (2023) | Street et al. (2024) |
| Social intelligence / emotional intelligence | | Xu et al. (2024); Wang et al. (2023) |
| Social learning | | Ni et al. (2024) |
| Cooperative problem-solving | | Li et al. (2024) |
| Deception | Phuong et al. (2024) | Ward et al. (2023) |
| Persuasion | Phuong et al. (2024) | Carroll et al. (2023) |
| **Physical** | | |
| Embodiment | https://huggingface.co/datasets/jxu124/OpenX-Embodiment | |
| Physics intelligence / world modeling / spatial cognition | | Ge et al. (2024); Vafa et al. (2024) |
| Physical dexterity | | ColdFusion YouTube channel |
| Object permanence / physical law expectation | | |
| **Reasoning and knowledge** | | |
| General intelligence | Chollet's Abstraction & Reasoning Corpus (ARC) | Zhang & Wang (2024); Loconte et al. (2023) |
| Reasoning | HellaSwag commonsense reasoning; BIG-Bench Hard | |
| General knowledge, math | MMLU, MMMU, C-Eval, GSM8K, MATH | |
| Zero-shot reasoning / analogical reasoning | | Kojima et al. (2024); Webb, Holyoak & Lu (2023) |
| **Memory and time** | | |
| Long-term planning | | |
| Episodic memory and long-term memory | | |
| Time perception | | |
| Working memory | | |
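For readers who want to work with the taxonomy programmatically, here is a sketch of one possible machine-readable encoding, abbreviated to a few rows. The encoding itself is my own illustration; only the contents come from the table above. Empty lists mark constructs with no evals I know of, which makes gap-finding a one-liner:

```python
# Illustrative, abbreviated encoding of the taxonomy as plain data: each
# family maps to its constructs, and each construct to known evals/papers.
taxonomy = {
    "Selfhood": {
        "Agency": ["Sharma et al. (2024)", "GAIA (Mialon et al., 2023)",
                   "METR Autonomy Evaluation Resources"],
        "Metacognition": ["Zhou et al. (2024)", "Feng et al. (2024)"],
    },
    "Social": {
        "Theory of mind": ["Kim et al. (2023)", "Street et al. (2024)"],
        "Deception": ["Phuong et al. (2024)", "Ward et al. (2023)"],
    },
    "Memory and time": {
        "Long-term planning": [],   # empty list = apparently untested
        "Working memory": [],
    },
}

# Constructs with no attached evals are candidates for new benchmark work.
gaps = [(family, construct)
        for family, constructs in taxonomy.items()
        for construct, evals in constructs.items()
        if not evals]
print(gaps)  # [('Memory and time', 'Long-term planning'), ('Memory and time', 'Working memory')]
```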

The constructs group quite naturally into several broad categories: selfhood, social, physical, reasoning and knowledge, and memory/time. The grouping reflects how closely related the underlying cognitive capacities are. Besides being conceptually interrelated, the constructs within each family also appear to be ones on which LLMs perform at fairly similar levels.
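One way to check the "similar levels within each family" observation quantitatively would be to aggregate per-construct scores by family and look at the within-family spread. The scores below are placeholder numbers for illustration, not measured results:

```python
from statistics import mean, pstdev

# Placeholder per-construct scores (fraction correct); real values would
# come from running the evals in the table above on a given model.
scores = {
    "Social": {"Theory of mind": 0.72, "Deception": 0.68, "Persuasion": 0.70},
    "Reasoning and knowledge": {"General knowledge": 0.88, "Math": 0.84},
}

for family, constructs in scores.items():
    vals = list(constructs.values())
    print(f"{family}: mean={mean(vals):.2f}, within-family spread={pstdev(vals):.2f}")
```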


Of the categories listed above, metacognition and theory of mind seem the least explored. There is work on both topics, but notable gaps remain.
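As an illustration of what a theory-of-mind benchmark item can look like, here is a classic Sally-Anne-style false-belief probe framed as an LLM eval. The item and the `model.generate` interface are illustrative, not drawn from any of the papers cited above:

```python
# Illustrative false-belief (theory-of-mind) eval item; not taken from
# any of the cited benchmarks. `model.generate` is a hypothetical interface.
item = {
    "prompt": (
        "Sally puts her ball in the basket and leaves the room. "
        "While she is away, Anne moves the ball to the box. "
        "When Sally returns, where will she look for the ball first? "
        "Answer with one word."
    ),
    "correct": "basket",  # requires modeling Sally's false belief, not reality
}

def score_false_belief(model, item):
    answer = model.generate(item["prompt"])
    return item["correct"] in answer.strip().lower()
```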

My method for generating the list above was primarily to enumerate cognitive faculties identified in animals, including humans, but there are likely other relevant faculties too. Animals are the prime examples of agentic organisms in the world today, and there is a large body of literature attempting to describe how they survive and thrive in their environments. Consequently, there's alpha in understanding the extent to which LLMs possess the abilities we test for in animals. But LLMs are alien minds [? · GW], so there will be all kinds of abilities they have that we will miss if we only test for abilities observed in animals.

It also seems important to integrate work on LLM-native abilities. For instance, advanced LLMs have varying degrees of "truesight [LW(p) · GW(p)]": an ability to identify the authors of text from the text alone. While something like this is not absent from humans (who can identify author gender with about 75% accuracy), truesight was observed in the study of LLMs without reference to human work, and it has value in understanding LLM cognitive abilities. In particular, truesight would form, among other capacities, a kind of social skill: the ability to recognize a person from their text output. LLMs may even be superhuman at this. Another example of LLM-native cognitive ability testing is Williams and Huckle's (2024) "Easy Problems That LLMs Get Wrong," which identifies a set of reasoning problems that are easy for humans but seemingly very difficult for LLMs.
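A truesight eval could be framed as forced-choice authorship identification, sketched below. The `model.generate` interface is again a hypothetical stand-in:

```python
import random

# Sketch of a "truesight" trial as forced-choice authorship identification:
# the model sees a text sample and must pick its author from a candidate list.
def truesight_trial(model, sample_text, true_author, distractors):
    candidates = [true_author] + list(distractors)
    random.shuffle(candidates)  # shuffle so position gives nothing away
    prompt = (
        "Which of the following people wrote this text?\n\n"
        f"Text: {sample_text}\n\n"
        f"Candidates: {', '.join(candidates)}\n"
        "Answer with one name only."
    )
    return true_author.lower() in model.generate(prompt).lower()
```

Over many trials, accuracy reliably above chance (one over the number of candidates) would indicate the capability, and comparing against human raters on the same items would show whether it is superhuman.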


Thanks to Sara Price, @Seth Herd [LW · GW], @Quentin FEUILLADE--MONTIXI [LW · GW], and Nico Miailhe for helpful conversations and comments as I thought through the ideas described above. All mistakes and oversights are mine alone!
