nora_ammann

I found this article ~very poor. Much of the rhetorical moves adopted in the pieces seem largely optimised for making it easy to stay on the "high horse". Talking about a singular AI doomer movement being one of them. Having the stance that AGI is not near and thus there is nothing to worry about is another. Whether or not that's true, it certainly makes it easy to point your finger at folks who are worried and say 'look what silly theater'.

I think it's somewhat interesting to ask whether there should be more coherence across safety efforts, and at the margins, the answer might be yes. But I'm also confused about the social model that suggests that there could be something like a singular safety plan (instead, I think we live in a world where increasingly more people are waking up to the implications of AI progress, and of course there will be diverse and to some extent non-coherent reactions to this), OR that a singular coherent safety plan would be desirable given the complexity and amount of uncertainty invovled in the challenge.

Comment by Nora_Ammann on In response to critiques of Guaranteed Safe AI · 2025-01-31T05:13:06.582Z · LW · GW

What's the case for it being a swiss cheese approach? That doesn't match how I think of it.

Comment by Nora_Ammann on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser · 2024-12-19T17:31:36.573Z · LW · GW

Donated 1k USD, and might donate more after some further reflection.

I have myself and seen many others benefit a lot from the many things you've done over the years, in particular LW and Lighthaven. But beyond that, I also particularly value and respect you for the integrity and intellectual honesty I have found you demonstrate consistently.

Comment by Nora_Ammann on Why I funded PIBBSS · 2024-09-17T21:43:26.312Z · LW · GW

[Edited a bit for clafity]

(To clarify: I co-founded and led PIBBSS since 2021, but stepped down from leadership in June this year to work with davidad's on the Safeguarded AI programme. This means I'm no longer in charge of executive & day-to-day decisions at PIBBSS. As such, nothing of what I say below should be taking as authoritative source about what PIBBSS is going to do. I do serve on the board.)

Ryan - I appreciate the donation, and in particular you sharing your reasoning here.

I agree with a lot of what you write. Especially "connectors" (point 2) and bringing in relatively more senior academics from non-CS/non-ML fields (point 3) are IMO things that are valuable and PIBBSS has a good track record for delivering on.

Regarding both your point 1 and reservation 1 (while trying to abstract a bit from the fact that terms like 'blue sky' research are somewhat fuzzy, and that I expect at least some parts of a disagreement here might turn out to disappear when considering concrete examples), I do think there has been some change in PIBBSS' research thinking & prioritization which has been unfolding in my head since summer 2023, and finding its way more concretely into PIBBSS' strategy since the start of 2024. Lucas is the best person to talk to this in more detail, but I'll still share a few thoughts that were on my mind back when I was still leading PIBBSS.

I continue to believe that there is a lot of value to be had in investigating the underlying principles of intelligent behaviour (what one might refer to as blue sky or basic research), and to do so with a good dose of epistemic pluralism (studying such intelligent behaviour from/across a range of different systems, substrates and perspectives). I think this is a or the core aspect of the PIBBSS spirit. However, after the first 2 years of PIBBSS, I also wasn't entirely happy with our concrete research outputs. I thought a lot about what's up here (first mainly by myself, and later together with Lucas) and about how to do better - all the while staying true to the roots/generators/principles of PIBBSS, as I see them.

One of the key axis of improvement we've grown increasingly confident about is what we sometimes refer to as "bridging the theory-practice gap". I'm pretty bullish on theory(!) -- but theory alone is not the answer. Theory on its own isn't in a great position to know where to go next/what to prioritise, or whether it's making progress at all. I have seen many theoretical threads that felt intriguing/compelling, but failed to bottom out in something tangibly productive because they were lacking feedback loops that help guide them, and that would force them to operationalise abstract notions into something of concrete empirical and interventionist value. (EDIT: 'interventionist' is me trying to point to when a theory is good enough to allow you to intervene in the world or design artifacts in such a way that they reliably lead to what you intended.)

This is not an argument against theory, in my eyes, but an argument that theorizing about intelligent behaviour (or any given phenomena) will benefit a lot from having productive feedback loops with the empirical. As such, what we've come to want to foster at PIBBSS is (a) ambitious, 'theory-first' AI safety research, (b) with an interdisciplinary angle, (c) that is able to find meaningful and iterative empirical feedback loops. It's not so much that a theoretical project that doesn't yet have a meaningful way of making contact with reality should be disregarded -- and more than a key source of progress for said project will be to find ways of articulating/operationalising that theory, so that it starts making actual (testable) predictions about a real system and can come to inform design choices.

These updates in our thinking led to a few different downstream decisions. One of them was trying to have our fellowship cohort have include some empirical ML profiles/projects (I endorse roughly a 10-15% fraction, similar to what Joseph suggests). Reasons for this are, both, that we think this work is likely to be useful, and also because it changes (and IMO improves) the cohort dynamics (compared to, say, 0-5% ML). That said, I agree that once going above 20%, I would start to worry that something essential about the PIBBSS spirit might get lost, and I'm not excited about that from an ecosystem perspective (given that e.g. MATS is doing what it's doing).

Another downstream implication (though it's somewhat earlier days for PIBBSS on that one) is that I've become pretty excited about trying to help move ambitious ideas from (what I call) an 'Idea Readiness Level' (IDL; borrowing from the notion of 'Technology Readiness Levels') 1-3, to an IDL of 5-7. My thinking here is that once an idea/research agenda is at IDLs 5-7, it typically has been able to enter the broader epistemic discourse, it has some initial legible evidence/output it can rely on to making its own case -- and at that point I would say it no longer is in the area where PIBBSS has the greatest comparative advantage to support it. On the other hand, I think there isn't much (if any) obvious places where IDL 1-3 ideas get a chance to get iterate on quickly and stress-tested to develop into a more mature & empirically grounded agenda. (I think the way we were able to support Adam Shai & co in developing the computational mechanics agenda is a pretty great example of this use case I'm painting here -though notably their ideas were already relatively mature compared to other things PIBBSS might ambitiously help to mature.)

I'd personally be very excited for a PIBBSS that becomes excellent at filling that gap, and think PIBBSS has a bunch of the necessary ingredients for that already. I see this as a potentially critical investment in medium term robustness of the research ecosystem, and into what i think is an undeniable need to come to base AI Safety on rigorous scientific understanding. (Though notably Lucas/Dusan, PIBBSS' new leadership, might disagree & should have a chance to speak for themselves here.)

Comment by Nora_Ammann on PIBBSS Speaker events comings up in February · 2024-02-21T16:41:27.750Z · LW · GW

Yes, we upload them to our Youtube account modulo the speaker agreeing to it. The first few recordings from this series should be uploaded very shortly.

Comment by Nora_Ammann on Aligned AI is dual use technology · 2024-01-27T20:25:25.608Z · LW · GW

While I don't think it's so much about selfishness as such, I think this points at something important, also discussed eg here: The self-unalignment problem

Comment by Nora_Ammann on Non-directed conceptual founding · 2023-12-10T18:09:53.998Z · LW · GW

Does it seem like I'm missing something important if I say "Thing = Nexus" gives a "functional" explanation of what thing is, i.e. it serves the function of being an "inductive nexus of reference". This is not a foundational/physicalist/mechanistic explanation, but it is very much a sort of explanation that I can imagine being useful in some cases/for some purposes.

I'm suggesting this as a possibly different angle at "what sort of explanation is Thing=Nexus, and why is it plausibly not fraught despite it's somewhat-circularity?" It seems like it maps on to /doesn't contract anything you say (note: I only skimmed the post so might have missed some relevant detail, sorry!), but I wanted to check whether, even if not conflicting, it misses something you think is or might be important somehow.

Comment by Nora_Ammann on Complex systems research as a field (and its relevance to AI Alignment) · 2023-12-04T20:36:47.290Z · LW · GW

Yeah, would be pretty keen to see more work trying to do this for AI risk/safety questions specifically: contrasting what different lenses "see" and emphasize, and what productive they critiques they have to offer to each other.

Over the last couple of years, valuable progress has been made towards stating the (more classical) AI risk/safety arguments more clearly, and I think that's very productive for leading to better discourse (including critiques of those ideas). I think we're a bit behind on developing clear articulations of the complex systems/emergent risk/multi-multi/"messy transitions" angle on AI risk/safety, and also that progress on this would be productive on many fronts.

If I'm not mistaken there is some work on this in progress from CAIF (?), but I think more is needed.

Comment by Nora_Ammann on What's next for the field of Agent Foundations? · 2023-12-04T17:48:07.613Z · LW · GW

To follow up on this, we'll be hosting John's talk on Dec 12th, 9:30AM Pacific / 6:30PM CET.

Join through this Zoom Link.

Title: AI would be a lot less alarming if we understood agents

Description: In this talk, John will discuss why and how fundamental questions about agency - as they are asked, among others, by scholars in biology, artificial life, systems theory, etc. - are important to making progress in AI alignment. John gave a similar talk at the annual ALIFE conference in 2023, as an attempt to nerd-snipe researchers studying agency in a biological context.

--

To be informed about future Speaker Series events by subscribing to our SS Mailing List here. You can also add the PIBBSS Speaker Events to your calendar through this link.

Comment by Nora_Ammann on What's next for the field of Agent Foundations? · 2023-11-30T19:37:42.995Z · LW · GW

I have no doubt Alexander would shine!

Happy to run a PIBBSS speaker event for this, record it and make it publicly available. Let me know if you're keen and we'll reach out to find a time.

Comment by Nora_Ammann on What's next for the field of Agent Foundations? · 2023-11-30T18:44:22.571Z · LW · GW

FWIW I also think the "Key Phenomena of AI risk" reading curriculum (h/t TJ) does some of this at least indirectly (it doesn't set out to directly answer this question, but I think a lot of the answers to the question are comprise in the curriculum).

(Edit: fixed link)

Comment by Nora_Ammann on What's next for the field of Agent Foundations? · 2023-11-30T18:42:57.325Z · LW · GW

How confident are you about it not having been recorded? If not very, seems props worth checking again

Comment by Nora_Ammann on “Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”) · 2023-11-30T17:51:57.480Z · LW · GW

Re whether messy goal-seekers can be schemers, you may address this in a different place (and if so forgive me, and I'd appreciate you pointing me to where), but I keep wondering what notion of scheming (or deception, etc.) we should be adopting when, in particular:

an "internalist" notion, where 'scheming' is defined via the "system's internals", i.e. roughly: the system has goal A, acts as if it has goal B, until the moment is suitable to reveal it's true goal A.
an "externalist" notion, where 'scheming' is defined, either, from the perspective of an observer (e.g. I though the system has goal B, maybe I even did a bunch of more or less careful behavioral tests to raise my confidence in this assumption, but in some new salutation, it gets revealed that the system pursues B instead)
or an externalist notion but defined via the effects on the world that manifest (e.g. from a more 'bird's-eye' perspective, we can observe that the system had a number of concrete (harmful) effects on one or several agents via the mechanisms that those agents misjudged what goal the system is pursuing (therefor e.g. mispredicting its future behaviour, and basing their own actions on this wrong assumption)

It seems to me like all of these notions have different upsides and downsides. For example:

the internalist notion seems (?) to assume/bake into its definition of scheming a high degree of non-sphexishness/consequentialist cognition
the observer-dependent notion comes down to being a measure of the observer's knowledge about the system
the effects-on-the-world based notion seems plausibly too weak/non mechanistic to be helpful in the context of crafting concrete alignment proposals/safety tooling

Comment by Nora_Ammann on 'Theories of Values' and 'Theories of Agents': confusions, musings and desiderata · 2023-11-16T02:42:58.436Z · LW · GW

Yeah neat, I haven't yet gotten to reading it but is definitely on my list. Seems (and some folks suggested to me) that it's quite related to the sort of thing I'm discussing in value change problem too.

Comment by Nora_Ammann on 'Theories of Values' and 'Theories of Agents': confusions, musings and desiderata · 2023-11-15T20:17:43.230Z · LW · GW

Roughly... refers to/emphasizes the dynamic interaction between agent and environment and understands behavior/cognition/agency/... to emerge through that interaction/at that interface (rather than, e.g, trying to understand them as an internal property of the agent only)

Comment by Nora_Ammann on 4. Risks from causing illegitimate value change (performative predictors) · 2023-11-09T14:22:42.808Z · LW · GW

Related to my point above (and this quoted paragraph), a fundamental nuance here is the distinction between "accidental influence side effects" and "incentivized influence effects". I'm happy to answer more questions on this difference if it's not clear from the rest of my comment.

Thanks for clarifying; I agree it's important to be nuanced here!

I basically agree with what you say. I also want to say something like: whether to best count it as side effect or incentivized depends on what optimizer we're looking at/where you draw the boundary around the optimizer in question. I agree that a) at the moment, recommender systems are myopic in the way you describe, and the larger economic logic is where some of the pressure towards homogenization comes from (while other stuff is happening to, including humans pushing to some extent against that pressure, more or less successfully); b) at some limit, we might be worried about an AI systems becoming so powerful that its optimization arc comes to sufficiently large in scope that it's correctly understood as directly doign incentivized influence; but I also want to point out a third scanrios, c) where we should be worried about basically incentivized influence but not all of the causal force/optimization has to be enacted from wihtin the boundaries of a single/specific AI system, but where the economy as a whole is sufficiently integrated with and accelerated by advanced AI to justify the incentivized influence frame (e.g. a la ascended economy, fully automated tech company singularity). I think the general pattern here is basically one of "we continue to outsource ever more consequential decisions to advanced AI systems, without having figured out how to make these systems reliably (not) do any thing in particular".

Comment by Nora_Ammann on 4. Risks from causing illegitimate value change (performative predictors) · 2023-11-09T14:10:54.983Z · LW · GW

A small misconception that lies at the heart of this section is that AI systems (and specifically recommenders) will try to make people more predictable. This is not necessarily the case.

Yes, I'd agree (and didn't make this clear in the post, sorry) -- the pressure towards predictability comes from a combination of the logic of performative prediction AND the "economic logic" that provide the context in which these performative predictors are being used/applied. This is certainly an important thing to be clear about!

(Though it also can only give us so much reassurance: I think it's an extremely hard problem to find reliable ways for AI models to NOT be applied inside of the capitalist economic logic, if that's what we're hoping to do to avoid the legibilisation risk.)

Comment by Nora_Ammann on 3. Premise three & Conclusion: AI systems can affect value change trajectories & the Value Change Problem · 2023-11-09T14:06:15.007Z · LW · GW

Agree! Examples abound. You can never escape your local ideological context - you can only try to find processes that have some hope at occasionally pumping into the bounds of your current ideology and press beyond it - no reliably receipt (just like there is no reliably receipt to make yourself notice your own blind spot) - but there is the hope for things that in expectation and intertemporally can help us with this.

Which poses a new problem (or clarifies the problem we're facing): we don't get to answer the question of value change legitimacy in a theoretical vacuum -- instead we are already historically embedded in a collective value change trajectory, affecting both what we value but also what we (can) know.

I think that makes it sound a bit hopeless from one perspective, but on the other hand, we probably also shouldn't let hypothetical worlds we could never have reached weight us down -- there are many hypothetical worlds we still can reach that it is worth fighting for.

Comment by Nora_Ammann on 2. Premise two: Some cases of value change are (il)legitimate · 2023-11-09T13:55:38.159Z · LW · GW

Yeah interesting point. I do see the pull of the argument. In particular the example seems well chosen -- where the general form seems to be something like: we can think of cases where our agent can be said to be better off (according to some reasonable standards/form some reasonable vantage point) if the agent can make themselves be committed to continue doing a thing/undergoing a change for at least a certain amount of time.

That said, I think there are also some problems with it. For example, I'm wary of reifying "I-as-in-CEV" more than what is warranted. For one, I don't know whether there is a single coherent "I-as-in-CEV" or whether there could be several; for two, how should I apply this argument practically speaking given that I don't know what "I-as-in-CEV" would consider acceptable.

I think there is some sense in which proposing to use legitimacy as criterion has a flavour of "limited ambition" - using it will in fact mean that you will sometimes miss out of making value changes that would have been "good/acceptable" from various vantage points (e.g. legitimacy would say NO to pressing the button that would magically make everyone in the world peaceful/against war (unless the button involves some sophisticated process that allows you to back out legitimacy for everyone involved)). At the same time, I am wary we cannot give up on legitimacy without risking much worse fates, and as such, I feel currently fairly compelled to opt for legitimacy form an intertemporal perspective.

Comment by Nora_Ammann on 2. Premise two: Some cases of value change are (il)legitimate · 2023-11-09T13:47:43.789Z · LW · GW

yes, sorry! I'm not making it super explicit, actually, but the point is that, if you read e.g. Paul or Callard's accounts of value change (via transformative experiences and via aspiration respectively), a large part of how they even set up their inquiries is with respect to the question whether value change is irrational or not (or what problem value change poses to rational agency). The rationality problem comes up bc it's unclear from what vantage point one should evaluate the rationality (i.e. the "keeping with what expected utiltiy theory tells you to do") of the (decision to undergo) value change. From the vantage point of your past self, it's irrational; from the vantage point of your new self (be it as parent, vampire or jazz lover), it may be rational.

Form what I can tell, Paul's framing of transformative experiences is closer to "yes, transformative experiences are irrational (or a-rational) but they still happen; I guess we have to just accept that as a 'glitch' in humans as rational agents"; while Callard's core contribution (in my eyes) is her case for why aspiration is a rational process of value development.

Comment by Nora_Ammann on Become a PIBBSS Research Affiliate · 2023-10-12T06:57:59.734Z · LW · GW

Yes, as stated the requirements section, affiliates are expected to attend retreats and I expect about 50/50 of events will be happening in US/Europe.

Comment by Nora_Ammann on Become a PIBBSS Research Affiliate · 2023-10-11T07:45:23.342Z · LW · GW

We're not based in a single location. We are open to accepting affiliates that are based in whatever place, and are also happy to help them relocate (within some constraints) to somewhere else (e.g. where they have access to a more lively research/epistemic community) if that would be beneficial for them. That said, I also think we are best placed to help people (and have historically tended to run things) in either London/Oxford, Prague or Berkeley.

Comment by Nora_Ammann on Become a PIBBSS Research Affiliate · 2023-10-11T07:39:14.025Z · LW · GW

Starting dates might indeed differ depending on candidates' situation. That said, we expect affiliates of this round will start sometime between mid-December and mid-January. We'll be running a research retreat to onboard affiliates within that same time frame.

Comment by Nora_Ammann on Telopheme, telophore, and telotect · 2023-10-10T07:43:43.226Z · LW · GW

Right, but I feel like I want to say something like "value grounding" as its analogue.

Also... I do think there is a crucial epistemic dymension to values, and the "[symbol/value] grounding" thing seems like one place where this shows quite well.

Comment by Nora_Ammann on Telopheme, telophore, and telotect · 2023-09-24T04:39:14.602Z · LW · GW

The process that invents democracy is part of some telotect, but is it part of a telophore? Or is the telophore only reached when democracy is implemented?

Musing about how (maybe) certain telopheme impose constraints on the structure (logic) of their corresonding telophores and telotects. Eg democracy, freedom, autonomy, justice, corrigibility, rationality, ... (thought plausibly you'd not want to count (some of) those examples as telophemes in the first place?)

Comment by Nora_Ammann on Telopheme, telophore, and telotect · 2023-09-24T04:10:02.926Z · LW · GW

Curious whether the following idea rhymes with what you have in mind: telophore as (sort of) doing ~symbol grounding, i.e. the translation (or capacity to translate) from description to (wordly) effect?

Comment by Nora_Ammann on Announcing “Key Phenomena in AI Risk” (facilitated reading group) · 2023-05-09T18:26:30.749Z · LW · GW

Indeed that wasn't intended. Thanks a lot for spotting & sharing it! It's fixed now.

Comment by Nora_Ammann on Announcing “Key Phenomena in AI Risk” (facilitated reading group) · 2023-05-09T00:53:13.452Z · LW · GW

Good point! We are planning to gauge time preferences among the participants and fix slots then. What is maybe most relevant, we are intending to accommodate all time zones. (We have been doing this with PIBBSS fellows as well, so I am pretty confident we will be able to find time slots that work pretty well across the globe.)

Comment by Nora_Ammann on Robustness to Scaling Down: More Important Than I Thought · 2022-08-05T18:14:48.537Z · LW · GW

Here is another interpretation of what can cause a lack of robustness to scaling down:

(Maybe this is what you have in mind when you talk about single-single alignment not (necessaeraily) scaling to multi-multi alignment - but I am not sure that is the case, and even if it ism I feel pulled to stating it again more as I don't think it comes out as clearly as I would want it to in the original post.)

Taking the example of an "alignment strategy [that makes] the AI find the preferences of values and humans, and then pursu[e] that", robustness to scaling down can break if "human values" (as invoked in the example) don't "survive" reductionism; i.e. if, when we try to apply reducitonism to "human values", we are left with "less" than what we hoped for.

This is the inverse of saying that there is an important non-linearity when trying to scale (up) from single-single alignment to multi-multi alignment.

I think interpretation locates the reason for the lack of robustness in neither capabilities nor alignment regime, which is why I wanted to raise it. It's a claim about the nature or structural properties of "human values"; or a hint that we are deeply confused about human values (e.g. that the term currently refers to an incohernet cluster or "unnatural" abstraction)).

What you say about CEV might capture this fully, but whether it does, I think, is an empirical claim of sorts; a proposed solution to the more general diagnosis that I am trying to propose, namely that (the way we currently use the term) "human values" may itself not be robust to scaling down.

Comment by Nora_Ammann on Monks of Magnitude · 2022-02-18T11:13:11.043Z · LW · GW

Curious what different aspects the "duration of seclusion" is meant to be a proxy for?

You defindefinitelyitly point at things like "when are they expected to produce intelligible output" and "what sorts of questions appear most relevant to them". Another dimension that came to mind - but I am not sure you mean or not to include that in the concept - is something like "how often are they allowed/able to peak directly at the world, relative to the length of periods during which they reason about things in ways that are removed from empirical data"?

Comment by Nora_Ammann on Introducing the Principles of Intelligent Behaviour in Biological and Social Systems (PIBBSS) Fellowship · 2022-01-04T19:36:44.616Z · LW · GW

PIBBSS Summer Research Fellowship -- Q&A event

What? Q&A session with the fellowship organizers about the program and application process. You can submit your questions here.
For whom? For everyone curious about the fellowship and for those uncertain whether they should apply.
When? Wednesday 12th January, 7 pm GMT
Where? On Google Meet, add to your calendar

Comment by Nora_Ammann on [Extended Deadline: Jan 23rd] Announcing the PIBBSS Summer Research Fellowship · 2022-01-04T19:35:40.378Z · LW · GW

PIBBSS Summer Research Fellowship -- Q&A event

What? Q&A session with the fellowship organizers about the program and application process. You can submit your questions here.
For whom? For everyone curious about the fellowship and for those uncertain whether they should apply.
When? Wednesday 12th January, 7 pm GMT
Where? On Google Meet, add to your calendar

Comment by Nora_Ammann on The role of tribes in achieving lasting impact and how to create them · 2021-10-03T18:15:14.779Z · LW · GW

I think it's a shame that these days for many people the primary connotation of the word "tribe" is connected to culture wars. In fact, our decision to use this term was in part motivated by wanting to re-appropriate the term to something less politically loaded.

As you can read in our post (see "What is a tribe?"), we mean something particular. As any collective of human beings, it can in principle be subject to excessive in-group/out-group dynamics but that's by far not the only, nor the most interesting part of it.

Comment by Nora_Ammann on Nora_Ammann's Shortform · 2021-07-12T08:03:06.846Z · LW · GW

Context: (1) Motivations for fostering EA-relevant interdisciplinary research; (2) "domain scanning" and "epistemic translation" as a way of thinking about interdisciplinary research

[cross-posted to the EA forum in shortform]

List of fields/questions for interdisciplinary AI alignment research

The following list of fields and leading questions could be interesting for interdisciplinry AI alignment reserach. I started to compile this list to provide some anchorage for evaluating the value of interdiscplinary research for EA causes, specifically AI alignment.

Some comments on the list:

Some of these domains are likely already very much on the radar of some people, other’s are more speculative.
In some cases I have a decent idea of concrete lines of question that might be interesting, in other cases all I do is very broadly gesturing that “something here might be of interest”.
I don’t mean this list to be comprehensive or authoritative. On the contrary, this list is definitely skewed by domains I happened to have come across and found myself interested in.
While this list is specific to AI alignment (/safety/governance), I think the same rationale applies to other EA-relevant domains and I'd be excited for other people to compile similar lists relevant to their area of interest/expertise.

Very interested in hearing thoughts on the below!

Target domain: AI alignment/safety/governance

Evolutionary biology
1. Evolutionary biology seems to have a lot of potentially interesting things to say about AI alignment. Just a few examples include:
  1. The relationship between environment, agent, evolutionary paths (which e.g. relates to to the role of training environments)
  2. Niche construction as an angle on embedded agency
  3. The nature of intelligence
Linguistics and Philosophy of language
1. Lots of things that are relevant to understanding the nature and origin of (general) intelligence better.
2. Sub-domains, such as semiotics could, for example, have relevant insights on topics like delegation and interpretability.
Cognitive science and neuroscience
1. Examples include Minsky’s Society of Minds (“The power of intelligence stems from our vast diversity, not from any single, perfect principle”), Hawkin’s A thousand brains (the role of reference frames for general intelligence), Frinston et al’s Predictive Coding/Predictive Processing (in its most ambitious versions a near universal theory of all things cognition, perception, comprehension and agency), and many more
Information theory
1. Information theory is hardly news to the AI alignment idea space. However, there might still be value on the table from deeper dives or more out-of-the-orderly applications of its insights. One example of this might be this paper on The Information Theory of Individuality.
Cybernetics/Control Systems
1. Cybernetics seems straightforwardly relevant to AI alignment. Personally, I’d love to have a piece of writing synthesising the most exciting intellectual developments under cybernetics done by someone with awareness of where the AI alignment field is at currently.
Complex systems studies
1. What does the study of complex systems have to say about robustness, interoperability, emergent alignment? It also offers insights into and methodology for approaching self-organization and collective intelligence which is interesting in particular in multi-multi scenarios.
Heterodox schools of economic thinking
1. Schools of thought are trying to reimagine the economy/capitalism and (political) organization, e.g. through decentralization and self-organization, by working on antitrust, by trying to understand potentially radical implications of digitalization on the fabric of the economy, etc. Complexity economics, for example, can help understanding the out-of-equilibrium dynamics that shape much of our economy and lives.
Political economy
1. An interesting framework for thinking about AI alignment as a socio-technical challenge. Particularly relevant from a multi-multi perspective, or for thinking along the lines of cooperative AI. Pointer: Mapping the Political Economy of Reinforcement Learning Systems: The Case of Autonomous Vehicles
Political theory
1. The richness of the history of political thought is astonishing; the most obvious might be ideas related to social choice or principles of governance. (A denses while also high-quality overview is offered by this podcast series History Of Ideas.) The crux in making the depth of political thought available and relevant to AI alignment is formalization, which seems extremely undersupplied in current academia for very similar reasons as I’ve argued above.
Management and organizational theory, Institutional economics and Institutional design
1. Has things to say about e.g. interfaces (read this to get a gist for why I think interfaces are interesting for AI alignment); delegation ( e.g. Organizations and Markets by Herbert SImon; (potentially) the ontology form forms and (the relevant) agent boundaries (e.g. The secret to social forms has been in institutional economics all along?)
2. Talks for example about desiderata for institutions like robustness (e.g. here), or about how to understand and deal with institutional path-dependencies (e.g. here).

Comment by Nora_Ammann on Maps of Maps, and Empty Expectations · 2021-05-07T07:31:37.558Z · LW · GW

Glad to hear it seemed helpful!

FWIW I'd be interested in reading you spell out in more detail what you think you learnt from it about simulacra levels 3+4.

Re "writing the bottom line first": I'm not sure. I think it might be, but at least this connection didn't feel salient, or like it would buy me anything in terms of understanding, when thinking about this so far. Again interested in reading more about where you think the connections are.

To maybe say more about why (so far) it didn't seem clearly relevant to me: "Writing the bottom line first", to me, comes with a sense of actively not wanting, and taking steps to avoid, figuring out where the arguments/evidence leads you. Maps of maps feels slightly different in so far as the person really wants to find the correct solution but they are utterly confused about how to do that, or where to look. Similarly, "writing the bottom line first" suggests that you do have a concrete "bottom line" that you want to be true, wherelse empty expectations don't have anything concrete to say about what you would want to be true - there isn't (hardly) any object-level substance there.
Most succinctly, "writing the bottom line first" seems closer to motivated reasoning, and maps of maps/empty expectation seem closer to (some fundamental sense of) confusion (about where to even look to figure out the truth/solution). (Which, having spelt this out just now, makes the connection to simulacra levels 3+4 more salient.)

Comment by Nora_Ammann on How do we prepare for final crunch time? · 2021-04-01T10:09:41.509Z · LW · GW

Regarding "Staying grounded and stable in spite of the stakes":
I think it might be helpful to unpack the vritue/skill(s) involved according to the different timescales at which emergencies unfold.

For example:

1. At the time scale of minutes or hours, there is a virtue/skill of "staying level headed in a situation of accute crisis". This is the sort of skill you want your emergency doctor or firefighter to have. (When you pointed to the military, I think you in part pointed to this scale but I assume not only.)

From talking to people who do or did jobs like this, a typical pattern seems to be that some types of people when in siutations like this basically "freeze" and others basically move into a mode of "just functioning". There might be some margin for practice here (maybe you freeze the first time around and are able to snap out of the freeze the second time around, and after that, you can "remember" what it feels like to shift into funcitoning mode ever after) but, according to the "common wisdom" in these prfoessions (as I undestand it), mostly people seem to fall in one or the other category.

The sort of practice that I see being helpful here is a) overtraining on whatever skill you will need in the moment (e.g. imagine the emergency doctor) such that you can hand over most cognitive work to your autopilot once the emergency occurs; and b) train the skill of switching from freeze into high-functioning mode. I would expect "drill-type practices" are the most abt to get at that, but as noted above I don't know how large the margin for improvement is. (A subtlety here: there seems to be a massive difference between "being the first person to switch in to funcitoning mode", vs "switching into functioning mode after (literally or metaphorically speaking) someone screamed at your face to get moving". (Thinking of the military here.))

All that said, I don't feel particularly excited for people to start doing a bunch of drill practice or the like. I think there are possible extreme scenarios of "narrow hingy moments" that will involve this skill but overall this doesn't seem to me not to be the thing that is most needed/with highest EV.

(Probably also worth putting some sort of warning flag here: genuinly high-intensity situations can be harmful to people's psychy so one should be very cautious about experimenting with things in this space.)

2. Next, there might be a related virtue/skill at the timescale of weeks and months. I think the pandemic, especially from ~March to May/June is an excellent example of this, and was also an excellent learning opportunities for people involved in some time-sensitive covid-19 problem. I definitely think I've gained some gears on what a genuin (i.e. highly stakey) 1-3 month sprint involves, and what challenges and risks are invovled for you as an "agent" who is trying to also protect their agency/ability to think and act (though I think others have learnt and been stress-tested much more than I have).

Personally, my sense is that this is "harder" than the thing in 1., because you can't rely on your autopilot much, and this makes things feel more like an adaptive rather than technical problem (where the latter is aproblem where the solution is basically clear, you just have to do it; and the latter is a problem most of the work needed is in figuring out the solution, not so much (necessarily) in executing it.)

One difficulty is that this skill/virtue involves managing your energy not only spending it well. Knowing yourself and hoy your energy and motivation structures work - and in particular how they work in extreme scenarios - seems very important. I can see how people who have meditated a lot have gained valuable skills here. I don't think it's th eonly way to get these skills, and I expect the thing that is paying off here is more "being able to look back on years of meditaton practice and the ways this has rewired one's brain in some deep sense" rather than "benefits from having a routine to meditate" or something like this.

During the first couple of COVID-19 months, I was also surprised how "doing well at this" was more a question of collective rationality than I would have thought (by collective rationality I mean things like: ability to communciate effectively, ability to mobilise people/people with the right skills, abilty to delegate work effectively). There is still a large individual component of "staying on top of it all/keeping the horizon in sight" such that you are able to make hard decisoins (which you will be faced with en masse).

I think it could be really good to collect lessons learnt from the folks invovled in some EA/rationlaist-adjacent COVID-19 projects.

3. The scale of ~(a few) years seems quite similar in type to 2. The main thing that I'd want to add here is that the challenge of dealing with strong uncertainty while the stakes are massive can be very psychologically challenge. I do think meditation and related practices can be helpful in dealing with that in a way that is both grounded and not flinching from the truth.

I find myself wondering whether the miliatry does anything to help soldiers prepare for the act of "going to war" where the posisbility of death is extremely real. I imaigne they must do things to support people in this process. It's not exactly the same but there certainly are parallels with what we want.

Comment by Nora_Ammann on On the nature of purpose · 2021-03-26T22:18:54.241Z · LW · GW

Re language as an example: parties involved in communication using language have comparable intelligence (and even there I would say someone just a bit smarter can cheat their way around you using language).

Mhh yeah so I agree these examples of ways in which language "fails". But I think they don't bother me too much?
I put them in the same category as "two agents with good faith sometimes miscommunicate - and still, language overall is pragmatically", or "works good enough". In other words, even though there is potential for exploitation, that potential is in fact meaningfully constraint. More importantly, I would argue that the constraint comes (in large parts) from the way the language has been (co-)constructed.

Comment by Nora_Ammann on On the nature of purpose · 2021-03-26T22:11:53.403Z · LW · GW

a cascade of practically sufficient alignment mechanisms is one of my favorite ways to interpret Paul's IDA (Iterated Distillation-Amplification)

Yeah, great point!

Comment by Nora_Ammann on On the nature of purpose · 2021-03-26T22:11:02.124Z · LW · GW

However, I think its usefulness hinges on ability to robustly quantify the required alignment reliability / precision for various levels of optimization power involved.

I agree and think this is a good point! I think on top of quantifying the required alignment reliability "at various levels of optimization" it would also be relevant to take the underlying territory/domain into account. We can say that a territory/domain has a specific epistemic and normative structure (which e.g. defines the error margin that is acceptable, or tracks the co-evolutionary dynamics).

Comment by Nora_Ammann on Nora_Ammann's Shortform · 2021-03-25T09:07:50.183Z · LW · GW

Pragmatically reliable alignment
[taken from On purpose (footnotes); sharing this here because I want to be able to link to this extract specifically]

AI safety-relevant side note: The idea that translations of meaning need only be sufficiently reliable in order to be reliably useful might provide an interesting avenue for AI safety research.

Language works, evidenced by the striking success of human civilisations made possible through advanced coordination which in return requires advanced communication. (Sure, humans miscommunicate what feels like a whole lot, but in the bigger scheme of things, we still appear to be pretty damn good at this communication thing.)

Notably, language works without there being theoretically air-tight proofs that map meanings on words.

Right there, we have an empirical case study of a symbolic system that functions on a (merely) pragmatically reliable regime. We can use it to inform our priors on how well this regime might work in other systems, such as AI, and how and why it tends to fail.

One might argue that a pragmatically reliable alignment isn’t enough - not given the sheer optimization power of the systems we are talking about. Maybe that is true; maybe we do need more certainty than pragmatism can provide. Nevertheless, I believe that there are sufficient reasons for why this is an avenue worth exploring further.

Comment by Nora_Ammann on The Inner Workings of Resourcefulness · 2021-03-03T15:17:43.188Z · LW · GW

The question you're pointing at is definitely interstinterestinging. A Freudian, slightly pointed way of phrasing it is something like: are human's deepest desires, in essence, good and altruistic, or violent and selifsh?

My guess is that this question is wrong-headed. For example, I think this is making a mistake of drawing a dichotomy and rivalry between my "oldest and deepest drives" and "reflective reasoning", and depending on your conception of which of these two wins, your answer to the above questions ends up being positive or negative. I don't really think those people would endorse that, but I do have a sense that something like this influences their background model of the world, and informs their intuitions about the "essence of human nature" or whatever.

This dichotomy/rivalry seems wrong to me. In my experience, my intuition/drives and my explicit reasoning can very much "talk to each other". For example, I can actually integrate the knowledge that our minds have evolved such that we are scope insensitive into the whole of my overall reasoning/worldview. Or I can integrate the knowledge that, given I know that I can suffer or be happy, it's very likely other people can also suffer or feel pleasure, and that does translate into my S1-type drivers. Self-alignment, as I understand it, is very much about having my reflective beliefs and my intuitions inform one another, and I don' need o through either one overboard to become more self-aligned.

That said, my belief that self-alignment is worth pursuing is definitely based on the belief that this leads people to be more in touch with the Good and more effective in pursuing it. In that belief in turn is mostly informed by my own experience and reports from other people. I acknowledge that that likely doesn't sound very convincing to someone whose experience points in the exact opposite direction.

Comment by Nora_Ammann on The feeling of breaking an Overton window · 2021-02-28T15:40:52.770Z · LW · GW

[I felt inclined to look for observations of this thing outside of the context of the pandemic.]

Some observations:

I experience this process (either in full or the initial stages of it) for example when asked about my work (as it relates to EA, x-risks, AI safety, rationality and the like), or when sharing ~unconventional plan (e.g. "I'll just spend the next few months thinking about this") when talking to e.g. old friends from when I was growing up, people in the public sphere like a dentist, physiotherapist etc. This used to be also somewhat the case with my family but I've made some conscious (and successful) effort to reduce it.

My default reaction is [exagerating a bit for the purpose of pulling out the main contures] to sort of duck; my brain kicks into a process that feels like "oh we need to fabricate a lie now, focus" (though, lying is a bit misleading - it's more like "what are the fewest, least reveiling words I can say about this that will still be taken as a 'sufficient' answer"); my thinking feels restrained, quite the opposite of being able to think freely, clearly and calmly; often there is an experience (reminiscent) of something like shame ; also some feeling of helplessness, "I can't explain myself" or "they won't understand me"; sometimes the question feels a bit intrusive, like if they wanted to come in(to my mind?) and break things. (?)

Some reflections:

"Inferential distance and the cost of explaining":
- This is very viscerally salient to me in the moment when the "alien process" kicks in. I basically have the thought pattern of "They won't understand what I'm talking about. Mhh I guess I could explain it to them? But that will be lengthy and effortful, and I don't want to spend that effort."
  - I think this pragmatic consideration is often legitimate. At the same time I also suspect that my mind often uses this as an excuse/cover-up for something else.
  - For example, I am on average much less reluctant to give answers to such questions in English compared to German or French. I think about my work in English, thus, explaining my beliefs in another language is extra costly because it requires lots of non-trivial translation. That said, speaking German or French is also correlated with being in specific environments, notably environments I grew up in and that trigger memories of older self-conceptions of mine, and where I generally feel more expectations from the society, or soemthing.
"Updating based on someone's conclusions (including their observed behaviour) is often misleading (as opposed to updating on based one someone's reasoning/map)":
- Based on the above, if I inner sim telling someone about my belief X that is, say, slightly outside of their overton window , I feel kinda doomy, like things will go wrong or at the very least it won't be useful. So, it feels like I either want to get the chance to sit down with them for 2h+ or say as little as possible about my belief X.
- I think it's interesting to double click on what "things go wrong" means here. The two main things that come up are:
  - An epsitemic worry: they will objectively-speaking make a wrong update and walk away with more wrong rather than less wrong beliefs
  - A ~social worry: all they will update about is me being ~weird. A decent part of the worry here is something like: they will distance themselves from me because they will feel like they can't talk to me/like we're not talking the same language, a sense of isolation. Another part seems more extrem: They will think I'm crazy(?) (I sort of crinche at this one. I don't really think they will think I'm crazy(?). Idk - there is something here, but I'm confused about what it is.)

Comment by Nora_Ammann on On the nature of purpose · 2021-02-10T17:06:47.767Z · LW · GW

As far as I can tell, I agree with what you say - this seems like a good account of how the cryptophraher's constraint cashes out in language.

To your confusion: I think Dennett would agree that it is Darwianian all the way down, and that their disagreement lies elsewhere. Dennet's account for how "reasons turn into causes" is made on Darwinian grounds, and it compels Dennett (but not Rosenberg) to conclude that purposes deserve to be treated as real, because (compressing the argument a lot) they have the capacity to affect the causal world.

Not sure this is useful?

Comment by Nora_Ammann on On the nature of purpose · 2021-02-10T14:39:50.295Z · LW · GW

I'm inclined to map your idea of "reference input of a control system" onto the concept of homeostasis, homeostatic set points and homeostatic loops. Does that capture what you're trying to point at?

(Assuming it does) I agree that that homeostasis is an interesting puzzle piece here. My guess for why this didn't come up in the letter exchange is that D/R are trying to resolve a related but slightly different question: the nature and role of an organism's conscious, internal experience of "purpose".

Purpose and its pursuit have a special role in how human make sense of the world and themselves, in a way non-human animals don't (though it's not a binary).

The suggested answer to this puzzle is that, basically, the conscious experience of purpose and intent (and the allocation of this conscious experience to other creatures) is useful and thus selected for.

Why? They are meaningful patterns in the world. An observer with limited resource who wants to make senes of the world (i.e. an agent that wants to do sample complexity reduction) can abstract along the dimension of "purpose"/"intentionality" to reliably get good predictions about the world. (Except, "abstracting along the dimension of intentionality" isn't an active choice of the observer, rather than a results of the fact that intentions are a meaningful pattern.) The "intentionality-based" prediction does well at ignoring variables that aren't very predictive and capturing the ones that are, in the context of a bounded agent.

Comment by Nora_Ammann on On the nature of purpose · 2021-01-25T16:57:41.804Z · LW · GW

In regards to "the meaning of life is what we give it", that's like saying "the price of an apple is what we give it". While true, it doesn't tell the whole story. There's actual market forces that dictate apple prices, just like there are actual darwinian forces that dictate meaning and purpose.

Agree; the causes that we create ourselves aren't all that governs us - in fact, it's a small fraction of that, considering physical, chemical, biological, game-theoretic, etc. constraints. And yet, there appears to be an interesting difference between the causes that govern simply animals and those that govern human-animals. Which is what I wanted to point at in the paragraph you're quoting and the few above it.

Comment by Nora_Ammann on On the nature of purpose · 2021-01-25T16:50:33.640Z · LW · GW

I'm confused about the "purposes don't affect the world" part. If I think my purpose is to eat an apple, then there will not be an apple in the world that would have otherwise still been there if my purpose wasn't to eat the apple. My purpose has actual effects on the world, so my purpose actually exists.

So, yes, basically this is what Dennett reasons in favour of, and what Rosenberg is skeptical of.

I think the thing here that needs reconciliation - and what Dennett is trying to do - is to explain why, in your apple story, it's justified to use the term "purpose", as opposed to only invoking arguments of natural selection, i.e. saying (roughly) that you (want to) eat apples because this is an evolutionarily adaptive behaviour and has therefore been selected for.

According to this view, purposes are at most a higher-level description that might be convenient for communication but that can entirely be explained away in evolutionary terms. In terms of the epistemic virtues of explanations, you wouldn't want to add conceptual entities without them improving the predictive power to your theory. I.e. adding the concept of purposes to your explanation if you could just as well explain the observation without that concept makes your explanation more complication without it buying you predictive power. All else equal, we prefer simple/parsimonious explanations over more complicated ones (c.f. Occam's razor).

So, while Rosenberg advocates for the "Darwinism is the only game in town"-view, Dennett is trying to make the case that, actually, purposes cannot be fully explained away by a simple evolutionary account, because the act of representing purposes (e.g. a parent telling their children to eat an apple every day because it's good for them, a add campaign promoting the consumption scepticalof local fruit, ...) does itself affect people's action, and thereby purposes become causes.

Comment by Nora_Ammann on On the nature of purpose · 2021-01-25T16:25:16.349Z · LW · GW

Thanks :)

> I will note that I found the "Rosenberg's crux" section pretty hard to read, because it was quite dense.

Yeah, you're right - thanks for the concrete feedback !

I wasn't originally planning to make this a public post and later failed to take a step back and properly model what it would be like as a reader without the context of having read the letter exchange.

I consider adding a short intro paragraph to partially remedy this.

Comment by Nora_Ammann on Swiss Political System: More than You ever Wanted to Know (I.) · 2020-07-19T13:56:35.223Z · LW · GW

While I'm not an expert, I did study political science and am Swiss. I think this post paints an accurate picture of important parts of the Swiss political system. Also, I think (and admire) how it explains very nicely the basic workings of a naturally fairly complicated system.

If people are interested in reading more about Swiss Democracy and its underlying political/institutional culture (which, as pointed out in the post, is pretty informal and shaped by its historic context), I can recommend this book: https://www.amazon.com/Swiss-Democracy-Solutions-Multicultural-Societies-dp-0230231888/dp/0230231888/

It talks about "consensus democracy", Swiss federalism, political power-sharing, the scope and limits of citizen's participation in-direct democracy, and treats Swiss history of being a multicultural, heterogeneous society.

[slight edit to improve framing]

Comment by Nora_Ammann on Should I self-variolate to COVID-19 · 2020-06-03T13:53:32.312Z · LW · GW

Are there any existing variolation projects that I can join?

FWIW, there is this I know of: https://1daysooner.org/

That said, last time I've got an update from them (~1 month ago), any execution of these trials was still at least a few months away. (You could reach out to them via the website for more up to date information.) Also, there is a limited number of places where the trials can actually take place, so you'd have to check whether there is anything close to where you are.

(Meta: This isn't necessarily an endoresement of your main qusetion.)

Comment by Nora_Ammann on Epistea Summer Experiment (ESE) · 2020-02-26T15:09:08.838Z · LW · GW

That's cool to hear!

We are hoping to write up our current thinking on ICF at some point (although I don't expect it to happened within the next 3 months) and will make sure to share it.

Happy to talk!

User info

Posts

Comments

List of fields/questions for interdisciplinary AI alignment research