Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes

post by owencb, AI Impacts (AI Imacts) · 2024-04-16T10:10:13.338Z · LW · GW · 12 comments

This is a link post for https://blog.aiimpacts.org/p/essay-competition-on-the-automation


With AI Impacts, we’re pleased to announce an essay competition on the automation of wisdom and philosophy. Submissions are due by July 14th. The first prize is $10,000, and there is a total of $25,000 in prizes available.

Submit an entry via this form.

The full announcement text is reproduced here:

Background

AI is likely to automate more and more categories of thinking with time.

By default, the direction the world goes in will be a result of the choices people make, and these choices will be informed by the best thinking available to them. People systematically make better, wiser choices when they understand more about issues, and when they are advised by deep and wise thinking.

Advanced AI will reshape the world, and create many new situations with potentially high-stakes decisions for people to make. To what degree people will understand these situations well enough to make wise choices remains to be seen. To some extent this will depend on how much good human thinking is devoted to these questions; but at some point it will probably depend crucially on how advanced, reliable, and widespread the automation of high-quality thinking about novel situations is.

We believe^[1] that this area could be a crucial target for differential technological development, but is at present poorly understood and receives little attention. This competition aims to encourage and to highlight good thinking on the topics of what would be needed for such automation, and how it might (or might not) arise in the world.

For more information about what we have in mind, see some of the suggested essay prompts or the FAQ below.

Scope

To enter, please submit a link to a piece of writing, not published before 2024. The piece may be published or unpublished, although if it is selected for a prize we will require publication (at least in pre-print form; optionally on the AI Impacts website) before paying out the prize.

There are no constraints on the format — we will accept essays, blog posts, papers^[2], websites, or other written artefacts^[3] of any length. However, we primarily have in mind essays of 500–5,000 words. AI assistance is welcome but its nature and extent should be disclosed. As part of your submission you will be asked to provide a summary of 100–200 words.

Your writing should aim to make progress on a question related to the automation of wisdom and philosophy. Below is a non-exhaustive set of questions of interest, in four broad categories:

Automation of wisdom

What is the nature of the sort of good thinking we want to be able to automate? How can we distinguish the type of thinking it’s important to automate well and early from types of thinking where that’s less important?
What are the key features or components of this good thinking?
- How do we come to recognise new ones?
What are traps in thinking that is smart but not wise?
- How can this be identified in automatable ways?
How could we build metrics for any of these things?

Automation of philosophy

What types of philosophy are language models well-equipped to produce, and what do they struggle with?
What would it look like to develop a “science of philosophy”, testing models’ abilities to think through new questions, with ground truth held back, and seeing empirically what is effective?
What have the trend lines for automating philosophy looked like, compared to other tasks performed by language models?
What types of training/finetuning/prompting/scaffolding help with the automation of wisdom/philosophy?
- How much do they help, especially compared to how much they help other types of reasoning?

Thinking ahead

Considering the research agenda that will (presumably) eventually be needed to automate high quality wisdom/philosophy:
- Which parts of the agenda can we expect to automate in a timely fashion?
- What is the core that we will need humans to address?
- What do we expect the thorny sticking points to be?
Why might this problem be solved “by default”, or fail to be? (from a technical standpoint)
Can we tell concrete stories or vignettes in which the automation of wisdom/philosophy is/isn’t important, to triangulate our understanding of what matters?
What preparatory research could provide the best groundwork for humanity to automate high-quality wisdom/philosophy before it is necessary?
What projects today or in the near future would be valuable to undertake?

Ecosystems

If the world were devoting serious attention to this, what would that look like?
- What incentives on institutional actors could push work onto related but less important questions? Conversely, what could help ensure that work remained well-targeted?
What are the natural institutional homes for this research in the short term?
- Academia? Nonprofits? Frontier AI labs? Elsewhere in industry?
What might be needed (proofs, audits, track record?) to enable humans (decision-makers, voters) and human institutions to correctly trust wise advice from AI systems?
- How could we lay the groundwork for this?
Ideas for catalysing/sustaining this field?
Why might this problem be solved “by default”, or fail to be? (from a social standpoint)

If you’re not sure whether a topic would be within scope, feel free to check with us.

Judging

The judging process will be coordinated by Owen Cotton-Barratt. After shortlisting, entries will be assessed by a panel of judges: Andreas Stuhlmüller, Brad Saad, David Manley, Linh Chi Nguyen, and Wei Dai.

Judging criteria will be:

Does the entry tackle an important facet of the automation of wisdom/philosophy?
Does the entry contain good analysis or valuable new ideas?
Is the writing clear, succinct, and epistemically appropriate?
Does the entry provide something that we are excited to see built upon or explored further?

The prize pool is $25,000, and the prize schedule will be:

$10,000 First Prize
$5,000 Second Prize
4x $2,000 Best-in-Category Prizes
- Judging for these will exclude the overall First and Second Prize winners from consideration
  - So if e.g. the overall First Prize and Second Prize both went to entries in the “Ecosystems” category, then the third-best entry in that category would receive $2,000
4x $500 Runner Up Prizes, for the best entries across any category that did not receive another prize
- For these prizes, the judges may give preference to impressive entries by people at early career stages
  - Whereas judging for the main prizes will — insofar as this is feasible — be blind to the identities and personal characteristics of the authors
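
As a quick arithmetic check, the schedule above adds up to the stated $25,000 pool. The following is a minimal Python sketch; the names `PRIZE_SCHEDULE` and `total_pool` are illustrative assumptions, not part of the competition rules.

```python
# Prize tiers as listed in the announcement: (amount in USD, number of prizes).
# The dictionary layout and names are hypothetical, chosen only for illustration.
PRIZE_SCHEDULE = {
    "First Prize": (10_000, 1),
    "Second Prize": (5_000, 1),
    "Best-in-Category Prize": (2_000, 4),
    "Runner Up Prize": (500, 4),
}


def total_pool(schedule):
    """Sum amount * count over every prize tier."""
    return sum(amount * count for amount, count in schedule.values())


print(total_pool(PRIZE_SCHEDULE))  # 25000
```

That is, 10,000 + 5,000 + 4 × 2,000 + 4 × 500 = 25,000.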

We may contact entrants whose work impresses us about possible further opportunities (e.g. conferences or research positions) on these topics.

Details

Entries should be submitted via this form, which asks for:

Your name and email address
A link to your entry
A 100–200 word summary
Which if any of our four categories your entry falls under
Statement of authorship credit (including AI credit)
A brief description of career stage (so that judges can at their discretion account for this in awarding Runner Up prizes)
Opportunity to opt out of future contact not directly related to this competition
Anything else we should know

You are of course welcome to seek feedback on drafts before submission. Coauthored articles are also very welcome.

The deadline for submissions is midnight anywhere in the world on Sunday 14th July. We hope to complete shortlisting within two weeks of the submission deadline, and contact winners within four weeks of the submission deadline. Winners whose entries are not yet public will have two weeks after we contact them to provide a public version, or agree to us publishing it on the AI Impacts website. Payment will be made by ACH (for US-based winners) or wire transfer (for international winners).

We reserve the right to extend the submission deadline or increase the prize pool without notice. Judges have the right to split prizes in cases of ties, or to not award prizes in the unlikely event that no submissions are found to merit them.

If you want to ask questions about the competition, feel free to comment, or to email essaycompetition@aiimpacts.org

FAQ on the automation of wisdom and philosophy

What’s the basic idea here?

We're interested in the automation of thinking that can help actors to take wise actions (whatever that means) and avoid unwise actions. As an important subcategory, we're interested in the automation of philosophical thinking, and how to avoid practical errors grounded in philosophical mistakes.

What do you want to know about such automation?

We're not certain! We think it's a potentially important area which hasn't received that much attention. We'd like people to explore more of the ideas around this. If we understood more of the contours of when such automation might be helpful (or unhelpful!), that would seem good. If we understood more about what would be necessary for automation, that would seem good. If people developed a sense of things it would be good for someone to do in the world, that's potentially great.

We give a bunch of example questions we'd be interested in people addressing in the essay prompts part of the announcement, but because it seems like a broad area we've preferred to leave the competition fairly open, and wait to see which parts people can make meaningful contributions to.

What do you mean by “wisdom” and “philosophy”?

By “wisdom”, we mean something like “thinking/planning which is good at avoiding large-scale errors”. An archetype of something which is smart-but-not-wise might be a plan full of clever steps which are each individually well-chosen to chain to the previous step in the plan, but which collectively forget why they were doing this, and end up taking actions which are in conflict with the original goal. Wisdom is also what’s needed for noticing that an old ontology was baking in some problematic assumptions about what was going on.

By “philosophy”, we mean something like “the activity of trying to find answers by thinking things through, without the ability to observe answers”. This is close to the sense understood in the academic discipline of philosophy.

We’re not sure if automating these things is most naturally thought of as one topic, two topics, or more …

What threats are you concerned about?

Progress in these areas seems like it could potentially help avoid a number of different issues:

Unwise human actions

Humans sometimes take actions which are predictably unwise (from some perspectives), and which they later regret. Such actions could be really bad if they interact with high-stakes situations. If people had access to trusted, high-wisdom automated advice, this could help them to reduce the rate of these errors.

This might be particularly important around issues coming with the development of AI, as people will be facing very novel situations and be less able to rely on experience.

Human philosophical errors

People sometimes make decisions that are influenced by their philosophical understanding of an issue. This could happen in the future, e.g. around understanding of AI consciousness/rights. Automation of good work, if achievable, could help people to have deeper understanding by the time they need to make key decisions.

Unwise AI actions

If people empower AI agents, ensuring that they are in some sense wise and not just smart could help to reduce rare damaging actions. In the extreme this could reduce risk of human extinction (imagine an AI system which wipes out humans in order to secure its own power, and later on reflection wishes it hadn't; a wiser system might have avoided taking that action in the first place).

AI philosophical errors

If AI systems become superintelligent and are meaningfully running the world, their stances on philosophical questions could matter. For example, deciding to engage in acausal trade (if it doesn’t actually make sense), or deciding not to (if it does), could be a large and consequential error. Better understanding of the automation of philosophy could either help lead to more philosophically-competent AI systems, or help people to coordinate about which parts of thinking should not be delegated to AI systems.

Is there a particular threat model you’re focused on?

No. We could make some guesses (both about which of the above categories are most concerning, and more concretely what the most concerning threats within them are), but we feel like the whole area is under-explored, and wouldn’t be confident in our guesses. We’d love to see high-quality analysis of this.

The fact that the automation of wisdom/philosophy seems important to better understand for multiple different threats — and also seems like a plausibly useful intervention for improving our ability to handle unknown unknowns — feeds into our desire to see it prioritized more than at present.

Automating wisdom, philosophy — isn’t this all just AI capabilities work?

Maybe! Certainly this is a type of capability (and high performance probably requires significantly advanced general capabilities, relative to today).

However, it seems to us that for a given level of general smarts in a system, the capacity for wisdom or philosophy could keep up with that, or could fail to. We are concerned about worlds where the ability to automate wise actions is outstripped by the ability to automate smart ones. So it seems like it may (at least in part) be a problem of differential technological development. We would be interested in further analysis of this question.

^[1] The precise opinions expressed in this post should not be taken as institutional views of AI Impacts, but as approximate views of the competition organizers. We offer them not because we're sure they're exactly right, but because we think they're pointing in a promising direction and it's more likely to provoke high quality interesting entries if we provide some concrete starting points.
^[2] We recognise that the timeline may be on the tight side for thoroughly researched papers. We are very happy to consider papers (and note that most journals accept papers that have been available as pre-prints, e.g. see https://philarchive.org/journals.html for philosophy journals), but for entrants who are targeting academic publication we also welcome people putting the heart of their argument into an essay for the competition and later expanding it into a paper.
^[3] Feel free to use unusual formats if you consider them best for exploring the ideas. e.g. we would be happy to receive a fictional business plan or technical roadmap for a hypothetical firm working on a challenge in these areas.

12 comments

Comments sorted by top scores.

comment by Alex Mallen (alex-mallen) · 2024-04-16T15:37:54.440Z · LW(p) · GW(p)

I think this is a very important and neglected area! That its tractability is low is one of its central features, but progress on it now greatly aids progress later by making these hard-to-measure aims easier to measure.

comment by sweenesm · 2024-09-13T14:31:07.327Z · LW(p) · GW(p)

Any update on when/if prizes are expected to be awarded? Thank you.

Replies from: owencb, owencb

↑ comment by owencb · 2024-09-13T14:54:03.515Z · LW(p) · GW(p)

The judging process should be complete in the next few days. I expect we'll write to winners at the end of next week, although it's possible that will be delayed. A public announcement of the winners is likely to be a few more weeks.

↑ comment by owencb · 2024-09-22T11:19:13.316Z · LW(p) · GW(p)

I've now sent emails contacting all of the prize-winners.

comment by Yitz (yitz) · 2024-04-16T19:11:38.224Z · LW(p) · GW(p)

imagine an AI system which wipes out humans in order to secure its own power, and later on reflection wishes it hadn't; a wiser system might have avoided taking that action in the first place

I’m not confident this couldn’t swing just as easily (if not more so) in the opposite direction—a wiser system with unaligned goals would be more dangerous, not less. I feel moderately confident that wisdom and human-centered ethics are orthogonal categories, and being wiser therefore does not necessitate greater alignment.

On the topic of the competition itself, are contestants allowed to submit multiple entries?

Replies from: owencb, owencb

↑ comment by owencb · 2024-04-16T20:56:26.014Z · LW(p) · GW(p)

Multiple entries are very welcome!

[With some kind of anti-munchkin caveat. Submitting your analyses of several different disjoint questions seems great; submitting two versions of largely the same basic content in different styles not so great. I'm not sure exactly how we'd handle it if someone did the latter, but we'd aim for something sensible that didn't incentivise people to have been silly about it.]

↑ comment by owencb · 2024-04-16T21:06:01.432Z · LW(p) · GW(p)

It's a fair point that wisdom might not be straightforwardly safety-increasing. If someone wanted to explore e.g. assumptions/circumstances under which it is vs isn't, that would certainly be within scope for the competition.

comment by Mitchell_Porter · 2024-07-15T06:17:22.959Z · LW(p) · GW(p)

I got nowhere near to writing an entry for this competition, but I will link to an essay from 12 years ago, which contains some of the concerns I might have tried to develop.

comment by Chris_Leong · 2024-07-09T13:54:30.047Z · LW(p) · GW(p)

Do you only take single-author submissions?

Replies from: owencb

↑ comment by owencb · 2024-07-10T09:23:01.620Z · LW(p) · GW(p)

No, multi-author submissions are welcome! (There's space to disclose this on the entry form.)

comment by Lao Mein (derpherpize) · 2024-04-19T15:03:18.092Z · LW(p) · GW(p)

Can you give examples of what you're looking for? Can I email you entries and expect a response?