What's important in "AI for epistemics"?

post by Lukas Finnveden (Lanrian) · 2024-08-24T01:27:06.771Z

This is a link post for https://lukasfinnveden.substack.com/p/whats-important-in-ai-for-epistemics

Contents

  Summary
  Structure of the post
  Previous work
  Why work on AI for epistemics?
    Summary
    Importance
    Path-dependence
    To be more concrete
      Good norms & practices for AI-as-knowledge-producers
      Good norms & practices for AI-as-communicators
      Differentially high epistemic capabilities
  Heuristics for good interventions
    Direct vs. indirect strategies
    Indirect value generation
    On long-lasting differential capability improvements
    Painting a picture of the future
  Concrete projects for differentially advancing epistemic capabilities
      Evals/benchmarks for forecasting (or other ambitious epistemic assistance)
      Automate forecasting question-generation and -resolution
      Logistics of past-casting
      Start efforts inside of AI companies for AI forecasting or other ambitious epistemic assistance
      Scalable oversight / weak-to-strong-generalization / ELK
      Experiments on what type of arguments and AI interactions tend to lead humans toward truth vs. mislead them
  Concluding thoughts

Summary

This post gives my personal take on “AI for epistemics” and how important it might be to work on.

Some background context:

So: How can we affect AI so that it contributes to better epistemic processes? When looking at concrete projects here, I find it helpful to distinguish between two different categories of work:

  1. Working to increase AIs’ epistemic capabilities, and in particular, differentially advancing them compared to other AI capabilities. Here, I also include technical work to measure AIs’ epistemic capabilities.[2]

  2. Efforts to enable the diffusion and appropriate trust of AI-discovered information. This is focused on social dynamics that could cause AI-produced information to be insufficiently or excessively trusted. It’s also focused on AIs’ role in communicating information (as opposed to just producing it). Examples of interventions here include “create an independent organization that evaluates popular AIs’ truthfulness” and “work for countries to adopt good (and avoid bad) legislation on AI communication”.

I’d be very excited about thoughtful and competent efforts in this second category. However, in this post I talk significantly more about efforts in the first category. This is just an artifact of how the post came to be, historically — it’s not because I think work on the second category of projects is less important.[3]

For the first category of projects: Technical projects to differentially advance epistemic capabilities seem somewhat more “shovel-ready”. Here, I’m especially excited about projects that differentially boost AI epistemic capabilities in a manner that is durable and/or especially good at demonstrating those capabilities to key actors.

Durable means that projects should (i) take the bitter lesson into account by working on problems that won’t be solved by default when more compute is available, and (ii) work on problems that industry isn’t already incentivized to put huge effort into (such as “making AIs into generally better agents”). (More on these criteria here.)

Two example projects that I think fulfill these criteria (I discuss a lot more projects here):

Separately, I think there’s value in demonstrating the potential of AI epistemic advice to key actors — especially frontier AI companies and governments. When transformative AI (TAI)[4] is first developed, it seems likely that these actors will (i) have a big advantage in their ability to accelerate AI-for-epistemics via their access to frontier models and algorithms, and (ii) be making decisions that I especially care about being well-informed. Thus, I’d like these actors to be impressed by the potential of AI-for-epistemics as soon as possible, so that they start investing and preparing appropriately.

If you wondered, above, why I group “measuring epistemic capabilities” into the same category of project as “differentially advancing AI capabilities”, this is now easier to explain: I think good benchmarks can be both a relatively durable intervention for increasing capabilities (by inspiring work to beat the benchmark for a long time) and a good way of demonstrating capabilities.
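To make the measurement side concrete: here’s a minimal sketch of how a forecasting benchmark might score an AI’s probabilistic forecasts against resolved questions, using the Brier score. (The questions and the `model_forecast` function are hypothetical placeholders, not taken from any existing benchmark.)

```python
# Minimal sketch: score probabilistic forecasts on resolved binary questions.

def brier_score(prob: float, outcome: bool) -> float:
    """Squared error between the forecast probability and the binary outcome."""
    return (prob - (1.0 if outcome else 0.0)) ** 2

# Hypothetical resolved questions: (question text, resolved outcome).
resolved_questions = [
    ("Will X happen by the end of 2025?", True),
    ("Will Y exceed Z this quarter?", False),
]

def evaluate(model_forecast, questions) -> float:
    """Average Brier score; lower is better, and 0.25 matches always guessing 50%."""
    scores = [brier_score(model_forecast(q), outcome) for q, outcome in questions]
    return sum(scores) / len(scores)

# Usage: evaluate(lambda q: 0.5, resolved_questions) == 0.25
```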

Structure of the post

In the rest of this post, I:

Previous work

Here is an incomplete list of previous work on this topic:

Why work on AI for epistemics?

Summary

I think there are very solid grounds to believe that AI’s influence on epistemics is important. Having good epistemics is super valuable, and human-level AI would clearly have a huge impact on our epistemic landscape. (See here for more on importance.)

I also think there are decent plausibility arguments for why epistemics may be path-dependent: Today, we are substantially less epistemically capable than our technology allows for, due to various political and social dynamics which don’t all seem inevitable. And I think there are plausible ways in which poor epistemics can be self-reinforcing (because it makes it harder to see clearly which direction leads to better epistemics). And, vice versa, ways in which good epistemics can be self-reinforcing. (See here for more on path-dependence.)

That’s not very concrete, though. To remedy that, I will go through some more specific goals that I think are both important and plausibly path-dependent:

Let’s go through all of this in more detail.

Importance

I think there are very solid grounds to believe that AI’s influence on epistemics is important.

Path-dependence

While less solid than the arguments for importance, I think there are decent plausibility arguments for why AI’s role in societal epistemics may be importantly path-dependent.

To be more concrete

Now, let’s be more specific about what goals could be important to achieve in this area. I think these are the 3 most important instrumental goals to be working towards:

Let’s go through these in order.

Good norms & practices for AI-as-knowledge-producers

Let’s talk about norms and practices for AIs as knowledge-producers. By this, I mean AIs doing original research, rather than just reporting claims discovered elsewhere. (I.e., AIs doing the sort of work that you wouldn’t get to publish on Wikipedia.)

Here are some norms/institutions/practices that I think would contribute to good usage of AI-as-knowledge-producers:

Good norms & practices for AI-as-communicators

Now let’s talk about norms for AIs as communicators. This is the other side of the coin from “AI as knowledge producers”. I’m centrally thinking about AIs talking with people and answering their questions.

Here are some norms/institutions/practices that I think would enable good usage of AI-as-communicators:

Differentially high epistemic capabilities

Finally: I want AIs to have high epistemic capabilities compared to their other capabilities (especially dangerous ones). Here are three metrics of “epistemic capabilities” that I care about (and the “other capabilities” to contrast them with):

In order for these distinctions to be decision-relevant, there need to be ways of differentially accelerating one side of the comparison compared to the other. Here are two broad categories of interventions that I think have a good shot at doing so:

Heuristics for good interventions

Having spelled out what we want in the way of epistemic capabilities and in the way of practices for AI-as-knowledge-producers and AI-as-communicators, let’s talk about how we can achieve these goals. This section covers broad guidelines and heuristics, while the next section covers concrete interventions. I discuss:

Direct vs. indirect strategies

One useful distinction is between direct and indirect strategies. While direct strategies aim to directly push for the above goals, indirect strategies instead focus on producing demos, evals, and/or arguments indicating that epistemically powerful AI will soon be possible, in order to motivate further investment & preparation pushing toward the above goals.

My current take is that:

Indirect value generation

One possible path-to-impact from building and iteratively improving capabilities on “understanding”-loaded tasks is that this gives everyone an earlier glimpse of a future where AIs are very epistemically capable. This could then motivate:

The core advantage of the indirect approach is that it seems way easier to pursue than the direct approach.

Core questions about the indirect approach: Are there really any domain-specific demos/evals that would be convincing to people here, on the margin? Or will people’s impressions be dominated by “gut impression of how smart the model is”, “benchmark performance on other tasks”, or “impression of how fast the model is affecting the world-at-large”? I feel unsure about this, because I don’t have a great sense of what drives people’s expectations here.

A more specific concern: Judgmental forecasting hasn’t “taken off” among humans. Maybe that indicates that people won’t be interested in AI forecasting either? I’m more skeptical of this concern. My best guess is that AI forecasting will have an easier time becoming widely adopted. Here’s my argument.

I don’t know a lot about why forecasting hasn’t been more widely adopted. But my guess is that the story is something like:

For AIs, these problems seem smaller:

Overall, I feel somewhat into “indirect” approaches as a path-to-impact, but only somewhat. It at least seems worth pursuing the most leveraged efforts here, such as making sure that we always have great forecasting benchmarks, and getting AI forecasting services to work with important actors as soon as (or even before) they start working well.

On long-lasting differential capability improvements

It seems straightforward and scalable to boost epistemic capabilities in the short run. But I expect a lot of work that leads to short-run improvements won’t matter after a couple of years. (This completely ruins your path-to-impact if you’re trying to directly improve long-term capabilities — but even if you’re pursuing an indirect strategy, it’s worse for improvements to last for months than for them to last for years.)

So ideally, we want to avoid pouring effort into projects that won’t be relevant in the long run. I think there are two primary reasons why projects may become irrelevant in the long run: either the bitter lesson, or other people doing them better with more resources.

That said, even if we mess this up, there’s still some value in projects whose technological contributions only temporarily boost epistemic capabilities: the people who worked on them may have developed skills that let them improve future models faster, and we may get some of the indirect sources of value mentioned above.

Ultimately, key guidelines that I think are useful for this work are:

Painting a picture of the future

To better understand which of today’s innovations will be more or less helpful for boosting future epistemics, it’s helpful to try to envision what the systems of the future will look like. In particular, it’s useful to think about the systems that we especially care about being well-designed. For me, these are the systems that can first provide a very significant boost on top of what humans can do alone, and that get used during the most high-stakes period around TAI development.

Let’s talk about forecasting in particular. Here’s what I imagine such future forecasting systems will look like:

Concrete projects for differentially advancing epistemic capabilities

Now, let’s talk about concrete projects for differentially advancing epistemic capabilities, and how well they do according to the above criteria and vision.

Here’s a summary/table-of-contents of projects that I feel excited about (no particular order). More discussion below.

Efforts to provide AI forecasting assistance (or other ambitious epistemic assistance) to governments are another category of work that I’d really like to happen eventually. But I’m worried that there will be more friction in working with governments, so it may be better to iterate outside them first and then try to provide services to them once those services work better. This is only a weakly held guess, though. If someone who was more familiar with governments thought they had a good chance of usefully working with them, I would be excited for them to try it.

In the above paragraph, and the above project titles, I refer to AI forecasting or “other ambitious epistemic assistance”. What do I mean by this?

Now for more detail on the projects I’m most excited about.

Evals/benchmarks for forecasting (or other ambitious epistemic assistance)

Automate forecasting question-generation and -resolution
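As a rough sketch of the shape this automation could take (the `llm` helper below is a hypothetical stand-in for whatever language-model call you have available, not a real API):

```python
# Hypothetical helper: stands in for any language-model call; not a real API.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in a model of your choice")

def generate_questions(article_text: str) -> str:
    # Turn a news article into forecasting questions with crisp resolution criteria.
    return llm(
        "From the following article, write three forecasting questions that "
        "(a) resolve within 12 months, (b) have unambiguous resolution criteria, "
        "and (c) name a public source that will settle them.\n\n" + article_text
    )

def resolve_question(question: str, source_text: str) -> str:
    # Later, resolve a question against the named source.
    return llm(
        f"Question: {question}\n\nSource text: {source_text}\n\n"
        "Answer YES, NO, or AMBIGUOUS, quoting the sentence that settles it."
    )
```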

Logistics of past-casting
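Past-casting means having models forecast events that have already resolved, from an information state that predates the resolution. One core logistical step, sketched below with illustrative names, is filtering resolved questions against the model’s knowledge cutoff, so that you get immediate ground truth on questions whose answers the model couldn’t have memorized from training data:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Question:
    text: str
    resolution_date: date
    outcome: bool

# Hypothetical resolved questions, for illustration only.
questions = [
    Question("Did A happen in Q3 2023?", date(2023, 10, 1), True),
    Question("Did B happen during 2021?", date(2022, 1, 1), False),
]

# Assumed knowledge cutoff of the model under test.
TRAINING_CUTOFF = date(2023, 4, 1)

# Keep only questions that resolved after the cutoff: we can score the model
# immediately, but it can't have memorized the answers from its training data.
pastcast_set = [q for q in questions if q.resolution_date > TRAINING_CUTOFF]
```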

Start efforts inside of AI companies for AI forecasting or other ambitious epistemic assistance

Scalable oversight / weak-to-strong-generalization / ELK
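For readers unfamiliar with the weak-to-strong setup, here’s a toy sketch using scikit-learn stand-ins (not anything from the actual papers): a stronger model is trained only on labels produced by a weaker supervisor, and the question is how much of the gap to ground-truth performance the strong model recovers.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Toy stand-ins: a small "weak supervisor" and a larger "strong student".
X, y = make_classification(n_samples=4000, n_features=40, random_state=0)
X_weak, X_rest, y_weak, y_rest = train_test_split(X, y, test_size=0.75, random_state=0)
X_student, X_test, y_student, y_test = train_test_split(X_rest, y_rest, test_size=0.4, random_state=0)
# (y_student is the withheld ground truth; deliberately unused for training.)

weak = LogisticRegression().fit(X_weak, y_weak)  # weak supervisor, trained on ground truth
weak_labels = weak.predict(X_student)            # its (imperfect) labels on fresh data

# The "strong" model never sees ground truth -- only the weak supervisor's labels.
strong = MLPClassifier(max_iter=500, random_state=0).fit(X_student, weak_labels)

print("weak supervisor accuracy: ", weak.score(X_test, y_test))
print("strong-from-weak accuracy:", strong.score(X_test, y_test))
# The interesting question: does the strong model beat its own supervisor,
# recovering part of the gap to a strong model trained on ground truth?
```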

Experiments on what type of arguments and AI interactions tend to lead humans toward truth vs. mislead them

Concluding thoughts

The development of AI systems with powerful epistemic capabilities presents both opportunities and significant challenges. Transformative AI will have a big impact on our society’s epistemic processes, and how good or bad this impact is may depend on what we do today.

I started out this post by distinguishing between efforts to differentially increase AI epistemic capabilities and efforts to enable the diffusion and appropriate trust of AI-discovered information. While I wrote a bit about this second category (characterizing it as good norms & practices for AI as knowledge producers and communicators), I will again note that the relative lack of content on it doesn’t mean that I think it’s any less important than the first category.

On the topic of differentially increasing epistemic AI capabilities, I’ve argued that work on this today should (i) focus on methods that will complement rather than substitute for greater compute budgets, (ii) prioritize problems that industry isn’t already trying hard to solve, and (iii) show people what the future has in store, by demonstrating what’s currently possible and prototyping what’s yet to come. I think all the project ideas I listed do well according to these criteria, and I’d be excited to see more work on them.

  1. ^

     Personally, I focus a lot on the possibility of this happening within the next 10 years, because I think that’s plausible and that our society would be woefully underprepared for it. But I think this blog post is relevant even if you’re planning for longer timelines.

  2. ^

     I explain why below.

  3. ^

     Feel free to reach out to me via DM or email at [my last name].[my first name]@gmail.com if you’re considering working on this and would be interested in my takes on what good versions could look like.

  4. ^

     Here, I’m using a definition of “transformative AI” that’s similar to the one discussed in this note.

  5. ^

     Besides underestimates of AI takeover risk, another significant reason I’m worried about AI takeover is AI races where participants think that the difference in stakes between “winning the race” and “losing the race” is on the same scale as the difference in stakes between “losing the race” and “AI takeover”. Assuming that no important player underestimated the probability of AI takeover, I would expect this sort of race to happen between nation states rather than companies, because if a state thought there was a significant probability of AI takeover, I would expect it to stop domestic races. On the international scene, it’s somewhat less obvious how a race would be stopped, but I’m decently optimistic that it would happen if everyone involved estimated, say, ≥20% probability of AI takeover.

  6. ^

     Even for extinction-risk that comes from “rational” brinkmanship, I suspect that the world offers enough affordances that countries could find a better way if there was common knowledge that the brinkmanship route would lead to a high probability of doom. It’s plausible that optimal play could risk a small probability of extinction, but I don’t think this is where most extinction-risk comes from.

  7. ^

     I think there are two mutually reinforcing effects here. One is that people may try to learn the truth, but make genuine mistakes along the way. The other is that people may (consciously or subconsciously) prefer to believe X over Y, and the ambiguity in what’s true gives them enough cover to claim to (and often actually) believe X without compromising their identity as a truth-seeker. Note that there’s a spectrum here: some people may be totally insensitive to what evidence is presented to them, while some people are good at finding the truth even in murky areas. I think most people are somewhere in the middle.

  8. ^

     Though this has exceptions. For example, Alex may already be skeptical of an existing epistemic method M’s ability to answer certain types of questions, perhaps because M contradicts Alex’s existing beliefs on the topic. If a new epistemic method is similar to M, then Alex may suspect that this method, too, will give unsatisfying answers on those questions — even if it looks good on the merits, and perhaps even if Alex is inclined to trust it on other topics.

  9. ^

     I don’t think this would permanently preclude companies from using their AIs for epistemic tasks, because when general capabilities are high enough, I expect it to be easy to use them for super-epistemics. (Except for some caveats about the alignment problem.) But it could impose delays, which could be costly if it leads to mistakes around the time when TAI is first developed.

  10. ^

     If necessary: After being separated from any dangerous AI capabilities, such as instructions for how to cheaply construct weapons.

  11. ^

     One analogy here is the Congressional Budget Office (CBO). The CBO was set up in the 1970s as a non-partisan source of information for Congress and to reduce Congress’ reliance on the Office of Management and Budget (which resides in the executive branch and has a director appointed by the sitting president). My impression is that the CBO is fairly successful, though this is only based on reading the Wikipedia page and this survey, in which >30 economists chose “Agree” or “Strongly agree” (and 0 respondents disagreed) with the statement: “Adjusting for legal restrictions on what the CBO can assume about future legislation and events, the CBO has historically issued credible forecasts of the effects of both Democratic and Republican legislative proposals.”

  12. ^

     I.e.: It would be illegal for Walmart to pay OpenAI to make ChatGPT occasionally promote Walmart or be more positive about it. But it would be legal for Walmart to offer its own chatbot (that told people about why they should use Walmart) and to buy API access from OpenAI to run that chatbot.

  13. ^

     C.f. the discussion of “asymmetric” vs “symmetric” tools in Guided By The Beauty Of Our Weapons.

  14. ^

     I was uncertain about whether this might have been confounded by the AIs having been fine-tuned to be honest, so I asked [LW(p) · GW(p)] about this, and Rohin Shah says “I don't know the exact details but to my knowledge we didn't have trouble getting the model to lie (e.g. for web of lies).”

  15. ^

     Which is an accident in the sense that it’s not intended by any human, though it’s also not an accident in the sense that it is intended by the AI systems themselves.

  16. ^

     I think the most important differences here are 1 & 2, because they have big implications for what your main epistemic strategies are. If you have good feedback loops, you can follow strategies that look more like “generate lots of plausible ideas until one of them works” (or maybe: train an opaque neural network to solve your problem). If your problem can be boiled down to math, then it’s probably not too hard to verify a theory once it’s been produced, and you can iterate pretty quickly in pure theory-land. But without these, you need to rely more on imprecise reasoning and intuition trained on few data points (or maybe just in other domains). And you need these to be not only good enough to generate plausible ideas, but good enough that you can trust the results.

  17. ^

     Such as:

    Write-ups on what type of transparency is sufficient for outsiders to trust AI-as-knowledge-producers, and arguments for why AI companies should provide it.

    Write-ups or lobbying pushing for governments (and sub-parts of governments, such as the legislative branch and opposition parties) to acquire AI expertise. To either verify or be directly involved in the production of key future AI advice.

    Evaluations testing AI trustworthiness on e.g. forecasting.

  18. ^

     Such as:

    Write-ups on what type of transparency is sufficient to trust AI-as-communicators, and arguments for why AI companies should provide it.

    Setting up an independent organization for evaluating AI truthfulness.

    Developing and advocating for possible laws (or counter-arguments to laws) about AI speech.

  19. ^

     This could include asking short-horizon forecasters about hypothetical scenarios, insofar as we have short-horizon forecasters that have been trained in ways that make it hard for them to distinguish real and hypothetical scenarios. (For example: Even when trained on real scenarios, it might be important to not give these AIs too much background knowledge or too many details, because that might be hard to generate for hypothetical scenarios.)

  20. ^

     Significantly implemented via AIs imitating human feedback.

  21. ^

     I.e., use each data point several times.

  22. ^

     Indeed, it seems useful enough for capabilities that it might be net-negative to advance, due to shorter timelines and less time to prepare for TAI.
