Reinforcement Learning from Information Bazaar Feedback, and other uses of information markets

post by Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2024-09-16T01:04:32.953Z · LW · GW · 1 comment

Contents

  Google ads but for information
  Reinforcement learning from market feedback
  IP rights
  Prediction markets
  Positive externalities
  Useful reading

Markets for information are inefficient, in large part due to the Buyer’s Inspection Paradox: you can’t “inspect” information like you would any other good before buying — the moment you inspect the information, you have obtained it and cannot return it. More generally, the problem is an inability to reliably commit to forgetting.

A comment by John Wentworth [LW · GW] mentioned that you can use amnestics to overcome this: make your purchase decision while under the influence of an amnestic that blocks your mind’s ability to write to long-term memory.

When you read this, it’s hard not to immediately think of LLMs, which can make purchase decisions without committing anything to long-term memory.[1] Apparently the authors of Language Models Can Reduce Asymmetry in Information Markets (2024) had the same idea, and propose the “Information Bazaar”: a digital marketplace in which LLM agents trade information on behalf of external human principals.

I think the implications of this idea are quite promising and underrated. In particular, it can address several problems at once: advertising information alongside goods (“Google ads but for information”), providing a market-based alternative to RLHF, creating functional markets for ideas and IP, and making prediction markets “deep”. Each of these gets a section below.

If you are familiar with Yoram Barzel’s view that “the key friction in society is information”, or with the fact that Paul Christiano’s HCH-style alignment proposals hinge on “verification is generally easy” [LW · GW], then you might see how this could be a pretty big deal.

In general, in all of these situations we have some intelligent entity (an AI agent, or a community of contributors) that we want to "align", i.e. incentivize to give us true and useful information. We can do so by letting it sell its information on a market and collecting the profit as its reward. Usually this doesn't work, because information markets are themselves a problem, but now we can use the information bazaar. Details follow.

Google ads but for information

The main cause of most market failures is information asymmetry: many quality improvements to goods are never made because buyers cannot verify them; many goods aren’t even produced because buyers cannot verify their quality.

There might be specific pieces of information that will influence the buyer’s decision, e.g. “in a randomized test of 1000 of these appliances, only 5 were faulty!”. But crucially, the buyer also cannot verify the quality of this information: there may be further information that will influence the buyer’s decision to buy this information, e.g. “The industry average for such faults is 1/1000”, “There is no such study, this is fake news”, or “I am the author of that study and I approve this message, here’s my signature”.

Here’s a sketch of how you could implement a marketplace for such information, with each agent LLM recursively spinning off its own agents to consider such sub-information:


from __future__ import annotations

from dataclasses import dataclass


@dataclass
class InfoOffer:
    bid: float       # what the informer pays the buyer for its attention
    price: float     # what the buyer pays to actually obtain the info
    info: str
    parent: Informer


@dataclass(eq=False)  # eq=False keeps identity hashing, so buyers can be dict keys
class Buyer:
    goal: str
    wealth: float
    info_processing_cost: float  # could be a function instead
    arena: Arena

    def __call__(self) -> tuple[list[str], bool]:
        # initialize info_collected
        info_collected: list[str] = []

        # tell information agents your goal and get info offers from them
        info_offers = self.arena.offer_info[self]

        if info_offers:
            top_offer = max(info_offers, key=lambda offer: offer.bid)

            if top_offer.bid > self.info_processing_cost:
                # charge the winning advertiser for the cost of considering its info
                self.wealth += top_offer.bid
                top_offer.parent.wealth -= top_offer.bid

                # spin off a contractor agent to decide whether to buy the top offer
                contractor = Buyer(
                    goal=f"decide whether to buy {top_offer.info!r} at price {top_offer.price}",
                    wealth=self.wealth,  # simplification: contractor gets a copy of the budget
                    info_processing_cost=self.info_processing_cost,
                    arena=self.arena,
                )
                self.arena.buyers.append(contractor)  # so informers can advertise to it too
                info_collected, decision = contractor()

                if decision:
                    # buy the info
                    info_collected.append(top_offer.info)
                    self.wealth -= top_offer.price
                    top_offer.parent.wealth += top_offer.price

        return info_collected, self.decide(info=info_collected)

    def decide(self, info: list[str]) -> bool:
        ...  # some intelligent behaviour, e.g. an LLM call


@dataclass
class Informer:
    wealth: float

    def offer_info(self, buyer: Buyer) -> InfoOffer | None:
        # some daemon that monitors the arena for places its info could
        # be useful, and advertises it there
        ...


@dataclass
class Arena:
    buyers: list[Buyer]
    informers: list[Informer]

    @property
    def offer_info(self) -> dict[Buyer, list[InfoOffer]]:
        info_offers: dict[Buyer, list[InfoOffer]] = {buyer: [] for buyer in self.buyers}
        for informer in self.informers:
            for buyer in info_offers:
                offer = informer.offer_info(buyer)
                if offer is not None:
                    info_offers[buyer].append(offer)
        return info_offers

(For simplicity I have pretended that the buyer can only process one piece of information at a time — of course, it can instead have multiple attention slots to auction to advertisers)
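To make the wiring concrete, here is a toy usage sketch. The ApplianceStudyInformer, the goal strings, and the numbers are all made up for illustration, and since decide is still a stub, no purchase actually goes through in this toy run:

class ApplianceStudyInformer(Informer):
    # made-up informer holding a single canned fact
    def offer_info(self, buyer: Buyer) -> InfoOffer | None:
        if buyer.goal == "should I buy this appliance?":  # only advertise to this exact goal,
            return InfoOffer(                             # so the toy example terminates
                bid=0.10,    # paid to the buyer for its attention
                price=2.00,  # paid by the buyer if it decides to purchase
                info="In a randomized test of 1000 of these appliances, only 5 were faulty.",
                parent=self,
            )
        return None

arena = Arena(buyers=[], informers=[ApplianceStudyInformer(wealth=10.0)])
buyer = Buyer(goal="should I buy this appliance?", wealth=50.0,
              info_processing_cost=0.05, arena=arena)
arena.buyers.append(buyer)
info_collected, decision = buyer()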

You could imagine this being implemented e.g. on Amazon. When you go to buy something, “information ads” can pop up (their contents invisible to you). You have your LLM agent look at them and decide whether to buy these information ads (which will make them visible to you); recursively, the LLM agent gets information ads informing its own decisions, and so on. You then make your final decision based on all the information acquired.

It is also straightforward to see how this could be applied to, say, fact-checking/community notes, or to recommender algorithms.

Reinforcement learning from market feedback

Basically this exact protocol can be used as an RLHF alternative. The “buyers” are human raters (and the LLM agents they employ) who have to solve some problems; the “informer” is the LLM being trained. The informer is given the buyer’s goal as context/input and returns its output in the form of an InfoOffer; the wealth updates are used as rewards.
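A minimal sketch of how this might look, reusing the classes above (not from the paper; update is a placeholder for whatever RL step you use, e.g. a policy-gradient update):

from typing import Callable

def bazaar_reward(informer: Informer, buyer: Buyer) -> float:
    # reward = the informer's net profit from one episode of the bazaar protocol
    wealth_before = informer.wealth
    buyer()  # runs the auction / inspection / purchase loop sketched above
    return informer.wealth - wealth_before

def train(informer: Informer, buyers: list[Buyer], update: Callable[..., None]) -> None:
    # hypothetical outer loop: `informer` is the LLM being trained,
    # `update` stands in for e.g. a policy-gradient step
    for buyer in buyers:
        reward = bazaar_reward(informer, buyer)
        update(informer, context=buyer.goal, reward=reward)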

This is basically a generalized form of Debate, except with proper incentives for the human rater.

This can also be used for benchmarking.

IP rights

Similarly, you can have an idea market. Your “goal” here might be something like “I want to find a well-defined and impactful problem I can write a solid research paper on in 3 months, given my CV and background” or “I want an AI start-up idea to work on”.
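Concretely, and purely as illustration (reusing the Buyer/Arena sketch from above, with made-up numbers):

idea_arena = Arena(buyers=[], informers=[])  # populate with informers selling research/startup ideas
research_buyer = Buyer(
    goal=("I want to find a well-defined and impactful problem I can write a solid "
          "research paper on in 3 months, given my CV and background"),
    wealth=100.0,
    info_processing_cost=0.5,
    arena=idea_arena,
)
idea_arena.buyers.append(research_buyer)
ideas, _ = research_buyer()  # the list of idea-pitches actually purchased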

Prediction markets

The “subsidy parameter” in LMSR can be understood as the “price of information” [LW · GW], but this payment is entirely a positive externality: the subsidizer pays, and everyone benefits from the resulting forecast. In particular, it prevents us from having “deep” prediction markets, where intermediate agents would subsidize markets for subsidiary relevant information, and so on.
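For concreteness, here are the standard LMSR facts behind that claim (well-known properties of LMSR, not anything specific to this post): the cost function is C(q) = b * log(sum_i exp(q_i / b)), and the market maker’s worst-case loss, i.e. the maximum total subsidy it can pay out for information, is b * log(N) over N outcomes.

import math

def lmsr_cost(q: list[float], b: float) -> float:
    # LMSR cost function: C(q) = b * log(sum_i exp(q_i / b))
    return b * math.log(sum(math.exp(qi / b) for qi in q))

def lmsr_max_subsidy(n_outcomes: int, b: float) -> float:
    # worst-case loss of the market maker: the most it can end up paying for information
    return b * math.log(n_outcomes)

print(lmsr_max_subsidy(2, b=100.0))  # ~69.3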

(Perhaps this can be mitigated by Latent Variable Prediction Markets [LW · GW] or Combinatorial Prediction Markets, but I’m not sufficiently familiar with those.)

But again, it is solved by our recursive information markets: the buyer’s “goal” is now simply to form a probability estimate for some forecasting question.
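Here is a sketch of what such a buyer might look like, reusing the classes above (my own illustration: I assume the forecaster is paid by a proper scoring rule, with the payment scale playing the role of the LMSR subsidy; estimate is a placeholder for the actual forecasting step):

import math

class ForecastingBuyer(Buyer):
    # the goal is a forecasting question; the buyer purchases sub-information
    # through the bazaar and then reports a probability
    def forecast(self) -> float:
        info, _ = self()            # run the usual bazaar purchasing protocol
        return self.estimate(info)  # placeholder: turn purchased info into a probability

    def estimate(self, info: list[str]) -> float:
        ...  # e.g. an LLM call

def settle(buyer: ForecastingBuyer, p: float, outcome: bool, scale: float = 10.0) -> None:
    # pay the forecaster by a log scoring rule, measured against a 50/50 prior,
    # so this payment plays the role of the LMSR subsidy
    buyer.wealth += scale * (math.log(p if outcome else 1.0 - p) - math.log(0.5))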


Positive externalities

Not everything is solved.

Buyer’s inspection is one of two problems with information markets, the other being positive externalities (it’s hard to prevent information from “leaking” outside your property).

Even if you ensure (via legal enforcement, or by having the whole decision made by an LLM buyer) that people don’t leak the information they buy, the decision of whether to buy some piece of information will itself correlate with that information. For example: you are much more likely to buy information confirming that Bigfoot is real than information rejecting it.
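To make the leak concrete, here is a toy Bayesian calculation (the numbers are invented purely for illustration):

# invented numbers: how much an onlooker learns just from seeing whether you bought
p_confirms = 0.5          # onlooker's prior that the offered info confirms Bigfoot
p_buy_if_confirms = 0.9   # you usually buy information that supports the claim
p_buy_if_rejects = 0.1    # and rarely buy information that rejects it

p_buy = p_buy_if_confirms * p_confirms + p_buy_if_rejects * (1 - p_confirms)
posterior = p_buy_if_confirms * p_confirms / p_buy
print(posterior)  # 0.9: merely observing the purchase moves the onlooker from 50% to 90%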

I’m not sure how big of a problem this is. My initial impression is that (1) it can be mitigated for applications like Reinforcement Learning from Market Feedback, where we can just control who receives what information, and (2) it is much less of a problem for ideas than for answers and proofs, because the search space for the former is larger. From a framing I like [LW(p) · GW(p)]:

It seems to me there are actually three sorts of information in the world:

  • "Ideas": math/science theories and models, inventions, business ideas, solutions to open-ended problems :: search
  • "Answers": math theorems, experimental observations, results of computations :: inference
  • "Proofs": math proofs, arguments, evidence, digital signatures, certifications, reputations, signalling :: alignment

There’s also Barzel’s view that all transaction costs and imperfect delineation of property rights (and therefore externalities) are fundamentally about information. I’m not sure how to evaluate this claim, though.

Useful reading

Stuff that I haven’t read or properly internalized myself is marked with [???]


  1. More generally, I would say that something people miss about LLMs is that they aren’t just cheaper, more reliable humans: intelligence is now decoupled from the human “architecture”, i.e. from things like memory, train of thought, continual learning, and the inability to insulate oneself from outside information. This opens up new opportunities in epistemics (questions like counterfactuals and “What would I think if I didn’t know info X?” are now meaningful) and in institutions like prediction markets and information markets. ↩︎

1 comment

Comments sorted by top scores.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-09-20T21:34:01.287Z · LW(p) · GW(p)

Neat idea! I've been thinking about things along this line for years because of science fiction writers like Charles Stross, who wrote about the idea of having digital clones of yourself that you could spawn to act as assistants and do things like enter a simulation with a set of information to analyze, and return a report after being run faster than realtime. For example, to analyze potentially infohazardous information sent by an untrusted party.

Also, the idea that people might assign their digital clones to go on a date, and then agree to go on a date if both their clones came back with a positive recommendation.

Of course, literally using a digital clone of yourself requires being rather cavalier about destroying conscious beings after you are done using them as a tool. Seems like it makes a lot more sense to use a non-conscious tool-AI without emotions for this sort of purpose.