post by [deleted] · · ? · GW · 0 comments

comment by Bjartur Tómas · 2021-09-25T16:06:46.518Z

Topic: AI

Question: Why can't we use MuZero on Core War to train models to code? MuZero uses self-play to master games, and Core War is a programming game amenable to self-play. Why has no one tried this? Or, if someone has, why did it not work?

Why my research failed: Googling this question, I get results that describe how MuZero works, DeepMind's blog post on the topic, and various Hacker News threads that do not address my question.

comment by Frederik · 2021-09-25T19:41:08.826Z

Background: AI master's student, some practice in RL

I don't think there is a fundamental reason we can't; more likely, no one has done it yet. I don't have a definitive answer as to why, but here are some possibilities:

  • too obscure ('no one has thought of it', or 'no one thought it was a good idea, it's only assembly-like code after all')
  • high barrier to entry (you need to write an RL environment that you can query fast, and you need a lot of compute)
    • this makes it harder for individuals or small teams to do this, and larger players like DeepMind and OpenAI might have different priorities
  • now that we have Codex (and soon™ its newer, allegedly much better version), there might not be any (economic or scientific) reason to do this
  • What's the value you'd get out of a Core War expert model? I glanced at the Wikipedia page. How would you convert its outputs to a useful program that does more than gobble up all your memory?
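
To make the "RL environment that you can query fast" point a bit more concrete, here is a minimal sketch of what the core of such an environment might look like. It is a toy under loud assumptions: the class name `ToyCoreEnv`, the three-opcode instruction set (`DAT`, `MOV`, `JMP`, relative addressing only), and the +1/-1/0 outcome convention are all made up for illustration. It is not real Redcode and not anything MuZero ships with; a serious attempt would need a full simulator and a much faster implementation.

```python
from typing import List, Tuple

# Hypothetical toy instruction: (opcode, a, b), addresses relative to the executing cell.
Instr = Tuple[str, int, int]
DAT: Instr = ("DAT", 0, 0)  # executing a DAT kills the process


class ToyCoreEnv:
    """Toy two-player core: circular memory, one process per warrior."""

    def __init__(self, core_size: int = 64, max_steps: int = 1000) -> None:
        self.core_size = core_size
        self.max_steps = max_steps
        self.core: List[Instr] = [DAT] * core_size
        self.pcs = [0, core_size // 2]

    def reset(self, warrior_a: List[Instr], warrior_b: List[Instr]) -> None:
        # Fill the core with DAT, then load each warrior at a fixed offset.
        self.core = [DAT] * self.core_size
        self.pcs = [0, self.core_size // 2]
        for start, warrior in zip(self.pcs, (warrior_a, warrior_b)):
            for i, instr in enumerate(warrior):
                self.core[(start + i) % self.core_size] = instr

    def step(self, player: int) -> bool:
        """Run one instruction for `player`; return False if its process died."""
        n = self.core_size
        pc = self.pcs[player]
        op, a, b = self.core[pc]
        if op == "MOV":  # copy the instruction at pc+a to pc+b
            self.core[(pc + b) % n] = self.core[(pc + a) % n]
            self.pcs[player] = (pc + 1) % n
            return True
        if op == "JMP":  # jump to pc+a
            self.pcs[player] = (pc + a) % n
            return True
        return False  # DAT (or any unknown opcode) is fatal

    def play(self, warrior_a: List[Instr], warrior_b: List[Instr]) -> int:
        """Return +1 if A wins, -1 if B wins, 0 for a draw after max_steps rounds."""
        self.reset(warrior_a, warrior_b)
        for _ in range(self.max_steps):
            for player in (0, 1):
                if not self.step(player):
                    return -1 if player == 0 else 1
        return 0


# Example warriors (also illustrative): an "imp" that copies itself forward,
# a program that jumps to itself forever, and one that dies immediately.
IMP: List[Instr] = [("MOV", 0, 1)]
SPIN: List[Instr] = [("JMP", 0, 0)]
DUD: List[Instr] = [DAT]
```

The return value of `play` is the kind of win/loss/draw signal a self-play learner would consume, with the warrior's instruction list playing the role of the "move". Even in this toy, the imp overwrites the spinning program and the match ends in a draw, which hints at the last bullet's worry: winning strategies here are about clobbering memory, not about writing useful programs.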