Comments

Comment by Aaron Sandoval (aaron-sandoval) on Open Thread Summer 2024 · 2024-06-13T03:14:19.940Z · LW · GW

Hello! A friend and I are working on an idea for the AI Impacts Essay Competition. We're both relatively new to AI and pivoting our careers in that direction, so I wanted to float the idea here before diving too deep. Our main idea is to propose a new method for training rational language models, inspired by human collaborative rationality methods. We largely agree with Conjecture's and Elicit's foundational ideas, and we propose a specific method for building CoEms for philosophical and forecasting applications. The method centers on a discussion-based RL training environment in which a model is rewarded according to how well it contributes to a group discussion with other models aimed at solving a reasoning problem. This is meant to be an instance of training by process rather than by outcome, in Elicit's terminology. I found a few papers that evaluate discussion-based or other collaborative ensembles at inference time, but nothing about training in such an environment. I'm hoping more seasoned people can comment on the originality of this idea and point to any particularly relevant literature or posts.
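
To make the setup a bit more concrete, here is a toy sketch of what a discussion environment with per-contribution reward could look like. Everything in it is an assumption for illustration: the names `DiscussionEnv`, `run_episode`, and `toy_judge` are hypothetical, the "models" are plain Python functions standing in for language models, and the judge is a placeholder that only checks whether a message repeats something already said. A real version would need a far better judge and an actual RL update, neither of which is shown.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type aliases, for illustration only.
Policy = Callable[[str, List[str]], str]         # (problem, transcript) -> message
Judge = Callable[[str, List[str], str], float]   # (problem, transcript, message) -> score


@dataclass
class DiscussionEnv:
    """Toy shared discussion: each message is scored when it is posted."""
    problem: str
    judge: Judge
    transcript: List[str] = field(default_factory=list)

    def step(self, message: str) -> float:
        # Score the contribution against the discussion so far, then record it.
        reward = self.judge(self.problem, self.transcript, message)
        self.transcript.append(message)
        return reward


def run_episode(env: DiscussionEnv, policies: List[Policy], rounds: int) -> List[float]:
    """Let each participant speak once per round; return total reward per participant."""
    rewards = [0.0] * len(policies)
    for _ in range(rounds):
        for i, policy in enumerate(policies):
            # Pass a copy so a policy cannot mutate the shared transcript.
            message = policy(env.problem, list(env.transcript))
            rewards[i] += env.step(message)
    return rewards


def toy_judge(problem: str, transcript: List[str], message: str) -> float:
    # Placeholder process reward: 1.0 for a message not already in the
    # transcript, 0.0 for an exact repeat. It never checks a final answer.
    return 0.0 if message in transcript else 1.0


def new_point(problem: str, transcript: List[str]) -> str:
    # Stub participant that always says something not yet in the transcript.
    return f"point {len(transcript)}"


def repeater(problem: str, transcript: List[str]) -> str:
    # Stub participant that repeats itself every turn.
    return "I agree"


if __name__ == "__main__":
    env = DiscussionEnv(problem="toy reasoning problem", judge=toy_judge)
    print(run_episode(env, [new_point, repeater], rounds=2))
```

The point of the sketch is only that reward is attached to each contribution as it is made, and never to whether the group's final answer is correct, which is the by-process versus by-outcome distinction. In an actual training loop the per-participant rewards would feed some policy update.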