Posts

San Fernando Valley Rationality: February 22, 2024 2024-02-17T01:58:00.320Z
San Fernando Valley Rationality: February 15, 2024 2024-01-30T04:40:32.067Z
San Fernando Valley Rationalist Meetup 2024-01-22T16:49:59.235Z
Reproducing ARC Evals' recent report on language model agents 2023-09-01T16:52:17.147Z

Comments

Comment by Thomas Broadley (thomas-broadley) on San Fernando Valley Rationalist Meetup · 2024-01-30T01:47:51.987Z · LW · GW

I have arrived!

Comment by Thomas Broadley (thomas-broadley) on Reproducing ARC Evals' recent report on language model agents · 2024-01-22T15:59:34.366Z · LW · GW

I neglected to update my comment here -- the agent I built for this replication is now publicly available as part of the METR task workbench, here: https://drive.google.com/drive/folders/1-m1y0_Akunqq5AWcFoEH2_-BeKwsodPf

Comment by Thomas Broadley (thomas-broadley) on Post-EAG Music Party · 2023-11-18T18:00:08.197Z · LW · GW

That's me on the bass! Thank you for hosting, it was really fun to jam with everyone.

Comment by Thomas Broadley (thomas-broadley) on Reproducing ARC Evals' recent report on language model agents · 2023-09-14T13:28:23.222Z · LW · GW

Yeah, I definitely could! It's on my to-do list. I'll let you know when I complete it.

Comment by Thomas Broadley (thomas-broadley) on Reproducing ARC Evals' recent report on language model agents · 2023-09-05T12:26:01.955Z · LW · GW

Thank you! No, I'm not building custom prompts for the different tasks. I wrote a single prompt template -- the only difference between runs is the task description, which gets plugged into the template. I think ARC Evals did the same thing.
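
For illustration only, a single-template setup like the one described above might look like the following minimal Python sketch. The template wording and the names PROMPT_TEMPLATE and build_prompt are hypothetical; the actual prompt used in the replication is not shown in this thread.

```python
from string import Template

# Hypothetical template text, invented for this sketch. The real prompt
# from the replication is not reproduced here.
PROMPT_TEMPLATE = Template(
    "You are an autonomous agent with access to a shell.\n"
    "Your task: $task_description\n"
    "Respond with the next command to run."
)


def build_prompt(task_description: str) -> str:
    """Fill the one shared template with a single task's description."""
    return PROMPT_TEMPLATE.substitute(task_description=task_description)


if __name__ == "__main__":
    # Two runs differ only in the task description that is plugged in.
    print(build_prompt("Find the capital of France."))
    print(build_prompt("Count the files in the current directory."))
```

Under this assumption, everything outside the $task_description slot is identical across runs, so differences in agent behavior come from the task rather than from per-task prompt tuning.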

I've been improving the prompt as I work through the tasks. I've probably spent 2-3 hours on the prompt, trying to improve the agent's performance on some tasks. I'll definitely rerun all the tasks with the current version of my prompt, just to check that it can still perform the easier tasks.

You're right that getting the agent to attempt the last three tasks is relatively simple. I was thinking that it wasn't worth the time or money, because I think it's very unlikely that the agent will succeed at any of them. Still, maybe it's worth getting a conclusive negative result.

Comment by Thomas Broadley (thomas-broadley) on Reproducing ARC Evals' recent report on language model agents · 2023-09-01T22:03:28.104Z · LW · GW

Thank you for the kind comment! You have lots of good ideas for how to improve this. I especially like the idea of testing with different cloud providers. I could add programming languages to that list: maybe GPT-4 is better at writing Node.js than Python (the language I prompted it to use).

I agree, a fully reproducible version would have benefits. Differences in prompt quality between evaluations are a problem.

Also agreed that it's important to allow the agent to try to complete the tasks without assistance. I did that for this reproduction. The only change I made to the agent's commands was to restrict it to accessing files in a particular directory on my computer.
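
As a rough illustration of that kind of restriction, here is a minimal Python sketch of a path check that rejects anything outside one allowed directory. The function name is_inside and the example paths are hypothetical; this is one common way to do it, not necessarily how the replication's code works.

```python
from pathlib import Path


def is_inside(allowed_dir: str, requested_path: str) -> bool:
    """Return True if requested_path resolves to a location within allowed_dir.

    Relative paths are interpreted relative to allowed_dir. Resolving both
    paths first collapses ".." segments and symlinks, so attempts to climb
    out of the directory are caught.
    """
    root = Path(allowed_dir).resolve()
    target = (root / requested_path).resolve()
    return target == root or root in target.parents


if __name__ == "__main__":
    sandbox = "/tmp/sandbox"  # hypothetical directory for the agent's files
    for candidate in ["notes/a.txt", "../secrets.txt", "/etc/passwd"]:
        verdict = "allowed" if is_inside(sandbox, candidate) else "blocked"
        print(f"{candidate}: {verdict}")
```

A check like this would run before any file command the agent proposes is executed, with blocked paths reported back to the agent instead of being accessed.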

I've hesitated to open-source my code. I don't want to accidentally advance the frontier of language model agents. But as I said in another comment, my code and prompts are pretty simple and don't use any techniques that aren't available elsewhere on the internet. So maybe it isn't a big deal. Curious to hear what you think.

Comment by Thomas Broadley (thomas-broadley) on Reproducing ARC Evals' recent report on language model agents · 2023-09-01T21:46:31.591Z · LW · GW

EDIT: The agent I built for this replication is now publicly available as part of the METR task workbench: https://drive.google.com/drive/folders/1-m1y0_Akunqq5AWcFoEH2_-BeKwsodPf

I'm torn! I think that better LLM scaffolding accelerates capabilities as much as it accelerates alignment. On the other hand, a programmer (or a non-programmer with help from ChatGPT) could easily reproduce my current scaffolding code. Maybe open-sourcing the current state of the project is fine. What do you think?

Comment by Thomas Broadley (thomas-broadley) on The Economics of the Asteroid Deflection Problem (Dominant Assurance Contracts) · 2023-08-30T21:21:29.659Z · LW · GW

"since private goods are non-rival it is efficient to exclude consumers who aren't willing to pay"

Should this be "since private goods are rival it is efficient..."?

Comment by Thomas Broadley (thomas-broadley) on Speed running everyone through the bad alignment bingo. $5k bounty for a LW conversational agent · 2023-03-22T01:16:23.224Z · LW · GW

Here is a submission: https://ai-safety-conversational-agent.thomasbroadley.com

Source code here: https://github.com/tbroadley/ai-safety-conversational-agent

I followed @Max H's suggestion of using chat-langchain. To start, I created an embedding based on the articles from https://aisafety.info, and the submission uses that embedding.

I'll get in touch with Stampy about working on their conversational agent.