chengcheng

Posts

Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google 2025-02-07T03:57:30.904Z

GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning 2024-11-01T00:10:50.718Z

Pacing Outside the Box: RNNs Learn to Plan in Sokoban 2024-07-25T22:00:55.398Z

Does robustness improve with scale? 2024-07-25T20:55:53.359Z

VLM-RM: Specifying Rewards with Natural Language 2023-10-23T14:11:34.493Z

Uncovering Latent Human Wellbeing in LLM Embeddings 2023-09-14T01:40:24.483Z

Comments

Comment by ChengCheng (ccstan99) on Speed running everyone through the bad alignment bingo. $5k bounty for a LW conversational agent · 2023-03-31T00:46:14.515Z

First of all, thank you @ArthurB for offering this bounty and raising awareness of the need for quality AI alignment educational resources! We are particularly grateful to those who mentioned the Stampy project and to the people who have reached out offering to help with our efforts. Our submission, https://chat.stampy.ai/, is a very early prototype focused primarily on summarizing and synthesizing information from our own database of FAQs, along with selected documents collected from the alignment research dataset. The conversational feature still requires considerable work. Nevertheless, we would love to get input and feedback to further develop this tool for anyone seeking to better understand or contribute to AI safety. This would not have been possible without the support of our volunteers and collaborators. We welcome all who are interested in using AI to advance alignment.
