Posts

Take SCIFs, it’s dangerous to go alone 2024-05-01T08:02:38.067Z

Comments

Comment by latterframe on Open Thread Spring 2024 · 2024-04-29T18:26:49.686Z

Hey everyone! I work on quantifying and demonstrating AI cybersecurity impacts at Palisade Research with @Jeffrey Ladish.

We have a bunch of exciting work in the pipeline, including:

  • demos of well-known safety issues like agent jailbreaks or voice cloning
  • replications of prior work on self-replication and hacking capabilities
  • modelling of the economic impact of the above capabilities
  • novel evaluations and tools

Most of my posts here will probably detail technical research or announce new evaluation benchmarks and tools. I also think a lot about responsible release, offence/defence balance, and general governance to flesh out my work's theory of change; some of that might also slip in.

See you around 🙃